That respect these constraints. To achieve this: (i) all agents that do not satisfy the constraints are discarded; (ii) for each algorithm, the agent with the best average performance is selected; (iii) we build the list of agents whose performances are not significantly different. This list is obtained using a paired sampled Z-test with a confidence level of 95%, allowing us to determine when two agents are statistically equivalent (more details in S3 File). The results help us identify, for each experiment, the most suitable algorithm(s) depending on the constraints the agents must satisfy. This protocol is an extension of the one presented in [4].

4 BBRL library

BBRL (standing for Benchmarking tools for Bayesian Reinforcement Learning) is a C++ open-source library for Bayesian Reinforcement Learning (discrete state/action spaces). This library provides high-level features, while remaining as flexible and documented as possible to address the needs of any researcher in this field. To this end, we developed a complete command-line interface, along with a comprehensive website: https://github.com/mcastron/BBRL. BBRL focuses on the core operations required to apply the comparison benchmark presented in this paper. To run a complete experiment with the BBRL library, follow these five steps:

1. We create a test and a prior distribution. Those distributions are represented by Flat Dirichlet Multinomial distributions (FDM), parameterised by a state space X, an action space U, a vector of parameters, and a reward function. For more information about the FDM distributions, check Section 5.2.
    ./BBRL-DDS --mdp_distrib generation \
    --name \
    --short_name \
    --n_states --n_actions \
    --ini_state \
    --transition_weights \
    <(1)> ... <(nX nU nX)> \
    --reward_type "RT_CONSTANT" \
    --reward_means \
    <(x(1), u(1), x(1))> ... <(x(nX), u(nU), x(nX))> \
    --output

A distribution file is created.

PLOS ONE | DOI:10.1371/journal.pone.0157088 | June 15 | Benchmarking for Bayesian Reinforcement Learning

2. We create an experiment. An experiment is defined by a set of N MDPs, drawn from a test distribution defined in a distribution file, a discount factor and a horizon limit T.

    ./BBRL-DDS --new_experiment \
    --name \
    --mdp_distribution "DirMultiDistribution" \
    --mdp_distribution_file \
    --n_mdps --n_simulations_per_mdp 1 \
    --discount_factor <> --horizon_limit \
    --compress_output \
    --output

An experiment file is created and can be used to conduct the same experiment for several agents.

3. We create an agent. An agent is defined by an algorithm alg, a set of parameters, and a prior distribution defined in a distribution file, on which the created agent will be trained.

    ./BBRL-DDS --offline_learning \
    --agent [] \
    --mdp_distribution "DirMultiDistribution" \
    --mdp_distribution_file \
    --output

An agent file is created. The file also stores the computation time observed during the offline training phase.

4. We run the experiment. We need to provide an experiment file, an algorithm alg and an agent file.

    ./BBRL-DDS --run experiment \
    --experiment \
    --experiment_file \
    --agent \
    --agent_file \
    --n_threads 1 \
    --compress_output \
    --safe_simulations \
    --refresh_frequency 60 \
    --backup_frequency 900 \
    --output

A result file is created. This file contains a set of.
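Step (iii) of the selection protocol described before Section 4 relies on a paired Z-test to decide when two agents are statistically equivalent. The C++ sketch below illustrates how such a test could be computed from the scores two agents obtain on the same test MDPs. It is only an illustration under stated assumptions: the test is taken to be two-sided with a critical value of 1.96 for a 95% confidence level, the sample standard deviation of the paired differences is used, and the function names `pairedZStatistic` and `statisticallyEquivalent` are hypothetical and not part of the BBRL API. The exact procedure described in S3 File may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Z statistic of a paired test on the scores of two agents.
// Assumption: scoresA[i] and scoresB[i] were obtained on the same test MDP.
double pairedZStatistic(const std::vector<double>& scoresA,
                        const std::vector<double>& scoresB) {
    assert(scoresA.size() == scoresB.size() && scoresA.size() >= 2);
    const std::size_t n = scoresA.size();

    // Mean of the paired differences.
    double meanDiff = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanDiff += scoresA[i] - scoresB[i];
    }
    meanDiff /= static_cast<double>(n);

    // Sample standard deviation of the paired differences.
    double sumSq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dev = (scoresA[i] - scoresB[i]) - meanDiff;
        sumSq += dev * dev;
    }
    const double stdDev = std::sqrt(sumSq / static_cast<double>(n - 1));

    // Degenerate case: all differences are identical.
    if (stdDev == 0.0) {
        if (meanDiff == 0.0) return 0.0;
        return std::copysign(std::numeric_limits<double>::infinity(), meanDiff);
    }
    return meanDiff / (stdDev / std::sqrt(static_cast<double>(n)));
}

// Two agents are treated as equivalent when |Z| stays below the critical
// value (1.96 is the assumed two-sided threshold for 95% confidence).
bool statisticallyEquivalent(const std::vector<double>& scoresA,
                             const std::vector<double>& scoresB,
                             double criticalValue = 1.96) {
    return std::abs(pairedZStatistic(scoresA, scoresB)) < criticalValue;
}
```

In a benchmark of this kind, `scoresA` and `scoresB` would hold the returns observed for two agents on the same N MDPs of an experiment; an agent would join the list built in step (iii) whenever `statisticallyEquivalent` returns true against the best-performing agent.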