Reinforcement Learning Algorithm Collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, etc.) 17.37MB

sinat_39620217, points required: 9 (1 point = 1 yuan)

Resource file list:

DRL_code.zip contains about 381 files

examples/
examples/Baselines/
examples/Baselines/GridDispatch_competition/
examples/Baselines/GridDispatch_competition/README.md 334B
examples/Baselines/Halite_competition/
examples/Baselines/Halite_competition/torch/
examples/Baselines/Halite_competition/torch/rl_trainer/
examples/Baselines/Halite_competition/torch/rl_trainer/controller.py 20.6KB
examples/DDPG/
examples/DDPG/train.py 5.27KB
examples/AlphaZero/
examples/AlphaZero/Coach.py 8.8KB
examples/A2C/
examples/A2C/actor.py 4.52KB
examples/A2C/atari_model.py 3.17KB
examples/DQN/
examples/DQN/README.md 849B
examples/AlphaZero/README.md 1.91KB
examples/A2C/atari_agent.py 4KB
examples/Baselines/GridDispatch_competition/torch/
examples/Baselines/GridDispatch_competition/torch/grid_model.py 2.54KB
examples/Baselines/GridDispatch_competition/torch/README.md 1.6KB
examples/AlphaZero/.pic/
examples/AlphaZero/.pic/perfect_moves_rate.png 64.44KB
examples/DDPG/mujoco_model.py 2.1KB
examples/DQN_variant/
examples/DQN_variant/train.py 6.56KB
examples/CARLA_SAC/
examples/CARLA_SAC/carla_agent.py 1.71KB
examples/Baselines/Halite_competition/torch/train.py 8.93KB
examples/CARLA_SAC/train.py 5.4KB
examples/DQN/requirements.txt 43B
examples/CARLA_SAC/evaluate.py 2.62KB
examples/CARLA_SAC/carla_model.py 3.29KB
examples/Baselines/Halite_competition/torch/rl_trainer/obs_parser.py 3.27KB
examples/Baselines/Halite_competition/torch/rl_trainer/agent.py 4.21KB
examples/Baselines/Halite_competition/paddle/
examples/Baselines/Halite_competition/paddle/rl_trainer/
examples/Baselines/Halite_competition/paddle/rl_trainer/obs_parser.py 3.27KB
examples/ES/
examples/ES/train.py 7.53KB
examples/ES/obs_filter.py 6.09KB
examples/IMPALA/
examples/IMPALA/atari_model.py 2.85KB
examples/ES/noise.py 955B
examples/MADDPG/
examples/MADDPG/README.md 3.16KB
examples/IMPALA/actor.py 3.9KB
examples/IMPALA/README.md 1.84KB
examples/ES/optimizers.py 1.82KB
examples/DDPG/mujoco_agent.py 1.98KB
examples/MADDPG/requirements.txt 56B
examples/AlphaZero/connect4_aiplayer.py 4.72KB
examples/AlphaZero/utils.py 1.8KB
examples/AlphaZero/main.py 2.78KB
examples/Baselines/GridDispatch_competition/paddle/
examples/Baselines/GridDispatch_competition/paddle/grid_agent.py 1.85KB
examples/DQN/train.py 4.31KB
examples/Baselines/Halite_competition/paddle/README.md 3.39KB
examples/Baselines/GridDispatch_competition/paddle/grid_model.py 2.55KB
examples/Baselines/Halite_competition/paddle/rl_trainer/utils.py 7.59KB
examples/CQL/
examples/CQL/mujoco_agent.py 1.83KB
examples/Baselines/Halite_competition/paddle/rl_trainer/replay_memory.py 3.66KB
examples/Baselines/Halite_competition/paddle/rl_trainer/algorithm.py 5.32KB
examples/Baselines/Halite_competition/torch/encode_model.py 972B
examples/AlphaZero/alphazero_agent.py 3.64KB
examples/CARLA_SAC/env_utils.py 3.87KB
examples/CARLA_SAC/env_config.py 2.72KB
examples/Baselines/Halite_competition/paddle/rl_trainer/model.py 2.25KB
examples/AlphaZero/connect4_game.py 7.87KB
examples/Baselines/Halite_competition/paddle/rl_trainer/controller.py 20.55KB
examples/AlphaZero/connect4_model.py 3.13KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_model.py 6.4KB
examples/IMPALA/train.py 9.43KB
examples/ES/es.py 1.22KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_agent.py 8.61KB
examples/TD3/
examples/TD3/mujoco_agent.py 1.88KB
examples/Baselines/GridDispatch_competition/paddle/train.py 7.05KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/submit_model.py 5.18KB
examples/Baselines/Halite_competition/torch/rl_trainer/policy.py 2.54KB
examples/NeurIPS2019-Learn-to-Move-Challenge/
examples/NeurIPS2019-Learn-to-Move-Challenge/env_wrapper.py 16.85KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/pelvisBasedObs_scaler.npz 4.22KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/pelvisBasedObs_scaler.npz 4.22KB
examples/Baselines/Halite_competition/torch/rl_trainer/algorithm.py 5.36KB
examples/Baselines/Halite_competition/paddle/rl_trainer/policy.py 2.46KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/test.py 3.33KB
examples/NeurIPS2019-Learn-to-Move-Challenge/actor.py 1.86KB
examples/Baselines/Halite_competition/torch/test.ipynb 1.56KB
examples/ES/actor.py 4.37KB
examples/Baselines/Halite_competition/paddle/test.py 1.39KB
examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate.py 11.41KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/env_wrapper.py 9.75KB
examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate_args.py 2.46KB
examples/ES/README.md 1.47KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/submit_model.py 5.54KB
examples/DQN_variant/replay_memory.py 4.09KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/official_obs_scaler.npz 2.2KB
examples/NeurIPS2019-Learn-to-Move-Challenge/official_obs_scaler.npz 2.2KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/test.py 2.51KB
examples/Baselines/Halite_competition/torch/README.md 3.39KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty2.sh 256B
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3_first_target.sh 338B
examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_agent.py 3.51KB
examples/ES/utils.py 2.06KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty1.sh 255B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es_agent.py 2.85KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/evaluate.py 2.79KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3.sh 292B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/powernet_model.py 2.6KB
examples/PPO/
examples/PPO/atari_config.py 2.19KB
examples/NeurIPS2019-Learn-to-Move-Challenge/replay_memory.py 60B
examples/PPO/agent.py 4.43KB
examples/ES/requirements.txt 58B
examples/AlphaZero/actor.py 6.72KB
examples/PPO/mujoco_config.py 2.18KB
examples/Baselines/GridDispatch_competition/paddle/env_wrapper.py 4.52KB
examples/Baselines/Halite_competition/paddle/encode_model.py 974B
examples/Baselines/GridDispatch_competition/torch/env_wrapper.py 4.52KB
examples/tutorials/
examples/tutorials/homework/
examples/tutorials/homework/lesson4/
examples/tutorials/homework/lesson4/policy_gradient_pong/
examples/tutorials/homework/lesson4/policy_gradient_pong/model.py 1.08KB
examples/Baselines/Halite_competition/paddle/train.py 8.82KB
examples/tutorials/homework/lesson3/
examples/tutorials/homework/lesson3/dqn_mountaincar/
examples/tutorials/homework/lesson3/dqn_mountaincar/replay_memory.py 1.64KB
examples/tutorials/parl2_dygraph/
examples/tutorials/parl2_dygraph/lesson3/
examples/tutorials/parl2_dygraph/lesson3/dqn/
examples/tutorials/parl2_dygraph/lesson3/dqn/train.py 4.7KB
examples/tutorials/lesson5/
examples/tutorials/lesson5/ddpg/
examples/tutorials/lesson5/ddpg/replay_memory.py 1.64KB
examples/tutorials/homework/lesson4/policy_gradient_pong/agent.py 2.87KB
examples/tutorials/lesson1/
examples/tutorials/lesson1/gridworld.py 6.62KB
examples/tutorials/homework/lesson5/
examples/tutorials/homework/lesson5/ddpg_quadrotor/
examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_model.py 1.92KB
examples/Baselines/GridDispatch_competition/paddle/README.md 1.61KB
examples/tutorials/lesson4/
examples/tutorials/lesson4/policy_gradient/
examples/tutorials/lesson4/policy_gradient/agent.py 2.87KB
examples/CQL/train.py 4.36KB
examples/tutorials/homework/lesson4/policy_gradient_pong/train.py 4.23KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/multi_head_ddpg.py 4.82KB
examples/AlphaZero/requirements.txt 37B
examples/DQN/cartpole_agent.py 3.17KB
examples/A2C/.result/
examples/A2C/.result/result_a2c_paddle0.png 193.24KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/replay_memory.py 3.6KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_server.py 11.88KB
examples/others/
examples/others/deepes.py 3.13KB
examples/SAC/
examples/SAC/mujoco_model.py 2.55KB
examples/tutorials/homework/lesson2/
examples/tutorials/homework/lesson2/q_learning_frozenlake/
examples/tutorials/homework/lesson2/q_learning_frozenlake/agent.py 2.73KB
examples/tutorials/lesson2/
examples/tutorials/lesson2/q_learning/
examples/tutorials/lesson2/q_learning/agent.py 2.73KB
examples/CQL/README.md 1.51KB
examples/Baselines/GridDispatch_competition/torch/train.py 7.04KB
examples/Baselines/Halite_competition/torch/requirements.txt 25B
examples/Baselines/Halite_competition/paddle/rl_trainer/agent.py 4.03KB
examples/Baselines/Halite_competition/torch/rl_trainer/model.py 2.24KB
examples/DDPG/README.md 1.11KB
examples/DQN/cartpole_model.py 1.3KB
examples/Baselines/Halite_competition/paddle/submission.py 99.84KB
examples/A2C/requirements.txt 67B
examples/DDPG/requirements.txt 58B
examples/Baselines/Halite_competition/paddle/test.ipynb 1.46KB
examples/MADDPG/train.py 6.93KB
examples/TD3/requirements.txt 58B
examples/SAC/requirements.txt 58B
examples/CQL/requirements.txt 121B
examples/A2C/README.md 1.4KB
examples/A2C/train.py 7.1KB
examples/Baselines/Halite_competition/torch/config.py 1.35KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/test.py 3.22KB
examples/MADDPG/simple_model.py 3.59KB
examples/QuickStart/
examples/QuickStart/cartpole_model.py 1.23KB
examples/IMPALA/atari_agent.py 2.91KB
examples/Baselines/Halite_competition/torch/submission.py 100.1KB
examples/TD3/README.md 1.24KB
examples/QuickStart/cartpole_agent.py 2.27KB
examples/SAC/train.py 5.09KB
examples/MADDPG/simple_agent.py 4.43KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/env_wrapper.py 17.21KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/env_wrapper.py 28.33KB
examples/DQN_variant/atari_model.py 3.3KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/mlp_model.py 6.49KB
examples/OAC/
examples/OAC/requirements.txt 58B
examples/NeurIPS2019-Learn-to-Move-Challenge/README.md 3.2KB
examples/TD3/train.py 5.12KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es.py 1.57KB
examples/PPO/requirements_mujoco.txt 58B
examples/PPO/env_utils.py 6.95KB
examples/NeurIPS2019-Learn-to-Move-Challenge/train.py 11.9KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/mlp_model.py 6.46KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty1.sh 341B
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty2.sh 320B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es.py 1.23KB
examples/PPO/requirements_atari.txt 74B
examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_agent.py 2.65KB
examples/QMIX/
examples/QMIX/replay_buffer.py 3.33KB
examples/PPO/mujoco_model.py 1.96KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/README.md 659B
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/velocity_distribution.png 27.9KB
examples/tutorials/homework/lesson5/ddpg_quadrotor/train.py 6.11KB
examples/QuickStart/README.md 435B
examples/QuickStart/requirements.txt 43B
examples/tutorials/parl2_dygraph/lesson5/
examples/tutorials/parl2_dygraph/lesson5/ddpg/
examples/tutorials/parl2_dygraph/lesson5/ddpg/replay_memory.py 1.64KB
examples/tutorials/parl2_dygraph/lesson3/dqn/replay_memory.py 1.64KB
examples/tutorials/parl2_dygraph/lesson3/homework/
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/replay_memory.py 1.64KB
examples/tutorials/parl2_dygraph/lesson5/homework/
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_model.py 2.13KB
examples/QMIX/qmix_config.py 2.69KB
examples/tutorials/parl2_dygraph/lesson3/dqn/agent.py 2.79KB
examples/tutorials/homework/lesson3/dqn_mountaincar/model.py 1.11KB
examples/tutorials/lesson3/
examples/tutorials/lesson3/dqn/
examples/tutorials/lesson3/dqn/model.py 1.11KB
examples/tutorials/homework/lesson2/q_learning_frozenlake/train.py 2.56KB
examples/tutorials/parl2_dygraph/lesson3/dqn/model.py 1.3KB
examples/QMIX/rnn_model.py 1.45KB
examples/A2C/a2c_config.py 1.29KB
examples/DQN/cartpole.jpg 110.07KB
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/model.py 1.3KB
examples/tutorials/lesson5/ddpg/env.py 6.33KB
examples/AlphaZero/.pic/good_moves_rate.png 60.06KB
examples/Baselines/Halite_competition/torch/rl_trainer/replay_memory.py 3.6KB
examples/CARLA_SAC/README.md 2.78KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/utils.py 3.25KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md 6.94KB
examples/tutorials/lesson5/ddpg/train.py 4.25KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/evaluate.py 2.79KB
examples/DQN_variant/atari_agent.py 4.11KB
examples/IMPALA/impala_config.py 1.5KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/competition.png 184.81KB
examples/PPO/storage.py 3.09KB
examples/OAC/mujoco_agent.py 1.85KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3_first_target.sh 416B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es_agent.py 1.62KB
examples/tutorials/lesson3/dqn/replay_memory.py 1.64KB
examples/Baselines/Halite_competition/paddle/config.py 1.35KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/README.md 718B
examples/QMIX/utils.py 1.66KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/powernet_model.py 2.67KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/curriculum-learning.png 158.38KB
examples/Baselines/GridDispatch_competition/torch/grid_agent.py 1.97KB
examples/CARLA_SAC/.benchmark/
examples/CARLA_SAC/.benchmark/Lane_bend.gif 3.19MB
examples/tutorials/parl2_dygraph/README.md 1.38KB
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_agent.py 2.01KB
examples/tutorials/parl2_dygraph/lesson3/dqn/algorithm.py 2.86KB
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/agent.py 2.79KB
examples/tutorials/lesson4/policy_gradient/algorithm.py 1.7KB
examples/tutorials/lesson4/policy_gradient/model.py 1.04KB
examples/tutorials/homework/lesson3/dqn_mountaincar/train.py 4.72KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2.py 7.22KB
examples/tutorials/lesson3/dqn/agent.py 3.89KB
examples/Baselines/Halite_competition/torch/rl_trainer/utils.py 7.64KB
examples/tutorials/homework/lesson3/dqn_mountaincar/agent.py 3.89KB
examples/tutorials/homework/lesson2/sarsa_frozenlake/
examples/tutorials/homework/lesson2/sarsa_frozenlake/gridworld.py 6.53KB
examples/tutorials/homework/lesson2/q_learning_frozenlake/gridworld.py 6.53KB
examples/DQN_variant/.benchmark/
examples/DQN_variant/.benchmark/Dueling DQN.png 218.21KB
examples/tutorials/lesson2/sarsa/
examples/tutorials/lesson2/sarsa/gridworld.py 6.53KB
examples/tutorials/requirements.txt 126B
examples/tutorials/lesson2/sarsa/train.py 2.95KB
examples/tutorials/lesson2/q_learning/gridworld.py 6.53KB
examples/tutorials/lesson5/ddpg/model.py 1.73KB
examples/SAC/mujoco_agent.py 1.83KB
examples/tutorials/lesson3/dqn/train.py 4.82KB
examples/IMPALA/requirements.txt 74B
examples/DQN_variant/requirements.txt 79B
examples/TD3/mujoco_model.py 2.54KB
examples/Baselines/Halite_competition/torch/test.py 1.44KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/args.py 3.51KB
examples/tutorials/parl2_dygraph/lesson4/
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/agent.py 1.8KB
examples/tutorials/parl2_dygraph/lesson4/homework/
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/agent.py 1.8KB
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/train.py 4.67KB
examples/OAC/README.md 1.04KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/utils.py 13.97KB
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/train.py 4.29KB
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/algorithm.py 1.94KB
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/model.py 1.35KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/utils.py 2.59KB
examples/QMIX/qmix_agent.py 5.35KB
examples/OAC/mujoco_model.py 2.55KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/README.md 700B
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/demo.gif 4.58MB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/fastest.png 270.84KB
examples/PPO/atari_model.py 2.03KB
examples/PPO/README.md 2.48KB
examples/Baselines/Halite_competition/paddle/requirements.txt 32B
examples/tutorials/lesson5/ddpg/algorithm.py 3.46KB
examples/tutorials/lesson5/ddpg/agent.py 2.67KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2_grpc.py 1.93KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_client.py 4.25KB
examples/tutorials/homework/lesson2/sarsa_frozenlake/train.py 2.67KB
examples/QMIX/qmixer_model.py 3.06KB
examples/QMIX/train.py 6.51KB
examples/tutorials/lesson4/policy_gradient/train.py 3.66KB
examples/CQL/mujoco_model.py 2.78KB
examples/tutorials/parl2_dygraph/requirements.txt 130B
examples/SAC/README.md 1.24KB
examples/NeurIPS2019-Learn-to-Move-Challenge/train_args.py 2.73KB
examples/DQN_variant/README.md 2.65KB
examples/QMIX/README.md 1.31KB
examples/QMIX/requirements.txt 37B
examples/QMIX/env_wrapper.py 3.11KB
examples/QuickStart/train.py 3.83KB
examples/AlphaZero/MCTS.py 5.83KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/train.py 4.21KB
examples/tutorials/lesson3/dqn/algorithm.py 3.02KB
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/train.py 6.06KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/agent.py 2.31KB
examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_model.py 5.81KB
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/model.py 1.26KB
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/train.py 3.65KB
examples/tutorials/lesson2/q_learning/train.py 2.85KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3.sh 416B
examples/tutorials/README.md 1.74KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/agent.py 19.08KB
examples/tutorials/homework/lesson2/sarsa_frozenlake/agent.py 2.77KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/agent.py 12.96KB
examples/tutorials/lesson2/sarsa/agent.py 2.77KB
examples/PPO/train.py 5.99KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/l2rpn.jpeg 69.44KB
examples/OAC/train.py 5.45KB
examples/Baselines/Halite_competition/paddle/model/
examples/Baselines/Halite_competition/paddle/model/latest_ship_model.pth 325.14KB
examples/AlphaZero/Arena.py 3.24KB
examples/QuickStart/performance.gif 237.51KB
examples/NeurIPS2019-Learn-to-Move-Challenge/image/
examples/NeurIPS2019-Learn-to-Move-Challenge/image/performance.gif 782.27KB
examples/CARLA_SAC/.benchmark/carla_sac.png 141.86KB
examples/A2C/.result/result_a2c_paddle1.png 203.23KB
examples/Baselines/Halite_competition/torch/model/
examples/Baselines/Halite_competition/torch/model/latest_ship_model.pth 338.03KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/last course.png 360.06KB
examples/QMIX/images/
examples/QMIX/images/paddle2.0_qmix_result.png 97.1KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/algorithm.py 3.69KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/env.py 6.33KB
examples/CARLA_SAC/model.ckpt 4.63MB
examples/ES/mujoco_agent.py 2.77KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/model.py 1.94KB
examples/ES/es_config.py 1.2KB
examples/ES/mujoco_model.py 1.93KB

Resource description:

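Reinforcement learning algorithm collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, etc.), containing code for 20+ classic reinforcement learning algorithms. For the corresponding tutorials, see these blog posts:

Multi-agent (frontier algorithms + principles): https://blog.csdn.net/sinat_39620217/article/details/115299073?spm=1001.2014.3001.5502

Reinforcement learning basics (single-agent algorithms): https://blog.csdn.net/sinat_39620217/category_10940146.html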

# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge

<img src="image/competition.png" alt="PARL" width="800"/>

This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part is used for curriculum learning, to learn a natural and efficient gait at low-speed walking. The last part learns the final agent in the random velocity environment for round 2 evaluation.

For more technical details about our solution, we provide:
1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A PowerPoint presentation briefly introducing our solution at the NeurIPS 2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 competition workshop.
4. (coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.

**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, there are still some factors that prevent us from achieving the same performance. One problem is deciding when to select a converged model during curriculum learning. Choosing a sensible and natural gait visually is crucial for subsequent training, but the definition of a good gait varies from person to person.

<img src="image/demo.gif" alt="PARL" width="500"/>

## Dependencies
- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (To use tensorboard)

## Part1: Final submitted model

### Result
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.

| Avg reward of all episodes | Avg reward of complete episodes | Falldown % | Evaluate episodes |
|----------------------------|---------------------------------|------------|-------------------|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |

### Test
- How to Run

1. Enter the sub-folder `final_submit`
2. Download the model file from an online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
3. Unpack the file by using: `tar zxvf saved_model.tar.gz`
4. Launch the test script: `python test.py`

## Part2: Curriculum learning

<img src="image/curriculum-learning.png" alt="PARL" width="500"/>

#### 1. Target: Run as fast as possible

<img src="image/fastest.png" alt="PARL" width="800"/>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```

#### 2. Target: run at 3.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
           --restore_model_path [RunFastest model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
           --act_penalty_lowerbound 1.5
```

#### 3. Target: walk at 2.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
           --act_penalty_lowerbound 0.75
```

#### 4. Target: walk slowly at 1.25 m/s

<img src="image/last course.png" alt="PARL" width="800"/>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
           --act_penalty_lowerbound 0.6
```

## Part3: Training in random velocity environment for round2 evaluation
As mentioned before, the choice of the model used for fine-tuning influences later training. For those who cannot obtain the expected performance with the previous steps, a pre-trained model that walks naturally at 1.25 m/s is provided. ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7))

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (Suggest: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
           --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```

### Test trained model

```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```

### Other implementation details

<img src="image/velocity_distribution.png" alt="PARL" width="800"/>

By following the above steps correctly, you can get an agent that scores around 9960, slightly lower than our final submitted model.
The score gap results from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, following the conventional notion that it is hard to fit one model to different data distributions. Thus we actually trained 4 models that aim to perform well under different velocity distributions. These four models are trained successively, that is, we train a model that specializes in the start stage (first 60 frames), then fix this start model for the first 60 frames, and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post an issue if you have any problems :)

## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
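
### Sketch: frame-based model dispatch

The multi-stage scheme described under "Other implementation details" (one model for the first 60 frames, another for the remaining frames) is not part of the released code. The following is only a minimal sketch of how such frame-based dispatch could look. The class `StagedPolicy`, the functions `start_policy` and `rest_policy`, and the 3-dimensional action are hypothetical stand-ins, not names from this repository. Only the 60-frame boundary comes from the text above; the boundaries of the other stages are not stated, so the example uses two stages.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple

# A policy maps an observation to an action vector (hypothetical interface).
Policy = Callable[[Sequence[float]], List[float]]


class StagedPolicy:
    """Pick a policy according to the current frame index of the episode."""

    def __init__(self, stages: List[Tuple[int, Policy]]):
        # stages: (first_frame, policy) pairs; the earliest stage must start at 0.
        stages = sorted(stages, key=lambda s: s[0])
        if not stages or stages[0][0] != 0:
            raise ValueError("the first stage must start at frame 0")
        self._starts = [start for start, _ in stages]
        self._policies = [policy for _, policy in stages]

    def stage_index(self, frame: int) -> int:
        # Index of the last stage whose first frame is <= the given frame.
        if frame < 0:
            raise ValueError("frame must be non-negative")
        return bisect_right(self._starts, frame) - 1

    def act(self, frame: int, obs: Sequence[float]) -> List[float]:
        return self._policies[self.stage_index(frame)](obs)


# Hypothetical stand-ins for trained models: they return constant actions.
def start_policy(obs: Sequence[float]) -> List[float]:
    return [0.0, 0.0, 0.0]


def rest_policy(obs: Sequence[float]) -> List[float]:
    return [1.0, 1.0, 1.0]


# Frames 0..59 use the start model, frames 60 and later use the second model.
staged = StagedPolicy([(0, start_policy), (60, rest_policy)])
```

With this layout, training a later stage would only need the earlier policies held fixed while the later one is updated; additional stages would be extra `(first_frame, policy)` entries.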
