ZIP: Reinforcement learning algorithm collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, etc.)

Uploader: sinat_39620217 | Size: 17.37 MB | Points required: 1

Resource file list:

DRL_code.zip (about 381 files)
  1. examples/
  2. examples/Baselines/
  3. examples/Baselines/GridDispatch_competition/
  4. examples/Baselines/GridDispatch_competition/README.md 334B
  5. examples/Baselines/Halite_competition/
  6. examples/Baselines/Halite_competition/torch/
  7. examples/Baselines/Halite_competition/torch/rl_trainer/
  8. examples/Baselines/Halite_competition/torch/rl_trainer/controller.py 20.6KB
  9. examples/DDPG/
  10. examples/DDPG/train.py 5.27KB
  11. examples/AlphaZero/
  12. examples/AlphaZero/Coach.py 8.8KB
  13. examples/A2C/
  14. examples/A2C/actor.py 4.52KB
  15. examples/A2C/atari_model.py 3.17KB
  16. examples/DQN/
  17. examples/DQN/README.md 849B
  18. examples/AlphaZero/README.md 1.91KB
  19. examples/A2C/atari_agent.py 4KB
  20. examples/Baselines/GridDispatch_competition/torch/
  21. examples/Baselines/GridDispatch_competition/torch/grid_model.py 2.54KB
  22. examples/Baselines/GridDispatch_competition/torch/README.md 1.6KB
  23. examples/AlphaZero/.pic/
  24. examples/AlphaZero/.pic/perfect_moves_rate.png 64.44KB
  25. examples/DDPG/mujoco_model.py 2.1KB
  26. examples/DQN_variant/
  27. examples/DQN_variant/train.py 6.56KB
  28. examples/CARLA_SAC/
  29. examples/CARLA_SAC/carla_agent.py 1.71KB
  30. examples/Baselines/Halite_competition/torch/train.py 8.93KB
  31. examples/CARLA_SAC/train.py 5.4KB
  32. examples/DQN/requirements.txt 43B
  33. examples/CARLA_SAC/evaluate.py 2.62KB
  34. examples/CARLA_SAC/carla_model.py 3.29KB
  35. examples/Baselines/Halite_competition/torch/rl_trainer/obs_parser.py 3.27KB
  36. examples/Baselines/Halite_competition/torch/rl_trainer/agent.py 4.21KB
  37. examples/Baselines/Halite_competition/paddle/
  38. examples/Baselines/Halite_competition/paddle/rl_trainer/
  39. examples/Baselines/Halite_competition/paddle/rl_trainer/obs_parser.py 3.27KB
  40. examples/ES/
  41. examples/ES/train.py 7.53KB
  42. examples/ES/obs_filter.py 6.09KB
  43. examples/IMPALA/
  44. examples/IMPALA/atari_model.py 2.85KB
  45. examples/ES/noise.py 955B
  46. examples/MADDPG/
  47. examples/MADDPG/README.md 3.16KB
  48. examples/IMPALA/actor.py 3.9KB
  49. examples/IMPALA/README.md 1.84KB
  50. examples/ES/optimizers.py 1.82KB
  51. examples/DDPG/mujoco_agent.py 1.98KB
  52. examples/MADDPG/requirements.txt 56B
  53. examples/AlphaZero/connect4_aiplayer.py 4.72KB
  54. examples/AlphaZero/utils.py 1.8KB
  55. examples/AlphaZero/main.py 2.78KB
  56. examples/Baselines/GridDispatch_competition/paddle/
  57. examples/Baselines/GridDispatch_competition/paddle/grid_agent.py 1.85KB
  58. examples/DQN/train.py 4.31KB
  59. examples/Baselines/Halite_competition/paddle/README.md 3.39KB
  60. examples/Baselines/GridDispatch_competition/paddle/grid_model.py 2.55KB
  61. examples/Baselines/Halite_competition/paddle/rl_trainer/utils.py 7.59KB
  62. examples/CQL/
  63. examples/CQL/mujoco_agent.py 1.83KB
  64. examples/Baselines/Halite_competition/paddle/rl_trainer/replay_memory.py 3.66KB
  65. examples/Baselines/Halite_competition/paddle/rl_trainer/algorithm.py 5.32KB
  66. examples/Baselines/Halite_competition/torch/encode_model.py 972B
  67. examples/AlphaZero/alphazero_agent.py 3.64KB
  68. examples/CARLA_SAC/env_utils.py 3.87KB
  69. examples/CARLA_SAC/env_config.py 2.72KB
  70. examples/Baselines/Halite_competition/paddle/rl_trainer/model.py 2.25KB
  71. examples/AlphaZero/connect4_game.py 7.87KB
  72. examples/Baselines/Halite_competition/paddle/rl_trainer/controller.py 20.55KB
  73. examples/AlphaZero/connect4_model.py 3.13KB
  74. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/
  75. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_model.py 6.4KB
  76. examples/IMPALA/train.py 9.43KB
  77. examples/ES/es.py 1.22KB
  78. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_agent.py 8.61KB
  79. examples/TD3/
  80. examples/TD3/mujoco_agent.py 1.88KB
  81. examples/Baselines/GridDispatch_competition/paddle/train.py 7.05KB
  82. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/
  83. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/submit_model.py 5.18KB
  84. examples/Baselines/Halite_competition/torch/rl_trainer/policy.py 2.54KB
  85. examples/NeurIPS2019-Learn-to-Move-Challenge/
  86. examples/NeurIPS2019-Learn-to-Move-Challenge/env_wrapper.py 16.85KB
  87. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/pelvisBasedObs_scaler.npz 4.22KB
  88. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/pelvisBasedObs_scaler.npz 4.22KB
  89. examples/Baselines/Halite_competition/torch/rl_trainer/algorithm.py 5.36KB
  90. examples/Baselines/Halite_competition/paddle/rl_trainer/policy.py 2.46KB
  91. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/test.py 3.33KB
  92. examples/NeurIPS2019-Learn-to-Move-Challenge/actor.py 1.86KB
  93. examples/Baselines/Halite_competition/torch/test.ipynb 1.56KB
  94. examples/ES/actor.py 4.37KB
  95. examples/Baselines/Halite_competition/paddle/test.py 1.39KB
  96. examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate.py 11.41KB
  97. examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/
  98. examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/env_wrapper.py 9.75KB
  99. examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate_args.py 2.46KB
  100. examples/ES/README.md 1.47KB
  101. examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/submit_model.py 5.54KB
  102. examples/DQN_variant/replay_memory.py 4.09KB
  103. examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/official_obs_scaler.npz 2.2KB
  104. examples/NeurIPS2019-Learn-to-Move-Challenge/official_obs_scaler.npz 2.2KB
  105. examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/test.py 2.51KB
  106. examples/Baselines/Halite_competition/torch/README.md 3.39KB
  107. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/
  108. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty2.sh 256B
  109. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3_first_target.sh 338B
  110. examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_agent.py 3.51KB
  111. examples/ES/utils.py 2.06KB
  112. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty1.sh 255B
  113. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/
  114. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/
  115. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es_agent.py 2.85KB
  116. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/
  117. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/evaluate.py 2.79KB
  118. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3.sh 292B
  119. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/powernet_model.py 2.6KB
  120. examples/PPO/
  121. examples/PPO/atari_config.py 2.19KB
  122. examples/NeurIPS2019-Learn-to-Move-Challenge/replay_memory.py 60B
  123. examples/PPO/agent.py 4.43KB
  124. examples/ES/requirements.txt 58B
  125. examples/AlphaZero/actor.py 6.72KB
  126. examples/PPO/mujoco_config.py 2.18KB
  127. examples/Baselines/GridDispatch_competition/paddle/env_wrapper.py 4.52KB
  128. examples/Baselines/Halite_competition/paddle/encode_model.py 974B
  129. examples/Baselines/GridDispatch_competition/torch/env_wrapper.py 4.52KB
  130. examples/tutorials/
  131. examples/tutorials/homework/
  132. examples/tutorials/homework/lesson4/
  133. examples/tutorials/homework/lesson4/policy_gradient_pong/
  134. examples/tutorials/homework/lesson4/policy_gradient_pong/model.py 1.08KB
  135. examples/Baselines/Halite_competition/paddle/train.py 8.82KB
  136. examples/tutorials/homework/lesson3/
  137. examples/tutorials/homework/lesson3/dqn_mountaincar/
  138. examples/tutorials/homework/lesson3/dqn_mountaincar/replay_memory.py 1.64KB
  139. examples/tutorials/parl2_dygraph/
  140. examples/tutorials/parl2_dygraph/lesson3/
  141. examples/tutorials/parl2_dygraph/lesson3/dqn/
  142. examples/tutorials/parl2_dygraph/lesson3/dqn/train.py 4.7KB
  143. examples/tutorials/lesson5/
  144. examples/tutorials/lesson5/ddpg/
  145. examples/tutorials/lesson5/ddpg/replay_memory.py 1.64KB
  146. examples/tutorials/homework/lesson4/policy_gradient_pong/agent.py 2.87KB
  147. examples/tutorials/lesson1/
  148. examples/tutorials/lesson1/gridworld.py 6.62KB
  149. examples/tutorials/homework/lesson5/
  150. examples/tutorials/homework/lesson5/ddpg_quadrotor/
  151. examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_model.py 1.92KB
  152. examples/Baselines/GridDispatch_competition/paddle/README.md 1.61KB
  153. examples/tutorials/lesson4/
  154. examples/tutorials/lesson4/policy_gradient/
  155. examples/tutorials/lesson4/policy_gradient/agent.py 2.87KB
  156. examples/CQL/train.py 4.36KB
  157. examples/tutorials/homework/lesson4/policy_gradient_pong/train.py 4.23KB
  158. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/multi_head_ddpg.py 4.82KB
  159. examples/AlphaZero/requirements.txt 37B
  160. examples/DQN/cartpole_agent.py 3.17KB
  161. examples/A2C/.result/
  162. examples/A2C/.result/result_a2c_paddle0.png 193.24KB
  163. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/replay_memory.py 3.6KB
  164. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_server.py 11.88KB
  165. examples/others/
  166. examples/others/deepes.py 3.13KB
  167. examples/SAC/
  168. examples/SAC/mujoco_model.py 2.55KB
  169. examples/tutorials/homework/lesson2/
  170. examples/tutorials/homework/lesson2/q_learning_frozenlake/
  171. examples/tutorials/homework/lesson2/q_learning_frozenlake/agent.py 2.73KB
  172. examples/tutorials/lesson2/
  173. examples/tutorials/lesson2/q_learning/
  174. examples/tutorials/lesson2/q_learning/agent.py 2.73KB
  175. examples/CQL/README.md 1.51KB
  176. examples/Baselines/GridDispatch_competition/torch/train.py 7.04KB
  177. examples/Baselines/Halite_competition/torch/requirements.txt 25B
  178. examples/Baselines/Halite_competition/paddle/rl_trainer/agent.py 4.03KB
  179. examples/Baselines/Halite_competition/torch/rl_trainer/model.py 2.24KB
  180. examples/DDPG/README.md 1.11KB
  181. examples/DQN/cartpole_model.py 1.3KB
  182. examples/Baselines/Halite_competition/paddle/submission.py 99.84KB
  183. examples/A2C/requirements.txt 67B
  184. examples/DDPG/requirements.txt 58B
  185. examples/Baselines/Halite_competition/paddle/test.ipynb 1.46KB
  186. examples/MADDPG/train.py 6.93KB
  187. examples/TD3/requirements.txt 58B
  188. examples/SAC/requirements.txt 58B
  189. examples/CQL/requirements.txt 121B
  190. examples/A2C/README.md 1.4KB
  191. examples/A2C/train.py 7.1KB
  192. examples/Baselines/Halite_competition/torch/config.py 1.35KB
  193. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/test.py 3.22KB
  194. examples/MADDPG/simple_model.py 3.59KB
  195. examples/QuickStart/
  196. examples/QuickStart/cartpole_model.py 1.23KB
  197. examples/IMPALA/atari_agent.py 2.91KB
  198. examples/Baselines/Halite_competition/torch/submission.py 100.1KB
  199. examples/TD3/README.md 1.24KB
  200. examples/QuickStart/cartpole_agent.py 2.27KB
  201. examples/SAC/train.py 5.09KB
  202. examples/MADDPG/simple_agent.py 4.43KB
  203. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/env_wrapper.py 17.21KB
  204. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/env_wrapper.py 28.33KB
  205. examples/DQN_variant/atari_model.py 3.3KB
  206. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/mlp_model.py 6.49KB
  207. examples/OAC/
  208. examples/OAC/requirements.txt 58B
  209. examples/NeurIPS2019-Learn-to-Move-Challenge/README.md 3.2KB
  210. examples/TD3/train.py 5.12KB
  211. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es.py 1.57KB
  212. examples/PPO/requirements_mujoco.txt 58B
  213. examples/PPO/env_utils.py 6.95KB
  214. examples/NeurIPS2019-Learn-to-Move-Challenge/train.py 11.9KB
  215. examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/mlp_model.py 6.46KB
  216. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty1.sh 341B
  217. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty2.sh 320B
  218. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es.py 1.23KB
  219. examples/PPO/requirements_atari.txt 74B
  220. examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_agent.py 2.65KB
  221. examples/QMIX/
  222. examples/QMIX/replay_buffer.py 3.33KB
  223. examples/PPO/mujoco_model.py 1.96KB
  224. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/README.md 659B
  225. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/
  226. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/velocity_distribution.png 27.9KB
  227. examples/tutorials/homework/lesson5/ddpg_quadrotor/train.py 6.11KB
  228. examples/QuickStart/README.md 435B
  229. examples/QuickStart/requirements.txt 43B
  230. examples/tutorials/parl2_dygraph/lesson5/
  231. examples/tutorials/parl2_dygraph/lesson5/ddpg/
  232. examples/tutorials/parl2_dygraph/lesson5/ddpg/replay_memory.py 1.64KB
  233. examples/tutorials/parl2_dygraph/lesson3/dqn/replay_memory.py 1.64KB
  234. examples/tutorials/parl2_dygraph/lesson3/homework/
  235. examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/
  236. examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/replay_memory.py 1.64KB
  237. examples/tutorials/parl2_dygraph/lesson5/homework/
  238. examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/
  239. examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_model.py 2.13KB
  240. examples/QMIX/qmix_config.py 2.69KB
  241. examples/tutorials/parl2_dygraph/lesson3/dqn/agent.py 2.79KB
  242. examples/tutorials/homework/lesson3/dqn_mountaincar/model.py 1.11KB
  243. examples/tutorials/lesson3/
  244. examples/tutorials/lesson3/dqn/
  245. examples/tutorials/lesson3/dqn/model.py 1.11KB
  246. examples/tutorials/homework/lesson2/q_learning_frozenlake/train.py 2.56KB
  247. examples/tutorials/parl2_dygraph/lesson3/dqn/model.py 1.3KB
  248. examples/QMIX/rnn_model.py 1.45KB
  249. examples/A2C/a2c_config.py 1.29KB
  250. examples/DQN/cartpole.jpg 110.07KB
  251. examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/model.py 1.3KB
  252. examples/tutorials/lesson5/ddpg/env.py 6.33KB
  253. examples/AlphaZero/.pic/good_moves_rate.png 60.06KB
  254. examples/Baselines/Halite_competition/torch/rl_trainer/replay_memory.py 3.6KB
  255. examples/CARLA_SAC/README.md 2.78KB
  256. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/utils.py 3.25KB
  257. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md 6.94KB
  258. examples/tutorials/lesson5/ddpg/train.py 4.25KB
  259. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/evaluate.py 2.79KB
  260. examples/DQN_variant/atari_agent.py 4.11KB
  261. examples/IMPALA/impala_config.py 1.5KB
  262. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/competition.png 184.81KB
  263. examples/PPO/storage.py 3.09KB
  264. examples/OAC/mujoco_agent.py 1.85KB
  265. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3_first_target.sh 416B
  266. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es_agent.py 1.62KB
  267. examples/tutorials/lesson3/dqn/replay_memory.py 1.64KB
  268. examples/Baselines/Halite_competition/paddle/config.py 1.35KB
  269. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/README.md 718B
  270. examples/QMIX/utils.py 1.66KB
  271. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/powernet_model.py 2.67KB
  272. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/curriculum-learning.png 158.38KB
  273. examples/Baselines/GridDispatch_competition/torch/grid_agent.py 1.97KB
  274. examples/CARLA_SAC/.benchmark/
  275. examples/CARLA_SAC/.benchmark/Lane_bend.gif 3.19MB
  276. examples/tutorials/parl2_dygraph/README.md 1.38KB
  277. examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_agent.py 2.01KB
  278. examples/tutorials/parl2_dygraph/lesson3/dqn/algorithm.py 2.86KB
  279. examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/agent.py 2.79KB
  280. examples/tutorials/lesson4/policy_gradient/algorithm.py 1.7KB
  281. examples/tutorials/lesson4/policy_gradient/model.py 1.04KB
  282. examples/tutorials/homework/lesson3/dqn_mountaincar/train.py 4.72KB
  283. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2.py 7.22KB
  284. examples/tutorials/lesson3/dqn/agent.py 3.89KB
  285. examples/Baselines/Halite_competition/torch/rl_trainer/utils.py 7.64KB
  286. examples/tutorials/homework/lesson3/dqn_mountaincar/agent.py 3.89KB
  287. examples/tutorials/homework/lesson2/sarsa_frozenlake/
  288. examples/tutorials/homework/lesson2/sarsa_frozenlake/gridworld.py 6.53KB
  289. examples/tutorials/homework/lesson2/q_learning_frozenlake/gridworld.py 6.53KB
  290. examples/DQN_variant/.benchmark/
  291. examples/DQN_variant/.benchmark/Dueling DQN.png 218.21KB
  292. examples/tutorials/lesson2/sarsa/
  293. examples/tutorials/lesson2/sarsa/gridworld.py 6.53KB
  294. examples/tutorials/requirements.txt 126B
  295. examples/tutorials/lesson2/sarsa/train.py 2.95KB
  296. examples/tutorials/lesson2/q_learning/gridworld.py 6.53KB
  297. examples/tutorials/lesson5/ddpg/model.py 1.73KB
  298. examples/SAC/mujoco_agent.py 1.83KB
  299. examples/tutorials/lesson3/dqn/train.py 4.82KB
  300. examples/IMPALA/requirements.txt 74B
  301. examples/DQN_variant/requirements.txt 79B
  302. examples/TD3/mujoco_model.py 2.54KB
  303. examples/Baselines/Halite_competition/torch/test.py 1.44KB
  304. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/args.py 3.51KB
  305. examples/tutorials/parl2_dygraph/lesson4/
  306. examples/tutorials/parl2_dygraph/lesson4/policy_gradient/
  307. examples/tutorials/parl2_dygraph/lesson4/policy_gradient/agent.py 1.8KB
  308. examples/tutorials/parl2_dygraph/lesson4/homework/
  309. examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/
  310. examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/agent.py 1.8KB
  311. examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/train.py 4.67KB
  312. examples/OAC/README.md 1.04KB
  313. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/utils.py 13.97KB
  314. examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/train.py 4.29KB
  315. examples/tutorials/parl2_dygraph/lesson4/policy_gradient/algorithm.py 1.94KB
  316. examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/model.py 1.35KB
  317. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/utils.py 2.59KB
  318. examples/QMIX/qmix_agent.py 5.35KB
  319. examples/OAC/mujoco_model.py 2.55KB
  320. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/README.md 700B
  321. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/demo.gif 4.58MB
  322. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/fastest.png 270.84KB
  323. examples/PPO/atari_model.py 2.03KB
  324. examples/PPO/README.md 2.48KB
  325. examples/Baselines/Halite_competition/paddle/requirements.txt 32B
  326. examples/tutorials/lesson5/ddpg/algorithm.py 3.46KB
  327. examples/tutorials/lesson5/ddpg/agent.py 2.67KB
  328. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2_grpc.py 1.93KB
  329. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_client.py 4.25KB
  330. examples/tutorials/homework/lesson2/sarsa_frozenlake/train.py 2.67KB
  331. examples/QMIX/qmixer_model.py 3.06KB
  332. examples/QMIX/train.py 6.51KB
  333. examples/tutorials/lesson4/policy_gradient/train.py 3.66KB
  334. examples/CQL/mujoco_model.py 2.78KB
  335. examples/tutorials/parl2_dygraph/requirements.txt 130B
  336. examples/SAC/README.md 1.24KB
  337. examples/NeurIPS2019-Learn-to-Move-Challenge/train_args.py 2.73KB
  338. examples/DQN_variant/README.md 2.65KB
  339. examples/QMIX/README.md 1.31KB
  340. examples/QMIX/requirements.txt 37B
  341. examples/QMIX/env_wrapper.py 3.11KB
  342. examples/QuickStart/train.py 3.83KB
  343. examples/AlphaZero/MCTS.py 5.83KB
  344. examples/tutorials/parl2_dygraph/lesson5/ddpg/train.py 4.21KB
  345. examples/tutorials/lesson3/dqn/algorithm.py 3.02KB
  346. examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/train.py 6.06KB
  347. examples/tutorials/parl2_dygraph/lesson5/ddpg/agent.py 2.31KB
  348. examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_model.py 5.81KB
  349. examples/tutorials/parl2_dygraph/lesson4/policy_gradient/model.py 1.26KB
  350. examples/tutorials/parl2_dygraph/lesson4/policy_gradient/train.py 3.65KB
  351. examples/tutorials/lesson2/q_learning/train.py 2.85KB
  352. examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3.sh 416B
  353. examples/tutorials/README.md 1.74KB
  354. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/agent.py 19.08KB
  355. examples/tutorials/homework/lesson2/sarsa_frozenlake/agent.py 2.77KB
  356. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/agent.py 12.96KB
  357. examples/tutorials/lesson2/sarsa/agent.py 2.77KB
  358. examples/PPO/train.py 5.99KB
  359. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/
  360. examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/l2rpn.jpeg 69.44KB
  361. examples/OAC/train.py 5.45KB
  362. examples/Baselines/Halite_competition/paddle/model/
  363. examples/Baselines/Halite_competition/paddle/model/latest_ship_model.pth 325.14KB
  364. examples/AlphaZero/Arena.py 3.24KB
  365. examples/QuickStart/performance.gif 237.51KB
  366. examples/NeurIPS2019-Learn-to-Move-Challenge/image/
  367. examples/NeurIPS2019-Learn-to-Move-Challenge/image/performance.gif 782.27KB
  368. examples/CARLA_SAC/.benchmark/carla_sac.png 141.86KB
  369. examples/A2C/.result/result_a2c_paddle1.png 203.23KB
  370. examples/Baselines/Halite_competition/torch/model/
  371. examples/Baselines/Halite_competition/torch/model/latest_ship_model.pth 338.03KB
  372. examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/last course.png 360.06KB
  373. examples/QMIX/images/
  374. examples/QMIX/images/paddle2.0_qmix_result.png 97.1KB
  375. examples/tutorials/parl2_dygraph/lesson5/ddpg/algorithm.py 3.69KB
  376. examples/tutorials/parl2_dygraph/lesson5/ddpg/env.py 6.33KB
  377. examples/CARLA_SAC/model.ckpt 4.63MB
  378. examples/ES/mujoco_agent.py 2.77KB
  379. examples/tutorials/parl2_dygraph/lesson5/ddpg/model.py 1.94KB
  380. examples/ES/es_config.py 1.2KB
  381. examples/ES/mujoco_model.py 1.93KB

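Resource description:

This collection of reinforcement learning algorithms (DQN, DDPG, SAC, TD3, MADDPG, QMIX, etc.) contains code for 20+ classic reinforcement learning algorithms. For the corresponding tutorials, see these blog posts:

- Multi-agent (state-of-the-art algorithms and principles): https://blog.csdn.net/sinat_39620217/article/details/115299073?spm=1001.2014.3001.5502
- Reinforcement learning basics (single-agent algorithms): https://blog.csdn.net/sinat_39620217/category_10940146.html
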
# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge

This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part is used for curriculum learning, to learn a natural and efficient gait for low-speed walking. The last part trains the final agent in the random-velocity environment for the round 2 evaluation.

For more technical details about our solution, we provide:

1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A PowerPoint presentation briefly introducing our solution at the NeurIPS 2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 competition workshop.
4. (coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.

**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, there are still some factors that prevent us from achieving the same performance. One problem is deciding when to pick a converged model during curriculum learning. Choosing a sensible and natural gait visually is crucial for subsequent training, but the definition of a good gait varies from person to person.

## Dependencies

- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (to use tensorboard)

## Part1: Final submitted model

### Result

For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.

| Avg reward of all episodes | Avg reward of complete episodes | Falldown % | Evaluate episodes |
|----------------------------|---------------------------------|------------|-------------------|
| 9968.5404                  | 9980.3952                       | 0.0026     | 5000              |

### Test

- How to Run

1. Enter the sub-folder `final_submit`
2. Download the model file from an online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
3. Unpack the file with `tar zxvf saved_model.tar.gz`
4. Launch the test script: `python test.py`

## Part2: Curriculum learning
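The curriculum in this part is a chain of four stages, and every stage after the first is warm-started from the model of the previous one. As a rough summary, the sketch below encodes those stages and rebuilds the command lines. The flag names and values are copied from the commands in this README; the `Stage` class, its field names, and the two helper functions are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    model_label: str                          # label used in the README's [... model] placeholders
    reward_type: str                          # value passed to --reward_type
    target_v: Optional[float]                 # value passed to --target_v (None: flag omitted)
    act_penalty_lowerbound: Optional[float]   # value passed to --act_penalty_lowerbound


# Stage values copied from the commands in this README.
CURRICULUM: List[Stage] = [
    Stage("RunFastest", "RunFastest", None, None),
    Stage("FixedTargetSpeed 3.0m/s", "FixedTargetSpeed", 3.0, 1.5),
    Stage("FixedTargetSpeed 2.0m/s", "FixedTargetSpeed", 2.0, 0.75),
    Stage("FixedTargetSpeed 1.25m/s", "FixedTargetSpeed", 1.25, 0.6),
]


def server_command(index: int, port: str = "[PORT]") -> str:
    """Build the server command for stage `index` (hypothetical helper)."""
    parts = ["python simulator_server.py", f"--port {port}", "--ensemble_num 1"]
    if index > 0:
        # Later stages restore the model trained in the previous stage.
        previous = CURRICULUM[index - 1]
        parts.append("--warm_start_batchs 1000")
        parts.append(f"--restore_model_path [{previous.model_label} model]")
    return " ".join(parts)


def client_command(index: int, port: str = "[PORT]", ip: str = "[SERVER_IP]") -> str:
    """Build the client command for stage `index` (hypothetical helper)."""
    stage = CURRICULUM[index]
    parts = [
        "python simulator_client.py",
        f"--port {port}",
        f"--ip {ip}",
        f"--reward_type {stage.reward_type}",
    ]
    if stage.target_v is not None:
        parts.append(f"--target_v {stage.target_v}")
    if stage.act_penalty_lowerbound is not None:
        parts.append(f"--act_penalty_lowerbound {stage.act_penalty_lowerbound}")
    return " ".join(parts)


if __name__ == "__main__":
    for i in range(len(CURRICULUM)):
        print(server_command(i))
        print(client_command(i))
```

Keeping the stages in one list makes the warm-start dependency explicit: each server command only needs the label of the stage before it.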

#### 1. Target: Run as fast as possible

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```

#### 2. Target: run at 3.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [RunFastest model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
    --act_penalty_lowerbound 1.5
```

#### 3. Target: walk at 2.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
    --act_penalty_lowerbound 0.75
```

#### 4. Target: walk slowly at 1.25 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
    --act_penalty_lowerbound 0.6
```

## Part3: Training in random velocity environment for round2 evaluation

As mentioned before, the choice of the model used for fine-tuning influences later training. For those who cannot obtain the expected performance with the previous steps, a pre-trained model that walks naturally at 1.25 m/s is provided ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7)).

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (Suggest: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
    --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```

### Test trained model

```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```

### Other implementation details
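This section describes a multi-stage scheme in which one model controls the first 60 frames of an episode and another controls the remaining 940. The authors do not release that code, so the following is only a minimal sketch of what such a frame-based dispatch might look like, not their implementation. The names `Policy`, `select_policy`, `START_FRAMES`, and `EPISODE_FRAMES` are hypothetical. The README mentions four models in total but only states the 60-frame boundary, so the sketch covers just that two-model split.

```python
from typing import Callable, Sequence

# A policy maps an observation vector to an action vector (assumed interface).
Policy = Callable[[Sequence[float]], Sequence[float]]

START_FRAMES = 60       # the start-stage model covers the first 60 frames
EPISODE_FRAMES = 1000   # 60 start frames + 940 remaining frames


def select_policy(frame: int, start_policy: Policy, rest_policy: Policy) -> Policy:
    """Return the policy responsible for the given 0-based frame index."""
    if not 0 <= frame < EPISODE_FRAMES:
        raise ValueError(f"frame must be in [0, {EPISODE_FRAMES}), got {frame}")
    # Frames 0..59 go to the start model, frames 60..999 to the second model.
    return start_policy if frame < START_FRAMES else rest_policy


if __name__ == "__main__":
    # Dummy policies standing in for trained models.
    def start_policy(obs: Sequence[float]) -> Sequence[float]:
        return [0.0 for _ in obs]

    def rest_policy(obs: Sequence[float]) -> Sequence[float]:
        return [1.0 for _ in obs]

    for frame in (0, 59, 60, 999):
        chosen = select_policy(frame, start_policy, rest_policy)
        print(frame, chosen.__name__)
```

Because the start model is fixed once trained, the second model only ever sees states reached after frame 60, which is what lets each model specialize in its own velocity distribution.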

[Figure: distribution of possible target velocity over the episode]

By following the above steps correctly, you can get an agent that scores around 9960, slightly lower than our final submitted model. The gap results from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, since it is conventionally hard to fit one model to different data distributions. We therefore actually trained 4 models, each aiming to perform well under a different velocity distribution. These four models are trained successively; that is, we train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post an issue if you have any problems :)

## Acknowledgments

We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.