Proximal Policy Optimization (PPO) for playing Super Mario Bros

Introduction

Here is my Python source code for training an agent to play Super Mario Bros, using the Proximal Policy Optimization (PPO) algorithm introduced in the paper Proximal Policy Optimization Algorithms.

For your information, PPO is the algorithm proposed by OpenAI and used for training OpenAI Five, the first AI to beat the world champions in an esports game. Specifically, OpenAI Five dispatched a team of casters and ex-pros with MMR rankings in the 99.95th percentile of Dota 2 players in August 2018.

Talking about performance, my PPO-trained agent could complete 31/32 levels, which is much better than what I expected at the beginning.

It has been a while since I released my A3C implementation (A3C code) for training an agent to play Super Mario Bros. Although the trained agent could complete levels quite fast and quite well (at least faster and better than I played), it still did not totally satisfy me.
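For readers who want a concrete picture of what PPO optimizes, its core idea is the clipped surrogate objective from the paper: the probability ratio between the new and old policies is clipped so a single update cannot move the policy too far. The sketch below is a minimal PyTorch illustration of that loss, not the actual training code from this project; the tensor names (new_log_probs, old_log_probs, advantages) and the clip coefficient of 0.2 are assumptions based on the paper's defaults.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from 'Proximal Policy Optimization Algorithms' (sketch)."""
    # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t), computed in log space.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped objective: r_t * A_t.
    unclipped = ratio * advantages
    # Clipped objective: ratio limited to [1 - eps, 1 + eps] before scaling by A_t.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the element-wise minimum of the two; negate the mean to get a loss.
    return -torch.min(unclipped, clipped).mean()

# Hypothetical usage with dummy batch data:
new_lp = torch.randn(8, requires_grad=True)
old_lp = new_lp.detach() + 0.1 * torch.randn(8)
adv = torch.randn(8)
loss = ppo_clip_loss(new_lp, old_lp, adv)
loss.backward()
```

The min of the clipped and unclipped terms is what keeps updates conservative: when the advantage would push the ratio outside the clip range, the gradient through that sample vanishes, so the new policy stays close to the one that collected the data.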