[2009.04416] Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function. Using separate networks avoids interference between objectives, while using a shared network allows useful features to be shared. PPG is able to achieve the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be more aggressively optimized with a higher level of sample reuse. Compared to PPO, we find that PPG significantly improves sample efficiency on the challenging Procgen Benchmark.
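The two-phase structure described in the abstract can be sketched as a training schedule. This is a minimal illustration of the alternating loop, not the authors' implementation: the update functions are stubs standing in for real gradient steps, and the parameter names (`n_policy_phases`, `aux_epochs`) are assumptions loosely mirroring the paper's phase-length hyperparameters.

```python
# Minimal sketch of the PPG phase schedule (assumed structure, not the
# authors' code). Real updates would take gradient steps on the networks.

def policy_phase_update(log):
    # Stand-in for one PPO-style policy/value update on fresh rollouts.
    log.append("policy")

def auxiliary_phase_update(log):
    # Stand-in for distilling value-function features into the policy
    # network (the auxiliary/representation-learning phase).
    log.append("aux")

def ppg_schedule(total_policy_phases, n_policy_phases=32, aux_epochs=6):
    """Alternate n_policy_phases policy updates with one auxiliary phase
    of aux_epochs epochs, until total_policy_phases policy updates run."""
    log = []
    for _ in range(total_policy_phases // n_policy_phases):
        for _ in range(n_policy_phases):
            policy_phase_update(log)
        for _ in range(aux_epochs):
            auxiliary_phase_update(log)
    return log

log = ppg_schedule(total_policy_phases=64)
```

Because the auxiliary phase touches the value loss only for representation learning, it can reuse the same data for many epochs without destabilizing the policy, which is where the higher sample reuse mentioned above comes from.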

5 mentions: @karlcobbe, @hillbig, @Montreal_AI, @hillbig, @jonathanrraiman
Date: 2020/09/16 23:21

Referring Tweets

@hillbig PPG (Phasic Policy Gradient) is a new on-policy RL method. A policy network is trained in two alternating phases: 1) training with the same objective as PPO, and 2) training with a value function loss just for representation learning. t.co/i0mop2mLmC
@hillbig PPG (Phasic Policy Gradient) is an on-policy RL method. Because jointly training the policy and value function causes interference and degrades performance, the policy network's training alternates between a phase that learns from the policy-derived loss and a phase that learns from the value-derived loss purely for representation learning. It improves substantially over PPO. t.co/i0mop2mLmC
@karlcobbe Excited to share our recent work on Phasic Policy Gradient, a new RL algorithm which improves sample efficiency by performing policy optimization and auxiliary optimization in two alternating phases. Check out the paper and code! t.co/EiOWyUereB
@jonathanrraiman Turns out just sharing params between the critic and policy was not without consequences 😈 cool paper on a new rl algo from @karlcobbe PPG t.co/jTUvo6KTLg
@Montreal_AI Phasic Policy Gradient Cobbe et al.: t.co/8UuVGOVPJY Code: t.co/g2ZaNTvkyf #PhasicPolicyGradient #PolicyGradient #ReinforcementLearning t.co/PCJlgBo4mW

Related Entries

[1902.02102] BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling
0 users, 9 mentions 2019/02/08 21:47
[1906.01618] Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representatio...
0 users, 4 mentions 2019/06/05 06:48
[2002.03629] Nonlinear Equation Solving: A Faster Alternative to Feedforward Computation
0 users, 5 mentions 2020/02/14 18:52
[2002.12880] Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary ...
0 users, 5 mentions 2020/03/02 02:21
[2002.10342] Comparing View-Based and Map-Based Semantic Labelling in Real-Time SLAM
0 users, 3 mentions 2020/03/08 23:20