Natural Actor-Critic

  • Jan Peters
  • Sethu Vijayakumar
  • Stefan Schaal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing, as they are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
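The central object in the abstract is the natural policy gradient: the vanilla policy gradient premultiplied by the inverse Fisher information matrix of the policy. As a minimal illustrative sketch (not the paper's algorithm; the linear-Gaussian policy and quadratic reward below are hypothetical choices for demonstration), both quantities can be estimated from sampled score functions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: linear-Gaussian policy a ~ N(theta^T x, sigma^2)
# over random 2-D states x; reward r = -(a - x[0])^2 rewards tracking x[0].
theta = np.zeros(2)
sigma = 1.0

def score(x, a, theta):
    # grad_theta log pi(a|x) for a Gaussian policy with mean theta^T x
    return (a - theta @ x) * x / sigma**2

# Sample a batch of (state, action, reward) triples under the current policy
X = rng.normal(size=(500, 2))
A = X @ theta + sigma * rng.normal(size=500)
R = -(A - X[:, 0]) ** 2

S = np.array([score(x, a, theta) for x, a in zip(X, A)])
g = (S * R[:, None]).mean(axis=0)            # vanilla policy gradient estimate
F = (S[:, :, None] * S[:, None, :]).mean(0)  # Fisher information matrix estimate
nat_g = np.linalg.solve(F, g)                # natural gradient: F^{-1} g
```

The coordinate independence claimed in the abstract comes from the `F^{-1}` factor: rescaling the policy parameters rescales `F` correspondingly, so the resulting update direction is unchanged.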


Keywords: Reinforcement Learning, Fisher Information Matrix, Natural Gradient, Imitation Learning, Motor Primitive


References

  1. Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)
  2. Bagnell, J., Schneider, J.: Covariant policy search. In: International Joint Conference on Artificial Intelligence (2003)
  3. Baird, L.C.: Advantage Updating. Wright Lab. Tech. Rep. WL-TR-93-1146 (1993)
  4. Baird, L.C., Moore, A.W.: Gradient descent for general reinforcement learning. In: Advances in Neural Information Processing Systems 11 (1999)
  5. Bartlett, P.: An introduction to reinforcement learning theory: Value function methods. In: Mendelson, S., Smola, A.J. (eds.) Advanced Lectures on Machine Learning. LNCS (LNAI), vol. 2600, pp. 184–202. Springer, Heidelberg (2003)
  6. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
  7. Boyan, J.: Least-squares temporal difference learning. In: Machine Learning: Proceedings of the Sixteenth International Conference, pp. 49–56 (1999)
  8. Bradtke, S., Ydstie, E., Barto, A.G.: Adaptive Linear Quadratic Control Using Policy Iteration. University of Massachusetts, Amherst, MA (1994)
  9. Ijspeert, A., Nakanishi, J., Schaal, S.: Learning rhythmic movements by demonstration using nonlinear oscillators. In: IEEE International Conference on Intelligent Robots and Systems (IROS 2002), pp. 958–963 (2002)
  10. Kakade, S.: A natural policy gradient. In: Advances in Neural Information Processing Systems 14 (2002)
  11. Konda, V., Tsitsiklis, J.: Actor-critic algorithms. In: Advances in Neural Information Processing Systems 12 (2000)
  12. Moon, T., Stirling, W.: Mathematical Methods and Algorithms for Signal Processing. Prentice Hall, Englewood Cliffs (2000)
  13. Peters, J., Vijayakumar, S., Schaal, S.: Reinforcement learning for humanoid robotics. In: IEEE International Conference on Humanoid Robots (2003)
  14. Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge (1998)
  15. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jan Peters (1)
  • Sethu Vijayakumar (2)
  • Stefan Schaal (1)
  1. University of Southern California, Los Angeles, USA
  2. University of Edinburgh, Edinburgh, United Kingdom
