European Conference on Machine Learning

Machine Learning: ECML 2007, pp. 466–477

Policy Gradient Critics

  • Daan Wierstra 1 &
  • Jürgen Schmidhuber 1,2
  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 4701)

Abstract

We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov Decision Processes (POMDPs) that require long-term memories of past observations and actions. The approach involves estimating a policy gradient for an Actor through a Policy Gradient Critic which evaluates probability distributions on actions. Gradient-based updates of history-conditional action probability distributions enable the algorithm to learn a mapping from memory states (or event histories) to probability distributions on actions, solving POMDPs through a combination of memory and stochasticity. This goes beyond previous approaches to learning purely reactive POMDP policies, without giving up their advantages. Preliminary results on important benchmark tasks show that our approach can in principle be used as a general purpose POMDP algorithm that solves RL problems in both continuous and discrete action domains.
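
To make the Actor/Critic coupling above concrete, the following is a minimal, hypothetical sketch in PyTorch of the idea the abstract describes: a recurrent encoder condenses the observation-action history into a memory state, the Actor maps that state to an action probability distribution, and a Critic that takes the distribution itself as input supplies the gradient used to improve the Actor. All module names, sizes, and the training signal are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    obs_dim, n_actions, hidden = 4, 2, 32   # toy sizes (assumed)

    # Recurrent encoder: turns the observation/action history into a memory state.
    encoder = nn.LSTM(input_size=obs_dim + n_actions, hidden_size=hidden, batch_first=True)

    # Actor: maps the memory state to a probability distribution over actions.
    actor_head = nn.Sequential(nn.Linear(hidden, n_actions), nn.Softmax(dim=-1))

    # Critic in the "Policy Gradient Critic" spirit: it evaluates a
    # (memory state, action distribution) pair, so its gradient with respect
    # to the distribution can drive the Actor update.
    critic_head = nn.Linear(hidden + n_actions, 1)

    actor_opt = torch.optim.SGD(
        list(encoder.parameters()) + list(actor_head.parameters()), lr=1e-2)

    history = torch.randn(1, 10, obs_dim + n_actions)   # a toy 10-step history
    memory_seq, _ = encoder(history)
    memory = memory_seq[:, -1]                          # final memory state

    action_dist = actor_head(memory)                    # history-conditional distribution
    q_value = critic_head(torch.cat([memory, action_dist], dim=-1))

    # Actor update: ascend the Critic's estimate of the distribution's value.
    # (The Critic itself would be trained separately, e.g. from TD-style targets.)
    actor_opt.zero_grad()
    (-q_value.mean()).backward()
    actor_opt.step()

The point the sketch illustrates is that the Critic sees the whole action distribution rather than a single sampled action, so the Actor can be improved by following the Critic's gradient through the history-conditional distribution, combining memory and stochasticity as the abstract describes.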

Keywords

  • Reinforcement Learning
  • Recurrent Neural Network
  • Partially Observable Markov Decision Process
  • Reinforcement Learning Algorithm
  • Reinforcement Learning Method

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Author information

Authors and Affiliations

  1. Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA), CH-6928 Manno-Lugano, Switzerland

    Daan Wierstra & Jürgen Schmidhuber

  2. Department of Embedded Systems and Robotics, Technical University Munich, D-85748 Garching, Germany

    Jürgen Schmidhuber



    Copyright information

    © 2007 Springer-Verlag Berlin Heidelberg

    About this paper

    Cite this paper

    Wierstra, D., Schmidhuber, J. (2007). Policy Gradient Critics. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_43

    • DOI: https://doi.org/10.1007/978-3-540-74958-5_43

    • Publisher Name: Springer, Berlin, Heidelberg

    • Print ISBN: 978-3-540-74957-8

    • Online ISBN: 978-3-540-74958-5

