Pseudorehearsal in Value Function Approximation

  • Conference paper
Agent and Multi-Agent Systems: Technology and Applications (KES-AMSTA 2017)

Abstract

Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We find that pseudorehearsal appears to assist learning even in such a simple problem, provided the rehearsal parameters are properly initialized.

Author information

Corresponding author

Correspondence to Manuel Mazzara.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Marochko, V., Johard, L., Mazzara, M. (2018). Pseudorehearsal in Value Function Approximation. In: Jezic, G., Kusek, M., Chen-Burger, YH., Howlett, R., Jain, L. (eds) Agent and Multi-Agent Systems: Technology and Applications. KES-AMSTA 2017. Smart Innovation, Systems and Technologies, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-319-59394-4_18

  • DOI: https://doi.org/10.1007/978-3-319-59394-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59393-7

  • Online ISBN: 978-3-319-59394-4

  • eBook Packages: Engineering, Engineering (R0)
