Pseudorehearsal in Value Function Approximation

Marochko, Vladimir; Johard, Leonard; Mazzara, Manuel

doi:10.1007/978-3-319-59394-4_18

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 74)

Included in the following conference series:

KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications

Abstract

Catastrophic forgetting is of special importance in reinforcement learning because the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole-balancing task. We find that pseudorehearsal appears to assist learning even in such very simple problems, given proper initialization of the rehearsal parameters.
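
As a rough illustration of the idea named in the abstract, the following is a minimal sketch of pseudorehearsal combined with a Q-learning update. It is not the authors' implementation. It assumes a small tanh network as the function approximator, pseudo-items drawn uniformly from a box of states, and one rehearsal pass per update. All names (`init_net`, `make_pseudo_items`, `q_learning_step`), the state bounds, and the hyperparameters are hypothetical choices made for this example.

```python
import numpy as np


def init_net(n_in, n_hidden, n_actions, rng):
    """Small tanh network mapping a state vector to one Q-value per action."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_actions, n_hidden)),
        "b2": np.zeros(n_actions),
    }


def q_values(net, state):
    """Return the Q-values for `state` and the hidden activations."""
    hidden = np.tanh(net["W1"] @ state + net["b1"])
    return net["W2"] @ hidden + net["b2"], hidden


def sgd_step(net, state, target, mask, lr):
    """One gradient step on 0.5 * sum(mask * (Q(state) - target)**2)."""
    q, hidden = q_values(net, state)
    err = mask * (q - target)
    # Backpropagate through the output layer before its weights change.
    grad_hidden = (net["W2"].T @ err) * (1.0 - hidden ** 2)
    net["W2"] -= lr * np.outer(err, hidden)
    net["b2"] -= lr * err
    net["W1"] -= lr * np.outer(grad_hidden, state)
    net["b1"] -= lr * grad_hidden


def make_pseudo_items(net, n_items, low, high, rng):
    """Sample random states and record the network's current outputs as targets."""
    states = rng.uniform(low, high, size=(n_items, len(low)))
    targets = np.array([q_values(net, s)[0] for s in states])
    return states, targets


def q_learning_step(net, transition, pseudo_items, gamma, lr, n_rehearse, rng):
    """Q-learning update on one transition, followed by rehearsal of pseudo-items."""
    state, action, reward, next_state, done = transition
    bootstrap = 0.0 if done else gamma * np.max(q_values(net, next_state)[0])
    target = q_values(net, state)[0].copy()
    target[action] = reward + bootstrap
    mask = np.zeros_like(target)
    mask[action] = 1.0  # only the taken action receives a TD error
    sgd_step(net, state, target, mask, lr)

    # Rehearsal: pull the network back toward its own earlier outputs
    # on a random subset of the stored pseudo-items.
    states, targets = pseudo_items
    for i in rng.choice(len(states), size=n_rehearse, replace=False):
        sgd_step(net, states[i], targets[i], np.ones_like(targets[i]), lr)


# Usage with an assumed 4-dimensional pole-balancing state and 2 actions.
rng = np.random.default_rng(0)
low = np.array([-2.4, -3.0, -0.21, -3.0])  # hypothetical state bounds
high = -low
net = init_net(4, 8, 2, rng)
pseudo = make_pseudo_items(net, 32, low, high, rng)
transition = (
    np.array([0.0, 0.1, 0.02, -0.1]),    # state
    1,                                   # action
    1.0,                                 # reward
    np.array([0.002, 0.3, 0.018, -0.4]), # next state
    False,                               # episode finished?
)
q_learning_step(net, transition, pseudo, gamma=0.99, lr=0.01, n_rehearse=8, rng=rng)
```

In this sketch the pseudo-items are generated once from the current network. How often they are regenerated, how many are used, and from what distribution they are drawn are the kind of rehearsal parameters whose initialization the abstract refers to. The values here are placeholders and are not taken from the paper.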

Author information

Authors and Affiliations

Innopolis University, Universitetskaya street 1, 420500, Innopolis, Russia
Vladimir Marochko, Leonard Johard & Manuel Mazzara

Corresponding author

Correspondence to Manuel Mazzara.

Editor information

Editors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Gordan Jezic
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Mario Kusek
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, United Kingdom
Yun-Heh Jessica Chen-Burger
Fern Barrow, Bournemouth University, Poole, Dorset, United Kingdom
Robert J. Howlett
University of Canberra, Canberra, Australian Capital Territory, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Marochko, V., Johard, L., Mazzara, M. (2018). Pseudorehearsal in Value Function Approximation. In: Jezic, G., Kusek, M., Chen-Burger, YH., Howlett, R., Jain, L. (eds) Agent and Multi-Agent Systems: Technology and Applications. KES-AMSTA 2017. Smart Innovation, Systems and Technologies, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-319-59394-4_18

DOI: https://doi.org/10.1007/978-3-319-59394-4_18
Published: 24 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59393-7
Online ISBN: 978-3-319-59394-4
eBook Packages: Engineering, Engineering (R0)