Abstract
Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole-balancing task. We find that pseudorehearsal appears to assist learning even in such a simple problem, provided the rehearsal parameters are properly initialized.
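To make the idea concrete, the following is a minimal sketch of pseudorehearsal as it is generally described in the literature, not the authors' implementation. It assumes a linear value approximator, a hypothetical feature map (`features`), pseudo-states sampled uniformly from [-1, 1], and externally supplied training targets that stand in for Q-learning targets. All function names and parameters are illustrative assumptions.

```python
import random


def features(state):
    # Hypothetical feature map (an assumption): bias term plus raw state.
    return [1.0] + list(state)


def predict(weights, state):
    # Linear value estimate for a state.
    return sum(w * f for w, f in zip(weights, features(state)))


def sgd_step(weights, state, target, lr):
    # One gradient step on the squared error between target and prediction.
    error = target - predict(weights, state)
    return [w + lr * error * f for w, f in zip(weights, features(state))]


def make_pseudo_items(weights, n_items, dim, rng):
    # Sample random pseudo-states and record the approximator's CURRENT
    # outputs for them; these pairs stand in for previously learned data.
    items = []
    for _ in range(n_items):
        state = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        items.append((state, predict(weights, state)))
    return items


def train_with_pseudorehearsal(weights, new_samples, pseudo_items, lr, rng,
                               rehearsals_per_step=1):
    # Interleave each update on new data with updates on pseudo-items,
    # which pull the approximator back toward its earlier outputs.
    for state, target in new_samples:
        weights = sgd_step(weights, state, target, lr)
        for _ in range(rehearsals_per_step):
            p_state, p_target = rng.choice(pseudo_items)
            weights = sgd_step(weights, p_state, p_target, lr)
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    w = [0.5, -0.2, 0.1]
    pseudo = make_pseudo_items(w, n_items=5, dim=2, rng=rng)
    w = train_with_pseudorehearsal(w, [([0.3, 0.4], 1.0)], pseudo, 0.1, rng)
    print(w)
```

In this sketch, the number of pseudo-items, their sampling range, and `rehearsals_per_step` play the role of rehearsal parameters. Their values here are arbitrary and would need tuning for any real task.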
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Marochko, V., Johard, L., Mazzara, M. (2018). Pseudorehearsal in Value Function Approximation. In: Jezic, G., Kusek, M., Chen-Burger, YH., Howlett, R., Jain, L. (eds) Agent and Multi-Agent Systems: Technology and Applications. KES-AMSTA 2017. Smart Innovation, Systems and Technologies, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-319-59394-4_18
DOI: https://doi.org/10.1007/978-3-319-59394-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59393-7
Online ISBN: 978-3-319-59394-4
eBook Packages: Engineering, Engineering (R0)