Abstract
Reinforcement learning faces an important challenge in partially observable environments with long-term dependencies. To learn in an ambiguous environment, an agent has to keep previous perceptions in a memory. Earlier memory-based approaches use a fixed rule to determine what to keep in the memory, which limits them to certain problems. In this study, we follow the idea of giving control of the memory to the agent by allowing it to take memory-changing actions, making the agent more adaptive to the dynamics of an environment. Further, we formalize an intrinsic motivation to support this learning mechanism, which guides the agent to memorize distinctive events and enables it to disambiguate its state in the environment. Our overall approach is tested and analyzed on several partially observable tasks with long-term dependencies. The experiments show a clear improvement in learning performance compared to other memory-based methods.
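Only the abstract is available here, so the following is a minimal illustrative sketch rather than the paper's algorithm: the toy T-maze environment, the single-slot memory, and the count-based rarity bonus are all assumptions chosen to mirror the idea described above (an agent that takes an explicit memory-changing action, with an intrinsic reward for memorizing rarely seen, hence distinctive, observations).

```python
import random
from collections import defaultdict

class TMaze:
    """Toy T-maze: a signal at the start tells the agent which arm of the
    final junction pays off; every corridor cell looks identical (aliased)."""

    def __init__(self, length=4):
        self.length = length

    def reset(self):
        self.goal = random.randint(0, 1)  # which arm is rewarded this episode
        self.pos = 0
        return self._obs()

    def _obs(self):
        if self.pos == 0:
            return ('signal', self.goal)  # distinctive, seen once per episode
        if self.pos < self.length:
            return ('corridor',)          # aliased observation
        return ('junction',)

    def step(self, action):
        if self.pos < self.length:        # corridor: any action moves forward
            self.pos += 1
            return self._obs(), 0.0, False
        # junction: action 0 or 1 picks an arm and the episode ends
        return self._obs(), (4.0 if action == self.goal else -0.1), True

ACTIONS = [0, 1, 'mem']  # two environment actions plus one memory-changing action

def train(episodes=5000, alpha=0.1, gamma=0.95, beta=0.5, seed=0):
    """Tabular Q-learning over (observation, memory) pairs.  Taking 'mem'
    writes the current observation into a one-slot memory and earns a
    count-based intrinsic bonus, largest for rarely seen observations."""
    random.seed(seed)
    env, Q, counts = TMaze(), defaultdict(float), defaultdict(int)
    for ep in range(episodes):
        eps = max(0.05, 1.0 - ep / (0.8 * episodes))  # decaying exploration
        obs, mem, done = env.reset(), None, False
        while not done:
            s = (obs, mem)
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda x: Q[(s, x)]))
            if a == 'mem':
                counts[obs] += 1
                bonus = beta / counts[obs] ** 0.5  # rarity (distinctiveness) bonus
                mem, env_a = obs, 0                # memorize, then default move
            else:
                bonus, env_a = 0.0, a
            obs, r, done = env.step(env_a)
            s2 = (obs, mem)
            target = r + bonus + (0.0 if done else
                                  gamma * max(Q[(s2, x)] for x in ACTIONS))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

def evaluate(Q, episodes=200, seed=1):
    """Success rate of the greedy policy (reaching the rewarded arm)."""
    random.seed(seed)
    env, wins = TMaze(), 0
    for _ in range(episodes):
        obs, mem, done = env.reset(), None, False
        while not done:
            a = max(ACTIONS, key=lambda x: Q[((obs, mem), x)])
            if a == 'mem':
                mem, a = obs, 0
            obs, r, done = env.step(a)
        wins += r > 0
    return wins / episodes
```

Because the bonus decays with the visit count, it mainly shapes early exploration toward storing the distinctive signal; the asymptotic policy is driven by the extrinsic reward, which already favors remembering the signal since it disambiguates the junction state.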
Acknowledgements
This work is supported by the Scientific and Technological Research Council of Turkey under Grant No. 120E427. The authors would also like to thank Hüseyin Aydın, Erkin Çilden and Faruk Polat for their support.
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
About this article
Cite this article
Demir, A. Learning what to memorize: Using intrinsic motivation to form useful memory in partially observable reinforcement learning. Appl Intell 53, 19074–19092 (2023). https://doi.org/10.1007/s10489-022-04328-z