
Learning what to memorize: Using intrinsic motivation to form useful memory in partially observable reinforcement learning


Abstract

Reinforcement learning faces an important challenge in partially observable environments with long-term dependencies. To learn in an ambiguous environment, an agent has to keep previous perceptions in a memory. Earlier memory-based approaches use a fixed method to determine what to keep in the memory, which limits them to certain classes of problems. In this study, we follow the idea of giving control of the memory to the agent by allowing it to take memory-changing actions, so that the agent becomes more adaptive to the dynamics of the environment. Further, we formalize an intrinsic motivation to support this learning mechanism, which guides the agent to memorize distinctive events and enables it to disambiguate its state in the environment. Our overall approach is tested and analyzed on several partially observable tasks with long-term dependencies. The experiments show a clear improvement in learning performance over other memory-based methods.
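
As a rough illustration of the mechanism described in the abstract, the sketch below augments a partially observable environment with a single agent-controlled memory slot and adds an intrinsic bonus for storing rarely seen observations. It is a minimal sketch only: the gym-style reset()/step() interface, the binary store/ignore memory action, and the count-based notion of "distinctiveness" are illustrative assumptions, not the formulation used in the paper.

    # Minimal sketch (assumptions noted above), not the paper's exact method.
    import collections
    import numpy as np


    class MemoryAugmentedEnv:
        """Wraps a partially observable env so the agent also controls one memory slot."""

        def __init__(self, env, bonus_scale=0.1):
            self.env = env                            # underlying POMDP exposing reset()/step()
            self.memory = None                        # last observation the agent chose to store
            self.bonus_scale = bonus_scale
            self.obs_counts = collections.Counter()   # visitation counts per observation

        def reset(self):
            self.memory = None
            obs = np.asarray(self.env.reset(), dtype=np.float32).ravel()
            return self._augment(obs)

        def step(self, action):
            base_action, memory_action = action       # (env action, 0 = ignore / 1 = store)
            obs, reward, done, info = self.env.step(base_action)
            obs = np.asarray(obs, dtype=np.float32).ravel()
            self.obs_counts[tuple(obs)] += 1

            intrinsic = 0.0
            if memory_action == 1:
                self.memory = obs
                # Rarer observations are treated as more "distinctive" and earn a
                # larger bonus (count-based heuristic, assumed for illustration).
                intrinsic = self.bonus_scale / np.sqrt(self.obs_counts[tuple(obs)])

            return self._augment(obs), reward + intrinsic, done, info

        def _augment(self, obs):
            # The agent's effective state is the current observation concatenated
            # with the memory content (zeros when nothing has been stored yet).
            mem = self.memory if self.memory is not None else np.zeros_like(obs)
            return np.concatenate([obs, mem])

Any standard reinforcement learning algorithm can then be trained on the augmented observation and the extended action space, learning jointly which environment action to take and whether the current observation is worth remembering.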



Acknowledgements

This work is supported by the Scientific and Technological Research Council of Turkey under Grant No. 120E427. The authors would also like to thank Hüseyin Aydın, Erkin Çilden, and Faruk Polat for their support.

Author information

Corresponding author

Correspondence to Alper Demir.

Data availability


The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Demir, A. Learning what to memorize: Using intrinsic motivation to form useful memory in partially observable reinforcement learning. Appl Intell 53, 19074–19092 (2023). https://doi.org/10.1007/s10489-022-04328-z
