Intrinsically Motivated High-Level Planning for Agent Exploration

  • Conference paper
AIxIA 2023 – Advances in Artificial Intelligence (AIxIA 2023)


This paper proposes a new open-ended learning framework which aims at implementing an autonomous agent using intrinsic motivations (IM) at two different levels.

At the first level, the agent exploits the IM paradigm to learn new operational skills, described in terms of sub-symbolic options. After discovering the options, the agent iteratively (1) executes them to explore the world, collecting the necessary data, and (2) automatically abstracts the collected data into a high-level representation of the domain, expressed in the PPDDL language.

At the second level, the IM paradigm is used to exploit the abstracted representation of the domain by identifying particular symbolic states deemed promising according to a specific criterion, which in the present work is the farthest distance covered by the agent (i.e., the most promising states are those that lie on the frontier of the visited space). Once these states are identified, they can subsequently be reached through an internally generated high-level plan and used as promising starting points for discovering new knowledge.
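The frontier criterion can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each visited symbolic state has a representative low-level position and that "distance covered" is the Euclidean distance from the agent's starting position. The names `frontier_states`, `visited`, and `start` are hypothetical.

```python
from math import dist

def frontier_states(visited, start, k=1):
    """Return the names of the k visited symbolic states that lie farthest
    from the start position (assumed frontier criterion: Euclidean distance)."""
    ranked = sorted(
        visited.items(),
        key=lambda item: dist(item[1], start),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical symbolic states mapped to representative (x, y) positions.
visited = {
    "s0": (0.0, 0.0),
    "s1": (3.0, 4.0),
    "s2": (1.0, 1.0),
}

# The selected states would then serve as goals for the high-level planner.
targets = frontier_states(visited, start=(0.0, 0.0), k=1)
print(targets)
```

In this sketch the selected states are only ranked. Reaching them would require passing each one as a goal to a planner operating on the abstracted PPDDL domain, which is outside the scope of the example.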

The framework is tested in the Treasure Game domain described in the recent literature. The tests show that implementing intrinsic motivations at two different levels of abstraction facilitates the discovery of new knowledge, compared to a previous approach proposed in the literature.


  1. The mask is the list of state variables changed by a specific option [16].
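As a rough illustration of the mask defined in the note above, the following sketch collects the indices of the state variables that change across a set of observed executions of one option. The function name `option_mask`, the tuple layout of `transitions`, and the tolerance `tol` are assumptions made for this example, not details taken from [16].

```python
def option_mask(transitions, tol=1e-6):
    """Return the sorted indices of state variables that changed in at least
    one (before, after) pair observed while executing a single option."""
    changed = set()
    for before, after in transitions:
        for i, (b, a) in enumerate(zip(before, after)):
            if abs(a - b) > tol:
                changed.add(i)
    return sorted(changed)

# Hypothetical low-level states observed before and after two executions.
transitions = [
    ([0.0, 1.0, 2.0], [0.0, 1.5, 2.0]),
    ([0.0, 1.0, 2.0], [0.0, 1.0, 2.5]),
]
mask = option_mask(transitions)
print(mask)
```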


  1. Baldassarre, G., Mirolli, M.: Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Heidelberg (2013).

  2. Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Disc. Event Dyn. Syst. 13(1), 41–77 (2003)

  3. Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., Munos, R.: Unifying count-based exploration and intrinsic motivation. Adv. Neural Inf. Process. Syst. 29 (2016)

  4. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48 (2009)

  5. Blaes, S., Vlastelica Pogančić, M., Zhu, J., Martius, G.: Control what you can: intrinsically motivated task-planning agent. Adv. Neural Inf. Process. Syst. 32 (2019)

  6. Bonet, B., Geffner, H.: mGPT: a probabilistic planner based on heuristic search. J. Artif. Intell. Res. 24(1), 933–944 (2005)

  7. Campari, T., Lamanna, L., Traverso, P., Serafini, L., Ballan, L.: Online learning of reusable abstract models for object goal navigation. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14850–14859 (2022).

  8. Colas, C., Fournier, P., Chetouani, M., Sigaud, O., Oudeyer, P.Y.: CURIOUS: intrinsically motivated modular multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 1331–1340. PMLR (2019)

  9. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  10. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD 1996, pp. 226–231. AAAI Press (1996)

  11. Forestier, S., Portelas, R., Mollard, Y., Oudeyer, P.Y.: Intrinsically motivated goal exploration processes with automatic curriculum learning. arXiv preprint arXiv:1708.02190 (2017)

  12. Frank, M., Leitner, J., Stollenga, M., Förster, A., Schmidhuber, J.: Curiosity driven reinforcement learning for motion planning on humanoids. Front. Neurorobot. 7, 25 (2014)

  13. Ghallab, M., et al.: PDDL–the planning domain definition language (1998).

  14. Jong, N.K., Hester, T., Stone, P.: The utility of temporal abstraction in reinforcement learning. In: AAMAS, no. 1, pp. 299–306. Citeseer (2008)

  15. Konidaris, G., Barto, A.G.: Skill discovery in continuous reinforcement learning domains using skill chaining. Adv. Neural Inf. Process. Syst., 1015–1023 (2009)

  16. Konidaris, G., Kaelbling, L.P., Lozano-Perez, T.: From skills to symbols: learning symbolic representations for abstract high-level planning. J. Artif. Intell. Res. 61, 215–289 (2018).

  17. Lamanna, L., et al.: Planning for learning object properties. In: Williams, B., Chen, Y., Neville, J. (eds.) Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, 7–14 February 2023, pp. 12005–12013. AAAI Press (2023).

  18. Lamanna, L., Serafini, L., Saetti, A., Gerevini, A., Traverso, P.: Online grounding of symbolic planning domains in unknown environments. In: Kern-Isberner, G., Lakemeyer, G., Meyer, T. (eds.) Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, KR 2022, Haifa, Israel, 31 July–5 August 2022 (2022).

  19. Machado, M.C., Bellemare, M.G., Bowling, M.: A Laplacian framework for option discovery in reinforcement learning. arXiv preprint arXiv:1703.00956 (2017)

  20. Mann, T.A., Mannor, S., Precup, D.: Approximate value iteration with temporally extended actions. J. Artif. Intell. Res. 53, 375–438 (2015)

  21. Nau, D., Ghallab, M., Traverso, P.: Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., San Francisco (2004)

  22. Niel, R., Wiering, M.A.: Hierarchical reinforcement learning for playing a dynamic dungeon crawler game. In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1159–1166. IEEE (2018)

  23. Oddi, A., et al.: Integrating open-ended learning in the sense-plan-act robot control paradigm. In: ECAI 2020, the 24th European Conference on Artificial Intelligence (2020)

  24. Oudeyer, P.Y., Kaplan, F., Hafner, V.: Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11(2), 265–286 (2007)

  25. Parisi, S., Dean, V., Pathak, D., Gupta, A.: Interesting object, curious agent: learning task-agnostic exploration. Adv. Neural. Inf. Process. Syst. 34, 20516–20530 (2021)

  26. Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)

  27. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  28. Romero, A., Baldassarre, G., Duro, R.J., Santucci, V.G.: Analysing autonomous open-ended learning of skills with different interdependent subgoals in robots. In: 2021 20th International Conference on Advanced Robotics (ICAR), pp. 646–651. IEEE (2021)

  29. Romero, A., Baldassarre, G., Duro, R.J., Santucci, V.G.: Autonomous learning of multiple curricula with non-stationary interdependencies. In: 2022 IEEE International Conference on Development and Learning (ICDL), pp. 272–279. IEEE (2022)

  30. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)

  31. Sanner, S.: Relational dynamic influence diagram language (RDDL): language description (2010)

  32. Santucci, V.G., Baldassarre, G., Mirolli, M.: GRAIL: a goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Trans. Cogn. Dev. Syst. 8(3), 214–231 (2016)

  33. Santucci, V.G., Oudeyer, P.Y., Barto, A., Baldassarre, G.: Intrinsically motivated open-ended learning in autonomous robots. Front. Neurorobot. 13, 115 (2020)

  34. Sartor, G., Zollo, D., Mayer, M.C., Oddi, A., Rasconi, R., Santucci, V.G.: Autonomous generation of symbolic knowledge via option discovery. In: Proceedings of the 9th Italian Workshop on Planning and Scheduling (IPS 2021), vol. 3065. CEUR Workshop Proceedings (2021)

  35. Seepanomwan, K., Santucci, V.G., Baldassarre, G.: Intrinsically motivated discovered outcomes boost user’s goals achievement in a humanoid robot. In: 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pp. 178–183 (2017)

  36. Singh, S., Barto, A.G., Chentanez, N.: Intrinsically motivated reinforcement learning. In: Proceedings of the 17th International Conference on Neural Information Processing Systems, NIPS 2004, pp. 1281–1288. MIT Press, Cambridge (2004)

  37. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  38. Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)

  39. Younes, H., Littman, M.: PPDDL1.0: An Extension to PDDL for Expressing Planning Domains with Probabilistic Effects. Technical report, Carnegie Mellon University, CMU-CS-04-167 (2004)


This work has been supported by the European Union’s Horizon 2020 research and innovation programme under GA 101070381 (‘PILLAR-Robots - Purposeful Intrinsically motivated Lifelong Learning Autonomous Robots’) and PNRR MUR project PE0000013-FAIR.

Author information

Corresponding author

Correspondence to Gabriele Sartor.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sartor, G., Oddi, A., Rasconi, R., Santucci, V.G. (2023). Intrinsically Motivated High-Level Planning for Agent Exploration. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47545-0

  • Online ISBN: 978-3-031-47546-7

  • eBook Packages: Computer Science, Computer Science (R0)
