Intrinsically Motivated High-Level Planning for Agent Exploration

Sartor, Gabriele; Oddi, Angelo; Rasconi, Riccardo; Santucci, Vieri Giuliano

doi:10.1007/978-3-031-47546-7_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14318)

Included in the following conference series: International Conference of the Italian Association for Artificial Intelligence

Abstract

This paper proposes a new open-ended learning framework which aims at implementing an autonomous agent using intrinsic motivations (IM) at two different levels.

At the first level, the agent exploits the IM paradigm to learn new operational skills, described in terms of sub-symbolic options. After discovering the options, the agent iteratively (1) executes them to explore the world, collecting the necessary data, and (2) automatically abstracts the collected data into a high-level representation of the domain, expressed in the PPDDL language.

At the second level, the IM paradigm is used to exploit the abstracted representation of the domain by identifying symbolic states deemed promising according to a specific criterion, which in the present work is the farthest distance covered by the agent (i.e., the most promising states are those that lie at the frontier of the visited space). Once these states are identified, they can subsequently be reached through an internally generated high-level plan and used as starting points for discovering new knowledge.

The presented framework is tested in the so-called Treasure Game domain described in the recent literature. The tests we have performed show that the proposed idea of implementing intrinsic motivations at two different levels of abstraction facilitates the discovery of new knowledge, compared to a previous approach proposed in the literature.
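The frontier criterion mentioned for the second level can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each visited symbolic state has a representative position and that "distance covered" is the Euclidean distance from the agent's starting position. The names `frontier_states`, `visited`, and `start` are hypothetical.

```python
from math import dist
from typing import Dict, List, Tuple

Position = Tuple[float, ...]


def frontier_states(visited: Dict[str, Position],
                    start: Position,
                    k: int = 1) -> List[str]:
    """Return the k visited symbolic states farthest from the start.

    `visited` maps a symbolic state label to a representative position
    (an assumption made for this sketch only).
    """
    # Rank states by Euclidean distance from the start, farthest first.
    ranked = sorted(visited, key=lambda s: dist(visited[s], start), reverse=True)
    return ranked[:k]


visited = {"s0": (0.0, 0.0), "s1": (3.0, 4.0), "s2": (1.0, 1.0)}
goals = frontier_states(visited, start=(0.0, 0.0), k=2)
```

In a pipeline like the one outlined in the abstract, the selected states would serve as goals for a high-level planner, and exploration would resume once the agent reaches one of them.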

Notes

1. The mask is the list of state variables changed by a specific option [16].
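As a rough illustration of this definition, the sketch below derives a mask from observed transitions. It assumes each transition is a pair of equal-length numeric state vectors recorded before and after one execution of the same option. The function name `compute_mask` and the tolerance `tol` are hypothetical and not taken from the paper or from [16].

```python
from typing import Iterable, List, Sequence, Tuple

Transition = Tuple[Sequence[float], Sequence[float]]


def compute_mask(transitions: Iterable[Transition], tol: float = 1e-9) -> List[int]:
    """Return sorted indices of state variables changed by an option.

    A variable belongs to the mask if it differs by more than `tol`
    between the before and after state in at least one transition.
    """
    changed = set()
    for before, after in transitions:
        if len(before) != len(after):
            raise ValueError("before/after states must have the same length")
        for i, (b, a) in enumerate(zip(before, after)):
            if abs(a - b) > tol:
                changed.add(i)
    return sorted(changed)


transitions = [
    ((0.0, 1.0, 5.0), (0.0, 2.0, 5.0)),  # only variable 1 changes
    ((1.0, 1.0, 5.0), (1.0, 1.5, 4.0)),  # variables 1 and 2 change
]
mask = compute_mask(transitions)
```

With noisy sensor data, a larger tolerance or a statistical test would probably be needed. The fixed threshold here only keeps the example short.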

References

Baldassarre, G., Mirolli, M.: Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32375-1
Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Disc. Event Dyn. Syst. 13(1), 41–77 (2003)
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., Munos, R.: Unifying count-based exploration and intrinsic motivation. Adv. Neural Inf. Process. Syst. 29 (2016)
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48 (2009)
Blaes, S., Vlastelica Pogančić, M., Zhu, J., Martius, G.: Control what you can: intrinsically motivated task-planning agent. Adv. Neural Inf. Process. Syst. 32 (2019)
Bonet, B., Geffner, H.: MGPT: a probabilistic planner based on heuristic search. J. Artif. Int. Res. 24(1), 933–944 (2005)
Campari, T., Lamanna, L., Traverso, P., Serafini, L., Ballan, L.: Online learning of reusable abstract models for object goal navigation. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14850–14859 (2022). https://doi.org/10.1109/CVPR52688.2022.01445
Colas, C., Fournier, P., Chetouani, M., Sigaud, O., Oudeyer, P.Y.: Curious: intrinsically motivated modular multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 1331–1340. PMLR (2019)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD 1996, pp. 226–231. AAAI Press (1996)
Forestier, S., Portelas, R., Mollard, Y., Oudeyer, P.Y.: Intrinsically motivated goal exploration processes with automatic curriculum learning. arXiv preprint arXiv:1708.02190 (2017)
Frank, M., Leitner, J., Stollenga, M., Förster, A., Schmidhuber, J.: Curiosity driven reinforcement learning for motion planning on humanoids. Front. Neurorobot. 7, 25 (2014)
Ghallab, M., et al.: PDDL–the planning domain definition language (1998). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.212
Jong, N.K., Hester, T., Stone, P.: The utility of temporal abstraction in reinforcement learning. In: AAMAS, no. 1, pp. 299–306. Citeseer (2008)
Konidaris, G., Barto, A.G.: Skill discovery in continuous reinforcement learning domains using skill chaining. Adv. Neural Inf. Process. Syst., 1015–1023 (2009)
Konidaris, G., Kaelbling, L.P., Lozano-Perez, T.: From skills to symbols: learning symbolic representations for abstract high-level planning. J. Artif. Intell. Res. 61, 215–289 (2018). http://lis.csail.mit.edu/pubs/konidaris-jair18.pdf
Lamanna, L., et al.: Planning for learning object properties. In: Williams, B., Chen, Y., Neville, J. (eds.) Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, 7–14 February 2023, pp. 12005–12013. AAAI Press (2023). http://ojs.aaai.org/index.php/AAAI/article/view/26416
Lamanna, L., Serafini, L., Saetti, A., Gerevini, A., Traverso, P.: Online grounding of symbolic planning domains in unknown environments. In: Kern-Isberner, G., Lakemeyer, G., Meyer, T. (eds.) Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, KR 2022, Haifa, Israel, 31 July–5 August 2022 (2022). http://proceedings.kr.org/2022/53/
Machado, M.C., Bellemare, M.G., Bowling, M.: A laplacian framework for option discovery in reinforcement learning. arXiv preprint arXiv:1703.00956 (2017)
Mann, T.A., Mannor, S., Precup, D.: Approximate value iteration with temporally extended actions. J. Artif. Intell. Res. 53, 375–438 (2015)
Nau, D., Ghallab, M., Traverso, P.: Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., San Francisco (2004)
Niel, R., Wiering, M.A.: Hierarchical reinforcement learning for playing a dynamic dungeon crawler game. In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1159–1166. IEEE (2018)
Oddi, A., et al.: Integrating open-ended learning in the sense-plan-act robot control paradigm. In: ECAI 2020, the 24th European Conference on Artificial Intelligence (2020)
Oudeyer, P.Y., Kaplan, F., Hafner, V.: Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11(2), 265–286 (2007)
Parisi, S., Dean, V., Pathak, D., Gupta, A.: Interesting object, curious agent: learning task-agnostic exploration. Adv. Neural. Inf. Process. Syst. 34, 20516–20530 (2021)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Romero, A., Baldassarre, G., Duro, R.J., Santucci, V.G.: Analysing autonomous open-ended learning of skills with different interdependent subgoals in robots. In: 2021 20th International Conference on Advanced Robotics (ICAR), pp. 646–651. IEEE (2021)
Romero, A., Baldassarre, G., Duro, R.J., Santucci, V.G.: Autonomous learning of multiple curricula with non-stationary interdependencies. In: 2022 IEEE International Conference on Development and Learning (ICDL), pp. 272–279. IEEE (2022)
Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)
Sanner, S.: Relational dynamic influence diagram language (rddl): language description (2010). http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf
Santucci, V.G., Baldassarre, G., Mirolli, M.: Grail: a goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Trans. Cogn. Dev. Syst. 8(3), 214–231 (2016)
Santucci, V.G., Oudeyer, P.Y., Barto, A., Baldassarre, G.: Intrinsically motivated open-ended learning in autonomous robots. Front. Neurorobot. 13, 115 (2020)
Sartor, G., Zollo, D., Mayer, M.C., Oddi, A., Rasconi, R., Santucci, V.G.: Autonomous generation of symbolic knowledge via option discovery. In: Proceedings of the 9th Italian workshop on Planning and Scheduling (IPS 2021), vol. 3065. CEUR Workshop Proceedings. CEUR-WS.org (2021)
Seepanomwan, K., Santucci, V.G., Baldassarre, G.: Intrinsically motivated discovered outcomes boost user’s goals achievement in a humanoid robot. In: 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pp. 178–183 (2017)
Singh, S., Barto, A.G., Chentanez, N.: Intrinsically motivated reinforcement learning. In: Proceedings of the 17th International Conference on Neural Information Processing Systems, NIPS 2004, pp. 1281–1288. MIT Press, Cambridge (2004)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, Cambridge (2018)
Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)
Younes, H., Littman, M.: PPDDL1.0: An Extension to PDDL for Expressing Planning Domains with Probabilistic Effects. Technical report, Carnegie Mellon University, CMU-CS-04-167 (2004)

Acknowledgements

This work has been supported by the European Union’s Horizon 2020 research and innovation programme under GA 101070381 (‘PILLAR-Robots - Purposeful Intrinsically motivated Lifelong Learning Autonomous Robots’) and by the PNRR MUR project PE0000013-FAIR.

Author information

Authors and Affiliations

University of Turin, Turin, Italy
Gabriele Sartor
Institute of Cognitive Sciences and Technologies (ISTC-CNR), Rome, Italy
Angelo Oddi, Riccardo Rasconi & Vieri Giuliano Santucci

Corresponding author

Correspondence to Gabriele Sartor.

Editor information

Editors and Affiliations

University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Sapienza University of Rome, Rome, Italy
Domenico Lembo
Roma Tre University, Rome, Italy
Carla Limongelli
National Research Council, Rome, Italy
Andrea Orlandini

Cite this paper

Sartor, G., Oddi, A., Rasconi, R., Santucci, V.G. (2023). Intrinsically Motivated High-Level Planning for Agent Exploration. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_9

DOI: https://doi.org/10.1007/978-3-031-47546-7_9
Published: 02 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science, Computer Science (R0)
