Evaluation of Pretrained Large Language Models in Embodied Planning Tasks

Sarkisyan, Christina; Korchemnyi, Alexandr; Kovalev, Alexey K.; Panov, Aleksandr I.

doi:10.1007/978-3-031-33469-6_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13921)

Included in the following conference series:

International Conference on Artificial General Intelligence

Abstract

Modern pretrained large language models (LLMs) are increasingly used in zero-shot or few-shot learning modes. Recent years have seen growing interest in applying such models to embodied artificial intelligence and robotics tasks. When a task is given in natural language, the agent needs to build a plan based on this prompt. The best-performing solutions use LLMs accessed through APIs or models that are not publicly available, which makes the results difficult to reproduce. In this paper, we use publicly available LLMs to build a plan for an embodied agent and evaluate them in three modes of operation: 1) the subtask evaluation mode, 2) full autoregressive plan generation, and 3) step-by-step autoregressive plan generation. We used two prompt settings: a prompt containing examples of one given task and a mixed prompt with examples of different tasks. Through extensive experiments, we show that, with a task-specific prompt, the subtask evaluation mode outperforms the other modes in most cases, whereas step-by-step autoregressive plan generation performs better in the mixed prompt setting.
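The abstract names the three modes of operation but does not spell out their mechanics, so the sketch below is one plausible reading, not the authors' implementation. It assumes two hypothetical callables standing in for an open LLM: `score(prompt, continuation)` (for example, a summed token log-probability) and `generate(prompt)` (for example, greedy decoding). It also assumes a `done` stop marker and a list of admissible subtasks. In the generation modes, free-form output is mapped onto the nearest admissible subtask using `difflib` string similarity, which is swapped in here for whatever learned similarity a real system might use. The two prompt settings correspond simply to passing either same-task or mixed-task examples to `build_prompt`.

```python
import difflib
from typing import Callable, List

# Hypothetical interfaces standing in for an open LLM (not the paper's code):
#   score(prompt, continuation) -> float, e.g. summed token log-probabilities
#   generate(prompt) -> str, e.g. greedy decoding of a continuation
ScoreFn = Callable[[str, str], float]
GenerateFn = Callable[[str], str]

DONE = "done"  # assumed stop marker that ends a plan


def build_prompt(examples: List[str], task: str, steps: List[str]) -> str:
    """Few-shot prompt: example plans, then the query task and the plan so far.

    Task-specific vs. mixed prompts differ only in which examples are passed.
    """
    lines = list(examples) + [f"Task: {task}"] + list(steps)
    return "\n".join(lines) + "\n"


def nearest_admissible(text: str, admissible: List[str]) -> str:
    """Map free-form model output to the most similar admissible subtask.

    difflib string similarity is a stand-in for a learned similarity model.
    """
    return max(admissible,
               key=lambda a: difflib.SequenceMatcher(None, text, a).ratio())


def plan_by_subtask_scoring(score: ScoreFn, examples: List[str], task: str,
                            admissible: List[str], max_steps: int = 10) -> List[str]:
    """Mode 1 (subtask evaluation): rank every admissible subtask at each step."""
    steps: List[str] = []
    for _ in range(max_steps):
        prompt = build_prompt(examples, task, steps)
        best = max(admissible + [DONE], key=lambda s: score(prompt, s))
        if best == DONE:
            break
        steps.append(best)
    return steps


def plan_full_autoregressive(generate: GenerateFn, examples: List[str], task: str,
                             admissible: List[str]) -> List[str]:
    """Mode 2: generate the whole plan in one pass, then ground each line."""
    text = generate(build_prompt(examples, task, []))
    steps: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.lower() == DONE:
            break
        steps.append(nearest_admissible(line, admissible))
    return steps


def plan_step_by_step(generate: GenerateFn, examples: List[str], task: str,
                      admissible: List[str], max_steps: int = 10) -> List[str]:
    """Mode 3: generate one step, ground it, feed it back, and repeat."""
    steps: List[str] = []
    for _ in range(max_steps):
        output = generate(build_prompt(examples, task, steps)).strip().splitlines()
        line = output[0].strip() if output else DONE
        if line.lower() == DONE:
            break
        steps.append(nearest_admissible(line, admissible))
    return steps
```

In this sketch, modes 2 and 3 differ only in when grounding happens. In mode 3, each generated step is mapped to an admissible subtask before the next step is generated, so later steps are conditioned on valid actions rather than on the model's own free-form text.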

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Christina Sarkisyan & Alexandr Korchemnyi
AIRI, Moscow, Russia
Alexey K. Kovalev & Aleksandr I. Panov
Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia
Aleksandr I. Panov

Corresponding author

Correspondence to Alexey K. Kovalev.

Editor information

Editors and Affiliations

Department of Psychology, Stockholm University, Stockholm, Sweden
Patrick Hammer
Örebro University, Örebro, Sweden
Marjan Alirezaie
University of Gothenburg, Gothenburg, Sweden
Claes Strannegård

About this paper

Cite this paper

Sarkisyan, C., Korchemnyi, A., Kovalev, A.K., Panov, A.I. (2023). Evaluation of Pretrained Large Language Models in Embodied Planning Tasks. In: Hammer, P., Alirezaie, M., Strannegård, C. (eds) Artificial General Intelligence. AGI 2023. Lecture Notes in Computer Science, vol 13921. Springer, Cham. https://doi.org/10.1007/978-3-031-33469-6_23

DOI: https://doi.org/10.1007/978-3-031-33469-6_23
Published: 24 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33468-9
Online ISBN: 978-3-031-33469-6
eBook Packages: Computer Science, Computer Science (R0)
