Learning Unknown Groundings for Natural Language Interaction with Mobile Robots

  • Mycal Tucker
  • Derya Aksaray
  • Rohan Paul
  • Gregory J. Stein
  • Nicholas Roy
Conference paper
Part of the Springer Proceedings in Advanced Robotics book series (SPAR, volume 10)


Our goal is to enable robots to understand, or “ground,” natural language instructions in the context of their perceived workspace. Contemporary models learn a probabilistic correspondence between input phrases and semantic concepts (or groundings), such as objects, regions, or goals for robot motion, derived from the robot’s world model. Crucially, these models assume a fixed set of object types and phrases that is known a priori, and they train probable correspondences offline using static language-workspace corpora. Hence, model inference fails when an input command contains unknown phrases or references to novel object types that were not seen during training. We introduce a probabilistic model that incorporates a notion of unknown groundings and learns a correspondence between an unknown phrase and an unknown object that cannot be classified into known visual categories. Further, we extend the model to “hypothesize” known or unknown object groundings when an utterance references an object that lies beyond the robot’s partial view of its workspace. When the grounding for an instruction is unknown or hypothetical, the robot performs exploratory actions to gather new observations and find the referenced objects beyond the current view. Once an unknown grounding is associated with percepts of a new object, the model is adapted and trained online using the accrued visual-linguistic observations, so that the new knowledge is available for interpreting future utterances. We evaluate the model quantitatively using a corpus from a user study and report experiments on a mobile platform in a workspace populated with objects from a standardized dataset. A video of the experimental demonstration is available at:
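The loop described in the abstract (ground a phrase, fall back to an unknown grounding, explore when nothing in view matches, then adapt online) can be illustrated with a toy sketch. This is not the model from the paper: the count-based scoring, the `ToyGrounder` class, the `UNKNOWN` symbol, and the 0.5 threshold are all hypothetical choices made only to show the control flow.

```python
from collections import defaultdict

UNKNOWN = "<unknown>"  # hypothetical label for percepts no known classifier accepts


class ToyGrounder:
    """Count-based phrase-to-category scorer with an explicit unknown symbol.

    Illustrative only; the paper's probabilistic model is not reproduced here.
    """

    def __init__(self, smoothing=0.1, threshold=0.5):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.categories = set()
        self.smoothing = smoothing
        self.threshold = threshold

    def train(self, phrase, category):
        """Record one observed (phrase, category) pairing."""
        self.counts[phrase][category] += 1.0
        self.categories.add(category)

    def score(self, phrase, category):
        """Return a score in [0, 1] for grounding `phrase` to `category`."""
        if phrase not in self.counts:
            # An unseen phrase is assumed to refer to an unclassifiable object.
            return 1.0 if category == UNKNOWN else 0.0
        if category == UNKNOWN:
            return 0.0
        row = self.counts[phrase]
        total = sum(row.values()) + self.smoothing * len(self.categories)
        return (row[category] + self.smoothing) / total

    def ground(self, phrase, percepts):
        """percepts: list of (object_id, category) pairs currently in view."""
        best = max(percepts, key=lambda p: self.score(phrase, p[1]), default=None)
        if best is None or self.score(phrase, best[1]) < self.threshold:
            # Nothing in view fits: treat the grounding as hypothetical and explore.
            return ("explore", None)
        return ("grounded", best[0])

    def adapt(self, phrase, new_category):
        """Online update once an unknown phrase is tied to a newly learned category."""
        self.train(phrase, new_category)


grounder = ToyGrounder()
grounder.train("box", "box")
grounder.train("ball", "ball")

in_view = [("o1", "box"), ("o2", UNKNOWN)]
known_result = grounder.ground("box", in_view)              # known phrase, known object
unknown_result = grounder.ground("cone", in_view)           # unknown phrase, unknown object
explore_result = grounder.ground("cone", [("o1", "box")])   # referent not in view

grounder.adapt("cone", "cone")                              # learn the new pairing online
adapted_result = grounder.ground("cone", [("o2", "cone")])
```

In this sketch the unseen phrase "cone" is paired with the one object the classifier could not label, and the pairing is then stored so that later commands resolve without exploration. A real system would replace the counts with learned visual and linguistic features.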



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mycal Tucker (1)
  • Derya Aksaray (1)
  • Rohan Paul (1)
  • Gregory J. Stein (1)
  • Nicholas Roy (1)
  1. Massachusetts Institute of Technology, Cambridge, USA
