Journal of Intelligent & Robotic Systems, Volume 66, Issue 1–2, pp 187–204

Learning Novel Objects for Extended Mobile Manipulation

  • Tomoaki Nakamura
  • Komei Sugiura
  • Takayuki Nagai
  • Naoto Iwahashi
  • Tomoki Toda
  • Hiroyuki Okada
  • Takashi Omori


We propose a method for learning novel objects from audio-visual input. The method builds on two techniques: out-of-vocabulary (OOV) word segmentation and foreground object detection in complex environments. It also incorporates a voice conversion technique so that the robot can pronounce the acquired OOV word intelligibly. Using the proposed method, we implemented a robotic system that carries out interactive mobile manipulation tasks, which we call “extended mobile manipulation”. To evaluate the robot as a whole, we ran the “Supermarket” task, adopted from the RoboCup@Home league as a standard task for real-world applications. The results indicate that our integrated system works well in real-world applications.


Keywords: Mobile manipulation, Object learning, Object recognition, Out-of-vocabulary, RoboCup@Home





Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Tomoaki Nakamura (1)
  • Komei Sugiura (2)
  • Takayuki Nagai (1)
  • Naoto Iwahashi (2)
  • Tomoki Toda (3)
  • Hiroyuki Okada (4)
  • Takashi Omori (4)

  1. The University of Electro-Communications, Tokyo, Japan
  2. National Institute of Information and Communications Technology, Kyoto, Japan
  3. Nara Institute of Science and Technology, Nara, Japan
  4. Tamagawa University, Tokyo, Japan
