Disambiguation in Unknown Object Detection by Integrating Image and Speech Recognition Confidences

  • Yuko Ozasa
  • Yasuo Ariki
  • Mikio Nakano
  • Naoto Iwahashi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7724)


This paper presents a new method for detecting unknown objects and their unknown names during object manipulation in human-robot dialogue. The method performs detection by integrating information from object images and the user's speech. Its originality lies in using logistic regression to discriminate between unknown and known objects. The method achieved 97% accuracy in unknown object detection when there were about fifty known objects.
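The core idea of combining two recognition confidences with a logistic regression classifier can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature names (`image_conf`, `speech_conf`), the training loop, and the toy data are all assumptions, and the paper's actual features (image and speech recognition confidence measures) are computed by more elaborate recognizers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit a 2-feature logistic regression by plain gradient descent.

    samples: list of (image_conf, speech_conf) pairs in [0, 1]
    labels:  1 = known object, 0 = unknown object
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w[0] -= lr * g * x1
            w[1] -= lr * g * x2
            b -= lr * g
    return w, b

def is_known(w, b, image_conf, speech_conf, threshold=0.5):
    """Classify an object as known if the fused confidence exceeds the threshold."""
    return sigmoid(w[0] * image_conf + w[1] * speech_conf + b) >= threshold
```

The point of the fusion is that a single low confidence (e.g. a noisy utterance but a clear image) need not trigger an "unknown" decision; the classifier weighs both modalities jointly.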


Keywords: Object Recognition, Speech Recognition, Image Recognition, Object Manipulation, Speech Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yuko Ozasa (1)
  • Yasuo Ariki (1)
  • Mikio Nakano (2)
  • Naoto Iwahashi (3)
  1. Graduate School of System Informatics, Kobe University, Kobe, Japan
  2. Honda Research Institute Japan Co., Ltd., Wako-shi, Japan
  3. Keihanna Research Laboratories, National Institute of Information and Communications Technology, Soraku-gun, Japan
