Operating a Robot by Nonverbal Voice Expressed with Acoustic Features

  • Shizuka Takahashi
  • Ikuo Mizuuchi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 867)


This paper proposes methods for operating a robot by nonverbal voice. By associating nonverbal vocalizations, tongue position, and the coordinates of the robot’s hand, the methods enable operators to control multiple degrees of freedom simultaneously and to operate the robot intuitively. The voice is characterized by formants or Mel-frequency cepstral coefficients (MFCCs), acoustic features that reflect the configuration of the vocal tract, including the mouth and tongue. We propose two methods. In the first, vocalizations whose formants fall within overlapping ranges are used to change the variables governing the robot’s operation, allowing multiple degrees of freedom to be controlled simultaneously by voice. In the second, the operator’s tongue position is distinguished from the nonverbal voice and mapped to the coordinates of the robot’s hand, enabling intuitive operation. Experiments on simple tasks confirmed the feasibility of both methods, which allow a robot to be operated intuitively over continuous values by voice and can serve as the basis of a user-friendly system.
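Both methods rest on standard acoustic-feature extraction. As an illustrative sketch (not code from the paper), the snippet below estimates formant frequencies from a short audio frame using linear predictive coding (LPC, autocorrelation method) and maps the lowest two resonances to a normalized hand coordinate. The synthetic test signal, LPC order, and the F1/F2-to-coordinate ranges are all assumptions made for the demo.

```python
import numpy as np

def estimate_formants(signal, sr, order=8):
    """Estimate resonance (formant) frequencies of one frame via LPC."""
    # Pre-emphasis and Hamming window, as in standard formant analysis.
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    x = x * np.hamming(len(x))
    # Autocorrelation at lags 0..order, then solve the normal equations
    # R a = r for the forward-prediction coefficients a.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Roots of the prediction-error polynomial A(z) = 1 - sum a_k z^-k
    # near the unit circle correspond to spectral resonances.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    # Discard near-DC and near-Nyquist artifacts.
    return sorted(f for f in freqs if 90 < f < sr / 2 - 90)

# Synthetic vowel-like frame with resonances near 700 Hz and 1200 Hz.
sr = 16000
t = np.arange(0, 0.03, 1 / sr)
rng = np.random.default_rng(0)
sig = (np.sin(2 * np.pi * 700 * t)
       + 0.5 * np.sin(2 * np.pi * 1200 * t)
       + 0.001 * rng.standard_normal(t.size))
formants = estimate_formants(sig, sr)

# Hypothetical mapping: F1 and F2 drive a normalized (x, y) hand target.
# The 300-900 Hz and 800-2300 Hz ranges are placeholder vowel-space bounds.
x_cmd = float(np.clip((formants[0] - 300) / 600, 0.0, 1.0))
y_cmd = float(np.clip((formants[1] - 800) / 1500, 0.0, 1.0))
```

In practice MFCCs can be computed with common audio toolkits instead, and the mapping from features to hand coordinates would be calibrated per operator rather than fixed as above.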



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Tokyo University of Agriculture and Technology, Tokyo, Japan
