Volume 25, Issue 2, pp 155–168

Incremental learning of gestures for human–robot interaction

  • Shogo Okada (Email author)
  • Yoichi Kobayashi
  • Satoshi Ishibashi
  • Toyoaki Nishida
Original Article


For a robot to cohabit with people, it should be able to learn people’s nonverbal social behavior from experience. In this paper, we propose a novel machine learning method for recognizing gestures used in interaction and communication. Our method enables robots to learn gestures incrementally and without supervision during human–robot interaction, so the user does not need to define the number or types of gestures before learning begins. The proposed method (HB-SOINN) is based on a self-organizing incremental neural network and the hidden Markov model. We have added an interactive learning mechanism to HB-SOINN to prevent a single cluster from failing because of polysemy, that is, because it has been assigned more than one meaning. For example, the sentence “Keep on going left slowly” carries three meanings: “Keep on” (1), “going left” (2), and “slowly” (3). We experimentally tested the clustering performance of the proposed method on gesture data measured with a motion capture device. The results show that the classification performance of HB-SOINN exceeds that of conventional clustering approaches. In addition, we have found that the interactive learning function improves the learning performance of HB-SOINN.
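To illustrate what learning without a predefined number of gesture classes can look like, the following is a minimal sketch of threshold-based incremental clustering. It is not HB-SOINN: it assumes each gesture has already been reduced to a fixed-length feature vector (the abstract does not specify the representation), and the class name `IncrementalClusterer`, the `threshold` parameter, and the sample data are all hypothetical choices made for illustration.

```python
import math


class IncrementalClusterer:
    """Toy online clusterer (illustrative only, not the authors' HB-SOINN).

    A new cluster is created whenever a sample lies farther than
    `threshold` from every existing prototype, so the number of
    clusters does not have to be fixed in advance.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical distance cutoff
        self.prototypes = []        # running mean vector per cluster
        self.counts = []            # number of samples per cluster

    def learn(self, x):
        """Assign sample x to a cluster, creating one if needed."""
        if self.prototypes:
            dists = [math.dist(x, p) for p in self.prototypes]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.threshold:
                # Update the running mean of the winning prototype.
                self.counts[best] += 1
                n = self.counts[best]
                self.prototypes[best] = [
                    p + (xi - p) / n
                    for p, xi in zip(self.prototypes[best], x)
                ]
                return best
        # No prototype is close enough: start a new cluster.
        self.prototypes.append(list(x))
        self.counts.append(1)
        return len(self.prototypes) - 1


# Made-up 2-D feature vectors arriving one at a time.
samples = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.0)]
clusterer = IncrementalClusterer(threshold=1.0)
labels = [clusterer.learn(s) for s in samples]
print(labels, len(clusterer.prototypes))
```

The sketch processes samples one by one and never revisits old data, which is the sense of "incremental" used here. A method such as the one described in the abstract would additionally need to handle variable-length time series and ambiguous clusters, which this toy example does not attempt.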


Social robot · Human–robot interaction · Gesture recognition · Unsupervised learning · Clustering · Incremental learning



Funding for this study was primarily provided by Kyoto University’s Global COE program. The authors express their appreciation.



Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  • Shogo Okada (1), Email author
  • Yoichi Kobayashi (2)
  • Satoshi Ishibashi (3)
  • Toyoaki Nishida (1)

  1. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
  2. Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
  3. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
