Autonomous Robots, Volume 37, Issue 3, pp 317–331

Learning object, grasping and manipulation activities using hierarchical HMMs

  • Mitesh Patel
  • Jaime Valls Miro
  • Danica Kragic
  • Carl Henrik Ek
  • Gamini Dissanayake


Abstract

This article presents a probabilistic algorithm for representing and learning complex manipulation activities performed by humans in everyday life. The work builds on the multi-level Hierarchical Hidden Markov Model (HHMM) framework, which allows longer-term complex manipulation activities to be decomposed into layers of abstraction whose building blocks are simpler action modules called action primitives. In this way, human task knowledge can be synthesised into a compact, effective representation suitable, for instance, for subsequent transfer to a robot for imitation. The main contributions are a robust framework capable of dealing with the uncertainty and incomplete data inherent to these activities, and the ability to represent behaviours at multiple levels of abstraction for enhanced task generalisation. The approach is evaluated on activity data obtained from 3D video sequences of humans manipulating everyday objects. A comparison with a mixed generative-discriminative hybrid HHMM/SVM (support vector machine) model is also presented to highlight the benefits of the proposed approach against comparable state-of-the-art techniques.
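The paper's HHMMs are learned from real 3D tracking data, but the building block at the lowest layer of such a hierarchy is a flat HMM over action primitives. As a toy illustration of how one layer scores an observed primitive sequence, the sketch below evaluates the forward-algorithm log-likelihood of a sequence under two hand-made single-level HMMs standing in for two manipulation activities. All state names, activities, and probability values here are invented for illustration; they are not the paper's learned models.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood P(obs | model) for a discrete-emission HMM,
    computed with the forward algorithm in log space for stability."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    alpha = log_pi + log_B[:, obs[0]]  # log alpha_1(i)
    for o in obs[1:]:
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Hidden states = hypothetical action primitives:
# 0 reach, 1 grasp, 2 move, 3 release.
# Emissions = noisily observed primitive labels (near-identity emission model).
B_em = np.full((4, 4), 0.05) + np.eye(4) * 0.80  # 0.85 on the diagonal

# "Pick and place": primitives tend to follow reach -> grasp -> move -> release.
pi_pick = np.array([0.91, 0.03, 0.03, 0.03])
A_pick = np.array([
    [0.15, 0.79, 0.03, 0.03],
    [0.03, 0.15, 0.79, 0.03],
    [0.03, 0.03, 0.15, 0.79],
    [0.03, 0.03, 0.03, 0.91],
])

# "Push/slide": mostly alternates reach and move, rarely grasps or releases.
pi_push = np.array([0.48, 0.02, 0.48, 0.02])
A_push = np.array([
    [0.10, 0.02, 0.86, 0.02],
    [0.25, 0.25, 0.25, 0.25],
    [0.86, 0.02, 0.10, 0.02],
    [0.25, 0.25, 0.25, 0.25],
])

seq = [0, 1, 2, 3]  # observed: reach, grasp, move, release
ll_pick = forward_log_likelihood(seq, pi_pick, A_pick, B_em)
ll_push = forward_log_likelihood(seq, pi_push, A_push, B_em)
print("pick-and-place" if ll_pick > ll_push else "push/slide")
```

In the full HHMM, an upper layer would place a prior over which activity (sub-model) is active and allow transitions between activities once a sub-model terminates; the likelihoods computed above are what that layer would combine.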


Keywords

Hierarchical Hidden Markov Model (HHMM) · Action primitives · Grasping and manipulation · Human daily activities



Acknowledgements

The authors would like to acknowledge Nikolaos Kyriazis and Antonis Argyros from the Institute of Computer Science, FORTH, and the Department of Computer Science, University of Crete, Greece, for their contribution towards acquiring the data sets used in this paper.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Mitesh Patel (1), corresponding author
  • Jaime Valls Miro (1)
  • Danica Kragic (2)
  • Carl Henrik Ek (2)
  • Gamini Dissanayake (1)

  1. Faculty of Engineering and IT, University of Technology Sydney (UTS), Ultimo, Australia
  2. Computer Vision and Active Perception Laboratory, Centre for Autonomous Systems, School of Computer Science and Communication, The Royal Institute of Technology (KTH), Stockholm, Sweden
