Learning object, grasping and manipulation activities using hierarchical HMMs

Autonomous Robots


This article presents a probabilistic algorithm for representing and learning complex manipulation activities performed by humans in everyday life. The work builds on the multi-level Hierarchical Hidden Markov Model (HHMM) framework, which decomposes longer-term complex manipulation activities into layers of abstraction whose building blocks are simpler action modules called action primitives. In this way, human task knowledge can be synthesised into a compact, effective representation suitable, for instance, for subsequent transfer to a robot for imitation. The main contribution is a robust framework that can deal with the uncertainty and incomplete data inherent to these activities, and that can represent behaviours at multiple levels of abstraction for enhanced task generalisation. Activity data from 3D video sequencing of humans manipulating different everyday objects is used for evaluation. A comparison with a mixed generative-discriminative hybrid model, HHMM/SVM (support vector machine), is also presented to highlight the benefit of the proposed approach against comparable state-of-the-art techniques.
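The layered decomposition described above can be illustrated with a toy two-level sampler. This is only a minimal sketch and not the authors' model: the activity names, primitive names, probabilities, and the helpers `emit`, `sample_activity` and `sample_sequence` are all hypothetical, chosen purely to show how a top-level activity hands control to a chain of action primitives and regains it when that chain ends.

```python
import random

# Hypothetical two-level hierarchy: each activity (top level) owns an ordered
# chain of action primitives (bottom level). All names and numbers here are
# illustrative assumptions, not values taken from the article.
ACTIVITIES = {
    "pour": ["reach", "grasp", "tilt", "release"],
    "move": ["reach", "grasp", "transport", "release"],
}
# Top-level transition probabilities between activities.
ACTIVITY_TRANSITIONS = {
    "pour": {"pour": 0.2, "move": 0.8},
    "move": {"pour": 0.6, "move": 0.4},
}
STAY_PROB = 0.5   # chance that a primitive persists for one more time step
NOISE_PROB = 0.1  # chance that the observation is replaced by a random symbol
SYMBOLS = sorted({p for chain in ACTIVITIES.values() for p in chain})


def emit(primitive, rng):
    """Noisy observation of the hidden primitive (may equal it by chance)."""
    if rng.random() < NOISE_PROB:
        return rng.choice(SYMBOLS)
    return primitive


def sample_activity(name, rng):
    """Run one activity's left-to-right sub-model until its chain ends."""
    steps = []
    for primitive in ACTIVITIES[name]:
        steps.append((primitive, emit(primitive, rng)))
        while rng.random() < STAY_PROB:  # self-transition of the primitive
            steps.append((primitive, emit(primitive, rng)))
    return steps  # control returns to the parent level here


def sample_sequence(n_activities, seed=0):
    """Sample (activity, primitive, observation) triples for n activities."""
    rng = random.Random(seed)
    activity = rng.choice(sorted(ACTIVITIES))
    sequence = []
    for _ in range(n_activities):
        for primitive, observation in sample_activity(activity, rng):
            sequence.append((activity, primitive, observation))
        options = ACTIVITY_TRANSITIONS[activity]
        activity = rng.choices(list(options), weights=list(options.values()))[0]
    return sequence


if __name__ == "__main__":
    for activity, primitive, observation in sample_sequence(2, seed=1):
        print(activity, primitive, observation)
```

In a learning setting only the observation column would be available, and the activity and primitive columns would have to be inferred. The sketch covers the generative direction alone.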


The authors would like to acknowledge Nikolaos Kyriazis and Antonis Argyros, from the Institute of Computer Science, FORTH, and the Department of Computer Science, University of Crete, Crete, Greece, for their contribution towards acquiring the data sets used in this paper.

Author information

Corresponding author

Correspondence to Mitesh Patel.

About this article

Cite this article

Patel, M., Miro, J.V., Kragic, D. et al. Learning object, grasping and manipulation activities using hierarchical HMMs. Auton Robot 37, 317–331 (2014).
