Hierarchies for Embodied Action Perception

  • Dimitri Ognibene
  • Yan Wu
  • Kyuhwa Lee
  • Yiannis Demiris


During social interactions, humans are capable of initiating and responding to rich and complex social actions despite incomplete world knowledge and physical, perceptual and computational constraints. This capability relies on action perception mechanisms that exploit regularities in observed goal-oriented behaviours to generate robust predictions and reduce the workload of sensing systems. We argue that three factors are fundamental to achieving this capability. First, human knowledge is frequently hierarchically structured, both in the perceptual and execution domains. Second, human perception is an active process driven by current task requirements and context; this is particularly important when the perceptual input is complex (e.g. human motion) and the agent has to operate under embodiment constraints. Third, learning is at the heart of action perception mechanisms, underlying the agent’s ability to add new behaviours to its repertoire. Based on these factors, we review multiple instantiations of a hierarchically organised, biologically inspired framework for embodied action perception, demonstrating its flexibility in addressing the rich computational contexts of action perception and learning in robotic platforms.


Keywords (added by machine, not by the authors): Forward Model; Inverse Model; Parse Tree; Minimum Description Length; Grip Aperture



This research has received funding from the European Union Seventh Framework Programme FP7/2007-2013, under grant agreement no. 270490 (EFAA).



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Dimitri Ognibene (1, corresponding author)
  • Yan Wu (1)
  • Kyuhwa Lee (1)
  • Yiannis Demiris (1)

  1. Department of Electrical and Electronic Engineering, Imperial College London, London, UK
