Autonomous Robots, Volume 42, Issue 1, pp 125–145

Bidirectional invariant representation of rigid body motions and its application to gesture recognition and reproduction

  • Dongheui Lee
  • Raffaele Soloperto
  • Matteo Saveriano


In this paper we propose a new bidirectional invariant motion descriptor for a rigid body. The proposed representation is invariant to rotations, translations, time, and linear and angular scaling. These invariance properties enable gesture recognition in realistic scenarios with unexpected variations (e.g., changes in the user's initial pose, execution time, or observation point), to which Cartesian trajectories are sensitive. The proposed representation also allows reconstruction of the original motion trajectory, which is useful for human–robot interaction applications in which a robot recognizes human actions and executes its own appropriate behaviors using the same descriptors. By removing the dependency on the absolute pose and scaling factors of the Cartesian trajectories, the proposed descriptor gains the flexibility to generate different motion instances from the same invariant representation. To illustrate the effectiveness of the proposed descriptor in motion recognition and generation, it is tested on three datasets and in experiments on a NAO humanoid robot and a KUKA LWR IV+ manipulator, and compared with other existing invariant representations.
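The paper's own bidirectional descriptor is not reproduced in this abstract, but the kind of invariance it targets can be illustrated with a classical differential-geometric descriptor: the curvature and torsion profiles of a Cartesian path, normalized by their mean magnitude to remove uniform scaling. The sketch below is illustrative only; the function name and the normalization choice are assumptions, not the paper's method.

```python
import numpy as np

def path_invariants(traj):
    """Curvature/torsion profiles of a 3D path via finite differences.

    These classical invariants are unchanged by rotation and translation;
    dividing by the mean magnitude also removes uniform linear scaling.
    (Illustrative sketch -- not the bidirectional descriptor of the paper.)
    """
    d1 = np.gradient(traj, axis=0)   # velocity
    d2 = np.gradient(d1, axis=0)     # acceleration
    d3 = np.gradient(d2, axis=0)     # jerk
    cross = np.cross(d1, d2)
    speed = np.linalg.norm(d1, axis=1)
    cnorm = np.linalg.norm(cross, axis=1)
    kappa = cnorm / np.maximum(speed ** 3, 1e-12)                           # curvature
    tau = np.einsum("ij,ij->i", cross, d3) / np.maximum(cnorm ** 2, 1e-12)  # torsion
    return kappa / kappa.mean(), tau / np.abs(tau).mean()

# A helix, then the same helix rotated, translated, and uniformly scaled.
t = np.linspace(0.0, 4.0 * np.pi, 400)
helix = np.stack([np.cos(t), np.sin(t), 0.5 * t], axis=1)

Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0                  # ensure a proper rotation
moved = 2.5 * helix @ Q.T + np.array([1.0, -2.0, 3.0])

k1, t1 = path_invariants(helix)
k2, t2 = path_invariants(moved)
print(np.max(np.abs(k1 - k2)), np.max(np.abs(t1 - t2)))  # both tiny: invariance holds
```

Because finite differences commute with rigid transformations and uniform scaling, the two profiles match up to floating-point rounding, which is exactly the property that makes such descriptors useful for recognition across changes in viewpoint, start pose, and amplitude.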


Invariant representation · Rigid body motion · Bidirectional descriptor · Recognition · Generation



This work has been supported by the Technical University of Munich, International Graduate School of Science and Engineering.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Human-Centered Assistive Robotics, Technical University of Munich, Munich, Germany
  2. Department of Electrical, Electronic, and Information Engineering, Università di Bologna, Bologna, Italy
