Bidirectional invariant representation of rigid body motions and its application to gesture recognition and reproduction

Abstract

In this paper, we propose a new bidirectional invariant motion descriptor of a rigid body. The proposed invariant representation is not affected by rotations, translations, time, or linear and angular scaling. These invariance properties make it possible to recognize gestures in realistic scenarios with unexpected variations (e.g., changes in the user’s initial pose, the execution time, or the observation point), whereas Cartesian trajectories are sensitive to such changes. The proposed representation also allows the original motion trajectory to be reconstructed, which is useful in human-robot interaction applications where a robot recognizes human actions and executes appropriate behaviors using the same descriptors. By removing the dependency on the absolute pose and scaling factors of the Cartesian trajectories, the proposed descriptor gains the flexibility to generate different motion instances from the same invariant representation. To illustrate the effectiveness of the proposed descriptor in motion recognition and generation, it is tested on three datasets and in experiments on a NAO humanoid robot and a KUKA LWR IV+ manipulator, and it is compared with other existing invariant representations.

Notes

  1. This work is based on our preliminary results presented in Soloperto et al. (2015). Our previous work has been extended in several ways: (i) we provide more theoretical insights, including a compact closed form of DHB invariants; (ii) we theoretically compare DHB with existing invariant representations, in order to underline differences and similarities; (iii) we compare the recognition performance of DHB invariants with several state-of-the-art approaches; (iv) we report several experiments to show that DHB invariants can be adopted as flexible motion descriptors to execute complex tasks.

  2. In the discrete-time case, the integral \(\int _{t=0}^{t_f} {|\bullet |}\) in (42) is replaced by \(\sum _{t=0}^{t_f} {|\bullet |}\).

  3. As shown in Sects. 8.1.1 and 8.1.3, the proposed DHB descriptor works reasonably well with Kinect sensors, which do not guarantee that exactly the same point of a body part is tracked.

  4. For simplicity, the acronym of the author name (DS) is used to refer to the representation in De Schutter (2010).

  5. Time dependencies are omitted to simplify the notation.

  6. The result is obtained from Algorithm 1 by neglecting the summation and subtraction operations.

  7. The reconstruction procedure in Sect. 4 can also be applied to the EFS descriptor. Both reconstruction methods (Wu and Li (2010) and Sect. 4) produce similar reconstruction errors.

  8. A smaller sampling time generates more twist samples and more invariant values. Hence, more products have to be computed in (39) to reconstruct the motion, which increases the errors caused by finite numerical precision.

  9. Available on-line: creativedistraction.com/downloads/gesture.zip.

  10. www.xsens.com/products/xsens-mvn.

  11. Available on-line: research.microsoft.com/en-us/um/people/zliu/actionrecorsrc.

  12. There exist three invariants to represent the translational motion in the MSR Action3D dataset.

  13. www.aldebaran.com/en/cool-robots/nao.

  14. For example, for full-body motions of a human or humanoid, the body height serves as the reference. For hand motions, the length of the arm or manipulator is useful.

References

  1. Billard, A., Calinon, S., Dillmann, R., & Schaal, S. (2008). Robot programming by demonstration. In O. Khatib & B. Siciliano (Eds.), Springer handbook of robotics (pp. 1371–1394). Berlin: Springer.

  2. Bishop, C. M., et al. (2006). Pattern recognition and machine learning. New York: Springer.

  3. Black, M., & Jepson, D. (1998). A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions. European conference on computer vision, Lecture notes in computer science (Vol. 1406, pp. 909–924). Berlin: Springer.

  4. Burger, B., Ferrané, I., Lerasle, F., & Infantes, G. (2011). Two-handed gesture recognition and fusion with speech to command a robot. Autonomous Robots, 32(2), 129–147.

  5. Chartrand, R. (2011). Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics, 2011, 1–12.

  6. De Schutter, J. (2010). Invariant description of rigid body motion trajectories. Journal of Mechanisms and Robotics, 2(1), 1–9.

  7. De Schutter, J., Di Lello, E., De Schutter, J., Matthysen, R., Benoit, T., & De Laet, T. (2011). Recognition of 6 dof rigid body motion trajectories using a coordinate-free representation. In International conference on robotics and automation (pp. 2071–2078).

  8. Denavit, J., & Hartenberg, R. S. (1965). A kinematic notation for lower-pair mechanisms based on matrices. Transaction of the ASME Journal of Applied Mechanics, 22(2), 215–221.

  9. Dieleman, S., De Fauw, J., & Kavukcuoglu, K. (2016). Exploiting cyclic symmetry in convolutional neural networks. International Conference on Machine Learning.

  10. Hu, K., & Lee, D. (2012). Biped locomotion primitive learning, control and prediction from human data. In 10th International IFAC symposium on robot control (SYROCO).

  11. Hu, K., Ott, C., & Lee, D. (2014). Online human walking imitation in task and joint space based on quadratic programming. In IEEE international conference on robotics and automation (pp. 3458–3464). IEEE.

  12. Isard, M., & Blake, A. (1996). Contour tracking by stochastic propagation of conditional density. In European conference on computer vision (pp. 343–356).

  13. Jaderberg, M., Simonyan, K., Zisserman, A., & Kavukcuoglu, K. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).

  14. Koppula, H. S., Gupta, R., & Saxena, A. (2013). Learning human activities and object affordances from RGB-D videos. International Journal of Robotics Research, 32, 951–970.

  15. Kühnel, W. (2006). Differential geometry: Curves-surfaces-manifolds. Providence: American Mathematical Society.

  16. LeCun, Y. (2012). Learning invariant feature hierarchies. In European conference on computer vision (pp. 496–505).

  17. Lee, D., & Nakamura, Y. (2010). Mimesis model from partial observations for a humanoid robot. International Journal of Robotics Research, 29(1), 60–80.

  18. Lee, D., Ott, C., & Nakamura, Y. (2009). Mimetic communication with impedance control for physical human–robot interaction. In IEEE international conference on robotics and automation (pp. 1535–1542).

  19. Li, W., Zhang, Z., & Liu, Z. (2010). Action recognition based on a bag of 3d points. In Conference on computer vision and pattern recognition workshops (pp. 9–14).

  20. Magnanimo, V., Saveriano, M., Rossi, S., & Lee, D. (2014). A Bayesian approach for task recognition and future human activity prediction. In International symposium on robot and human interactive communication (pp. 726–731).

  21. Murray, R. M., Sastry, S. S., & Zexiang, L. (1994). A mathematical introduction to robotic manipulation (1st ed.). Boca Raton: CRC Press.

  22. Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639.

  23. Piao, Y., Hayakawa, K., & Sato, J. (2002). Space-time invariants and video motion extraction from arbitrary viewpoints. In International conference on pattern recognition (pp. 56–59).

  24. Piao, Y., Hayakawa, K., & Sato, J. (2004). Space-time invariants for recognizing 3d motions from arbitrary viewpoints under perspective projection. In International conference on image and graphics (pp. 200–203).

  25. Psarrou, A., Gong, S., & Walter, M. (2002). Recognition of human gestures and behaviour based on motion trajectories. Image and Vision Computing, 20(5–6), 349–358.

  26. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE (pp. 257–286).

  27. Rao, C., Yilmaz, A., & Shah, M. (2002). View-invariant representation and recognition of actions. International Journal of Computer Vision, 50(2), 203–226.

  28. Rao, C., Shah, M., & Syeda-Mahmood, T. (2003). Action recognition based on view invariant spatio-temporal analysis. In ACM multimedia.

  29. Rauch, H. E., Striebel, C. T., & Tung, F. (1965). Maximum likelihood estimates of linear dynamic systems. Journal of the American Institute of Aeronautics and Astronautics, 3(8), 1445–1450.

  30. Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49.

  31. Sanguansat, P. (2012). Multiple multidimensional sequence alignment using generalized dynamic time warping. WSEAS Transactions on Mathematics, 11(8), 668–678.

  32. Saveriano, M., & Lee, D. (2013). Invariant representation for user independent motion recognition. In International symposium on robot and human interactive communication (pp. 650–655).

  33. Saveriano, M., An, S., & Lee, D. (2015). Incremental kinesthetic teaching of end-effector and null-space motion primitives. In International conference on robotics and automation (pp. 3570–3575).

  34. Schreiber, G., Stemmer, A., & Bischoff, R. (2010). The fast research interface for the kuka lightweight robot. In ICRA workshop on innovative robot control architectures for demanding (Research) applications (pp. 15–21).

  35. Siciliano, B., Sciavicco, L., Villani, L., & Oriolo, G. (2009). Robotics-modelling, planning and control. Berlin: Springer.

  36. Soloperto, R., Saveriano, M., & Lee, D. (2015). A bidirectional invariant representation of motion for gesture recognition and reproduction. In International conference on robotics and automation (pp. 6146–6152).

  37. Vochten, M., De Laet, T., & De Schutter, J. (2015). Comparison of rigid body motion trajectory descriptors for motion representation and recognition. In International conference on robotics and automation (pp. 3010–3017).

  38. Waldherr, S., Romero, R., & Thrun, S. (2000). A gesture based interface for human–robot interaction. Autonomous Robots, 9(2), 151–173.

  39. Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2012). Mining actionlet ensemble for action recognition with depth cameras. In Conference on computer vision and pattern recognition (pp. 1290–1297).

  40. Wang, P., Li, W., Gao, Z., Tang, C., Zhang, J., & Ogunbona, P. (2015). Convnets-based action recognition from depth maps through virtual cameras and pseudocoloring. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 1119–1122).

  41. Weiss, I. (1993). Geometric invariants and object recognition. International Journal of Computer Vision, 10(3), 207–231.

  42. Wu, S., & Li, Y. F. (2008). On signature invariants for effective motion trajectory recognition. International Journal of Robotics Research, 27(8), 895–917.

  43. Wu, S., & Li, Y. F. (2010). Motion trajectory reproduction from generalized signature description. Pattern Recognition, 43(1), 204–221.

  44. Wu, Y., & Huang, T. S. (2001). Vision-based gesture recognition: A review. In Gesture-based communication in human–computer interaction, lecture notes in computer science (pp. 103–115). Berlin: Springer.

  45. Xia, L., Chen, C. C., & Aggarwal, J. K. (2012). View invariant human action recognition using histograms of 3d joints. In Conference on computer vision and pattern recognition workshops (pp. 20–27).

  46. Yan, P., Khan, S. M., & Shah, M. (2008). Learning 4d action feature models for arbitrary view action recognition. In International conference on computer vision and pattern recognition (pp. 1–7).

  47. Zisserman, A., & Maybank, S. (1994). A case against epipolar geometry. In Applications of invariance in computer vision, lecture notes in computer science (Vol. 825, pp. 69–88). Berlin: Springer.

Acknowledgements

This work has been supported by the Technical University of Munich, International Graduate School of Science and Engineering.

Author information

Corresponding author

Correspondence to Dongheui Lee.

Appendices

Appendix A: Rigid Body Motion Representation

To represent rigid body motions, it is convenient to attach an orthogonal frame to the rigid body (body frame) and to describe the pose (position and orientation) of the body frame with respect to a fixed frame (world frame). At each time instant, the position of the rigid body is represented by the vector \(\mathbf {p}\) connecting the origin of the body frame with the origin of the world frame. The axes of the body frame can be projected onto the axes of the world frame by means of the direction cosines. Hence, the orientation of the rigid body is described by collecting the direction cosines into a \(3 \times 3\) rotation matrix \(\mathbf {R}\). It can be shown that a minimal representation of the orientation consists of three values (Siciliano et al. 2009). In this work, we use the rotation vector to represent the orientation.

The rotation vector \(\mathbf {r} = \theta \hat{\mathbf{r}}\) is computed from \(\mathbf {R}\) as:

$$\begin{aligned}&\theta = \text {arccos} \left( \frac{trace\left( \mathbf {R}\right) -1}{2}\right) ,\nonumber \\&\hat{\mathbf{r}} = \frac{1}{2\sin {\theta }} \begin{bmatrix} \mathbf{{R}}\left( 3,2\right) - \mathbf{{R}}\left( 2,3\right) \\ \mathbf{{R}}\left( 1,3\right) - \mathbf{{R}}\left( 3,1\right) \\ \mathbf{{R}}\left( 2,1\right) - \mathbf{{R}}\left( 1,2\right) \\\end{bmatrix} \end{aligned}$$

The rotation matrix \(\mathbf {R}\) is computed from \(\mathbf {r}\) as:

$$\begin{aligned} \mathbf {R} = \exp (\mathbf {r}) = \mathbf {I} + \frac{\mathbf {S}(\mathbf {r})}{\theta } \sin (\theta ) + \frac{\mathbf {S}^{2}(\mathbf {r})}{\theta ^{2}}(1 - \cos (\theta )) ~, \end{aligned}$$

where the skew-symmetric matrix \(\mathbf {S}(\mathbf {r})\) is given by:

$$\begin{aligned} \mathbf {S}(\mathbf {r}) = \begin{bmatrix} 0&-r_z&r_y \\ r_z&0&-r_x \\ -r_y&r_x&0 \\ \end{bmatrix} ~. \end{aligned}$$
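The two conversions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the tolerance `eps`, and the handling of the degenerate cases (\(\theta \approx 0\) and \(\theta \approx \pi\), where \(\sin \theta \) vanishes and the axis formula cannot be evaluated) are assumptions made here. Matrix indices are 0-based in the code, so \(\mathbf {R}(3,2)\) becomes `R[2][1]`.

```python
import math

def skew(r):
    """Skew-symmetric matrix S(r) of a 3-vector r."""
    rx, ry, rz = r
    return [[0.0, -rz, ry],
            [rz, 0.0, -rx],
            [-ry, rx, 0.0]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotvec_to_matrix(r, eps=1e-12):
    """R = I + S(r)/theta * sin(theta) + S(r)^2/theta^2 * (1 - cos(theta))."""
    theta = math.sqrt(sum(c * c for c in r))
    I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < eps:
        # Assumption: a vanishing rotation vector is mapped to the identity.
        return I
    S = skew(r)
    S2 = matmul(S, S)
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / theta ** 2
    return [[I[i][j] + a * S[i][j] + b * S2[i][j] for j in range(3)]
            for i in range(3)]

def matrix_to_rotvec(R, eps=1e-12):
    """Rotation vector r = theta * r_hat extracted from a rotation matrix R."""
    c = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, c)))  # clamp against round-off
    s = math.sin(theta)
    if abs(s) < eps:
        if theta < 1.0:
            # Assumption: no rotation is reported as the zero vector.
            return [0.0, 0.0, 0.0]
        raise ValueError("axis is not defined by this formula for theta = pi")
    axis = [(R[2][1] - R[1][2]) / (2.0 * s),
            (R[0][2] - R[2][0]) / (2.0 * s),
            (R[1][0] - R[0][1]) / (2.0 * s)]
    return [theta * a for a in axis]

if __name__ == "__main__":
    r = [0.3, -0.5, 0.8]
    print(matrix_to_rotvec(rotvec_to_matrix(r)))
```

For rotation angles strictly between 0 and \(\pi \), converting a rotation vector to a matrix and back should return the original vector up to round-off.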

Appendix B: Proofs of the relationships in Sect. 7.1

1. \(m_{\omega } = d_{\omega }^1\) derives from (20) and \(d_{\omega }^1\) in Table 2.

2. \(\theta _{\omega }^{1} \approx d_{\omega }^{2} \Delta t\). For \(\Delta t \longrightarrow 0\), we can neglect the arc tangent in (28). Hence, we can rewrite \(\theta _{\omega }^{1}\) in (23) as:

$$\begin{aligned} \begin{aligned} \theta _{\omega }^{1}&\approx \frac{\Vert {\varvec{\omega }}_t \times {\varvec{\omega }}_{t+1}\Vert }{{\varvec{\omega }}_{t} \cdot {\varvec{\omega }}_{t+1}} = \frac{\Vert {\varvec{\omega }}_t \times ({\varvec{\omega }}_{t} + \Delta {\varvec{\omega }}_{t})\Vert }{{\varvec{\omega }}_{t} \cdot ({\varvec{\omega }}_{t} + \Delta {\varvec{\omega }}_{t})}\\&\approx \frac{\Vert {\varvec{\omega }}_t \times \Delta {\varvec{\omega }}_{t}\Vert }{\Vert {\varvec{\omega }}_{t} \Vert ^{2}}\frac{\Delta t}{\Delta t} \approx \frac{\Vert {\varvec{\omega }}_t \times \dot{{\varvec{\omega }}}_{t}\Vert }{\Vert {\varvec{\omega }}_{t} \Vert ^{2}}\Delta t = d_{\omega }^{2} \Delta t~. \end{aligned} \end{aligned}$$
(43)

3. \(\theta _{\omega }^{2} \approx d_{\omega }^{3}\Delta t\). Recall that \(\mathbf {a} \times \mathbf {b} = -\mathbf {b} \times \mathbf {a}\) and that \(\mathbf {a}\cdot (\mathbf {b} \times \mathbf {c}) = \mathbf {c}\cdot (\mathbf {a} \times \mathbf {b})\). \(\theta _{\omega }^{2}\) in (23) can be re-written as:

$$\begin{aligned} \begin{aligned} \theta _{\omega }^{2}&= \arctan {\left( \frac{\Vert {\varvec{\omega }}_{t+1}\Vert {\varvec{\omega }}_{t+2} \cdot \left( {\varvec{\omega }}_{t+1}\times {\varvec{\omega }}_{t} \right) }{\left( {\varvec{\omega }}_{t+1} \times {\varvec{\omega }}_{t}\right) \cdot \left( {\varvec{\omega }}_{t+1} \times {\varvec{\omega }}_{t+2}\right) }\right) } \\&= \arctan {\left( \frac{\Vert {\varvec{\omega }}_{t+1}\Vert \left( {\varvec{\omega }}_{t}\times {\varvec{\omega }}_{t+1} \right) \cdot {\varvec{\omega }}_{t+2}}{\left( {\varvec{\omega }}_{t} \times {\varvec{\omega }}_{t+1}\right) \cdot \left( {\varvec{\omega }}_{t+1} \times {\varvec{\omega }}_{t+2}\right) }\right) } \end{aligned} \end{aligned}$$
(44)

The denominator of (44) can be re-written as:

$$\begin{aligned} \begin{aligned}&\left( {\varvec{\omega }}_{t} \times {\varvec{\omega }}_{t+1}\right) \cdot \left( {\varvec{\omega }}_{t+1} \times {\varvec{\omega }}_{t+2}\right) \\&\approx \left( {\varvec{\omega }}_{t} \times \Delta {\varvec{\omega }}_{t}\right) \cdot \left[ \left( {\varvec{\omega }}_{t} + \Delta {\varvec{\omega }}_{t}\right) \times \left( {\varvec{\omega }}_{t} + 2\Delta {\varvec{\omega }}_{t}\right) \right] \\&= \left( {\varvec{\omega }}_{t} \times \Delta {\varvec{\omega }}_{t}\right) \cdot \left[ 2({\varvec{\omega }}_{t} \times \Delta {\varvec{\omega }}_{t})-({\varvec{\omega }}_{t} \times \Delta {\varvec{\omega }}_{t})\right] \frac{\Delta {t}^{2}}{\Delta {t}^{2}} \\&\approx \left( {\varvec{\omega }}_{t} \times \dot{{\varvec{\omega }}}_{t}\right) \cdot \left( {\varvec{\omega }}_{t} \times \dot{{\varvec{\omega }}}_{t}\right) \Delta {t}^{2} = \Vert {\varvec{\omega }}_{t} \times \dot{{\varvec{\omega }}}_{t} \Vert ^{2}\Delta {t}^{2} \end{aligned} \end{aligned}$$
(45)

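Considering the second-order finite difference \(\ddot{{{\varvec{a}}}}_t \approx ({{\varvec{a}}}_{t+2} - 2{{\varvec{a}}}_{t+1} + {{\varvec{a}}}_t)/\Delta {t}^2\), and noting that \({\varvec{\omega }}_{t}\) and \({\varvec{\omega }}_{t+1} = {\varvec{\omega }}_{t} + \Delta {\varvec{\omega }}_{t}\) are both orthogonal to \({\varvec{\omega }}_{t} \times \Delta {\varvec{\omega }}_{t}\), the numerator of (44) can be re-written as:

$$\begin{aligned} \begin{aligned}&\Vert {\varvec{\omega }}_{t+1}\Vert \left( {\varvec{\omega }}_{t}\times {\varvec{\omega }}_{t+1} \right) \cdot {\varvec{\omega }}_{t+2} \approx \Vert {\varvec{\omega }}_t \Vert \left( {\varvec{\omega }}_{t}\times \Delta {\varvec{\omega }}_{t} \right) \cdot (\ddot{{\varvec{\omega }}}_{t} \Delta t^{2} + 2{\varvec{\omega }}_{t+1} - {\varvec{\omega }}_t) \\&= \Vert {\varvec{\omega }}_t \Vert \left( {\varvec{\omega }}_{t}\times \Delta {\varvec{\omega }}_{t} \right) \cdot \ddot{{\varvec{\omega }}}_{t} \Delta t^{2} \approx \Vert {\varvec{\omega }}_t \Vert \left( {\varvec{\omega }}_{t}\times \dot{{\varvec{\omega }}}_{t} \right) \cdot \ddot{{\varvec{\omega }}}_{t} \Delta t^{3} \end{aligned} \end{aligned}$$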
(46)

Finally, combining (45), (46) and (44), and neglecting the arc tangent, we obtain that \(\theta _{\omega }^{2} \approx d_{\omega }^{3}\Delta t\) for \(\Delta t \longrightarrow 0\).
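The two limits in this appendix can be checked numerically. The sketch below is only an illustration under stated assumptions: the angular velocity \({\varvec{\omega }}(t) = (\cos t, \sin t, 0.5)\) is a hypothetical test signal chosen here because its derivatives are known in closed form, the function names are invented for this example, and `atan2` is used in place of the arc tangent of a ratio. The continuous quantities are the ones read off from the right-hand sides of (43), (45) and (46).

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

# Hypothetical test signal (not taken from the paper) with analytic derivatives.
def omega(t):
    return (math.cos(t), math.sin(t), 0.5)

def omega_dot(t):
    return (-math.sin(t), math.cos(t), 0.0)

def omega_ddot(t):
    return (-math.cos(t), -math.sin(t), 0.0)

def discrete_angles(t, dt):
    """theta^1 as in the first expression of (43), theta^2 as in (44)."""
    w0, w1, w2 = omega(t), omega(t + dt), omega(t + 2.0 * dt)
    theta1 = math.atan2(norm(cross(w0, w1)), dot(w0, w1))
    num = norm(w1) * dot(cross(w0, w1), w2)
    den = dot(cross(w0, w1), cross(w1, w2))
    theta2 = math.atan2(num, den)
    return theta1, theta2

def continuous_invariants(t):
    """d^2 from the last term of (43); d^3 from the ratio of (46) to (45)."""
    w, wd, wdd = omega(t), omega_dot(t), omega_ddot(t)
    wxwd = cross(w, wd)
    d2 = norm(wxwd) / dot(w, w)
    d3 = norm(w) * dot(wxwd, wdd) / dot(wxwd, wxwd)
    return d2, d3

if __name__ == "__main__":
    t = 0.7
    d2, d3 = continuous_invariants(t)
    for dt in (1e-1, 1e-2, 1e-3):
        th1, th2 = discrete_angles(t, dt)
        # Both ratios should approach 1 as dt shrinks.
        print(dt, th1 / (d2 * dt), th2 / (d3 * dt))
```

For this particular signal both continuous quantities are constant in time, so the comparison does not depend on which sample index the discrete angle is attributed to.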

Appendix C: Proofs of the relationships in Sect. 7.2

1. \(m_{v} = e_{v}^1\) derives from (19) and \(e_{v}^1\) in Table 2. \(m_{\omega } = e_{\omega }^1\) derives from (20) and \(e_{\omega }^1\) in Table 2.

2. \(\theta _{\omega }^{1} \approx e_{\omega }^2 \Delta t\) derives from (43) recalling that \(e_{\omega }^2 = d_{\omega }^2\). \(\theta _{v}^{1} \approx e_{v}^2 \Delta t\) can be proven by following similar steps as in (43) and considering \(e_{v}^2\) in Table 2.

3. \(\theta _{v}^{2} \approx e_{v}^3 \Delta t\) and \(\theta _{\omega }^{2} \approx e_{\omega }^3 \Delta t\). Following similar steps as in (45) and (46), and recalling that \((\mathbf {a} \times \mathbf {b})\times (\mathbf {a} \times \mathbf {c}) = \left[ \mathbf {a}\cdot (\mathbf {b} \times \mathbf {c})\right] \mathbf {a}\), it is possible to prove that \(\theta _{v}^{2} \approx e_{v}^3 \Delta t\) and \(\theta _{\omega }^{2} \approx e_{\omega }^3 \Delta t\).
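The vector identity \((\mathbf {a} \times \mathbf {b})\times (\mathbf {a} \times \mathbf {c}) = \left[ \mathbf {a}\cdot (\mathbf {b} \times \mathbf {c})\right] \mathbf {a}\) used in the last step can be spot-checked with a few lines of Python. The vectors below are arbitrary integer examples chosen for this sketch, so both sides can be compared exactly; a single numerical instance illustrates the identity but does not prove it.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lhs(a, b, c):
    """(a x b) x (a x c)"""
    return cross(cross(a, b), cross(a, c))

def rhs(a, b, c):
    """[a . (b x c)] a"""
    s = dot(a, cross(b, c))
    return tuple(s * x for x in a)

if __name__ == "__main__":
    # Arbitrary example vectors (illustrative only).
    a, b, c = (1, 2, 3), (-1, 0, 2), (4, 1, -2)
    print(lhs(a, b, c), rhs(a, b, c))
```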

About this article

Cite this article

Lee, D., Soloperto, R. & Saveriano, M. Bidirectional invariant representation of rigid body motions and its application to gesture recognition and reproduction. Auton Robot 42, 125–145 (2018). https://doi.org/10.1007/s10514-017-9645-x

Keywords

  • Invariant representation
  • Rigid body motion
  • Bidirectional descriptor
  • Recognition
  • Generation