Abstract
In this work, we present FADE, a frequency-based descriptor for encoding human motion. FADE is simple and offers a high compression rate at low computational complexity. To reduce space and time complexity, we exploit the biomechanical property that human motion is bounded in frequency. We also present a variant of FADE, called Uncompressed FADE (U-FADE). Both FADE and U-FADE can be combined with supervised and unsupervised learning approaches to classify and cluster human actions, respectively. U-FADE performs well with some unsupervised algorithms, such as spectral clustering, at the cost of a reduced compression rate. U-FADE also generally performs better than FADE on small datasets. We tested our descriptors on well-known public motion databases, such as HDM05, Berkeley MHAD, and MSR, and compared FADE and U-FADE with diverse state-of-the-art approaches.
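The idea that band-limited motion can be compressed in the frequency domain can be illustrated with a minimal sketch. This is not the FADE or U-FADE algorithm, whose definition is not given in this excerpt; it is a generic low-frequency Fourier descriptor written under stated assumptions. The function name `low_frequency_descriptor`, the parameter `n_coeffs`, and the choice of normalized magnitudes are all hypothetical.

```python
import numpy as np


def low_frequency_descriptor(trajectory, n_coeffs=8):
    """Illustrative sketch only (NOT the paper's FADE definition).

    trajectory: array of shape (T, D), T time samples of D motion
                dimensions (e.g. joint coordinates).
    n_coeffs:   hypothetical parameter, number of lowest-frequency
                bins kept per dimension.
    Returns a 1-D vector of length D * n_coeffs.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    if trajectory.ndim != 2:
        raise ValueError("trajectory must have shape (T, D)")

    # Real FFT along the time axis: yields T // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(trajectory, axis=0)
    if n_coeffs > spectrum.shape[0]:
        raise ValueError("n_coeffs exceeds the number of frequency bins")

    # Keep only the lowest bins; dividing by T makes the magnitudes
    # comparable across recordings of different lengths.
    magnitudes = np.abs(spectrum[:n_coeffs]) / trajectory.shape[0]

    # Flatten so each dimension's coefficients are contiguous.
    return magnitudes.T.ravel()


# Usage: a 200-frame, 2-dimensional toy trajectory (400 values)
# is reduced to a 16-element vector.
t = np.linspace(0.0, 1.0, 200, endpoint=False)
toy = np.stack([np.sin(2 * np.pi * 2 * t), np.cos(2 * np.pi * 3 * t)], axis=1)
descriptor = low_frequency_descriptor(toy, n_coeffs=8)
```

The output length depends only on `D` and `n_coeffs`, not on the recording length, so a vector of this kind could be passed to a standard classifier or clustering routine. Whether FADE discards phase, how it picks the cut-off, and how U-FADE differs are not specified here and are not implied by this sketch.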
Notes
Müller M, Röder T, Clausen M, Eberhardt B, Krüger B, Weber A (2007) Documentation mocap database hdm05.
Acknowledgements
This work has been supported by the Marie Curie Action LEACON, EU project 659265, and by the Technical University of Munich, International Graduate School of Science and Engineering (IGSSE).
Cite this article
Falco, P., Saveriano, M., Shah, D. et al. Representing human motion with FADE and U-FADE: an efficient frequency-domain approach. Auton Robot 43, 179–196 (2019). https://doi.org/10.1007/s10514-018-9722-9
DOI: https://doi.org/10.1007/s10514-018-9722-9