Effective and efficient similarity searching in motion capture data


Motion capture data describe human movements in the form of spatio-temporal trajectories of skeleton joints. Intelligent management of such complex data is a challenging task for computers, and it requires an effective concept of motion similarity. However, evaluating pairwise similarity is difficult, as a single action can be performed by various actors in different ways, at different speeds, or from different starting positions. Recent methods usually model motion similarity by comparing customized features using either distance-based functions or specialized machine-learning classifiers. By combining both approaches, we transform the problem of comparing motions of variable sizes into the problem of comparing fixed-size vectors. Specifically, each relatively short motion is encoded into a compact visual representation, from which a highly descriptive 4,096-dimensional feature vector is extracted using a fine-tuned deep convolutional neural network. One advantage is that the fixed-size features are compared by the Euclidean distance, which enables efficient motion indexing by any metric-based index structure. Another advantage of the proposed approach is its tolerance of imprecise action segmentation, variance in movement speed, and lower data quality. Together, these properties bring new possibilities for effective and efficient large-scale retrieval.
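To make the fixed-size comparison concrete, the following is a minimal sketch of a k-NN query over 4,096-dimensional vectors under the Euclidean distance. It is not the authors' implementation: the random vectors are synthetic stand-ins for CNN features, and the names `knn_query` and `pivot_lower_bound` are hypothetical helpers introduced only for illustration. The sequential scan is a baseline; a metric index would return the same answer while skipping many distance computations, for instance by using the triangle-inequality bound shown at the end.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

FEATURE_DIM = 4096  # feature dimensionality stated in the abstract

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.dist(a, b)


def knn_query(query: Vector, database: List[Vector], k: int) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs via a sequential scan.

    Hypothetical helper: a metric index would prune most of these
    distance computations, but the result would be the same.
    """
    scored = ((euclidean(query, vec), idx) for idx, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


def pivot_lower_bound(d_query_pivot: float, d_object_pivot: float) -> float:
    """Lower bound on d(query, object) from distances to a shared pivot.

    Follows from the triangle inequality, which holds for any metric;
    an object can be skipped when this bound exceeds the search radius.
    """
    return abs(d_query_pivot - d_object_pivot)


# Synthetic stand-ins for CNN features (assumption: random values, not real motion data).
rng = random.Random(0)
database = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(50)]

# A query that is a slightly perturbed copy of stored vector 7.
query = [x + rng.gauss(0.0, 0.01) for x in database[7]]

neighbors = knn_query(query, database, k=3)
print(neighbors[0][1])  # index of the nearest stored vector

# The pivot-based bound never exceeds the true distance.
pivot = database[0]
bound = pivot_lower_bound(euclidean(query, pivot), euclidean(database[7], pivot))
true_distance = euclidean(query, database[7])
```

Because every motion maps to a vector of the same length, the distance function needs no alignment or time warping, which is what allows a generic metric index to be used.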









This research was supported by GBP103/12/G084.

Author information



Corresponding author

Correspondence to Jan Sedmidubsky.


About this article


Cite this article

Sedmidubsky, J., Elias, P. & Zezula, P. Effective and efficient similarity searching in motion capture data. Multimed Tools Appl 77, 12073–12094 (2018). https://doi.org/10.1007/s11042-017-4859-7



  • Motion capture data retrieval
  • Effective similarity measure
  • Efficient indexing
  • k-NN query
  • Motion image
  • Convolutional neural network
  • Fixed-size motion feature