Multimedia Tools and Applications, Volume 77, Issue 10, pp 12073–12094

Effective and efficient similarity searching in motion capture data

  • Jan Sedmidubsky
  • Petr Elias
  • Pavel Zezula


Motion capture data describe human movements in the form of spatio-temporal trajectories of skeleton joints. Intelligent management of such complex data is a challenging task for computers, which requires an effective concept of motion similarity. However, evaluating pair-wise similarity is a difficult problem, as a single action can be performed by various actors in different ways, at different speeds, or from different starting positions. Recent methods usually model motion similarity by comparing customized features using distance-based functions or specialized machine-learning classifiers. By combining both approaches, we transform the problem of comparing motions of variable sizes into the problem of comparing fixed-size vectors. Specifically, each relatively short motion is encoded into a compact visual representation from which a highly descriptive 4,096-dimensional feature vector is extracted using a fine-tuned deep convolutional neural network. The advantage is that the fixed-size features are compared by the Euclidean distance, which enables efficient motion indexing by any metric-based index structure. Another advantage of the proposed approach is its tolerance towards imprecise action segmentation, variance in movement speed, and lower data quality. All these properties together open new possibilities for effective and efficient large-scale retrieval.
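The abstract does not detail how a motion is turned into a "compact visual representation". Purely as an illustration, the Python sketch below assumes one plausible encoding: each joint becomes a row of pixels, each (resampled) frame becomes a column, and the normalized x, y, z coordinates are mapped to the R, G, B channels. The function name `motion_to_image`, the per-axis min-max normalization, the nearest-neighbour temporal resampling and the default width of 16 columns are hypothetical choices, not the authors' published parameters, and the CNN feature-extraction step is not shown.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one skeleton joint
Frame = List[Joint]                 # all joints captured in one frame
Pixel = Tuple[int, int, int]        # (R, G, B), each in 0..255


def motion_to_image(motion: List[Frame], width: int = 16) -> List[List[Pixel]]:
    """Encode a motion as a (joints x width) RGB image (hypothetical sketch).

    Rows correspond to joints and columns to time. The x, y, z coordinates
    are min-max normalized per axis over the whole motion and mapped to the
    R, G, B channels. Frames are resampled by nearest-neighbour indexing to a
    fixed number of columns, so motions of different lengths yield images of
    the same size.
    """
    if not motion or not motion[0]:
        raise ValueError("motion must contain at least one frame and one joint")
    num_joints = len(motion[0])

    # Per-axis minimum and maximum over all frames and joints.
    lo = [min(j[a] for f in motion for j in f) for a in range(3)]
    hi = [max(j[a] for f in motion for j in f) for a in range(3)]

    def to_byte(value: float, axis: int) -> int:
        span = hi[axis] - lo[axis]
        if span == 0:
            return 0  # constant coordinate along this axis: channel set to 0
        return round(255 * (value - lo[axis]) / span)

    n = len(motion)
    # Nearest-neighbour temporal resampling: column c takes frame c*n//width.
    columns = [motion[min(n - 1, c * n // width)] for c in range(width)]
    return [
        [tuple(to_byte(col[j][a], a) for a in range(3)) for col in columns]
        for j in range(num_joints)
    ]
```

Because every motion with the same skeleton maps to an image of the same size regardless of its number of frames, a network with a fixed input resolution can process any motion; a real pipeline would additionally rescale the image to the network's expected input size.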


Keywords: Motion capture data retrieval; Effective similarity measure; Efficient indexing; k-NN query; Motion image; Convolutional neural network; Fixed-size motion feature
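The efficiency claim in the abstract rests on the Euclidean distance being a metric: for any pivot object p, the triangle inequality gives |d(q,p) - d(o,p)| <= d(q,o), so precomputed object-to-pivot distances provide a lower bound that lets a k-NN search skip objects without computing their full distance. The Python sketch below illustrates this with a minimal LAESA-style pivot table. It is an assumed stand-in for "any metric-based index structure", not the index used in the paper; the class name `PivotTable`, the pivot choice and the toy two-dimensional vectors (in place of 4,096-dimensional CNN features) are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotTable:
    """Minimal LAESA-style pivot table for exact k-NN search (hypothetical)."""

    def __init__(self, data: List[Vector], pivot_ids: List[int]):
        self.data = data
        self.pivots = [data[i] for i in pivot_ids]
        # Precompute the distance from every object to every pivot.
        self.table = [[euclidean(o, p) for p in self.pivots] for o in data]

    def knn(self, query: Vector, k: int) -> Tuple[List[Tuple[float, int]], int]:
        """Return the k nearest (distance, index) pairs, sorted by distance,
        and the number of full query-to-object distances computed."""
        q_to_p = [euclidean(query, p) for p in self.pivots]
        heap: List[Tuple[float, int]] = []  # max-heap of (-distance, index)
        computed = 0
        for i, row in enumerate(self.table):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p.
            lower = max((abs(a - b) for a, b in zip(q_to_p, row)), default=0.0)
            if len(heap) == k and lower > -heap[0][0]:
                continue  # cannot beat the current k-th neighbour: prune
            d = euclidean(query, self.data[i])
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        return sorted((-nd, i) for nd, i in heap), computed
```

For example, with `data = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 10)]`, pivots `[0, 5]` and query `(0.2, 0.1)`, `knn(query, 2)` returns the objects at indices 0 and 1 while computing only three of the six full distances. With real data, `data` would hold the CNN feature vectors; how pivots should be selected is outside what the abstract specifies.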



This research was supported by GBP103/12/G084.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Masaryk University, Brno, Czech Republic
