Abstract
Motion capture data describe human movements in the form of spatio-temporal trajectories of skeleton joints. Intelligent management of such complex data is a challenging task for computers, and it requires an effective concept of motion similarity. However, evaluating pair-wise similarity is difficult because a single action can be performed by various actors in different ways, at different speeds, or from different starting positions. Recent methods usually model motion similarity by comparing customized features using either distance-based functions or specialized machine-learning classifiers. By combining both approaches, we transform the problem of comparing variable-size motions into the problem of comparing fixed-size vectors. Specifically, each relatively short motion is encoded into a compact visual representation, from which a highly descriptive 4,096-dimensional feature vector is extracted using a fine-tuned deep convolutional neural network. One advantage is that the fixed-size features are compared by the Euclidean distance, which enables efficient motion indexing by any metric-based index structure. Another advantage of the proposed approach is its tolerance towards imprecise action segmentation, variance in movement speed, and lower data quality. Together, these properties bring new possibilities for effective and efficient large-scale retrieval.
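To make the retrieval idea concrete, the following is a minimal sketch of k-nearest-neighbour search over fixed-size feature vectors compared by the Euclidean distance. It is not the authors' implementation: the feature vectors are random stand-ins for the network output, the names `knn_search`, `random_feature`, and `motion_<i>` are hypothetical, and a plain sequential scan takes the place of a metric index.

```python
import heapq
import math
import random

FEATURE_DIM = 4096  # feature dimensionality stated in the abstract


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.dist(a, b)


def knn_search(query, database, k):
    """Return the k (distance, motion_id) pairs closest to the query.

    This is a sequential scan. Because the Euclidean distance is a metric,
    a metric index could prune candidates instead of scanning everything.
    """
    scored = (
        (euclidean(query, vector), motion_id)
        for motion_id, vector in database.items()
    )
    return heapq.nsmallest(k, scored)


def random_feature(rng):
    """Hypothetical stand-in for a feature vector extracted by the network."""
    return [rng.random() for _ in range(FEATURE_DIM)]


rng = random.Random(0)
database = {f"motion_{i}": random_feature(rng) for i in range(50)}

# Assumed query: a slightly perturbed copy of a stored motion's features,
# imitating the same action captured with a little noise.
query = [x + rng.gauss(0.0, 0.01) for x in database["motion_7"]]

results = knn_search(query, database, k=3)
for distance, motion_id in results:
    print(f"{motion_id}: {distance:.3f}")
```

Since every motion is reduced to a vector of the same length, the cost of one comparison no longer depends on motion length, and the triangle inequality of the Euclidean distance is what allows a metric index structure to replace the scan above.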
Acknowledgements
This research was supported by GBP103/12/G084.
About this article
Cite this article
Sedmidubsky, J., Elias, P. & Zezula, P. Effective and efficient similarity searching in motion capture data. Multimed Tools Appl 77, 12073–12094 (2018). https://doi.org/10.1007/s11042-017-4859-7
DOI: https://doi.org/10.1007/s11042-017-4859-7