Abstract
This paper addresses unsupervised human action retrieval in 3D mesh sequences. An action is a mesh sequence in which each frame is represented by a static shape descriptor. Six state-of-the-art static descriptors are used to extract meaningful information from each sequence. First, the descriptors are evaluated on frame-to-frame similarity by means of Receiver Operating Characteristic (ROC) curves. They are then applied to the action retrieval problem, where the query is an entire 3D mesh sequence. Each action is treated as a multidimensional curve that traverses the points defined by the per-frame descriptor vectors. The similarity between two actions is estimated by computing the Dynamic Time Warping (DTW) distance between the corresponding curves. Retrieval performance is further examined when the Sakoe band is used to constrain the search space in DTW. The action retrieval experiments were carried out on one real and one artificial dataset, and the proposed retrieval framework is shown to achieve high performance on both.
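The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_distance` and `retrieve` are hypothetical, the per-frame distance is assumed to be Euclidean, and the band is widened to the length difference of the two sequences so that a warping path always exists. Each action is given as a list of per-frame descriptor vectors.

```python
import math


def dtw_distance(seq_a, seq_b, band=None):
    """DTW distance between two sequences of descriptor vectors.

    band: half-width of the Sakoe band in frames (None = unconstrained).
    Assumption of this sketch: frames are compared with Euclidean distance.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # Widen the band to at least |n - m|, otherwise the end cell is unreachable.
    w = max(n, m) if band is None else max(band, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within w frames of the diagonal are evaluated.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of seq_b
                                 cost[i][j - 1],      # repeat frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def retrieve(query, database, band=None):
    """Rank database actions (name -> sequence) by DTW distance to the query."""
    return sorted(database, key=lambda name: dtw_distance(query, database[name], band))


if __name__ == "__main__":
    # Toy one-dimensional "descriptors"; real descriptors are high-dimensional.
    query = [(0.0,), (1.0,), (2.0,)]
    database = {
        "same action, slower start": [(0.0,), (0.0,), (1.0,), (2.0,)],
        "different action": [(5.0,), (6.0,), (7.0,)],
    }
    print(retrieve(query, database, band=1))
```

A narrower band evaluates fewer cells and forbids extreme warping, so it can only keep the distance equal or increase it relative to unconstrained DTW. How that affects ranking quality is what the retrieval experiments examine.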
Acknowledgments
This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES (MIS 379516). Investing in knowledge society through the European Social Fund.
Cite this article
Veinidis, C., Pratikakis, I. & Theoharis, T. On the retrieval of 3D mesh sequences of human actions. Multimed Tools Appl 76, 2059–2085 (2017). https://doi.org/10.1007/s11042-015-3137-9
DOI: https://doi.org/10.1007/s11042-015-3137-9