Multimedia Tools and Applications, Volume 78, Issue 3, pp 2789–2814

Unsupervised human action retrieval using salient points in 3D mesh sequences

  • Christos Veinidis (Email author)
  • Ioannis Pratikakis
  • Theoharis Theoharis

Abstract

This paper addresses the problem of human action retrieval when the human body is represented as a 3D mesh. The proposed 3D mesh sequence descriptor is based on a set of trajectories of salient points of the human body: its centroid and its five protrusion ends. The descriptor extracted from these trajectories incorporates a set of significant features of human motion, such as velocity, total displacement from the initial position, and direction. As a distance measure, a variation of the Dynamic Time Warping (DTW) algorithm is applied, combined with a k-means-based method for fusing multiple distance matrices. The proposed method is fully unsupervised. Experimental evaluation has been performed on two artificial datasets, one of which is being made publicly available by the authors. The experiments on these datasets show that the proposed scheme achieves retrieval performance beyond the state of the art.

Keywords

Human action retrieval · 3D mesh sequences · Sequence descriptor · Dynamic Time Warping
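
The abstract names two ingredients: per-trajectory motion features (velocity, displacement from the initial position, direction) and a DTW-based distance. The sketch below illustrates both under stated assumptions. It uses the textbook DTW recurrence with a Euclidean local cost, not the authors' variation, and it omits the k-means-based fusion step. The function names `trajectory_features` and `dtw_distance`, the feature layout, and the `(T, 3)` input shape are hypothetical choices made for illustration only.

```python
import numpy as np


def trajectory_features(points):
    """Per-frame motion features for one salient-point trajectory.

    points: array-like of shape (T, 3), one 3D position per frame.
    Returns an array of shape (T - 1, 5) with columns:
    speed, displacement from the initial position, unit direction (x, y, z).
    The feature layout is an assumption, not taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    steps = np.diff(points, axis=0)                 # frame-to-frame motion vectors
    speed = np.linalg.norm(steps, axis=1)           # velocity magnitude per step
    displacement = np.linalg.norm(points[1:] - points[0], axis=1)
    # Unit direction of motion; steps with zero motion keep a zero vector.
    moving = speed[:, None] > 0
    direction = np.divide(steps, speed[:, None],
                          out=np.zeros_like(steps), where=moving)
    return np.column_stack([speed, displacement, direction])


def dtw_distance(a, b):
    """Classic DTW distance between two feature sequences.

    a: array of shape (n, d); b: array of shape (m, d).
    Local cost is the Euclidean distance between feature vectors.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[n, m]


if __name__ == "__main__":
    # Two toy centroid trajectories of different lengths (made-up data).
    walk_a = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]
    walk_b = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]]
    d = dtw_distance(trajectory_features(walk_a), trajectory_features(walk_b))
    print(f"DTW distance between toy trajectories: {d:.3f}")
```

In a retrieval setting, one such distance could be computed per salient point for every pair of sequences, giving several distance matrices. How the paper fuses those matrices is not detailed in the abstract, so that step is left out here.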

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Christos Veinidis (1), Email author
  • Ioannis Pratikakis (1)
  • Theoharis Theoharis (2, 3)
  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
  2. Computer Graphics Laboratory, Department of Informatics and Telecommunications, University of Athens, Athens, Greece
  3. IDI, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
