International Journal of Computer Vision, Volume 89, Issue 2–3, pp 362–381

Shape Similarity for 3D Video Sequences of People

  • Peng Huang
  • Adrian Hilton
  • Jonathan Starck


This paper presents a performance evaluation of shape similarity metrics for 3D video sequences of people with unknown temporal correspondence. The performance of similarity measures is compared by evaluating Receiver Operator Characteristics for classification against ground truth on a comprehensive database of synthetic 3D video sequences comprising animations of fourteen people performing twenty-eight motions. Four static shape similarity metrics (shape distribution, spin image, shape histogram and spherical harmonics) are evaluated using optimal parameter settings for each approach. Shape histograms with volume sampling are found to consistently give the best performance across different people and motions. Static shape similarity is then extended over time to eliminate temporal ambiguity. Time-filtering of the static shape similarity and two novel shape-flow descriptors are evaluated against temporal ground truth. This evaluation demonstrates that shape-flow with a multi-frame alignment of motion sequences achieves the best performance, is stable for different people and motions, and overcomes the ambiguity in static shape similarity. Time-filtering of the static shape histogram similarity measure with a fixed window size achieves marginally lower performance for linear motions, with the same computational cost as static shape descriptors. Performance of the temporal shape descriptors is validated on real 3D video sequences of nine actors performing a variety of movements. Time-filtered shape histograms are shown to reliably identify frames from 3D video sequences with similar shape and motion for people with loose clothing and complex motion.
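The idea of time-filtering a static similarity measure with a fixed window can be illustrated with a minimal sketch. It assumes one plausible form of the filter, averaging a frame-to-frame dissimilarity matrix along its diagonals; the abstract does not specify the exact filter. The function names, the scalar "pose" values and the absolute-difference dissimilarity are all hypothetical stand-ins for a real shape descriptor such as a shape histogram.

```python
def static_dissimilarity(seq_a, seq_b):
    """Toy static measure: absolute difference between scalar 'poses'.

    A real system would compare shape descriptors of two 3D frames here;
    the scalar pose is an assumption made purely for illustration.
    """
    return [[abs(a - b) for b in seq_b] for a in seq_a]


def time_filter(dissimilarity, half_window):
    """Average a frame-to-frame dissimilarity matrix along its diagonals.

    The filtered value at (i, j) is the mean of dissimilarity[i + k][j + k]
    for k in [-half_window, half_window], skipping offsets that fall
    outside the matrix. The window size is fixed by the caller.
    """
    rows = len(dissimilarity)
    cols = len(dissimilarity[0]) if rows else 0
    filtered = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            for k in range(-half_window, half_window + 1):
                a, b = i + k, j + k
                if 0 <= a < rows and 0 <= b < cols:
                    total += dissimilarity[a][b]
                    count += 1
            # count is at least 1 because k = 0 is always inside the matrix.
            filtered[i][j] = total / count
    return filtered


# Same poses played forwards and backwards: the middle frames match
# statically, but the surrounding motion differs.
forward = [0.0, 1.0, 2.0, 3.0, 4.0]
backward = forward[::-1]
static = static_dissimilarity(forward, backward)
filtered = time_filter(static, half_window=1)
```

In this toy case the middle frames of the two sequences have identical pose, so the static measure cannot tell them apart, while the filtered measure is non-zero because the neighbouring frames move in opposite directions. This is the kind of temporal ambiguity the abstract refers to.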


Temporal shape similarity, 3D video, Surface motion capture, Human motion





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK
