Shape Similarity for 3D Video Sequences of People

International Journal of Computer Vision

Abstract

This paper presents a performance evaluation of shape similarity metrics for 3D video sequences of people with unknown temporal correspondence. Performance of the similarity measures is compared by evaluating Receiver Operating Characteristic (ROC) curves for classification against ground-truth on a comprehensive database of synthetic 3D video sequences comprising animations of fourteen people performing twenty-eight motions. Four static shape similarity metrics (shape distribution, spin image, shape histogram and spherical harmonics) are evaluated using optimal parameter settings for each approach. Shape histograms with volume sampling are found to consistently give the best performance for different people and motions. Static shape similarity is then extended over time to eliminate the temporal ambiguity. Time-filtering of the static shape similarity and two novel shape-flow descriptors are evaluated against temporal ground-truth. This evaluation demonstrates that shape-flow with a multi-frame alignment of motion sequences achieves the best performance, is stable for different people and motions, and overcomes the ambiguity in static shape similarity. Time-filtering of the static shape histogram similarity measure with a fixed window size achieves marginally lower performance for linear motions, with the same computational cost as static shape descriptors. Performance of the temporal shape descriptors is validated on real 3D video sequences of nine actors performing a variety of movements. Time-filtered shape histograms are shown to reliably identify frames from 3D video sequences with similar shape and motion for people with loose clothing and complex motion.
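
The idea of time-filtering a static similarity measure with a fixed window can be illustrated with a minimal sketch. The abstract does not give the filter itself, so the code below assumes one plausible form: each entry of a frame-to-frame dissimilarity matrix is averaged with its neighbours along the diagonal, so that two frames only match if the frames around them also match. The function name `time_filter`, the parameter `half_window`, and the one-dimensional toy "pose" sequence are hypothetical and are not taken from the paper.

```python
import numpy as np


def time_filter(static_dissim: np.ndarray, half_window: int) -> np.ndarray:
    """Average a frame-to-frame dissimilarity matrix along its diagonals.

    filtered[i, j] is the mean of static_dissim[i + k, j + k] over
    k in [-half_window, half_window], using only offsets that stay
    inside the matrix (assumed boundary handling, not from the paper).
    """
    n_rows, n_cols = static_dissim.shape
    total = np.zeros((n_rows, n_cols), dtype=float)
    count = np.zeros((n_rows, n_cols), dtype=float)
    for k in range(-half_window, half_window + 1):
        # Output rows/columns for which (i + k, j + k) is a valid index.
        i0, i1 = max(0, -k), min(n_rows, n_rows - k)
        j0, j1 = max(0, -k), min(n_cols, n_cols - k)
        if i0 >= i1 or j0 >= j1:
            continue
        total[i0:i1, j0:j1] += static_dissim[i0 + k:i1 + k, j0 + k:j1 + k]
        count[i0:i1, j0:j1] += 1.0
    return total / count  # count >= 1 everywhere because k = 0 is always valid


# Toy stand-in for a shape descriptor: a scalar "pose" that rises and falls.
seq = np.array([0.0, 1.0, 2.0, 1.0, 0.0])

# Static dissimilarity between every pair of frames.
static = np.abs(seq[:, None] - seq[None, :])

# Frame 1 (moving up) and frame 3 (moving down) have the same pose,
# so the static measure cannot tell them apart.
filtered = time_filter(static, half_window=1)

print(static[1, 3], filtered[1, 3], filtered[1, 1])
```

In this toy case the static measure rates frames 1 and 3 as identical even though the motion direction differs, while the filtered value for that pair becomes non-zero and the filtered value for a frame compared with itself stays at zero. The filter is a fixed-size average over an already computed matrix, which is in keeping with the abstract's remark that time-filtering has the same computational cost as the static descriptors.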

Author information

Corresponding author

Correspondence to Peng Huang.

About this article

Cite this article

Huang, P., Hilton, A. & Starck, J. Shape Similarity for 3D Video Sequences of People. Int J Comput Vis 89, 362–381 (2010). https://doi.org/10.1007/s11263-010-0319-9

  • DOI: https://doi.org/10.1007/s11263-010-0319-9
