Abstract
This paper presents a performance evaluation of shape similarity metrics for 3D video sequences of people with unknown temporal correspondence. The performance of the similarity measures is compared by evaluating Receiver Operator Characteristics (ROC) for classification against ground-truth on a comprehensive database of synthetic 3D video sequences comprising animations of fourteen people performing twenty-eight motions. The static shape similarity metrics (shape distribution, spin image, shape histogram and spherical harmonics) are evaluated using optimal parameter settings for each approach. Shape histograms with volume sampling are found to consistently give the best performance across different people and motions. Static shape similarity is then extended over time to eliminate temporal ambiguity. Time-filtering of the static shape similarity and two novel shape-flow descriptors are evaluated against temporal ground-truth. This evaluation demonstrates that shape-flow with a multi-frame alignment of motion sequences achieves the best performance, is stable for different people and motions, and overcomes the ambiguity in static shape similarity. Time-filtering of the static shape histogram similarity measure with a fixed window size achieves marginally lower performance for linear motions, at the same computational cost as static shape descriptors. Performance of the temporal shape descriptors is validated on real 3D video sequences of nine actors performing a variety of movements. Time-filtered shape histograms are shown to reliably identify frames from 3D video sequences with similar shape and motion for people with loose clothing and complex motion.
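To make the idea of time-filtering a static similarity measure more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each frame is summarised by a simplified, radial-only histogram of sampled points (the shape histograms evaluated in the paper are richer), that frames are compared with an L2 distance, and that time-filtering averages the frame-to-frame dissimilarity matrix along its diagonals over a fixed window of 2k+1 frames. The function names `radial_histogram`, `dissimilarity_matrix` and `time_filter`, and the parameters `bins`, `r_max` and `k`, are hypothetical.

```python
import numpy as np


def radial_histogram(points, bins=8, r_max=1.0):
    """Simplified shape histogram: normalised counts of point distances
    from the centroid (assumption: radial bins only, no angular bins)."""
    pts = np.asarray(points, dtype=float)
    radii = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    hist, _ = np.histogram(radii, bins=bins, range=(0.0, r_max))
    return hist / max(hist.sum(), 1)


def dissimilarity_matrix(hists_a, hists_b):
    """L2 distance between every frame of sequence A and every frame of B."""
    a = np.asarray(hists_a, dtype=float)
    b = np.asarray(hists_b, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def time_filter(S, k):
    """Average S along its diagonals over a window of 2k+1 frames.

    Entry (i, j) becomes the mean of S[i+t, j+t] for t in [-k, k],
    using only the offsets that stay inside the matrix.
    """
    S = np.asarray(S, dtype=float)
    n, m = S.shape
    total = np.zeros_like(S)
    count = np.zeros_like(S)
    for t in range(-k, k + 1):
        i0, i1 = max(0, -t), min(n, n - t)
        j0, j1 = max(0, -t), min(m, m - t)
        total[i0:i1, j0:j1] += S[i0 + t:i1 + t, j0 + t:j1 + t]
        count[i0:i1, j0:j1] += 1
    return total / count


# Toy example: sequence B is sequence A played backwards, so every frame
# of A has a statically identical frame in B even though the motion differs.
forward = np.eye(5)            # five mutually distinct "histograms"
backward = forward[::-1]
static = dissimilarity_matrix(forward, backward)
filtered = time_filter(static, k=1)
```

In this toy example the static matrix contains exact zeros on its anti-diagonal, which is the temporal ambiguity described above: a pose matches regardless of the direction of motion. After filtering, those entries become non-zero because the neighbouring frames along the diagonal disagree. The window size trades temporal discrimination against smoothing and would need tuning for real data.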
Cite this article
Huang, P., Hilton, A. & Starck, J. Shape Similarity for 3D Video Sequences of People. Int J Comput Vis 89, 362–381 (2010). https://doi.org/10.1007/s11263-010-0319-9
DOI: https://doi.org/10.1007/s11263-010-0319-9