Shape from Selfies: Human Body Shape Estimation Using CCA Regression Forests

  • Endri Dibra
  • Cengiz Öztireli
  • Remo Ziegler
  • Markus Gross
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9908)


In this work, we revisit the problem of human body shape estimation from monocular imagery. Starting from a statistical human shape model that describes a body shape with shape parameters, we describe a novel approach to automatically estimate these parameters from a single input shape silhouette using semi-supervised learning. By utilizing silhouette features that encode local and global properties robust to noise, pose and view changes, and projecting them to lower-dimensional spaces obtained through multi-view learning with canonical correlation analysis, we show how regression forests can be used to compute an accurate mapping from the silhouette to the shape parameter space. This results in a very fast, robust and automatic system under mild self-occlusion assumptions. We extensively evaluate our method on thousands of synthetic and real data samples and compare it to state-of-the-art approaches that operate under more restrictive assumptions.
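The projection step mentioned in the abstract can be illustrated with a minimal sketch of linear canonical correlation analysis (CCA) between two feature views. This is not the authors' implementation: the synthetic data, the feature dimensions, the regularization constant and the names `fit_cca`, `view_a` and `view_b` are all assumptions made for illustration, and the abstract does not specify which views or features are paired.

```python
import numpy as np


def fit_cca(X, Y, n_components, reg=1e-6):
    """Linear CCA via Cholesky whitening and an SVD (illustrative sketch)."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whitening factors: Cxx = Lx Lx^T and Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mean_x, mean_y, Wx, Wy, s[:n_components]


# Synthetic stand-in data (assumption): two noisy views of the same latent factors.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
view_a = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(n, 6))
view_b = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(n, 5))

mean_a, mean_b, W_a, W_b, corrs = fit_cca(view_a, view_b, n_components=2)

# Low-dimensional projections of each view onto the correlated subspace.
Zx = (view_a - mean_a) @ W_a
Zy = (view_b - mean_b) @ W_b
print("canonical correlations:", corrs)
```

In a pipeline of the kind the abstract describes, projections such as `Zx` would then serve as inputs to a regression forest trained against the shape parameters (for example a multi-output random forest regressor from a library such as scikit-learn). That step is omitted here.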


Body Shape, Canonical Correlation Analysis, Shape Estimation, Kernel Canonical Correlation Analysis, Template Mesh
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was funded by the KTI-grant 15599.1.




Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Endri Dibra (1)
  • Cengiz Öztireli (1)
  • Remo Ziegler (2)
  • Markus Gross (1)
  1. Department of Computer Science, ETH Zürich, Zürich, Switzerland
  2. Vizrt, Zürich, Switzerland
