SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12348)

Abstract

While models of 3D clothing learned from real data exist, no method can predict clothing deformation as a function of garment size. In this paper, we introduce SizerNet to predict 3D clothing conditioned on human body shape and garment size parameters, and ParserNet to infer garment meshes and shape under clothing with personal details in a single pass from an input mesh. SizerNet allows users to estimate and visualize the dressing effect of a garment in various sizes, and ParserNet allows direct editing of the clothing of an input mesh, removing the need for scan segmentation, which is a challenging problem in itself. To learn these models, we introduce the SIZER dataset of clothing size variation, which includes 100 different subjects wearing casual clothing items in various sizes, totaling approximately 2000 scans. The dataset includes the scans, registrations to the SMPL model, scans segmented into clothing parts, and garment category and size labels. Our experiments show better parsing accuracy and size prediction than baseline methods trained on SIZER. The code, model, and dataset will be released for research purposes at https://virtualhumans.mpi-inf.mpg.de/sizer/.
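
To make the conditioning described in the abstract concrete, the sketch below shows one plausible way a SizerNet-style module could be set up in PyTorch: garment vertices, SMPL shape parameters (beta), and a one-hot size label are concatenated and decoded into per-vertex displacements on the garment template. The class name, layer widths, vertex count, and residual-displacement formulation are illustrative assumptions for this page, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class SizerNetSketch(nn.Module):
    """Hypothetical encoder-decoder conditioned on body shape and garment size.

    Inputs (all batched):
      garment_verts: (B, V, 3) vertices of the source-size garment
      beta:          (B, 10)   SMPL body-shape coefficients
      size_onehot:   (B, S)    one-hot garment size label (e.g. S/M/L/XL)
    Output: (B, V, 3) predicted garment vertices in the target size.
    """

    def __init__(self, num_verts=4000, beta_dim=10, num_sizes=4, latent_dim=128):
        super().__init__()
        in_dim = num_verts * 3 + beta_dim + num_sizes
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_verts * 3),
        )

    def forward(self, garment_verts, beta, size_onehot):
        # Concatenate geometry with the shape and size conditioning signals.
        x = torch.cat([garment_verts.flatten(1), beta, size_onehot], dim=1)
        # Predict per-vertex displacements and apply them to the input garment.
        disp = self.decoder(self.encoder(x)).view_as(garment_verts)
        return garment_verts + disp
```

Trained against registered garment meshes such as those in SIZER, a module like this would typically be supervised with a per-vertex reconstruction loss; the actual SizerNet and ParserNet architectures and losses are described in the full paper.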

Acknowledgements

This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans) and a Facebook research award. We thank Tarun, Navami, and Yash for helping us with the data capture, and the RVH team members [4] for their meticulous feedback on this manuscript.

Supplementary material

504435_1_En_1_MOESM1_ESM.pdf (17 MB)
Supplementary material 1 (PDF, 17383 KB)

References

1. Agisoft Metashape. https://www.agisoft.com/
2.
3.
4. Real Virtual Humans group, Max Planck Institute for Informatics. https://virtualhumans.mpi-inf.mpg.de/people.html
5. Treedy's scanner. https://www.treedys.com
6. de Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H., Thrun, S.: Performance capture from sparse multi-view video. ACM Trans. Graph. 27(3), 98:1–98:10 (2008)
7. Alldieck, T., Magnor, M., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
8. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Detailed human avatars from monocular video. In: International Conference on 3D Vision (3DV) (2018)
9. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3D people models. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
10. Alldieck, T., Pons-Moll, G., Theobalt, C., Magnor, M.: Tex2Shape: detailed full human body geometry from a single image. In: IEEE International Conference on Computer Vision (ICCV). IEEE (2019)
11. Bălan, A.O., Black, M.J.: The naked truth: estimating body shape under clothing. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 15–29. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_2
12. Bertiche, H., Madadi, M., Escalera, S.: CLOTH3D: clothed 3D humans. CoRR abs/1912.02792 (2019)
13. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining implicit function learning and parametric models for 3D human reconstruction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) European Conference on Computer Vision (ECCV). LNCS, vol. 12347, pp. 311–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_19
14. Bhatnagar, B.L., Tiwari, G., Theobalt, C., Pons-Moll, G.: Multi-Garment Net: learning to dress 3D people from images. In: IEEE International Conference on Computer Vision (ICCV). IEEE (2019)
15. Bogo, F., Romero, J., Pons-Moll, G., Black, M.J.: Dynamic FAUST: registering human bodies in motion. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
16. Bradley, D., Popa, T., Sheffer, A., Heidrich, W., Boubekeur, T.: Markerless garment capture. ACM Trans. Graph. 27, 99 (2008)
17. Chen, X., et al.: Towards 3D human shape recovery under clothing. CoRR abs/1904.02601 (2019)
18. Dong, H., Liang, X., Wang, B., Lai, H., Zhu, J., Yin, J.: Towards multi-pose guided virtual try-on network. In: International Conference on Computer Vision (ICCV) (2019)
19. Dong, H., et al.: Fashion editing with adversarial parsing learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
20. Gong, K., Liang, X., Li, Y., Chen, Y., Yang, M., Lin, L.: Instance-level human parsing via part grouping network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 805–822. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_47
21. Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M.J.: DRAPE: DRessing Any PErson. ACM Trans. Graph. (Proc. SIGGRAPH) 31(4), 35:1–35:10 (2012)
22. Gundogdu, E., Constantin, V., Seifoddini, A., Dang, M., Salzmann, M., Fua, P.: GarNet: a two-stream network for fast and accurate 3D cloth draping. In: IEEE International Conference on Computer Vision (ICCV). IEEE (2019)
23. Habermann, M., Xu, W., Zollhoefer, M., Pons-Moll, G., Theobalt, C.: LiveCap: real-time human performance capture from monocular video (2019)
24. Habermann, M., Xu, W., Zollhoefer, M., Pons-Moll, G., Theobalt, C.: DeepCap: monocular human performance capture using weak supervision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2020)
25. Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: ARCH: animatable reconstruction of clothed humans. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3093–3102 (2020)
26. Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: BCNet: learning body and cloth shape from a single image. arXiv preprint arXiv:2004.00214 (2020)
27. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
28. Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: International Conference on Computer Vision (ICCV) (2019)
29. Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
30. Lähner, Z., Cremers, D., Tung, T.: DeepWrinkles: accurate and realistic clothing modeling. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 698–715. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_41
31. Lazova, V., Insafutdinov, E., Pons-Moll, G.: 360-degree textures of people in clothing from a single image. In: International Conference on 3D Vision (3DV) (2019)
32. Leroy, V., Franco, J., Boyer, E.: Multi-view dynamic shape refinement using local temporal integration. In: IEEE International Conference on Computer Vision (ICCV), Venice, Italy, pp. 3113–3122 (2017)
33. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 34(6), 248:1–248:16 (2015)
34. Ma, Q., et al.: Learning to dress 3D people in generative clothing. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2020)
35. Miguel, E., et al.: Data-driven estimation of cloth simulation models. Comput. Graph. Forum 31(2), 519–528 (2012)
36. Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., Schiele, B.: Neural body fitting: unifying deep learning and model based human pose and shape estimation. In: International Conference on 3D Vision (3DV) (2018)
37. Patel, C., Liao, Z., Pons-Moll, G.: The virtual tailor: predicting clothing in 3D as a function of human pose, shape and garment style. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2020)
38. Pons-Moll, G., Pujades, S., Hu, S., Black, M.: ClothCap: seamless 4D clothing capture and retargeting. ACM Trans. Graph. 36(4), 1–15 (2017)
39. Pons-Moll, G., Romero, J., Mahmood, N., Black, M.J.: Dyna: a model of dynamic human shape in motion. ACM Trans. Graph. 34, 120 (2015)
40. Pumarola, A., Sanchez, J., Choi, G., Sanfeliu, A., Moreno-Noguer, F.: 3DPeople: modeling the geometry of dressed humans. In: International Conference on Computer Vision (ICCV) (2019)
41. Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23, 309–314 (2004)
42. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In: IEEE International Conference on Computer Vision (ICCV), pp. 2304–2314 (2019)
43. Santesteban, I., Otaduy, M.A., Casas, D.: Learning-based animation of clothing for virtual try-on. Comput. Graph. Forum (Proc. Eurographics) 38, 355–366 (2019)
44. Starck, J., Hilton, A.: Surface capture for performance-based animation. IEEE Comput. Graph. Appl. 27(3), 21–31 (2007)
45. Stuyck, T.: Cloth Simulation for Computer Graphics. Synthesis Lectures on Visual Computing. Morgan & Claypool Publishers, San Rafael (2018)
46. Tao, Y., et al.: SimulCap: single-view human performance capture with cloth simulation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
47. Tung, T., Nobuhara, S., Matsuyama, T.: Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo. In: IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, pp. 1709–1716 (2009)
48. Wang, H., Hecht, F., Ramamoorthi, R., O'Brien, J.F.: Example-based wrinkle synthesis for clothing animation. ACM Trans. Graph. (Proc. SIGGRAPH) 29(4), 107:1–107:8 (2010)
49. Wang, H., Ramamoorthi, R., O'Brien, J.F.: Data-driven elastic models for cloth: modeling and measurement. ACM Trans. Graph. (Proc. SIGGRAPH) 30(4), 71:1–71:11 (2011)
50. Wang, T.Y., Ceylan, D., Popovic, J., Mitra, N.J.: Learning a shared shape space for multimodal garment design. ACM Trans. Graph. 37(6), 1:1–1:14 (2018)
51. White, R., Crane, K., Forsyth, D.A.: Capturing and animating occluded cloth. ACM Trans. Graph. 26(3), 34 (2007)
52. Xiang, D., Joo, H., Sheikh, Y.: Monocular total capture: posing face, body, and hands in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10965–10974 (2019)
53. Xu, H., Li, J., Lu, G., Zhang, D., Long, J.: Predicting ready-made garment dressing fit for individuals based on highly reliable examples. Comput. Graph. 90, 135–144 (2020)
54. Xu, Y., Zhu, S.C., Tung, T.: DenseRaC: joint 3D pose and shape estimation by dense render and compare. In: International Conference on Computer Vision (ICCV) (2019)
55. Yamaguchi, K.: Parsing clothing in fashion photographs. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3570–3577. IEEE Computer Society (2012)
56. Yamaguchi, K., Kiapour, M.H., Berg, T.L.: Paper doll parsing: retrieving similar styles to parse clothing items. In: IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, pp. 3519–3526. IEEE Computer Society (2013)
57. Yang, J., Franco, J.-S., Hétroy-Wheeler, F., Wuhrer, S.: Analyzing clothing layer deformation statistics of 3D human motions. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 245–261. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_15
58. Yang, W., Luo, P., Lin, L.: Clothing co-parsing by joint image segmentation and labeling (2014)
59. Yu, T., et al.: DoubleFusion: real-time capture of human performances with inner body shapes from a single depth sensor. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2018)
60. Zhang, C., Pujades, S., Black, M., Pons-Moll, G.: Detailed, accurate, human shape estimation from clothed 3D scan sequences. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
61. Zheng, Z., Yu, T., Wei, Y., Dai, Q., Liu, Y.: DeepHuman: 3D human reconstruction from a single image. In: IEEE International Conference on Computer Vision (ICCV) (2019)
62. Zhu, H., et al.: Deep Fashion3D: a dataset and benchmark for 3D garment reconstruction from single images. arXiv preprint arXiv:2003.12753 (2020)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

1. MPI for Informatics, Saarbrücken, Germany
2. Facebook Reality Labs, Sausalito, USA