
Cloth texture preserving image-based 3D virtual try-on

  • Original article
  • Journal: The Visual Computer

Abstract

3D virtual try-on from a single image can provide an excellent shopping experience for Internet users and has enormous commercial potential. Existing methods reconstruct a clothed 3D human body from a virtual try-on image by extracting depth information from the input. However, the results are often unstable: high-frequency detail features over larger spatial contexts are lost during the downsampling performed for depth prediction, and the generator's gradients vanish when predicting occluded areas of high-resolution images. To address these problems, we propose a multi-resolution parallel approach that captures low-frequency information while retaining as many high-frequency depth features as possible during depth prediction; in addition, we use a multi-scale generator and discriminator to infer the feature maps of occluded regions more accurately and thereby generate a fine-grained clothed 3D human body. Our method not only adds better detail to the final 3D mannequin for virtual fitting but also improves the user's try-on experience significantly over previous studies, as evidenced by higher quantitative and qualitative evaluations.
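The two mechanisms the abstract names, a multi-resolution parallel depth backbone and a multi-scale adversarial setup, follow established designs (in the spirit of HRNet and pix2pixHD, respectively). The PyTorch sketch below illustrates both under those assumptions; it is not the authors' implementation, and every module name (ParallelResolutionBlock, MultiScaleDiscriminator), channel count, and the RGB-plus-depth input are hypothetical.

# Minimal sketch, not the paper's code: (1) a parallel full- and
# half-resolution stream with bidirectional fusion, so high-frequency
# detail is never fully downsampled away during depth prediction, and
# (2) patch discriminators applied at several image scales, which keeps
# gradients informative on high-resolution occluded regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelResolutionBlock(nn.Module):
    """Two parallel streams (full and half resolution) fused in both
    directions at every block."""
    def __init__(self, channels: int):
        super().__init__()
        self.high = nn.Conv2d(channels, channels, 3, padding=1)
        self.low = nn.Conv2d(channels, channels, 3, padding=1)
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)

    def forward(self, x_high, x_low):
        h = F.relu(self.high(x_high))
        l = F.relu(self.low(x_low))
        # Upsample low-res features into the high-res stream and
        # downsample high-res features into the low-res stream.
        l_up = F.interpolate(l, size=h.shape[-2:], mode="bilinear",
                             align_corners=False)
        return h + l_up, l + self.down(h)

class MultiScaleDiscriminator(nn.Module):
    """Identical patch discriminators run on a pyramid of input scales;
    in_ch = 4 assumes an RGB image concatenated with a depth map."""
    def __init__(self, in_ch: int = 4, num_scales: int = 3):
        super().__init__()
        def patch_d():
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(128, 1, 4, padding=1),  # patch-level logits
            )
        self.discs = nn.ModuleList([patch_d() for _ in range(num_scales)])

    def forward(self, x):
        outs = []
        for d in self.discs:
            outs.append(d(x))
            x = F.avg_pool2d(x, 3, stride=2, padding=1)  # next scale down
        return outs

if __name__ == "__main__":
    # Smoke test with dummy tensors.
    block = ParallelResolutionBlock(32)
    h, l = block(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
    logits = MultiScaleDiscriminator()(torch.randn(1, 4, 256, 256))
    print(h.shape, l.shape, [o.shape for o in logits])

The design point in both pieces is the same: the full-resolution signal is never discarded. The high-resolution stream survives every block via cross-stream fusion, and the discriminator sees the image at several scales simultaneously, so the adversarial loss still provides usable gradients for occluded, high-resolution regions.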


Data Availability

The data that support the findings of this study are available from the corresponding author, J.H., upon reasonable request.


Author information


Contributions

All authors (XH, CZ, JH, RL, JL, TP) contributed to this work and disclosed no relevant relationships.

Corresponding author

Correspondence to Junjie Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, X., Zheng, C., Huang, J. et al. Cloth texture preserving image-based 3D virtual try-on. Vis Comput 39, 3347–3357 (2023). https://doi.org/10.1007/s00371-023-02999-4

