
Virtual try-on based on attention U-Net

Abstract

Image-based virtual try-on, which aims to fit new in-shop clothes onto a person image, has gained extensive attention in the computer vision and image processing communities. Most current virtual try-on methods are based on thin-plate spline transformation and a composition mask. Such methods are limited by the spatial feature retention of the network: they struggle to warp clothes into alignment with a new body when body shape and posture change substantially, and they have difficulty handling self-occlusion. We employ a two-stage approach, warping clothes in a clothes warping module (CWM) and generating try-on results in a cross-domain fusion module (CFM). To address the difficulty of warping clothes into alignment with a new body, we add a combined loss to the CWM, comprising a perceptual loss over the clothes parsing region and an L1 loss over the whole image. To handle self-occlusion, we first adopt an attention-based U-Net as the backbone of the CFM; we then improve the CFM framework to generate a composition mask, adjusted clothes, and a rendered person, and composite the final try-on result through mask operations. Experiments on the Zalando dataset demonstrate that our method warps clothes naturally with details preserved and produces photo-realistic try-on results without self-occlusion.
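
The abstract pins down two concrete ingredients: a combined warping loss (a perceptual term restricted to the clothes parsing region plus an L1 term over the whole image) and a final compositing step that blends warped clothes and a rendered person through a composition mask. The PyTorch sketch below is a minimal illustration of both under stated assumptions; the names (VGGPerceptual, cwm_loss, composite), the chosen VGG-19 layers, and the weight lam are hypothetical, since the abstract gives no implementation details, and the attention U-Net backbone of the CFM is not reproduced here.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class VGGPerceptual(nn.Module):
    """Perceptual loss over selected VGG-19 feature maps
    (Johnson-et-al.-style; the layer indices are an assumption)."""
    def __init__(self, layers=(3, 8, 17, 26)):
        super().__init__()
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        self.layers = set(layers)
        for p in self.vgg.parameters():
            p.requires_grad = False

    def forward(self, x, y):
        loss = 0.0
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + nn.functional.l1_loss(x, y)
        return loss

perceptual = VGGPerceptual()
l1 = nn.L1Loss()

def cwm_loss(warped, target, parse_mask, lam=1.0):
    """Combined CWM loss: a perceptual term over the clothes parsing
    region plus an L1 term over the whole image. `parse_mask` is a
    binary clothes-region mask broadcast over channels; `lam` is a
    hypothetical weighting (VGG input normalization omitted for brevity)."""
    return perceptual(warped * parse_mask, target * parse_mask) + lam * l1(warped, target)

def composite(mask, warped_clothes, rendered_person):
    """Blend the CFM outputs into the final try-on image via the
    composition mask M: result = M * clothes + (1 - M) * person."""
    return mask * warped_clothes + (1.0 - mask) * rendered_person
```

In training, cwm_loss would supervise the warped clothes against the ground-truth try-on image, and composite would assemble the three CFM outputs (mask, adjusted clothes, rendered person) into the final result.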

Author information

Corresponding author

Correspondence to Jin Huang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, X., Zhang, J., Huang, J. et al. Virtual try-on based on attention U-Net. Vis Comput 38, 3365–3376 (2022). https://doi.org/10.1007/s00371-022-02563-6
