
Virtual try-on based on attention U-Net

Abstract

Image-based virtual try-on, which aims to fit new in-shop clothes onto a person image, has gained extensive attention in the computer vision and image processing communities. Most current virtual try-on methods are based on thin-plate spline transformation and a composition mask. Such methods are limited by the spatial feature retention of the network: they struggle to warp clothes into alignment with a new body when body shape and posture change substantially, and they have difficulty handling self-occlusion. We employ a two-stage approach, warping clothes in a clothes warping module (CWM) and generating try-on results in a cross-domain fusion module (CFM). To address the difficulty of warping clothes into alignment with a new body, we add a combined loss to the CWM, comprising a perceptual loss over the clothes parsing region and an L1 loss over the whole image. To handle self-occlusion, we first adopt an attention-based U-Net as the backbone of the CFM; we then improve the CFM framework to generate a composition mask, adjusted clothes, and a rendered person, and composite the final try-on result through mask operations. Experiments on the Zalando dataset demonstrate that our method warps clothes naturally with details preserved and produces photo-realistic try-on results without self-occlusion.
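
The abstract pins down two concrete ingredients: a combined warping loss (a perceptual term restricted to the clothes parsing region plus an L1 term over the whole image) and a final compositing step that blends warped clothes and a rendered person through a composition mask. The PyTorch sketch below is a minimal illustration of both under stated assumptions; the names (VGGPerceptual, cwm_loss, composite), the chosen VGG-19 layers, and the weight lam are hypothetical, since the abstract gives no implementation details, and the attention U-Net backbone of the CFM is not reproduced here.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class VGGPerceptual(nn.Module):
    """Perceptual loss over selected VGG-19 feature maps
    (Johnson-et-al.-style; the layer indices are an assumption)."""
    def __init__(self, layers=(3, 8, 17, 26)):
        super().__init__()
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        self.layers = set(layers)
        for p in self.vgg.parameters():
            p.requires_grad = False

    def forward(self, x, y):
        loss = 0.0
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + nn.functional.l1_loss(x, y)
        return loss

perceptual = VGGPerceptual()
l1 = nn.L1Loss()

def cwm_loss(warped, target, parse_mask, lam=1.0):
    """Combined CWM loss: a perceptual term over the clothes parsing
    region plus an L1 term over the whole image. `parse_mask` is a
    binary clothes-region mask broadcast over channels; `lam` is a
    hypothetical weighting (VGG input normalization omitted for brevity)."""
    return perceptual(warped * parse_mask, target * parse_mask) + lam * l1(warped, target)

def composite(mask, warped_clothes, rendered_person):
    """Blend the CFM outputs into the final try-on image via the
    composition mask M: result = M * clothes + (1 - M) * person."""
    return mask * warped_clothes + (1.0 - mask) * rendered_person
```

In training, cwm_loss would supervise the warped clothes against the ground-truth try-on image, and composite would assemble the three CFM outputs (mask, adjusted clothes, rendered person) into the final result.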

Author information

Corresponding author

Correspondence to Jin Huang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, X., Zhang, J., Huang, J. et al. Virtual try-on based on attention U-Net. Vis Comput 38, 3365–3376 (2022). https://doi.org/10.1007/s00371-022-02563-6
