
Context-Aware Enhanced Virtual Try-On Network with fabric adaptive registration

  • Research
  • Published in: The Visual Computer

Abstract

Image-based virtual try-on technology provides a better shopping experience for online customers and holds immense commercial value. However, existing methods struggle to align garments accurately and to preserve garment texture details under challenging body poses or complex target clothing. Moreover, they cannot adaptively fuse and generate content for different body parts in a refined manner, so high-quality body-part details are neither generated nor retained, limiting the quality of the try-on results. To address these issues, we propose a novel virtual try-on network named the Context-Aware Enhanced Virtual Try-On Network (CAE-VTON). The key ideas of our method are as follows: (1) We introduce a Multi-Scale Neighborhood Consensus Warp Module (MNCWM) whose match-filtering capability is sensitive to small semantic differences, producing highly accurate garment alignment and, in turn, natural try-on results. (2) We propose a fabric deformation energy smoothness loss that constrains local deformations of the clothing and thereby preserves complex garment details. (3) We design a Body Reconstruction Module (BRM) that adaptively generates and retains exposed skin areas of the body. (4) We introduce a novel try-on generation module, the Context-Adaptive Awareness-Enhanced Try-on Module (CAAETM), which integrates all components and uses a target semantic label map to adaptively generate the final try-on result for each body part. On the VITON-HD and VITON datasets, our method achieves state-of-the-art performance in both qualitative and quantitative evaluations.
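The abstract leaves the exact form of the fabric deformation energy smoothness loss in (2) unspecified. One standard way to penalize non-smooth local deformation of a 2D warp field f is the thin-plate bending energy; the formulation below is a plausible assumed form for such a term, not the authors' stated definition:

$$
E_{\text{smooth}}(f) = \iint_{\Omega} \left(\frac{\partial^2 f}{\partial x^2}\right)^{2} + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^{2} + \left(\frac{\partial^2 f}{\partial y^2}\right)^{2} \, dx\, dy
$$

In practice such an energy is usually discretized with finite differences over a dense per-pixel offset field. The PyTorch sketch below is a minimal illustration under that assumption; the function name, tensor layout, and equal weighting of the three terms are choices made here for clarity, not the paper's implementation.

```python
import torch

def warp_smoothness_loss(flow: torch.Tensor) -> torch.Tensor:
    """Bending-energy-style smoothness penalty on a dense warp field.

    flow: (B, 2, H, W) per-pixel displacement field (x/y offsets).
    Returns a scalar that grows with local non-smooth deformation,
    discouraging warps that tear or crumple garment texture.
    """
    # Second derivatives along width (f_xx) and height (f_yy),
    # approximated by central finite differences.
    f_xx = flow[:, :, :, 2:] - 2 * flow[:, :, :, 1:-1] + flow[:, :, :, :-2]
    f_yy = flow[:, :, 2:, :] - 2 * flow[:, :, 1:-1, :] + flow[:, :, :-2, :]
    # Mixed derivative (f_xy) from first differences along both axes.
    f_xy = (flow[:, :, 1:, 1:] - flow[:, :, 1:, :-1]
            - flow[:, :, :-1, 1:] + flow[:, :, :-1, :-1])
    return f_xx.pow(2).mean() + 2 * f_xy.pow(2).mean() + f_yy.pow(2).mean()

# Example: penalize a random 256x192 warp for a batch of 4 images.
loss = warp_smoothness_loss(torch.randn(4, 2, 256, 192))
```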



Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Funding

This work was supported by the National Natural Science Foundation of China under Grants 92270117, 62127809, and 61973248; by the Natural Science Basic Research Program of Shaanxi under Grant 2024JC-YBQN-0697; and by the Doctoral Scientific Research Startup Foundation of Xi’an University of Technology under Grant 103-451123015.

Author information


Corresponding author

Correspondence to Han Liu.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tong, S., Liu, H., Guo, R. et al. Context-Aware Enhanced Virtual Try-On Network with fabric adaptive registration. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03432-0


