VTNCT: an image-based virtual try-on network by combining feature with pixel transformation

  • Original article
  • The Visual Computer

Abstract

Image-based virtual try-on, which aims to transfer a target clothing item onto the corresponding region of a person, has attracted increasing research attention recently. However, most existing image-based virtual try-on methods fall short in generating and preserving details. To resolve these issues, we propose a novel virtual try-on network that generates photo-realistic try-on images while preserving the details of the clothes and of non-target regions. We introduce two key innovations. The first is a clothing warping module, which uses a warping strategy that combines feature transformation with pixel transformation to obtain warped clothes with realistic texture and robust alignment. The second is an arm generation module, an original module that is highly effective at handling occlusion and generating the details of the arm region. In addition, we use a distillation strategy to counter the degradation caused by incorrect parsing, which further demonstrates the effectiveness of our components. Extensive experiments on a public fashion dataset demonstrate that our system achieves state-of-the-art virtual try-on performance both qualitatively and quantitatively. The code is available at https://github.com/changyuan96/VTNCT.

Acknowledgements

This work is supported in part by the Science Foundation of Hubei under Grant No. 2014CFB764, by the Department of Education of the Hubei Province of China under Grant No. Q20131608, and by the Engineering Research Center of Hubei Province for Clothing Information.

Author information

Corresponding author

Correspondence to Tao Peng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chang, Y., Peng, T., Yu, F. et al. VTNCT: an image-based virtual try-on network by combining feature with pixel transformation. Vis Comput 39, 2583–2596 (2023). https://doi.org/10.1007/s00371-022-02480-8

  • DOI: https://doi.org/10.1007/s00371-022-02480-8
