Abstract
Image-based virtual try-on systems have significant commercial value in online garment shopping. However, prior methods handle details poorly and so fail to preserve the original appearance of organizational items such as the arms, the neck, and in-shop garments. We propose a novel high-fidelity virtual try-on network that generates realistic results. Specifically, a distributed pipeline generates the organizational items simultaneously. First, the in-shop garment is warped using thin plate splines (TPS) to give a coarse shape reference, and a corresponding target semantic map is then generated, which adapts to the changes in item distribution caused by different garments. Second, the organizational items are componentized separately using our novel semantic map-based image adjustment network (SMIAN) to avoid interference between body parts. Finally, SMIAN integrates all components to generate the overall result. A priori dual-modal information is incorporated in the tail layers of SMIAN to improve the convergence rate of the network. Experiments demonstrate that the proposed method retains the details of the condition information better than current methods. Our method achieves convincing quantitative and qualitative results on existing benchmark datasets.
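The abstract names thin plate splines (TPS) as the warping model but gives no implementation details. As a rough illustration of what a 2D TPS warp computes, the sketch below fits a spline that maps a set of source control points exactly onto target control points and then evaluates it at arbitrary coordinates. It is a minimal sketch under stated assumptions, not the authors' warping module: the function names `fit_tps` and `apply_tps` are hypothetical, the control points are supplied by hand rather than predicted by a network, and the standard kernel U(r) = r^2 log r is assumed.

```python
import numpy as np


def _tps_kernel(r2):
    """Radial kernel U(r) = r^2 * log(r), written in terms of r^2; U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst):
    """Solve for TPS parameters that map each src control point onto dst.

    src, dst: arrays of shape (n, 2). Returns an array of shape (n + 3, 2):
    n radial weights followed by 3 affine coefficients, one column per
    output coordinate.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    # Pairwise squared distances between control points.
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = _tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])  # affine basis [1, x, y]
    # Block system [[K, P], [P^T, 0]] [w; a] = [dst; 0].
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)


def apply_tps(params, src, pts):
    """Evaluate the fitted spline at query points pts of shape (m, 2)."""
    src = np.asarray(src, dtype=float)
    pts = np.asarray(pts, dtype=float)
    n = src.shape[0]
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = _tps_kernel(d2)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:n] + P @ params[n:]
```

To warp an image with such a spline, one would typically evaluate `apply_tps` on a grid of pixel coordinates and resample the garment image at the returned locations. In a learned try-on pipeline the control point offsets would come from a regression network rather than being chosen by hand.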
Acknowledgements
This manuscript is an extended version of our previous work which appeared at the IEEE International Conference on Tools with Artificial Intelligence (C. Du et al. VTON-HF: High fidelity virtual try-on network via semantic adaptation. ICTAI 2021, 224–231, doi: 10.1109/ICTAI52525.2021.00038). We declare that we submit this manuscript to Computational Visual Media with permission.
We would like to thank the anonymous reviewers for their constructive comments. The findings and observations in this paper are those of the authors and do not necessarily reflect the views of the supporters.
Funding
This work was supported by the Young Talents Programme of the Scientific Research Program of Hubei Education Department (Project No. Q20201709), Research on the Key Technology of Flexible Intelligent Manufacturing of Clothing based on Digital Twin of the Hubei Key Research and Development Program (Project No. 2021BAA042), and the Open Topic of the Engineering Research Center of Hubei Province for Clothing Information (Project No. 900204).
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Chenghu Du is currently a master's student in the School of Computer Science and Artificial Intelligence, Wuhan Textile University, where he received his B.S. degree in computer science and technology in 2019. His research interests include image processing and computer vision.
Feng Yu is currently a lecturer with the School of Computer Science and Artificial Intelligence, Wuhan Textile University. He received his Ph.D. degree from the School of Computer Science and Technology, Huazhong University of Science and Technology. His research interests include machine vision algorithms, artificial intelligence applications, and clothing intelligent manufacturing.
Minghua Jiang is currently the vice-chancellor of Wuhan Textile University, where he is also a professor with the School of Computer Science and Artificial Intelligence. He received his Ph.D. degree from the School of Computer Science and Technology, Huazhong University of Science and Technology. His research interests include computer system architecture, artificial intelligence applications, and clothing intelligent manufacturing.
Ailing Hua is currently a master's student in the School of Computer Science and Artificial Intelligence, Wuhan Textile University, where she received her B.S. degree in software engineering in 2019. Her research interests include image processing and machine learning.
Yaxin Zhao is currently a master's student in the School of Computer Science and Artificial Intelligence, Wuhan Textile University, where she received her B.S. degree in information management and information systems in 2019. Her research interests include deep learning and image processing.
Xiong Wei received his Ph.D. degree in computer architecture in 2011 and carried out postdoctoral research in computer architecture in 2011 at the National University of Defense Technology. He is currently an associate professor and a vice dean in the School of Computer Science and Artificial Intelligence at Wuhan Textile University. His research interests include storage architecture, GPUs, and parallel algorithms.
Tao Peng received his M.Sc. and Ph.D. degrees in computer science from Huazhong University of Science and Technology in 2006 and 2011, respectively. He is currently an associate professor in the School of Computer Science and Artificial Intelligence, Wuhan Textile University. His research interests include data mining, pattern recognition, and network security.
Xinron Hu earned her Ph.D. degree at the Institute for Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology in 2008. She is now a professor and dean in the School of Computer Science and Artificial Intelligence, Wuhan Textile University. Her research interests include image processing, virtual reality technology, and computer vision.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Du, C., Yu, F., Jiang, M. et al. High fidelity virtual try-on network via semantic adaptation and distributed componentization. Comp. Visual Media 8, 649–663 (2022). https://doi.org/10.1007/s41095-021-0264-2
DOI: https://doi.org/10.1007/s41095-021-0264-2