SP-VITON: shape-preserving image-based virtual try-on network


Image-based virtual try-on networks, which replace the outfit of a person in an image with the desired clothes from another image, have attracted increasing research interest. Previous work tries to extract a clothing-agnostic person representation from the original person image and then combines it with the given clothes image through a try-on network. However, the body shape representation in these methods simply downsamples the clothed body segmentation to a low resolution; it is too coarse, still carries noise from the original clothes, and may produce unrealistic artifacts. To address this, we propose SP-VITON (Shape-Preserving VIrtual Try-On Network), which keeps the user’s original body shape while removing the original clothes. First, we augment the shape variety of the dataset and estimate the person’s 2D shape under the clothes using DensePose. Then, a try-on network is trained with the augmented dataset and the new shape representation. Experimental results show that, compared with state-of-the-art image-based try-on methods, our method improves results across various body shapes and clothing types of the input person image.
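To make the limitation of the downsampled shape representation concrete, the following is a minimal sketch of the coarse representation that the abstract criticizes, not the SP-VITON method itself. The function name `coarse_shape`, the 256x192 mask size, and the downsampling factor of 16 are assumptions chosen for illustration; a real pipeline would typically use an image library for resizing.

```python
from typing import List

Mask = List[List[float]]


def coarse_shape(mask: Mask, factor: int = 16) -> Mask:
    """Block-average a binary body mask, then upsample it back (hypothetical helper)."""
    h, w = len(mask), len(mask[0])
    if h % factor or w % factor:
        raise ValueError("mask height and width must be multiples of factor")
    # Downsample: average every factor x factor block into one low-resolution cell.
    low = [
        [
            sum(mask[y][x]
                for y in range(by, by + factor)
                for x in range(bx, bx + factor)) / factor ** 2
            for bx in range(0, w, factor)
        ]
        for by in range(0, h, factor)
    ]
    # Upsample: nearest-neighbour copy of each cell back to the original size.
    return [[low[y // factor][x // factor] for x in range(w)] for y in range(h)]


# Toy clothed-body segmentation (assumed 256 rows x 192 columns).
mask = [[0.0] * 192 for _ in range(256)]
for y in range(32, 224):        # torso and legs
    for x in range(64, 128):
        mask[y][x] = 1.0
for y in range(128, 176):       # loose clothing that widens the silhouette
    for x in range(40, 152):
        mask[y][x] = 1.0

coarse = coarse_shape(mask)
```

In this toy case the cells around the widened region keep non-zero values, so the outline of the original clothes still leaks into the "clothing-agnostic" representation, only at lower detail. That is the kind of residual noise the abstract refers to.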






This work was supported in part by the National Natural Science Foundation of China (61902277, 61772359, 61872267, 61702471), the 2019 Tianjin New Generation Artificial Intelligence Major Program, the 2018 Tianjin New Generation Artificial Intelligence Major Program (18ZXZNGX00150), the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University (Grant No. A1907), and the Elite Scholar Program of Tianjin University (2019XRX-0035).

Author information



Corresponding author

Correspondence to An-An Liu.



About this article


Cite this article

Song, D., Li, T., Mao, Z. et al. SP-VITON: shape-preserving image-based virtual try-on network. Multimed Tools Appl 79, 33757–33769 (2020). https://doi.org/10.1007/s11042-019-08363-w



  • Virtual try-on
  • Shape-preserving
  • Person image synthesis
  • Image alignment