Superpixels Features Extractor Network (SP-FEN) for Clothing Parsing Enhancement

Neural Processing Letters


This paper investigates improving clothing parsing with a superpixels features extractor network (SP-FEN). A fully convolutional network for clothing parsing has two parts: an encoder and a decoder. The encoder lowers the dimensionality and produces a low-resolution prediction, while the decoder upscales that prediction back to the size of the input image. Typically, fine-grained details that are lost in the encoder are not recovered well by the decoder. To address this, skip connections are commonly used to recover fine-grained details and add them to the final prediction. A new method is proposed that introduces superpixels features to the decoder by adding a side network (SP-FEN), which extracts features from a superpixels representation of the input image computed with the SLIC algorithm. SP-FEN then produces meaningful superpixels features that are injected into the decoder. SP-FEN learns to select the specific features fed to the decoder in order to boost the overall quality of the output. The proposed method is shown to improve the mean intersection over union (MIoU) on the refined Fashionista V1.0 dataset and the CFPD dataset. The results show that the proposed approach achieves superior performance in pixel-wise segmentation and clothing parsing.
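For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection over union score can be computed from a predicted label map and a ground-truth label map. It is an illustration only: the function name `mean_iou`, the confusion-matrix formulation, and the choice to skip classes that appear in neither map are assumptions, since the abstract does not state how the authors computed MIoU.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Confusion matrix: rows index the ground-truth class, columns the predicted class.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    # Union = pixels predicted as the class + pixels labelled as the class - overlap.
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Assumption: classes absent from both maps are excluded from the average.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


# Tiny 2x3 example with three clothing classes (labels are arbitrary).
target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])
score = mean_iou(pred, target, num_classes=3)
```

In this example the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is 0.5. Evaluation protocols differ on whether the average is taken per image or over the whole dataset, and on how a background class is handled, so the sketch should not be read as reproducing the paper's numbers.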


Author information

Corresponding author

Correspondence to Chu Kiong Loo.

Ethics declarations

Conflict of Interest

The authors declare there is no conflict of interest.


About this article

Cite this article

Ihsan, A.M., Loo, C.K., Naji, S.A. et al. Superpixels Features Extractor Network (SP-FEN) for Clothing Parsing Enhancement. Neural Process Lett 51, 2245–2263 (2020).
