Triple Attention Network for Clothing Parsing

  • Conference paper
Neural Information Processing (ICONIP 2020)

Abstract

Clothing parsing has been actively studied in the vision community in recent years. Inspired by the color coherence of clothing and the self-attention mechanism, this paper proposes a Triple Attention Network (TANet) equipped with a color attention module, a position attention module and a channel attention module to facilitate fine-grained segmentation of clothing images. Concretely, the color attention module is introduced to harvest color coherence by selectively aggregating the color features of clothing. The position attention module and the channel attention module are designed to emphasize semantic interdependencies in the spatial and channel dimensions, respectively. The outputs of the three attention modules are combined to further improve the feature representation, which leads to more precise clothing parsing results. The proposed TANet achieves 69.54% mIoU on ModaNet, the latest large-scale clothing parsing dataset - a promising clothing parsing result. In particular, the color attention module is shown to noticeably improve semantic consistency and precision. The source code is made available in the public domain.
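The full paper details the three modules; purely as an illustration of the self-attention style the abstract refers to, the sketch below shows a generic position-attention block in PyTorch. The class name, the channel-reduction factor and the residual fusion via a learnable scalar are assumptions made here for illustration, not details taken from the paper or from the authors' released code (linked in the notes below).

```python
# Minimal sketch of a position (spatial) attention block; illustrative only,
# not the authors' TANet implementation (see https://www.github.com/cm-jsw/TANet).
import torch
import torch.nn as nn


class PositionAttention(nn.Module):
    """Self-attention over the spatial positions of a (B, C, H, W) feature map."""

    def __init__(self, in_channels: int):
        super().__init__()
        inner = max(in_channels // 8, 1)               # reduced channels for query/key (assumed)
        self.query = nn.Conv2d(in_channels, inner, kernel_size=1)
        self.key = nn.Conv2d(in_channels, inner, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))      # learnable residual weight (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                     # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW) position affinities
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)  # aggregate features over positions
        return self.gamma * out + x                    # residual fusion with the input


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)                  # dummy backbone feature map
    print(PositionAttention(64)(feat).shape)           # torch.Size([2, 64, 32, 32])
```

A channel attention module can be sketched analogously by computing affinities between channel maps rather than positions; how the color attention module conditions this aggregation on color features, and how the three outputs are fused, are specified only in the full paper.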

Supported by National Natural Science Foundation of China (No. 61170093).


Notes

  1. https://www.github.com/cm-jsw/TANet.

  2. https://github.com/eBay/modanet.

Author information

Correspondence to Mingfu Xiong.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

He, R., Cheng, M., Xiong, M., Qin, X., Liu, J., Hu, X. (2020). Triple Attention Network for Clothing Parsing. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12532. Springer, Cham. https://doi.org/10.1007/978-3-030-63830-6_49

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-63830-6_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63829-0

  • Online ISBN: 978-3-030-63830-6

  • eBook Packages: Computer Science, Computer Science (R0)
