A single-stage fashion clothing detection using multilevel visual attention

Majuran, Shajini; Ramanan, Amirthalingam

doi:10.1007/s00371-022-02751-4


Original article
Published: 28 December 2022

The Visual Computer, Volume 39, pages 6609–6623 (2023)


Abstract

Fashion is defined as a prevailing custom or style of dress, etiquette, and socialising. In recent years, fashion clothing analysis has attracted extensive research attention, driven by the introduction of large-scale datasets and the adoption of deep learning techniques. In this work, we propose a single-stage, attention-based network for fashion clothing detection and classification. The proposed single-stage detector exploits multidimensional features through a multilevel architecture, which resolves the semantic gap between lower- and upper-level feature representations. In addition, the network is built on multilevel contextual features that attention blocks retrieve globally, yielding strong visual attention. Further, the classification and detection branches have fewer trainable parameters; as a result, the model is efficient, and test results on the large-scale DeepFashion2 dataset show state-of-the-art performance in fashion clothing detection and classification.
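The abstract names two ingredients without specifying their exact form: fusing lower- and upper-level features, and attention blocks that gather context globally. The sketch below illustrates one common way to realise each, assuming an FPN-style top-down fusion (nearest-neighbour 2x upsampling of the upper level, then element-wise addition) and squeeze-and-excitation-style channel attention (global average pooling, a small bottleneck, sigmoid gates). Neither is claimed to be the authors' actual design. The function names `upsample2x`, `fuse_levels` and `global_channel_attention`, and the toy weights `w1` and `w2`, are hypothetical and chosen only for illustration.

```python
import math
from typing import List

# A feature map is stored channel-first: fmap[c][y][x].
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling of every channel (hypothetical helper)."""
    return [
        [[row[x // 2] for x in range(2 * len(row))] for row in channel for _ in range(2)]
        for channel in fmap
    ]


def fuse_levels(low: FeatureMap, high: FeatureMap) -> FeatureMap:
    """Assumed FPN-style top-down fusion: add the upsampled upper-level map to the lower one."""
    up = upsample2x(high)
    if len(up) != len(low) or len(up[0]) != len(low[0]) or len(up[0][0]) != len(low[0][0]):
        raise ValueError("lower level must have the same channels and twice the resolution")
    return [
        [[a + b for a, b in zip(row_l, row_u)] for row_l, row_u in zip(ch_l, ch_u)]
        for ch_l, ch_u in zip(low, up)
    ]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def global_channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Squeeze-and-excitation-style attention, used here as an assumed stand-in.

    w1 has shape hidden x C and w2 has shape C x hidden; in a real network both are learned.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(w_row, squeezed))) for w_row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(w_row, hidden))) for w_row in w2]
    # Rescale: every value in channel c is multiplied by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]


# Toy example: a 2 x 4 x 4 lower-level map and a 2 x 2 x 2 upper-level map.
low = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
high = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused = fuse_levels(low, high)
w1 = [[1.0, 1.0]]     # hidden = 1, C = 2 (hypothetical weights)
w2 = [[1.0], [-1.0]]  # C = 2, hidden = 1 (hypothetical weights)
out = global_channel_attention(fused, w1, w2)
```

With these toy weights the first channel is kept almost unchanged while the second is strongly suppressed, which shows how a global descriptor can re-weight whole channels. In practice, FPN-style fusion typically also uses 1x1 convolutions to align channel counts between levels; the sketch omits them to keep the data flow visible.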




Author information

Authors and Affiliations

Department of Computer Science, Faculty of Science, University of Jaffna, Jaffna, Sri Lanka
Shajini Majuran & Amirthalingam Ramanan


Corresponding author

Correspondence to Shajini Majuran.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Majuran, S., Ramanan, A. A single-stage fashion clothing detection using multilevel visual attention. Vis Comput 39, 6609–6623 (2023). https://doi.org/10.1007/s00371-022-02751-4


Accepted: 09 December 2022
Published: 28 December 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00371-022-02751-4

