A single-stage fashion clothing detection using multilevel visual attention

  • Original article
The Visual Computer

Abstract

Fashion is defined as a prevailing custom or style of dress, etiquette, and socialising. In recent years, fashion clothing analysis has attracted extensive attention from researchers, driven by the introduction of large-scale datasets and the use of deep learning techniques. In this work, we propose a single-stage attention-based network for fashion clothing detection and classification. The proposed network is a single-stage detector that adopts multidimensional features through a multilevel architecture, so that the semantic gap between lower- and upper-level features from different levels of feature representation is resolved. In addition, the network is built on multilevel contextual features that attention blocks retrieve in a global manner, which implements strong visual attention. Further, the classification and detection branches maintain fewer trainable parameters; the model is therefore efficient, and the testing results show state-of-the-art performance in fashion clothing detection and classification on the large-scale DeepFashion2 dataset.
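To make the idea of globally pooled attention over multilevel features concrete, the following is a minimal sketch in NumPy. It is not the network proposed in the article: the gating scheme (a squeeze-and-excitation style channel gate), the nearest-neighbour upsampling, the additive fusion, and all names (`channel_attention`, `upsample2x`, `fuse_levels`, the reduction ratio `r`) are assumptions chosen only to illustrate how a fine and a coarse feature level might each be reweighted by global context and then combined.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Reweight the channels of a (C, H, W) map using globally pooled context."""
    squeezed = feat.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # bottleneck + ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate per channel -> (C,)
    return feat * gate[:, None, None]            # broadcast gate over H and W

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_levels(low, high, params):
    """Attend to each level, then add the upsampled coarse map to the fine map."""
    low_att = channel_attention(low, *params["low"])
    high_att = channel_attention(high, *params["high"])
    return low_att + upsample2x(high_att)

# Hypothetical shapes: a fine level (16x16) and a coarse level (8x8), 8 channels each.
rng = np.random.default_rng(0)
C, r = 8, 2
low = rng.standard_normal((C, 16, 16))
high = rng.standard_normal((C, 8, 8))
params = {
    name: (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
    for name in ("low", "high")
}

fused = fuse_levels(low, high, params)
print(fused.shape)  # (8, 16, 16)
```

In this sketch the weights are random, so the output only demonstrates shapes and data flow; in a trained detector the gate weights would be learned jointly with the backbone.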

Author information

Corresponding author

Correspondence to Shajini Majuran.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest.

About this article

Cite this article

Majuran, S., Ramanan, A. A single-stage fashion clothing detection using multilevel visual attention. Vis Comput 39, 6609–6623 (2023). https://doi.org/10.1007/s00371-022-02751-4

  • DOI: https://doi.org/10.1007/s00371-022-02751-4
