
A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14434)


Abstract

Multi-Label Image Recognition (MLIR) approaches usually exploit label correlations to achieve good performance. Two types of label correlations are principally studied: spatial and semantic correlations. However, most existing MLIR algorithms consider semantic and spatial correlations separately, and often require additional supporting information. Although some algorithms capture the semantic and spatial correlations of labels simultaneously, they ignore the intrinsic relationship between the two. Specifically, considering only spatial correlations can cause some difficult objects in the image to be misidentified: for example, objects of different categories that look similar and lie close together are mistaken for the same category. Semantic correlations can constrain the errors caused by spatial correlations. In this work, we propose a transformer-based multi-label image recognition algorithm named Spatial and Semantic Correlation Interaction (SSCI). A transformer is used to model the internal relationship between spatial and semantic correlations, improving the model's ability to recognize difficult objects. Experiments on the public datasets MS-COCO, VOC2007 and VOC2012 show that the mAP reaches 84.1%, 95.0% and 95.4%, respectively. Compared with other MLIR algorithms, the proposed algorithm can significantly improve recognition performance on multi-label images.
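The mAP figures quoted in the abstract are averages of per-label average precision (AP). As a rough illustration of how such a score is typically computed, the sketch below ranks images by predicted score for each label, averages the precision at each positive, and then averages over labels. It assumes the plain non-interpolated AP definition, which may differ from the exact evaluation protocol used in the paper (VOC2007, for instance, is sometimes scored with 11-point interpolation). The function names and toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Non-interpolated AP for one label.

    Images are ranked by descending score; precision is sampled at the rank
    of every positive image and the samples are averaged.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if targets[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # A label with no positive images contributes 0.0 in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: List[List[float]], target_matrix: List[List[int]]
) -> float:
    """mAP: mean of the per-label AP over all label columns.

    Both matrices are laid out as [image][label].
    """
    num_labels = len(score_matrix[0])
    aps = []
    for c in range(num_labels):
        col_scores = [row[c] for row in score_matrix]
        col_targets = [row[c] for row in target_matrix]
        aps.append(average_precision(col_scores, col_targets))
    return sum(aps) / num_labels


# Toy data (hypothetical): 3 images, 2 labels.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_targets = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_targets)
print(f"toy mAP = {toy_map:.4f}")
```

On the toy data, the first label has an AP of 5/6 and the second an AP of 1, so the mean is 11/12. Reported benchmark numbers such as those above are this quantity computed over the full test split and expressed as a percentage.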



Author information

Corresponding author

Correspondence to Genlin Ji.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Cheng, J., Ji, G., Yang, Q., Hao, J. (2024). A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14434. Springer, Singapore. https://doi.org/10.1007/978-981-99-8549-4_2


  • DOI: https://doi.org/10.1007/978-981-99-8549-4_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8548-7

  • Online ISBN: 978-981-99-8549-4

  • eBook Packages: Computer Science (R0)
