A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction

Cheng, Jing; Ji, Genlin; Yang, Qinkai; Hao, Junzhao

doi:10.1007/978-981-99-8549-4_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14434)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Multi-Label Image Recognition (MLIR) approaches usually exploit label correlations to achieve good performance. Two types of label correlation are principally studied: spatial and semantic. However, most existing MLIR algorithms consider semantic and spatial correlations separately and often require additional supporting information. Although some algorithms capture both kinds of correlation simultaneously, they ignore the intrinsic relationship between the two. Specifically, considering only spatial correlations leads to misidentifying some difficult objects in an image: for example, objects of different categories that look similar and lie close together are mistaken for the same category. Semantic correlations can constrain such errors caused by spatial correlations. In this work, we propose a transformer-based multi-label image recognition algorithm named Spatial and Semantic Correlation Interaction (SSCI). A transformer is used to model the internal relationship between spatial and semantic correlations, improving the model's ability to recognize difficult objects. Experiments on the public datasets MS-COCO, VOC2007 and VOC2012 show that SSCI reaches mAP values of 84.1%, 95.0% and 95.4%, respectively. Compared with other MLIR algorithms, the proposed algorithm significantly improves multi-label image recognition performance.
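For readers unfamiliar with the mAP figures quoted in the abstract, the following is a minimal sketch of one common way mean average precision is computed for multi-label recognition: a non-interpolated average precision per class, averaged over classes. The abstract does not state which AP variant the authors used (benchmark toolkits differ, for example in interpolation), so the function names, the toy data, and the choice of variant here are illustrative assumptions rather than the paper's evaluation code.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class (assumed variant, not taken from the paper)."""
    # Rank images by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    if not precisions:
        raise ValueError("class has no positive examples; AP is undefined")
    return sum(precisions) / len(precisions)


def mean_average_precision(score_rows: Sequence[Sequence[float]],
                           label_rows: Sequence[Sequence[int]]) -> float:
    """Mean of per-class AP; rows are images, columns are classes."""
    num_classes = len(score_rows[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_rows]
        col_labels = [row[c] for row in label_rows]
        if any(col_labels):  # skip classes that never occur in the ground truth
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)


# Hypothetical toy data: 4 images, 3 classes.
toy_scores = [[0.9, 0.2, 0.6],
              [0.8, 0.7, 0.1],
              [0.3, 0.6, 0.8],
              [0.1, 0.4, 0.3]]
toy_labels = [[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
print(f"toy mAP = {toy_map:.4f}")
```

In this toy case the first class has one negative ranked above a positive, so its AP falls below 1 while the other two classes are ranked perfectly; the mAP is the plain mean of the three values.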

Author information

Authors and Affiliations

School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing, China
Jing Cheng, Genlin Ji, Qinkai Yang & Junzhao Hao

Corresponding author

Correspondence to Genlin Ji.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Cheng, J., Ji, G., Yang, Q., Hao, J. (2024). A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14434. Springer, Singapore. https://doi.org/10.1007/978-981-99-8549-4_2

DOI: https://doi.org/10.1007/978-981-99-8549-4_2
Published: 25 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8548-7
Online ISBN: 978-981-99-8549-4
eBook Packages: Computer Science, Computer Science (R0)
