Attention-Driven Dynamic Graph Convolutional Network for Multi-label Image Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12366)

Abstract

Recent studies often exploit a Graph Convolutional Network (GCN) to model label dependencies and improve recognition accuracy for multi-label image recognition. However, constructing a static graph by counting label co-occurrences in the training data may degrade model generalizability, especially when test images contain rarely co-occurring objects. Our goal is to eliminate such bias and enhance the robustness of the learned features. To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) that dynamically generates a specific graph for each image. ADD-GCN adopts a Dynamic Graph Convolutional Network (D-GCN) to model the relations among content-aware category representations generated by a Semantic Attention Module (SAM). Extensive experiments on public multi-label benchmarks demonstrate the effectiveness of our method, which achieves mAPs of 85.2%, 96.0%, and 95.5% on MS-COCO, VOC2007, and VOC2012, respectively, and outperforms current state-of-the-art methods by a clear margin.
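The pipeline described above can be sketched in a few lines. This is a minimal, illustrative NumPy forward pass, not the authors' implementation: the function name `add_gcn_forward`, the weight matrices `W_cls` and `W_gcn`, and the choice of a softmax-normalized similarity matrix as the image-specific adjacency are all assumptions made for exposition. The actual ADD-GCN uses learned convolutional layers and a richer graph construction.

```python
import numpy as np

def add_gcn_forward(feat, W_cls, W_gcn):
    """Illustrative sketch of one ADD-GCN-style forward pass.

    feat  : (C, HW)  CNN feature map, flattened over spatial locations
    W_cls : (K, C)   1x1-conv weights producing K class activation maps
    W_gcn : (C, C)   weights of a single graph-convolution layer
    """
    # SAM: class activation maps, softmax-normalized over spatial locations
    cam = W_cls @ feat                                   # (K, HW)
    attn = np.exp(cam - cam.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)

    # Attention-weighted pooling -> content-aware category representations
    V = attn @ feat.T                                    # (K, C)

    # D-GCN: build an image-specific graph from pairwise similarity of
    # the category representations (cosine similarity, row-normalized)
    Vn = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-8)
    A = np.exp(Vn @ Vn.T)                                # (K, K) dynamic adjacency
    A /= A.sum(axis=1, keepdims=True)

    # One graph-convolution step with ReLU: propagate information
    # between related categories through the dynamic graph
    return np.maximum(A @ V @ W_gcn, 0.0)                # (K, C)
```

The key difference from a static-graph GCN is that `A` is recomputed from each image's own category representations, so co-occurrence patterns of the training set are not baked into a fixed adjacency matrix.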

Keywords

Multi-label image recognition · Semantic attention · Label dependency · Dynamic graph convolutional network

Notes

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (U1813218, U1713208), the Science and Technology Service Network Initiative of the Chinese Academy of Sciences (KFJ-STS-QYZX-092), the Guangdong Special Support Program (2016TX03X276), the Shenzhen Basic Research Program (JSGG20180507182100698, CXB201104220032A), and the Shenzhen Institute of Artificial Intelligence and Robotics for Society. We also thank Xiaoping Lai and Hao Xing from VIPShop Inc., who collaborated with us on this project and provided the validation fashion data.

Supplementary material

Supplementary material 1: 504479_1_En_39_MOESM1_ESM.pdf (3.4 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. School of Biomedical Engineering, Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China