
Multi-label image classification with recurrently learning semantic dependencies

  • Long Chen
  • Ronggui Wang
  • Juan Yang (corresponding author)
  • Lixia Xue
  • Min Hu
Original Article

Abstract

Recognizing multi-label images is an important yet challenging task in high-level visual understanding. Models built on the CNN–RNN design pattern have achieved remarkable success by capturing the underlying semantic dependencies among labels and predicting label distributions from the global-level features output by CNNs. However, such global-level features often fuse information from multiple objects, making it difficult to recognize small objects and to capture label correlations. To address this problem, we propose a novel multi-label image classification framework that improves on the CNN–RNN design pattern. By introducing an attention network module into the CNN–RNN architecture, the object features in the attention map are separated by channel and then fed into an LSTM network, which captures the label dependencies and predicts labels sequentially. A category-wise max-pooling operation then integrates these per-step predictions into the final label scores. Experimental results on the PASCAL VOC 2007 and MS-COCO datasets demonstrate that our model effectively exploits label correlations to improve classification performance and better recognizes small objects.
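
The pipeline described in the abstract (CNN feature map, channel-separated attention, LSTM, category-wise max-pooling) can be sketched compactly. Below is a minimal PyTorch illustration, not the authors' implementation: the ResNet-101 backbone, the number of attention channels, the hidden size, and all module names are assumptions made for exposition.

    # Minimal sketch of the attention-augmented CNN-RNN pipeline from the
    # abstract. All design choices here are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class AttentiveCNNRNN(nn.Module):
        def __init__(self, num_labels=20, num_regions=8, hidden=512):
            super().__init__()
            # CNN backbone, truncated before pooling/fc so it outputs a
            # spatial feature map (B x 2048 x H x W for ResNet-101).
            backbone = models.resnet101(weights=None)
            self.cnn = nn.Sequential(*list(backbone.children())[:-2])
            # Attention network: one spatial map per channel; each channel
            # is meant to attend to a different object in the image.
            self.attn = nn.Conv2d(2048, num_regions, kernel_size=1)
            # LSTM consumes one attended feature vector per step and
            # models the semantic dependencies between labels.
            self.lstm = nn.LSTM(input_size=2048, hidden_size=hidden,
                                batch_first=True)
            self.classifier = nn.Linear(hidden, num_labels)

        def forward(self, images):
            feats = self.cnn(images)                        # B x 2048 x H x W
            # Normalize each attention channel over spatial locations.
            maps = self.attn(feats).flatten(2).softmax(-1)  # B x R x HW
            flat = feats.flatten(2)                         # B x 2048 x HW
            # Attended feature per channel: weighted sum over locations,
            # separating object features by attention channel.
            regions = torch.einsum('brn,bcn->brc', maps, flat)  # B x R x 2048
            # Predict a label distribution at every LSTM step.
            out, _ = self.lstm(regions)                     # B x R x hidden
            scores = self.classifier(out)                   # B x R x labels
            # Category-wise max-pooling over steps integrates the
            # sequential predictions into the final scores.
            return scores.max(dim=1).values                 # B x num_labels

For a batch of 224 x 224 images, AttentiveCNNRNN(num_labels=20)(torch.randn(2, 3, 224, 224)) returns a 2 x 20 score tensor; training would typically apply a multi-label loss such as binary cross-entropy with logits to these scores.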

Keywords

Multi-label, CNN–RNN, Attention, LSTM, Dependencies

Notes

Funding

Funding was provided by the National Natural Science Foundation of China (Grant No. 61672202).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Long Chen (1)
  • Ronggui Wang (1)
  • Juan Yang (1) (corresponding author)
  • Lixia Xue (1)
  • Min Hu (1)

  1. School of Computer and Information, Hefei University of Technology, Hefei, China
