Classify multi-label images via improved CNN model with adversarial network

Multimedia Tools and Applications

Abstract

Convolutional neural networks (CNNs) achieve outstanding results in single-label image classification. However, because of the complex underlying object layout and the shortage of multi-label training images, achieving better performance on multi-label images with a CNN remains an open problem. In this work, we propose an improved deep CNN model that extracts features of objects at different scales in multi-label images through spatial pyramid pooling and feature fusion. For training, we first transfer parameters pre-trained on ImageNet to our model; an adversarial network is then trained to generate examples with occlusions, which makes our model invariant to occlusions. Experimental results on the Pascal VOC 2012 and Corel 5K image datasets demonstrate the superiority of the proposed approach over many existing approaches. Our model reaches an mAP of 84.0% on the VOC 2012 dataset, which significantly outperforms most approaches and comes close to HCP, a representative multi-label classification approach.
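To illustrate why spatial pyramid pooling helps with objects at different scales, the following is a minimal sketch in plain Python. It is not the model described in the abstract: the pyramid levels `(1, 2, 4)`, the use of max pooling, the bin-boundary rule, and the function name `spatial_pyramid_pool` are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general idea that feature maps of any spatial size are reduced to a vector of fixed length.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map (nested lists) into a flat vector.

    For each pyramid level n, every channel is split into an n x n grid of
    bins and the maximum of each bin is kept. The output length is
    C * sum(n * n for n in levels), independent of H and W.
    The levels and the pooling type are illustrative assumptions.
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Floor/ceil boundaries keep every bin non-empty and
                # make the bins cover the whole map.
                top = (i * height) // n
                bottom = math.ceil((i + 1) * height / n)
                for j in range(n):
                    left = (j * width) // n
                    right = math.ceil((j + 1) * width / n)
                    pooled.append(max(
                        channel[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    ))
    return pooled


# Two hypothetical 2-channel feature maps with different spatial sizes.
small = [[[r * 7 + c for c in range(7)] for r in range(5)] for _ in range(2)]
large = [[[r * 6 + c for c in range(6)] for r in range(8)] for _ in range(2)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both calls return vectors of the same length, which is what allows a fixed-size classifier layer to follow convolutional features computed on inputs of varying size. Coarse levels summarize large objects, while finer levels retain some spatial detail for smaller ones.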

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61663004, 61762078, 61866004), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2017GXNSFAA198365, 2018GXNSFDA281009), the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security (16-A-03-02, MIMS18-08), the Guangxi Special Project of Science and Technology Base and Talents (AD16380008), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Corresponding author

Correspondence to Zhixin Li.

About this article

Cite this article

Zhou, T., Li, Z., Zhang, C. et al. Classify multi-label images via improved CNN model with adversarial network. Multimed Tools Appl 79, 6871–6890 (2020). https://doi.org/10.1007/s11042-019-08568-z

  • DOI: https://doi.org/10.1007/s11042-019-08568-z
