Classify multi-label images via improved CNN model with adversarial network

Zhou, Tao; Li, Zhixin; Zhang, Canlong; Ma, Huifang

doi:10.1007/s11042-019-08568-z

Published: 18 December 2019

Volume 79, pages 6871–6890, (2020)

Multimedia Tools and Applications

Tao Zhou¹, Zhixin Li¹ (ORCID: orcid.org/0000-0002-5313-6134), Canlong Zhang¹ & Huifang Ma²

Abstract

Convolutional neural networks (CNNs) achieve outstanding results in single-label image classification tasks. However, due to the complex underlying object layout and insufficient multi-label training images, how to achieve better performance on multi-label images with CNNs is still an open problem. In this work, we propose an improved deep CNN model that extracts features of objects at different scales in multi-label images through spatial pyramid pooling and feature fusion. In model training, we first transfer the parameters pre-trained on ImageNet to our model; then an adversarial network is trained to generate examples with occlusions, which makes our model invariant to occlusions. Experimental results on the Pascal VOC 2012 and Corel 5K image datasets demonstrate the superiority of the proposed approach over many existing approaches. The mAP of our model reaches 84.0% on the VOC 2012 dataset, which significantly outperforms most approaches and is close to that of HCP, a representative multi-label classification approach.
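
To illustrate the spatial pyramid pooling idea mentioned in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It max-pools a single 2-D feature map over grids of several sizes and concatenates the results, so maps of different sizes yield vectors of the same length. The function name `spatial_pyramid_pool` and the pyramid levels `(1, 2, 4)` are assumptions chosen for illustration; the abstract does not state the levels or the layer at which pooling is applied.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map (a list of rows) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins and
    the maximum of each bin is kept, so the output length is
    sum(n * n for n in levels) regardless of the input size.
    The default levels are an illustrative assumption.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil boundaries guarantee every bin is non-empty
            # and that the bins together cover the whole map.
            top = math.floor(i * height / n)
            bottom = math.ceil((i + 1) * height / n)
            for j in range(n):
                left = math.floor(j * width / n)
                right = math.ceil((j + 1) * width / n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes (hypothetical values).
small = [[1, 2], [3, 4]]
large = [[r * 5 + c for c in range(5)] for r in range(3)]
small_vec = spatial_pyramid_pool(small)
large_vec = spatial_pyramid_pool(large)
```

Both `small_vec` and `large_vec` have 1 + 4 + 16 = 21 entries, which is the property that lets a fixed-size classifier layer follow a variable-size convolutional feature map. In a CNN the same pooling would be applied to every channel and the per-channel vectors concatenated.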

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61663004, 61762078, 61866004), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2017GXNSFAA198365, 2018GXNSFDA281009), the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security (16-A-03-02, MIMS18-08), the Guangxi Special Project of Science and Technology Base and Talents (AD16380008), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Authors and Affiliations

Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541004, China
Tao Zhou, Zhixin Li & Canlong Zhang
College of Computer Science and Engineering, Northwest Normal University, Lanzhou, 730070, China
Huifang Ma

Corresponding author

Correspondence to Zhixin Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, T., Li, Z., Zhang, C. et al. Classify multi-label images via improved CNN model with adversarial network. Multimed Tools Appl 79, 6871–6890 (2020). https://doi.org/10.1007/s11042-019-08568-z

Received: 28 December 2018
Revised: 16 October 2019
Accepted: 06 December 2019
Published: 18 December 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11042-019-08568-z
