CNN-feature based automatic image annotation method

Multimedia Tools and Applications

Abstract

Automatic image annotation (AIA) methods are an efficient way to bridge the semantic gap between raw images and their semantic content. However, traditional annotation models work well only with carefully hand-crafted features. To address this problem, we incorporate CNN features into our proposed model, which we refer to as SEM, using the well-known CNN model AlexNet. We extract the CNN feature by removing the network's final layer, and it proves useful in our SEM model. In addition, drawing on experience with traditional KNN models, we propose a model that handles image tag refinement and tag assignment simultaneously while retaining the simplicity of KNN. The proposed model groups images with similar features into a semantic neighbor group. Then, using a self-defined Bayesian-based model, we distribute the tags of the neighbor group to the test image according to the distance between the test image and its neighbors. Finally, experiments on three standard image datasets, Corel5k, ESP Game and IAPR TC-12, verify the effectiveness of the proposed model.
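The neighbor-based tag distribution described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual Bayesian formulation of SEM, so the code below swaps in a generic distance-weighted KNN vote. The function names (`euclidean`, `annotate`), the exponential weighting, the `bandwidth` parameter and the toy feature vectors are all assumptions made for illustration, and the short 2-D vectors stand in for real CNN features.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate(test_feature, train_features, train_tags, k=3, top_n=2, bandwidth=1.0):
    """Score tags for a test image from its k nearest training images.

    Hypothetical stand-in for the paper's model: each neighbor votes for
    its own tags with a weight that decays exponentially with distance.
    """
    # Rank training images by distance to the test image and keep the k closest.
    ranked = sorted(
        (euclidean(test_feature, f), i) for i, f in enumerate(train_features)
    )
    neighbors = ranked[:k]

    # Turn distances into normalized weights (closer neighbor, larger weight).
    weights = [(math.exp(-d / bandwidth), i) for d, i in neighbors]
    total = sum(w for w, _ in weights)

    # Accumulate each neighbor's normalized weight onto the tags it carries.
    scores = defaultdict(float)
    for w, i in weights:
        for tag in train_tags[i]:
            scores[tag] += w / total

    # Highest score first; ties broken alphabetically for determinism.
    best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return best[:top_n]


if __name__ == "__main__":
    # Toy data: 2-D vectors in place of CNN features (assumption).
    features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]]
    tags = [{"sky", "sea"}, {"sky"}, {"car"}, {"sea", "beach"}]
    for tag, score in annotate([0.0, 0.05], features, tags):
        print(f"{tag}: {score:.3f}")
```

In this toy setup the distant image tagged "car" falls outside the three-image neighbor group, so its tag receives no score at all. A real system would replace the toy vectors with features taken from a CNN and the exponential vote with the model's own probability estimate.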

Acknowledgments

This research is partially supported by the Natural Science Foundation of China (Grant No. 61602353) and the Fundamental Research Funds for the Central Universities (WUT:2017IVA053, WUT:2017IVB028 and WUT:2017YB028).

Author information

Corresponding author

Correspondence to Qing Xie.

About this article

Cite this article

Ma, Y., Liu, Y., Xie, Q. et al. CNN-feature based automatic image annotation method. Multimed Tools Appl 78, 3767–3780 (2019). https://doi.org/10.1007/s11042-018-6038-x

  • DOI: https://doi.org/10.1007/s11042-018-6038-x
