Multimedia Tools and Applications, Volume 78, Issue 3, pp 3767–3780

CNN-feature based automatic image annotation method

  • Yanchun Ma
  • Yongjian Liu
  • Qing Xie
  • Lin Li


Automatic image annotation (AIA) methods are regarded as an efficient way to bridge the semantic gap between raw images and their semantic information. However, traditional annotation models work well only with finely crafted manual features. To address this problem, we incorporate the CNN feature of an image into our proposed model, which we refer to as SEM, using the well-known CNN model AlexNet. We extract the CNN feature by removing AlexNet's final layer, and it proves useful in our SEM model. Additionally, drawing on experience with traditional KNN models, we propose a model that handles image tag refinement and tag assignment simultaneously while maintaining the simplicity of the KNN model. The proposed model groups images with similar features into a semantic neighbor group. Moreover, using a self-defined Bayesian-based model, we distribute the tags of the neighbor group to the test images according to the distance between each test image and its neighbors. Finally, experiments on three typical image datasets, Corel5k, ESP Game, and IAPR TC-12, verify the effectiveness of the proposed model.
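
The neighbor-based tag transfer described above can be illustrated with a minimal sketch. It assumes the CNN features have already been extracted and are given as plain lists of floats. The weighting rule (each neighbor votes with weight exp(-distance), normalized over the neighbor group) is a generic stand-in chosen for illustration, not the Bayesian-based model of the paper, whose exact form is not given here. The function names, parameters, and toy data are all hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate(test_feature, train_features, train_tags, k=3, top_n=2):
    """Score tags for a test image from its k nearest training images.

    Assumption: each neighbor votes for its tags with weight exp(-distance),
    and scores are normalized over the neighbor group. This is a generic
    distance-weighted vote, not the paper's Bayesian-based model.
    """
    # Rank training images by distance to the test image; keep the k closest.
    distances = sorted(
        (euclidean(test_feature, f), i) for i, f in enumerate(train_features)
    )
    neighbors = distances[:k]

    # Accumulate distance-weighted votes for every tag in the neighbor group.
    scores = defaultdict(float)
    for dist, idx in neighbors:
        weight = math.exp(-dist)
        for tag in train_tags[idx]:
            scores[tag] += weight

    # Normalize and return the top_n tags, highest score first.
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(tag, score / total) for tag, score in ranked[:top_n]]


# Toy 2-D "features" standing in for CNN activations (hypothetical data).
train_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
train_tags = [["sky", "sea"], ["sky"], ["tiger"]]
result = annotate([0.0, 0.1], train_features, train_tags, k=2, top_n=2)
```

In this toy run the two closest training images form the neighbor group, so the distant "tiger" image contributes nothing, and "sky" ranks above "sea" because both neighbors carry it.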


Keywords: Image annotation; Convolutional neural network; Feature extraction; Semantic extension



This research is partially supported by the Natural Science Foundation of China (Grant No. 61602353) and the Fundamental Research Funds for the Central Universities (WUT:2017IVA053, WUT:2017IVB028 and WUT:2017YB028).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
