A sparse kernel relevance model for automatic image annotation

  • Sean Moran
  • Victor Lavrenko
Regular Paper


In this paper, we introduce a new form of the continuous relevance model (CRM), dubbed the SKL-CRM, that adaptively selects the best-performing kernel per feature type for automatic image annotation. Previous image annotation models apply a standard selection of kernels to model the distribution of image features. Popular examples include a Gaussian kernel for modelling GIST features or a Laplacian kernel for global colour histograms. In this work, we demonstrate that this standard assignment of kernels to feature types is sub-optimal and that substantially higher image annotation accuracy can be attained by adapting the kernel-feature assignment. We formulate an efficient greedy algorithm to find the best kernel-feature alignment and show that it rapidly finds a sparse subset of features that maximises annotation \(F_{1}\) score. As a second contribution, we introduce two data-adaptive kernels for image annotation, the generalised Gaussian and multinomial kernels, which we demonstrate model the distribution of image features better than standard kernels. Evaluation is conducted on three standard image datasets across a selection of different feature representations. The proposed SKL-CRM model attains performance that is competitive with a suite of state-of-the-art image annotation models.
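The greedy kernel-feature search described in the abstract can be illustrated with a minimal sketch. This is a generic greedy forward selection written under stated assumptions, not the authors' exact procedure: the function names, the `score` callable (standing in for a held-out annotation \(F_{1}\) evaluation), and the stopping rule are all hypothetical. The generalised Gaussian shown is the standard one-dimensional density, which has the Gaussian form when the shape parameter is 2 and the Laplacian form when it is 1; the paper's own parameterisation may differ.

```python
import math
from typing import Callable, Dict, List, Tuple


def generalised_gaussian(distance: float, scale: float, shape: float) -> float:
    """Standard 1-D generalised Gaussian density (assumed form, not from the paper).

    shape=2 gives a Gaussian-shaped kernel, shape=1 a Laplacian-shaped one.
    """
    norm = shape / (2.0 * scale * math.gamma(1.0 / shape))
    return norm * math.exp(-((abs(distance) / scale) ** shape))


def greedy_kernel_selection(
    features: List[str],
    kernels: List[str],
    score: Callable[[Dict[str, str]], float],
) -> Tuple[Dict[str, str], float]:
    """Greedily add the (feature, kernel) pair that most improves `score`.

    `score` is a hypothetical callable mapping a feature->kernel assignment
    to a validation metric such as F1. The search stops as soon as no
    remaining pair improves the score, so the result may use only a
    subset of the features.
    """
    chosen: Dict[str, str] = {}
    best = float("-inf")
    while len(chosen) < len(features):
        candidate = None
        for feature in features:
            if feature in chosen:
                continue
            for kernel in kernels:
                trial = {**chosen, feature: kernel}
                value = score(trial)
                if value > best:
                    best, candidate = value, (feature, kernel)
        if candidate is None:
            break  # no remaining pair improves the score
        chosen[candidate[0]] = candidate[1]
    return chosen, best
```

In this sketch each round evaluates every unassigned feature with every kernel, so the number of calls to `score` grows at most quadratically in the number of features times the number of kernels. Features that never improve the score are left out, which is one simple way to obtain a sparse assignment.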


Image annotation, Object recognition, Kernel density estimation



We thank the anonymous reviewer for their helpful comments.



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. University of Edinburgh, Informatics Forum, Edinburgh, Scotland, UK
