ImageCLEF annotation with explicit context-aware kernel maps

Regular Paper

Abstract

The general recipe of kernel methods, such as support vector machines (SVMs), includes a preliminary step of hand-crafting or designing similarity kernels. This process, which has been extensively studied during the last decade, has proven relatively successful in solving many pattern recognition problems, including image annotation. However, many proposed solutions for kernel design consider the similarity between data by taking into account only their content, without context. In this paper, we propose an alternative that enhances usual kernels by making them context-aware. The method is based on the optimization of an objective function mixing content, regularization and context. We show that the underlying kernel solution converges to a positive semi-definite fixed point, which can also be expressed as a dot product involving “explicit” kernel maps. When these explicit context-aware kernel maps are plugged into support vector machines, performance improves substantially, and the method outperforms competing approaches on the hard task of image annotation on a recent ImageCLEF annotation benchmark.
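The abstract does not state the objective function or the update rule, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a hypothetical linear update K ← K0 + γ·A·K·Aᵀ, where K0 is a content kernel, A is a neighbourhood (context) matrix and γ is a mixing weight; all of these names are assumptions. Under that assumption, each iterate stays positive semi-definite and equals a dot product of explicit maps obtained by concatenation.

```python
import numpy as np


def ring_adjacency(n):
    """Hypothetical context: each item has two neighbours on a ring (weights 0.5)."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i - 1) % n] = 0.5
        a[i, (i + 1) % n] = 0.5
    return a


def context_aware_kernel(k0, adjacency, gamma, n_iter=50):
    """Iterate the assumed update K <- K0 + gamma * A K A^T, starting from K0.

    The iteration converges when gamma * ||A||_2^2 < 1.
    """
    k = k0.copy()
    for _ in range(n_iter):
        k = k0 + gamma * adjacency @ k @ adjacency.T
    return k


def context_aware_maps(phi0, adjacency, gamma, n_iter=50):
    """Explicit maps for the same update: Phi <- [Phi0 | sqrt(gamma) * A Phi].

    By construction Phi @ Phi.T equals the kernel iterate, so the kernel is
    positive semi-definite at every step. The map dimension grows by
    phi0.shape[1] columns per iteration.
    """
    phi = phi0.copy()
    for _ in range(n_iter):
        phi = np.hstack([phi0, np.sqrt(gamma) * adjacency @ phi])
    return phi
```

Given content features `phi0` (one row per image), the rows of `context_aware_maps(phi0, ring_adjacency(len(phi0)), 0.5)` could be fed directly to a linear SVM, which is the practical appeal of explicit maps over an implicit kernel matrix.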

Keywords

Explicit kernel maps, Context-aware kernels, Support vector machines, Image annotation

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. CNRS TELECOM ParisTech, Paris, France
