Multimedia Tools and Applications, Volume 75, Issue 4, pp 2203–2231

Learning multi-task local metrics for image annotation

  • Xing Xu
  • Atsushi Shimada
  • Hajime Nagahara
  • Rin-ichiro Taniguchi


The goal of image annotation is to automatically assign to an image a set of textual labels that describe its visual content. With the rapid growth in the number of web images, nearest neighbor (NN) based methods have become increasingly attractive and have shown promising results for image annotation. A key challenge for these methods is defining an appropriate similarity measure between images for neighbor selection. Several distance metric learning (DML) algorithms originally developed for traditional image classification problems have been applied to annotation tasks. However, a fundamental limitation of applying DML to image annotation is that it learns a single global distance metric over the entire image collection and therefore measures the distance between image pairs only at the image level. For multi-label annotation problems, it may be more reasonable to measure the similarity of image pairs at the label level. In this paper, we develop a novel label prediction scheme that uses multiple label-specific local metrics to measure label-level similarity, and we propose two local metric learning methods within a multi-task learning (MTL) framework. Extensive experiments on two challenging annotation datasets demonstrate that 1) using multiple local distance metrics to compute label-level distances is superior to using a single global metric for label prediction, and 2) the proposed methods, which learn the local metrics simultaneously in the MTL framework, can model the commonalities among labels and thereby achieve state-of-the-art annotation performance.
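
The difference between image-level and label-level similarity can be illustrated with a small sketch. The Python code below is not the authors' method and learns nothing: it assumes squared Mahalanobis distances d_M(x, y) = (x - y)^T M (x - y), hand-set positive semidefinite matrices, a hypothetical two-label toy dataset, and a simple kNN vote in which the score of label l is the fraction of the query's k nearest neighbors, found under that label's metric M_l, that carry l. Each local metric is written as a shared part plus a label-specific part, M_l = M_0 + D_l, a decomposition used in large-margin multi-task metric learning; in an MTL setting the shared part M_0 is what would let labels share information. The names `label_scores`, `mahalanobis_sq`, and `mat_add` are illustrative only.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis_sq(x: Vector, y: Vector, M: Matrix) -> float:
    """Squared Mahalanobis distance (x - y)^T M (x - y); M is assumed PSD."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def mat_add(A: Matrix, B: Matrix) -> List[List[float]]:
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def label_scores(
    query: Vector,
    train_x: Sequence[Vector],
    train_labels: Sequence[Set[str]],
    metrics: Dict[str, Matrix],
    k: int,
) -> Dict[str, float]:
    """Score each label by a kNN vote taken under that label's own metric."""
    scores: Dict[str, float] = {}
    for label, M in metrics.items():
        # Rank training images by distance to the query under this label's metric.
        ranked = sorted(range(len(train_x)),
                        key=lambda i: mahalanobis_sq(query, train_x[i], M))
        neighbors = ranked[:k]
        # Fraction of the k neighbors annotated with this label.
        scores[label] = sum(label in train_labels[i] for i in neighbors) / k
    return scores


# Hypothetical toy data: feature 0 stands in for "sky" evidence, feature 1 for "grass".
train_x = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
train_labels = [{"sky"}, {"grass"}, {"sky", "grass"}, set()]
query = (0.9, 0.2)

# Single global metric: every label is scored with the same matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
global_metrics = {"sky": identity, "grass": identity}

# Local metrics M_l = M_0 + D_l (shared part plus label-specific part).
# Hand-set for illustration; an MTL method would learn them jointly.
M0 = [[0.1, 0.0], [0.0, 0.1]]
D = {"sky": [[1.0, 0.0], [0.0, 0.0]], "grass": [[0.0, 0.0], [0.0, 1.0]]}
local_metrics = {label: mat_add(M0, D_l) for label, D_l in D.items()}

global_scores = label_scores(query, train_x, train_labels, global_metrics, k=2)
local_scores = label_scores(query, train_x, train_labels, local_metrics, k=2)

if __name__ == "__main__":
    print("global:", global_scores)
    print("local: ", local_scores)
```

With the single global metric, both labels are scored from the same two neighbors, so "grass" gets 0.5 even though the query's grass feature is low. Under the grass-specific metric the neighborhood changes and the "grass" score drops to 0.0, while "sky" stays at 1.0 in both cases. This is the kind of label-level behavior the abstract argues a single global metric cannot express.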


Keywords: Image annotation, Label prediction, Metric learning, Local metric, Multi-task learning


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Xing Xu (1)
  • Atsushi Shimada (1)
  • Hajime Nagahara (1)
  • Rin-ichiro Taniguchi (1)

  1. Department of Advanced Information and Technology, Kyushu University, Fukuoka, Japan