Learning Distance Functions for Automatic Annotation of Images

  • Josip Krapac
  • Frédéric Jurie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4918)


This paper gives an overview of recent approaches to image representation and image similarity computation for content-based image retrieval and automatic image annotation (category tagging). Additionally, a new similarity function between an image and an object class is proposed. This similarity function combines various aspects of object class appearance through the use of representative images of the class. Similarity to a representative image is determined by weighting local image similarities, where the weights are learned from training image pairs, labeled “same” and “different”, using a linear SVM. The proposed approach is validated on a challenging dataset, on which it performs favorably.
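The weighting scheme described in the abstract can be illustrated with a minimal sketch. It assumes that every image pair has already been reduced to a fixed-length vector of local similarities, and it trains a linear SVM by plain subgradient descent on the hinge loss rather than with any particular library. The function names, the toy data, the hyperparameters, and the choice of taking the maximum over representative images are all hypothetical and are not taken from the paper.

```python
import random

def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularised hinge loss (an illustrative stand-in for an SVM solver)."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The regulariser shrinks the weights at every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            # The hinge term only contributes when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def pair_similarity(w, b, local_sims):
    """Weighted sum of local similarities between two images."""
    return sum(wj * sj for wj, sj in zip(w, local_sims)) + b

def class_similarity(w, b, sims_to_representatives):
    """Assumption: score an image against a class by its best match
    among the class's representative images."""
    return max(pair_similarity(w, b, s) for s in sims_to_representatives)

# Hypothetical toy data: three local similarities per image pair.
same_pairs = [[0.9, 0.8, 0.5], [0.8, 0.9, 0.2], [0.7, 0.9, 0.8], [0.9, 0.7, 0.4]]
diff_pairs = [[0.2, 0.1, 0.6], [0.1, 0.3, 0.3], [0.3, 0.2, 0.9], [0.2, 0.2, 0.1]]
features = same_pairs + diff_pairs
labels = [1] * len(same_pairs) + [-1] * len(diff_pairs)  # +1 "same", -1 "different"

w, b = train_linear_svm(features, labels)

# Local similarities of a query image to two representative images of one class.
query_vs_representatives = [[0.1, 0.1, 0.5], [0.9, 0.9, 0.5]]
print(class_similarity(w, b, query_vs_representatives))
```

In this toy setup the third local similarity is distributed the same way for "same" and "different" pairs, so a learned weighting should tend to rely mainly on the first two.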


Visual Word; Object Class; Query Image; Equal Error Rate; Focal Image
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Josip Krapac (1)
  • Frédéric Jurie (1)
  1. INRIA Rhône-Alpes, Saint Ismier Cedex, France
