Large Scale Metric Learning for Distance-Based Image Classification on Open Ended Data Sets

  • Thomas Mensink
  • Jakob Verbeek
  • Florent Perronnin
  • Gabriela Csurka
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


Many real-life large-scale datasets are open-ended and dynamic: new images are continuously added to existing classes, new classes appear over time, and the semantics of existing classes may evolve too. Therefore, we study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end, we consider two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers. Since the performance of distance-based classifiers depends heavily on the distance function used, we cast the problem as one of learning a low-rank metric that is shared across all classes. For the NCM classifier, we introduce a new metric learning approach, and we also introduce an extension that allows for richer class representations.

Experiments on the ImageNet 2010 challenge dataset, which contains over one million training images of 1,000 classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs, which obtain current state-of-the-art performance. We also study experimentally the generalization performance on classes that were not used to learn the metrics. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset, which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art while being orders of magnitude faster.
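
The NCM classifier described above can be illustrated with a small sketch. It is not the authors' implementation: the class name `NearestClassMean`, its methods, and the hand-set projection matrix `W` are assumptions made for illustration, and the metric is simply taken as given rather than learned. The sketch only shows why adding images or whole new classes is cheap: each class is represented by a running mean, and prediction picks the class whose projected mean is closest.

```python
# Minimal NCM sketch in pure Python. All names are hypothetical; the
# low-rank projection W is assumed to have been learned beforehand.

def project(W, x):
    """Map a D-dimensional feature x to d dimensions with the d x D matrix W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

class NearestClassMean:
    def __init__(self, W):
        self.W = W        # shared metric, used for every class
        self.sums = {}    # per-class running sum of feature vectors
        self.counts = {}  # per-class number of training images

    def add(self, x, label):
        """Add one training feature; handles existing and new classes alike."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def mean(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, x):
        """Return the class whose projected mean is nearest to the projected x."""
        z = project(self.W, x)
        return min(self.sums,
                   key=lambda c: sq_dist(z, project(self.W, self.mean(c))))

# Toy 2 x 3 projection that ignores the third feature dimension.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
ncm = NearestClassMean(W)
ncm.add([0.0, 0.0, 5.0], "cat")
ncm.add([0.2, 0.0, -5.0], "cat")
ncm.add([4.0, 4.0, 0.0], "dog")
first = ncm.predict([0.5, 0.5, 9.0])

# A new class is added with a single mean update; W is left untouched.
ncm.add([0.0, 4.0, 1.0], "bird")
second = ncm.predict([0.5, 3.5, 0.0])
```

In this toy setup, adding the class "bird" costs one vector addition, and the shared projection is reused unchanged, which mirrors the setting the abstract describes of applying a metric to classes that were not used to learn it.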


Image Retrieval; Training Image; Query Image; Fisher Discriminant Analysis; Target Neighbor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Thomas Mensink (1)
  • Jakob Verbeek (1)
  • Florent Perronnin (2)
  • Gabriela Csurka (2)
  1. LEAR Team – INRIA Grenoble, Montbonnot, France
  2. Xerox Research Centre Europe, Meylan, France
