Large Scale Metric Learning for Distance-Based Image Classification on Open Ended Data Sets

Mensink, Thomas; Verbeek, Jakob; Perronnin, Florent; Csurka, Gabriela

doi:10.1007/978-1-4471-5520-1_9

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Many real-life large-scale datasets are open-ended and dynamic: new images are continuously added to existing classes, new classes appear over time, and the semantics of existing classes may evolve as well. We therefore study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end, we consider two distance-based classifiers: the k-nearest neighbor (k-NN) classifier and the nearest class mean (NCM) classifier. Since the performance of distance-based classifiers depends heavily on the distance function used, we cast the problem as one of learning a low-rank metric that is shared across all classes. For the NCM classifier, we introduce a new metric learning approach, together with an extension that allows for richer class representations.
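As a rough illustration of why an NCM classifier can absorb new images and new classes cheaply, the following minimal sketch classifies a feature vector by its squared Euclidean distance to each class mean after a shared linear projection. It is not the chapter's implementation: the projection matrix `W` is fixed by hand rather than learned, and the class name `NearestClassMean` and its methods are hypothetical names chosen for this example.

```python
def project(W, x):
    """Map a D-dim feature x to d dims using the d x D matrix W (d < D)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class NearestClassMean:
    """Hypothetical NCM sketch: one running mean per class in projected space."""

    def __init__(self, W):
        self.W = W        # shared projection; assumed given, not learned here
        self.sums = {}    # label -> running sum of projected features
        self.counts = {}  # label -> number of images added so far

    def add(self, x, label):
        """Add one training image; an unseen label creates a new class."""
        z = project(self.W, x)
        if label not in self.sums:
            self.sums[label] = [0.0] * len(z)
            self.counts[label] = 0
        self.sums[label] = [s + zi for s, zi in zip(self.sums[label], z)]
        self.counts[label] += 1

    def mean(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, x):
        """Return the label whose projected mean is closest to W x."""
        z = project(self.W, x)
        return min(self.sums, key=lambda c: sq_dist(z, self.mean(c)))


# Toy usage: a 2 x 3 projection that simply drops the third coordinate.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ncm = NearestClassMean(W)
ncm.add([0.0, 0.0, 5.0], "cat")
ncm.add([2.0, 0.0, -5.0], "cat")
ncm.add([10.0, 10.0, 0.0], "dog")
before = ncm.predict([1.0, 1.0, 100.0])   # "cat"

# A new class only needs its own mean; W and the other means are untouched.
ncm.add([1.0, 1.0, 0.0], "bird")
after = ncm.predict([1.0, 1.0, 100.0])    # "bird"
```

Because the projection is linear, the mean of the projected features equals the projection of the mean, so each class can be kept as a running sum and a count. Adding an image or a class costs one projection and one vector addition.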

Experiments on the ImageNet 2010 challenge dataset, which contains over one million training images of one thousand classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs, which obtain current state-of-the-art performance. We also study experimentally how well the learned metrics generalize to classes that were not used to learn them. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset, which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art while being orders of magnitude faster.

This chapter is largely based on previously published work [30, 31].

Notes

1. Strictly speaking the covariance matrix is not properly defined as the low-rank matrix W^⊤ W is non-invertible.
2. See http://www.image-net.org/challenges/LSVRC/2010/index.
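The non-invertibility mentioned in note 1 follows from a standard rank argument. The shape of W is not stated in this excerpt, so the sketch below assumes W maps D-dimensional features to d < D dimensions.

```latex
\[
W \in \mathbb{R}^{d \times D},\quad d < D
\quad\Longrightarrow\quad
\operatorname{rank}\!\left(W^{\top} W\right) = \operatorname{rank}(W) \le d < D .
\]
```

Under that assumption the D × D matrix W^⊤ W is rank-deficient and therefore has no inverse.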

References

1. Bai B, Weston J, Grangier D, Collobert R, Qi Y, Sadamasa K, Chapelle O, Weinberger K (2010) Learning to rank with (a lot of) word features. Inf Retr 13(3):291–314 (special issue on learning to rank)
2. Bengio S, Weston J, Grangier D (2011) Label embedding trees for large multi-class tasks. In: NIPS
3. Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: COMPSTAT
4. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
5. Chai J, Liu H, Chen B, Bao Z (2010) Large margin nearest local mean classifier. Signal Process 90(1):236–248
6. Chechik G, Sharma V, Shalit U, Bengio S (2010) Large scale online learning of image similarity through ranking. J Mach Learn Res 11:1109–1135
7. Clinchant S, Csurka G, Perronnin F, Renders J-M (2007) XRCE’s participation to ImagEval. In: ImageEval workshop at CIVR
8. Csurka G, Dance C, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: ECCV int workshop on stat learning in computer vision
9. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: CVPR
10. Deng J, Berg A, Li K, Fei-Fei L (2010) What does classifying more than 10,000 image categories tell us? In: ECCV
11. Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell 28(4):594–611
12. Gao T, Koller D (2011) Discriminative learning of relaxed hierarchy for large-scale visual recognition. In: ICCV
13. Gauvain J-L, Lee C-H (1994) Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans Speech Audio Process 2(2):291–298
14. Globerson A, Roweis S (2006) Metric learning by collapsing classes. In: NIPS
15. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood component analysis. In: NIPS
16. Gordo A, Rodríguez J, Perronnin F, Valveny E (2012) Leveraging category-level labels for instance-level image retrieval. In: CVPR
17. Gray R, Neuhoff D (1998) Quantization. IEEE Trans Inf Theory 44(6):2325–2383
18. Guillaumin M, Mensink T, Verbeek J, Schmid C (2009) TagProp: discriminative metric learning in nearest neighbor models for image auto-annotation. In: ICCV
19. Guillaumin M, Verbeek J, Schmid C (2009) Is that you? Metric learning approaches for face identification. In: ICCV
20. Jégou H, Douze M, Schmid C (2008) Hamming embedding and weak geometric consistency for large scale image search. In: ECCV
21. Jégou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans Pattern Anal Mach Intell 33(1):117–128
22. Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P, Schmid C (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 34(9):1704–1716
23. Köstinger M, Hirzer M, Wohlhart P, Roth P, Bischof H (2012) Large scale metric learning from equivalence constraints. In: CVPR
24. Lampert C, Nickisch H, Harmeling S (2009) Learning to detect unseen object classes by between-class attribute transfer. In: CVPR
25. Larochelle H, Erhan D, Bengio Y (2008) Zero-data learning of new tasks. In: AAAI conference on artificial intelligence
26. Le Q, Ranzato M, Monga R, Devin M, Chen K, Corrado G, Dean J, Ng A (2012) Building high-level features using large scale unsupervised learning. In: ICML
27. Lin Y, Lv F, Zhu S, Yang M, Cour T, Yu K, Cao L, Huang T (2011) Large-scale image classification: fast feature extraction and SVM training. In: CVPR
28. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
29. Lucchi A, Weston J (2012) Joint image and word sense discrimination for image retrieval. In: ECCV
30. Mensink T, Verbeek J, Perronnin F, Csurka G (2012) Metric learning for large scale image classification: generalizing to new classes at near-zero cost. In: ECCV
31. Mensink T, Verbeek J, Perronnin F, Csurka G (2013) Distance-based image classification: generalizing to new classes at near-zero cost. IEEE Trans Pattern Anal Mach Intell (to appear)
32. Nistér D, Stewénius H (2006) Scalable recognition with a vocabulary tree. In: CVPR
33. Nowak E, Jurie F (2007) Learning visual similarity measures for comparing never seen objects. In: CVPR
34. Parameswaran S, Weinberger KQ (2010) Large margin multi-task metric learning. In: NIPS
35. Perronnin F, Sánchez J, Mensink T (2010) Improving the Fisher kernel for large-scale image classification. In: ECCV
36. Perronnin F, Akata Z, Harchaoui Z, Schmid C (2012) Towards good practice in large-scale learning for image classification. In: CVPR
37. Rohrbach M, Stark M, Schiele B (2011) Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In: CVPR
38. Saenko K, Kulis B, Fritz M, Darrell T (2010) Adapting visual category models to new domains. In: ECCV
39. Sánchez J, Perronnin F (2011) High-dimensional signature compression for large-scale image classification. In: CVPR
40. Tommasi T, Caputo B (2009) The more you know, the less you learn: from knowledge transfer to one-shot learning of object categories. In: BMVC
41. Veenman C, Tax D (2005) LESS: a model-based classifier for sparse subspaces. IEEE Trans Pattern Anal Mach Intell 27(9):1496–1500
42. Webb AR (2002) Statistical pattern recognition. Wiley, New York
43. Weinberger KQ, Chapelle O (2009) Large margin taxonomy embedding for document categorization. In: NIPS
44. Weinberger K, Saul L (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244
45. Weinberger K, Blitzer J, Saul L (2006) Distance metric learning for large margin nearest neighbor classification. In: NIPS
46. Weston J, Bengio S, Usunier N (2011) WSABIE: scaling up to large vocabulary image annotation. In: IJCAI
47. Zhang J, Marszałek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73(2):213–238
48. Zhou X, Zhang X, Yan Z, Chang S-F, Hasegawa-Johnson M, Huang T (2008) Sift-bag kernel for video event analysis. In: ACM multimedia

Author information

Authors and Affiliations

LEAR Team – INRIA Grenoble, 655 Avenue de l’Europe, 38330, Montbonnot, France
Thomas Mensink & Jakob Verbeek
Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240, Meylan, France
Florent Perronnin & Gabriela Csurka

Corresponding author

Correspondence to Thomas Mensink.

Editor information

Editors and Affiliations

Dipartimento di Matematica e Informatica, Università di Catania, Catania, Italy
Giovanni Maria Farinella
Dipartimento di Matematica e Informatica, Università di Catania, Catania, Italy
Sebastiano Battiato
Department of Engineering, University of Cambridge, Cambridge, United Kingdom
Roberto Cipolla

About this chapter

Cite this chapter

Mensink, T., Verbeek, J., Perronnin, F., Csurka, G. (2013). Large Scale Metric Learning for Distance-Based Image Classification on Open Ended Data Sets. In: Farinella, G., Battiato, S., Cipolla, R. (eds) Advanced Topics in Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5520-1_9

DOI: https://doi.org/10.1007/978-1-4471-5520-1_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5519-5
Online ISBN: 978-1-4471-5520-1
eBook Packages: Computer Science, Computer Science (R0)
