Computational Statistics, Volume 29, Issue 3–4, pp 515–528

Sparse distance metric learning

  • Tze Choy
  • Nicolai Meinshausen
Original Paper


Nearest neighbour classification requires a good distance metric. Previous approaches try to learn a quadratic distance metric so that observations of different classes are well separated. For high-dimensional problems, where many uninformative variables are present, it is attractive to select a sparse distance metric, both to increase predictive accuracy and to aid interpretation of the result. We investigate the \(\ell_1\)-regularized metric learning problem, making a connection with the Lasso algorithm in the linear least squares setting. We show that the fitted transformation matrix is close to the desired transformation matrix in \(\ell_1\)-norm by assuming a version of the compatibility condition.
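
To make the idea of a sparse quadratic distance concrete, here is a minimal sketch in plain Python. It is an illustration only and not the estimator studied in the paper: the function names, the matrix `M_sparse`, and the example points are all hypothetical. It evaluates the generic quadratic form (x - y)^T M (x - y) and shows that zero rows and columns in M make the corresponding variables irrelevant to the distance.

```python
def quadratic_distance(x, y, M):
    """Squared quadratic distance (x - y)^T M (x - y); M is a square nested list."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))


def l1_norm(M):
    """Entrywise l1-norm of a matrix: the sum of absolute values of all entries."""
    return sum(abs(v) for row in M for v in row)


# Hypothetical metric: variables 0 and 1 are informative, 2 and 3 are not.
# The zero rows/columns mean the last two variables never affect the distance.
M_sparse = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Identity metric for comparison: this is the squared Euclidean distance.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Two made-up points that differ a lot in the uninformative variables.
x = [1.0, 0.0, 5.0, -3.0]
y = [0.0, 1.0, -4.0, 7.0]

d_sparse = quadratic_distance(x, y, M_sparse)
d_euclid = quadratic_distance(x, y, identity)
print(d_sparse, d_euclid, l1_norm(M_sparse))
```

Under the sparse metric the two points are close, because only the first two coordinates count, whereas the squared Euclidean distance is dominated by the uninformative coordinates. An \(\ell_1\) penalty on the entries of the matrix is the usual device for encouraging this kind of sparsity.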


Sparse recovery Multiclass Lasso High-dimensional Consistency 


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Statistics, University of Oxford, Oxford, UK
