Efficiently Learning the Metric with Side-Information

  • Tijl De Bie
  • Michinari Momma
  • Nello Cristianini
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2842)


A crucial problem in machine learning is choosing a representation of the data that emphasizes the relations we are interested in. In many cases this amounts to finding a suitable metric in the data space. In the supervised case, Linear Discriminant Analysis (LDA) can be used to find a subspace in which the data structure is apparent. Other ways to learn a suitable metric are found in [6] and [11]. Recently, however, significant attention has been devoted to the problem of learning a metric in the semi-supervised case. In particular, the work by Xing et al. [15] has demonstrated how semi-definite programming (SDP) can be used to directly learn a distance measure that satisfies constraints in the form of side-information. They obtain a significant increase in clustering performance with the new representation. The approach is very interesting; however, the computational complexity of the method severely limits its applicability to real machine learning tasks. In this paper we present an alternative solution to the problem of incorporating side-information, where the side-information specifies pairs of examples belonging to the same class. The approach is based on LDA and reduces to an eigenproblem that can be solved efficiently. The performance reached is very similar, but the complexity is only O(d^3) instead of O(d^6), where d is the dimensionality of the data. We also show how our method can be extended to deal with more general types of side-information.
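
To make the eigenproblem formulation concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not necessarily the exact algorithm of the paper: the abstract only says the method is LDA-based and solved as an eigenproblem. The sketch assumes the side-information is given as two matrices `X1` and `X2` whose corresponding rows belong to the same (unknown) class, and it maximizes a Rayleigh quotient of the cross-covariance between paired points over their summed covariances. The function name `side_info_directions` and the regularizer `reg` are hypothetical.

```python
import numpy as np

def side_info_directions(X1, X2, k=1, reg=1e-6):
    """Illustrative sketch (assumed formulation, not the paper's verified method).

    Rows X1[i] and X2[i] are assumed to share a class label.
    Solves the generalized eigenproblem (C12 + C21) w = lambda (C11 + C22) w.
    """
    X = np.vstack([X1, X2])
    mean = X.mean(axis=0)
    A1, A2 = X1 - mean, X2 - mean          # center on the joint mean
    n, d = X1.shape
    C12 = A1.T @ A2 / n                    # cross-covariance of paired points
    A = C12 + C12.T                        # symmetric numerator matrix
    B = (A1.T @ A1 + A2.T @ A2) / n + reg * np.eye(d)  # regularized denominator
    # Reduce to a standard symmetric eigenproblem via Cholesky, B = L L^T.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ A @ Linv.T)
    order = np.argsort(vals)[::-1][:k]     # keep the k largest eigenvalues
    W = Linv.T @ vecs[:, order]            # map back to the original coordinates
    return vals[order], W

# Toy data: classes differ along axis 0; axis 1 is large, uninformative noise.
rng = np.random.default_rng(0)
n = 2000
centers = np.where(rng.integers(0, 2, size=n) == 1, 3.0, -3.0)
X1 = np.column_stack([centers + 0.5 * rng.standard_normal(n), 5.0 * rng.standard_normal(n)])
X2 = np.column_stack([centers + 0.5 * rng.standard_normal(n), 5.0 * rng.standard_normal(n)])
vals, W = side_info_directions(X1, X2, k=1)
```

The Cholesky factorization and the symmetric eigendecomposition each cost O(d^3) for d-dimensional data, which matches the complexity quoted in the abstract. On the toy data the leading direction in `W` should align mostly with axis 0, so a distance computed as the norm of `W.T @ (x - y)` largely ignores the noisy axis.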


Canonical Correlation Analysis; Neural Information Processing System; Generalize Eigenvalue Problem; Rayleigh Quotient; Generalize Eigenvector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Bach, F.R., Jordan, M.I.: Kernel independent component analysis. Journal of Machine Learning Research 3, 1–48 (2002)
  2. Barker, M., Rayens, W.S.: Partial least squares for discrimination. Journal of Chemometrics 17, 166–173 (2003)
  3. Bartlett, M.S.: Further aspects of the theory of multiple regression. Proc. Camb. Philos. Soc. 34, 33–40 (1938)
  4. Borga, M., Landelius, T., Knutsson, H.: A Unified Approach to PCA, PLS, MLR and CCA. Report LiTH-ISY-R-1992, ISY, SE-581 83 Linköping, Sweden (November 1997)
  5. Bradley, P., Bennett, K., Demiriz, A.: Constrained K-means clustering. Technical Report MSR-TR-2000-65, Microsoft Research (2000)
  6. Cristianini, N., Shawe-Taylor, J., Elisseeff, A., Kandola, J.: On kernel-target alignment. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14. MIT Press, Cambridge (2002)
  7. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley & Sons, Inc., Chichester (2000)
  8. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(Part II), 179–188 (1936)
  9. Hofmann, T.: What people don’t want. In: European Conference on Machine Learning, ECML (2002)
  10. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)
  11. Lanckriet, G., Cristianini, N., Bartlett, P., El Ghaoui, L., Jordan, M.I.: Learning the kernel matrix with semi-definite programming. Technical Report CSD-02-1206, Division of Computer Science, University of California, Berkeley (2002)
  12. Rosipal, R., Trejo, L.J., Matthews, B.: Kernel PLS-SVC for linear and nonlinear classification. In: Proceedings of the Twentieth International Conference on Machine Learning (2003) (to appear)
  13. Vert, J.-P., Kanehisa, M.: Graph-driven features extraction from microarray data using diffusion kernels and cca. In: Advances in Neural Information Processing Systems 15. MIT Press, Cambridge (2003)
  14. Vinokourov, N.C., Shawe-Taylor, J.: Inferring a semantic representation of text via cross-language correlation analysis. In: Advances in Neural Information Processing Systems 15. MIT Press, Cambridge (2003)
  15. Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning, with application to clustering with side-information. In: Advances in Neural Information Processing Systems 15. MIT Press, Cambridge (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Tijl De Bie (1)
  • Michinari Momma (2)
  • Nello Cristianini (3)
  1. Department of Electrical Engineering ESAT-SCD, Katholieke Universiteit Leuven, Leuven, Belgium
  2. Department of Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, Troy, USA
  3. Department of Statistics, University of California, Davis, Davis, USA
