On Kernel Target Alignment

  • Nello Cristianini
  • Jaz Kandola
  • Andre Elisseeff
  • John Shawe-Taylor
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 194)


Kernel based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, in practice the kernel function is often chosen using trial-and-error heuristics. In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. We propose a quantity to capture this notion, which we call alignment. We study its theoretical properties, and derive a series of simple algorithms for adapting a kernel to the targets. This produces a series of novel methods for both transductive and inductive inference, kernel combination and kernel selection for both classification and regression problems that are computationally feasible for large problems. The algorithms are tested on publicly available datasets and are shown to exhibit good performance.


Ridge Regression Kernel Matrix True Alignment Breast Cancer Data Breast Cancer Dataset 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. [1]
    F. Bach and M.I. Jordan. Kernel independent components analysis. Department of computer science, University of California, Berkeley, Berkeley, CA, 2001.Google Scholar
  2. [2]
    S. Ben-David, N. Eiron, and H.U. Simon. Non-embedability in Euclidean half spaces. In NIPS Workshop on New Perspectives in Kernel Methods, 2000.Google Scholar
  3. [3]
    K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.Google Scholar
  4. [4]
    M.W. Berry, B. Hendrickson, and P. Raghaven. Sparse matrix re-ordering schemes for browsing hypertext. In J. Renegar, M. Shub, and S. Smale, editors, The Mathematics of Numerical Analysis, pages 91–123. American Mathematical Society, 1996.Google Scholar
  5. [5]
    A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In International Conference on Machine Learning (ICML) 2001, 2001.Google Scholar
  6. [6]
    B. Carl and I. Stephani. Entropy, compactness and the approximation of operators. Cambridge University Press, Cambridge, UK, 1990.Google Scholar
  7. [7]
    N. Cristianini,, J. Shawe-Taylor, and H. Lodhi. Latent semantic kernels. In International Conference on Machine Learning (ICML 2000), 2000.Google Scholar
  8. [8]
    N. Cristianini, A. Elisseeff, J. Shawe-Taylor, and J. Kandola. On kernel target alignment. In Neural Information Processing Systems 14 (NIPS 14), 2001.Google Scholar
  9. [9]
    N. Cristianini and J. Shawe-Taylor. An introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.Google Scholar
  10. [10]
    L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of mathematics. Springer, New York, 1996.Google Scholar
  11. [11]
    P. Drineas, R. Kannan, S. Vampala, A. Frieze, and V. Vinay. Clustering in large graphs and matrices. In Proceedings of the 10th ACM-SIAM Symposium on Discrete Algorithms, 1999.Google Scholar
  12. [12]
    J. Forster, N. Schmitt, and H. U. Simon. Estimating the optimal margins of embeddings in euclidean half spaces. In Submitted Computational Learning Theory (COLT) 2001, 2001.Google Scholar
  13. [13]
    R. Herbrich. Learning Kernel Classifiers-Theory and Algorithms. MIT Press, 2002.Google Scholar
  14. [14]
    N. Lanckriet, N. Cristianini, P. Bartlett, L. El-Ghoui, and M.I Jordan. Learning the kernel matrix using semi-definite programming. In International Conference on Machine Learning (ICML 2002), 2002.Google Scholar
  15. [15]
    C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188, 1989.Google Scholar
  16. [16]
    P. Pavlidas, J. Weston, J. Cai, and W.N. Grundy. Gene functional classification from heterogeneous data. In Proceedings of the Fifth International Conference on Computational Molecular Biology, 2001.Google Scholar
  17. [17]
    A. Pothen, H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal Matrix Analysis, 11(3):242–248, 1990.MathSciNetGoogle Scholar
  18. [18]
    S. Saitoh. Theory of Reproducing Kernels and its Applications. Longman Scientific & Technical, Harlow, England, 1988.Google Scholar
  19. [19]
    B. Scholkopf and A. Smola. Learning With Kernels-Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.Google Scholar
  20. [20]
    B. Scholkopf, A. Smola, and K. Muller. Kernel principal components analysis. In Advances in Kernel Methods-Support Vector Learning. MIT, 2000.Google Scholar
  21. [21]
    J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.MathSciNetCrossRefGoogle Scholar
  22. [22]
    J. Shi and J. Malik. Normalised cuts and image segmentation. In In IEEE Conf. on Computer Vision and Pattern Recognition, pages 731–737, 1997.Google Scholar
  23. [23]
    A. Smola and B. Scholkopf. Sparse greedy matrix approximation for machine learning. In In Proceedings International Conference on Machine Learning 2000, 2000.Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nello Cristianini
    • 1
  • Jaz Kandola
    • 2
  • Andre Elisseeff
    • 3
  • John Shawe-Taylor
    • 4
  1. 1.Department of StatisticsUniversity of California at DavisDavis
  2. 2.Quantitative Analytics DivisionMerrill LynchLondon
  3. 3.IBM Research LabRueschlikonSwitzerland
  4. 4.School of Electronics and Computer ScienceUniversity of SouthamptonUK

Personalised recommendations