Ensemble Learning with Supervised Kernels

  • Kari Torkkola
  • Eugene Tuv
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


Kernel-based methods perform outstandingly on many machine learning and pattern recognition tasks. However, they are sensitive to the choice of kernel, they may tolerate noise poorly, and they cannot handle mixed-type or missing data. We propose deriving a novel kernel from an ensemble of decision trees. This leads to kernel methods that naturally handle noisy and heterogeneous data with potentially non-randomly missing values. We demonstrate excellent performance of regularized least squares learners based on such kernels.
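As a rough illustration of the idea, the sketch below builds a kernel from a small bagged ensemble of trees and plugs it into a regularized least squares learner. It assumes one common construction, the leaf co-occurrence (proximity) kernel, in which the kernel value for two points is the fraction of trees that place them in the same leaf; the abstract does not specify the construction, so this may differ from the authors' method. All names (`grow`, `leaf_of`, `fit_forest`, `proximity_kernel`, `lam`) and the synthetic data are hypothetical, and the sketch covers numeric features only.

```python
import numpy as np

def grow(X, y, rng, depth, max_depth, counter):
    """Grow one tree. Internal node: (feature, threshold, left, right); leaf: int id."""
    n, d = X.shape
    if depth < max_depth and n >= 4 and y.max() > y.min():
        best = None
        # Try a random subset of features and keep the split with the lowest
        # within-child squared error of the labels (the supervised part).
        for j in rng.choice(d, size=max(1, int(np.sqrt(d))), replace=False):
            vals = np.unique(X[:, j])
            for t in (vals[:-1] + vals[1:]) / 2.0:
                left = X[:, j] <= t
                if left.all() or not left.any():
                    continue
                sse = y[left].var() * left.sum() + y[~left].var() * (~left).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left)
        if best is not None:
            _, j, t, left = best
            return (j, t,
                    grow(X[left], y[left], rng, depth + 1, max_depth, counter),
                    grow(X[~left], y[~left], rng, depth + 1, max_depth, counter))
    counter[0] += 1
    return counter[0]  # new leaf id, unique within this tree

def leaf_of(tree, x):
    """Route one sample down a tree and return its leaf id."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree

def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))
        forest.append(grow(X[idx], y[idx], rng, 0, max_depth, [0]))
    return forest

def leaves(forest, X):
    """Leaf ids, shape (n_samples, n_trees)."""
    return np.array([[leaf_of(tree, x) for tree in forest] for x in X])

def proximity_kernel(LA, LB):
    """K[i, j] = fraction of trees in which sample i and sample j share a leaf."""
    return (LA[:, None, :] == LB[None, :, :]).mean(axis=2)

# Synthetic two-class problem with labels in {-1, +1} (assumption, for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(160, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

forest = fit_forest(X_train, y_train)
L_train, L_test = leaves(forest, X_train), leaves(forest, X_test)
K = proximity_kernel(L_train, L_train)
K_test = proximity_kernel(L_test, L_train)

# Regularized least squares: solve (K + lam * n * I) c = y, predict with K_test @ c.
lam = 1e-2
n = len(y_train)
c = np.linalg.solve(K + lam * n * np.eye(n), y_train)
scores = K_test @ c
print("test accuracy:", np.mean(np.sign(scores) == y_test))
```

Each tree contributes a block indicator kernel, so their average is symmetric and positive semidefinite, and the linear system above has a unique solution for any `lam > 0`. Handling mixed-type or missing values would need a tree learner that supports them, which this sketch does not attempt.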


Keywords: Support Vector Machine, Feature Selection, Random Forest, Gaussian Kernel, Base Learner



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Kari Torkkola, Intelligent Systems Lab, Motorola, Tempe, USA
  • Eugene Tuv, Analysis and Control Technology, Intel, Chandler, USA
