Machine Learning

Volume 66, Issue 2–3, pp 209–241

Optimal dyadic decision trees

  • G. Blanchard
  • C. Schäfer
  • Y. Rozenholc
  • K.-R. Müller
Article

Abstract

We introduce a new algorithm for building an optimal dyadic decision tree (ODT). The method combines guaranteed performance in the learning-theoretical sense with optimal search from the algorithmic point of view. Furthermore, it inherits the explanatory power of tree approaches while improving on the performance of classical approaches such as CART/C4.5, as shown in experiments on artificial and benchmark data.
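
To give a rough idea of the exact search behind such a method (the precise penalized criterion and its analysis are in the paper), the sketch below finds a dyadic classification tree minimizing empirical misclassifications plus a simple per-leaf penalty, by dynamic programming over dyadic cells. It is a minimal illustration only, assuming features rescaled to [0, 1) and integer labels in {0, 1}; the function name optimal_dyadic_tree and the parameters max_depth and leaf_penalty are illustrative stand-ins for the paper's penalized empirical-risk criterion, not the authors' implementation.

```python
import numpy as np
from functools import lru_cache

def optimal_dyadic_tree(X, y, max_depth=6, leaf_penalty=2.0):
    """Exact search for a dyadic classification tree minimizing
    (# misclassified points) + leaf_penalty * (# leaves).

    X : (n, d) array with features rescaled to [0, 1).
    y : (n,) integer labels in {0, 1}.
    Returns (cost, tree) where tree is a nested tuple, either
    ('leaf', label) or ('split', dim, left_tree, right_tree).
    """
    n, d = X.shape

    @lru_cache(maxsize=None)
    def best(lows, highs, depth):
        # The current dyadic cell is the product of intervals [lows[j], highs[j]).
        in_cell = np.all((X >= np.array(lows)) & (X < np.array(highs)), axis=1)
        labels = y[in_cell]
        counts = np.bincount(labels, minlength=2)
        # Cost of stopping here: label the cell with its majority class.
        best_cost = len(labels) - counts.max() + leaf_penalty
        best_tree = ("leaf", int(counts.argmax()))
        if depth == max_depth or len(labels) == 0:
            return best_cost, best_tree
        # Otherwise try the midpoint (dyadic) split along every coordinate.
        for j in range(d):
            mid = 0.5 * (lows[j] + highs[j])
            l_cost, l_tree = best(lows, highs[:j] + (mid,) + highs[j + 1:], depth + 1)
            r_cost, r_tree = best(lows[:j] + (mid,) + lows[j + 1:], highs, depth + 1)
            if l_cost + r_cost < best_cost:
                best_cost, best_tree = l_cost + r_cost, ("split", j, l_tree, r_tree)
        return best_cost, best_tree

    return best(tuple([0.0] * d), tuple([1.0] * d), 0)
```

The memoization is the essential point of the dynamic program: the same dyadic cell can be reached by applying coordinate splits in different orders, and caching results by cell boundaries lets each such subproblem be solved only once rather than separately for every tree that contains it.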

Keywords

Decision tree · Oracle inequality · Adaptive convergence rate · Classification · Density estimation

References

  1. Adelson-Velskii, G. M., & Landis, E. M. (1962). An algorithm for the organization of information. Soviet Mathematics Doklady, 3, 1259–1263.
  2. Barron, A., Birgé, L., & Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113, 301–413.
  3. Barron, A., & Sheu, C. (1991). Approximation of density functions by sequences of exponential families. Annals of Statistics, 19, 1347–1369.
  4. Bartlett, P., Bousquet, O., & Mendelson, S. (2005). Local Rademacher complexities. Annals of Statistics, 33(4), 1497–1537.
  5. Blanchard, G. (2004). Different paradigms for choosing sequential reweighting algorithms. Neural Computation, 16, 811–836.
  6. Blanchard, G., Bousquet, O., & Massart, P. (2004). Statistical performance of support vector machines. Submitted manuscript.
  7. Blanchard, G., Schäfer, C., & Rozenholc, Y. (2004). Oracle bounds and exact algorithm for dyadic classification trees. In J. Shawe-Taylor & Y. Singer (Eds.), Proceedings of the 17th Conference on Learning Theory (COLT 2004), number 3210 in Lecture Notes in Artificial Intelligence (pp. 378–392). Springer.
  8. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
  9. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Wadsworth.
  10. Castellan, G. (2000). Histograms selection with an Akaike type criterion. C. R. Acad. Sci., Paris, Sér. I, Math., 330(8), 729–732.
  11. Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Wiley Series in Telecommunications. J. Wiley.
  12. Devroye, L., Györfi, L., & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of Mathematics. New York: Springer.
  13. Donoho, D. (1997). CART and best-ortho-basis: A connection. Annals of Statistics, 25, 1870–1911.
  14. Gey, S., & Nédélec, E. (2005). Model selection for CART regression trees. IEEE Transactions on Information Theory, 51(2), 658–670.
  15. Györfi, L., Kohler, M., & Krzyzak, A. (2002). A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer.
  16. Klemelä, J. (2003). Multivariate histograms with data-dependent partitions. Technical report, Institut für angewandte Mathematik, Universität Heidelberg.
  17. Massart, P. (2000). Some applications of concentration inequalities in statistics. Ann. Fac. Sci. Toulouse Math., 9(2), 245–303.
  18. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson, & S. Douglas (Eds.), Neural networks for signal processing IX (pp. 41–48). IEEE.
  19. Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann.
  20. Rätsch, G., Onoda, T., & Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning, 42(3), 287–320. Also NeuroCOLT Technical Report NC-TR-1998-021.
  21. Scott, C., & Nowak, R. (2004). Near-minimax optimal classification with dyadic classification trees. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems 16. Cambridge, MA: MIT Press.
  22. Scott, C., & Nowak, R. (2006). Minimax optimal classification with dyadic decision trees. IEEE Transactions on Information Theory, 52(4), 1335–1353.

Copyright information

© Springer Science + Business Media, LLC 2007

Authors and Affiliations

  • G. Blanchard (1)
  • C. Schäfer (1)
  • Y. Rozenholc (2)
  • K.-R. Müller (1, 3)
  1. Fraunhofer FIRST (IDA), Berlin, Germany
  2. Applied Mathematics Department (MAP5), Université René Descartes, Paris Cedex, France
  3. Computer Science Department, Technical University of Berlin, Berlin, Germany