From Ensemble Methods to Comprehensible Models

  • C. Ferri
  • J. Hernández-Orallo
  • M. J. Ramírez-Quintana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2534)


Ensemble methods improve accuracy by combining the predictions of a set of different hypotheses. However, ensemble methods have two important shortcomings: huge amounts of memory are required to store multiple hypotheses and, more importantly, the comprehensibility of a single hypothesis is lost. In this work, we devise a new method to extract a single solution from a hypothesis ensemble without using extra data. The method rests on two main ideas: the selected solution must be semantically similar to the combined solution, and this similarity is evaluated on a random dataset. We have implemented the method using shared ensembles, because they allow for an exponential number of potential base hypotheses. Several experiments show that the new method selects a single hypothesis whose accuracy is reasonably close to that of the combined hypothesis.
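The selection idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes uniform sampling within each attribute's range to build the unlabelled random dataset, simple majority voting as the combination rule, and plain agreement rate as the similarity measure (a chance-corrected measure such as Cohen's kappa could be substituted). It also uses an explicit list of base hypotheses rather than a shared ensemble. The names `random_dataset`, `combined_prediction` and `select_representative` are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A hypothesis maps an example (a sequence of numeric attribute values) to a class label.
Hypothesis = Callable[[Sequence[float]], str]


def random_dataset(ranges: List[Tuple[float, float]], n: int,
                   seed: int = 0) -> List[Tuple[float, ...]]:
    """Draw n unlabelled examples uniformly inside each attribute's (low, high) range.

    ASSUMPTION: uniform sampling; the paper's random generator may differ.
    """
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in ranges) for _ in range(n)]


def combined_prediction(ensemble: List[Hypothesis], x: Sequence[float]) -> str:
    """Label x with the ensemble's combined solution (ASSUMPTION: majority vote)."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]


def select_representative(ensemble: List[Hypothesis],
                          ranges: List[Tuple[float, float]],
                          n: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Return (index, agreement) of the member most similar to the combined solution.

    Similarity is the fraction of random examples on which the member's label
    matches the ensemble's combined label; no extra labelled data is used.
    """
    data = random_dataset(ranges, n, seed)
    targets = [combined_prediction(ensemble, x) for x in data]
    best_idx, best_agree = -1, -1.0
    for i, h in enumerate(ensemble):
        agree = sum(h(x) == t for x, t in zip(data, targets)) / n
        if agree > best_agree:  # strict '>' keeps the first member on ties
            best_idx, best_agree = i, agree
    return best_idx, best_agree


# Usage: three threshold rules on a single attribute in [0, 1].
stumps: List[Hypothesis] = [
    lambda x: "a" if x[0] < 0.3 else "b",
    lambda x: "a" if x[0] < 0.5 else "b",
    lambda x: "a" if x[0] < 0.7 else "b",
]
idx, agreement = select_representative(stumps, ranges=[(0.0, 1.0)])
print(idx, agreement)  # → 1 1.0
```

In this toy case the majority vote of the three rules coincides with the middle rule, so that member agrees with the combined solution on every random example and is the one selected.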


Keywords: Ensemble Methods, Decision Trees, Comprehensibility in Machine Learning, Classifier Similarity, Randomisation




Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • C. Ferri (1)
  • J. Hernández-Orallo (1)
  • M. J. Ramírez-Quintana (1)

  1. DSIC, UPV, Valencia, Spain
