From Ensemble Methods to Comprehensible Models
Ensemble methods improve accuracy by combining the predictions of a set of different hypotheses. However, ensemble methods have two important shortcomings: large amounts of memory are required to store a set of multiple hypotheses and, more importantly, the comprehensibility of a single hypothesis is lost. In this work, we devise a new method to extract one single solution from a hypothesis ensemble without using extra data, based on two main ideas: the selected hypothesis must be semantically similar to the combined hypothesis, and this similarity is evaluated on a randomly generated dataset. We have implemented the method using shared ensembles, because they allow for an exponential number of potential base hypotheses. We include several experiments showing that the new method selects a single hypothesis with an accuracy reasonably close to that of the combined hypothesis.
Keywords: Ensemble Methods · Decision Trees · Comprehensibility in Machine Learning · Classifier Similarity · Randomisation
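The selection idea described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' SMILES implementation: it assumes scikit-learn, uses bagged decision trees as the ensemble, draws the "random dataset" uniformly from the attribute ranges of the training data, and measures semantic similarity as plain prediction agreement with the majority-vote combination. All function and variable names are illustrative.

```python
# Sketch: pick the single base hypothesis most similar (on random data)
# to the combined ensemble prediction. Assumes scikit-learn is available.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)
ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                             random_state=0).fit(X, y)

# Unlabelled random dataset drawn from the attribute ranges (no extra real data).
rng = np.random.default_rng(0)
X_rand = rng.uniform(X.min(axis=0), X.max(axis=0), size=(1000, X.shape[1]))

# Label the random points with the combined (voted) hypothesis.
combined = ensemble.predict(X_rand)

# Similarity of each base hypothesis to the combination = agreement rate.
agreements = [np.mean(tree.predict(X_rand) == combined)
              for tree in ensemble.estimators_]

# The selected single hypothesis: the one closest to the combined behaviour.
best = ensemble.estimators_[int(np.argmax(agreements))]
print(f"Selected tree agrees with the ensemble on "
      f"{max(agreements):.1%} of the random points")
```

The selected tree can then be inspected or deployed on its own, trading a small amount of the ensemble's accuracy for a single comprehensible model.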