Simple Mimetic Classifiers

  • V. Estruch
  • C. Ferri
  • J. Hernández-Orallo
  • M. J. Ramírez-Quintana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2734)


Combining classifiers is a powerful way to improve accuracy: the predictions of multiple models are merged into a single prediction. Many practical combination techniques feed the output of several classifiers into a second-layer classifier. The problem with this and other multi-classifier approaches is that storing a set of classifiers requires large amounts of memory and, more importantly, the comprehensibility of a single classifier is lost, so no knowledge or insight can be gained from the model. To overcome these limitations, in this work we analyse the idea of “mimicking” the semantics of an ensemble of classifiers. More precisely, we use the combination of classifiers to label an invented random dataset, and then use this artificially labelled dataset to re-train a single model. This model has the following advantages: it is very similar to the highly accurate combined model; as a single solution, it requires far less memory; no additional validation set needs to be reserved for the procedure; and, more importantly, the resulting model is expressed as a single classifier in terms of the original attributes and, hence, can be comprehensible. First, we illustrate this methodology using a popular data-mining package, showing that it can spread into common practice, and then we use our system SMILES, which automates the process and takes advantage of its ensemble method.
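The mimetic procedure summarised in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation or the SMILES system: the tiny tree learner, the bagging ensemble, the uniform sampling of the invented dataset, the target concept `x + y > 1`, and all names (`build_tree`, `train_bagging`, `random_rows`, `mimic`) are hypothetical choices made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth):
    """Grow a small axis-parallel decision tree (toy learner, an assumption).
    Internal nodes are tuples (attr, threshold, left, right); leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    step = max(1, len(rows) // 20)  # try roughly 20 candidate thresholds per attribute
    for attr in range(len(rows[0])):
        for threshold in sorted({r[attr] for r in rows})[::step]:
            left = [y for r, y in zip(rows, labels) if r[attr] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[attr] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    if best is None:
        return majority
    _, attr, threshold = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[attr] <= threshold]
    hi = [(r, y) for r, y in zip(rows, labels) if r[attr] > threshold]
    return (attr, threshold,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth - 1),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth - 1))

def predict(tree, row):
    """Follow the splits until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        attr, threshold, left, right = tree
        tree = left if row[attr] <= threshold else right
    return tree

def train_bagging(rows, labels, n_trees, depth, rng):
    """Ensemble stand-in: shallow trees trained on bootstrap samples."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        trees.append(build_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], depth))
    return trees

def ensemble_predict(trees, row):
    """Combine the trees by majority vote."""
    return Counter(predict(t, row) for t in trees).most_common(1)[0][0]

def random_rows(n, rng):
    """Invented examples: uniform sampling over [0, 1]^2 (an assumption)."""
    return [(rng.random(), rng.random()) for _ in range(n)]

rng = random.Random(0)

# 1. Train the combined model on a small real training set (toy concept).
train = random_rows(200, rng)
train_labels = [int(x + y > 1) for x, y in train]
ensemble = train_bagging(train, train_labels, n_trees=15, depth=3, rng=rng)

# 2. Label an invented random dataset with the ensemble.
invented = random_rows(2000, rng)
invented_labels = [ensemble_predict(ensemble, r) for r in invented]

# 3. Re-train one single model on the artificially labelled data.
mimic = build_tree(invented, invented_labels, depth=6)

# Fidelity: how often the single model agrees with the ensemble on fresh points.
fresh = random_rows(1000, rng)
fidelity = sum(predict(mimic, r) == ensemble_predict(ensemble, r)
               for r in fresh) / len(fresh)
print(f"fidelity of the mimic to the ensemble: {fidelity:.3f}")
```

The single tree `mimic` is written over the original attributes only, so it can be inspected directly, whereas the ensemble needs all 15 trees plus the voting step at prediction time. Because the invented examples are labelled by the ensemble rather than by hand, the sketch holds back no extra labelled data.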


multi-classifier systems; stacking; decision trees; comprehensibility in machine learning; rule extraction





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • V. Estruch (1)
  • C. Ferri (1)
  • J. Hernández-Orallo (1)
  • M. J. Ramírez-Quintana (1)

  1. DSIC, Univ. Politècnica de València, Valencia, Spain
