
Towards an optimally pruned classifier ensemble

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Ensemble pruning is an important area of research in multiple classifier systems. Reducing ensemble size by selecting diverse and accurate classifiers from a given pool is a popular strategy for improving ensemble performance. In this paper, we present the Accu-Prune (AP) algorithm, a majority voting ensemble method that uses accuracy ordering and a reduced error pruning approach to identify an optimal ensemble from a given pool of classifiers. At each step, the ensemble is extended by adding the next two lower-accuracy classifiers, implicitly adding diversity to the ensemble. The proposed approach closely mimics the results of a Brute Force (BF) search for the optimal ensemble while reducing the search space drastically. We propose that the quality of an ensemble is determined by two factors: size and accuracy. Ideally, smaller ensembles are preferable, quality-wise, to larger ensembles of the same accuracy. Based on this notion, we design a deficit function that examines the difference in performance and size between two arbitrary ensembles to quantify their quality differential. Experiments have been carried out on 25 UCI datasets, and the AP algorithm has been compared with BF search and other pruning algorithms. The deficit function is used to compare AP with BF search and with EPIC, a well-known pruning algorithm. The relevant statistical tests reveal that the generalization capability of the AP algorithm is better than forward search and backward elimination, comparable to BF search, and slightly inferior to EPIC. However, since EPIC ensembles are significantly larger, the quality differential between AP and EPIC ensembles is not significant. Thus, for limited-memory applications with tolerance for a small amount of error, AP ensembles may be more appropriate.
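The abstract only outlines the procedure, so the following is a minimal sketch of the accuracy-ordered, reduced-error-pruning search it describes, assuming scikit-learn-style classifiers and integer-coded class labels. All names here (`accu_prune`, `majority_vote`, the pruning-set interface) are ours, not the authors'; the exact selection rule and the definition of the deficit function appear only in the full paper.

```python
# A minimal sketch of the Accu-Prune (AP) idea described in the
# abstract. Illustrative only, not the paper's code.
import numpy as np

def majority_vote(members, X):
    """Plurality vote of the member classifiers on X
    (class labels assumed to be non-negative integers)."""
    preds = np.stack([clf.predict(X) for clf in members])    # (m, n)
    return np.array([np.bincount(col).argmax() for col in preds.T])

def accu_prune(pool, X_prune, y_prune):
    """Reduced error pruning over accuracy-ordered, odd-sized prefixes:
    start from the single most accurate classifier, grow the ensemble
    two classifiers at a time, and keep the candidate that scores best
    on the held-out pruning set."""
    def acc(members):
        return np.mean(majority_vote(members, X_prune) == y_prune)

    # Order the pool by individual accuracy on the pruning set.
    indiv = [np.mean(clf.predict(X_prune) == y_prune) for clf in pool]
    ranked = [pool[i] for i in np.argsort(indiv)[::-1]]      # best first

    members, best, best_acc = ranked[:1], ranked[:1], -1.0
    k = 1
    while k <= len(ranked):
        a = acc(members)
        if a > best_acc:
            best, best_acc = list(members), a
        members = members + ranked[k:k + 2]   # add the next TWO weaker
        k += 2                                # classifiers: size stays odd
    return best, best_acc
```

For a pool of 21 classifiers this scores only the 11 accuracy-ordered, odd-sized prefixes, as opposed to the 2^20 odd-sized subsets a brute-force search would have to evaluate; that is the drastic reduction in search space the abstract refers to.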


Notes

  1. This lattice is ordered by the subset relation (\(\subset \)) with the infimum as the null ensemble and the supremum as the complete pool.

  2. Since majority voting is used as the combiner function, the even-order levels of the lattice (search space) can be skipped, which speeds the search up somewhat, though its computational complexity remains of the same order (see the sketch following these notes).

  3. If a classifier predicts the class label of an instance correctly, an oracle output of 1 is generated, and −1 otherwise (see the sketch following these notes).

  4. We restricted this study to pool sizes 11 and 21, since BF search on a pool of size 31 was not feasible owing to prohibitively long execution times.
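Notes 1–3 can be made concrete with two small helper sketches; these are our own illustrations, not code from the paper.

```python
# Sketches illustrating Notes 1-3 (ours, not the authors' code).
from itertools import combinations
import numpy as np

def odd_level_subsets(pool):
    """Enumerate the BF search space: the odd levels of the subset
    lattice of Note 1 (even levels are skipped, per Note 2). An
    n-classifier pool has 2^(n-1) odd-sized subsets, so n = 31 already
    means 2^30 (about 1.07e9) candidate ensembles, which is why Note 4
    stops the BF comparison at pool size 21."""
    for k in range(1, len(pool) + 1, 2):       # levels 1, 3, 5, ...
        yield from combinations(pool, k)

def oracle_output(clf, X, y):
    """Oracle outputs of Note 3: +1 where the classifier predicts the
    true label of an instance, -1 where it does not."""
    return np.where(clf.predict(X) == y, 1, -1)

# A pool of 5 classifier ids has 2^4 = 16 odd-sized sub-ensembles.
assert sum(1 for _ in odd_level_subsets(range(5))) == 16
```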


Acknowledgments

We thank the anonymous reviewers for the time they spent on the manuscript; their valuable comments have led to appreciable improvements.

Author information

Correspondence to Manju Bhardwaj.

About this article

Cite this article

Bhardwaj, M., Bhatnagar, V. Towards an optimally pruned classifier ensemble. Int. J. Mach. Learn. & Cyber. 6, 699–718 (2015). https://doi.org/10.1007/s13042-014-0303-8
