Ensembles of Decision Trees for Imbalanced Data

  • Conference paper
Multiple Classifier Systems (MCS 2011)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6713)

Included in the following conference series: International Workshop on Multiple Classifier Systems (MCS)

Abstract

Ensembles of decision trees are considered for imbalanced datasets. Conventional decision trees (C4.5) and trees designed for imbalanced data (CCPDT: Class Confidence Proportion Decision Tree) are used as base classifiers. Ensemble methods based on undersampling and oversampling, designed specifically for imbalanced data, are considered, as are conventional ensemble methods that are not specific to imbalanced data: Bagging, Random Subspaces, AdaBoost, Real AdaBoost, MultiBoost and Rotation Forest. The results show that the choice of ensemble method is much more important than the type of decision tree used as the base classifier. Rotation Forest is the ensemble method with the best results. Among the decision tree methods, CCPDT shows no advantage.
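
As a rough illustration of the kind of comparison the paper performs, the sketch below pits conventional ensembles (Bagging, AdaBoost) against undersampling-based ensembles on a synthetic imbalanced problem, scoring by AUC. This is a minimal sketch, not the authors' experimental setup: it assumes scikit-learn together with the third-party imbalanced-learn package, whose BalancedBaggingClassifier and RUSBoostClassifier stand in for the undersampling-based ensembles studied; Rotation Forest and CCPDT have no implementation in these libraries.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from imblearn.ensemble import BalancedBaggingClassifier, RUSBoostClassifier

    # Synthetic two-class problem with a roughly 5% minority class.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.95, 0.05], random_state=1)

    ensembles = {
        "Bagging": BaggingClassifier(
            DecisionTreeClassifier(), n_estimators=50, random_state=1),
        "AdaBoost": AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=1), n_estimators=50,
            random_state=1),
        # Each tree is trained on a sample rebalanced by random undersampling.
        "BalancedBagging": BalancedBaggingClassifier(
            DecisionTreeClassifier(), n_estimators=50, random_state=1),
        # Boosting with random undersampling at each round (cf. RUSBoost [6]).
        "RUSBoost": RUSBoostClassifier(
            DecisionTreeClassifier(max_depth=1), n_estimators=50,
            random_state=1),
    }

    # AUC is the usual performance measure for imbalanced problems [26].
    for name, clf in ensembles.items():
        scores = cross_val_score(clf, X, y, scoring="roc_auc", cv=5)
        print(f"{name:16s} AUC = {scores.mean():.3f} +/- {scores.std():.3f}")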

This work was supported by the Project TIN2008-03151 of the Spanish Ministry of Education and Science.


References

  1. He, H., Garcia, E.A.: Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21, 1263–1284 (2009)

  2. Cieslak, D., Chawla, N.: Learning decision trees for unbalanced data. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part I. LNCS (LNAI), vol. 5211, pp. 241–256. Springer, Heidelberg (2008)

  3. Liu, W., Chawla, S., Cieslak, D.A., Chawla, N.V.: A Robust Decision Tree Algorithm for Imbalanced Data Sets. In: 10th SIAM International Conference on Data Mining, SDM 2010, pp. 766–777. SIAM, Philadelphia (2010)

  4. Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)

  5. Opitz, D., Maclin, R.: Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research 11, 169–198 (1999)

  6. Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 40, 185–197 (2010)

  7. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: Improving prediction of the minority class in boosting. In: Knowledge Discovery in Databases: PKDD 2003. LNCS, vol. 2838, pp. 107–119. Springer, Heidelberg (2003)

  8. Liu, X.Y., Wu, J., Zhou, Z.H.: Exploratory Undersampling for Class-Imbalance Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39, 539–550 (2009)

  9. Hoens, T., Chawla, N.: Generating Diverse Ensembles to Counter the Problem of Class Imbalance. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS, vol. 6119, pp. 488–499. Springer, Heidelberg (2010)

  10. Flach, P.: The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In: Proc. 20th International Conference on Machine Learning (ICML 2003), pp. 194–201. AAAI Press, Menlo Park (2003)

  11. Chawla, N., Cieslak, D., Hall, L., Joshi, A.: Automatically countering imbalance and its empirical relationship to cost. Data Mining and Knowledge Discovery 17, 225–252 (2008)

  12. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 832–844 (1998)

  13. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55, 119–139 (1997)

  14. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–407 (2000)

  15. Webb, G.I.: Multiboosting: A technique for combining boosting and wagging. Machine Learning 40, 159–196 (2000)

  16. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)

  17. Rodríguez, J.J., Kuncheva, L.I., Alonso, C.J.: Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 1619–1630 (2006)

  18. Kuncheva, L.I., Rodríguez, J.J.: An experimental study on rotation forest ensembles. In: Haindl, M., Kittler, J., Roli, F. (eds.) MCS 2007. LNCS, vol. 4472, pp. 459–468. Springer, Heidelberg (2007)

  19. Frank, A., Asuncion, A.: UCI machine learning repository (2010), http://archive.ics.uci.edu/ml

  20. Olszewski, R.T.: Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data. PhD thesis, Computer Science Department, Carnegie Mellon University (2001)

  21. Kuncheva, L.I., Hadjitodorov, S.T., Todorova, L.P.: Experimental comparison of cluster ensemble methods. In: FUSION 2006, Florence, Italy (2006)

  22. Hsu, C.W., Chang, C.C., Lin, C.J.: A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University (2003)

  23. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11 (2009)

  24. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)

  25. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006)

  26. Fawcett, T.: An introduction to ROC analysis. Pattern Recognition Letters 27, 861–874 (2006)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rodríguez, J.J., Díez-Pastor, J.F., García-Osorio, C. (2011). Ensembles of Decision Trees for Imbalanced Data. In: Sansone, C., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2011. Lecture Notes in Computer Science, vol 6713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21557-5_10

  • DOI: https://doi.org/10.1007/978-3-642-21557-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21556-8

  • Online ISBN: 978-3-642-21557-5
