Ensembles of Decision Trees for Imbalanced Data

  • Juan J. Rodríguez
  • José F. Díez-Pastor
  • César García-Osorio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6713)

Abstract

Ensembles of decision trees are studied for imbalanced datasets. Conventional decision trees (C4.5) and trees designed for imbalanced data (CCPDT: Class Confidence Proportion Decision Tree) are used as base classifiers. Ensemble methods for imbalanced data, based on undersampling and oversampling, are considered, together with conventional ensemble methods that are not specific to imbalanced data: Bagging, Random Subspaces, AdaBoost, Real AdaBoost, MultiBoost and Rotation Forest. The results show that the choice of ensemble method is much more important than the type of decision tree used as the base classifier. Rotation Forest is the ensemble method with the best results, while CCPDT shows no advantage over C4.5 as a base classifier.
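To make the kind of configuration compared in the study concrete, the sketch below builds an ensemble of decision trees in which each member is trained on a balanced sample obtained by undersampling the majority class, and the ensemble is scored by AUC. This is a minimal sketch, not the authors' implementation: it assumes scikit-learn and a synthetic imbalanced dataset, and the dataset, tree settings and ensemble size (n_trees) are illustrative choices, not those used in the paper.

```python
# Minimal sketch (see assumptions above): an ensemble of decision trees where each
# tree is trained on a balanced sample obtained by undersampling the majority
# class; the trees' probability estimates are averaged and scored by AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced problem: roughly 95% majority (class 0), 5% minority (class 1).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 50                      # illustrative ensemble size
minority = np.where(y_tr == 1)[0]
majority = np.where(y_tr == 0)[0]

trees = []
for _ in range(n_trees):
    # Balanced bootstrap: draw as many majority examples as there are minority examples.
    maj_idx = rng.choice(majority, size=len(minority), replace=True)
    min_idx = rng.choice(minority, size=len(minority), replace=True)
    idx = np.concatenate([maj_idx, min_idx])
    trees.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

# Average the minority-class probability estimates and evaluate with the area
# under the ROC curve, a standard metric for imbalanced problems.
scores = np.mean([t.predict_proba(X_te)[:, 1] for t in trees], axis=0)
print("AUC: %.3f" % roc_auc_score(y_te, scores))
```

The same loop can be repeated with other base learners (e.g. C4.5-style or CCPDT trees) and with the other ensemble schemes named in the abstract (Bagging, Random Subspaces, the boosting variants, Rotation Forest) to carry out the type of comparison reported in the paper.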

Keywords

Imbalanced data · Decision Trees · Bagging · Random Subspaces · Boosting · Rotation Forest

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Juan J. Rodríguez¹
  • José F. Díez-Pastor¹
  • César García-Osorio¹

  1. University of Burgos, Spain
