An Effective Ensemble Method for Multi-class Classification and Regression for Imbalanced Data

  • Tahira Alam
  • Chowdhury Farhan Ahmed
  • Sabit Anwar Zahin
  • Muhammad Asif Hossain Khan
  • Maliha Tashfia Islam
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10933)


In data mining, classification and regression play a vital role because they are useful in many real-life domains. Most real-life data suffer from the data imbalance problem, which hinders the performance of standard algorithms. A number of methods have been introduced for imbalanced data classification; however, most of them are designed for binary class imbalance problems. Furthermore, they suffer from problems such as loss of useful information, a likelihood of overfitting, and unexpected mistakes. The data imbalance problem also exists in regression analysis, although very few existing methods consider it. Hence, we propose an effective recursive ensemble method for multi-class imbalanced data classification. We also extend our method to address the data imbalance problem in regression. Extensive performance analyses show that the proposed approach achieves high performance in multi-class classification on class-imbalanced data and in regression analysis on skewed or imbalanced data. The experimental results also show that our method outperforms various existing methods for imbalanced classification and regression.


Classification, Multi-class classification, Regression, Imbalance problem



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Tahira Alam (1)
  • Chowdhury Farhan Ahmed (1)
  • Sabit Anwar Zahin (1)
  • Muhammad Asif Hossain Khan (1)
  • Maliha Tashfia Islam (1)
  1. Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh
