Extending Bagging for Imbalanced Data

  • Jerzy Błaszczyński
  • Jerzy Stefanowski
  • Łukasz Idkowiak
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 226)

Abstract

Various modifications of bagging for class-imbalanced data are discussed. An experimental comparison of known bagging modifications shows that integrating bagging with undersampling is more powerful than integrating it with oversampling. We introduce Local-and-Over-All Balanced bagging, in which the probability of sampling an example is tuned according to the class distribution inside its neighbourhood. Experiments indicate that this proposal is competitive with the best undersampling bagging extensions.
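
The idea of tuning each example's sampling probability by the class distribution in its neighbourhood can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: the function names `neighbourhood_weights` and `weighted_bootstrap`, the use of k nearest neighbours with Euclidean distance, and the specific global and local weighting terms. It should not be read as the authors' algorithm.

```python
import numpy as np

def neighbourhood_weights(X, y, k=5, minority_label=1):
    """Hypothetical sampling probabilities: a global class-balancing term
    multiplied by a local term based on each example's k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; a point is not its own neighbour.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    # Fraction of the k neighbours that belong to the opposite class.
    opposite = (y[nn] != y[:, None]).mean(axis=1)
    is_min = y == minority_label
    # Global term (assumed): inverse class size, so each class starts with equal mass.
    global_w = np.where(is_min, 1.0 / is_min.sum(), 1.0 / (~is_min).sum())
    # Local term (assumed): boost minority examples surrounded by the majority,
    # damp majority examples that sit among minority examples.
    local_w = np.where(is_min, 1.0 + opposite, 1.0 - 0.5 * opposite)
    w = global_w * local_w
    return w / w.sum()

def weighted_bootstrap(X, y, rng, k=5, minority_label=1):
    """Draw one bootstrap sample (as indices) using the weights above."""
    p = neighbourhood_weights(X, y, k, minority_label)
    return rng.choice(len(y), size=len(y), replace=True, p=p)
```

In a bagging ensemble, one such index sample would be drawn per base classifier, for example `idx = weighted_bootstrap(X, y, np.random.default_rng(0))`, and the classifier trained on `X[idx]`, `y[idx]`.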

Keywords

Majority Class, Bootstrap Sample, Class Distribution, Random Subset, Minority Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Jerzy Błaszczyński (1), corresponding author
  • Jerzy Stefanowski (1)
  • Łukasz Idkowiak (1)
  1. Institute of Computing Science, Poznań University of Technology, Poznań, Poland
