Abstract

This paper discusses modifications of bagging for class-imbalanced data. An experimental comparison of known bagging modifications shows that integrating bagging with undersampling is more effective than integrating it with oversampling. We introduce Local-and-Over-All Balanced bagging, in which the probability of sampling an example is tuned according to the class distribution inside its neighbourhood. Experiments indicate that this proposal is competitive with the best undersampling extensions of bagging.
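The abstract does not state the weighting formula, so the following Python sketch is only an illustration of the general idea of neighbourhood-tuned bootstrap sampling, not the method from the paper. The function names, the brute-force k-nearest-neighbour search, the default `k=5`, and the particular weight formula (a minority example gets more weight the more majority neighbours it has; every majority example gets a weight scaled down by the global class ratio) are all assumptions made for illustration.

```python
import numpy as np


def neighbourhood_sampling_weights(X, y, minority_label=1, k=5):
    """Return one sampling probability per example (hypothetical weighting).

    Assumes binary classes, at least one majority example, and k < len(y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    n_min = int(is_min.sum())
    n_maj = len(y) - n_min

    # Brute-force pairwise squared Euclidean distances; fine for small data.
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour

    # Indices of the k nearest neighbours of every example.
    nn = np.argsort(dist, axis=1)[:, :k]

    # Local factor: fraction of majority examples among the k neighbours.
    maj_frac = (~is_min)[nn].mean(axis=1)

    # Assumed formula: the local factor raises minority weights, while the
    # global class ratio lowers all majority weights.
    weights = np.where(is_min, 0.5 * (1.0 + maj_frac), 0.5 * n_min / n_maj)
    return weights / weights.sum()


def draw_bootstrap_indices(X, y, rng, **kwargs):
    """Draw one bootstrap sample of indices using the weights above."""
    p = neighbourhood_sampling_weights(X, y, **kwargs)
    return rng.choice(len(y), size=len(y), replace=True, p=p)
```

In a bagging loop, `draw_bootstrap_indices(X, y, np.random.default_rng(0), k=5)` would be called once per base classifier, and each classifier would be trained on the rows selected by the returned indices.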

Author information

Corresponding author

Correspondence to Jerzy Błaszczyński.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Błaszczyński, J., Stefanowski, J., Idkowiak, Ł. (2013). Extending Bagging for Imbalanced Data. In: Burduk, R., Jackowski, K., Kurzynski, M., Wozniak, M., Zolnierek, A. (eds) Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013. Advances in Intelligent Systems and Computing, vol 226. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00969-8_26

  • DOI: https://doi.org/10.1007/978-3-319-00969-8_26

  • Publisher Name: Springer, Heidelberg

  • Print ISBN: 978-3-319-00968-1

  • Online ISBN: 978-3-319-00969-8

  • eBook Packages: Engineering, Engineering (R0)
