The Usefulness of Roughly Balanced Bagging for Complex and High-Dimensional Imbalanced Data

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9607)

Abstract

Under-sampling generalizations of bagging ensembles improve the classification of imbalanced data more than other ensembles do, and Roughly Balanced Bagging is the most accurate among them. In this paper, we experimentally study the properties that may explain its good performance. The experimental results show that it can be constructed with a small number of component classifiers; however, these components are less diversified than those of standard bagging. Moreover, its good performance comes from its ability to recognize unsafe types of minority examples better than other ensembles. We also show how to improve its performance by integrating bootstrap sampling with the random selection of attributes.
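As an illustration only, the sketch below shows how a single bag might be drawn when bootstrap sampling is combined with random attribute selection. It is not taken from the paper. It assumes the sampling scheme as Roughly Balanced Bagging is commonly described in the literature: the minority class is bootstrapped at its original size, and the number of majority examples is drawn from a negative binomial distribution so that bags are balanced on average. The function names, the parameter `n_selected`, and the choice to sample the majority class with replacement are all hypothetical.

```python
import random


def negative_binomial(n_successes, p, rng):
    """Count the failures observed before n_successes successes occur."""
    failures = 0
    successes = 0
    while successes < n_successes:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def draw_bag(minority_idx, majority_idx, n_features, n_selected, rng):
    """Draw row indices and attribute indices for one component classifier.

    Assumption: the majority sample size follows a negative binomial
    distribution with n = |minority| and p = 0.5, so its expected value
    equals the minority sample size.
    """
    n_min = len(minority_idx)
    # Bootstrap the minority class at its original size.
    minority_bag = [rng.choice(minority_idx) for _ in range(n_min)]
    # The majority sample size varies from bag to bag.
    n_maj = negative_binomial(n_min, 0.5, rng)
    majority_bag = [rng.choice(majority_idx) for _ in range(n_maj)]
    # Random attribute selection: a subset of columns without replacement.
    features = sorted(rng.sample(range(n_features), n_selected))
    return minority_bag + majority_bag, features


rng = random.Random(0)
minority = list(range(10))        # hypothetical minority row indices
majority = list(range(10, 100))   # hypothetical majority row indices
rows, features = draw_bag(minority, majority, n_features=20, n_selected=5, rng=rng)
```

Each component classifier would then be trained on `rows` restricted to the columns in `features`. How many attributes to select per bag is a tuning decision that this sketch leaves open.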

Acknowledgements

The research was supported by NCN grant DEC-2013/11/B/ST6/00963.

Author information

Corresponding author

Correspondence to Jerzy Stefanowski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lango, M., Stefanowski, J. (2016). The Usefulness of Roughly Balanced Bagging for Complex and High-Dimensional Imbalanced Data. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science, vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_7

  • DOI: https://doi.org/10.1007/978-3-319-39315-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39314-8

  • Online ISBN: 978-3-319-39315-5

  • eBook Packages: Computer Science, Computer Science (R0)
