Evaluating Difficulty of Multi-class Imbalanced Data

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10352)

Abstract

Multi-class imbalanced classification is more difficult than its binary counterpart. Besides the typical data difficulty factors, one should also consider the complexity of relations among classes. This paper introduces a new method for examining the characteristics of multi-class data. It is based on analyzing the neighbourhood of minority class examples and on additional information about similarities between classes. The experimental study shows that the method is able to identify the difficulty of a class distribution and that the estimated safe levels of minority examples are related to the prediction errors of standard classifiers.
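
As a rough illustration of the idea of a neighbourhood-based safe level, the sketch below scores each example by its k nearest neighbours, where a neighbour of the same class counts fully and a neighbour of another class counts according to a user-supplied class similarity. The abstract does not give the exact formula, so the weighting scheme, the function name `safe_levels`, the `similarity` dictionary, the Euclidean distance, and the toy data are all assumptions made for this example, not the authors' definition.

```python
from math import dist


def safe_levels(X, y, similarity, k=5):
    """Hypothetical similarity-weighted safe level for every example.

    X          -- list of numeric feature tuples
    y          -- list of class labels, aligned with X
    similarity -- dict mapping (class_of_example, class_of_neighbour) to a
                  value in [0, 1]; missing pairs are treated as 0 (assumption)
    k          -- neighbourhood size
    """
    levels = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # The k nearest other examples by Euclidean distance.
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        # Same-class neighbours contribute 1; others contribute the similarity
        # between the example's class and the neighbour's class.
        total = sum(
            1.0 if y[j] == yi else similarity.get((yi, y[j]), 0.0)
            for j in neighbours
        )
        levels.append(total / len(neighbours))
    return levels


# Toy data: one majority class and two minority classes (illustrative only).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
y = ["maj", "maj", "maj", "min1", "min1", "min2"]
# Assumed similarity: the two minority classes are half-similar to each other.
similarity = {("min1", "min2"): 0.5, ("min2", "min1"): 0.5}

levels = safe_levels(X, y, similarity, k=2)
for label, level in zip(y, levels):
    print(label, level)
```

Under this sketch, a value near 1 suggests an example surrounded by its own or similar classes, and a value near 0 suggests an example surrounded by dissimilar classes. In practice the distance function and neighbourhood size would need to be chosen for the data at hand.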

Notes

  1. Note that our proposal of similarity between classes does not directly model misclassifications between minority classes; these could alternatively be handled by a separate approach based on costs of misclassification between classes.

  2. Refer to [9] for details of the neighborhood construction, recommended distance functions, and neighborhood size tuning.

  3. Artificial datasets and more detailed results with additional evaluation measures are available at www.cs.put.poznan.pl/mlango/publications/multi-typology.html.

References

  1. Błaszczyński, J., Stefanowski, J.: Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150A, 184–203 (2015)

  2. Abdi, L., Hashemi, S.: To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans. Knowl. Data Eng. 28(1), 238–251 (2016)

  3. Branco, P., Torgo, L., Ribeiro, R.: A survey of predictive modeling under imbalanced distributions. ACM Comput. Surv. (CSUR) 49(2), 31 (2016)

  4. Fernandez, A., Lopez, V., Galar, M., Jesus, M., Herrera, F.: Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches. Knowl. Based Syst. 42, 97–110 (2013)

  5. He, H., Ma, Y. (eds.): Imbalanced Learning. Foundations, Algorithms and Applications. IEEE - Wiley, Hoboken (2013)

  6. Japkowicz, N., Stephen, S.: Class imbalance problem: a systematic study. Intell. Data Anal. J. 6(5), 429–450 (2002)

  7. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Progress Artif. Intell. 5, 221–232 (2016)

  8. Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.B. (eds.) HAIS 2012. LNCS, vol. 7209. Springer, Heidelberg (2012). doi:10.1007/978-3-642-28931-6_14

  9. Napierala, K., Stefanowski, J.: Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. 46(3), 563–597 (2016)

  10. Sáez, J., Krawczyk, B., Wozniak, M.: Analyzing the oversampling of different classes and types in multi-class imbalanced data. Pattern Recogn. 57, 164–178 (2016)

  11. Stefanowski, J.: Dealing with data difficulty factors while learning from imbalanced data. In: Mielniczuk, J., Matwin, S. (eds.) Challenges in Computational Statistics and Data Mining, pp. 333–363. Springer, Heidelberg (2016)

  12. Wang, S., Yao, X.: Multiclass imbalance problems: analysis and potential solutions. IEEE Trans. Syst. Man Cybern. Part B 42(4), 1119–1130 (2012)

  13. Wojciechowski, S., Wilk, S.: The generator of synthetic multi-dimensional data. Poznan University of Technology Report RB-16/14 (2014)

Acknowledgment

The research was funded by the Polish National Science Center, grant no. DEC-2013/11/B/ST6/00963. The work of the last author was also partially supported by a DS internal grant of PUT.

Corresponding author

Correspondence to Mateusz Lango.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lango, M., Napierala, K., Stefanowski, J. (2017). Evaluating Difficulty of Multi-class Imbalanced Data. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_31

  • DOI: https://doi.org/10.1007/978-3-319-60438-1_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60437-4

  • Online ISBN: 978-3-319-60438-1

  • eBook Packages: Computer Science, Computer Science (R0)
