Evaluating Difficulty of Multi-class Imbalanced Data

Lango, Mateusz; Napierala, Krystyna; Stefanowski, Jerzy

doi:10.1007/978-3-319-60438-1_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10352)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Multi-class imbalanced classification is more difficult than its binary counterpart. Besides typical data difficulty factors, one should also consider the complexity of relations among classes. This paper introduces a new method for examining the characteristics of multi-class data. It is based on analyzing the neighbourhood of the minority class examples and on additional information about similarities between classes. The experimental study shows that this method is able to identify the difficulty of a class distribution and that the estimated safe levels of minority examples are related to the prediction errors of standard classifiers.
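The abstract does not state how a safe level is computed, so the following is only a minimal sketch of one plausible reading: the safe level of an example is taken to be the average similarity between its own class and the classes of its k nearest neighbours. The function name `safe_levels`, the Euclidean distance, the value of `k`, the toy data and the similarity matrix `mu` are all assumptions made for illustration, not the authors' definitions.

```python
import numpy as np

def safe_levels(X, y, similarity, k=5):
    """Average class similarity between each example and its k nearest neighbours.

    similarity[a][b] in [0, 1] states how acceptable a neighbour of class b is
    for an example of class a (similarity[a][a] == 1). The matrix is supplied
    by the user; it is an assumption, not something estimated here.
    Class labels in y must be integers 0..C-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = np.asarray(similarity, dtype=float)
    # Pairwise Euclidean distances; an example is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbours of every example.
    nn = np.argsort(dist, axis=1)[:, :k]
    # Look up the similarity of each (own class, neighbour class) pair and average.
    return mu[y[:, None], y[nn]].mean(axis=1)

# Hypothetical toy data: class 0 is the majority, classes 1 and 2 are minorities.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [0.5, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 2]
# Assumed similarities: the two minority classes are partly similar to each
# other (0.5) and not similar to the majority class (0).
mu = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.5],
      [0.0, 0.5, 1.0]]
levels = safe_levels(X, y, mu, k=3)
print(levels.round(2))
```

In this toy setting the single class-2 example lies in the middle of the majority class, so all of its neighbours have similarity 0 and its safe level is 0. Raising `mu[2][0]` would raise that value, which is how information about similarities between classes could soften the assessment of an example surrounded by a related class.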

Notes

1. Note that our proposal of similarity between classes does not directly model misclassifications between minority classes; these could alternatively be handled by a separate approach based on costs of misclassification between classes.
2. Refer to [9] for details of the neighborhood construction, recommended distance functions and neighborhood size tuning.
3. Artificial datasets and more detailed results with additional evaluation measures are available at www.cs.put.poznan.pl/mlango/publications/multi-typology.html.

References

1. Błaszczyński, J., Stefanowski, J.: Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150A, 184–203 (2015)
2. Abdi, L., Hashemi, S.: To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans. Knowl. Data Eng. 28(1), 238–251 (2016)
3. Branco, P., Torgo, L., Ribeiro, R.: A survey of predictive modeling under imbalanced distributions. ACM Comput. Surv. (CSUR) 49(2), 31 (2016)
4. Fernandez, A., Lopez, V., Galar, M., Jesus, M., Herrera, F.: Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches. Knowl. Based Syst. 42, 97–110 (2013)
5. He, H., Ma, Y. (eds.): Imbalanced Learning. Foundations, Algorithms and Applications. IEEE - Wiley, Hoboken (2013)
6. Japkowicz, N., Stephen, S.: Class imbalance problem: a systematic study. Intell. Data Anal. J. 6(5), 429–450 (2002)
7. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Progress Artif. Intell. 5, 221–232 (2016)
8. Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.B. (eds.) HAIS 2012. LNCS, vol. 7209. Springer, Heidelberg (2012). doi:10.1007/978-3-642-28931-6_14
9. Napierala, K., Stefanowski, J.: Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. 46(3), 563–597 (2016)
10. Sáez, J., Krawczyk, B., Woźniak, M.: Analyzing the oversampling of different classes and types in multi-class imbalanced data. Pattern Recogn. 57, 164–178 (2016)
11. Stefanowski, J.: Dealing with data difficulty factors while learning from imbalanced data. In: Mielniczuk, J., Matwin, S. (eds.) Challenges in Computational Statistics and Data Mining, pp. 333–363. Springer, Heidelberg (2016)
12. Wang, S., Yao, X.: Multiclass imbalance problems: analysis and potential solutions. IEEE Trans. Syst. Man Cybern. Part B 42(4), 1119–1130 (2012)
13. Wojciechowski, S., Wilk, S.: The generator of synthetic multi-dimensional data. Poznan University of Technology Report RB-16/14 (2014)

Acknowledgment

The research was funded by the Polish National Science Center, grant no. DEC-2013/11/B/ST6/00963. The work of the last author was also partially supported by a DS internal grant of PUT.

Author information

Authors and Affiliations

Institute of Computing Science, Poznan University of Technology, Poznań, Poland
Mateusz Lango, Krystyna Napierala & Jerzy Stefanowski

Corresponding author

Correspondence to Mateusz Lango.

Editor information

Editors and Affiliations

Warsaw University of Technology, Warsaw, Poland
Marzena Kryszkiewicz
University of Bari Aldo Moro, Bari, Italy
Annalisa Appice
Institute of Informatics, University of Warsaw, Warsaw, Poland
Dominik Ślęzak
Faculty of Electronics & Information, Warsaw University of Technology, Warsaw, Poland
Henryk Rybinski
Institute of Mathematics, Warsaw University, Warsaw, Poland
Andrzej Skowron
Department of Computer Science, University of North Carolina at Charlotte, North Carolina, USA
Zbigniew W. Raś

Cite this paper

Lango, M., Napierala, K., Stefanowski, J. (2017). Evaluating Difficulty of Multi-class Imbalanced Data. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_31

DOI: https://doi.org/10.1007/978-3-319-60438-1_31
Published: 14 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60437-4
Online ISBN: 978-3-319-60438-1
eBook Packages: Computer Science, Computer Science (R0)
