Abstract
Multi-class imbalanced classification is more difficult than its binary counterpart. Besides the typical data difficulty factors, one should also consider the complexity of the relations among classes. This paper introduces a new method for examining the characteristics of multi-class data. It is based on analyzing the neighbourhood of minority class examples and on additional information about similarities between classes. The experimental study shows that the method can identify the difficulty of a class distribution and that the estimated safe levels of minority examples are related to the prediction errors of standard classifiers.
Notes
1. Note that in our proposal of similarity between classes, we do not directly model misclassifications between minority classes; these could alternatively be handled by another approach based on the costs of misclassifications between classes.
2. Refer to [9] for details of the neighborhood construction, recommended distance functions and neighborhood size tuning.
3. Artificial datasets and more detailed results with additional evaluation measures are available at www.cs.put.poznan.pl/mlango/publications/multi-typology.html.
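The idea of combining a neighbourhood analysis with class similarities, as summarised in the abstract and in Notes 1 and 2, can be illustrated with a minimal sketch. It is not the authors' implementation: the weighting rule (a same-class neighbour counts as 1, any other neighbour counts by its similarity to the example's class, and the sum is divided by the number of neighbours), the Euclidean distance, the brute-force search, and the names `safe_levels` and `similarity` are all assumptions made for illustration only.

```python
import math


def safe_levels(points, labels, similarity, k=5):
    """Return a similarity-weighted safe level in [0, 1] for every example.

    Assumed rule (hypothetical, for illustration): average over the k nearest
    neighbours of 1.0 for a same-class neighbour, or similarity[(c, nc)] for a
    neighbour of another class nc, defaulting to 0.0 when no pair is given.
    """
    levels = []
    for i, (p, c) in enumerate(zip(points, labels)):
        # Brute-force neighbour search with Euclidean distance, skipping the example itself.
        others = [(math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[0])
        neighbours = others[:k]
        if not neighbours:
            # A single-example dataset has no neighbourhood to analyse.
            levels.append(0.0)
            continue
        total = sum(
            1.0 if nc == c else similarity.get((c, nc), 0.0)
            for _, nc in neighbours
        )
        levels.append(total / len(neighbours))
    return levels


if __name__ == "__main__":
    # Toy data: one majority class and two minority classes (assumed labels).
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
    labels = ["maj", "maj", "maj", "min1", "min1", "min2"]
    # Assumed similarities: the two minority classes are treated as partly similar.
    similarity = {("min1", "min2"): 0.5, ("min2", "min1"): 0.5}
    for label, level in zip(labels, safe_levels(points, labels, similarity, k=3)):
        print(label, round(level, 3))
```

Under this assumed rule, a minority example surrounded only by majority examples receives a safe level of 0, while neighbours from a similar minority class raise the level partially. The neighbourhood size and distance function would need tuning as discussed in Note 2.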
References
Błaszczyński, J., Stefanowski, J.: Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150A, 184–203 (2015)
Abdi, L., Hashemi, S.: To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans. Knowl. Data Eng. 28(1), 238–251 (2016)
Branco, P., Torgo, L., Ribeiro, R.: A survey of predictive modeling under imbalanced distributions. ACM Comput. Surv. (CSUR) 49(2), 31 (2016)
Fernandez, A., Lopez, V., Galar, M., Jesus, M., Herrera, F.: Analysing the classification of imbalanced data sets with multiple classes: binarization techniques and ad-hoc approaches. Knowl. Based Syst. 42, 97–110 (2013)
He, H., Ma, Y. (eds.): Imbalanced Learning. Foundations, Algorithms and Applications. IEEE - Wiley, Hoboken (2013)
Japkowicz, N., Stephen, S.: Class imbalance problem: a systematic study. Intell. Data Anal. J. 6(5), 429–450 (2002)
Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Progress Artif. Intell. 5, 221–232 (2016)
Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.B. (eds.) HAIS 2012. LNCS, vol. 7209. Springer, Heidelberg (2012). doi:10.1007/978-3-642-28931-6_14
Napierala, K., Stefanowski, J.: Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. 46(3), 563–597 (2016)
Sáez, J., Krawczyk, B., Woźniak, M.: Analyzing the oversampling of different classes and types in multi-class imbalanced data. Pattern Recogn. 57, 164–178 (2016)
Stefanowski, J.: Dealing with data difficulty factors while learning from imbalanced data. In: Mielniczuk, J., Matwin, S. (eds.) Challenges in Computational Statistics and Data Mining, pp. 333–363. Springer, Heidelberg (2016)
Wang, S., Yao, X.: Multiclass imbalance problems: analysis and potential solutions. IEEE Trans. Syst. Man Cybern. Part B 42(4), 1119–1130 (2012)
Wojciechowski, S., Wilk, S.: The generator of synthetic multi-dimensional data. Poznan University of Technology Report RB-16/14 (2014)
Acknowledgment
The research was funded by the Polish National Science Center, grant no. DEC-2013/11/B/ST6/00963. The work of the last author was also partially supported by a DS internal grant of PUT.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lango, M., Napierala, K., Stefanowski, J. (2017). Evaluating Difficulty of Multi-class Imbalanced Data. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_31
DOI: https://doi.org/10.1007/978-3-319-60438-1_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60437-4
Online ISBN: 978-3-319-60438-1
eBook Packages: Computer Science, Computer Science (R0)