Local Characteristics of Minority Examples in Pre-processing of Imbalanced Data

Stefanowski, Jerzy; Napierała, Krystyna; Trzcielińska, Małgorzata

doi:10.1007/978-3-319-08326-1_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8502)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

We consider informed pre-processing methods for improving classifiers learned from class-imbalanced data. We discuss different ways of analyzing the characteristics of local distributions of examples in such data. We then experimentally compare the main informed pre-processing methods and show that identifying types of minority examples from their k-nearest neighbourhoods may help explain the differences in performance of these methods. Finally, we exploit information about the local neighbourhood to modify the oversampling ratio in a SMOTE-related method.
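As a rough illustration of the neighbourhood-based typing mentioned in the abstract, the sketch below counts how many of each minority example's k nearest neighbours belong to the minority class and maps that count to a type. The abstract does not give the type names, thresholds, or distance function; the four names ("safe", "borderline", "rare", "outlier"), the proportional thresholds, the Euclidean distance, and the helper names `same_class_neighbours` and `example_type` are all assumptions made here for illustration, not the authors' implementation.

```python
import math

def same_class_neighbours(points, labels, minority, k=5):
    """Return {index: count} giving, for each minority example, how many of
    its k nearest neighbours (Euclidean, self excluded) are also minority."""
    counts = {}
    for i, p in enumerate(points):
        if labels[i] != minority:
            continue
        # Sort all other examples by distance to example i.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        counts[i] = sum(labels[j] == minority for _, j in others[:k])
    return counts

def example_type(same_class, k=5):
    """Map a same-class neighbour count to a type name (assumed thresholds:
    at least 4/5 of neighbours -> safe, at least 2/5 -> borderline,
    at least one -> rare, none -> outlier)."""
    if 5 * same_class >= 4 * k:
        return "safe"
    if 5 * same_class >= 2 * k:
        return "borderline"
    if same_class > 0:
        return "rare"
    return "outlier"

# Toy data (hypothetical): three clustered minority points and one isolated
# minority point surrounded by majority points.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0), (13, 0), (11.5, 0)]
labels = ["min", "min", "min", "maj", "maj", "maj", "maj", "min"]

counts = same_class_neighbours(points, labels, minority="min", k=2)
types = {i: example_type(c, k=2) for i, c in counts.items()}
print(types)
```

Types obtained this way could then be used to weight how many synthetic examples a SMOTE-style method generates around each minority example. The abstract does not state how the oversampling ratio is modified, so that step is left out of the sketch.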

Author information

Authors and Affiliations

Institute of Computing Science, Poznań University of Technology, 60-965, Poznań, Poland
Jerzy Stefanowski, Krystyna Napierała & Małgorzata Trzcielińska

Editor information

Editors and Affiliations

Research Group PLIS: Programming, Logic and Intelligent Systems, Dept. of Communication, Business and Information Technologies, Roskilde University, Denmark
Troels Andreasen & Henning Christiansen
Department of Computer Science and Artificial Intelligence, CITIC, University of Granada, 18071, Granada, Spain
Juan-Carlos Cubero
University of North Carolina, 9201 University City Blvd, Charlotte, NC 28223, USA, and Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
Zbigniew W. Raś

About this paper

Cite this paper

Stefanowski, J., Napierała, K., Trzcielińska, M. (2014). Local Characteristics of Minority Examples in Pre-processing of Imbalanced Data. In: Andreasen, T., Christiansen, H., Cubero, JC., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2014. Lecture Notes in Computer Science, vol 8502. Springer, Cham. https://doi.org/10.1007/978-3-319-08326-1_13

DOI: https://doi.org/10.1007/978-3-319-08326-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08325-4
Online ISBN: 978-3-319-08326-1
eBook Packages: Computer Science, Computer Science (R0)
