Classification on Imbalanced Data Sets, Taking Advantage of Errors to Improve Performance

López-Chau, Asdrúbal; García-Lamont, Farid; Cervantes, Jair

doi:10.1007/978-3-319-22053-6_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9227)

Included in the following conference series: International Conference on Intelligent Computing

Abstract

Classification methods usually perform poorly when applied to imbalanced data sets. Several algorithms have been proposed in the last decade to overcome this problem. Most of them generate synthetic instances to balance the data set, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting.

In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects instances that are difficult for classification methods to predict correctly. These instances are then categorized as “noisy” or “secure”, where the former refers to instances most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each instance that is difficult to predict correctly. After our method is applied to the data sets, the area under the ROC curve (AUC) of the classifiers improves dramatically. We compare our method with state-of-the-art methods on more than 10 data sets.
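
The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how difficult instances are detected or how synthetic instances are generated, so the sketch assumes leave-one-out 1-nearest-neighbor errors as the detector and SMOTE-style interpolation toward same-class neighbors as the generator. All function names, the parameters `k` and `n_new`, and the toy data are hypothetical.

```python
import math
import random

def knn_indices(X, i, k):
    """Indices of the k nearest neighbors of X[i] (Euclidean), excluding i."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]

def find_hard_instances(X, y):
    """Phase 1a (assumed detector): instances misclassified by leave-one-out 1-NN."""
    return [i for i in range(len(X)) if y[knn_indices(X, i, 1)[0]] != y[i]]

def categorize(X, y, hard, k=3):
    """Phase 1b: 'noisy' if most of the k nearest neighbors belong to the
    opposite class, otherwise 'secure'."""
    labels = {}
    for i in hard:
        opposite = sum(1 for j in knn_indices(X, i, k) if y[j] != y[i])
        labels[i] = "noisy" if opposite > k / 2 else "secure"
    return labels

def oversample_hard(X, y, hard, n_new=2, k=3, seed=0):
    """Phase 2 (assumed generator): for each hard instance, create n_new points
    by interpolating toward one of its k nearest same-class neighbors."""
    rng = random.Random(seed)
    new_X, new_y = [], []
    for i in hard:
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: math.dist(X[i], X[j]))
        candidates = same[:k]
        if not candidates:
            continue  # no same-class neighbor to interpolate toward
        for _ in range(n_new):
            j = rng.choice(candidates)
            t = rng.random()  # position along the segment from X[i] to X[j]
            new_X.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
            new_y.append(y[i])
    return new_X, new_y

# Toy imbalanced data: class 0 is the majority, class 1 the minority.
# The last minority point sits inside the majority cluster.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.3],
     [1.0, 1.0], [1.1, 0.9], [0.25, 0.2]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

hard = find_hard_instances(X, y)
labels = categorize(X, y, hard, k=3)
new_X, new_y = oversample_hard(X, y, hard, n_new=2, k=3)
print(hard, labels, len(new_X))
```

How the “noisy” and “secure” labels influence generation is not stated in the abstract, so the sketch only reports them; a fuller version might, for instance, generate a different number of synthetic instances for each category.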

Notes

1. http://sci2s.ugr.es/keel/datasets.php

Author information

Authors and Affiliations

Centro Universitario UAEM, Universidad Autónoma del Estado de México, CP 55600, Zumpango, Estado de México, México
Asdrúbal López-Chau
Centro Universitario UAEM, Universidad Autónoma del Estado de México, 56159, Texcoco, Estado de México, México
Farid García-Lamont & Jair Cervantes

Corresponding author

Correspondence to Asdrúbal López-Chau.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Incheon, Korea (Republic of)
Kyungsook Han

About this paper

Cite this paper

López-Chau, A., García-Lamont, F., Cervantes, J. (2015). Classification on Imbalanced Data Sets, Taking Advantage of Errors to Improve Performance. In: Huang, D.-S., Han, K. (eds) Advanced Intelligent Computing Theories and Applications. ICIC 2015. Lecture Notes in Computer Science, vol 9227. Springer, Cham. https://doi.org/10.1007/978-3-319-22053-6_8

DOI: https://doi.org/10.1007/978-3-319-22053-6_8
Published: 13 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22052-9
Online ISBN: 978-3-319-22053-6
eBook Packages: Computer Science, Computer Science (R0)
