An Effective Sampling Strategy for Ensemble Learning with Imbalanced Data

  • Conference paper
  • In: Intelligent Computing Methodologies (ICIC 2017)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10363)

Abstract

Classification of imbalanced datasets is one of the challenges in machine learning and data mining. Traditional classifiers still struggle to handle minority-class instances. In this paper, we propose an effective method that applies sampling within an ensemble learning framework. It uses AdaBoost-SVM based on spectral clustering to boost classification performance, and it applies over-sampling and under-sampling guided by the instances misclassified by the ensemble. Compared with previous algorithms, the experimental results show that the proposed method is effective in dealing with imbalanced data in binary classification.
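As a rough illustration of the strategy the abstract describes (boosting with SVM base learners, then resampling guided by the instances the ensemble misclassifies), the minimal sketch below uses scikit-learn. It is a sketch under stated assumptions, not the authors' implementation: the spectral-clustering component is omitted, and the resampling rule (duplicate misclassified minority instances, randomly drop correctly classified majority instances) as well as all helper names and parameters are illustrative.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

def resample_by_errors(X, y, y_pred, minority_label, rng):
    """Illustrative resampling step guided by ensemble errors (not the paper's exact rule)."""
    # Over-sample: duplicate the minority instances the ensemble got wrong.
    miss_minority = np.where((y == minority_label) & (y_pred != y))[0]
    # Under-sample: drop an equal number of correctly classified majority instances.
    hit_majority = np.where((y != minority_label) & (y_pred == y))[0]
    drop = rng.choice(hit_majority,
                      size=min(len(miss_minority), len(hit_majority)),
                      replace=False)
    keep = np.setdiff1d(np.arange(len(y)), drop)
    X_new = np.vstack([X[keep], X[miss_minority]])
    y_new = np.concatenate([y[keep], y[miss_minority]])
    return X_new, y_new

rng = np.random.default_rng(0)
# Synthetic imbalanced binary dataset (about 9:1 majority-to-minority ratio).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# AdaBoost with SVM base learners ("estimator" is "base_estimator" in older scikit-learn).
svm = SVC(kernel="rbf", gamma="scale", probability=True)
ensemble = AdaBoostClassifier(estimator=svm, n_estimators=20, random_state=0)
ensemble.fit(X, y)

# Resample the training set according to the ensemble's errors, then refit.
X_rs, y_rs = resample_by_errors(X, y, ensemble.predict(X), minority_label=1, rng=rng)
ensemble = AdaBoostClassifier(estimator=svm, n_estimators=20, random_state=0).fit(X_rs, y_rs)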

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (61273225, 61273303, 61373109), the Program for Outstanding Young Science and Technology Innovation Teams in Higher Education Institutions of Hubei Province (No. T201202), the Program of Wuhan Subject Chief Scientist (201150530152), and the National “Twelfth Five-Year” Plan for Science & Technology Support (2012BAC22B01).

Author information

Corresponding author

Correspondence to Xiaolong Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, C., Zhang, X. (2017). An Effective Sampling Strategy for Ensemble Learning with Imbalanced Data. In: Huang, D.-S., Hussain, A., Han, K., Gromiha, M. (eds.) Intelligent Computing Methodologies. ICIC 2017. Lecture Notes in Computer Science (LNAI), vol. 10363. Springer, Cham. https://doi.org/10.1007/978-3-319-63315-2_33

  • DOI: https://doi.org/10.1007/978-3-319-63315-2_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63314-5

  • Online ISBN: 978-3-319-63315-2

  • eBook Packages: Computer Science, Computer Science (R0)
