Applying Threshold SMOTE Algorithm with Attribute Bagging to Imbalanced Datasets

Wang, Jin; Yun, Bo; Huang, Pingli; Liu, Yu-Ao

doi:10.1007/978-3-642-41299-8_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8171)

Included in the following conference series:

International Conference on Rough Sets and Knowledge Technology

Abstract

The synthetic minority over-sampling technique (SMOTE) is an effective over-sampling method designed specifically for learning from imbalanced data sets. However, SMOTE generates synthetic samples somewhat blindly. This paper proposes a novel approach to the imbalanced-data problem that combines the Threshold SMOTE (TSMOTE) and Attribute Bagging (AB) algorithms. TSMOTE uses the majority samples to adjust SMOTE's neighbor-selection strategy and thereby control the quality of the new samples. Attribute Bagging, a well-known ensemble learning algorithm, is used to improve the predictive power of the classifier. A comprehensive suite of experiments is conducted on 7 imbalanced data sets from the UCI Machine Learning Repository. The experimental results show that TSMOTE-AB outperforms SMOTE and other previously known algorithms.
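As a rough illustration of the two building blocks named in the abstract, the sketch below shows baseline SMOTE interpolation followed by a small attribute-bagging ensemble. It is not the authors' TSMOTE-AB: the abstract does not state the threshold rule that uses the majority samples, so that rule is left out here. The function names, the default parameters, and the nearest-centroid base classifier are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Baseline SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours. No threshold rule is applied."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Pairwise Euclidean distances inside the minority class.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # seed sample per new point
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[chosen] - minority[base])


def attribute_bagging_fit(X, y, n_members=11, subset_size=2, rng=None):
    """Train one member per random feature subset.
    Assumption: each member is a nearest-centroid classifier."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    members = []
    for _ in range(n_members):
        feats = rng.choice(X.shape[1], size=subset_size, replace=False)
        centroids = np.stack([X[y == c][:, feats].mean(axis=0) for c in classes])
        members.append((feats, centroids))
    return classes, members


def attribute_bagging_predict(model, X):
    """Combine the members' predictions by majority vote."""
    classes, members = model
    X = np.asarray(X, dtype=float)
    votes = np.zeros((len(X), len(classes)), dtype=int)
    rows = np.arange(len(X))
    for feats, centroids in members:
        d = np.linalg.norm(X[:, feats][:, None, :] - centroids[None, :, :], axis=2)
        votes[rows, d.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]


if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    majority = data_rng.normal(0.0, 0.3, size=(60, 4))
    minority = data_rng.normal(5.0, 0.3, size=(10, 4))
    synthetic = smote(minority, n_new=50, k=3, rng=1)  # balance the classes
    X = np.vstack([majority, minority, synthetic])
    y = np.array([0] * 60 + [1] * 60)
    model = attribute_bagging_fit(X, y, n_members=7, subset_size=2, rng=2)
    print(attribute_bagging_predict(model, minority[:3]))
```

Because every synthetic point lies on a segment between two minority samples, baseline SMOTE never looks at the majority class; that is the "blindness" the abstract refers to, and it is the step a threshold on majority-sample proximity would presumably constrain.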

Author information

Authors and Affiliations

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
Jin Wang, Bo Yun, Pingli Huang & Yu-Ao Liu

Editor information

Editors and Affiliations

Saint Mary’s University, B3H 3C3, Halifax, NS, Canada
Pawan Lingras
Maria Curie-Skłodowska University, Lublin, Poland
Marcin Wolski
University of Granada, Spain
Chris Cornelis
Indian Statistical Institute, 700108, Kolkata, India
Sushmita Mitra
University of Warsaw, 02-097, Warsaw, Poland
Piotr Wasilewski

About this paper

Cite this paper

Wang, J., Yun, B., Huang, P., Liu, YA. (2013). Applying Threshold SMOTE Algorithm with Attribute Bagging to Imbalanced Datasets. In: Lingras, P., Wolski, M., Cornelis, C., Mitra, S., Wasilewski, P. (eds) Rough Sets and Knowledge Technology. RSKT 2013. Lecture Notes in Computer Science, vol 8171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41299-8_21

DOI: https://doi.org/10.1007/978-3-642-41299-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41298-1
Online ISBN: 978-3-642-41299-8
eBook Packages: Computer Science, Computer Science (R0)
