Applying Threshold SMOTE Algorithm with Attribute Bagging to Imbalanced Datasets

  • Conference paper
Rough Sets and Knowledge Technology (RSKT 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8171)

Abstract

The synthetic minority over-sampling technique (SMOTE) is an effective over-sampling method designed specifically for learning from imbalanced data sets. However, SMOTE generates its synthetic samples somewhat blindly. This paper proposes a novel approach to the imbalanced learning problem that combines the Threshold SMOTE (TSMOTE) and Attribute Bagging (AB) algorithms. TSMOTE takes full advantage of the majority samples to adjust SMOTE's neighbor selection strategy and thereby control the quality of the new samples. Attribute Bagging, a well-known ensemble learning algorithm, is used to improve the predictive power of the classifier. A comprehensive suite of experiments is conducted on 7 imbalanced data sets collected from the UCI machine learning repository. Experimental results show that TSMOTE-AB outperforms SMOTE and other previously known algorithms.
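
The two building blocks named in the abstract can be illustrated with a minimal sketch. It shows plain SMOTE (interpolating between a minority sample and one of its nearest minority neighbors) and attribute bagging (an ensemble whose members each see a random feature subset and vote). It does not reproduce TSMOTE's threshold-based neighbor selection, which the abstract does not specify. The function names, the parameters `k`, `n_estimators` and `subset_size`, and the nearest-centroid base learner are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Plain SMOTE: interpolate between minority samples and their k nearest
    minority neighbours. (TSMOTE's threshold rule is NOT modelled here.)"""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)            # seed sample per new point
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # position along the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def fit_attribute_bagging(X, y, n_estimators=11, subset_size=2, rng=None):
    """Train one nearest-centroid model (an assumed stand-in base learner)
    per random feature subset."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    models = []
    for _ in range(n_estimators):
        feats = rng.choice(X.shape[1], size=subset_size, replace=False)
        centroids = np.stack([X[y == c][:, feats].mean(axis=0) for c in classes])
        models.append((feats, centroids))
    return classes, models

def predict_attribute_bagging(ensemble, X):
    """Majority vote over the feature-subset models."""
    classes, models = ensemble
    X = np.asarray(X, dtype=float)
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for feats, centroids in models:
        d = np.linalg.norm(X[:, feats][:, None, :] - centroids[None, :, :], axis=2)
        votes[np.arange(len(X)), d.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]
```

In this sketch, over-sampling is applied to the minority class first, and the ensemble is then trained on the rebalanced data. Every synthetic point lies on a segment between two real minority samples, which is the behavior the abstract describes as blind: the majority class plays no part in where new points land.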

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, J., Yun, B., Huang, P., Liu, YA. (2013). Applying Threshold SMOTE Algorithm with Attribute Bagging to Imbalanced Datasets. In: Lingras, P., Wolski, M., Cornelis, C., Mitra, S., Wasilewski, P. (eds) Rough Sets and Knowledge Technology. RSKT 2013. Lecture Notes in Computer Science, vol 8171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41299-8_21

  • DOI: https://doi.org/10.1007/978-3-642-41299-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41298-1

  • Online ISBN: 978-3-642-41299-8

  • eBook Packages: Computer Science, Computer Science (R0)
