
Clustering and Combined Sampling Approaches for Multi-class Imbalanced Data Classification

  • Conference paper
Advances in Information Technology and Industry Applications

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 136)

Abstract

In this paper, we introduce KSMOTE, a new classification technique that combines k-means [7] with SMOTE [4] to improve multi-class learning from imbalanced datasets. K-means first splits the set of instances into two clusters. Within each cluster, two sampling methods are applied: oversampling and undersampling. A Random Forests learner [3] is then used for class prediction within each cluster. The final prediction is obtained by combining the results from both clusters through a majority vote. In our experiments, we used four multi-class datasets from the UCI Machine Learning Repository [2] with varying levels of class imbalance. KSMOTE is compared with SMOTE and with two popular multi-class modeling approaches, one-against-all (OAA) and one-against-one (OAO). The experimental results show that our approach achieves high performance in learning from imbalanced multi-class problems.
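The building blocks named in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`two_means`, `smote_like`, `balance`, `majority_vote`), the farthest-point initialisation of k-means, the per-class `target` size, and the neighbour count `k` are all hypothetical choices, and the Random Forests learner is left out entirely.

```python
import random
from collections import Counter


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def two_means(X, iters=10):
    """Split X into two clusters with Lloyd's k-means (k = 2).

    Assumption: centres start at X[0] and the point farthest from it.
    """
    centres = [list(X[0]), list(max(X, key=lambda p: sqdist(p, X[0])))]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min((0, 1), key=lambda c: sqdist(x, centres[c])) for x in X]
        for c in (0, 1):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def smote_like(samples, n_new, k=3, rng=None):
    """SMOTE-style oversampling: interpolate between a sample and one of
    its k nearest neighbours from the same class."""
    rng = rng or random.Random(0)
    if len(samples) < 2:  # nothing to interpolate with, so duplicate
        return [list(samples[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: sqdist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def balance(X, y, target, rng=None):
    """Bring every class to `target` instances (hypothetical policy):
    oversample small classes, randomly undersample large ones."""
    rng = rng or random.Random(0)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    X_out, y_out = [], []
    for label, pts in by_class.items():
        if len(pts) < target:
            pts = pts + smote_like(pts, target - len(pts), rng=rng)
        elif len(pts) > target:
            pts = rng.sample(pts, target)
        X_out.extend(pts)
        y_out.extend([label] * len(pts))
    return X_out, y_out


def majority_vote(predictions):
    """Return the most frequent label among the given predictions."""
    return Counter(predictions).most_common(1)[0][0]
```

In a full pipeline, `two_means` would partition the training set, `balance` would be applied inside each cluster, a classifier would be trained per cluster, and `majority_vote` would merge the per-cluster predictions for a test instance. How the abstract's oversampling and undersampling steps are actually scheduled is not specified there, so the single `target` parameter is only one possible reading.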


References

  1. Anand, R., Mehrotra, K., Mohan, C.K., Ranka, S.: Efficient classification for multiclass problems using modular neural networks. IEEE Transactions on Neural Networks 6(1), 117–124 (1995)


  2. Asuncion, A., Newman, D.J.: UCI machine learning repository (2007), http://archive.ics.uci.edu/ml/datasets.html

  3. Breiman, L.: Random Forests. Machine Learning 45(1), 5–32 (2001)


  4. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16(1), 321–357 (2002)


  5. Chen, S., He, H., Garcia, E.A.: RAMOBoost: Ranked minority oversampling in boosting. IEEE Transactions on Neural Networks 21(10), 1624–1642 (2010)


  6. Fernández, A., del Jesus, M.J., Herrera, F.: Multi-class Imbalanced Data-Sets with Linguistic Fuzzy Rule Based Classification Systems Based on Pairwise Learning. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds.) IPMU 2010. LNCS, vol. 6178, pp. 89–98. Springer, Heidelberg (2010)


  7. Forgy, E.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21, 768–780 (1965)


  8. Ghanem, A.S., Venkatesh, S., West, G.: Multi-class pattern classification in imbalanced data. In: Proceedings of the 2010 20th International Conference on Pattern Recognition (2010)


  9. Hand, D.J., Till, R.J.: A simple generalisation of the Area Under the ROC Curve for multiple class classification problems. Machine Learning 45(2), 171–186 (2001)


  10. Hastie, T., Tibshirani, R.: Classification by Pairwise Coupling. The Annals of Statistics 26(2), 451–471 (1998)


  11. Huang, J., Ling, C.X.: Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering 17(3), 299–310 (2005)


  12. Lorena, A., de Carvalho, A., Gama, J.: A review on the combination of binary classifiers in multiclass problems. Artificial Intelligence Review 30(1), 19–37 (2008)


  13. Orriols-Puig, A., Bernadó-Mansilla, E.: Evolutionary rule-based systems for imbalanced data sets. Soft Computing - A Fusion of Foundations, Methodologies and Applications 13(3), 213–225 (2009)


  14. Vural, V., Dy, J.G.: A hierarchical method for multi-class support vector machines. In: Proceedings of the Twenty-First International Conference on Machine Learning (2004)


  15. Wasikowski, M., Chen, X.-W.: Combating the small sample class imbalance problem using feature selection. IEEE Transactions on Knowledge and Data Engineering 22, 1388–1400 (2010)


  16. Witten, I.H., Frank, E., Hall, M.A.: Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, San Francisco (2011)


  17. Yen, S.-J., Lee, Y.-S.: Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 36(3), 5718–5727 (2009)



Author information


Corresponding author

Correspondence to Wanthanee Prachuabsupakij.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prachuabsupakij, W., Soonthornphisaj, N. (2012). Clustering and Combined Sampling Approaches for Multi-class Imbalanced Data Classification. In: Zeng, D. (eds) Advances in Information Technology and Industry Applications. Lecture Notes in Electrical Engineering, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-26001-8_91


  • DOI: https://doi.org/10.1007/978-3-642-26001-8_91


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-26000-1

  • Online ISBN: 978-3-642-26001-8

  • eBook Packages: Engineering, Engineering (R0)
