Abstract
In classification, when the training data are unevenly distributed among classes, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are difficult to recognize fully. This paper proposes a method to improve classification accuracy for the minority classes. The method combines the Synthetic Minority Over-sampling Technique (SMOTE) and the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist the classification of imbalanced data, three classification algorithms are used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority class and the majority class are obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combination techniques can improve performance on the class imbalance problem.
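The abstract does not spell out the over-sampling step, so the following is only a minimal sketch of the general SMOTE idea (new minority samples interpolated between a minority sample and one of its nearest minority neighbours), not the authors' implementation. The function name `smote_sketch`, its parameters `n_new`, `k`, and `seed`, and the toy data are assumptions made for illustration. The CMTNN stage is not shown.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (illustrative SMOTE-style sketch).

    X_min : array of shape (n_samples, n_features), minority class only
            (needs at least two rows).
    n_new : number of synthetic samples to generate (hypothetical parameter).
    k     : number of nearest minority neighbours to choose from.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy minority class with four two-dimensional samples (made-up values).
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without simply duplicating rows. In a full pipeline the synthetic rows would be appended to the training set before a classifier such as ANN, k-NN, or SVM is trained.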
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeatrakul, P., Wong, K.W., Fung, C.C. (2010). Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_19
DOI: https://doi.org/10.1007/978-3-642-17534-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17533-6
Online ISBN: 978-3-642-17534-3
eBook Packages: Computer Science (R0)