A New Performance Evaluation Method for Two-Class Imbalanced Problems
Conference paper
Abstract
In this paper, we introduce a new approach for evaluating and visualizing classifier performance in two-class imbalanced domains. The method defines a two-dimensional space by combining the geometric mean of the class accuracies with a new metric that indicates how balanced those accuracies are. A given point in this space represents a particular trade-off between the two measures, which is expressed as a trapezoidal function. In addition, this evaluation function makes it possible to emphasize correct predictions on the minority class, which is often considered the most important class. Experiments demonstrate the consistency and validity of the proposed evaluation method.
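As a rough illustration of the two coordinates mentioned above, the following sketch computes the geometric mean of the class accuracies from confusion-matrix counts. The abstract does not define the new balance metric, so the signed difference between the two class accuracies (`accuracy_gap`) is used here purely as a hypothetical stand-in; the function names and the example counts are likewise assumptions made for illustration, not the authors' formulation.

```python
import math


def class_accuracies(tp, fn, tn, fp):
    """Return (minority-class accuracy, majority-class accuracy).

    The minority class is treated as the positive class.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr


def geometric_mean(tpr, tnr):
    """Geometric mean of the two class accuracies."""
    return math.sqrt(tpr * tnr)


def accuracy_gap(tpr, tnr):
    """Hypothetical balance indicator (an assumption, not the paper's metric).

    Positive values mean the minority class is predicted more accurately,
    negative values mean the majority class is, and 0 means perfect balance.
    """
    return tpr - tnr


# Two illustrative classifiers with the same geometric mean but opposite biases.
clf_a = class_accuracies(tp=80, fn=20, tn=450, fp=550)  # favors the minority class
clf_b = class_accuracies(tp=45, fn=55, tn=800, fp=200)  # favors the majority class

for name, (tpr, tnr) in (("A", clf_a), ("B", clf_b)):
    print(name, round(geometric_mean(tpr, tnr), 3), round(accuracy_gap(tpr, tnr), 3))
```

Both classifiers land at the same position along the geometric-mean axis and differ only along the second axis, which is the kind of distinction a two-dimensional evaluation space is meant to expose.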
Keywords
Imbalance performance measure learning
Copyright information
© Springer-Verlag Berlin Heidelberg 2008