
A Measure Oriented Training Scheme for Imbalanced Classification Problems

  • Bo Yuan
  • Wenhuang Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)

Abstract

Because the overall prediction error of a classifier on imbalanced problems can be misleading and biased, classifiers for such problems are commonly evaluated with measures such as G-mean and ROC (Receiver Operating Characteristic) curves. However, for many classifiers, the learning process is still largely driven by error-based objective functions. As a result, there is a clear gap between the measure by which a classifier is evaluated and the way it is trained. This paper investigates using the evaluation measure itself to search the hypothesis space directly, with the aim of improving classifier performance. Experimental results on three standard benchmark problems and one real-world problem show that the proposed method is effective compared with commonly used sampling techniques.
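To make the gap between error-based training and measure-based evaluation concrete, the sketch below computes accuracy, G-mean (the geometric mean of sensitivity and specificity) and the area under the ROC curve, and then searches the weights of a linear classifier by maximizing a chosen measure directly. This excerpt does not describe the authors' actual training scheme, so the search used here is a hypothetical stand-in: a simple (1+1)-style hill climber with Gaussian mutations. The function names (`g_mean`, `auc`, `measure_oriented_search`, and so on), the toy data set and all parameter values are illustrative assumptions, not part of the paper.

```python
import math
import random

def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp), treating label 1 as the minority (positive) class."""
    pairs = list(zip(y_true, y_pred))
    return pairs.count((1, 1)), pairs.count((1, 0)), pairs.count((0, 0)), pairs.count((0, 1))

def accuracy(y_true, y_pred):
    """Fraction of correct predictions: the error-based view."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

def auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2).
    Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def linear_score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, xs):
    return [1 if linear_score(w, b, x) > 0 else 0 for x in xs]

def measure_oriented_search(xs, ys, measure, iters=2000, sigma=0.3, seed=0):
    """Hypothetical (1+1)-style hill climber that maximizes `measure` directly.
    Not the paper's algorithm; it only illustrates training against the evaluation measure."""
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), -1.0          # start: predicts every sample as negative
    best = measure(ys, predict(w, b, xs))
    for _ in range(iters):
        w2 = [wi + rng.gauss(0.0, sigma) for wi in w]
        b2 = b + rng.gauss(0.0, sigma)
        score = measure(ys, predict(w2, b2, xs))
        if score >= best:                    # accept equal scores to drift across plateaus
            w, b, best = w2, b2, score
    return w, b, best

# Why overall accuracy misleads: 95 negatives, 5 positives, predict "all negative".
y_true = [0] * 95 + [1] * 5
y_all_neg = [0] * 100
print(accuracy(y_true, y_all_neg), g_mean(y_true, y_all_neg))  # 0.95 0.0

# Toy imbalanced data (assumed): 190 negatives near (0, 0), 10 positives near (1.5, 1.5).
rng = random.Random(1)
xs = [(rng.gauss(0.0, 0.7), rng.gauss(0.0, 0.7)) for _ in range(190)]
xs += [(rng.gauss(1.5, 0.7), rng.gauss(1.5, 0.7)) for _ in range(10)]
ys = [0] * 190 + [1] * 10

w_acc, b_acc, _ = measure_oriented_search(xs, ys, accuracy)
w_gm, b_gm, best_gm = measure_oriented_search(xs, ys, g_mean)
for name, (w, b) in [("accuracy-driven", (w_acc, b_acc)), ("G-mean-driven", (w_gm, b_gm))]:
    preds = predict(w, b, xs)
    scores = [linear_score(w, b, x) for x in xs]
    print(name, "acc=%.3f" % accuracy(ys, preds),
          "G-mean=%.3f" % g_mean(ys, preds), "AUC=%.3f" % auc(ys, scores))
```

The all-negative example shows the failure mode the abstract describes: 95% accuracy with a G-mean of zero, because no minority sample is ever detected. In the search, the only change between the two runs is the objective passed in, which is the essence of training against the evaluation measure. The scores here are measured on the training data only, so they illustrate the mechanism rather than generalization.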

Keywords

Imbalanced Datasets, Neural Networks, ROC, G-Mean, SMOTE



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bo Yuan (1)
  • Wenhuang Liu (1)
  1. Intelligent Computing Lab, Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen, P.R. China
