Frontiers of Computer Science

Volume 6, Issue 5, pp 489–497

Measure oriented training: a targeted approach to imbalanced classification problems

  • Bo Yuan
  • Wenhuang Liu
Research Article

Abstract

The overall prediction error of a classifier can be misleading and biased on imbalanced problems, so alternative performance measures such as G-mean and F-measure have been widely adopted. Techniques such as sampling and cost-sensitive learning are often employed to improve classifier performance in such situations. However, the training process is still largely driven by traditional error-based objective functions. As a result, there is a clear gap between the measure by which the classifier is evaluated and the way it is trained. This paper investigates the prospect of bridging this gap by explicitly using the appropriate measure itself to search the hypothesis space. In the case studies, a standard three-layer neural network is used as the classifier and is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method achieves consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
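
The idea of searching the hypothesis space with the evaluation measure itself can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual definition of G-mean as the square root of the product of the true positive rate and the true negative rate, and every other choice here is a hypothetical one made for illustration: the weight layout, the tournament selection, the uniform crossover with Gaussian mutation, the population settings, and the synthetic two-dimensional dataset.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Square root of (true positive rate * true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def predict(weights, x, n_hidden):
    """Three-layer network: inputs, tanh hidden units, one thresholded output.

    Assumed flat layout: per hidden unit (n_in weights + bias), then
    n_hidden output weights, then the output bias as the last element.
    """
    n_in = len(x)
    out = weights[-1]
    for h in range(n_hidden):
        base = h * (n_in + 1)
        act = weights[base + n_in]
        for i in range(n_in):
            act += weights[base + i] * x[i]
        out += weights[n_hidden * (n_in + 1) + h] * math.tanh(act)
    return 1 if out > 0 else 0


def make_data(seed=1, n_neg=100, n_pos=10):
    """Hypothetical imbalanced toy data: 100 negatives, 10 positives."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(2.5, 1), rng.gauss(2.5, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos


def evolve(X, y, n_hidden=3, pop_size=20, generations=20, sigma=0.3, seed=0):
    """Evolve network weights with G-mean on (X, y) as the fitness."""
    rng = random.Random(seed)
    n_w = n_hidden * (len(X[0]) + 1) + n_hidden + 1

    def fitness(w):
        return g_mean(y, [predict(w, x, n_hidden) for x in X])

    pop = [[rng.gauss(0, 1) for _ in range(n_w)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_pop = [scored[0]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            # Two binary tournaments pick the parents.
            a, b = (max(rng.sample(scored, 2), key=fitness) for _ in range(2))
            # Uniform crossover followed by Gaussian mutation of every gene.
            child = [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history


if __name__ == "__main__":
    X, y = make_data()
    trivial = [0] * len(y)  # always predict the majority class
    print("trivial accuracy:", round(accuracy(y, trivial), 3))
    print("trivial G-mean:  ", g_mean(y, trivial))
    best, history = evolve(X, y)
    print("evolved G-mean:  ", g_mean(y, [predict(best, x, 3) for x in X]))
```

On this toy data a classifier that always predicts the majority class reaches an accuracy of about 0.909 but a G-mean of 0, which is the kind of gap between error rate and evaluation measure that the abstract describes. Because G-mean is piecewise constant in the weights, it gives no useful gradient, which is one reason a population-based search such as a GA is a natural fit for optimizing it directly.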

Keywords

imbalanced datasets, genetic algorithms (GAs), neural networks, G-mean, synthetic minority over-sampling technique (SMOTE)

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
