, Volume 2, Issue 1, pp 29-44

Boosting for class-imbalanced datasets using genetically evolved supervised non-linear projections

Abstract

It has repeatedly been shown that most classification methods suffer from an imbalanced distribution of training instances among classes. Most learning algorithms expect an approximately even distribution of instances among the different classes and suffer, to different degrees, when that is not the case. Dealing with the class-imbalance problem is a difficult but relevant task, as many of the most interesting and challenging real-world problems have a very uneven class distribution. In this paper we present a new approach for dealing with class-imbalanced datasets based on a new boosting method for the construction of ensembles of classifiers. The approach is based on using the distribution of the weights given by a given boosting algorithm for obtaining a supervised projection. Then, the supervised projection is used to train the next classifier using a uniform distribution of the training instances. We tested our method using 35 class-imbalanced datasets and two different base classifiers: a decision tree and a support vector machine. The proposed methodology proved its usefulness achieving better accuracy than other methods both in terms of the geometric mean of specificity and sensibility and the area under the ROC curve.