Instance Selection for Class Imbalanced Problems by Means of Selecting Instances More than Once
Although many more complex learning algorithms exist, k-nearest neighbors (k-NN) remains one of the most successful classifiers in real-world applications. One way of scaling the k-NN classifier up to huge datasets is instance selection. Because the amount of data in almost any pattern recognition task is constantly growing, we need more efficient instance selection algorithms, which must achieve larger reductions while preserving the classification accuracy obtained with the selected subset.
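As a point of reference, conventional instance selection encodes the retained subset as a binary mask over the training set and scores it by the accuracy of a k-NN classifier restricted to the retained instances. The sketch below is a minimal illustration of that baseline, not the evaluation used in the paper; the function name `evaluate_mask` and the choice of NumPy are our own assumptions.

```python
import numpy as np

def evaluate_mask(X, y, mask, k=1):
    """Accuracy of k-NN using only the instances flagged in the
    binary `mask` as the reference set (each instance kept once)."""
    X_sel, y_sel = X[mask], y[mask]
    correct = 0
    for x, label in zip(X, y):
        dist = np.linalg.norm(X_sel - x, axis=1)   # distances to retained instances
        nearest = np.argsort(dist)[:k]             # indices of the k nearest
        labels, votes = np.unique(y_sel[nearest], return_counts=True)
        correct += labels[np.argmax(votes)] == label
    return correct / len(y)
```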
However, most instance selection methods do not work well on class imbalanced problems: they tend to remove too many instances from the minority class. In this paper we present a way to improve instance selection for class imbalanced problems by allowing the algorithm to select each instance more than once. In this way, the few instances of the minority class can cover larger portions of the space, and the testing error of the standard approach can be matched faster and with fewer distinct instances. No other constraint is imposed on the instance selection method.
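To make the idea concrete, the following is a minimal sketch of selection with multiplicities, not the authors' implementation: the subset is encoded as a vector of non-negative integer counts rather than a binary mask, so an instance selected c times is replicated c times and casts c votes in a k-NN query. The names `knn_predict_multiset` and `fitness`, and the accuracy/reduction trade-off weight `alpha`, are hypothetical.

```python
import numpy as np

def knn_predict_multiset(X_train, y_train, counts, X_query, k=3):
    """k-NN over the multiset encoded by `counts`: an instance
    selected c > 1 times is replicated c times, so it can cast
    up to c votes among the k nearest neighbors."""
    sel = counts > 0
    X_sel = np.repeat(X_train[sel], counts[sel], axis=0)
    y_sel = np.repeat(y_train[sel], counts[sel])
    preds = np.empty(len(X_query), dtype=y_train.dtype)
    for i, x in enumerate(X_query):
        dist = np.linalg.norm(X_sel - x, axis=1)
        nearest = np.argsort(dist)[:k]
        labels, votes = np.unique(y_sel[nearest], return_counts=True)
        preds[i] = labels[np.argmax(votes)]
    return preds

def fitness(X, y, counts, alpha=0.5, k=3):
    """Hypothetical fitness trading off accuracy against reduction,
    with reduction measured on *distinct* retained instances.
    (A real evaluation would use leave-one-out or a held-out split
    rather than scoring on the training points themselves.)"""
    acc = np.mean(knn_predict_multiset(X, y, counts, X, k) == y)
    reduction = 1.0 - np.count_nonzero(counts) / len(y)
    return alpha * acc + (1.0 - alpha) * reduction
```

Under a binary mask each retained instance casts at most one vote; with counts, a single minority instance duplicated c times can outvote up to c majority-class neighbors around it, which is how the few minority instances come to cover larger portions of the space without adding any constraint to the search itself.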
An extensive comparison on 40 datasets from the UCI Machine Learning Repository shows the usefulness of our approach compared with the established method of evolutionary instance selection. In the worst case, our method matches the error obtained by standard instance selection while achieving a larger reduction and a shorter execution time.
Keywords: Minority Class · Instance Selection · Multiple Instance Learning · Imbalance Ratio · Class Imbalanced Problem