SCIS: Combining Instance Selection Methods to Increase Their Effectiveness over a Wide Range of Domains
Instance selection is a feasible strategy for dealing with large databases in inductive learning. There are several proposals in this area, but none of them consistently outperforms the others over a wide range of domains. In this paper we present a set of measures to characterize databases, as well as a new algorithm that uses these measures and, depending on the data characteristics, applies the method or combination of methods expected to produce the best results. This approach was evaluated on 20 databases with six different learning paradigms. The results were compared with those achieved by five well-known state-of-the-art methods.
Keywords: Instance selection, data reduction, machine learning