A New Training Sample Selection Method Avoiding Over-Fitting Based on Nearest Neighbor Rule
Training sample selection is an important task. Many existing sample selection methods are based on the nearest neighbor rule, but most of them do not consider the over-fitting problem. To overcome this drawback, this paper proposes a new sample selection method that uses pruning and cross-validation to avoid over-fitting. The method divides the original sample set into several disjoint subsets. In each round, one subset serves as the validation set and is used to prune the samples selected from the other subsets. The subsets take turns as the validation set, and the final result is obtained by combining all the selected sample sets. Experiments show that, compared with existing methods, the new method selects a smaller sample set, and classifiers trained on the selected samples perform better.
Keywords: Sample selection; Nearest neighbor rule; Over-fitting; Cross-validation
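The abstract does not say which nearest-neighbor selection rule is applied inside each round, what pruning criterion is used, or how the per-round results are combined. The Python sketch below fills those gaps with stated assumptions: Hart's condensed nearest neighbor (CNN) rule as the base selector, a reduced-error pruning step that drops a selected sample whenever 1-NN accuracy on the validation subset does not decrease without it, and a set union to combine the rounds. The function names (`condense`, `prune`, `select`) and the parameters `k` and `seed` are hypothetical and not taken from the paper; this is an illustration of the cross-validated pruning idea, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# A labelled sample: (feature vector, class label).
Sample = Tuple[Tuple[float, ...], int]


def nn_label(prototypes: Sequence[Sample], x: Tuple[float, ...]) -> int:
    """1-NN rule: return the label of the prototype closest to x (Euclidean)."""
    return min(prototypes, key=lambda p: math.dist(p[0], x))[1]


def accuracy(prototypes: Sequence[Sample], data: Sequence[Sample]) -> float:
    """Fraction of `data` that the 1-NN rule over `prototypes` labels correctly."""
    return sum(nn_label(prototypes, x) == y for x, y in data) / len(data)


def condense(samples: Sequence[Sample]) -> List[Sample]:
    """Assumed base selector: Hart's condensed nearest neighbor (CNN).
    Repeatedly adds every sample the current store misclassifies."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for s in samples:
            if s not in store and nn_label(store, s[0]) != s[1]:
                store.append(s)
                changed = True
    return store


def prune(selected: Sequence[Sample], validation: Sequence[Sample]) -> List[Sample]:
    """Assumed pruning criterion (reduced-error pruning): drop a selected sample
    whenever 1-NN accuracy on the validation subset does not decrease without it."""
    kept = list(selected)
    best = accuracy(kept, validation)
    for p in list(kept):
        if len(kept) == 1:  # always keep at least one prototype
            break
        trial = [q for q in kept if q is not p]
        acc = accuracy(trial, validation)
        if acc >= best:
            kept, best = trial, acc
    return kept


def select(samples: Sequence[Sample], k: int = 5, seed: int = 0) -> List[Sample]:
    """Cross-validated selection: each of k disjoint subsets takes a turn as the
    validation set that prunes the CNN selection made from the other subsets;
    the pruned results are combined by set union (an assumption)."""
    data = list(samples)
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= number of samples")
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k disjoint, non-empty subsets
    selected: List[Sample] = []
    for i, validation in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        for p in prune(condense(training), validation):
            if p not in selected:
                selected.append(p)
    return selected


if __name__ == "__main__":
    # Toy data: two well-separated 5x5 grids, one per class.
    pts = [((i * 0.1, j * 0.1), 0) for i in range(5) for j in range(5)]
    pts += [((5 + i * 0.1, 5 + j * 0.1), 1) for i in range(5) for j in range(5)]
    chosen = select(pts, k=5)
    print(len(pts), len(chosen), accuracy(chosen, pts))
```

Using `>=` rather than `>` in `prune` favors smaller selected sets when validation accuracy ties; a stricter comparison would keep more samples and prune less aggressively.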
This work was supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2016JQ6078) and the Fundamental Research Funds for the Central Universities of Chang’an University (300102328107, 0009—2014G6114024).