Abstract
Sample selection is an important task. Many existing sample selection methods are based on the nearest neighbor rule, but most of them do not consider the over-fitting problem. To overcome this disadvantage, this paper proposes a new sample selection method that uses a pruning strategy and cross-validation to avoid over-fitting. The method divides the original sample set into several disjoint subsets. In each round, one subset serves as the validation set and is used to prune the samples selected from the other subsets. All subsets take turns as the validation set, and the final result is obtained by combining all the selected sample sets. Experiments show that, compared with existing methods, the new method yields a smaller selected sample set, and better classifiers can be trained on the selected samples.
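The procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection rule or the pruning criterion, so the code below makes two loud assumptions: selection uses a Hart-style condensed nearest neighbor pass, and pruning drops a selected sample whenever 1-NN accuracy on the validation subset does not fall. All function names (`condense`, `prune`, `select_samples`) and the parameters `k` and `seed` are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_label(point, X, y, idx):
    """Label of the nearest neighbor of `point` among the samples indexed by `idx`."""
    best = min(idx, key=lambda i: sq_dist(point, X[i]))
    return y[best]


def accuracy(X, y, selected, eval_idx):
    """1-NN accuracy on `eval_idx` when only `selected` samples are stored."""
    if not selected:
        return 0.0
    hits = sum(nn_label(X[i], X, y, selected) == y[i] for i in eval_idx)
    return hits / len(eval_idx)


def condense(X, y, train_idx):
    """ASSUMPTION: Hart-style condensing; keep every sample the current subset misclassifies."""
    selected = [train_idx[0]]
    changed = True
    while changed:
        changed = False
        for i in train_idx:
            if i not in selected and nn_label(X[i], X, y, selected) != y[i]:
                selected.append(i)
                changed = True
    return selected


def prune(X, y, selected, val_idx):
    """ASSUMPTION: drop a sample if validation accuracy does not decrease without it."""
    kept = list(selected)
    base = accuracy(X, y, kept, val_idx)
    for i in list(kept):
        trial = [j for j in kept if j != i]
        if trial:  # never prune down to an empty set
            acc = accuracy(X, y, trial, val_idx)
            if acc >= base:
                kept, base = trial, acc
    return kept


def select_samples(X, y, k=3, seed=0):
    """Split into k disjoint subsets; each takes a turn as the validation set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    final = set()
    for f in range(k):
        val_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        final.update(prune(X, y, condense(X, y, train_idx), val_idx))
    # Combine the pruned selections from all rounds.
    return sorted(final)
```

Calling `select_samples(X, y, k=3)` on a list of feature vectors `X` and labels `y` returns the indices of the retained samples. The greedy pruning order and the "no drop in accuracy" threshold are design choices of this sketch; the paper's actual criterion may differ.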
Acknowledgments
This project was supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2016JQ6078) and the Fundamental Research Funds for the Central Universities of Chang’an University (300102328107, 0009—2014G6114024).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, G. (2019). A New Training Sample Selection Method Avoiding Over-Fitting Based on Nearest Neighbor Rule. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_120
DOI: https://doi.org/10.1007/978-3-030-02804-6_120
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02803-9
Online ISBN: 978-3-030-02804-6
eBook Packages: Intelligent Technologies and Robotics (R0)