A New Training Sample Selection Method Avoiding Over-Fitting Based on Nearest Neighbor Rule

Li, Guang

doi:10.1007/978-3-030-02804-6_120

Conference paper
First Online: 17 January 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 885)

Abstract

Sample selection is an important task. Many existing sample selection methods are based on the nearest neighbor rule, but most of them do not consider the over-fitting problem. To overcome this shortcoming, this paper proposes a new sample selection method that uses pruning tactics and cross-validation to avoid over-fitting. The method divides the original sample set into several disjoint subsets. In each round, one subset serves as the validation set and is used to prune the samples selected from the other subsets; all subsets take turns as the validation set. The final result is obtained by combining all of the selected sample sets. Experiments show that, compared with existing methods, the new method yields a smaller selected sample set, and classifiers trained on its selected samples perform better.
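The abstract describes the procedure only at a high level. As a rough illustration, the following is a minimal Python sketch of a cross-validated select-then-prune loop. It assumes several details the abstract does not give: Hart's condensed nearest neighbor (CNN) rule as the base selector, greedy reduced-error pruning (drop a selected sample whenever 1-NN accuracy on the validation subset does not fall), round-robin fold assignment after a seeded shuffle, and a set union as the combination step. The function names (`nn_predict`, `condensed_nn`, `prune`, `select_samples`) are hypothetical, and the sketch should not be read as the paper's exact algorithm.

```python
import math
import random
from typing import List, Sequence, Tuple

# A labelled sample: (feature vector, class label).
Sample = Tuple[Tuple[float, ...], int]


def nn_predict(prototypes: Sequence[Sample], x: Tuple[float, ...]) -> int:
    """Classify x with the 1-NN rule (Euclidean distance) over the prototypes."""
    _, label = min(prototypes, key=lambda p: math.dist(p[0], x))
    return label


def accuracy(prototypes: Sequence[Sample], validation: Sequence[Sample]) -> float:
    """Fraction of validation samples that the 1-NN rule classifies correctly."""
    if not validation:
        return 1.0
    hits = sum(nn_predict(prototypes, x) == y for x, y in validation)
    return hits / len(validation)


def condensed_nn(samples: Sequence[Sample]) -> List[Sample]:
    """Hart's CNN (assumed base selector): grow a store until it fits every sample."""
    if not samples:
        return []
    in_store = {0}
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(samples):
            # Each index is added at most once, so the loop always terminates.
            if i not in in_store and nn_predict(store, x) != y:
                in_store.add(i)
                store.append(samples[i])
                changed = True
    return store


def prune(selected: Sequence[Sample], validation: Sequence[Sample]) -> List[Sample]:
    """Assumed pruning rule: drop samples whose removal keeps validation accuracy."""
    kept = list(selected)
    if not kept:
        return kept
    base = accuracy(kept, validation)
    i = 0
    while i < len(kept) and len(kept) > 1:
        trial = kept[:i] + kept[i + 1:]
        score = accuracy(trial, validation)
        if score >= base:
            kept, base = trial, score  # removal is harmless: keep it removed
        else:
            i += 1  # this sample is needed: move on to the next one
    return kept


def select_samples(samples: Sequence[Sample], k: int = 5, seed: int = 0) -> List[Sample]:
    """Each fold in turn prunes the CNN selection made from the other folds."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [[samples[j] for j in order[f::k]] for f in range(k)]
    result: List[Sample] = []
    for f in range(k):
        validation = folds[f]
        training = [s for g in range(k) if g != f for s in folds[g]]
        if not training:
            continue
        for s in prune(condensed_nn(training), validation):
            if s not in result:  # assumed combination step: set union
                result.append(s)
    return result
```

A typical call is `select_samples(data, k=5)` on a list of `(features, label)` tuples, after which the returned prototypes are used with `nn_predict`. The union is simply the most literal reading of "combining all the selected sample sets"; because each sample falls in k-1 training partitions, the combined set can be larger than any single round's output.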

Acknowledgments

This project was supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2016JQ6078) and by the Fundamental Research Funds for the Central Universities of Chang’an University (300102328107, 0009—2014G6114024).

Author information

Authors and Affiliations

School of Electronic and Control Engineering, Chang’an University, Xi’an, 710064, China
Guang Li

Corresponding author

Correspondence to Guang Li.

Editor information

Editors and Affiliations

Departament de Ciències de la Computació, Universitat Politècnica de Catalunya, Barcelona, Spain
Fatos Xhafa
Department of Computer Science and Engineering, Faculty of Engineering and Technology, SOA University, Bhubaneswar, Odisha, India
Srikanta Patnaik
Department of Business Systems and Analytics, La Salle University, Philadelphia, PA, USA
Madjid Tavana

About this paper

Cite this paper

Li, G. (2019). A New Training Sample Selection Method Avoiding Over-Fitting Based on Nearest Neighbor Rule. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_120

DOI: https://doi.org/10.1007/978-3-030-02804-6_120
Published: 17 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02803-9
Online ISBN: 978-3-030-02804-6
eBook Packages: Intelligent Technologies and Robotics (R0)
