
A New Training Sample Selection Method Avoiding Over-Fitting Based on Nearest Neighbor Rule

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 885)

Abstract

Sample selection is an important task. Many existing sample selection methods are based on the nearest neighbor rule, but most of them do not consider the over-fitting problem. To overcome this shortcoming, this paper presents a new sample selection method that uses a pruning strategy together with cross-validation to avoid over-fitting. The original sample set is divided into several disjoint subsets. In each round, one subset serves as the validation set and is used to prune the samples selected from the remaining subsets; every subset takes a turn as the validation set. The final result is obtained by combining all the selected sample sets. Experiments show that, compared with existing methods, the new method selects a smaller sample set, and better classifiers can be trained on the selected samples.
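The sketch below is a minimal illustration of the general idea described in the abstract, not the paper's exact algorithm. It assumes a Hart-style condensed nearest neighbor step per fold and an assumed pruning criterion (a selected sample is dropped only if held-out 1-NN accuracy does not decrease); the helper names `nn_predict`, `condense`, and `select_with_validation` are hypothetical.

```python
# Illustrative sketch (assumptions noted above): per-fold condensation
# followed by pruning against a held-out validation subset.
import numpy as np

def nn_predict(train_X, train_y, X):
    """1-NN prediction of X against the (train_X, train_y) prototypes."""
    d = np.linalg.norm(X[:, None, :] - train_X[None, :, :], axis=2)
    return train_y[np.argmin(d, axis=1)]

def condense(X, y):
    """Hart-style condensation: add samples misclassified by the current store."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            pred = nn_predict(X[keep], y[keep], X[i:i + 1])[0]
            if pred != y[i]:
                keep.append(i)
                changed = True
    return np.array(keep)

def select_with_validation(X, y, n_folds=5, seed=0):
    """Select samples fold by fold, pruning each fold's selection on held-out data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    selected = []
    for k in range(n_folds):
        val_idx = folds[k]
        tr_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        keep = tr_idx[condense(X[tr_idx], y[tr_idx])]
        # Assumed pruning rule: drop a sample if validation accuracy does not drop.
        base = np.mean(nn_predict(X[keep], y[keep], X[val_idx]) == y[val_idx])
        for i in list(keep):
            trial = np.array([j for j in keep if j != i])
            if len(trial) == 0:
                continue
            acc = np.mean(nn_predict(X[trial], y[trial], X[val_idx]) == y[val_idx])
            if acc >= base:
                keep, base = trial, acc
        selected.append(keep)
    # Combine the per-fold selections into the final selected sample set.
    return np.unique(np.concatenate(selected))
```

Under these assumptions, the final selected set is the union of the per-fold selections, which mirrors the combination step described in the abstract.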



Acknowledgments

This project was supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2016JQ6078) and the Fundamental Research Funds for the Central Universities of Chang'an University (300102328107, 0009-2014G6114024).

Author information

Correspondence to Guang Li.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, G. (2019). A New Training Sample Selection Method Avoiding Over-Fitting Based on Nearest Neighbor Rule. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_120
