A new fast prototype selection method based on clustering

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

In supervised classification, a training set T is given to a classifier so that it can classify new prototypes. In practice, not all of the information in T is useful to the classifier, so it is convenient to discard irrelevant prototypes from T. This process, known as prototype selection, is an important task because it can reduce the time needed for training or classification. In this work, we propose a new fast prototype selection method for large datasets, based on clustering, which selects border prototypes and some interior prototypes. We report experimental results that show the performance of our method and compare its accuracy and runtimes against those of other prototype selection methods.
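
The abstract does not spell out the steps of the method, so the following is only a minimal sketch of one plausible reading of "clustering, border prototypes and some interior prototypes", not the authors' algorithm. It assumes plain k-means for the clustering step, keeps the prototype nearest the mean in every single-class cluster (an interior prototype), and in every mixed-class cluster keeps, for each prototype, its nearest neighbour from another class (border prototypes). The names `kmeans` and `select_prototypes` and the parameters `k`, `iters`, and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering step).

    points is a list of equal-length tuples; returns one cluster index per point.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def select_prototypes(points, labels, k, seed=0):
    """Return the sorted indices of the prototypes that are kept."""
    assign = kmeans(points, k, seed=seed)
    keep = set()
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        if len({labels[i] for i in idx}) == 1:
            # Single-class cluster: keep one interior prototype,
            # the member closest to the cluster mean.
            mean = tuple(sum(x) / len(idx)
                         for x in zip(*(points[i] for i in idx)))
            keep.add(min(idx, key=lambda i: math.dist(points[i], mean)))
        else:
            # Mixed cluster: keep, for every member, its nearest
            # neighbour from a different class (a border prototype).
            for i in idx:
                others = [j for j in idx if labels[j] != labels[i]]
                keep.add(min(others,
                             key=lambda j: math.dist(points[i], points[j])))
    return sorted(keep)


if __name__ == "__main__":
    T = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    y = [0, 0, 0, 1, 1, 1]
    print(select_prototypes(T, y, k=1))  # one mixed cluster
    print(select_prototypes(T, y, k=2))  # two single-class clusters
```

In this toy example, a single mixed cluster keeps only the two prototypes that face the other class, while two single-class clusters keep one central prototype each. How many clusters to use, and how many interior prototypes to retain, are design choices that the abstract leaves open.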

Notes

  1. These runtimes were obtained using an Intel Celeron CPU 2.4 GHz, 512 MB RAM.

  2. For SVM, we used the software from [28]; for C4.5 and Naive Bayes, WEKA [29] was used; LWLR and k-NN were implemented in MATLAB [30].

References

  1. Kuncheva LI, Bezdek JC (1998) Nearest prototype classification: clustering, genetic algorithms, or random search? IEEE Trans Syst Man Cybern C 28(1):160–164

  2. Bezdek JC, Kuncheva LI (2001) Nearest prototype classifier designs: an experimental study. Int J Intell Syst 16(12):1445–1473

  3. Wilson DR, Martínez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286

  4. Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Disc 6(2):153–172

  5. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27

  6. Atkeson CG, Moore AW, Schaal S (1997) Locally weighted learning. Artif Intell Rev 11(1–5):11–73

  7. Vapnik V (1995) The nature of statistical learning theory. Springer, New York

  8. Vapnik VN (1998) Statistical learning theory. Wiley, New York

  9. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

  10. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo

  11. Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York

  12. Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14:515–516

  13. Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2:408–421

  14. Gowda KC, Krishna G (1979) The condensed nearest neighbor rule using the concept of mutual nearest neighborhood. IEEE Trans Inf Theory 25:488–490

  15. Chou CH, Kuo BH, Chang F (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th international conference on pattern recognition. IEEE Computer Society, Hong Kong, pp 556–559

  16. Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern 6(6):448–452

  17. Devijver PA, Kittler J (1980) On the edited nearest neighbor rule. In: Proceedings of the fifth international conference on pattern recognition, Los Alamitos, CA, pp 72–80

  18. Liu H, Motoda H (2002) On issues of instance selection. Data Min Knowl Disc 6:115–130

  19. Spillmann B, Neuhaus M, Bunke H, Pękalska E, Duin RPW (2006) Transforming strings to vector spaces using prototype selection. In: Yeung D-Y et al (eds) SSPR & SPR 2006, Lecture Notes in Computer Science, vol 4109, Hong Kong, pp 287–296

  20. Lumini A, Nanni L (2006) A clustering method for automatic biometric template selection. Pattern Recogn 39:495–497

  21. Veenman CJ, Reinders MJT (2005) The nearest sub-class classifier: a compromise between the nearest mean and nearest neighbor classifier. IEEE Trans Pattern Anal Mach Intell 27(9):1417–1429

  22. Veenman CJ, Reinders MJT, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280

  23. Mollineda RA, Ferri FJ, Vidal E (2002) An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering. Pattern Recogn 35:2771–2782

  24. Raicharoen T, Lursinsap C (2005) A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recognit Lett 26(10):1554–1567

  25. Karaçali B, Krim H (2002) Fast minimization of structural risk by nearest neighbor rule. IEEE Trans Neural Netw 14:127–137

  26. Asuncion A, Newman DJ (2007) UCI machine learning repository. In: University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html

  27. Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1924

  28. Franc V, Hlaváč V (2004) Statistical pattern recognition toolbox for Matlab. Research report, Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University

  29. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

  30. The MathWorks Inc. (1994–2008) Natick. [http://www.mathworks.com]

Author information

Corresponding author

Correspondence to J. Arturo Olvera-López.

About this article

Cite this article

Olvera-López, J.A., Carrasco-Ochoa, J.A. & Martínez-Trinidad, J.F. A new fast prototype selection method based on clustering. Pattern Anal Applic 13, 131–141 (2010). https://doi.org/10.1007/s10044-008-0142-x

  • DOI: https://doi.org/10.1007/s10044-008-0142-x
