A review of instance selection methods
Abstract
In supervised learning, a training set that provides previously known information is used to classify new instances. A training set commonly stores many instances, but some of them are not useful for classification, so acceptable classification rates can be obtained while ignoring the non-useful cases. This process is known as instance selection. Instance selection reduces the training set, which in turn reduces runtimes in the classification and/or training stages of classifiers. This work presents a survey of the main instance selection methods reported in the literature.
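To make the idea concrete, here is a minimal sketch of one classic instance selection scheme, Hart's condensed nearest neighbor rule, used with a 1-NN classifier. It is an illustration only and not a method taken from the abstract. The function names `nearest_label` and `condense` and the toy data are hypothetical, and Euclidean distance is assumed.

```python
import math

def nearest_label(point, prototypes):
    """Return the label of the prototype closest to `point` (1-NN, Euclidean)."""
    best = min(prototypes, key=lambda p: math.dist(point, p[0]))
    return best[1]

def condense(training_set):
    """Condensation sketch: keep only instances the current subset misclassifies."""
    subset = [training_set[0]]              # seed with the first instance
    changed = True
    while changed:                          # repeat until a full pass adds nothing
        changed = False
        for features, label in training_set:
            if (features, label) in subset:
                continue                    # already selected, nothing to do
            if nearest_label(features, subset) != label:
                subset.append((features, label))
                changed = True
    return subset

# Hypothetical toy data: two well-separated classes on a line.
training_set = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
                ((1.0,), "b"), ((1.1,), "b"), ((1.2,), "b")]
selected = condense(training_set)
```

On this toy set the selected subset is smaller than the training set yet still classifies every training instance correctly with 1-NN, which is the trade-off the abstract describes: fewer stored instances, and therefore fewer distance computations per query, at an acceptable classification rate.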
Keywords
Instance selection, Supervised learning, Data reduction, Pre-processing
Copyright information
© Springer Science+Business Media B.V. 2010