Data Preprocessing in Data Mining, pp. 195–243

Part of the Intelligent Systems Reference Library book series (ISRL, volume 72)

Instance Selection

  • Salvador García
  • Julián Luengo
  • Francisco Herrera
Chapter

Abstract

In this chapter, we consider instance selection as an important focusing task in the data reduction phase of knowledge discovery and data mining. First of all, we define a broader perspective on the concepts and topics related to instance selection (Sect. 8.1). Because instance selection has been distinguished over the years into two types of task, depending on the data mining method applied afterwards, we clearly separate it into two processes: training set selection and prototype selection. These trends are explained in Sect. 8.2. Thereafter, focusing on prototype selection, we present a unifying framework that covers the existing properties and, as a result, yields a complete taxonomy (Sect. 8.3). Descriptions of the operation of the most well-known, as well as some recent, instance and prototype selection methods are provided in Sect. 8.4. Advanced and recent approaches that incorporate novel solutions based on hybridizations with other types of data reduction techniques, or similar solutions, are collected in Sect. 8.5. Finally, we summarize example evaluation results for prototype selection from an exhaustive experimental comparative analysis in Sect. 8.6.
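To make the prototype selection task concrete, the following is a minimal sketch of Hart's condensed nearest neighbor (CNN) rule, one of the earliest methods of the kind surveyed in the chapter. It is a sketch under stated assumptions: the function name `condensed_nn`, the NumPy-based 1-NN search, and the random scan order are illustrative choices, not the chapter's reference implementation.

```python
import numpy as np

def condensed_nn(X, y, seed=0):
    """Minimal sketch of Hart's condensed nearest neighbor (CNN) rule.

    Greedily grows a subset S of the training set until every training
    instance is correctly classified by 1-NN over S. Returns the indices
    of the selected prototypes.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # CNN is order-dependent
    selected = [int(order[0])]       # seed S with a single instance
    changed = True
    while changed:                   # repeat passes until nothing is added
        changed = False
        for i in order:
            i = int(i)
            if i in selected:
                continue
            # 1-NN classification of instance i using only S
            dists = np.linalg.norm(X[selected] - X[i], axis=1)
            if y[selected[int(np.argmin(dists))]] != y[i]:
                selected.append(i)   # misclassified: absorb into S
                changed = True
    return np.array(selected)

# Usage: 1-NN over (X[idx], y[idx]) reproduces the labels of all of (X, y)
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
idx = condensed_nn(X, y)
print(idx, y[idx])
```

The reduced set is training-set consistent: 1-NN over the selected prototypes classifies every instance of the full training set correctly, which is the consistency property that condensation-type methods in this family aim to preserve while discarding interior instances.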


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Salvador García (1)
  • Julián Luengo (2)
  • Francisco Herrera (3)

  1. Department of Computer Science, University of Jaén, Jaén, Spain
  2. Department of Civil Engineering, University of Burgos, Burgos, Spain
  3. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
