Abstract
In this paper, a hybrid genetic approach is proposed to solve the problem of designing a subdatabase of the original one with the highest classification performances, the lowest number of features and the highest number of patterns. The method can simultaneously treat the double problem of editing instance patterns and selecting features as a single optimization problem, and therefore aims at providing a better level of information. The search is optimized by dividing the algorithm into self-controlled phases managed by a combination of pure genetic process and dedicated local approaches. Different heuristics such as an adapted chromosome structure and evolutionary memory are introduced to promote diversity and elitism in the genetic population. They particularly facilitate the resolution of real applications in the chemometric field presenting databases with large feature sizes and medium cardinalities. The study focuses on the double objective of enhancing the reliability of results while reducing the time consumed by combining genetic exploration and a local approach in such a way that excessive computational CPU costs are avoided. The usefulness of the method is demonstrated with artificial and real data and its performance is compared to other approaches.
Similar content being viewed by others
References
Fauchère LJ, Bouting JA, Henlin JM, Kucharczyk N, Ortuno JC (1998) Combinatorial chemistry for the generation of molecular diversity and the discovery of bioactive lead. Chem Intell Lab Syst 43:43–68
Borman S (1999) Reducing time to drug discovery. Recent advances in solid phase synthesis and high-throughpout screening suggest combinatorial chemistry is coming of age. CENEAR 77(10):33–48
Guyon I, Elisseeff A (2003) An Introduction to Variable and Descriptor Selection. J Mach Learn Res 3:1157–1182
Ng AY (1998) Descriptor selection: learning with exponentially many irrelevant descriptors as training examples. In: 15th international conference on machine learning, San Francisco, pp 404–412
Dasarathy BV (1990) Nearest neighbor (NN) norms: NN pattern recognition techniques. IEEE Computer Society Press, Los Alamitos
Dasarathy BV (1994) Minimal consistent set (MSC) identification for optimal nearest neighbor decision system design. IEEE Trans Syst Man Cybern 24:511–517
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM SIGMOD conference, pp 427–438
Dasarathy BV, Sanchez JS, Townsend S (2003) Nearest neighbour editing and condensing tools-synergy exploitation. Pattern Anal Appl 3:19–30
Kuncheva LI, Jain LC (1999) Nearest neighbor classifier: simultaneous editing and descriptor selection. Pattern Recognit Lett 20(11–13):1149–1156
Ho SY, Chang XI (1999) An efficient generalized multiobjective evolutionary algorithm. In: Proceedings of the genetic and evolutionary computation conference. Morgan Kaufmann Publishers, Los Altos, pp 871–878
Davis TE, Principe JC (1991) A simulated annealing-like converge theory for the simple genetic algorithm, In: ICGA, pp 174–181
Ye T, Kaur HT, Kalyanaraman S (2003) A recursive random search algorithm for large scale network parameter configuration. In: SIGMETRICS 2003, San Diego
Glover F (1989) Tabu Search. ORSA J Comput 1(3):190–206
Boyan J, Moore A (2000) Learning evaluation functions to improve optimisation by local search. J Mach Learn Res 1:77–112
Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Boston
Forrest S, Mitchell M (1993) What makes a problem hard for a genetic algorithm? some anomalous results and their explanation. Mach Learn 13:285–319
Glicman MR, Sycara K (2000) Reasons for premature convergence of self-adapting mutation rates. In: Proceedings of the congress on evolutionary computation, San Diego, vol 1, pp 62–69
Schaffer J, Caruana R, Eshelman L, Das R (1989) A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Proceedings of 3rd international conference on genetic algorithm, Morgan Kaufman, pp 51–60
Costa J, Tavares R, Rosa A (1999) An experimental study on dynamic random variation of population size. In: Proceedings of IEEE systems, man and cybernetics conference, Tokyo, vol 6, pp 607–612
Tuson A, Ross P (1998) Adapting operator settings. Genet Algorithms Evol Comput 6(2):161–184
Pelikan M, Lobo FG (2000) Parameter-less genetic algorithm: a worst-case time and space complexity analysis. In: Proceedings of the genetic and evolutionary computation conference, San Francisco, pp 370–377
Eiben AE, Marchiori E, Valko VA (2004) Evolutionary algorithms with on-the-fly population size adjustment. In: Proceedings of the 8th international conference on parallel problem solving from nature (PPSN VIII), Birmingham, pp 41–50
Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1:131–156
Piramuthu S (2004) Evaluating feature selection methods for learning in data mining application. Eur J Oper Res 156:483–494
Kohavi R, John G (1997) Wrappers for feature selection. Artif Intell 97:273–324
Stracuzzi DJ, Utgoff PE (2004) Randomized variable elimination. J Mach Learn Res 5:1331–1362
Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the 9th national conference on artificial intelligence, pp 129–134
Almuallim H, Diettrerich TG (1994) Learning boolean concepts in the presence of many irrelevant feautres. Artif Intell 69(1–2):279–305
Ratanamahatan A, Gunopulos D (2003) Feature selection for the naive bayesian classifier using decision trees. Appl Artif Intell 17:475–487
Shalkoff R (1992) Pattern recognition statistical, structural and neural approaches. Wiley, Singapore
Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice-Hall, Englewood Cliffs
Caruana R, Freitag D (1994) Greedy attibute selection. In: Proceedings of 11th international conference on machine learning. Morgan Kaufman, New Jersey, pp 28–36
Shalak DB (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In: Proceedings of the 11th international conference on machine learning, New Brunswick. Morgan Kaufman, New Jersey, pp 293–301
Collins RJ, Jeferson DR (1991) Selection in massively parallel genetic algorithms. In: Proceedings of the 4th international conference on genetic algorithms, San Diego, pp 244–248
Jain AK, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158
Zongker D, Jain AK (2004) Algorithms for feature selection: an evaluation. IEEE Trans Pattern Anal Mach Intell 26(9):1105–1113
Zhang H, Sun G (2002) Optimal reference subset selection for nearest neighbor classification by tabu search. Pattern Recognit 35:1481–1490
Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6:153–172
Dasarathy BV (1994) Minimal consistent subset (MCS) identification for optimal nearest neighbor decision systems design. IEEE Trans Syst Man Cybern 24:511–517
Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 16:515–516
Gates GW (1972) The reduced nearest neighbor rule. IEEE Trans Inf Theory 18(3):431–433
Swonger CW (1972) Sample set condensation for a condensed nearest neighbour decision rule for pattern recognition. In: Watanabe S (ed) Academic, Orlando, pp 511–519
Aha D, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66
Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38(3):257–286
Kuncheva LI (1997) Fitness functions in editing k-NN reference set by genetic algorithms. Pattern Recognit 30(6):1041–1049
Guo L, Huang DS, Zhao W (2003) Combining genetic optimization with hybrid learning algorithm for radial basis function neural networks. Electron Lett Online 39(22)
Bezdek JC, Kuncheva LI (2000) Nearest prototype classifier designs: an experimental study. Int J Intell Syst 16(12):1445–1473
Bezdek JC, Kuncheva LI (2000) Some notes on twenty one (21) nearest prototype classifiers. In: Ferri FJ et al (eds) SSPR&SPR. Springer, Berlin, pp 1–16
Kim SW, Oommen BJ (2003) A brief taxonomy and ranking of creative prototype reduction schemes. Pattern Anal Appl 6:232–244
Shekhar S, Lu CT, Zhang P (2003) A unified approach to detecting spatial outliers. Geoinformatica 7(2):139–166
Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253
Shekhar S, Lu CT, Zhang P (2002) Detecting graph-based spatial outliers. Int J Intell Data Anal 6(5):451–468
Lun C-T, Chen, Kou Y. (2003) Algorithms for spatial outliers detection. In: Proceedings of the 3rd IEEE international conference on data mining
Aguilar JC, Riquelme JC, Toro M (2001) Data set editing by ordered projection. Intell Data Anal 5(5):1–13
Quinlan J (1992) C4.5 programs for machine learning. Morgan Kaufman, San Francisco
Kim SW, Oommen BJ (2003) Enhancing Prototype reduction schemes with recursion: a method applicable for “Large” data sets. IEEE Trans Syst Man Cybern 34(3):Part B
Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2:408–421
Francesco JF, Jesus V, Vidal A (1999) Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules. IEEE Trans Syst Man Cybern 29(4):Part B
Devijver P, Kittler J (1980) On the Edited Nearest Neighbor Rule. IEEE Pattern Recognition 1:72–80
Garfield E (1979) Citation indexing: its theory and application in science, technology and humanities. Wiley, New York
Barandela R, Gasca E (2000) Decontamination of training samples for supervised pattern recognition methods. In: Ferri FJ, Inesta Quereda JM, Amin A, Paudil P (eds) Lecture Notes in Computer Science, vol 1876. Springer, Berlin, pp 621–630
Jiang Y, Zhou ZH () Editing training data for kNN classifiers with neural network ensemble
Eiben AE, Hinterding R, Michalewicz Z (1999) Parameter control in evolutionary algorithms. IEEE Trans Evol Comput 3(2):124–141
Tuson A, Ross P (1998) Adapting operator settings. Genet Algorithms Evol Comput 6(2):161–184
Costa J, Tavares R, Rosa A (1999) An experimental study on dynamic random variation of population size. In: Proceedings of IEEE systems, man and cybernetics Conference, Tokyo, vol 6, pp 607–612
Arabas J, Michalewicz Z, Mulawka J (1994) A genetic algorithm with varying population size. In: Proceedings of the 1st IEEE conference on evolutionary computation, Piscataway, pp 73–78
Deb K, Goldberg DE (1989) An investigation of niche and species formation in genetic function optimisation. In: Schaffer JD (ed) Proceedings of the 3rd international conference on genetic algorithms. Morgan Kaufmann, San Mateo, pp 42–50
Beasley D, Bull DR, Martin RR (1993) A sequential niche technique for multimodal function optimization. Evol Comput 1(2):101–125
Goldberg DE, Richardson J (1987) Genetic algorithms with sharing for multimodal function optimisation. In: Grefensette JJ (ed) Proceedings of the 2nd international conference on genetic algorithms, Hillsdale, pp 41–49
Deb K (1989) Genetic Algorithm in multimodal function optimisation. MS thesis, TCGA Report n°89002, University of Alabama
Miller BL, Shaw MJ (1996) Genetic algorithms with dynamic sharing for multimodal function optimization. In: Proceedings of international conference on evolutionary computation, Piscataway, pp 786–791
Sareni B, Krahenbuhl L (1998) Fitness sharing and niching methods revisited. IEEE Trans Evol Comput 2(3):97–106
Youang B (2002) Deterministic crowding, recombination and self-similarity. In: Proceedings of IEEE
Li JP, Balazs ME, Parks GT, Clarkson PJ (2002) A species conserving genetic algorithm for multimodal function optimization. Evol Comput 10(3):207–234
DeJong KA (1975) Analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan
Mahfoud SW (1992) Crowding and preselection revisited. In: 2nd Conference on parallel problem solving from nature (PPSN’92), Brussels, vol 2, pp 27–36
Harik G (1995) Finding multimodal solutions using restricted tournament selection. In: Eshelman LJ (ed) Proceedings of 6th international conference on genetic algorithms. Morgan Kaufman, San Mateo, pp 24–31
Deb K, Pratap A, Agarwal S, Meyarivan T (2000) A fast and elitist multi-objective genetic algorithm: NSGA-II, KanGal (Kanpur Genetic Algorithm Laboratory) Report No. 200001
Wiese K, Goodwin SD (1998) Keep-best reproduction: a selection strategy for genetic algorithms. In: Proceedings of the 1998 symposium on applied computing, pp 343–348
Matsui K (1999) New selection method to improve the population diversity in genetic algorithms systems, man and cybernetics. IEEE Int Conf 1:625–630
Lozano M, Herrera F, Cano JR (2007) Replacement strategies to preserve useful diversity in steady-state genetic algorithms. Elsevier, Amsterdam (in press)
Knowles JD (2002) Local search and hybrid evolutionary algorithms for Pareto optimization. PhD Thesis, University of Reading
Zitzler E, Teich J, Bhattacharyya (2000) Optimizing the efficiency of parameterized local search within global search: a preliminary study. In: Proceedings of the congress on evolutionary computation, San Diego, pp 365–372
Moscato P (1999) Memetic algorithms: a short introduction. In: Corne D, Glover F, Dorigo M (eds) New ideas in optimization. McGraw-Hill, Maidenhead, pp 219–234
Hart WE (1994) adaptative global optimization with local search. PhD Thesis, University of California, San Diego
Land MWS (1998) Evolutionary algorithms with local search for combinatorial optimization. PhD Thesis, University of California, San Diego
Ros F, Pintore M, Chretien JR (2002) Molecular description selection combining genetic algorithms and fuzzy logic: application to database mining procedures. J Chem Int Lab Syst 63:15–22
Leardi R, Gonzalez AL (1998) Genetic algorithms applied to feature selection in PLS regression: how and when to use them. Chem Intell Lab Syst 41(2):195–207
Merz P (2000) Memetic algorithms for combinatorial optimization problems: fitness landscapes and effective search strategies. PhD thesis, University of Siegen
Merz P, Freisleben (1999) A comparison of memetic algorithms, tabu search and ant colonies for the quadratic assignment problem. In: Proceedings of the international congress of evolutionary computation, Washington DC
Krasnogor N (2002) Studies on the theory and design space of memetic algorithms. Thesis University of the West of England, Bristol
Zitzler E, Laumanns M, Bleuler S (2004) A tutorial on evolutionary multiobjective optimization
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading
Schaffer JD (1985) Multiple objective optimization with vector evaluated genetic algorithms. In: Proceedings of the11th international conference on genetic algorithms, pp 93–100
Horn J, Nafpliotis N, Goldberg DE (1994) A niched Pareto genetic algorithm for multiobjective optimization. In: Proceedings of the 1st IEEE conference on evolutionary computation, vol 1, pp 82–87
Laumanns M, Thiele L, Deb K, Zitzler E (2000) On the convergence and diversity-preservation properties of multi-objective evolutionary algorithms. Evol Comput 8(2):149–172
Mitsuo G, Runwei C (1997) Genetic algorithms and engineering design. Wiley, NewYork
Coello CA, Van Veldhuizen, Lamont GB (2002) Evolutionary algorithms for solving multi-objective problems. Kluwer, New York
Zitzler E (1999) Evolutionary algorithms for multiobjective optimization: methods and applications. PhD Thesis, Shaker Verlag, Aachen
Tamaki H, Mori M, Araki M, Ogai H (1995) Multicriteria optimization by genetic algorithms: a case of scheduling in hot rolling process. In: Proceedings of the 3rd APORS, pp 374–381
Skalak DB (1997) Prototype selection for composite nearest neighbor classifiers, Phd Thesis. University of Massachuset Amherst
Kuncheva LI, Jain LC (1999) Nearest neighbor classifier: simultaneous editing and descriptor selection. Pattern Recognit Lett 20(11–13):1149–1156
Ho S-H, Lui C-C, Liu S (2002) Design of an optimal nearest neighbor classifier using an intelligent genetic algorithm. Pattern Recognit Lett 23:1495–1503
Cano JR, Herrera F, Lozano (2003) Using evolutionary algorithms as instance selection for data reduction in kdd: an experimental study. IEEE Trans Evol Comput 7(6):193–208
Chen JH, Chen HM, Ho SY (2005) Design of nearest neighbor classifiers: multi-objective approach. Int J Approx Reason (in press)
Blake C, Keogh E, Merz CJ (1998) UCI repository of machine learning databases (http://www.ics.uci.edi/∼mlearn/MLRepository.html), Department of Information and Computer Science, University of California
Geiger DL, Brooke LT, Call DJ (Eds) (1990) Acute toxicities of organic chemicals to Fathead Minnows (Pimephales promelas), Center for Lake Superior Environmental Studies, University of Wisconsin, Superior
Directive 92/32/ECC (1992), the 7th amendment to directive 67/548/ECC, OJL 154 of 5.VI.92, p1
Knowles JD, Corne DW (2000) Approximating the nondominated front using the Pareto archived evolution strategy. Evol Comput 8(2):149–172
Jacquet-Lagrèze E (1990) Interactive assessment of preferences using holistic judgements: the PREFCALC system. In: Bana e Costa CA (ed) Readings in multiple criteria decision aid, Springer, Heidelberg, pp 336–350
Blayo F, Demartines P (1991) Data analysis: How to compare Kohonen neural networks to others techniques? International workshop in artificial neural networks (IWANN 1991), Barcelona, Lectures Notes on Computer Science. Springer, Heidelberg, pp 469–476
Kireev D, Bernard D, Chretien JR, Ros F (1998) Application of Kohonen neural networks in classification of biologically active compounds. SAR QSAR Environ Res 8:93–107
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ros, F., Guillaume, S., Pintore, M. et al. Hybrid genetic algorithm for dual selection. Pattern Anal Applic 11, 179–198 (2008). https://doi.org/10.1007/s10044-007-0089-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-007-0089-3