Hybrid genetic algorithm for dual selection

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

In this paper, a hybrid genetic approach is proposed to design a subdatabase of the original database that combines the highest classification performance, the lowest number of features and the highest number of patterns. The method treats the dual problem of editing instance patterns and selecting features as a single optimization problem, and therefore aims to provide a better level of information. The search is optimized by dividing the algorithm into self-controlled phases managed by a combination of a pure genetic process and dedicated local approaches. Heuristics such as an adapted chromosome structure and evolutionary memory are introduced to promote diversity and elitism in the genetic population. They particularly facilitate the resolution of real applications in the chemometric field, where databases have large feature sizes and medium cardinalities. The study pursues the dual objective of enhancing the reliability of results while reducing the time consumed, combining genetic exploration and a local approach in such a way that excessive CPU costs are avoided. The usefulness of the method is demonstrated with artificial and real data, and its performance is compared with that of other approaches.
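
To make the idea of dual selection concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a chromosome made of two binary masks (one over features, one over patterns), a 1-nearest-neighbour classifier as the performance measure, a weighted-sum fitness, and a simple bit-flip hill climber standing in for the "dedicated local approaches". The function names, the weights `w_acc`, `w_feat`, `w_inst` and the choice of classifier are all illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Mask = List[int]  # 1 = keep, 0 = drop


def nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                feat_mask: Mask, inst_mask: Mask) -> float:
    """1-NN accuracy over all patterns, using only the selected features and
    the selected patterns as reference set (a pattern never votes for itself)."""
    feats = [j for j, keep in enumerate(feat_mask) if keep]
    refs = [i for i, keep in enumerate(inst_mask) if keep]
    if not feats or not refs:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for r in refs:
            if r == i:
                continue
            d = sum((X[i][j] - X[r][j]) ** 2 for j in feats)
            if d < best_dist:
                best_dist, best_label = d, y[r]
        correct += int(best_label == y[i])
    return correct / len(X)


def fitness(X, y, feat_mask: Mask, inst_mask: Mask,
            w_acc: float = 0.8, w_feat: float = 0.1,
            w_inst: float = 0.1) -> float:
    """Assumed weighted sum of the three goals named in the abstract:
    high accuracy, few features, many retained patterns."""
    if not any(feat_mask) or not any(inst_mask):
        return 0.0  # an empty selection is treated as invalid
    acc = nn_accuracy(X, y, feat_mask, inst_mask)
    feat_term = 1.0 - sum(feat_mask) / len(feat_mask)  # fewer features is better
    inst_term = sum(inst_mask) / len(inst_mask)        # more patterns is better
    return w_acc * acc + w_feat * feat_term + w_inst * inst_term


def local_improve(X, y, feat_mask: Mask, inst_mask: Mask) -> Tuple[Mask, Mask]:
    """Bit-flip hill climbing: keep any single flip that raises the fitness,
    and repeat until a full pass yields no improvement."""
    best_f, best_i = list(feat_mask), list(inst_mask)
    best = fitness(X, y, best_f, best_i)
    improved = True
    while improved:
        improved = False
        for mask in (best_f, best_i):
            for k in range(len(mask)):
                mask[k] ^= 1                      # try flipping one gene
                score = fitness(X, y, best_f, best_i)
                if score > best:
                    best, improved = score, True  # keep the flip
                else:
                    mask[k] ^= 1                  # undo the flip
    return best_f, best_i
```

In a hybrid scheme of the kind the abstract describes, a genetic process would propose candidate mask pairs and a routine like `local_improve` would refine the promising ones. How the phases are scheduled and controlled is specific to the paper and is not reproduced here.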

Author information

Corresponding author

Correspondence to Frederic Ros.

About this article

Cite this article

Ros, F., Guillaume, S., Pintore, M. et al. Hybrid genetic algorithm for dual selection. Pattern Anal Applic 11, 179–198 (2008). https://doi.org/10.1007/s10044-007-0089-3

  • DOI: https://doi.org/10.1007/s10044-007-0089-3
