Evolutionary Computation in Microarray Data Analysis

  • Jason H. Moore
  • Joel S. Parker


We are facing an information explosion in the biomedical sciences. For example, our ability to measure the expression levels of thousands of different genes simultaneously in a particular cell or tissue has far outpaced our ability to store, manage, and analyse the data being generated. In this review, we explore the use of evolutionary computation for dealing with some of the difficult statistical and computational challenges that have resulted from the development and implementation of new technologies such as DNA microarrays. We review genetic algorithms and genetic programming as evolutionary computation strategies that have been applied to the analysis of DNA microarray data.

Key words

Evolutionary computation; genetic algorithms; genetic programming; machine learning; DNA microarrays; gene expression





Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • Jason H. Moore (1)
  • Joel S. Parker (1)
  1. Program in Human Genetics, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical School, Nashville, USA
