Genetic Programming and Evolvable Machines, Volume 8, Issue 4, pp 395–411

Genomic mining for complex disease traits with “random chemistry”

  • Margaret J. Eppstein
  • Joshua L. Payne
  • Bill C. White
  • Jason H. Moore
Original paper


Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionize detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene–gene and gene–environment interactions. Detecting epistatic genetic interactions that predispose for disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman’s “random chemistry” approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that do not. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although the success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.
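The set-shrinking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_chemistry`, the parameters `shrink`, `final_size`, and `max_tries`, and the noise-free `oracle` are all assumptions made for illustration. In particular, the oracle simply reports whether a planted pair is present, standing in for the approximate, noisy ReliefF-based fitness function that the paper actually uses.

```python
import random


def random_chemistry(features, contains_signal, shrink=0.5, final_size=2,
                     max_tries=10_000, rng=None):
    """Shrink a large feature set toward a small interacting subset.

    At each stage, random subsets of the current set are drawn until one
    still passes `contains_signal`; that subset becomes the new current set.
    Returns the final subset (sorted), or None if a stage runs out of tries.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = rng or random.Random()
    current = list(features)
    while len(current) > final_size:
        # Reduce the set size by a constant factor, never below final_size.
        next_size = max(final_size, int(len(current) * shrink))
        for _ in range(max_tries):
            candidate = rng.sample(current, next_size)
            if contains_signal(candidate):
                current = candidate
                break
        else:
            return None  # no passing subset found at this stage
    return sorted(current)


# Assumed toy setting: 1,000 candidate SNP indices with one planted pair.
planted = {137, 842}


def oracle(subset):
    # Idealized, noise-free stand-in for a fitness test on a large set.
    return planted.issubset(subset)


result = random_chemistry(range(1000), oracle, rng=random.Random(0))
print(result)
```

Because the set size falls by a constant factor at each stage, the number of stages in this sketch grows only logarithmically with the size of the initial set. With a noisy fitness function in place of the oracle, a stage could accept a subset that has lost the interacting pair, which is one way the success rate could degrade.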


Keywords

  • Evolutionary algorithms
  • Epistasis
  • Single nucleotide polymorphisms
  • Data mining
  • Genome-wide association studies
  • Complex traits
  • Feature selection



This work was supported, in part, by a pilot award and a graduate research assistantship, funded by DOE-FG02-00ER45828 awarded by the US Department of Energy through its EPSCoR Program and by National Institutes of Health grants AI59694 and LM009012. We thank Joshua Gilbert for his aid in creating the synthetic data sets.



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Margaret J. Eppstein (1)
  • Joshua L. Payne (2)
  • Bill C. White (3)
  • Jason H. Moore (3)

  1. Departments of Computer Science and Biology, University of Vermont, Burlington, USA
  2. Department of Computer Science, University of Vermont, Burlington, USA
  3. Computational Genetics Laboratory, Dartmouth College, Lebanon, USA
