Applied Bioinformatics, Volume 5, Issue 2, pp 77–88

Machine Learning for Detecting Gene-Gene Interactions

A Review
  • Brett A. McKinney
  • David M. Reif
  • Marylyn D. Ritchie
  • Jason H. Moore
Biomedical Genomics and Proteomics


Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.
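The claim that single-locus methods can miss interactions can be made concrete with a small worked example. The sketch below is illustrative only and is not taken from the review: the penetrance table, the allele frequency of 0.5, and the function names `marginal_penetrance`, `prevalence` and `high_risk_cells` are all assumptions. It builds a two-locus model in which disease risk depends only on the joint genotype, shows that each locus on its own carries no signal, and then pools the joint genotype cells into high-risk and low-risk groups in the spirit of multifactor dimensionality reduction.

```python
from fractions import Fraction
from itertools import product

# Genotypes are coded 0, 1, 2 (copies of the minor allele).
# Assumed Hardy-Weinberg frequencies for an allele frequency of 0.5.
GENO_FREQ = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Hypothetical penetrance table P(disease | genotype at A, genotype at B).
# Risk is raised only when exactly one of the two loci is heterozygous.
PENETRANCE = {
    (a, b): Fraction(1, 10) if (a == 1) != (b == 1) else Fraction(0)
    for a, b in product(range(3), repeat=2)
}


def marginal_penetrance(locus):
    """P(disease | genotype at one locus), averaged over the other locus."""
    result = {}
    for g in range(3):
        total = Fraction(0)
        for other in range(3):
            cell = (g, other) if locus == 0 else (other, g)
            total += GENO_FREQ[other] * PENETRANCE[cell]
        result[g] = total
    return result


def prevalence():
    """Overall P(disease) in the population."""
    return sum(
        GENO_FREQ[a] * GENO_FREQ[b] * PENETRANCE[(a, b)]
        for a, b in product(range(3), repeat=2)
    )


def high_risk_cells():
    """Joint genotypes whose penetrance exceeds the population prevalence."""
    k = prevalence()
    return sorted(cell for cell, p in PENETRANCE.items() if p > k)


if __name__ == "__main__":
    print("Marginal penetrance, locus A:", marginal_penetrance(0))
    print("Marginal penetrance, locus B:", marginal_penetrance(1))
    print("Prevalence:", prevalence())
    print("High-risk joint genotypes:", high_risk_cells())
```

Under these assumptions every single-locus genotype has the same penetrance as the population prevalence (1/20), so a one-locus-at-a-time test sees nothing, while the joint table separates four high-risk cells from five low-risk ones. Exact fractions are used instead of floating point so that the equality of the marginal values is not blurred by rounding.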


Keywords (machine-generated, not supplied by the authors): Hide Layer; Genetic Programming; Multifactor Dimensionality Reduction; Traditional Statistical Method; Genomewide Association Study



This work was supported by National Institutes of Health (NIH) grants AI059694, LM009012, AI057661, AI064625, HL65234, RR018787, ES007373 and HD047447. This work was also supported by generous funds from the Vanderbilt Program in Biomathematics and the Norris-Cotton Cancer Center at Dartmouth Medical School.

The authors have no conflicts of interest that are directly relevant to the content of this review.



Copyright information

© Adis Data Information BV 2006

Authors and Affiliations

  • Brett A. McKinney (1, 2)
  • David M. Reif (1, 2)
  • Marylyn D. Ritchie (1)
  • Jason H. Moore (2, 3, 4, 5, 6)

  1. Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University Medical School, Nashville, USA
  2. Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, USA
  3. Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, USA
  4. Department of Biological Sciences, Dartmouth College, Hanover, USA
  5. Department of Computer Science, University of New Hampshire, Durham, USA
  6. Department of Computer Science, University of Vermont, Burlington, USA
