Machine Learning for Detecting Gene-Gene Interactions: A Review


Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.
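To make the difficulty concrete, the following is a minimal sketch, not the authors' software, of the pooling step at the core of multifactor dimensionality reduction: each multilocus genotype cell is labelled high risk when its cases outnumber its controls. The toy dataset, the function name `mdr_accuracy`, and the tie-breaking rule are illustrative assumptions. The data follow an XOR-like model in which a subject is a case only when exactly one of two loci is heterozygous, so neither locus shows a marginal effect. Published MDR implementations also use cross-validation and permutation testing, which this sketch omits.

```python
from collections import Counter
from itertools import combinations


def mdr_accuracy(genotypes, status, loci, threshold=1.0):
    """Pool genotype cells into high/low risk and return training accuracy.

    A cell is high risk when cases > threshold * controls; ties count as
    low risk (an assumption made for this sketch).
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[i] for i in loci)
        (cases if y else controls)[cell] += 1
    high_risk = {c for c in cases if cases[c] > threshold * controls[c]}
    correct = sum(
        (tuple(row[i] for i in loci) in high_risk) == bool(y)
        for row, y in zip(genotypes, status)
    )
    return correct / len(status)


# Hypothetical toy data: genotypes coded 0/1/2 with 1:2:1 frequencies at
# each locus; case status is 1 when exactly one locus is heterozygous.
weights = {0: 1, 1: 2, 2: 1}
genotypes, status = [], []
for a in weights:
    for b in weights:
        for _ in range(weights[a] * weights[b]):
            genotypes.append((a, b))
            status.append(int((a == 1) != (b == 1)))

# Score every one-locus and two-locus model.
results = {
    loci: mdr_accuracy(genotypes, status, loci)
    for k in (1, 2)
    for loci in combinations(range(2), k)
}
for loci, acc in results.items():
    print(loci, acc)
```

In this toy example each single-locus model does no better than chance, while the two-locus model separates cases from controls perfectly. A locus-by-locus scan would therefore miss both polymorphisms, which is the situation the methods reviewed here are designed to address.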






This work was supported by National Institutes of Health (NIH) grants AI059694, LM009012, AI057661, AI064625, HL65234, RR018787, ES007373 and HD047447. This work was also supported by generous funds from the Vanderbilt Program in Biomathematics and the Norris-Cotton Cancer Center at Dartmouth Medical School.

The authors have no conflicts of interest that are directly relevant to the content of this review.

Correspondence to Dr Jason H. Moore.


McKinney, B.A., Reif, D.M., Ritchie, M.D. et al. Machine Learning for Detecting Gene-Gene Interactions. Appl-Bioinformatics 5, 77–88 (2006).


  • Hidden Layer
  • Genetic Programming
  • Multifactor Dimensionality Reduction
  • Traditional Statistical Method
  • Genomewide Association Study