Epistasis, pp 257-268

Epistasis Analysis Using Information Theory

  • Jason H. Moore
  • Ting Hu
Part of the Methods in Molecular Biology book series (MIMB, volume 1253)


Here we introduce entropy-based measures derived from information theory for detecting and characterizing epistasis in genetic association studies. We provide a general overview of the methods and highlight some of the modifications that have greatly improved their power for genetic analysis. We end with a few published studies of complex human diseases that have used these measures.

Key words

Epistasis; Information theory; Entropy; Association studies; Genetic analysis; Gene–gene interaction



This work was supported by National Institutes of Health (NIH) grants AI59694, EY022300, GM103534, GM103506, LM009012, LM010098, and LM011360.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Community and Family Medicine, Geisel School of Medicine, DHMC, Lebanon, USA
  2. Department of Genetics, Geisel School of Medicine, DHMC, Lebanon, USA
  3. Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Hanover, USA
