Human Genetics, Volume 124, Issue 1, pp 19–29

Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases

  • Kristine A. Pattin
  • Jason H. Moore


One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility to disease. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, some significant challenges need to be addressed. We address here the challenge of detecting epistasis, or gene–gene interactions, in genome-wide association studies. Discovering epistatic interactions in high-dimensional datasets remains a challenge because of the computational complexity of analyzing all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data can be analyzed more efficiently while still retaining important biological information. One of the strongest demonstrations of a functional relationship between genes is protein–protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review, we discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We also explore some of the fundamentals of protein interactions and the databases that are publicly available.
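The scale of the combinatorial problem, and the effect of restricting the search to SNPs in genes whose protein products interact, can be illustrated with a minimal sketch. It is not the authors' method: the function names, SNP identifiers, gene names, and the interaction list are all hypothetical, and a real analysis would draw the SNP-to-gene mapping and the interaction pairs from curated resources such as the protein interaction databases discussed in the review.

```python
from itertools import combinations
from math import comb


def count_snp_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations of a given order (pairs, triples, ...)."""
    return comb(n_snps, order)


def prioritize_pairs(snp_to_gene, interacting_genes):
    """Keep only SNP pairs whose genes appear together in a known interaction.

    snp_to_gene: dict mapping a SNP id to a gene name (hypothetical mapping).
    interacting_genes: iterable of (gene, gene) pairs taken from some
        protein interaction source (hypothetical toy data here).
    """
    known = {frozenset(pair) for pair in interacting_genes}
    kept = []
    for snp_a, snp_b in combinations(sorted(snp_to_gene), 2):
        genes = frozenset((snp_to_gene[snp_a], snp_to_gene[snp_b]))
        # Skip SNP pairs within the same gene; keep pairs with a known interaction.
        if len(genes) == 2 and genes in known:
            kept.append((snp_a, snp_b))
    return kept


# Exhaustive pairwise search over 500,000 SNPs (an assumed, illustrative size).
n_pairs = count_snp_combinations(500_000, 2)

# Toy example: four SNPs in three hypothetical genes, one known interaction.
toy_map = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C", "rs4": "GENE_A"}
toy_interactions = [("GENE_A", "GENE_B")]
toy_pairs = prioritize_pairs(toy_map, toy_interactions)

if __name__ == "__main__":
    print(f"pairwise tests for 500,000 SNPs: {n_pairs:,}")
    print(f"prioritized toy pairs: {toy_pairs}")
```

In the toy data, only two of the six possible SNP pairs survive the filter. The design choice to filter before testing trades completeness for tractability: interactions between genes with no recorded protein interaction would be missed.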


Genetic Modifier; Expert Knowledge; Multifactor Dimensionality Reduction; Human Protein Reference Database; Distribute Annotation System



This publication was funded in part by National Institutes of Health grants LM009012 and AI59694. We would like to thank Drs. Scott Gerber, David Jewell, Dean Madden and Mike Whitfield for helpful discussions that led to some of the ideas in this paper.



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Computational Genetics Laboratory, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, USA
  2. Department of Genetics, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, USA
  3. Department of Community and Family Medicine, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, USA
  4. Department of Computer Science, University of New Hampshire, Durham, USA
  5. Department of Computer Science, University of Vermont, Burlington, USA
  6. Translational Genomics Research Institute, Phoenix, USA
  7. Dartmouth-Hitchcock Medical Center, Lebanon, USA
