Detection of Regulator Genes and eQTLs in Gene Networks

  • Lingfei Wang
  • Tom Michoel


Genetic differences between individuals that are associated with quantitative phenotypic traits, including disease states, are usually found in noncoding genomic regions. These genetic variants are often also associated with differences in the expression levels of nearby genes (they are “expression quantitative trait loci”, or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins, and metabolites. Computational systems biology approaches that reconstruct causal gene networks from large-scale omics data have therefore become essential for understanding the structure of networks controlled by eQTLs together with other regulatory genes, and for generating detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct coexpression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.


Keywords: Bayesian network; Coexpression network; Causal interaction; Probability density function; Graphic processor unit



The authors’ work is supported by the BBSRC (BB/M020053/1) and Roslin Institute Strategic Grant funding from the BBSRC (BB/J004235/1).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian, UK
