Discovering Gene–Gene and Gene–Environment Causal Interactions Using Bioinformatics Approaches

  • Changwon Yoo


The exponential growth of molecular biology data has led to an intense focus on the study of interactions among DNA, RNA, protein biosynthesis, and the environment. Large datasets generated by gene expression profiling experiments and next-generation sequencing technologies, key tools of the genomics revolution, enable us to link molecular states and environmental effects to physiological states by reverse engineering gene–gene and gene–environment interaction networks that sense DNA and environmental perturbations. Reconstructing these networks will ultimately allow us to understand variations in physiological states associated with disease. In this chapter, we review mathematical and statistical bioinformatics approaches for discovering and modeling gene–gene and gene–environment causal interactions. We also present new modeling methods for probabilistic networks that incorporate the various interventions used to perturb the system.
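The idea of incorporating interventions into a probabilistic network can be made concrete with a small sketch. It is a hedged illustration, not necessarily the chapter's own method. It follows one well-known convention from Bayesian causal discovery on mixed experimental and observational data: when a variable was set by the experimenter in a sample (for example, a gene knocked down), that sample is left out of the counts for that variable's own conditional distribution, because its value was not produced by its parents. The score is the K2 marginal likelihood of Cooper and Herskovits for discrete variables. The function names (`k2_family_score`, `network_score`), the two-gene toy data, and the intervention flags are hypothetical.

```python
from math import lgamma
from typing import Dict, List, Sequence, Set

Row = Dict[str, int]  # variable name -> discrete state in 0..arity-1


def k2_family_score(data: Sequence[Row], node: str, parents: List[str],
                    arity: Dict[str, int], intervened: Sequence[Set[str]]) -> float:
    """Log K2 marginal likelihood of one node given its parent set.

    Rows in which `node` itself was experimentally set are skipped: its value
    there was fixed by the intervention, not generated by its parents.
    """
    r = arity[node]
    counts: Dict[tuple, List[int]] = {}
    for row, manipulated in zip(data, intervened):
        if node in manipulated:
            continue
        config = tuple(row[p] for p in parents)             # parent configuration j
        counts.setdefault(config, [0] * r)[row[node]] += 1  # increment N_ijk
    score = 0.0
    for n_ijk in counts.values():  # unseen parent configurations contribute 0
        n_ij = sum(n_ijk)
        # log[(r-1)! / (N_ij + r - 1)! * prod_k N_ijk!]
        score += lgamma(r) - lgamma(n_ij + r) + sum(lgamma(n + 1) for n in n_ijk)
    return score


def network_score(data: Sequence[Row], structure: Dict[str, List[str]],
                  arity: Dict[str, int], intervened: Sequence[Set[str]]) -> float:
    """Sum of family scores; `structure` maps each node to its parent list."""
    return sum(k2_family_score(data, node, parents, arity, intervened)
               for node, parents in structure.items())


# Hypothetical toy data: 8 observational rows in which G1 and G2 agree,
# followed by 8 rows in which G1 was set by the experimenter and G2 followed it.
rows = [{"G1": v, "G2": v} for v in (0, 1) for _ in range(4)] * 2
flags = [set()] * 8 + [{"G1"}] * 8
arity = {"G1": 2, "G2": 2}

g1_causes_g2 = {"G1": [], "G2": ["G1"]}
g2_causes_g1 = {"G2": [], "G1": ["G2"]}

score_a = network_score(rows, g1_causes_g2, arity, flags)
score_b = network_score(rows, g2_causes_g1, arity, flags)
print(f"G1 -> G2: {score_a:.3f}   G2 -> G1: {score_b:.3f}")
```

If the intervention flags are dropped (every entry an empty set), the two structures receive identical scores on this toy data, which shows why purely observational data cannot orient this edge. Marking the G1 manipulations is what breaks the tie in favor of G1 → G2.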


Keywords: Gene–gene and gene–environment interactions · Bioinformatics · Causal discovery · Causal modeling · Causal Bayesian networks

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Epidemiology and Biostatistics, Robert Stempel School of Public Health, Florida International University, Miami, USA
