Disease Models, Part I: Graphical Models

  • Ilya Shpitser


Scientists building models of the world necessarily abstract away features not directly relevant to their line of inquiry, and complete knowledge of the relevant features is generally not possible. The mathematical formalism that has proven most successful at abstracting away the irrelevant while effectively summarizing incomplete knowledge is probability theory. First studied in the context of games of chance, probability theory has grown into a mature mathematical discipline whose tools, methods, and concepts permeate statistics, engineering, and the social and empirical sciences. A key insight, discovered independently several times during the 20th century but refined, generalized, and popularized by computer scientists, is that there is a close link between probabilities and graphs. This link allows quantitative relationships found in the study of probability, such as conditional independence, to be expressed in a visual, qualitative way using the language of graphs. Because human intuition is more readily brought to bear in visual settings than in algebraic or computational ones, graphs aid comprehension of complex probabilistic domains. The connection has other advantages as well: for instance, the magnitude of the computational resources needed to reason about a particular probabilistic domain can be read from a graph representing that domain. Finally, graphs provide a concise and intuitive language for reasoning about causes and effects. In this chapter, we explore the basic laws of probability, the relationship between probability and causation, the way in which graphs can be used to reason about probabilistic and causal models, and how such graphical models can be learned from data. Applications of these graphs to formalizing observations and knowledge about disease are also provided.
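As a small illustration of the link between graphs and conditional independence described above, the following Python sketch builds a three-variable chain, Exposure -> Biomarker -> Disease, and checks numerically what the graph implies. The variable names, the chain structure, and all probability values are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
from itertools import product

# Hypothetical chain-structured network: E -> B -> D, all variables binary (0/1).
# Every number below is an illustrative assumption, not data from the chapter.
p_e = {1: 0.3, 0: 0.7}                                   # P(E = e)
p_b_given_e = {1: {1: 0.9, 0: 0.1},                      # P(B = b | E = e),
               0: {1: 0.2, 0: 0.8}}                      # indexed [e][b]
p_d_given_b = {1: {1: 0.4, 0: 0.6},                      # P(D = d | B = b),
               0: {1: 0.05, 0: 0.95}}                    # indexed [b][d]


def joint(e, b, d):
    """Joint probability from the factorization implied by the chain graph."""
    return p_e[e] * p_b_given_e[e][b] * p_d_given_b[b][d]


def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(e=1, d=1), by summation."""
    total = 0.0
    for e, b, d in product((0, 1), repeat=3):
        assignment = {"e": e, "b": b, "d": d}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(e, b, d)
    return total


def cond(target, given):
    """Conditional probability P(target | given); both are dicts of assignments."""
    return prob(**target, **given) / prob(**given)


# Once B is known, E carries no further information about D
# (the graph encodes that D is independent of E given B).
with_e = cond({"d": 1}, {"b": 1, "e": 1})
without_e = cond({"d": 1}, {"b": 1})

# Without conditioning on B, E and D are dependent.
d_if_exposed = cond({"d": 1}, {"e": 1})
d_if_unexposed = cond({"d": 1}, {"e": 0})

if __name__ == "__main__":
    print("P(D=1 | B=1, E=1) =", with_e)
    print("P(D=1 | B=1)      =", without_e)
    print("P(D=1 | E=1)      =", d_if_exposed)
    print("P(D=1 | E=0)      =", d_if_unexposed)
```

In this sketch the chain is specified by 1 + 2 + 2 = 5 free parameters, whereas an unrestricted joint distribution over three binary variables needs 2^3 - 1 = 7. The gap grows quickly as variables are added, which is one sense in which the cost of reasoning about a domain can be read from its graph.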


Keywords: Bayesian Network, Causal Effect, Causal Model, Conditional Independence, Bayesian Belief Network



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Ilya Shpitser (1)
  1. School of Public Health, Harvard University, Harvard, USA
