# Disease Models, Part I: Graphical Models

Ilya Shpitser

## Abstract

Scientists building models of the world necessarily abstract away features not directly relevant to their line of inquiry. Furthermore, complete knowledge of the relevant features is generally not possible. The mathematical formalism that has proven most successful at abstracting away the irrelevant while effectively summarizing incomplete knowledge is probability theory. First studied in the context of analyzing games of chance, probability theory has flowered into a mature mathematical discipline whose tools, methods, and concepts permeate statistics, engineering, and the social and empirical sciences. A key insight, discovered independently several times during the 20th century but refined, generalized, and popularized by computer scientists, is that there is a close link between probabilities and graphs. This link allows quantitative relationships found in the study of probability, such as conditional independence, to be expressed in a visual, qualitative way using the language of graphs. As human intuitions are more readily brought to bear in visual settings than in algebraic and computational ones, graphs aid human comprehension in complex probabilistic domains. This connection between probabilities and graphs has other advantages as well: for instance, the magnitude of computational resources needed to reason about a particular probabilistic domain can be read from a graph representing this domain. Finally, graphs provide a concise and intuitive language for reasoning about causes and effects. In this chapter, we explore the basic laws of probability, the relationship between probability and causation, the way in which graphs can be used to reason about probabilistic and causal models, and finally how such graphical models can be learned from data. Applications of these graphs to formalizing observations and knowledge about disease are also provided.
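As a small illustration of the link between graphs and conditional independence, the sketch below builds a three-variable chain A → B → C and checks numerically that A and C are independent given B, as the graph suggests. The variable names, the probability tables, and the helper functions `joint` and `max_ci_violation` are all hypothetical choices made for this example and are not taken from the chapter.

```python
from itertools import product

# Hypothetical conditional probability tables for a chain A -> B -> C.
# All variables are binary; the numbers are illustrative assumptions.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """Joint probability implied by the graph: P(a) P(b | a) P(c | b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def max_ci_violation():
    """Largest |P(a, c | b) - P(a | b) P(c | b)| over all value combinations."""
    worst = 0.0
    for b in (0, 1):
        # Marginal P(b), obtained by summing the joint over a and c.
        p_b = sum(joint(a, b, c) for a, c in product((0, 1), repeat=2))
        for a, c in product((0, 1), repeat=2):
            p_ac_b = joint(a, b, c) / p_b
            p_a_b = sum(joint(a, b, c2) for c2 in (0, 1)) / p_b
            p_c_b = sum(joint(a2, b, c) for a2 in (0, 1)) / p_b
            worst = max(worst, abs(p_ac_b - p_a_b * p_c_b))
    return worst


# The factored joint must still sum to one.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Free parameters: 1 for P(A), 2 for P(B | A), 2 for P(C | B),
# versus 2**3 - 1 for an unrestricted joint over three binary variables.
factored_params = 1 + 2 + 2
full_joint_params = 2 ** 3 - 1

print(total, max_ci_violation(), factored_params, full_joint_params)
```

The parameter counts at the end give a rough sense of how the size of a representation can be read off the graph: the chain needs 5 free parameters, while an unrestricted joint distribution over the same three binary variables needs 7, and the gap widens quickly as sparse graphs grow.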

## Keywords

Bayesian Network; Causal Effect; Causal Model; Conditional Independence; Bayesian Belief Network
