Abstract
Scientists building models of the world by necessity abstract away features not directly relevant to their line of inquiry. Furthermore, complete knowledge of relevant features is not generally possible. The mathematical formalism that has proven most successful at simultaneously abstracting away the irrelevant and effectively summarizing incomplete knowledge is probability theory. First studied in the context of analyzing games of chance, probability theory has flowered into a mature mathematical discipline whose tools, methods, and concepts permeate statistics, engineering, and the social and empirical sciences. A key insight, discovered multiple times independently during the 20th century, but refined, generalized, and popularized by computer scientists, is that there is a close link between probabilities and graphs. This link allows numerical, quantitative relationships found in the study of probability, such as conditional independence, to be expressed in a visual, qualitative way using the language of graphs. As human intuitions are more readily brought to bear in visual rather than algebraic and computational settings, graphs aid human comprehension in complex probabilistic domains. This connection between probabilities and graphs has other advantages as well: for instance, the magnitude of computational resources needed to reason about a particular probabilistic domain can be read from a graph representing this domain. Finally, graphs provide a concise and intuitive language for reasoning about causes and effects. In this chapter, we explore the basic laws of probability, the relationship between probability and causation, the way in which graphs can be used to reason about probabilistic and causal models, and finally how such graphical models can be learned from data. Applications of these graphs to formalizing observations and knowledge about disease are also provided.
Notes
1. We follow the standard notation of capitalizing random variables and of using lower case letters for outcomes. Bold characters symbolize sets or vectors of variables.
2. Bayesian networks are also referred to as Bayesian belief networks (BBNs), or belief networks. We use all three terms interchangeably throughout this book.
3. These and other types of Bayesian queries are covered in further detail in Chapter 9.
4. For causal models, described later in this chapter, this process is called inductive causal inference (also known as causal discovery), that is, learning the causal graph from data.
5. Briefly, the EM algorithm consists of two parts: the E-step, wherein the missing data are estimated using the conditional expectation, based on the observed data and the current estimate of the model parameters; and the M-step, wherein the likelihood function is maximized assuming the missing data are known (the estimated data from the E-step being used in lieu of the actual missing data).
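The two steps described in the note on the EM algorithm can be illustrated with a minimal sketch. It is not the chapter's own implementation: it assumes a hypothetical two-node network A → B with binary variables, where A is always observed and B is sometimes missing (encoded as `None`) and missing at random. The function name `em_conditional` and the toy data are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (value of A, value of B or None if missing)


def em_conditional(rows: List[Row], n_iter: int = 200, init: float = 0.5) -> Dict[int, float]:
    """Estimate P(B=1 | A=a) for a in {0, 1} with EM (illustrative sketch)."""
    theta = {0: init, 1: init}  # current estimate of P(B=1 | A=a)
    for _ in range(n_iter):
        # E-step: fill in each missing B with its conditional expectation
        # under the current parameters, E[B | A=a] = theta[a].
        expected_sum = {0: 0.0, 1: 0.0}
        count = {0: 0, 1: 0}
        for a, b in rows:
            count[a] += 1
            expected_sum[a] += theta[a] if b is None else b
        # M-step: maximize the likelihood as if the filled-in values were
        # observed; for a Bernoulli parameter this is the (expected) mean.
        theta = {
            a: expected_sum[a] / count[a] if count[a] else theta[a]
            for a in (0, 1)
        }
    return theta


# Hypothetical toy data: B is missing in three of the ten records.
rows: List[Row] = [
    (1, 1), (1, 1), (1, 0), (1, None),
    (0, 0), (0, 0), (0, 1), (0, 0), (0, None), (0, None),
]
theta = em_conditional(rows)
print(theta)
```

In this deliberately simple setting the iteration has a closed-form fixed point: each estimate converges to the frequency of B=1 among the records where B was observed (2/3 for A=1 and 1/4 for A=0 in the toy data). EM becomes genuinely useful when no such closed form exists, for example when the missing variable has children of its own in the network.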
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Shpitser, I. (2010). Disease Models, Part I: Graphical Models. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_8
DOI: https://doi.org/10.1007/978-1-4419-0385-3_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0384-6
Online ISBN: 978-1-4419-0385-3
eBook Packages: Engineering, Engineering (R0)