Data Mining and Knowledge Discovery, Volume 1, Issue 1, pp 79–119

Bayesian Networks for Data Mining

  • David Heckerman

DOI: 10.1023/A:1009730122752

Cite this article as:
Heckerman, D. Data Mining and Knowledge Discovery (1997) 1: 79. doi:10.1023/A:1009730122752


A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
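The following Python sketch is not taken from the paper; it is a small illustration, under stated assumptions, of three ideas the abstract mentions: the joint distribution of a Bayesian network as a product of local conditional distributions, Bayesian (posterior-mean) parameter estimates under a uniform Dirichlet prior, and prediction when a data entry is missing by summing the unobserved variable out. The three-variable structure (A → B, A → C), the toy training cases, and the names `learn_parameters`, `joint`, `posterior`, and `alpha` are all hypothetical choices made for this example, and inference is done by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical structure (not from the paper): A -> B and A -> C, all binary.
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}
ORDER = ["A", "B", "C"]  # a topological order of the DAG

# Toy, made-up training cases (complete data) for illustration only.
DATA = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 0},
]


def learn_parameters(data, alpha=1.0):
    """Posterior-mean estimate of P(X=1 | parent config) under a uniform
    Dirichlet(alpha, alpha) prior: (N_ijk + alpha) / (N_ij + 2 * alpha)."""
    cpts = {}
    for var in ORDER:
        pars = PARENTS[var]
        table = {}
        for config in product((0, 1), repeat=len(pars)):
            # Cases whose parents take this configuration.
            rows = [r for r in data if all(r[p] == v for p, v in zip(pars, config))]
            n_one = sum(r[var] for r in rows)
            p_one = (n_one + alpha) / (len(rows) + 2 * alpha)
            table[config] = {0: 1.0 - p_one, 1: p_one}
        cpts[var] = table
    return cpts


def joint(cpts, assignment):
    """P(full assignment) as the product of each variable's local conditional."""
    p = 1.0
    for var in ORDER:
        config = tuple(assignment[q] for q in PARENTS[var])
        p *= cpts[var][config][assignment[var]]
    return p


def posterior(cpts, query, evidence):
    """P(query | evidence) by brute-force enumeration; any variable that is
    neither the query nor observed (e.g. a missing entry) is summed out."""
    hidden = [v for v in ORDER if v != query and v not in evidence]
    scores = {}
    for q in (0, 1):
        scores[q] = sum(
            joint(cpts, {**evidence, query: q, **dict(zip(hidden, values))})
            for values in product((0, 1), repeat=len(hidden))
        )
    z = sum(scores.values())
    return {q: s / z for q, s in scores.items()}


cpts = learn_parameters(DATA)
print(posterior(cpts, "A", {"B": 1}))          # C missing -> {0: 0.4, 1: 0.6}
print(posterior(cpts, "A", {"B": 1, "C": 1}))  # C observed -> {0: 0.25, 1: 0.75}
```

With `alpha = 1`, the estimate of P(B=1 | A=1) is 0.75 rather than the maximum-likelihood value 1.0 obtained from only two cases, one simple way a prior keeps estimates from overfitting small samples. Learning from incomplete training data (for example with EM or Gibbs sampling) and learning the structure itself, both of which the paper covers, are not shown here.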

Keywords: Bayesian networks, Bayesian statistics, learning, missing data, classification, regression, clustering, causal discovery

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • David Heckerman (1)
  1. Microsoft Research, 9S, Redmond
