Disease Models, Part II: Querying & Applications

Abstract

In the previous chapter, the mathematical formalisms that allow us to encode medical knowledge into graphical models were described. Here, we focus on how users can interact with these models (specifically, belief networks) to pose a wide range of questions and understand inferred results - an essential part of the healthcare process as patients and healthcare providers make decisions. Two general classes of queries are explored: belief updating, which computes the posterior probability of the network variables in the presence of evidence; and abductive reasoning, which identifies the most probable instantiation of network variables given some evidence. Many diagnostic, prognostic, and therapeutic questions can be represented in terms of these query types. For models that are complex, exact inference techniques are computationally intractable; instead, approximate inference methods can be leveraged. We also briefly cover special classes of belief networks that are relevant in medicine: probabilistic relational models, which provide a compact representation of a large number of propositional variables through the use of first-order logic; influence diagrams, which provide a means of selecting optimal plans given cost/preference constraints; and naïve Bayes classifiers. Importantly, the question of how to validate the accuracy of belief networks is explored through cross-validation and sensitivity analysis. Finally, we explore how the intrinsic properties of a graphical model (e.g., variable selection, structure, parameters) can help users interact with and understand the results of a model through feedback. Applications of Bayesian belief networks in image processing, querying, and case-based retrieval from large imaging repositories are demonstrated.

Exploring the Network: Queries and Evaluation

Inference: Answering Queries

The usefulness of a belief network (and other graphical models) lies in the ability to ask questions of the model. The output of such queries is a probability that assesses the likelihood of states across the variables of the modeled joint probability distribution, and can provide diagnostic/prognostic guidance and/or classification. Inference is the process of computing the probabilities of each variable based on evidence that has been specified. The inference process begins when the user instantiates the model by assigning one or more variables to a specific state. Depending on the provided evidence and the nature of the query, a model can invoke methods for belief updating or abductive inference to compute the probabilities needed to provide an answer. This section describes algorithms involved in both types of queries, and several of the issues surrounding the efficient computation of query probabilities.

Belief Updating

Belief updating involves the computation of a posterior probability for one or more variables in the network, given the instantiation of other nodes in the model (i.e., evidence). Several types of queries are associated with belief updating, as described below.

Probability of evidence. The simplest query that can be posed to a BBN is to ask for the probability of some variable, X, being instantiated to a specific value, x, as represented mathematically by the statement P(X = x). By way of illustration, using the model in Fig. 9.1, we may be interested in knowing the probability of an individual having a hip fracture, H (P(H = true)), without having had a stroke, S (P(S = false)). Here, the set of variables E = {H, S} are considered evidence variables, and the query, P(e), is known as a probability of evidence query. Though computing the probability of a single variable instantiated in the model is useful, most queries involve instantiating multiple variables: often, we want to examine a logical combination of variables (e.g., the probability of a propositional sentence). For example, if we are interested in finding the probability of stroke or hip fracture occurring, the statement may be written as P(S = true ∨ H = true). The answer can be computed indirectly using one of two techniques. First, the case analysis method can be used to rewrite the original statement as a combination of instantiations of the evidence variables: P(S = true ∨ H = true) = P(S = true, H = true) + P(S = true, H = false) + P(S = false, H = true). By summing these terms, the original probability can be calculated accordingly. Alternatively, the auxiliary-node method adds an additional node, E, to the network with S and H as its parents and a conditional probability table (CPT) as follows:

S       H       E       P(E | S, H)
true    true    true    1
true    false   true    1
false   true    true    1
false   false   true    0

Figure 9.1

Hypothetical Bayesian belief network relating causes of stroke and hip fracture. The boxes shown per variable are called node monitors, and graphically indicate the potential values taken on by the variable, along with the current probability. In this case, the BBN shows the calculation for a posterior marginal for age, gender, and stroke given the evidence that the patient has a hip fracture; grayed-out node monitors are inactive.

With this CPT, the event, E = true, is equivalent to the statement S = true or H = true.
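
As a concrete check, both techniques can be applied to a small joint distribution in a few lines. The sketch below computes P(S = true ∨ H = true) by case analysis and by the auxiliary-node CPT above; the joint probability values are illustrative placeholders, not numbers from Fig. 9.1.

```python
# Two equivalent ways to compute P(S=true or H=true) from a joint distribution.
# The joint values below are illustrative placeholders, not from Fig. 9.1.
joint = {
    (True, True): 0.02,   # P(S=true,  H=true)
    (True, False): 0.08,  # P(S=true,  H=false)
    (False, True): 0.05,  # P(S=false, H=true)
    (False, False): 0.85, # P(S=false, H=false)
}

# Case analysis: sum the instantiations that satisfy the sentence S or H.
p_case = sum(p for (s, h), p in joint.items() if s or h)

# Auxiliary-node method: E=true with probability 1 unless both parents are false.
cpt_e = {(s, h): 1.0 if (s or h) else 0.0 for (s, h) in joint}
p_aux = sum(joint[sh] * cpt_e[sh] for sh in joint)

assert abs(p_case - p_aux) < 1e-12
print(f"P(S or H) = {p_case:.2f}")  # 0.15 with these illustrative numbers
```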

Posterior marginals. To see how the addition of evidence by instantiating certain variables in the model affects all of the other variables, the posterior marginal may be calculated. Given a joint probability distribution, P(X1,…,Xn), the marginal distribution is the probability over a subset of the variables, P(X1,…,Xm), where m < n. The marginal distribution can thus be viewed as a projection of the joint distribution onto a potentially smaller set of variables. Marginal distributions are also called prior distributions, as no evidence is given to affect their values. From the marginal distribution, the posterior marginal is computed by summing the joint probability distribution, conditioned on the evidence e, over the remaining (non-query) variables:
$$P(x_1,\ldots,x_m |e) = \sum\limits_{x_{m + 1},\ldots,x_n } {P(x_1,\ldots,x_n |e)} $$

Continuing with the previous BBN, an example of such a computation would be to answer a query such as: what are the probable states of age, gender, and stroke given that the patient experienced a hip fracture? This query is depicted in Fig. 9.1; the boxes that visualize the probabilities for each state are called node monitors and are updated to reflect new probabilities as the user inputs each piece of evidence. For this query, the hip fracture variable is set to true (100%) and the remaining variables are computed accordingly. In general, the computation of posterior marginals in a belief network is considered to be NP-hard [11].
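
For small networks, the posterior-marginal equation above can be evaluated directly by brute-force enumeration over the joint distribution. The sketch below computes P(A | C = true) from an illustrative three-variable joint table (hypothetical values, not the Fig. 9.1 model), summing out the remaining variable and normalizing by the probability of evidence.

```python
# Brute-force posterior marginal: sum the joint over entries consistent with
# the evidence, then normalize. All numbers are illustrative placeholders.
import itertools

states = [True, False]
# Illustrative joint P(A, B, C); the eight values sum to 1.
joint = dict(zip(itertools.product(states, repeat=3),
                 [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.05, 0.30]))

evidence_c = True
unnorm = {a: sum(joint[(a, b, evidence_c)] for b in states) for a in states}
z = sum(unnorm.values())               # z = P(C=true), the probability of evidence
posterior = {a: p / z for a, p in unnorm.items()}
print(posterior)                       # P(A | C=true)
```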

Relative likelihood queries. In some cases, we only wish to know the comparative difference between two variables given some evidence. To illustrate, consider the basic network shown in Fig. 9.2, consisting of Boolean variables: if we observe that an individual is coughing and wish to know whether the cough (C) is more likely due to emphysema (E) or asthma (A), Bayes' rule can be applied to compute the conditional probability of each explanation from the conditional probability tables:
Figure 9.2

Example belief network with conditional probability tables shown. In some queries, the need for certain probabilities can be ignored if two variables are being compared, such as in computing the relative likelihood of two causes.

$$\eqalign{ P(E = T | C = T) &= \frac{{P(E = T, C = T)}}{{P(C = T)}} = \frac{{\sum\limits_{s,a} {P(S = s)P(E = T | S = s)P(A = a | S = s)P(C = T | E = T, A = a)} }}{{P(C = T)}} = 0.256 \cr P(A = T | C = T) &= \frac{{P(A = T, C = T)}}{{P(C = T)}} = \frac{{\sum\limits_{s,e} {P(S = s)P(E = e | S = s)P(A = T | S = s)P(C = T | E = e, A = T)} }}{{P(C = T)}} = 0.064 \cr P(C = T) &= \sum\limits_{s,e,a} {P(S = s)P(E = e | S = s)P(A = a | S = s)P(C = T | E = e, A = a)} = 0.32 \cr} $$

Computing the likelihood ratio of the two conditional probabilities (i.e., 0.256/0.064), the cough is much more likely due to emphysema than to asthma, by a factor of 4. Note that the calculation of P(C = true) is not required if only the ratio is desired.
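
The same style of computation can be reproduced programmatically by enumerating the chain-rule joint over the Fig. 9.2 topology. In the sketch below, the CPT entries are illustrative placeholders rather than the figure's actual tables, so the resulting probabilities will differ from the numbers above; the structure of the calculation is the point.

```python
# Relative likelihood by full enumeration on the Fig. 9.2 topology
# (smoking -> emphysema, smoking -> asthma, {emphysema, asthma} -> cough).
# All CPT entries are illustrative placeholders.
import itertools

p_s_true = 0.2                        # P(S=true)
p_e_true = {True: 0.8, False: 0.1}    # P(E=true | S=s)
p_a_true = {True: 0.3, False: 0.2}    # P(A=true | S=s)
p_c_true = {(True, True): 0.9, (True, False): 0.8,
            (False, True): 0.7, (False, False): 0.05}  # P(C=true | E, A)

def joint(s, e, a, c):
    """Chain-rule joint P(S=s, E=e, A=a, C=c)."""
    ps = p_s_true if s else 1 - p_s_true
    pe = p_e_true[s] if e else 1 - p_e_true[s]
    pa = p_a_true[s] if a else 1 - p_a_true[s]
    pc = p_c_true[(e, a)] if c else 1 - p_c_true[(e, a)]
    return ps * pe * pa * pc

states = [True, False]
p_c = sum(joint(s, e, a, True) for s, e, a in itertools.product(states, repeat=3))
p_ec = sum(joint(s, True, a, True) for s, a in itertools.product(states, repeat=2))
p_ac = sum(joint(s, e, True, True) for s, e in itertools.product(states, repeat=2))

print("P(E | C=T) =", p_ec / p_c)
print("P(A | C=T) =", p_ac / p_c)
print("ratio      =", p_ec / p_ac)    # P(C=T) cancels in the ratio
```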

Computing the probabilities. The most direct way to perform inference is to calculate the marginalization over non-instantiated variables. However, the number of terms involved in the marginalization grows exponentially with the number of variables. A range of efficient algorithms thus exists for answering queries involving marginals, including summing out, cutset conditioning, and variable/bucket elimination [18]. Still, in larger, more complex networks with limited resources, exact computations to answer queries may be taxing, if not computationally intractable; therefore, a variety of techniques may be used to instead approximate the desired probability. This difference gives rise to exact inference vs. approximate inference algorithms. We briefly describe some key techniques in both areas; for a more detailed discussion, the reader is referred to [4, 15].

Belief propagation (BP) is an iterative algorithm that was originally intended for the exact computation of marginals on tree-structured graphical models and polytrees [57]. The core idea is as follows: each node, X, computes a belief, BEL(x) = P(x | E) = P(x | e+, e-), where E is the observed evidence, contributed by evidence from the node's parents (e+) and children (e-). Expanding the last term, BEL(x) can be determined in terms of a combination of messages from the node's children, λ(x) = P(e- | x), and messages from its parents, π(x) = P(x | e+), so that BEL(x) = αλ(x)π(x), where α is a normalization constant equal to (∑x λ(x)π(x))^-1. To start, the graph is first initialized such that: ∀xi ∈ E, λ(xi) = π(xi) = 1 if xi = ei and 0 otherwise; for nodes without parents, π(xi) = P(xi); and for nodes without children, λ(xi) = 1. Next, the algorithm iterates until convergence such that for each node, X:
  • If X has received all π messages from its parents, compute π(x).

  • If X has received all λ messages from its children, compute λ(x).

  • If π(x) is calculated and all λ messages are received from all children except child node Y, compute πXY(x) and send it to Y.

  • If λ(x) is calculated and all π messages are received from all parents except parent node U, compute λXU(x) and send it to U.

Finally, compute BEL(x) on the final configuration of the nodes. BP can be implemented using dynamic programming methods. For the specific case of polytrees, BP provides exact inference in at most linear time relative to the diameter of the tree. The amount of computation performed per node is proportional to the size of the node's CPT. [57] modifies this approach to provide approximate inference for general networks that may contain cycles; in this situation, the algorithm is often referred to as loopy belief propagation. It remains unclear under what conditions loopy BP will converge (though empirical evidence supports its utility). Several variants of BP have been developed, including generalized BP and Gaussian belief propagation [76]. These newer approaches focus on restricting the set of messages being passed (e.g., only passing messages that are likely to convey useful information), and can be seen in terms of approximating the graph structure via a simpler graph on which computation is more feasible.
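
As a minimal sketch of the message-passing machinery, consider the chain A → B → C with evidence C = true. A single upward pass of λ messages suffices for exact inference on this polytree; all probabilities below are illustrative.

```python
# Pearl-style lambda pass on the chain A -> B -> C with evidence C=true.
# BEL(a) is proportional to pi(a) * lambda(a); numbers are illustrative.
p_a = {True: 0.3, False: 0.7}                    # P(A); pi(a) for the root
p_b_given_a = {True: {True: 0.9, False: 0.1},    # P(B=b | A=a), indexed [a][b]
               False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.7, False: 0.3},    # P(C=c | B=b), indexed [b][c]
               False: {True: 0.4, False: 0.6}}

states = [True, False]
lam_c = {b: p_c_given_b[b][True] for b in states}    # lambda from evidence C=true
lam_b = {a: sum(p_b_given_a[a][b] * lam_c[b] for b in states) for a in states}

bel = {a: p_a[a] * lam_b[a] for a in states}     # unnormalized BEL(a)
alpha = 1 / sum(bel.values())                    # normalization constant
print({a: alpha * v for a, v in bel.items()})    # exact P(A | C=true)
```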

Although a BBN permits one to compactly represent a distribution, its direct formulation is not suited for obtaining answers to arbitrary probabilistic queries. Instead, many (exact) inference algorithms compile an intermediate representation that can be used to more efficiently answer queries. A widespread construct for this purpose is the junction tree or join tree [33, 43], which also handles the problems associated with using BP on general graphs. The construction of a junction tree from a belief network can be abstracted in four steps:
  1.

    An undirected graph is constructed from the BBN, termed the moral graph, wherein edges become undirected and nodes with a common child are connected.

     
  2.

    Edges are added to the moral graph to triangulate the graph such that any two non-adjacent nodes on a cycle have an edge connecting them. Note that a graph can be triangulated in several ways (i.e., the solution is not necessarily unique). The choice of triangulation greatly affects the end result such that inferences on the junction tree may go from being polynomial to exponential time in some cases; and the challenge of determining the optimal triangulation for a BBN is known to be NP-hard [62].

     
  3.

    The cliques are identified in the triangulated graph, along with a potential function obtained by multiplying P(X | Pa(X)) for each node X in the clique and where Pa(X) represents the parents of X.

     
  4.

    From the graph constructed by the clique identification step, a minimum spanning tree can be constructed, resulting in the final junction tree.

     
Central to Steps 3 & 4 is an elimination ordering that considers each node in sequence and determines a set of adjacent nodes not yet seen in order to form cliques; the choice of variable order affects the final tree. Fig. 9.3 shows an example of this process. Given this tree, BP can then be applied to compute a probability using the calculated potential functions. The standard junction tree process is structure-based, and the size of the final structure is dependent only on the network topology. In practice, if the network topology is loosely connected, then junction tree algorithms work well; but when a network is densely connected, this framework is less effective. This observation has triggered research into alternative methods that can exploit local structure as well as network topology; for instance, exact inference using arithmetic circuits has been developed, taking advantage of local regularities within a BBN [14].
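
The four construction steps can be sketched with networkx (this assumes a version providing moral_graph and complete_to_chordal_graph, roughly 2.4 or later); the toy DAG below is hypothetical, not the network of Fig. 9.3. The spanning tree is built by maximizing separator sizes, which is equivalent to the minimum spanning tree formulation under negated weights.

```python
# Junction-tree construction sketch: moralize, triangulate, find cliques,
# and connect cliques with a spanning tree weighted by separator size.
from itertools import combinations
import networkx as nx

dag = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"),
                  ("c", "d"), ("c", "e"), ("d", "f")])   # hypothetical DAG

moral = nx.moral_graph(dag)                        # step 1: moral graph
chordal, _ = nx.complete_to_chordal_graph(moral)   # step 2: triangulation
cliques = [frozenset(c) for c in nx.find_cliques(chordal)]  # step 3: cliques

cg = nx.Graph()                                    # step 4: clique graph ...
cg.add_nodes_from(cliques)
for c1, c2 in combinations(cliques, 2):
    if c1 & c2:
        cg.add_edge(c1, c2, weight=len(c1 & c2))   # separator size as weight
junction_tree = nx.maximum_spanning_tree(cg)       # ... and its spanning tree
print([sorted(c) for c in junction_tree.nodes])
```
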
Figure 9.3

Transformation of a directed graph into a junction tree. (a) The original belief network. (b) Edge directions are removed and edges between nodes sharing children are created, establishing the moral graph (bold line); the graph is then triangulated as needed. In this case, the moral graph is already triangulated. (c) An elimination ordering of the variables is determined, and each node is considered sequentially to create cliques. In the first step, node f is examined, resulting in a node bef. (d) The cliques are arranged in a graph, and a minimum spanning tree is determined using edge weights based on common variables. The final junction tree and labeled edges are shown with bold lines.

In addition to loopy BP, two other classes of methods exist for performing approximate inference: sampling methods and variational methods; the former set of approaches is described here. In general, sampling methods operate on the premise that samples of a variable's state can be drawn in proportion to its probability. The basic operation involves sampling each variable in topological order according to the conditional probability over its parents. If we represent P(X1,…,Xn) as a BBN, the model can be sampled according to its structure by writing the distribution using the chain rule and sampling each variable given its parents. This process is called forward sampling (also known as direct Monte Carlo sampling). For each root node X, with probabilities P(X = xi), a random number r is drawn uniformly from the interval [0, 1] to select a state. To illustrate how forward sampling works, we refer to the example BBN in Fig. 9.2. We first sample the value of the variable smoking, where P(smoking) = <0.2, 0.8>, and assume that we obtain the result smoking = true. We then sample the value of the variable emphysema. As smoking = true, we are limited to using the corresponding conditional probability: P(emphysema | smoking = true) = <0.8, 0.2>. Let us next assume that the sample returns emphysema = false. We then proceed to sample the variable asthma using P(asthma | smoking = true) = <0.8, 0.2>. Again, let us assume that the sample returns asthma = true. We finally sample the value of the variable cough using the conditional probability P(cough | emphysema = false, asthma = true) and obtain cough = true. Through this first iteration, we thus obtain the event <smoking, emphysema, asthma, cough> = <true, false, true, true>. If we perform this process over multiple iterations while keeping track of how many times each specific combination of states occurs, then the sampled population approaches the true joint probability distribution. Sampling a complete joint probability distribution from a BBN is linear in the number of variables, regardless of the structure of the network. In this example, however, the marginals are not computed; two approaches address this requirement: rejection sampling and likelihood weighting. In rejection sampling, samples that are randomly drawn but do not agree with the specified evidence (i.e., xi ≠ ei) are thrown out. The problem with this approach is that many samples are potentially rejected, resulting in a highly inefficient process. Likelihood weighting addresses this pitfall by fixing the evidence variables and sampling only the remaining variables. To avoid biasing the sampling process, each sample is associated with a weight that expresses the probability that the sample could have been produced if the evidence variables were not fixed. Weights initially have the value of 1, but with each iteration in which an evidence variable is assigned a state, the probability of that assignment is multiplied with the existing weight of the sample.
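
The sketch below implements forward sampling and likelihood weighting for the Fig. 9.2 topology, using the same illustrative placeholder CPTs as earlier; the likelihood-weighting routine estimates P(asthma = true | cough = true) by fixing the evidence and weighting each sample by the probability the evidence had of being generated.

```python
# Forward sampling and likelihood weighting on the Fig. 9.2 topology.
# All CPT entries are illustrative placeholders.
import random

p_s = 0.2
p_e = {True: 0.8, False: 0.1}      # P(E=true | S)
p_a = {True: 0.3, False: 0.2}      # P(A=true | S)
p_c = {(True, True): 0.9, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.05}   # P(C=true | E, A)

def forward_sample(rng):
    """Sample in topological order: S first, then E and A, then C."""
    s = rng.random() < p_s
    e = rng.random() < p_e[s]
    a = rng.random() < p_a[s]
    c = rng.random() < p_c[(e, a)]
    return s, e, a, c

def likelihood_weighting(n, rng):
    """Estimate P(A=true | C=true): sample only the non-evidence variables,
    weighting each sample by the probability of the fixed evidence."""
    num = den = 0.0
    for _ in range(n):
        s = rng.random() < p_s
        e = rng.random() < p_e[s]
        a = rng.random() < p_a[s]
        w = p_c[(e, a)]            # weight: P(C=true | sampled e, a)
        num += w * a
        den += w
    return num / den

rng = random.Random(0)
print(likelihood_weighting(100_000, rng))
```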

Rather than independently creating each sample as done in forward sampling, suppose we generate new samples by altering the previous one. To achieve this, we use a general class of methods called Markov chain Monte Carlo (MCMC) sampling [50]. MCMC sampling is based on the premise that if all of a variable Xi's neighbors in the Bayesian network have assignments, their values must be accounted for before sampling Xi. The idea is based on the property of Markov chains, which are sequences of discrete random variables such that knowing the present state makes past and future states independent of one another: subsequent states are generated by sampling a value for one of the non-evidence variables after instantiating the variables in its Markov blanket using their current states. In order for a Markov chain to be useful for sampling from P(x), we require for any starting state X0 that limt→∞Pt(x) = P(x); that is, the stationary distribution of the Markov chain must be P(x). Given these constraints, we can start at an arbitrary state and use the Markov chain to perform a random walk over a specified number of iterations, and the resulting state will be sampled from P(x). One popular sampler implementing this process is Gibbs sampling [22]. The process of Gibbs sampling can be understood as a random walk in the space of all instantiations, e, and can be used when the joint distribution is not known explicitly but the conditional distribution of each variable is known - a situation well-suited to BBNs. To illustrate using Fig. 9.2, Gibbs sampling may be used to estimate the posterior probability, P(asthma | emphysema = true, cough = true). Given that emphysema and cough are set to true, the Gibbs sampler draws samples from P(asthma, smoking | emphysema = true, cough = true) and proceeds as follows. In the initialization stage, say we arbitrarily instantiate asthma = true, smoking = true as our X0. Then, for each iteration (t = 1, 2,…) we pick a variable to update from {asthma, smoking} uniformly at random. If asthma is picked, sample asthma from P(asthma | smoking = st-1, emphysema = true, cough = true) and set Xt = (asthma = at, smoking = st-1), where st-1 represents the value for smoking from the previous iteration, and at is the value of asthma for the current iteration. If smoking is picked, then perform a similar computation as in the case of asthma, but instead sample smoking from P(smoking | asthma = at-1, emphysema = true, cough = true), where at-1 is the value for asthma from the previous iteration. The sequence of samples drawn by relying on the immediately prior state is a Markov chain. This process can be further simplified by computing the distribution on Xi conditioned only on the variables in Xi's Markov blanket. Gibbs sampling is one instance of a broader class of methods known as Metropolis-Hastings algorithms. The reader is referred to [50] for additional discussion.
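
A minimal Gibbs sampler for the query P(asthma | emphysema = true, cough = true) is sketched below on the same illustrative model; each update draws one variable from its distribution conditioned on its Markov blanket, with the evidence variables held fixed.

```python
# Gibbs sampling for P(asthma=true | emphysema=true, cough=true) on the
# Fig. 9.2 topology; all CPT entries are illustrative placeholders.
import random

p_s = 0.2
p_e = {True: 0.8, False: 0.1}      # P(E=true | S)
p_a = {True: 0.3, False: 0.2}      # P(A=true | S)
p_c = {(True, True): 0.9, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.05}   # P(C=true | E, A)

E_OBS = True                        # evidence: emphysema=true, cough=true

def sample_asthma(s, rng):
    """Draw A from P(A | s, evidence), proportional to P(A | s) * P(C=true | E_OBS, A)."""
    w_t = p_a[s] * p_c[(E_OBS, True)]
    w_f = (1 - p_a[s]) * p_c[(E_OBS, False)]
    return rng.random() < w_t / (w_t + w_f)

def sample_smoking(a, rng):
    """Draw S from P(S | a, evidence), proportional to P(S) * P(E_OBS | S) * P(a | S)."""
    def w(s):
        return (p_s if s else 1 - p_s) * p_e[s] * (p_a[s] if a else 1 - p_a[s])
    return rng.random() < w(True) / (w(True) + w(False))

rng = random.Random(0)
a, s = True, True                   # arbitrary initial state X0
n, burn_in, hits = 50_000, 1_000, 0
for t in range(n + burn_in):
    if rng.random() < 0.5:          # pick a variable to update at random
        a = sample_asthma(s, rng)
    else:
        s = sample_smoking(a, rng)
    if t >= burn_in:
        hits += a
print(hits / n)                     # estimate of P(asthma=true | evidence)
```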

Lastly, we briefly mention here the inference issues with respect to dynamic Bayesian networks (DBNs). For DBNs with a small number of time slices, the DBN can be recast as a hidden Markov model and exact inference methods applied via unrolling. For larger DBNs, where such techniques are computationally intractable, approximate inference is applied as described above. Key work in this area includes the Boyen-Koller algorithm and its variants [5, 6, 49]; particle filtering, which uses sampling methods [38]; and more recently, a hybrid approach called factored sampling [52].

Abductive Reasoning

Unlike the previous class of queries, which computes the probabilities of variables in the presence of given evidence, abductive inference identifies the most probable instantiation of network variables given some evidence. Abductive inference, also sometimes referred to as inference to the best explanation, is a common type of query asked by physicians in clinical practice: for instance, given the symptoms presented, what is the most likely diagnosis; or given the diagnosis, what is the most likely state of the patient? There are two types of abductive inference: most probable explanation and maximum a posteriori.

Most probable explanation queries. The objective of a most probable explanation (MPE) query is to identify the most likely instantiation of the entire network (i.e., the states of all non-evidence variables) given some evidence [57]. If {X1,…,Xn} are network variables, and e represents the set of available evidence, the goal of MPE is to find the specific network instantiation, x = {x1,…,xn}, for which the probability P(x1,…,xn | e) is maximized. More concisely, MPE queries solve: argmaxxP(x | e) = argmaxxP(x, e). Consider the following query, again based on Fig. 9.1: given that the patient is a 65-year-old male and has had a stroke, but has a normal x-ray, what is the most likely state of the other variables in the network (gait analysis, DXA scan, hemiosteoporotic, hip fracture, and fall)? There lies a certain subtlety to an MPE calculation, as it cannot be obtained directly from individual conditional probabilities: if {x1,…,xn} are chosen to maximize each P(xi | e) individually rather than to solve the global problem, then the choice of xi is not necessarily the most probable explanation. Also, given the nature of an MPE query, the result may not be unique: there may in fact be several configurations of the network's variables that result in the same maximal probability. For the special case of a hidden Markov model (HMM; see Chapter 8), the MPE problem is solved by Viterbi decoding, where the most likely sequence of states is determined. However, in general, one can see that the search space for MPE is potentially enormous. As such, while an exhaustive set of permutations can be examined for smaller networks, most MPE algorithms employ approximate inference methods and can be divided between stochastic sampling methods and search techniques; the latter category includes best-first search, AND/OR search [19], and genetic algorithms [46]. The efficiency of MPE algorithms is considered in terms of a treewidth metric that measures the number of graph nodes mapped onto a tree node in the decomposition.
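
For a network as small as that of Fig. 9.2, the MPE can be found by exhaustive enumeration, which makes the argmaxx P(x, e) definition concrete even though it does not scale; the CPT entries are again illustrative placeholders.

```python
# Brute-force MPE on the Fig. 9.2 topology: enumerate every instantiation
# consistent with the evidence and keep the argmax of P(x, e).
import itertools

p_s = 0.2
p_e = {True: 0.8, False: 0.1}
p_a = {True: 0.3, False: 0.2}
p_c = {(True, True): 0.9, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.05}

def joint(s, e, a, c):
    return ((p_s if s else 1 - p_s)
            * (p_e[s] if e else 1 - p_e[s])
            * (p_a[s] if a else 1 - p_a[s])
            * (p_c[(e, a)] if c else 1 - p_c[(e, a)]))

cough_obs = True                    # the evidence fixes C
candidates = ((s, e, a, cough_obs)
              for s, e, a in itertools.product([True, False], repeat=3))
mpe = max(candidates, key=lambda x: joint(*x))
print("MPE (S, E, A, C):", mpe, "with P(x, e) =", joint(*mpe))
```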

Maximum a posteriori queries. Unlike MPE queries, a more general type of query that attempts to find the most probable instantiation for a subset of network variables is called a maximum a posteriori (MAP) query. MPE is hence a specific instance of a MAP query where the subset is defined as the entire set of non-evidence variables in the network. Let M represent some subset of variables in the belief network, and e some given evidence; the objective of a MAP query is to find an instantiation, m, such that P(m | e) is maximized. MAP queries are an optimization problem, with the resulting probability as the objective function that one tries to maximize. As such, the MAP problem can be stated as: argmaxm P(m | e) = argmaxm ∑z P(m, z | e), where Z is the set of variables remaining once the evidence and the query variables in M are removed from X (i.e., Z = X - E - M). From Fig. 9.1, one may ask the following: what is the most likely state for hip fracture and stroke given that the patient is female and that she fell? Note that this query does not attempt to provide information on the gait analysis, x-ray, DXA scan, age, or hemiosteoporotic states.

Variable elimination can be used to solve a MAP query by marginalizing the non-MAP variables, thereby reducing the problem to an MPE query. The key is to decide on an elimination order of the variables such that the MAP variables, M, are marginalized last. The process is summarized by the following equation: $$P(M) = \sum\limits_{X_1 } {\sum\limits_{X_2 } { \cdots \sum\limits_{X_m } {\prod\limits_j {\theta _{X_j |Pa(X_j )} } } } } $$ Intuitively, this equation states that the probability of the query variables M is computed by implicitly constructing the joint probability distribution induced by the Bayesian network and summing over each non-query variable. Variable elimination utilizes the notion of factors, which enable variables to be summed out while keeping the original distribution intact. The use of factors helps to overcome the exponential complexity seen with the brute-force method of simply summing out variables. Factors are tables that contain two components: an instantiation and a number. The instantiation is an assignment of values to variables; the number represents the probability of the corresponding instantiation. Two operations can be performed on factors: multiplication and summing out. Multiplication can be likened to a natural join (Cartesian product) on two database tables: the set of variables in the product of two factors is the union of the sets of variables in the operands. Summing out is the same as the process of marginalization (see Chapter 8). Variable elimination commences with each factor represented as a CPT in the model. To compute the marginal over M, the algorithm iterates over each variable Xi in the given elimination order. Next, every factor fi that mentions variable Xi is multiplied together to generate a new factor, f. We then proceed to sum out variable Xi from f and replace the factors fi by the factor ∑Xi f. After going through each variable in the elimination order, only the set of factors over variables M will remain. Multiplying these factors produces the answer to the desired query, P(M).
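
A minimal sketch of the two factor operations is shown below, with a dictionary keyed by instantiations standing in for the factor tables; the closing example reproduces the elimination of renal disease (R) discussed next, with illustrative numbers.

```python
# Minimal factor representation for variable elimination over Boolean
# variables; real implementations index tables far more efficiently.
from itertools import product

class Factor:
    def __init__(self, variables, table):
        self.variables = list(variables)   # column order of the tuple keys
        self.table = dict(table)           # instantiation -> number

    def multiply(self, other):
        """Natural-join-style product: the scope is the union of both scopes."""
        scope = self.variables + [v for v in other.variables
                                  if v not in self.variables]
        table = {}
        for row in product([True, False], repeat=len(scope)):
            assign = dict(zip(scope, row))
            k1 = tuple(assign[v] for v in self.variables)
            k2 = tuple(assign[v] for v in other.variables)
            table[row] = self.table[k1] * other.table[k2]
        return Factor(scope, table)

    def sum_out(self, var):
        """Marginalize var away while keeping the rest of the distribution."""
        i = self.variables.index(var)
        table = {}
        for row, p in self.table.items():
            key = row[:i] + row[i + 1:]
            table[key] = table.get(key, 0.0) + p
        return Factor(self.variables[:i] + self.variables[i + 1:], table)

# Illustrative numbers: f1 = P(K | R), f2 = P(R); eliminating R yields f(K).
f1 = Factor(["K", "R"], {(True, True): 0.3, (True, False): 0.9,
                         (False, True): 0.7, (False, False): 0.1})
f2 = Factor(["R"], {(True,): 0.05, (False,): 0.95})
print(f1.multiply(f2).sum_out("R").table)   # the factor f1(K) over K alone
```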

To ground this discussion, we refer back to the example of the osteoporosis BBN illustrated in Fig. 8.4. Assume that we are interested in finding the probability that a patient is at risk of getting a fracture and are given a predefined elimination order of {renal disease (R), DXA finding (D), age (A), kidney function (K), activity level (L), hormone usage (H), osteoporosis (O)}. While determining the optimal elimination order is outside the scope of this chapter, the reader may refer to [15] for additional discussion on the topic. The query is written mathematically as:
$$P(F) = \sum\limits_{O,H,L,K,A,D,R} {\theta _{F|O,H} \theta _{F|K,L} \theta _{H|A} \theta _{L|A} \theta _A \theta _{D|O} \theta _{K|R} \theta _R } $$
The process starts by eliminating the first variable in our elimination order, R. Writing out the operation, ∑RθK|RθR, we see that two terms mention R, and that they involve variable K. We then compute the product for each value of K and summarize the result as a factor, f1(K), which can in turn be substituted into the summation to remove R from the network:
$$P(F) = \sum\limits_{O,H,L,K,A,D} {\theta _{F|O,H} \theta _{F|K,L} \theta _{H|A} \theta _{L|A} \theta _A \theta _{D|O} f_1 (K)} $$

The next variable for elimination is D. From the equation, we find that only one term involves D, and it also involves O. So for each value of O, we compute the sum over D of P(D | O). However, if we fix O and sum over D, the probabilities need to add up to 1, and therefore D can be removed from the network without adding a new factor to the expression. The process of identifying the next elimination variable, multiplying factors, and summing over variables continues for all variables in the elimination order. We then multiply the remaining factors together, resulting in the exact answer for P(F). The prior marginal is a special case of the posterior marginal query where the evidence set is empty. To compute the posterior marginal, a similar process is followed but prior to eliminating variables, rows in the joint probability distribution that are inconsistent with the evidence are zeroed out.

In examining the complexity of variable elimination, the algorithm runs in time exponential in the number of variables involved in the largest factor. The elimination order is critical because a bad order can potentially generate large factors; finding the best elimination order is itself an NP-hard problem. Computationally, MPE queries are easier to compute than MAP: the decision problem for MPE is NP-complete, whereas the corresponding MAP problem is NP^PP-complete [54]. Because of this intractability, most software implementations answering MAP queries provide only an approximate answer. A variety of approaches have been explored for approximate MAP inference, including genetic algorithms [16], MCMC with simulated annealing [77], and hill climbing [55], to name a few. More recently, exact methods employing search have been developed [30, 56].

Inference on Relational Models

Standard probabilistic models are said to be propositional, not permitting quantification over objects. In some domains, this limitation results in an unwieldy number of statements that must be explicitly made to represent an instantiated dataset, especially when dealing with similar entities that may arise in slightly different configurations. For example, consider the problem of trying to correlate radiographic imaging features, gene expression, and end outcomes in brain tumor patients. Fig. 9.4 presents a portion of a hypothetical relational schema that links these elements of data together. Though each type/grade of tumor (e.g., astrocytoma, glioblastoma multiforme) presents different gene expression profiles and imaging appearances, and responds to different chemotherapies, there is some commonality. Capturing such variation is relatively straightforward in a relational model and can be expressed as tables within a database. Imagine, however, that a BBN is to be created from the same entities and attributes: the number of variables needed to express all of the variations will increase dramatically, thereby creating an overly complex network.
Figure 9.4

Translation of a relational schema into a BBN via a probabilistic relational model. (a) An entity-relational (ER) model showing a part of a relational schema in M2 notation (see Chapter 7). Standard ER relationships are shown with solid arrows; connectors shown with dashed arrows between the attributes represent BBN linkages. (b) A database instantiation of the schema. (c) The resultant generated BBN.

Thus, efforts to augment probabilistic models with quantifiable operators have led to the development of frameworks that take advantage of relational and/or first-order probabilistic constructs to extend graphical models. [17] provides a recent survey of the efforts to link BBNs with first-order logic, including a discussion of relational Bayesian networks [32] and probabilistic relational models (PRMs) [23]; we use the latter here as an example. A PRM consists of two parts: a relational component that describes the domain schema; and a probabilistic component modeling the distribution of the attributes of the entities in the domain. From the PRM specification, a Bayesian belief network can be compiled. One of the simplest advantages of PRMs over propositional models is that of compactness: a large number of propositional variables and models can result from instantiating the same piece of “generic knowledge” over a large domain of objects. The succinctness of relational models can provide a more intuitive representation. Furthermore, statistical techniques have been developed to learn PRMs directly from a database. Building from plate models and PRMs, a probabilistic entity-relation (PER) model has been described [26].

Inference in PRMs and other relational models can be categorized twofold: 1) approaches that transform the relational model into a propositional graphical model, permitting the inference algorithms discussed previously to be used; and 2) approaches that develop a new set of algorithms operating directly on the relational representation. The former in effect constructs the BBN associated with the PRM. [37] remarks that in some cases, the use of PRMs can actually aid the inference process, for two reasons: unlike standard BBNs, the relational model encapsulates influence by grouping influencing attributes together within the same object; and the relational model lends itself to reuse, as multiple objects belonging to the same entity can share the same inference methods. Both factors can be exploited within an inference algorithm to speed computations. Still, efficient reasoning and inference is a major challenge for PRMs. Systems have been demonstrated for exact inference across relational models, Primula being a prime example [9]. Lifted inference methods also provide another approach to computations on PRMs [58].

Diagnostic, Prognostic, and Therapeutic Questions

As demonstrated by the various queries described thus far, many types of questions familiar to clinical care can be answered via a disease model represented by a BBN. Four categories of BBN querying have been suggested and are useful to bear in mind in the context of medicine and decision-making:
  1.

    Diagnostic/evidential. The first category employs bottom-up reasoning to deduce a cause when an effect is observed. For instance, a patient presents with a symptom (e.g., cough) and the physician is trying to find the most likely cause (e.g., bronchitis, asthma). Abductive inference can be seen to fall into this category.

     
  2.

    Causal. In contrast to diagnostic queries, this second category involves top-down reasoning to determine, given a known cause, the probabilities of different effects. In essence, a prognostic query is posited. For example, given the flu, what is the chance of experiencing a headache? Or given an intervention or drug, what is the likely outcome for a given patient? (e.g., if we give the patient a bronchodilator, will the coughing go away?). Belief updating and causal inference with counterfactuals (see Chapter 8) comprise this group of queries.

     
  3.

    Explaining away. Sometimes referred to as intercausal queries, explaining away is a common reasoning pattern that looks to contrast causes with a common effect, often deducing one cause being the reason for an event (as opposed to another cause) given some evidence. For example, two diseases may be suspected; however, given evidence of some symptom, the probability of one cause increases while the other is lowered.

     
  4.

    Mixed. Lastly, one can consider queries that combine any of the above three techniques into a single inquiry.

     
Influence diagrams. A subset of clinical decisions often involves the selection of a treatment plan for an individual, or a course of action to optimize some criteria. By themselves, BBNs do not provide these answers directly, providing only tools for reasoning under uncertainty; instead, an important class of models known as influence diagrams aids in decision making in uncertain situations. At its core, a decision aims to select a strategy that maximizes the chance of a desired outcome occurring given our knowledge of the domain (as represented by a model). Originally framed as a compact alternative to decision trees, influence diagrams permit different configurations of this model and potential choices to be considered in terms of quantifiable values supplied via a utility function, U(a), where a represents an action. The aim, therefore, is to identify the configuration and actions that maximize the expected utility, solving argmaxa ∑x U(x, a)P(x | e). Influence diagrams consist of nodes and edges as their graphical model counterparts, but reclassify the nodes into three types:
  1.

    Chance nodes. Chance nodes are random variables, similar to the variables in a BBN; as with BBN nodes, CPTs are associated with chance nodes. Chance nodes can have both decision and other chance nodes as parents.

     
  2.

    Decision nodes. Decision nodes represent those points in the state/process where a choice of actions can be made; the result of a decision is to influence the state of some other variable (e.g., a chance node). An influence diagram will have one or more decision nodes.

     
  3.

    Utility nodes. Utility nodes are a measure of the overall “outcome” state, with the goal of optimizing the utility (i.e., maximizing) based on the contributing chance, decision, and causal factors. Utility nodes may not have children in the graph.

     
Additionally, some types of influence diagrams include deterministic nodes, defined as nodes whose values are constant or algebraically calculated from parent nodes' states - once the parent nodes are known, the child node's state is definitively assigned. Fig. 9.5 shows an example of a simple influence diagram, where the decision points involve the use of COX-2 inhibitors to relieve knee pain due to inflammation, or the use of MR imaging to further diagnose a problem before performing endoscopic surgery. It is important to note the implications of influence diagrams with respect to evidence-based medicine (EBM). An underlying principle of EBM is that decisions take into consideration an individual's preferences (e.g., with respect to diagnostic and treatment options): by fixing the selection within a decision node, an influence diagram can treat a patient's preferences as an explicit constraint within the optimization problem. The utility node can be seen as being related to a patient's quality of life (e.g., for decisions involving substantial risk, quality-adjusted life years, QALY) in addition to considering cost and other factors. [51] gives additional examples on the use of influence diagrams in medicine.
Figure 9.5

Example of an influence diagram. Chance nodes are drawn as ovals, decision nodes are rectangles, and utility nodes are illustrated as diamonds.

A basic algorithm for querying the influence diagram instantiates the entire network based on the given constraints/evidence; each possible decision is analyzed, examining the output of the utility nodes. The decision that maximizes the utility node is deemed the best decision and returned. For influence diagrams with only a single decision node, selection of the decision that maximizes the utility node is straightforward; however, the challenge is more profound when multiple decision nodes exist and/or require explicit sequential modeling of actions (i.e., action X before action Y) - resulting in large search spaces. [10] thus shows how influence diagrams can be transformed into a belief network, incorporating decision theory. The method effectively translates all of the nodes in an influence diagram into chance nodes, making the structure equivalent to a BBN: CPTs are assigned to decision nodes with an equal chance distribution; and utility nodes are changed into binary variables with probabilities proportional to the node's parents' utility functions. From this transformed BBN, the inference algorithms described prior can be applied to select decision nodes' states based on the desired utility (e.g., as MPE/MAP queries). For further discussion of decision making theory and influence diagrams, the reader is referred to [63].
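
For a single decision node, the enumerate-and-maximize algorithm reduces to a few lines. The sketch below uses a hypothetical two-action version of the Fig. 9.5 scenario; the action names, probabilities, and utilities are all made up for illustration.

```python
# Influence-diagram evaluation sketch: enumerate each action, compute its
# expected utility over the chance node, and return the argmax.
actions = ["cox2_inhibitor", "mri_then_surgery"]               # hypothetical choices
p_relief = {"cox2_inhibitor": 0.60, "mri_then_surgery": 0.85}  # P(relief | action)
utility = {(True, "cox2_inhibitor"): 90, (False, "cox2_inhibitor"): 20,
           (True, "mri_then_surgery"): 80, (False, "mri_then_surgery"): 5}

def expected_utility(action):
    p = p_relief[action]
    return p * utility[(True, action)] + (1 - p) * utility[(False, action)]

best = max(actions, key=expected_utility)
print({a: expected_utility(a) for a in actions}, "->", best)
```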

Evaluating BBNs

Inference results are only useful if the underlying BBN is capable of representing the real world. The question then naturally arises as to how to assess the ability of a belief network to provide true answers; this issue is perhaps particularly significant given the use of approximate inference techniques. BBN verification can be performed with respect to different criteria. We touch upon two strategies: examining predictive power, where the BBN's diagnostic/prognostic capabilities are compared against known outcomes; and sensitivity analysis, which aims to determine which aspects of the model have the greatest impact on the probabilities of query variables (and therefore must be accurate).

Predictive Power

In healthcare applications, classic BBN evaluation compares the predictions of the model vs. known outcomes (or expert judgment). A test set of cases is compiled and used as a benchmark for ground truth; precision and accuracy metrics are often reported. For instance, as an aid to classification or as a diagnostic tool, a BBN can be given partial evidence per test case and asked to infer the remaining values (or a subset of values, as per a MAP query); the BBN result is then compared to the true value stated in the case. A confusion matrix can then be composed to identify the rate of (in)correct classifications (see Chapter 10). [68] also details a method for estimating the variance associated with each query result, in effect determining “error bars” for each point estimate. Though a BBN is capable of answering a gamut of queries, given the size of some belief networks it is untenable to test all variables against all combinations of partial evidence in a test set. Rather, there is usually a specific purpose in its construction, and the queries that the BBN is designed primarily to answer are evaluated.

An automated Monte Carlo-based technique is described in [59] to discover those portions of the model responsible for the majority of predictive errors, and to identify changes that will improve model performance. Briefly, the algorithm consists of three steps: 1) labeling each node as one of three categories (observations, such as labs or image findings; phenomena, such as the underlying etiology of a tumor; and ancillary, providing additional clarity or convenience in the model); 2) selecting a subset of phenomenon nodes, explicitly setting their states, and using a Monte Carlo simulation to determine the states of the observation nodes; and 3) computing the posterior probability for all phenomenon nodes, given the states of the observation nodes. The second step in this algorithm uses normal Bayesian inference techniques to calculate the posterior distribution of a node, with Monte Carlo sampling of the posterior distribution to assign the node's state.

Depending on how a BBN is constructed, the test set must be carefully specified to avoid bias and overfitting. For example, if the network topology and the CPTs are both derived from experts (i.e., the structure and its parameters are not learned), then any reasonably derived test data can be used (assuming it is representative of the population against which the BBN is targeted). If either (or both) the structure and probabilities are learned, then the training data and test set must be separated. For instance, an n-fold cross-validation study is reasonable if sufficient data is available: the dataset is randomly partitioned into n groups; training is performed using n - 1 groups, with the resultant model tested on the remaining group; and this train-test process is then repeated a total of n times until each group has been used once for evaluation. Unfortunately, as with any framework using training data, overfitting of the model can still be a concern when the training dataset is small or when the number of parameters in the model is large. As such, a 10-90 test can be used to examine model stability: reversing a 10-fold cross-validation pattern, 10% of the dataset is instead used to train and 90% of the data is used to test in each iteration. In theory, a well-formulated model will provide consistent results per iteration of a 10-90 test; the results can also be compared to the tenfold cross-validation to ascertain if overfitting has occurred in the latter. Ultimately, the most convincing evaluation is one that uses a holdout set, wherein a portion of the dataset is withheld from training (and testing, in a cross-validation study) so that unbiased performance metrics are computed on a “clean” set of data. Notably, a common complaint about BBNs is that the trained probabilities (and to a lesser extent, structure) are developed and assessed relative to a single environment, and thus subject to local operational bias: simply exporting a BBN from one locale to another often fails to achieve the same degree of performance. Hence, if the holdout set is instead obtained from an outside source (e.g., a published national database or public data repository), then evaluation bias can potentially be overcome.
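
A 10-90 test is straightforward to sketch with scikit-learn by swapping the roles of the folds. In the sketch below, a Gaussian naïve Bayes classifier and a synthetic dataset are placeholders standing in for a learned BBN and real clinical data.

```python
# 10-90 stability test: train on each single 10% fold, test on the other 90%,
# and inspect the spread of scores across iterations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
scores = []
for big_idx, small_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    # Roles are swapped relative to 10-fold CV: fit on the small fold only.
    model = GaussianNB().fit(X[small_idx], y[small_idx])
    scores.append(model.score(X[big_idx], y[big_idx]))
print(f"accuracy: mean={np.mean(scores):.3f}, sd={np.std(scores):.3f}")
```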

Comparison to other models. Belief networks are only one means of providing classification and/or diagnostic/prognostic insights - a range of statistical and probabilistic methods also exist. As such, it is worthwhile to evaluate a BBN's performance relative to these other techniques. For example, if the primary application of the BBN is to answer causal or predictive queries, then a comparison against a decision tree or a logistic regression model may be appropriate (see Chapter 10); and a discussion of current methods for predictive modeling is given in [3].

Sensitivity Analysis

In broad terms, sensitivity analysis is concerned with how variations in the model or the inputs to a BBN impact the quality of decision making [2, 13, 42]. [35] distinguishes between evidence sensitivity analysis, in which the sensitivity of results is examined in light of variations in evidence; and parametric sensitivity analysis, which looks at how changes in model parameters (i.e., the CPTs) affect query results. Both capabilities are instrumental in giving users a handle on the models they build, and can be critical in model validation and debugging. For example, through sensitivity analysis, we can evaluate the effect of a diagnostic test's false positive/negative rates on the quality of decision making; as a corollary, it is possible to search for the false positive and negative rates that would be necessary to confirm a hypothesis at a certain level of confidence. We can also assess the utility of information through sensitivity analysis, allowing a user to decide what additional evidence is needed in order to gain useful insights. Given the scope of this text, we limit our discussion to a description of parametric sensitivity analysis. Formally, parametric sensitivity analysis is concerned with three types of questions:
  1.

    What guarantees can be made on the sensitivity/robustness of a query, q, to changes in parameter values, θ1,…,θn?

     
  2.

    What are the necessary and sufficient changes to parameters θ1,…,θn that would enforce some integrity constraints, q1,…,qm, assuming that these integrity constraints are violated by the current model?

     
  3.

    What guarantees can be offered on the sensitivity/robustness of some decision, d, to changes in parameter values θ1,…,θn, where the decision is computed as a function of some probabilities?

     

As shown by these questions, sensitivity analysis can elucidate the stability of a BBN relative to specific inquiries. In single parameter sensitivity analysis, the influence of one parameter within a query is examined by fixing all other parameters and “perturbing” the selected parameter in the network. Sensitivity analysis tools permit inspection of how other evidence variables change in response to alterations in the parameter. Single parameter sensitivity analysis can be used as a type of query to BBNs: by specifying a single constraint or condition on a conditional probability, it is possible to derive what other network variables must be changed to satisfy this constraint (e.g., given that we want to diagnose with 99.5% certainty, what other tests would need to be performed, and/or what probabilities must be changed?). Additionally, this type of sensitivity analysis can be used in conjunction with model building tasks to identify those variables that have a high degree of influence over results (and for which, therefore, the CPTs must be as accurate as possible) [12]. Multi-parameter sensitivity analysis perturbs combinations of n parameters simultaneously, either within the same CPT or across different CPTs. This technique is much more computationally expensive - but it is able to estimate the training error for the associated statistics and calculate a generalization error for the entire network [8]. Software packages such as Hugin, Netica, and SamIam provide graphical tools for conducting sensitivity analysis.
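
Single parameter sensitivity analysis can be sketched by sweeping one CPT entry and recomputing the query of interest, as below for the illustrative Fig. 9.2 model; dedicated tools (e.g., SamIam) instead exploit the fact that a query probability is a simple function of a single CPT parameter and solve for admissible changes directly.

```python
# Single-parameter sensitivity sketch: vary P(E=true | S=true) and observe
# the query P(E=true | C=true). Network and numbers are the illustrative
# Fig. 9.2 placeholders used earlier.
import itertools

def query(theta):
    """P(emphysema=true | cough=true) as a function of one perturbed parameter."""
    p_s = 0.2
    p_e = {True: theta, False: 0.1}
    p_a = {True: 0.3, False: 0.2}
    p_c = {(True, True): 0.9, (True, False): 0.8,
           (False, True): 0.7, (False, False): 0.05}
    num = den = 0.0
    for s, e, a in itertools.product([True, False], repeat=3):
        p = ((p_s if s else 1 - p_s) * (p_e[s] if e else 1 - p_e[s])
             * (p_a[s] if a else 1 - p_a[s]) * p_c[(e, a)])   # C=true fixed
        den += p
        if e:
            num += p
    return num / den

for theta in (0.6, 0.7, 0.8, 0.9):
    print(f"P(E=T | S=T) = {theta:.1f} -> P(E | C=T) = {query(theta):.3f}")
```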

Using the model depicted in Fig. 9.1, let us consider an example of how sensitivity analysis helps us determine what improvements are needed to existing tests. In the model, we are interested in determining whether the combination of x-ray and DXA scan is capable of accurately diagnosing whether the patient is hemiosteoporotic. When performing sensitivity analysis, the question to be posed to the model is: which network parameters can we change, and by how much, to ensure that the probability of the patient being hemiosteoporotic is above 95% given that the patient has positive x-ray and DXA scan tests? Currently, the model states that the specificity of the DXA scan is 66% and the specificity of the x-ray is 64%. If we run a sensitivity analysis on the model with the variables x-ray and DXA scan instantiated to abnormal, the results return three possible changes that each satisfy the constraint P(hemiosteoporotic = true) ≥ 0.95:
  1.

    If the true negative rate for the DXA scan was 92% instead of 66%.

     
  2.

    If the true negative rate for the x-ray scan was 91% rather than 64%.

     
  3.

    If the probability of being hemiosteoporotic given that the patient did not have a stroke was greater than or equal to 0.768 rather than 0.27.

     

As making the third change would not be feasible, we could act on one of the first two suggestions by investing in a better DXA or x-ray scanner. If we are willing to compromise on the constraint and be satisfied with P(hemiosteoporotic = true) ≥ 0.90, then we can find tests that achieve true negative rates of 83% (DXA scan) or 82% (x-ray) instead. The same approach can also be used to determine what changes are necessary to make the model fit the beliefs of a domain expert. For instance, if an expert believes that the probability of a hip fracture given that the patient has fallen after having a stroke is greater than the result that the model returns, we can identify which variables (e.g., age, gender, hip fracture) need to be modified such that this belief holds true.

Interacting with Medical BBNs/Disease Models

The focus of the prior sections has been on the underlying concepts and algorithms that permit inference and other computational analyses on BBNs. We now turn to the secondary issue of interacting with the belief network, enabling a user to specify the queries and explore the model. Today's BBN graphical user interfaces (GUIs) typically employ the directed acyclic graph (DAG) as a pictorial representation upon which queries are made and results presented. Visual cues and animation (e.g., highlighting nodes, changing colors, motion) are used to denote key structures and altered values in response to queries [7, 25, 29, 78]. While providing sophisticated querying, two problems arise: 1) as the complexity of the BBN grows, understanding the nuances of variable interaction is difficult at best; and 2) the variables are visually abstracted and thus lose contextual meaning - a concern for clinically-oriented BBNs when interpreting a patient's data.

In general, the challenges arising in interacting with larger BBNs fall into two areas: 1) methods for building and exploring the graphical structure, along with the model's parameters (i.e., the CPTs); and 2) methods for posing queries and viewing the resultant response.

Defining and Exploring Structure

The most obvious difficulty with BBN visualization lies in the organization of a large number of variables in a constrained amount of space. An array of methods has been developed for graph visualization, including: various geometric layouts (e.g., radial, 3D navigation); hyperbolic trees; and distortion techniques (e.g., fisheye views). An overview of these approaches is given in Chapter 4. Here, we highlight some challenges specific to BBNs. One issue in BBN visualization is the depiction of the linkages between nodes, emphasizing those variables that are clustered together through a high degree of connectivity; and variables that in particular are dependent on a large number of parents, or conversely, serve as a parent to a large number of other dependent variables (i.e., the number of incoming edges, the in-degree; and the number of outgoing edges, the out-degree). A common graphical method of emphasizing the importance of such nodes is through size: larger nodes represent a higher number of connections (Fig. 9.6a). However, not all relationships are of equal importance: therefore, some systems render graph edges using line thickness in proportion to the strength of the relationship between the two variables (i.e., a thicker line indicates a stronger link; Fig. 9.6b). Object-oriented paradigms can also be applied to present related entities together, subsuming related variables into a single visual representation (e.g., a super-node); or to collapse chains of variables into one edge (Fig. 9.6c). The causal semantics between variables have also been visualized using animation [34]. [70] also considers the problem of navigating the conditional probability tables: as the number of possible states and dependencies grows, the depiction of the CPT itself can outgrow the available visual space, thus requiring scrolling or other means to change focal points. Methods including the use of a treetable widget for hierarchical presentation of CPTs, and the dynamic hiding/revealing of parent/child relationships within a CPT, are discussed.
Figure 9.6

Different methods for belief network visualization. (a) To emphasize the in- and/or out-degree of a given node (and hence its connectivity), the node size can be varied. In this example, stroke is deemed important, and so is rendered as the largest node. (b) The strength of a relationship (e.g., based on sensitivity analysis or conditional probabilities, for instance) is often depicted via line thickness. (c) Grouping of node clusters or the collapsing of variable chains into edges can help to compact space.

Expressing Queries and Viewing Results

Query formulation broadly consists of two steps: 1) stating the type of query that is desired (e.g., MAP), along with the variables of interest; and 2) specifying the evidence available as part of the query (i.e., the query constraints). Most GUIs provide a means of choosing variables by directly selecting and highlighting nodes from the DAG, with options to invoke the corresponding type of inference. As mentioned earlier, node monitors provide a direct view of a variable's state, graphically depicting associated probabilities. Node monitors can be made interactive, permitting a user to directly manipulate the values to set evidence: numeric scroll wheels, sliding bars, and probability wheels are used to elicit probabilities across a variable's different states (Fig. 9.7). Rather than use the graph, form-based and checklist querying approaches have been explored as front-ends to BBNs [71], but arguably become untenable given a large number of variables.
Figure 9.7

(a) A node monitor. For a given variable, the different states the node can take are shown alongside a probability. Sliding bars are often used to help provide a sense of the visual distribution, and can be made interactive to set specific levels of evidence. (b) A probability wheel can be manipulated by users to specify a value for the likelihood of an event: given some question about an event's occurrence, the wheel's wedges are proportioned accordingly.

Once an inference result is computed, the probabilities across the network are updated and displayed to the user, who inspects the network to see how much each variable has changed and/or the end state of a given node. Although results can be tabulated into a separate table, node monitors are commonly employed for displaying the information directly. However, the use of node monitors can prove problematic: the user must actively search for changes in values, so subtle differences between states or variables can be lost in reviewing results. Furthermore, given limited space to show information, the overuse of node monitors leads to a cluttered interface. To make changes easier to identify, visual changes are often made to the underlying DAG rendering: nodes are colored to indicate the degree of change (e.g., shades of red and blue are used to indicate positive/negative changes in values, with the intensity of the color proportional to the magnitude of the change); transparency/opacity is changed; or highlights are added to nodes to indicate updated statuses. Similarly, edges can be colored or adjusted based on changes in the conditional probability tables.
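As an illustration of the color-coding convention described above, the sketch below maps a change in posterior probability to a red/blue shade whose saturation grows with the magnitude of the change; the specific color arithmetic is an assumption for demonstration purposes.

```python
def change_color(delta, max_delta=1.0):
    """Map a change in posterior probability to an RGB triple.

    Positive changes shade toward red, negative toward blue; intensity
    is proportional to the magnitude of the change.
    """
    intensity = min(abs(delta) / max_delta, 1.0)
    fade = int(255 * (1 - intensity))  # 255 = white, 0 = fully saturated
    if delta >= 0:
        return (255, fade, fade)       # shades of red
    return (fade, fade, 255)           # shades of blue

print(change_color(0.6))   # strong increase -> saturated red (255, 102, 102)
print(change_color(-0.1))  # slight decrease -> pale blue (229, 229, 255)
```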

In the healthcare domain, an alternative strategy is to create problem- or disease-specific applications that tailor the visualization and querying capabilities to a target domain. The literature contains many examples of such problem-specific interfaces (e.g., diabetes, oncology), and there is evidence that such problem-oriented data visualization can enhance the cognitive processes of physicians (see Chapter 4). Unfortunately, many of these interfaces sacrifice flexibility: the displays restrict the types of queries that can be posed by the user, limiting discovery and “what if” questions. A new class of visualizations has been developed to make interacting with probabilistic disease models more intuitive by providing tools to pose queries visually. One such system is TraumaSCAN [53], in which the user interacts with a 3D model of the body to place entry and exit wounds for injuries from gunshots; then, in combination with inputted patient findings, the system performs reasoning on a Bayesian model to predict the most probable symptoms and conditions arising from the specified injuries. However, many of these querying interfaces have been developed for specific diseases; they do not address the long-standing problem of creating GUIs that can effectively support the broad spectrum of physician activities (e.g., reviewing a patient for the first time, diagnosis and treatment planning, a follow-up visit, etc.). Part of the difficulty lies in working with the diversity and amount of information that is pertinent to a given clinical task: not all data is of equal importance at any given time, and it must be selected and presented appropriately based on the task at hand. The Vista [27] and Lumiere [28] projects are examples of using BBNs to automatically select the most valuable information or program functionality: an influence diagram is used to model the user's background, goals, and competency in working with the software.

Explaining results. One application for which interaction with BBNs has been explored is to explain a recommendation to a user as part of a medical decision-making tool [67]. Explanations are useful: for determining what configuration of unobserved variables provides the most probable outcome for a target variable; for eliciting what information is contained in a model; and for understanding the reasons for a model's inference results. A review of explanation methods can be found in [41].

The majority of approaches to conveying explanations have centered on the use of verbal or multimedia methods. For instance, [21] translates qualitative and quantitative information of a BBN into linguistic expressions. Probability values are mapped to semantic expressions of uncertainty; for example, the range 0.25-0.4 is mapped to the adjective “fairly unlikely” and the adverb “fairly rarely.” These adjectives are then used in combination with the structure of the network to generate meaningful statements. To illustrate, given the model depicted in Fig. 9.2, one statement would be, “Smoking commonly causes emphysema.” Visual cues have also been used: [45] utilizes color coding and line thickness to support explanations in terms of weight of evidence and evidence flows. One system that combines both graphical and verbal approaches to explaining inference results is Banter [24]. This system allows the user to enter a scenario by specifying known values for history and physical findings for a disease of interest using standard node monitors in a GUI. Based on this data, the system uses the BBN to assess which tests best determine whether the patient has the identified disease. Explanations are provided in natural language using two methods: identifying the evidence that has the greatest impact on a target variable using mutual information; and describing the path that maximizes the overall impact of evidence variables on the target variable. Alternatively, [69] describes the use of a three-tier system to address the inability of traditional BBNs to provide updated prognostic expectations given new data during the healthcare process: 1) the first tier is a BBN composed of a collection of local supervised learning models, which are recursively learned from the data; 2) the second tier is a task layer that translates the user's clinical information needs to a query for the network; and 3) the third tier is a presentation layer that aggregates the results of the inferences and presents them to the user using a bar graph representation. The novelty of this method is that it allows users to pose new queries at each stage of patient care (e.g., pre-treatment, treatment, post-treatment), having the model explain the changes in the target variable based on updated information at each point.
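A minimal sketch of the probability-to-language mapping used by such verbal explanation schemes is shown below; only the 0.25-0.4 to “fairly unlikely” band is taken from [21], and the remaining thresholds and phrases are hypothetical placeholders.

```python
def verbal_probability(p):
    """Map a probability to a linguistic expression of uncertainty."""
    bands = [(0.10, "very unlikely"),            # hypothetical band
             (0.25, "unlikely"),                 # hypothetical band
             (0.40, "fairly unlikely"),          # range taken from [21]
             (0.60, "about as likely as not"),   # hypothetical band
             (0.90, "likely"),                   # hypothetical band
             (1.01, "very likely")]              # hypothetical band
    for upper, phrase in bands:
        if p < upper:
            return phrase
    return "certain"

# A statement generator might then emit, e.g., "X {verbal} causes Y."
print(verbal_probability(0.3))  # -> "fairly unlikely"
```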

Research has also been done to utilize the network topology to aid in the generation of explanations. [75] exploits Markov blankets to identify a subset of variables that results in a concise explanation of a target variable's behavior. This approach first restructures the BBN so that the target variable has its Markov blanket nodes as its parents. Next, the target node's conditional probability tables are converted into decision trees. Explanations are finally derived by traversing the decision trees.

Discussion and Applications

As seen here and in Chapter 8, disease models provide a method of extracting the scientific knowledge encoded within routine clinical observations and applying this knowledge to inform decisions related to diagnosis, prognosis, and treatment. Such models, represented as Bayesian belief networks, provide a probabilistic framework upon which a range of queries can be made. Advances in BBN inference techniques are providing the computational means to answer increasingly complex questions over large models throughout healthcare (e.g., see Chapter 8; [20] provides a review of BBN applications specific to bioinformatics). But unless these tools are made readily accessible to a broader audience, the translation of the models to routine practice will be limited. To this end, we examine several applications of belief networks: 1) a simplified version used for classification purposes, the naïve Bayes classifier; 2) the use of BBNs in imaging, particularly focusing on applications for medical image processing and related retrieval tasks; and 3) the use of belief networks to guide the visualization process, and in turn, to serve as a front-end to BBN model interaction.

Naïve Bayes

There is a special case of Bayesian belief networks, called naïve Bayesian belief networks (also sometimes called simple or naïve Bayes), which are often used as classifiers; as such, they have been used extensively in medical image processing, text classification, diagnostic/prognostic queries, and other tasks. Naïve Bayes classifiers are also well-suited to visualization in terms of nomograms [48]. Structurally, a naïve Bayesian classifier consists of one parent node (the class) and multiple children (the attributes) (Fig. 9.8b), and is predicated on a very strong assumption about the independence of the modeled attributes. Often this assumption is unrealistic; but accepting this restriction, naïve Bayesian classifiers can be trained efficiently over large datasets with many attributes. And despite their simple design, naïve Bayes classifiers tend to perform well in real-world scenarios. Studies comparing classification algorithms have found the naïve Bayesian classifier to be comparable in performance with classification trees and with neural network classifiers. Analyses have demonstrated possible theoretical reasons for naïve Bayes' efficacy [40, 61].

To appreciate how naïve Bayesian classifiers work, consider discriminating between histological types of lung cancer based on size, lobulation, and margin appearance on computed tomography (CT) imaging. Given a (labeled) dataset, a full Bayesian classifier answers the question: if a tumor is encountered that is between 0.5-6 cm in diameter, has poorly defined margins, and is lobulated, what is the most likely classification based on the observed data? Unfortunately, to properly estimate these probabilities, a sufficiently large number of observations is needed to capture all possible combinations of features (thereby representing the joint distribution). By assuming that the features are independent of one another, naïve Bayes classification circumvents this problem: in our example, the probability that the tumor is 0.5-6 cm, has poorly defined margins, and appears lobulated (and is probably an adenosquamous carcinoma) can be computed from the independent probabilities that a tumor is of a given size, that it has a specific type of margin, and that it is lobulated. These independent probabilities are much easier to obtain, requiring less sample data.

More formally, let A = {A1,…,An} be the n attributes used in a classifier, C. For a given instance {a1,…an}, the optimal prediction is class C = c such that P(c | A1 = a1 ∧ … ∧ An = an) is maximized. Using Bayes' rule, this probability can be rewritten as:
$$P(c \mid A_1 = a_1 \wedge \ldots \wedge A_n = a_n) = \frac{P(c)\,P(A_1 = a_1 \wedge \ldots \wedge A_n = a_n \mid c)}{P(A_1 = a_1 \wedge \ldots \wedge A_n = a_n)}$$
where P(c) is readily computed from a given training set. For classification purposes, as the denominator is identical across all values of c, we need only concern ourselves with the numerator term. Using Bayes' rule again, the numerator can be stated as P(A1 = a1 | A2 = a2 ∧ … ∧ An = an, c)P(A2 = a2 ∧ … ∧ An = an | c), recursively rewritten for each corresponding attribute. Given the independence assumption, then: P(A1 = a1 | A2 = a2 ∧ … ∧ An = an, c) = P(A1 = a1 | c). The original numerator is thus equal to the product of each independent conditional probability. Accumulating these results, each probability can then be estimated from a training data set such that:
$$P(A_j = a_j \mid c) = \frac{\mathrm{count}(A_j = a_j \wedge C = c)}{\mathrm{count}(C = c)}$$
where the above equation provides maximum likelihood probability estimates. It can further be shown that naïve Bayes is a non-parametric, nonlinear generalization of the more widely recognized logistic regression.

Caution must be used when naïve Bayes techniques are applied. Consider Fig. 9.8, which collapses three different causes into one condition that has four possible effects. In general, these two graphs are not equivalent unless the single-fault assumption is made. The single-fault assumption states that only one condition can exist at any given time, as the multiple values of the condition variable are mutually exclusive. Applying the single-fault assumption to Fig. 9.8b, inconsistencies quickly arise; for instance, if a patient is known to have a cold, then whether the patient has a fever does not influence the belief that the patient also has a sore throat (intuitively, this does not make sense: if the patient has a fever, this should increase our evidence for tonsillitis, which would in turn increase our belief in a sore throat). The single-fault assumption requires that sore throat and fever be d-separated (i.e., that they are independent variables), but based on the original network, sore throat and fever are not d-separated.
Figure 9.8

(a) An example belief network consisting of three diseases and four symptoms. Examination of the topology (via d-separation) shows that the symptoms are not independent. (b) A naïve Bayes classifier. This classifier is not equivalent to Fig. 9.8a given that various symptoms (attributes) of the class (i.e., condition) are dependent; the naïve Bayesian classifier requires independence.
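Returning to the estimation equations above, the following is a minimal count-based naïve Bayes classifier sketched in Python; the training data and attribute names echo the lung tumor example and are purely illustrative (in practice, smoothing would be added to avoid zero counts).

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Estimate P(c) and P(A_j = a_j | c) via maximum-likelihood counts.

    samples: list of (attributes_dict, class_label) pairs.
    """
    class_counts = Counter()            # count(C = c)
    cond_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for attrs, c in samples:
        class_counts[c] += 1
        for attr, value in attrs.items():
            cond_counts[(attr, c)][value] += 1  # count(A_j = a_j and C = c)
    return class_counts, cond_counts

def predict(class_counts, cond_counts, attrs):
    """Return argmax_c of P(c) * product_j P(A_j = a_j | c)."""
    n = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n  # P(c)
        for attr, value in attrs.items():
            score *= cond_counts[(attr, c)][value] / cc  # P(A_j = a_j | c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical training data for the lung tumor example.
data = [
    ({"size": "0.5-6cm", "margin": "poor", "lobulated": True}, "adenosquamous"),
    ({"size": "0.5-6cm", "margin": "well", "lobulated": False}, "adenocarcinoma"),
    ({"size": ">6cm",    "margin": "poor", "lobulated": True}, "adenosquamous"),
]
counts = train_naive_bayes(data)
print(predict(*counts, {"size": "0.5-6cm", "margin": "poor", "lobulated": True}))
# -> "adenosquamous"
```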

Imaging Applications

Graphical models have become increasingly popular in computer vision, being applied to bridge the gap between low-level features (e.g., pixels) and high-level understanding (e.g., object identification). Given the wide variety of images that exist (e.g., natural images, medical images) and the large number of pixels that compose an image, problems in computer vision benefit from a BBN's ability to integrate domain-specific knowledge and to simplify computations by exploiting conditional independence relationships; a few applications are summarized here:
  • Enhancing image processing. [64] creates a geometric knowledge-base using BBNs to provide an efficient framework for integrating multiple sources of information (e.g., various edge detectors). The results of such detectors are typically partial, disjoint shapes; but when used as inputs into the BBN, the model infers the most probable complete geometrical shape based on these available features. [47] uses a BBN to perform real-time semi-automatic object segmentation. First, the image is segmented using watershed segmentation; then, a graph is imposed onto the resulting gradient lines, placing a node where three or more gradient lines converge and drawing an edge along the section of the watershed line between two nodes. The model's prior probabilities encode the confidence that an edge belongs to an object boundary, while the CPTs enforce global contour properties. The image segmentation is determined by computing the MPE. [1] improves on a split-and-grow segmentation approach by using a BBN to integrate high-level information cues. The BBN is used to infer the probability of a region containing an object of interest based on image attributes such as the region's location and color. Regions with the highest probabilities are taken as seed points for subsequent region-growing stages.

  • Image object identification/classification. [44] demonstrates the use of BBNs for scene categorization: an image is initially characterized by two sets of descriptors: low-level features (e.g., color and texture) and semantic features (e.g., sky and grass). These features are used as evidence to instantiate a BBN-based inference engine that produces semantic labels for each image region. [72] examines how visual and physical interactions between objects in an image can provide contextual information to improve object detection. A BBN is automatically generated from features detected in the image; this model is then used to generate multiple hypotheses about how the objects interact with each other (e.g., occlusions). The system then generates a synthetic image that represents the most probable configuration of individual image features.

While many of these algorithms have been developed for natural images, they may also be applied to medical images. Many of the same low-level features (e.g., texture) are used to characterize medical images and, in combination with BBNs, may provide more accurate indexing and retrieval of images from large biomedical databases to support content-based image retrieval.

Querying and Problem-centric BBN Visualization

An emergent question in applications of BBNs to the medical domain has been how to merge their use into clinical care, abstracting away the underlying complexity while still exposing their utility as tools for decision-making. Rather than use the DAG representation of a BBN to interact with a disease model, one approach is to create an intermediate layer that replaces BBN variables with common graphical representations (icons or visual metaphors) that can be drawn using the patient's own data to compose queries. A user's visual query is then interpreted by the application and translated into a question to the BBN. Additionally, this “visual querying” approach is well-suited to imaging data, where geometric and spatial features (e.g., size, shape, location) are more readily graphically depicted. The BBN itself can also be used as a source of knowledge to guide the display of patient information and results, enabling “problem-centric” BBN visualization: the network topology and conditional probabilities give clues as to which variables (and thus, which data) are closely related and should be presented as part of the query's context. We conclude by presenting two systems that illustrate these techniques.

Visual Query Interface

The first system, called the visual query interface (VQI), facilitates inference on a disease model through a graphical paradigm; moreover, the user's querying process itself is guided by the topology and the parameters of the underlying model. This system is designed for radiologists and other physicians who are interested in using image features (e.g., color, texture, shape) to find other similar studies in a large repository (e.g., picture archive and communication systems, PACS), such as in applications for medical content-based image retrieval. For example, consider the request, retrieve all related patient cases that have nodules with a speckled appearance in the right lower lobe of the lung. Given the nature of medical images, a visual query-by-example interface is well-suited to the task of query composition: spatial (e.g., location) and morphological attributes (e.g., irregular tumor border) are naturally described by a graphical representation. In fact, one usability study showed that when asked to specify complex queries, users found visual queries to be more intuitive and expressive than traditional text query languages [66]. Yet as the number of queryable features within a domain grows, the tools to facilitate visual expression of the query must be well-organized, and the logical consistency of the query must be guaranteed.

To this end, in VQI the user manipulates a pictographic representation of BBN variables, referred to as graphical metaphors. Two types of graphical metaphors exist: 1) a freehand metaphor that allows the user to sketch a query object (i.e., a tumor) and its environment (e.g., surrounding anatomical structures); and 2) a component metaphor that prompts the user to input numerical or categorical values based on fields in the patient record. By combining graphical metaphors in different ways, a variety of diagnostic, prognostic, and treatment-related questions may be posed. For imaging-based variables, graphical metaphors take on the properties of their image feature counterparts, allowing users to alter their sizes, locations, relative geometrical positions, and shapes to obtain the desired query. The metaphors bridge a user's knowledge of a familiar domain (e.g., a radiologist's expertise in image interpretation) to an exploratory framework that may include additional variables. Additionally, the selection of graphical metaphors in VQI is context-specific, such that as the query is built, different metaphors are made available (or removed) to enable the user to draw a permissible query. A feedback loop exists between the user and the underlying graphical model, as illustrated in Fig. 9.9: given a disease model, contextual information provided by the variables, structure, and user interaction with the model influences which graphical metaphors or functionality are displayed to the user. As the user selects metaphors to formulate a query, the inputs provide some context about the types of variables that are of interest to the user, and in turn can be used to identify the subsets of variables in the model that are directly related and relevant to the query. This feedback loop provides a form of relevance feedback: as the user chooses a set of variables to be a part of the query, the system uses this information to refine which metaphors are presented to the user next.
Figure 9.9

VQI’s relationship between user interaction and the underlying BBN.

VQI supports the use of labeled imaging atlases to provide spatial information about anatomical structures. For example, when the user overlays a tumor metaphor atop a representative slice from an atlas, the anatomical information encoded in the atlas is used to determine the location of the metaphor and whether the metaphor affects any surrounding structures (e.g., mass effect on the right ventricles).

Adaptive interfaces using BBNs. To dynamically adapt the interface, the BBN is utilized to perform two tasks: 1) to capture knowledge about a disease in a probabilistic manner so that inference may be performed with instantiations of the information; and 2) to map variables to graphical metaphors and to determine when a metaphor is pertinent to the user's query. The variables, structure, and user interaction with the model are hence used by VQI to determine when a given variable is “relevant” as described subsequently:
  1. Variables. In constructing a disease model, a select number of variables are chosen and modeled to characterize a disease process. Each variable is mapped to a unique graphical metaphor. By way of illustration, an age variable would map to a component graphical metaphor that prompts the user to specify a numerical value. In addition, each variable has a number of states that it can take on; these states dictate what properties a graphical metaphor can take on. For a variable that models the percentage of tumor removed from a patient, the states may be specified by a range of percentage values (e.g., 90-100% resection); the graphical metaphor is responsible for transforming a user's numerical input and placing it into one of the variable's states. Variable names can also be mapped to a broader knowledge source, such as an ontology, that allows the variable to be defined and placed into the context of other related variables. For example, if a disease model representing brain-related symptoms includes a variable word blindness that represents a loss of the patient's ability to read written text, the variable is mapped to the term alexia in the Unified Medical Language System (UMLS) lexicon and assigned to the semantic type T047 (Disease or Syndrome). After mapping all of the variables to UMLS, variables with identical or similar semantic types are grouped and presented together in the query interface.

  2. Model structure. The network topology encodes information about the conditional independencies that exist in the model. Based on the Markov assumption, conditional independencies allow the model to be decomposed into small subgroups given evidence about certain variables. For instance, a variable, given information about its parents, children, and children's parents, can be fully explained by these variables and therefore isolated from the rest of the network; this set of variables is called a Markov blanket. VQI leverages this property to identify those subsets of variables in the model related to a given variable of interest: when a variable of interest is selected, VQI examines the variable's Markov blanket to identify additional graphical metaphors to be presented in the interface (a minimal sketch of this computation follows this list). Also, the in- and out-degree of a variable help to determine its relative importance: highly connected variables can be considered more crucial to a disease process than variables that are sparsely connected. In VQI, the connectedness of a variable is used to determine the initial group of metaphors that is presented to the user.

  3. Query. Information about the user's goals is gleaned from the query itself. The variables that the user selects to be a part of the query elucidate the types of information that the user is seeking from the model. As an example, if the user selects several imaging-related variables, the probability that the user is interested in determining how imaging features affect the outcome of the patient is increased. Therefore, the model increases the weight of other imaging-related variables so that they are visually highlighted or presented prior to other metaphors in the interface.

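As referenced in item 2 above, a minimal sketch of the Markov blanket computation is given below; the edge list loosely mirrors the disease/symptom pattern of Fig. 9.8a and is illustrative only.

```python
def markov_blanket(node, edges):
    """Markov blanket = parents, children, and the children's other parents.

    edges: iterable of (parent, child) pairs defining the DAG.
    """
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    spouses = {p for p, c in edges if c in children and p != node}
    return parents | children | spouses

# Toy disease model loosely patterned on Fig. 9.8a.
edges = [("cold", "sore_throat"), ("tonsillitis", "sore_throat"),
         ("tonsillitis", "fever"), ("flu", "fever"), ("flu", "myalgia")]
print(markov_blanket("tonsillitis", edges))
# contains: sore_throat, fever (children); cold, flu (children's other parents)
```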
This adaptive presentation of relevant graphical metaphors not only simplifies the process of creating a query by reducing visual (selection) clutter, but also enforces logical rules regarding the order in which metaphors are selected to formulate a query. For instance, in neuroradiology, contrast enhancement, if present, appears around certain image features of a tumor, such as a cyst or necrosis. Therefore, the option to add a rim contrast metaphor is only applicable when a cyst or necrosis metaphor is already present in the query.

Formulating a query. The process of posing a visual query is as follows: from a normal or patient imaging study, the user selects a representative slice or location to pose the query; the user iteratively constructs a query by drawing upon the available set of presented metaphors to represent visual features of the disease; and the final query is translated into an object representation that is used to set the states of variables in the BBN as the basis of a MAP query. Fig. 9.10 demonstrates how VQI's adaptive interface works in the context of posing a query in the domain of neuro-oncology: users are presented with a normal brain atlas (ICBM452 [60]), from which axial, coronal, or sagittal slices can be selected (Fig. 9.10a). An adaptive toolbar presents available metaphors based on context: as the user selects structures (e.g., white matter) or metaphors (e.g., an edema metaphor) in the editor, related metaphors are presented in the toolbar (and unrelated metaphors are removed). For instance, when the contrast enhancement metaphor is selected, the user is prompted to define whether the border is thick or thin. The user progressively composes a visual query, which is then translated into values posed against the BBN, and inference can take place.
Figure 9.10

Demonstrating query formulation using VQI and how the adaptive interface uses the model to determine the presentation of graphical metaphors. (a) The user initially selects a representative slice from an atlas to place a tumor object. (b) After the user draws an edema metaphor in the query, the model identifies which metaphors to present next based on the structure of the model. (c) After adding a necrotic metaphor, the next relevant metaphor is contrast enhancement. (d) The user specifies properties of the contrast enhancement based on the states defined in the variable.
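To illustrate the final translation step, the sketch below turns a (hypothetical) visual query into evidence and answers a MAP query by brute-force enumeration; real systems would use the exact or approximate inference algorithms discussed earlier, and all variables and probabilities here are invented for demonstration.

```python
import itertools

# Hypothetical toy model over three variables.
domains = {"tumor_type": ["glioma", "metastasis"],
           "edema": [True, False],
           "enhancement": ["thick", "thin", "none"]}

def joint_p(assign):
    """Stand-in for the BBN's factored joint; numbers are illustrative only."""
    p = 0.5  # uniform prior over tumor_type
    p *= 0.8 if (assign["tumor_type"] == "glioma") == assign["edema"] else 0.2
    p *= {"thick": 0.5, "thin": 0.3, "none": 0.2}[assign["enhancement"]]
    return p

def map_query(evidence, query_vars):
    """argmax over query_vars of P(query_vars, evidence), by enumeration."""
    best, best_p = None, -1.0
    for values in itertools.product(*(domains[v] for v in query_vars)):
        assign = dict(evidence, **dict(zip(query_vars, values)))
        p = joint_p(assign)
        if p > best_p:
            best, best_p = dict(zip(query_vars, values)), p
    return best

# Visual query: the user drew an edema metaphor with a thick enhancement rim.
evidence = {"edema": True, "enhancement": "thick"}
print(map_query(evidence, ["tumor_type"]))  # -> {'tumor_type': 'glioma'}
```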

Case-based retrieval. VQI supports case-based retrieval by using the Kullback-Leibler (KL) divergence (DKL). Originally posed as an information-theoretic measure [39], DKL assesses the difference between two probability distributions (over the same event space) and is formally defined for discrete random variables as:
$$D_{KL}(P, Q) = \sum_{x \in \chi} P(x)\log\frac{P(x)}{Q(x)}$$

where P and Q are the two probability distributions, and χ is the shared event space: the smaller DKL, the more similar the distributions. DKL has, for example, been used to compute the magnitude of nonlinear deformation needed in image registration problems [74] and in BBNs for visualizing relationship strengths [36]. In VQI, KL divergence is used to measure the similarity between the query and cases in a patient database. Based on the imaging features of the query (e.g., size, location, geometric relationships between objects, etc.) and other non-imaging values specified in the query, the posterior probability distribution for the combination of variables given as evidence is computed; this distribution is taken as P(x). The posterior probability distribution is then calculated for each case in the database using the same query variables; the resulting distribution is taken as Q(x). The KL divergence is computed for each case in the database, and the results are ranked from lowest to highest. The case associated with the lowest KL divergence value is the “closest” matching case (with a KL divergence of 0 being a perfect match). The benefit of this approach is that, unlike traditional case-based approaches, combinations of variables that have not previously been entered in the database can still be supported; the model will attempt to find the next best combination of features that results in a posterior probability distribution closest to that of the query.
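The following sketch shows how the KL-divergence ranking might be computed once the query and per-case posteriors have been obtained from the BBN; the distributions are hypothetical, and the code assumes every case posterior assigns nonzero probability to each state of the query posterior.

```python
import math

def kl_divergence(p, q):
    """D_KL(P, Q) for discrete distributions given as state -> probability dicts.

    Assumes q[x] > 0 for every state x where p[x] > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def rank_cases(query_posterior, case_posteriors):
    """Rank database cases by KL divergence to the query posterior, lowest first."""
    scored = [(kl_divergence(query_posterior, q), case_id)
              for case_id, q in case_posteriors.items()]
    return sorted(scored)

# Hypothetical posteriors over an outcome variable, computed from the same evidence.
query = {"good": 0.7, "poor": 0.3}
cases = {"case_12": {"good": 0.65, "poor": 0.35},
         "case_47": {"good": 0.20, "poor": 0.80}}
print(rank_cases(query, cases))  # case_12 ranks first (smallest divergence)
```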

AneurysmDB

Building from concepts developed in VQI, a second application is AneurysmDB, an ongoing project to develop an interface for the integrated visualization and querying of a clinical research database for intracranial aneurysms (ICAs) (Fig. 9.11). ICAs are a relatively common autopsy finding, occurring in approximately 1-6% of the general population; this statistic suggests that up to 15 million Americans have or will develop this potentially debilitating, if not deadly, problem [65, 73]. Yet little is known about the true etiology of intracranial aneurysms, and optimal treatment is still largely debated.
Figure 9.11

Example AneurysmDB interface, shown with a fictitious research patient. A BBN can be automatically populated with information extracted from the patient’s electronic medical record. The user can then select which variables to query on and formulate a belief update or MPE/MAP query. Additionally, a 3D sketch interface is integrated to enable the user to draw an aneurysm shape/location as part of the query process. Elements in this display (e.g., the extracted text components, task ribbon, overall layout) are also driven by the topology of the BBN.

As in VQI's application domain of neuro-oncology, the motivation for this application is to support prognostic “what if” queries to an underlying disease model and the retrieval of similar cases (and hence, potential outcomes for a given individual). Additionally, the ribbon toolbar presented at the top of the interface is guided by examination of the BBN and the current query to identify likely variables to include. Unlike VQI, where the predominant focus is on guiding image-oriented queries, AneurysmDB aims to expand the querying process to all clinical variables extracted from the electronic medical record. Some key differences are highlighted:
  • Linkage to a phenomenon-centric data model. As outlined toward the end of Chapter 8, a BBN can be connected to a phenomenon-centric data model (PCDM) to drive computation of the CPTs. Moreover, the classification of the variables and relations expressed in the data model provides additional semantic information that can be used to control what aspects of the data are presented in the user interface. In this case, AneurysmDB is linked to a PCDM for ICAs, facilitating access to patient-level records for perusal; moreover, the elements that are shown (e.g., the summaries for each document or imaging study; the grouping and ordering of elements in the task ribbon) are determined through a weighting of the relationships in the PCDM and the BBN. For instance, in addition to using Markov blankets, path length is used to determine the opacity and size of a graphical element as it is rendered in the interface. Path length is measured by the minimum number of arcs required to go between output variables and the given query input variables in both the BBN and PCDM: as the path length increases, the graphical representation for that variable is rendered with reduced opacity and/or size (a minimal sketch of this weighting follows this list). An example of adjusting opacity/size is in showing the list of findings for a report: the extracted elements that are most influential (e.g., ICA size) should be highlighted and given in a summary before more ancillary variables (e.g., smoking history).

  • Temporal modeling. While clinical care is governed by making decisions about patient treatment with the latest information, researchers sometimes ask questions that are driven by retrospective analysis (e.g., given the information up to a certain point in time, such as one year ago, how would the probabilities change compared to now?). To answer such a question, we must carefully separate out the data elements and inferences made over time. Connection to the PCDM allows us to pose queries based on different time points in the patient's history.

  • Integration of a 3D sketch interface. Aneurysms are 3D entities, the morphology of which is critical to understanding the risk of rupture. In contrast to the 2D interface in VQI, a 3D sketch interface based on [31] is introduced, along with standard templates for common ICA shapes.
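As a final illustration, below is a minimal sketch of the path-length weighting described in the first bullet above: a breadth-first search over an undirected view of the combined BBN/PCDM graph yields the minimum number of arcs, which is then mapped to a rendering opacity; the decay function, constants, and variable names are assumptions.

```python
from collections import deque

def path_length(graph, source, target):
    """Minimum number of arcs between two variables (BFS, undirected view).

    graph: dict mapping each variable to the variables it shares an arc with.
    Returns None if no path exists.
    """
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

def render_opacity(dist, full=1.0, floor=0.2):
    """Hypothetical mapping: opacity decays with path length, never below a floor."""
    if dist is None:
        return floor
    return max(floor, full / (1 + dist))

# Toy combined BBN/PCDM neighborhood around a rupture-risk output variable.
graph = {"rupture_risk": ["ica_size", "ica_location"],
         "ica_size": ["rupture_risk", "smoking_history"],
         "ica_location": ["rupture_risk"],
         "smoking_history": ["ica_size"]}
print(render_opacity(path_length(graph, "rupture_risk", "ica_size")))         # 0.5
print(render_opacity(path_length(graph, "rupture_risk", "smoking_history")))  # ~0.33
```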

Footnotes

  1. As in Chapter 8, we follow standard notation, with uppercase letters representing a random variable; lowercase letters indicating instantiations/specific values of the random variable; and bold characters symbolizing sets or vectors of variables.

  2. Belief propagation is sometimes also referred to as the sum-product algorithm.

References

  1. Alvarado P, Berner A, Akyol S (2002) Combination of high-level cues in unsupervised single image segmentation using Bayesian belief networks. Proc Intl Conf Imaging Science, Systems, and Technology, Las Vegas, NV, pp 235-240.
  2. Bednarski M, Cholewa W, Frid W (2004) Identification of sensitivities in Bayesian networks. Engineering Applications of Artificial Intelligence, 17(4):327-335.
  3. Bellazzi R, Zupan B (2008) Predictive data mining in clinical medicine: Current issues and guidelines. Intl J Medical Informatics, 77(2):81-97.
  4. Bishop CM (2006) Graphical models. Pattern Recognition and Machine Learning. Springer, New York, pp 359-418.
  5. Boyen X (2002) Inference and learning in complex stochastic processes. Department of Computer Science, PhD dissertation. Stanford University.
  6. Boyen X, Koller D (1998) Tractable inference for complex stochastic processes. Proc 16th Conf Uncertainty in Artificial Intelligence (UAI), pp 313-320.
  7. Breitkreutz BJ, Stark C, Tyers M (2003) Osprey: A network visualization system. Genome Biol, 4(3):R22.
  8. Chan H, Darwiche A (2004) Sensitivity analysis in Bayesian networks: From single to multiple parameters. Proc 20th Conf Uncertainty in Artificial Intelligence (UAI), pp 67-75.
  9. Chavira M, Darwiche A, Jaeger M (2006) Compiling relational Bayesian networks for exact inference. Intl J Approximate Reasoning, 42(1-2):4-20.
  10. Cooper GF (1988) A method for using belief networks as influence diagrams. Proc 12th Conf Uncertainty in Artificial Intelligence, pp 55-63.
  11. Cooper GF (1990) The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405.
  12. Coupé VM, Peek N, Ottenkamp J, Habbema JD (1999) Using sensitivity analysis for efficient quantification of a belief network. Artif Intell Med, 17(3):223-247.
  13. Coupé VMH, van der Gaag LC (2002) Properties of sensitivity analysis of Bayesian belief networks. Annals of Mathematics and Artificial Intelligence, 36(4):323-356.
  14. Darwiche A (2003) A differential approach to inference in Bayesian networks. Journal of the ACM, 50(3):280-305.
  15. Darwiche A (2009) Modeling and Reasoning with Bayesian Networks. Cambridge University Press, New York.
  16. de Campos LM, Gámez JA, Moral S (1999) Partial abductive inference in Bayesian belief networks using a genetic algorithm. Pattern Recognition Letters, 20(11-13):1211-1217.
  17. de Salvo Braz R, Amir E, Roth D (2008) A survey of first-order probabilistic models. In: Holmes DE, Jain LC (eds) Innovations in Bayesian Networks: Theory and Applications. Springer, pp 289-317.
  18. Dechter R (1999) Bucket elimination: A unifying framework for probabilistic inference. Learning in Graphical Models, pp 75-104.
  19. Dechter R, Mateescu R (2007) AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73-106.
  20. Donkers J, Tuyls K (2008) Belief networks for bioinformatics. Computational Intelligence in Bioinformatics, pp 75-111.
  21. Druzdzel MJ (1996) Qualitative verbal explanations in Bayesian belief networks. AISB Quarterly:43-54.
  22. Geman S, Geman D (1987) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms:564-584.
  23. Getoor L, Friedman N, Koller D, Pfeffer A, Taskar B (2007) Probabilistic relational models. In: Getoor L, Taskar B (eds) Introduction to Statistical Relational Learning. MIT Press, Cambridge, MA, pp 129-174.
  24. Haddawy P, Jacobson J, Kahn CE Jr (1997) BANTER: A Bayesian network tutoring shell. Artif Intell Med, 10(2):177-200.
  25. Heckerman D, Chickering DM, Meek C, Rounthwaite R, Kadie C (2001) Dependency networks for inference, collaborative filtering, and data visualization. J Machine Learning Research, 1:49-75.
  26. Heckerman D, Meek C, Koller D (2004) Probabilistic Models for Relational Data (MSR-TR-2004-30). Microsoft Research. http://research.microsoft.com/pubs/70050/tr-2004-30.pdf. Accessed March 3, 2009.
  27. Horvitz E, Barry M (1995) Display of information for time-critical decision making. Proc 11th Conf Uncertainty in Artificial Intelligence (UAI), pp 296-305.
  28. Horvitz E, Breese J, Heckerman D, Hovel D, Rommelse K (1998) The Lumiere Project: Bayesian user modeling for inferring the goals and needs of software users. Proc 14th Conf Uncertainty in Artificial Intelligence (UAI), pp 256-265.
  29. Hu Z, Mellor J, Wu J, Yamada T, Holloway D, DeLisi C (2005) VisANT: Data-integrating visual framework for biological networks and modules. Nucleic Acids Res:W352-357.
  30. Huang J, Chavira M, Darwiche A (2006) Solving MAP exactly by searching on compiled arithmetic circuits. Proc 21st Natl Conf Artificial Intelligence (AAAI-06), Boston, MA, pp 143-148.
  31. Igarashi T, Hughes JF (2003) Smooth meshes for sketch-based freeform modeling. Proc ACM Symp Interactive 3D Graphics (ACM I3D 2003), pp 139-142.
  32. Jaeger M (1997) Relational Bayesian nets. Proc 13th Conf Uncertainty in Artificial Intelligence (UAI), pp 266-273.
  33. Jensen FV, Lauritzen SL, Olesen KG (1990) Bayesian updating in recursive graphical models by local computation. Computational Statistics Quarterly, 4:269-282.
  34. Kadaba NR, Irani PP, Leboe J (2007) Visualizing causal semantics using animations. IEEE Trans Vis Comput Graph, 13(6):1254-1261.
  35. Kjærulff UB, Madsen AL (2008) Sensitivity analysis. Bayesian Networks and Influence Diagrams, pp 273-290.
  36. Koiter JR (2006) Visualizing inference in Bayesian networks. Department of Computer Science, PhD dissertation. Delft University of Technology (Netherlands).
  37. Koller D (1999) Probabilistic relational models. Inductive Logic Programming, vol 1634. Springer, pp 3-13.
  38. Koller D, Lerner U (2001) Sampling in factored dynamic systems. In: Doucet A, de Freitas JFG, Gordon N (eds) Sequential Monte Carlo Methods in Practice. Springer-Verlag, pp 445-464.
  39. Kullback S, Leibler RA (1951) On information and sufficiency. Annals Mathematical Statistics, 22:79-86.
  40. Kuncheva LI (2006) On the optimality of naïve Bayes with dependent binary features. Pattern Recognition Letters, 27(7):830-837.
  41. Lacave C, Díez FJ (2002) A review of explanation methods for Bayesian networks. Knowl Eng Rev, 17(2):107-127.
  42. Laskey KB (1995) Sensitivity analysis for probability assessments in Bayesian networks. IEEE Trans Syst Man Cybern, 25:901-909.
  43. Lauritzen SL, Spiegelhalter DJ (1988) Local computations with probabilities on graphical structures and their application to expert systems. J Royal Statistical Society, 50(2):157-224.
  44. Luo J, Savakis AE, Singhal A (2005) A Bayesian network-based framework for semantic image understanding. Pattern Recognition, 38(6):919-934.
  45. Madigan D, Mosurski K, Almond RG (1996) Graphical explanation in belief networks. J Comput Graphical Statistics, 6:160-181.
  46. Mengshoel O, Wilkins D (1998) Genetic algorithms for belief network inference: The role of scaling and niching. Evolutionary Programming VII, vol 1447. Springer, pp 547-556.
  47. Mortensen EN, Jin J (2006) Real-time semi-automatic segmentation using a Bayesian network. IEEE Proc Conf Computer Vision and Pattern Recognition, vol 1, pp 1007-1014.
  48. Mozina M, Demsar J, Kattan MW, Zupan B (2004) Nomograms for visualization of naive Bayesian classifier. Proc Principles and Practice of Knowledge Discovery in Databases (PKDD-04), Pisa, Italy, pp 337-348.
  49. Murphy K, Weiss Y (2001) The factored frontier algorithm for approximate inference in DBNs. Proc 18th Conf Uncertainty in Artificial Intelligence (UAI), pp 378-385.
  50. Neal R (1993) Probabilistic inference using Markov chain Monte Carlo methods (CRG-TR-93-1). Department of Computer Science, University of Toronto.
  51. Nease RF, Owens DK (1997) Use of influence diagrams to structure medical decisions. Med Decis Making, 17(3):263-275.
  52. Ng BM (2006) Factored inference for efficient reasoning of complex dynamic systems. Computer Science Department, PhD dissertation. Harvard University.
  53. Ogunyemi OI, Clarke JR, Ash N, Webber BL (2002) Combining geometric and probabilistic reasoning for computer-based penetrating-trauma assessment. J Am Med Inform Assoc, 9(3):273-282.
  54. Park J (2002) MAP complexity results and approximation methods. Proc 18th Conf Uncertainty in Artificial Intelligence (UAI), pp 388-396.
  55. Park J, Darwiche A (2001) Approximating MAP using local search. Proc 17th Conf Uncertainty in Artificial Intelligence (UAI), pp 403-410.
  56. Park JD, Darwiche A (2003) Solving MAP exactly using systematic search. Proc 19th Conf Uncertainty in Artificial Intelligence (UAI), pp 459-468.
  57. Pearl J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.
  58. Poole D (2003) First-order probabilistic inference. Proc 18th Intl Joint Conf Artificial Intelligence, pp 985-991.
  59. Przytula KW, Dash D, Thompson D (2003) Evaluation of Bayesian networks used for diagnostics. Proc IEEE Aerospace Conf, pp 1-12.
  60. Rex D, Ma J, Toga A (2003) The LONI pipeline processing environment. Neuroimage, 19(3):1033-1048.
  61. Rish I (2001) An empirical study of the naive Bayes classifier. Workshop on Empirical Methods in Artificial Intelligence; Proc Intl Joint Conf Artificial Intelligence, vol 335.
  62. Romero T, Larrañaga P (2009) Triangulation of Bayesian networks with recursive estimation of distribution algorithms. Intl J Approximate Reasoning, 50(3):472-484.
  63. Russell SJ, Norvig P (2003) Artificial Intelligence: A Modern Approach. 2nd edition. Prentice Hall/Pearson Education, Upper Saddle River, NJ.
  64. Sarkar S, Boyer KL (1993) Integration, inference, and management of spatial information using Bayesian networks: Perceptual organization. IEEE Trans Pattern Analysis and Machine Intelligence, 15(3):256-274.
  65. Schievink WI (1997) Intracranial aneurysms. N Engl J Med, 336(1):28-40.
  66. Siau K, Chan H, Wei K (2004) Effects of query complexity and learning on novice user query performance with conceptual and logical database interfaces. IEEE Trans Syst Man Cybern, 34(2):276-281.
  67. Suermondt HJ, Cooper GF (1993) An evaluation of explanations of probabilistic inference. Comput Biomed Res, 26(3):242-254.
  68. Van Allen T, Singh A, Greiner R, Hooper P (2008) Quantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference. Artificial Intelligence, 172(4-5):483-513.
  69. Verduijn M, Peek N, Rosseel PMJ, de Jonge E, de Mol BAJM (2007) Prognostic Bayesian networks: I: Rationale, learning procedure, and clinical use. J Biomedical Informatics, 40(6):609-618.
  70. Wang H, Druzdzel MJ (2000) User interface tools for navigation in conditional probability tables and elicitation of probabilities in Bayesian networks. Proc 16th Conf Uncertainty in Artificial Intelligence (UAI), pp 617-625.
  71. Wemmenhove B, Mooij J, Wiegerinck W, Leisink M, Kappen H, Neijt J (2007) Inference in the Promedas medical expert system. Artificial Intelligence in Medicine, pp 456-460.
  72. Westling M, Davis L (1997) Interpretation of complex scenes using Bayesian networks. Computer Vision - ACCV'98. Springer, pp 201-208.
  73. Wiebers DO, Whisnant JP, Huston J 3rd, Meissner I, Brown RD Jr, Piepgras DG, Forbes GS, Thielen K, Nichols D, O'Fallon WM, Peacock J, Jaeger L, Kassell NF, Kongable-Beckman GL, Torner JC (2003) Unruptured intracranial aneurysms: Natural history, clinical outcome, and risks of surgical and endovascular treatment. Lancet, 362(9378):103-110.
  74. Yanovsky I, Thompson PM, Osher S, Leow AD (2006) Large deformation unbiased diffeomorphic nonlinear image registration: Theory and implementation. UCLA Center for Applied Mathematics (Report #06-71).
  75. Yap GE, Tan AH, Pang HH (2008) Explaining inferences in Bayesian networks. Applied Intelligence, 29(3):263-278.
  76. Yedidia JS, Freeman WT, Weiss Y (2003) Understanding belief propagation and its generalizations. In: Lakemeyer G, Nebel B (eds) Exploring Artificial Intelligence in the New Millennium. Elsevier Science, pp 239-269.
  77. Yuan C, Lu T-C, Druzdzel MJ (2004) Annealed MAP. Proc 20th Conf Uncertainty in Artificial Intelligence (UAI), Banff, Canada, pp 628-635.
  78. Zapata-Rivera JD, Neufeld E, Greer JE (1999) Visualization of Bayesian belief networks. Proc IEEE Visualization '99 (Late Breaking Topics), pp 85-88.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Medical Imaging Informatics, UCLA David Geffen School of Medicine, Los Angeles, USA
  2. Medical Imaging Informatics Group, Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, USA
