Encyclopedia of GIS

2017 Edition
Editors: Shashi Shekhar, Hui Xiong, Xun Zhou

Bayesian Network Integration with GIS

  • Daniel P. Ames
  • Allen Anselmo
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-17885-1_95

Synonyms

Definition

A Bayesian network (BN) is a graphical-mathematical construct used to probabilistically model processes which include interdependent variables, decisions affecting those variables, and costs associated with the decisions and states of the variables. BNs are inherently system representations and, as such, are often used to model environmental processes. Because of this, there is a natural connection between certain BNs and GIS. BNs are represented as a directed acyclic graph structure with nodes (representing variables, costs, and decisions) and arcs (directed lines representing conditionally probabilistic dependencies between the nodes). A BN can be used for prediction or analysis of real-world problems and complex natural systems where statistical correlations can be found between variables or approximated using expert opinion. BNs have a vast array of applications for aiding decision-making in areas such as medicine, engineering, natural resources, and decision management. BNs can be used to model geospatially interdependent variables as well as conditional dependencies between geospatial layers. Additionally, BNs have been found to be useful and highly efficient in performing image classification on remotely sensed data.

Historical Background

Originally described by Pearl (1988), BNs have been used extensively in medicine and computer science (Heckerman 1997). In recent years, BNs have been applied in spatially explicit environmental management studies. Examples include the Neuse Estuary Bayesian ecological response network (Borsuk and Reckhow 2000), Baltic salmon management (Varis and Kuikka 1996), climate change impacts on Finnish watersheds (Kuikka and Varis 1997), the Interior Columbia Basin Ecosystem Management Project (Lee and Bradshaw 1998), and waterbody eutrophication (Haas 1998). As illustrated in these studies, a BN graph structures a problem such that it is visually interpretable by stakeholders and decision-makers while serving as an efficient means for evaluating the probable outcomes of management decisions on selected variables.

Both BNs and GIS can be used to represent spatially explicit, probabilistically connected environmental and other systems; however, the integration of the two techniques has only been explored relatively recently. BN integration with GIS typically takes one of four distinct forms: (1) BN-based layer combination (i.e., probabilistic map algebra), as demonstrated in Taylor (2003); (2) BN-based classification, as demonstrated in Stassopoulou et al. (1998); (3) the use of BNs for intelligent, spatially oriented data retrieval, as demonstrated in Walker et al. (2004) and Walker et al. (2005); and (4) GIS-based BN decision support system (DSS) frameworks, where BN nodes are spatially represented in a GIS framework, as presented by Ames et al. (2005).

Scientific Fundamentals

As noted above, BNs are used to model reality by representing conditional probabilistic dependencies between interdependent variables, decisions, and outcomes. This section provides an in-depth explanation of BN analysis using an example BN model called the “Umbrella” BN (Fig. 1), an augmented version of the well-known “Weather” influence diagram presented by Shachter and Peot (1992). This simple BN attempts to model the variables and outcomes associated with the decision to take or not take an umbrella on a given outing. This problem is represented in the BN by four nodes. “Weather” and “Forecast” are nature or chance nodes where “Forecast” is conditioned on the state of “Weather” and “Weather” is treated as a random variable with a prior probability distribution based on historical conditions. “Take Umbrella” is a decision variable that, together with the “Weather” variable, defines the status of “Satisfaction.” The “Satisfaction” node is known as a “Utility” or “Value” node. This node associates a resultant outcome value (monetary or otherwise) to represent the satisfaction of the individual based on the decision to take the umbrella and whether or not there is rain. Each of these BN nodes contains discrete states where each variable state represents abstract events, conditions, or numeric ranges of each variable.
Bayesian Network Integration with GIS, Fig. 1

Umbrella Bayesian decision network structure. A and B are nature nodes, C is a decision node, and D is a utility node

The Umbrella model can be interpreted as follows: if it is raining, there is a higher probability that the forecast will predict it will rain. In reverse, through the Bayesian network “backward propagation of evidence,” if the forecast predicts rain, it can be inferred that there is a higher chance that rain will actually occur. The link between “Forecast” and “Take Umbrella” indicates that the “Take Umbrella” decision is based largely on the observed forecast. Finally, the link to the “Satisfaction” utility node from both “Take Umbrella” and “Weather” captures the relative gains in satisfaction derived from every combination of states of the BN variables.

Bayesian networks are governed by two mathematical techniques: conditional probability and Bayes’ theorem.

Conditional probability is defined as the probability of one event given the occurrence of another event and can be calculated as the joint probability of the two events occurring divided by the probability of the second event:
$$\displaystyle\begin{array}{rcl} P(A\vert B) = \frac{P(A,B)} {P(B)} \:.& &{}\end{array}$$
(1)
From Eq. 1, the fundamental rule for probability calculus and the downward propagation of evidence in a BN can be derived. Specifically, it is seen that the joint probability of A and B equals the conditional probability of event A given B, multiplied by the probability of event B (Eq. 2):
$$\displaystyle\begin{array}{rcl} P(A,B) = P(A\vert B) \cdot P(B)\:.& &{}\end{array}$$
(2)
Equation 2 is used to compute the probability of any state in the Bayesian network given the states of its parent nodes. In Eq. 3, the probability of state A_x of a node A with a single parent B is the sum, over the states B_i of B, of the conditional probability of A_x given B_i multiplied by the probability of that state of B:
$$\displaystyle\begin{array}{rcl} P(A_{x}) =\sum \limits _{i}P(A_{x}\vert B_{i}) \cdot P(B_{i})\:.& &{}\end{array}$$
(3)
Similarly, for a node with multiple parents, the summation runs over every combination of parent states. For two parents B and C that are independent of each other, the conditional probability of A_x given B_i and C_j is multiplied by the individual probabilities of B_i and C_j:
$$\displaystyle\begin{array}{rcl} P(A_{x}) =\sum \limits _{i,j}P(A_{x}\vert B_{i},C_{j}) \cdot P(B_{i}) \cdot P(C_{j})\:.& &{}\end{array}$$
(4)
Finally, though similar in form, utility nodes do not calculate a probability. Instead, they calculate an expected utility value, as a metric or index, given the states of the parent or parents, as shown in Eqs. 5 and 6:
$$\displaystyle\begin{array}{rcl} U(A) =\sum \limits _{i}U(A\vert B_{i}) \cdot P(B_{i})\:,& &{}\end{array}$$
(5)
$$\displaystyle\begin{array}{rcl} U(A) =\sum \limits _{i,j}U(A\vert B_{i},C_{j}) \cdot P(B_{i}) \cdot P(C_{j})\:.& &{}\end{array}$$
(6)
The second equation that is critical to BN modeling is Bayes’ theorem:
$$\displaystyle\begin{array}{rcl} P(A\vert B) = \frac{P(B\vert A) \cdot P(A)} {P(B)} \:.& &{}\end{array}$$
(7)

The conditional probability inversion represented here allows for the powerful technique of Bayesian inference, for which BNs are particularly well suited. In the Umbrella model, inferring a higher probability of rain given a rainy forecast is an example application of Bayes’ theorem.

Associated with each node in the BN is a conditional probability table (CPT). Each nature node (state variable) includes a CPT that stores the probability distribution for the possible states of the variable given every combination of the states of its parent nodes (if any). These probability distributions can be assigned by frequency analysis of the variables, by expert opinion based on observation or experience, or they can be set to some “prior” distribution based on observations of equivalent systems.

Tables 1 and 2 show CPTs for the Umbrella BN. In Table 1, the probability distribution of rain is represented as 70% chance of no rain and 30% chance of rain. This CPT can be assumed to be derived from historical observations of the frequency of rain in the given locale. Table 2 represents the probability distribution of the possible weather forecasts (“Sunny,” “Cloudy,” or “Rainy”) conditioned on the actual weather event. For example, when it actually rained, the prior forecast called for “Rainy” 60% of the time, “Cloudy” 25% of the time, and “Sunny” 15% of the time. Again, these probabilities can be derived from historical observations of prediction accuracies or from expert judgment.
Bayesian Network Integration with GIS, Table 1

Probability of rain

| Weather | Probability |
|---------|-------------|
| No rain | 70%         |
| Rain    | 30%         |

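Bayesian Network Integration with GIS, Table 2

Forecast probability conditioned on rain

| Weather | Forecast: Sunny | Forecast: Cloudy | Forecast: Rainy |
|---------|-----------------|------------------|-----------------|
| No rain | 70%             | 20%              | 10%             |
| Rain    | 15%             | 25%              | 60%             |
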
Table 3 is a utility table defining the relative gains in utility (in terms of generic “units” of satisfaction) under all of the possible states of the BN. Here, satisfaction is highest when there is no rain and the umbrella is not taken and lowest when the umbrella is not taken but it does rain. Satisfaction “units” are in this case assigned as arbitrary ratings from 0 to 100, but in more complex systems, utility can be used to represent monetary or other measures.
Bayesian Network Integration with GIS, Table 3

Satisfaction utility conditioned on rain and the “Take Umbrella” decision

| Weather | Take Umbrella | Satisfaction |
|---------|---------------|--------------|
| No rain | Take          | 20 units     |
| No rain | Do not take   | 100 units    |
| Rain    | Take          | 70 units     |
| Rain    | Do not take   | 0 units      |

Following is a brief explanation of the implementation and use of the Umbrella BN. First it is useful to compute P(Forecast = Sunny) given unknown Weather conditions as follows:
$$\displaystyle\begin{array}{rcl} P(\mathrm{Forecast = Sunny})& =& \sum \limits _{i\in \{\mathrm{NoRain},\ \mathrm{Rain}\}}P(\mathrm{Forecast = Sunny}\vert \mathrm{Weather}_{i}) \cdot P(\mathrm{Weather}_{i}) \\ & =& 0.7 \cdot 0.7 + 0.15 \cdot 0.3 = 0.535 \approx 54\%\:. \end{array}$$
Next P(Forecast = Cloudy) and P(Forecast = Rainy) can be computed as
$$\displaystyle\begin{array}{rcl} P(\mathrm{Forecast = Cloudy})& =& 0.2 \cdot 0.7 + 0.25 \cdot 0.3 = 0.215 \approx 22\% \\ P(\mathrm{Forecast = Rainy})& =& 0.1 \cdot 0.7 + 0.6 \cdot 0.3 = 0.25 = 25\%\:. \end{array}$$
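
The three forecast probabilities can be reproduced with a few lines of Python. This is a minimal sketch that applies the single-parent summation of Eq. 3 to the values in Tables 1 and 2; the dictionary layout and the function name `marginal` are illustrative choices, not part of any BN software package.

```python
# Prior P(Weather) from Table 1 and CPT P(Forecast | Weather) from Table 2.
P_WEATHER = {"NoRain": 0.70, "Rain": 0.30}
P_FORECAST_GIVEN_WEATHER = {
    "NoRain": {"Sunny": 0.70, "Cloudy": 0.20, "Rainy": 0.10},
    "Rain": {"Sunny": 0.15, "Cloudy": 0.25, "Rainy": 0.60},
}


def marginal(prior, cpt):
    """Marginalize a child node over its single parent (Eq. 3)."""
    child_states = next(iter(cpt.values())).keys()
    return {
        child: sum(cpt[parent][child] * prior[parent] for parent in prior)
        for child in child_states
    }


P_FORECAST = marginal(P_WEATHER, P_FORECAST_GIVEN_WEATHER)
for state, probability in P_FORECAST.items():
    print(f"P(Forecast = {state}) = {probability:.3f}")
```

Because each row of the CPT sums to one, the three marginal probabilities also sum to one, which is a quick sanity check on any hand calculation.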
Finally, evaluate the “Satisfaction” utility under both possible decision scenarios (take or leave the umbrella):
$$\displaystyle\begin{array}{rcl} & & U(\mathrm{Satisfaction}\vert \mathrm{TakeUmbrella\ =\ Take}) \\ & & =\sum \limits _{i,j}U(\mathrm{Satisfaction}\vert \mathrm{TakeUmbrella}_{i},\ \mathrm{Weather}_{j}) \cdot P(\mathrm{TakeUmbrella}_{i}) \cdot P(\mathrm{Weather}_{j}) \\ & & = 20 \cdot 1.0 \cdot 0.7 + 100 \cdot 0.0 \cdot 0.7 + 70 \cdot 1.0 \cdot 0.3 + 0 \cdot 0.0 \cdot 0.3 = 35\:. \end{array}$$
Similarly, the utility of not taking the umbrella is computed as
$$\displaystyle\begin{array}{rcl} & & U(\mathrm{Satisfaction}\vert \mathrm{TakeUmbrella\ =\ NoTake}) \\ & & = 20 \cdot 0.0 \cdot 0.7 + 100 \cdot 1.0 \cdot 0.7 + 70 \cdot 0.0 \cdot 0.3 + 0 \cdot 1.0 \cdot 0.3 = 70\:. \end{array}$$

Clearly, the higher satisfaction is predicted for leaving the umbrella at home, thereby providing an example of how a simple BN analysis can aid the decision-making process. While the Umbrella BN presented here is quite simple and not particularly spatially explicit, it serves as a generic BN example. Specific application of BNs in GIS is presented in the following section.
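
The calculation above uses only the prior weather distribution. As a hedged illustration of how Bayes’ theorem (Eq. 7) changes the picture once a forecast is observed, the following Python sketch recomputes the expected satisfaction for each decision under the posterior weather distribution. The numbers come from Tables 1 to 3; the function names are illustrative assumptions and do not correspond to any particular BN package.

```python
P_WEATHER = {"NoRain": 0.70, "Rain": 0.30}
P_FORECAST_GIVEN_WEATHER = {
    "NoRain": {"Sunny": 0.70, "Cloudy": 0.20, "Rainy": 0.10},
    "Rain": {"Sunny": 0.15, "Cloudy": 0.25, "Rainy": 0.60},
}
# Satisfaction units from Table 3, keyed by (weather, decision).
UTILITY = {
    ("NoRain", "Take"): 20,
    ("NoRain", "NoTake"): 100,
    ("Rain", "Take"): 70,
    ("Rain", "NoTake"): 0,
}


def posterior_weather(forecast):
    """Bayes' theorem (Eq. 7): P(Weather | Forecast = forecast)."""
    joint = {
        w: P_FORECAST_GIVEN_WEATHER[w][forecast] * P_WEATHER[w]
        for w in P_WEATHER
    }
    evidence = sum(joint.values())  # P(Forecast = forecast)
    return {w: p / evidence for w, p in joint.items()}


def expected_utility(decision, p_weather):
    """Expected satisfaction of a fixed decision under a weather distribution."""
    return sum(UTILITY[(w, decision)] * p for w, p in p_weather.items())


def best_decision(p_weather):
    """Decision with the highest expected satisfaction."""
    return max(("Take", "NoTake"), key=lambda d: expected_utility(d, p_weather))


print("No forecast:", best_decision(P_WEATHER))
for forecast in ("Sunny", "Cloudy", "Rainy"):
    post = posterior_weather(forecast)
    print(forecast, f"P(Rain) = {post['Rain']:.2f}", best_decision(post))
```

Under these tables, a “Rainy” forecast raises the probability of rain from 0.30 to 0.72, and the expected satisfaction of taking the umbrella (56 units) then exceeds that of leaving it at home (28 units), so the preferred decision reverses.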

Key Applications

As discussed above, integrating GIS and BNs is useful for any BN that has spatial components, whether the goal is to display a spatially oriented BN, to use GIS functionality as input to a BN, or to form a BN from GIS analysis. The applications of such integration are therefore limited only by the presence of a spatial association. Watershed management, mentioned above, is one example where a spatially oriented BN has proved useful, but other types of BNs may also benefit from this form of integration. For instance, many ecological, sociological, and geological studies that might benefit from a BN also have strong spatial associations. Traffic analysis BNs often have very clear spatial associations as well. Finally, BNs that characterize the spread of disease in epidemiology would likely have a clear spatial association.

As outlined above, GIS-based BN analysis typically takes one of four distinct forms:
  • Probabilistic map algebra

  • Image classification

  • Automated data query and retrieval

  • Spatial representation of BN nodes

A brief explanation of the scientific fundamentals of each of these uses is presented here.

Probabilistic Map Algebra

Probabilistic map algebra involves the use of a BN as the combinatorial function applied on a cell-by-cell basis when combining raster layers. For example, consider the ecological habitat models described by Taylor (2003). In that study, several geospatial raster data sets were derived to represent proximity zones for human-caused landscape disturbances associated with the development of roads, wells, and pipelines. Additional data layers representing known habitat for each of several threatened and endangered species were also developed and overlaid on the disturbance layers. Next, a BN was constructed representing the probability of habitat risk conditioned on both human disturbance and habitat locations. CPTs in this BN were derived from interviews with acknowledged ecological experts in the region. Finally, the BN was applied on a cell-by-cell basis throughout the study area, resulting in a risk probability map of the region for each species of interest.
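
The cell-by-cell idea can be sketched in plain Python. The example below is a simplified illustration, not the model of Taylor (2003): the two binary rasters, the CPT values in `P_RISK`, and the function name `risk_map` are all hypothetical, and both input layers are treated as known with certainty at each cell.

```python
# Hypothetical CPT: P(habitat at risk | in disturbance zone, habitat present).
# Keys are (disturbance, habitat) with 1 = yes and 0 = no. Values are
# illustrative only and were not elicited from experts.
P_RISK = {
    (0, 0): 0.00,
    (1, 0): 0.00,
    (0, 1): 0.10,
    (1, 1): 0.80,
}


def risk_map(disturbance, habitat, cpt=P_RISK):
    """Apply the CPT cell by cell to two equally sized rasters (lists of rows)."""
    if len(disturbance) != len(habitat):
        raise ValueError("rasters must have the same number of rows")
    return [
        [cpt[(d, h)] for d, h in zip(d_row, h_row)]
        for d_row, h_row in zip(disturbance, habitat)
    ]


# Two tiny 2 x 3 rasters standing in for real GIS layers.
disturbance = [[0, 1, 1],
               [0, 0, 1]]
habitat = [[1, 1, 0],
           [0, 1, 1]]

risk = risk_map(disturbance, habitat)
for row in risk:
    print(row)
```

If an input layer were itself uncertain at a cell, the lookup would be replaced by the weighted summation over parent states given in Eq. 4.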

The use of BNs in this kind of probabilistic map algebra is currently hindered only by the lack of specialized tools to support the analysis. However, the concept holds significant promise as an alternative to the more traditional GIS-based “indicator analysis” where each layer is reclassified to represent an arbitrary index and then summed to give a final metric (often on a 1 to 100 scale of either suitability or unsuitability). Indeed, the BN approach results in a more interpretable probability map. For example, such an analysis could be used to generate a map of the probability of landslide conditioned on slope, wetness, vegetation, etc. Certainly a map that indicates percent chance of landslide could be more informative for decision-makers than an indicator model that simply displays the sum of some number of reclassified indicators.

Image Classification

In the previous examples, BN CPTs are derived from historical data or information from experts. However, many BN applications make use of the concept of Bayesian learning as a means of automatically estimating probabilities from existing data. BN learning involves a formal automated process of “creating” and “pruning” the BN node-arc structure based on rules intended to maximize the amount of unique information represented by the BN CPTs. In a GIS context, BN learning algorithms have been extensively applied to image classification problems. Image classification using a BN requires the identification of a set of input layers (typically multispectral or hyperspectral bands) from which a known set of objects or classifications are to be identified.

Learning data sets include both input and output layers where output layers clearly indicate features of the required classes (e.g., polygons indicating known land cover types). A BN learning algorithm applied to such a data set will produce an optimal (in BN terms) model for predicting land cover or other classification schemes at a given raster cell based on the input layers. The application of the final BN model to predict land cover or other classifications at an unknown point is similar to the probabilistic map algebra described previously.
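
As a rough illustration of learning CPTs from labeled cells, the sketch below estimates probabilities by counting and then classifies a new cell. It deliberately swaps in a simpler technique than the one described above: the network structure is fixed in advance as a naive Bayes model (the class node is the only parent of every band), so only the CPT parameters are learned and no arcs are created or pruned. The band values, class labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_naive_bayes(samples, alpha=1.0):
    """Estimate class priors and per-band CPTs from (bands, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    n_bands = len(samples[0][0])
    band_counts = [defaultdict(Counter) for _ in range(n_bands)]
    band_values = [set() for _ in range(n_bands)]
    for bands, label in samples:
        for k, value in enumerate(bands):
            band_counts[k][label][value] += 1
            band_values[k].add(value)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}

    def likelihood(k, value, c):
        # Laplace smoothing (alpha) avoids zero probabilities for unseen values.
        numerator = band_counts[k][c][value] + alpha
        return numerator / (class_counts[c] + alpha * len(band_values[k]))

    return priors, likelihood


def classify(bands, priors, likelihood):
    """Posterior P(class | bands) for one raster cell."""
    scores = {}
    for c, score in priors.items():
        for k, value in enumerate(bands):
            score *= likelihood(k, value, c)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical training cells: two discretized bands and a known land cover.
training = [
    (("low", "low"), "water"),
    (("low", "low"), "water"),
    (("low", "high"), "water"),
    (("high", "low"), "vegetation"),
    (("high", "low"), "vegetation"),
    (("high", "high"), "vegetation"),
]
priors, likelihood = learn_naive_bayes(training)
posterior = classify(("high", "low"), priors, likelihood)
print(max(posterior, key=posterior.get), posterior)
```

Applying `classify` to every cell of a set of input layers yields one probability per class per cell, which mirrors the probabilistic map algebra described previously.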

Automated Data Query and Retrieval

In the case of application of BNs to automated query and retrieval of geospatial data sets, the goal is typically to use expert knowledge to define the CPTs that govern which data layers are loaded for visualization and analysis. Using this approach in a dynamic web-based mapping system, one could develop a BN for the display of layers using a CPT that indicates the probability that the layer is important, given the presence or absence of other layers or features within layers at the current view extents. Such a tool would supplant the typical approach which is to activate or deactivate layers based strictly on “zoom level.” For example, consider a military GIS mapping system used to identify proposed targets. A BN-based data retrieval system could significantly optimize data transfer and bandwidth usage by only showing specific high-resolution imagery when the probability of needing that data is raised due to the presence of other features which indicate a higher likelihood of the presence of the specific target.

BN-based data query and retrieval systems can also benefit from Bayesian learning capabilities by updating CPTs with new information or evidence observed during the use of the BN. For example, if a user continually views several data sets simultaneously at a particular zoom level or in a specific zone, this increases the probability that those data sets are interrelated and should result in modified CPTs representing those conditional relationships.
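
One simple way to picture this kind of updating is to store each CPT entry as a pair of pseudo-counts that starts from an expert estimate and shifts as usage is observed. The sketch below makes that assumption explicit; the class `LayerRelevance`, the `strength` weighting of the expert estimate, and the `LOAD_THRESHOLD` value are hypothetical and are not taken from Walker et al. (2004, 2005).

```python
LOAD_THRESHOLD = 0.5  # hypothetical cutoff for loading a layer


class LayerRelevance:
    """One CPT entry, P(layer is needed | context), kept as pseudo-counts."""

    def __init__(self, expert_probability, strength=10.0):
        # The expert estimate counts as `strength` imaginary observations.
        self.needed = expert_probability * strength
        self.not_needed = (1.0 - expert_probability) * strength

    def observe(self, was_needed):
        """Update the counts after one user session in this context."""
        if was_needed:
            self.needed += 1.0
        else:
            self.not_needed += 1.0

    @property
    def probability(self):
        return self.needed / (self.needed + self.not_needed)

    def should_load(self):
        return self.probability > LOAD_THRESHOLD


# Expert opinion: the imagery layer is rarely needed in this context.
imagery = LayerRelevance(expert_probability=0.2)
print(f"before: {imagery.probability:.2f}, load = {imagery.should_load()}")

# Users repeatedly open the imagery alongside the other layers.
for _ in range(10):
    imagery.observe(was_needed=True)
print(f"after:  {imagery.probability:.2f}, load = {imagery.should_load()}")
```

A larger `strength` makes the expert estimate harder to overturn, while a smaller one lets observed behavior dominate sooner.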

Spatial Representation of BN Nodes

Many BN problems and analyses, though not completely based on geospatial data, have a clear geospatial component and as such can be mapped on the landscape. This combined BN-GIS methodology is relatively new but has significant potential for improving the use and understanding of a BN. For example, consider the East Canyon Creek BN (Ames et al. 2005) represented in Fig. 2. This BN models streamflow at a wastewater treatment plant and in the stream headwaters (FL_TP and FL_HW), conditional on the current season (SEASON). The model also includes estimates of phosphorus concentrations at the treatment plant and in the headwaters (PH_TP and PH_HW), conditional on the season and on operations at the treatment plant (OP_TP) and in the headwaters (OP_HW). Each of these variables affects phosphorus concentrations in the stream (PH_ST) and ultimately reservoir visitation (VIS_RS). Costs of operations (CO_TP and CO_HW) as well as revenue at the reservoir (REV_RS) are represented as utility nodes in the BN.
Bayesian Network Integration with GIS, Fig. 2

The East Canyon Creek BDN from Ames et al. (2005), as seen in the GeNIe (Decision Systems Laboratory 2006) graphical node editor application

Most of the nodes in this BN (except for SEASON) have an explicit spatial location (i.e., they represent conditions at a specific place). Because of this intrinsic spatiality, the East Canyon BN can be represented in a GIS with points indicating nodes and arrows indicating the BN arcs (see Fig. 3). Such a representation of a BN within a GIS can give the end users a greater understanding of the context and meaning of the BN nodes. Additionally, in many cases, the BN nodes correspond to specific geospatial features (e.g., a particular weather station), in which case spatial representation of the BN nodes in a GIS can be particularly meaningful.
Bayesian Network Integration with GIS, Fig. 3

(a) East Canyon displayed with the East Canyon BN overlain on it. (b) The same view, but with the DEM layer turned off and the BN network lines displayed
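
A point-and-arrow representation of this kind can be sketched as a GeoJSON feature collection, which most GIS packages can load as a layer. In the example below, the coordinates are invented placeholders rather than real node locations, and the arc list is a simplified subset read from the description of the East Canyon Creek BN above, not the full published structure. The function name `bn_to_geojson` is also an assumption.

```python
import json

# Hypothetical (longitude, latitude) pairs; not the real node locations.
NODE_LOCATIONS = {
    "OP_TP": (-111.55, 40.75),
    "PH_TP": (-111.54, 40.76),
    "OP_HW": (-111.60, 40.70),
    "PH_HW": (-111.59, 40.71),
    "PH_ST": (-111.52, 40.80),
    "VIS_RS": (-111.50, 40.85),
}
# Simplified subset of arcs (parent, child) among the spatial nodes.
ARCS = [
    ("OP_TP", "PH_TP"),
    ("OP_HW", "PH_HW"),
    ("PH_TP", "PH_ST"),
    ("PH_HW", "PH_ST"),
    ("PH_ST", "VIS_RS"),
]


def bn_to_geojson(locations, arcs):
    """Build a FeatureCollection: one Point per node, one LineString per arc."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": list(xy)},
            "properties": {"node": name},
        }
        for name, xy in locations.items()
    ]
    for parent, child in arcs:
        features.append({
            "type": "Feature",
            # Line direction (parent first) encodes the direction of the arc.
            "geometry": {
                "type": "LineString",
                "coordinates": [list(locations[parent]), list(locations[child])],
            },
            "properties": {"from": parent, "to": child},
        })
    return {"type": "FeatureCollection", "features": features}


collection = bn_to_geojson(NODE_LOCATIONS, ARCS)
print(json.dumps(collection)[:120], "...")
```

Nodes without a spatial location, such as SEASON, are simply left out of the layer and would need to be shown in a legend or side panel.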

Future Directions

It is expected that research and development of tools for integrating GIS and BNs will continue in both academia and commercial entities. Advances in each of the application areas described are occurring regularly, making this an active and interesting study area for many GIS analysts and users.

References

  1. Ames DP, Neilson BT, Stevens DK, Lall U (2005) Using Bayesian networks to model watershed management decisions: an East Canyon Creek case study. J Hydroinform 7:267–282. IWA Publishing
  2. Borsuk ME, Reckhow KH (2000) Summary description of the Neuse estuary Bayesian ecological response network (Neu-BERN). http://www2.ncsu.edu/ncsu/CIL/WRRI/neuseltm.html. Accessed 26 Dec 2001
  3. Haas TC (1998) Modeling waterbody eutrophication with a Bayesian belief network. Working paper, School of Business Administration, University of Wisconsin, Milwaukee
  4. Heckerman D (1997) Bayesian networks for data mining. Data Mining Knowl Discov 1:79–119
  5. Kuikka S, Varis O (1997) Uncertainties of climate change impacts in Finnish watersheds: a Bayesian network analysis of expert knowledge. Boreal Environ Res 2:109–128
  6. Lee DC, Bradshaw GA (1998) Making monitoring work for managers: thoughts on a conceptual framework for improved monitoring within broad-scale ecosystem management. http://icebmp.gov/spatial/lee_monitor/preface.html. Accessed 26 Dec 2001
  7. MapWindow Open Source Team (2007) MapWindow GIS 4.3 open source software. http://www.mapwindow.org/. Accessed 06 Feb 2007
  8. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco
  9. Shachter R, Peot M (1992) Decision making using probabilistic inference methods. In: Proceedings of the eighth conference on uncertainty in artificial intelligence, Stanford, pp 275–283
  10. Stassopoulou A, Petrou M, Kittler J (1998) Application of a Bayesian network in a GIS based decision making system. Int J Geograph Inf Sci 12(1):23–45
  11. Taylor KJ (2003) Bayesian belief networks: a conceptual approach to assessing risk to habitat. Utah State University, Logan
  12. Varis O, Kuikka S (1996) An influence diagram approach to Baltic salmon management. In: Proceedings of the conference on decision analysis for public policy in Europe, INFORMS decision analysis society, Atlanta
  13. Walker A, Pham B, Maeder A (2004) A Bayesian framework for automated dataset retrieval in geographic information systems. In: 10th International Multimedia Modelling Conference (MMM), Brisbane, p 138
  14. Walker A, Pham B, Moody M (2005) Spatial Bayesian learning algorithms for geographic information retrieval. In: Proceedings 13th annual ACM international workshop on geographic information systems, Bremen, pp 105–114

Recommended Reading

  1. Ames DP (2002) Bayesian decision networks for watershed management. Utah State University, Logan
  2. Norsys Software Corp (2006) Netica Bayesian belief network software. Acquired from http://www.norsys.com/
  3. Stassopoulou A, Caelli T (2000) Building detection using Bayesian networks. Int J Pattern Recognit Artif Intell 14(6):715–733

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Daniel P. Ames (1)
  • Allen Anselmo (1)
  1. Department of Geosciences, Geospatial Software Lab, Idaho State University, Pocatello, USA