Combining categorical and continuous spatial information within the Bayesian maximum entropy paradigm

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Owing to the rapidly increasing availability and diversity of information sources in the environmental sciences, there is a real need for sound statistical mapping techniques that can use them jointly within a single theoretical framework. As these information sources may differ in their nature (continuous vs. categorical or qualitative), their spatial density and their intrinsic quality (soft vs. hard data), designing such techniques is challenging. In this paper, an efficient method for combining spatially non-exhaustive categorical and continuous data in a mapping context is proposed, based on the Bayesian maximum entropy paradigm. The approach first relies on the definition of a mixed random field that can account for a stochastic link between categorical and continuous random fields through a cross-covariance function. When general knowledge about the first- and second-order moments of these fields is incorporated, it is shown that, under mild hypotheses, their joint distribution can be expressed as a mixture of conditional Gaussian prior distributions, with parameter estimates obtained from entropy maximization. A posterior distribution that incorporates the various (soft or hard) continuous and categorical data at hand can then be obtained by a straightforward conditionalization step. The use and potential of the method are illustrated by way of a simulated case study. A comparison with a few common geostatistical methods in some limiting cases also highlights their similarities and differences, from both the theoretical and practical viewpoints. As expected, adding categorical information may significantly improve the spatial prediction of a continuous variable, which makes the approach powerful and very promising.

Acknowledgments

This work has been supported for the first author by a Belgian grant of the Fonds pour la formation à la Recherche dans l’Industrie et dans l’Agriculture (FRIA).

Author information

Corresponding author

Correspondence to P. Bogaert.

Appendices

Appendix 1

As each \(Y_{k}({\bf x})\) is assumed to be Bernoulli distributed, its discrete pdf is

$$p_{Y_{k} ({\mathbf{x}})} (y) = \left\{ \begin{array}{*{20}l} P(C({\mathbf{x}}) = c_{k}) & \hbox{if}\; y=1\\ P(C({\mathbf{x}})\neq c_{k})& \hbox{if}\; y=0\\ 0 & \hbox{otherwise}\\ \end{array} \right. $$
(58)

and the equivalent continuous pdf is thus given by

$$ f_{Y_{k} ({\mathbf{x}})} (y) = P(C({\mathbf{x}}) = c_{k}) \delta(y-1) + P(C({\mathbf{x}}) \neq c_{k}) \delta(y).$$
(59)
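As a quick consistency check on Eq. 59 (a sketch only; the shorthand \(p_{k}({\bf x}) = P(C({\bf x}) = c_{k})\) is introduced here for brevity and is not part of the original notation), integrating against the two Dirac masses recovers the usual Bernoulli moments:

$$\begin{aligned} E[Y_{k}({\mathbf{x}})] &= \int y \, f_{Y_{k}({\mathbf{x}})}(y) \, \hbox{d}y = 1 \cdot p_{k}({\mathbf{x}}) + 0 \cdot (1 - p_{k}({\mathbf{x}})) = p_{k}({\mathbf{x}}),\\ \hbox{Var}[Y_{k}({\mathbf{x}})] &= E[Y_{k}({\mathbf{x}})^{2}] - E[Y_{k}({\mathbf{x}})]^{2} = p_{k}({\mathbf{x}}) \left(1 - p_{k}({\mathbf{x}})\right). \end{aligned}$$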

In a similar way, as the joint discrete pdf for \(\left(Y_{k}({\bf x}_{i}), Y_{k \prime}({\bf x}_{j})\right)\) is given by

$$p_{Y_{k} ({\mathbf{x}}_{i}), Y_{k \prime} ({\mathbf{x}}_{j})} (y,y^{\prime}) = \left\{ \begin{array}{*{20}l} P(C({\mathbf{x}}_i)\neq c_{k}\cap C({\mathbf{x}}_j)\neq c_{k{\prime}}) & \hbox{if}\;y=0, \, y^{\prime}=0 \\ P(C({\mathbf{x}}_i)\neq c_{k}\cap C({\mathbf{x}}_j)=c_{k{\prime}}) & \hbox{if} \; y=0, \, y^{\prime}=1 \\ P(C({\mathbf{x}}_i)=c_{k} \cap C({\mathbf{x}}_j)\neq c_{k{\prime}}) & \hbox{if} \; y=1, \, y^{\prime}=0 \\ P(C({\mathbf{x}}_i)=c_{k} \cap C({\mathbf{x}}_j)=c_{k{\prime}}) & \hbox{if} \; y=1, \, y^{\prime}=1 \\ 0 & \hbox{otherwise},\\ \end{array}\right.$$
(60)

the equivalent continuous pdf is

$$\begin{aligned} f_{Y_{k} ({\mathbf{x}}_{i}), Y_{k \prime}({\mathbf{x}}_j)} (y,y^{\prime}) &= P(C({\mathbf{x}}_i)\neq c_{k}\cap C({\mathbf{x}}_j)\neq c_{k{\prime}})\delta(y)\delta(y^{\prime})\\ &\quad + P(C({\mathbf{x}}_i)\neq c_{k} \cap C({\mathbf{x}}_j)=c_{k{\prime}})\delta(y)\delta(y^{\prime} - 1)\\ &\quad + P(C({\mathbf{x}}_i)=c_{k} \cap C({\mathbf{x}}_j)\neq c_{k{\prime}})\delta(y-1)\delta(y^{\prime})\\ &\quad + P(C({\mathbf{x}}_i)=c_{k} \cap C({\mathbf{x}}_j)=c_{k{\prime}})\delta(y-1)\delta(y^{\prime}-1),\\ \end{aligned}$$
(61)

whereas the joint pdf of \((Y_{k} ({\bf x}_{i}),Z({\bf x}_{j}))\) is expressed as

$$f_{Y_{k} ({\mathbf{x}}_i), Z({\mathbf{x}}_j)} (y,z)= f_{Z({\mathbf{x}}_j) | y_{k}({\mathbf{x}}_i)} (z) f_{Y_{k} ({\mathbf{x}}_{i})}(y).$$
(62)
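The factorization in Eq. 62 can be illustrated numerically. The sketch below assumes, purely for illustration, that the conditional pdf of Z given y is Gaussian with a mean that depends on y; the probability `P_K`, the means in `COND_MEAN` and the variance `COND_VAR` are hypothetical values, not taken from the paper. Summing the mixed density over the two atoms y = 0 and y = 1 yields a two-component Gaussian mixture for Z.

```python
import math

# Hypothetical inputs: P(C(x_i) = c_k), and the conditional mean of Z(x_j)
# given y = 0 or y = 1, with a common conditional variance.
P_K = 0.3
COND_MEAN = {0: 1.0, 1: 4.0}
COND_VAR = 0.5


def gaussian_pdf(z, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bernoulli_pmf(y, p=P_K):
    """Eq. 58: probability mass carried by the atom at y."""
    if y == 1:
        return p
    if y == 0:
        return 1.0 - p
    return 0.0


def mixed_density(y, z):
    """Eq. 62: f(y, z) = f(z | y) * f(y), with f(y) read as the mass at y."""
    weight = bernoulli_pmf(y)
    if weight == 0.0:
        return 0.0
    return gaussian_pdf(z, COND_MEAN[y], COND_VAR) * weight


def marginal_z(z):
    """Marginal pdf of Z: the sum over both atoms, a Gaussian mixture."""
    return mixed_density(0, z) + mixed_density(1, z)


def integrate(f, lo, hi, n=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


total_mass = integrate(marginal_z, -10.0, 15.0)
mean_z = integrate(lambda z: z * marginal_z(z), -10.0, 15.0)
```

Under these assumed values, `total_mass` should be close to 1 and `mean_z` close to the weighted average of the two conditional means.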

Similar expressions can of course be obtained for the joint pdf of any combination of continuous and/or binary variables.
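For a pair of binary variables, the four probabilities in Eqs. 60 and 61 follow directly from the bivariate category probabilities. The sketch below uses a hypothetical 3 × 3 table `biv` of values \(P(C({\bf x}_{i}) = c_{a} \cap C({\bf x}_{j}) = c_{b})\) (illustrative numbers only) and the helper names are likewise illustrative. It also computes the indicator cross-covariance, using the fact that integrating \(y y^{\prime}\) against Eq. 61 leaves only the term with both indicators equal to 1.

```python
# Hypothetical bivariate probabilities P(C(x_i) = c_a, C(x_j) = c_b) for
# three categories; rows index a, columns index b, and entries sum to 1.
biv = [
    [0.30, 0.05, 0.05],
    [0.05, 0.20, 0.05],
    [0.05, 0.05, 0.20],
]


def indicator_joint_pmf(biv, k, kp):
    """Joint pmf of (Y_k(x_i), Y_k'(x_j)) as in Eq. 60, keyed by (y, y')."""
    total = sum(sum(row) for row in biv)
    p_k = sum(biv[k])                     # P(C(x_i) = c_k)
    p_kp = sum(row[kp] for row in biv)    # P(C(x_j) = c_k')
    p11 = biv[k][kp]                      # both indicators equal to 1
    return {
        (1, 1): p11,
        (1, 0): p_k - p11,
        (0, 1): p_kp - p11,
        (0, 0): total - p_k - p_kp + p11,
    }


def indicator_cross_covariance(pmf):
    """Cov(Y_k(x_i), Y_k'(x_j)) = E[Y Y'] - E[Y] E[Y']."""
    e_y = pmf[(1, 0)] + pmf[(1, 1)]
    e_yp = pmf[(0, 1)] + pmf[(1, 1)]
    return pmf[(1, 1)] - e_y * e_yp


pmf = indicator_joint_pmf(biv, k=0, kp=1)
cov = indicator_cross_covariance(pmf)
```

The four entries of `pmf` are the weights of the four Dirac products in Eq. 61, so they must sum to 1 whenever `biv` does.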

Appendix 2

The general expression for a multivariate Gaussian distribution \(N(\varvec{\mu}_{\ell},\Sigma)\) over n+1 variables is given by

$$f({\mathbf{z}} | {\mathbf{y}}_{\ell}) = \frac{\exp \left(-(1/2) ({\mathbf{z}} - \varvec{\mu}_{\ell})^{\prime} \Sigma^{-1}({\mathbf{z}} - \varvec{\mu}_{\ell}) \right) }{\sqrt{(2 \pi)^{n+1} \det(\Sigma)}}$$
(63)

Expanding the quadratic term inside this exponential, we can write that

$$f({\mathbf{z}} | {\mathbf{y}}_{\ell}) \propto \exp \left(-\frac{1}{2} {\mathbf{z}}^{\prime} \Sigma^{-1}{\mathbf{z}} + \varvec{\mu}_{\ell}^{\prime} \Sigma^{-1}{\mathbf{z}}\right),$$
(64)

where the other factors are constant with respect to z. By identification with Eq. 33, we thus have the results

$$\frac{1}{2} {\mathbf{z}}^{\prime} \Sigma^{-1} {\mathbf{z}} = {\mathbf{z}}^{\prime} {\mathbf{B}} {\mathbf{z}}, \quad \varvec{\mu}_{\ell}^{\prime} \Sigma^{-1} {\mathbf{z}} = - \varvec{\eta}_{\ell}^{\prime} {\mathbf{z}},$$
(65)

so that, as stated,

$$\Sigma = \frac{1}{2} {\mathbf{B}}^{-1}, \quad \varvec{\mu}_{\ell} = -\Sigma \varvec{\eta}_{\ell}.$$
(66)
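The identification in Eq. 66 can be checked numerically. Eq. 33 is not reproduced in this appendix, so the sketch below assumes its exponent has the form \(-{\bf z}^{\prime} {\bf B} {\bf z} - \varvec{\eta}_{\ell}^{\prime} {\bf z}\); the matrix `B` and vector `eta` are hypothetical two-variable values chosen for illustration. If Eq. 66 holds, the log of the Gaussian density and the log of the assumed kernel differ by the same constant at every point.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2 x 2 matrix with a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical symmetric positive-definite B and vector eta.
B = [[2.0, 0.5], [0.5, 1.0]]
eta = [1.0, -2.0]

# Eq. 66: Sigma = (1/2) B^{-1} and mu = -Sigma eta.
Sigma = [[0.5 * x for x in row] for row in inv2(B)]
mu = [-x for x in matvec(Sigma, eta)]
Sigma_inv = inv2(Sigma)
det_Sigma = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]


def log_kernel(z):
    """Assumed exponent of Eq. 33: -z'Bz - eta'z."""
    return -dot(z, matvec(B, z)) - dot(eta, z)


def log_gaussian(z):
    """Log density of N(mu, Sigma) for two variables."""
    d = [z[0] - mu[0], z[1] - mu[1]]
    quad = dot(d, matvec(Sigma_inv, d))
    return -0.5 * quad - 0.5 * math.log((2.0 * math.pi) ** 2 * det_Sigma)


points = [[0.0, 0.0], [1.0, -1.0], [-0.7, 2.3], [3.0, 0.4]]
offsets = [log_gaussian(z) - log_kernel(z) for z in points]
spread = max(offsets) - min(offsets)
```

A `spread` at the level of floating-point rounding indicates that the two expressions are proportional, which is what Eq. 66 asserts under the assumed form of Eq. 33.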

About this article

Cite this article

Wibrin, M., Bogaert, P. & Fasbender, D. Combining categorical and continuous spatial information within the Bayesian maximum entropy paradigm. Stoch Environ Res Ris Assess 20, 423–433 (2006). https://doi.org/10.1007/s00477-006-0035-8
