Abstract
Categorical data play an important role in a wide variety of spatial applications, yet modeling and predicting this type of variable has proved to be complex in many cases. Among other possible approaches, the Bayesian maximum entropy methodology has been developed and advocated for this goal and has been successfully applied to various spatial prediction problems. This approach aims at building a multivariate probability table from bivariate probability functions used as constraints that need to be fulfilled, in order to compute a posterior conditional distribution that accounts for hard or soft information sources. In this paper, our goal is to further generalize the theoretical results in order to account for a much wider range of information sources, such as probability inequalities. We first show how the maximum entropy principle can be implemented efficiently using a linear iterative approximation based on a minimum norm criterion, where the minimum norm solution is obtained at each step from simple matrix operations and the iterations converge to the requested maximum entropy solution. Based on this result, we then show how the maximum entropy problem can be related to the more general minimum divergence problem, which might involve equality and inequality constraints and which can be solved based on iterated minimum norm solutions. This allows us to account for a much larger panel of information types, in which more qualitative information, such as probability inequalities, can be used. When combined with a Bayesian data fusion approach, the method can also handle potentially conflicting information. Although the theoretical results presented in this paper can be applied to any study (spatial or non-spatial) involving categorical data, they are illustrated in a spatial context where the goal is to best predict the occurrence of cultivated land in Ethiopia based on crowdsourced information. The results emphasize the benefit of the methodology, which integrates conflicting information and provides a spatially exhaustive map of these occurrence classes over the whole country.
References
Abramov R (2007) A practical computational framework for the multidimensional moment-constrained maximum entropy principle. J Comput Phys 211:198–209
Abramov R (2010) The multidimensional maximum entropy moment problem: a review on numerical methods. Commun Math Sci 8(2):377–392
Agresti A (2013) Categorical data analysis, 3rd edn. Wiley, Hoboken
Ali AL, Schmid F, Al-Salman R, Kauppinen T (2014) Ambiguity and plausibility: managing classification quality in volunteered geographic information. In: Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems, pp 143–152
Allard D, D’Or D, Froidevaux R (2011) An efficient maximum entropy approach for categorical variable prediction. Eur J Soil Sci 62(3):381–393
Andersen EB (1980) Discrete statistical models with social science applications. North Holland, Amsterdam
Bandyopadhyay K, Bhattacharya A, Biswas P, Drabold D (2005) Maximum entropy and the problem of moments: a stable algorithm. Phys Rev E 71(5):057701
Bayat B, Nasseri M, Zahraie B (2015) Identification of long-term annual pattern of meteorological drought based on spatiotemporal methods: evaluation of different geostatistical approaches. Nat Hazards 76:515–541
Bierkens MFP, Burrough PA (1993) The indicator approach to categorical soil data, I. Theory. Eur J Soil Sci 44(2):361–368
Bishop YMM, Fienberg SE, Holland PW (2007) Discrete multivariate analysis: theory and practice. Springer, Berlin
BMELib: a MATLAB numerical toolbox of modern spatiotemporal geostatistics implementing the Bayesian maximum entropy theory. http://www.unc.edu/depts/case/BMElab/
Bogaert P (2002) Spatial prediction of categorical variables: the Bayesian maximum entropy approach. Stoch Environ Res Risk Assess 16(6):425–448
Bogaert P, Gengler S (2014) MinNorm approximation of MaxEnt/MinDiv problems for probability tables. In MaxEnt 2014—Bayesian inference and maximum entropy methods in science and engineering, Amboise, France, 21–26 September 2014, pp 287–296
Brus DJ, Bogaert P, Heuvelink GBM (2008) Bayesian maximum entropy prediction of soil categories using a traditional soil map as soft information. Eur J Soil Sci 59(2):166–177
Canosa N, Miller HG, Plastino A, Rossignoli R (1995) Maximum entropy-minimum norm method for the determination of level densities. Physica A 220:611–617
Cao C, Kyriakidis PC, Goodchild MF (2011) A multinomial logistic mixed model for the prediction of categorical spatial data. Int J Geogr Inf Sci 25(12):2017–2086
Cao G, Yoo EH, Wang S (2014) A statistical framework of data fusion for spatial prediction of categorical variables. Stoch Environ Res Risk Assess 28:1785–1799
Cardille JA, Clayton MK (2007) A regression tree-based method for integrating land-cover and land-use data collected at multiple scales. Environ Ecol Stat 14:161–179
Christakos G (2000) Modern spatiotemporal geostatistics. Oxford University Press, Oxford
Christakos G, Bogaert P, Serre M (2002) Temporal geographical information systems: advanced functions for field-based applications. Springer, Berlin
Christensen R (1997) Log-linear models and logistic regression, 2nd edn. Springer, Berlin
Comber A, See L, Fritz S, Van der Velde M, Perger C, Foody G (2013) Using control data to determine the reliability of volunteered geographic information about land cover. Int J Appl Earth Obs Geoinf 23:37–48
Comber A, Mooney P, Purves R, Rocchini D, Walz A (2015) Comparing national differences in what the people perceive to be there: mapping variations in crowd sourced land cover. Int Arch Photogramm Remote Sens Spat Inf Sci: ISPRS 1:71–75
Comber A, Fonte C, Foody G, Fritz S, Harris P, Olteanu-Raimond AM, See L (2016) Geographically weighted evidence combination approaches for combining discordant and inconsistent volunteered geographical information. Geoinformatica 20:503–527
Cressie N (2015) Statistics for spatial data, 2nd edn. Wiley-Interscience, Hoboken
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken
D’Or D, Bogaert P (2004) Spatial prediction of categorical variables with the Bayesian maximum entropy approach: the Ooypolder case study. Eur J Soil Sci 55(4):763–775
Fienberg SE (1970) An iterative procedure for estimation in contingency tables. Ann Math Stat 41(3):907–917
Fienberg SE, Rinaldo A (2012) Maximum likelihood estimation in log-linear models. Ann Stat 40(2):996–1023
Foody GM, See L, Fritz S, Van der Velde M, Perger C, Schill C, Boyd DS, Comber A (2015) Accurate attribute mapping from volunteered geographic information: issues of volunteer quantity and quality. Cartogr J 52:336–344
Fritz S, MacCallum I, Schill C, Perger C, Grillmayer R, Achard F, Kraxner F, Obersteiner M (2009) Geo-Wiki.Org: the use of crowdsourcing to improve global land cover. Remote Sens 1:345–354
Fritz S, See LM, Rembold F (2010) Comparison of global and regional land cover maps with statistical information for the agricultural domain in Africa. Int J Remote Sens 25:1527–1532
Fritz S, You L, Bun A, See L, McCallum I, Schill C, Perger C, Liu J, Hansen M, Obersteiner M (2011) Cropland for sub-Saharan Africa: a synergistic approach using five land cover data sets. Geophys Res Lett 38. doi:10.1029/2010GL046213
Gengler S, Bogaert P (2015) Bayesian data fusion applied to soil drainage classes spatial mapping. Math Geosci 48:79–88
Gengler S, Bogaert P (2016) Integrating crowdsourced data with a land cover product: a Bayesian data fusion approach. Remote Sens 8:545
Goodchild MF, Li L (2012) Assuring the quality of volunteered geographic information. Spat Stat 1:110–120
Goovaerts P (1997) Geostatistics for natural resources evaluation (applied geostatistics). Oxford University Press, Oxford
Huang X, Li J, Liang Y, Wang Z, Guo J, Jiao P (2017) Spatial hidden Markov chain models for estimation of petroleum reservoir categorical variable. J Pet Explor Prod Technol 7(1):11–22
Hunter J, Alabri A, Ingen CV (2013) Assessing the quality and trustworthiness of citizen science data. Concurr Comput Pract Exp 25:454–466
Hurtt GC, Rosentrater L, Frolking S, Moore B (2001) Linking remote-sensing estimates of land cover and census statistics on land use to produce maps of land use of the conterminous United States. Glob Biogeochem Cycles 15:673–685
Jafari A, Khademi H, Finke PA, Van de Wauw J, Ayoubi S (2014) Spatial prediction of soil great groups by boosted regression trees using a limited point dataset in an arid region, southeastern Iran. Geoderma 232–234:148–163
Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge
Jin C, Zhu J, Steen-Adams MM, Sain SR, Gangnon RE (2013) Spatial multinomial regression models for nominal categorical data: a study of land cover in Northern Wisconsin, USA. Environmetrics 24(2):98–108
Johnson BA, Iizuka K (2016) Integrating OpenStreetMap crowdsourced data and Landsat time-series imagery for rapid land use/land cover (LULC) mapping: case study of the Laguna Bay area of the Philippines. Appl Geogr 67:140–149
Kapur JN (2009) Maximum entropy models in science and engineering. New Age, New Delhi
Kou X, Jiang L, Bo Y, Yan S, Chai L (2016) Estimation of land surface temperature through blending MODIS and AMSR-E data with Bayesian maximum entropy. Remote Sens 8:105
Messier KP, Campbell T, Bradley PJ, Serre M (2015) Estimation of groundwater Radon in North Carolina using land use regression and Bayesian maximum entropy. Environ Sci Technol 49:9817–9825
Muller C, Chapman L, Johnston S, Kidd C, Illingworth S, Foody G, Overeem A, Leigh R (2015) Crowdsourcing for climate and atmospheric sciences: current status and future potential. Int J Climatol 35:3185–3203
Pérez-Hoyos A, García-Haro F, San-Miguel-Ayanz J (2012) A methodology to generate a synergetic land-cover map by fusion of different land-cover products. Int J Appl Earth Obs Geoinf 19:72–87
Poser K, Dransch D (2010) Volunteered geographic information for disaster management with application to rapid flood damage estimation. Geomatica 64:89–98
See L, McCallum I, Fritz S, Perger C, Kraxner F, Obersteiner M, Baruah UD, Mili N, Kalita NR (2013) Mapping cropland in Ethiopia using crowdsourcing. Int J Geosci 4:6–13
See L, Fritz S, You L, Ramankutty N, Herrero M, Justice C, Becker-Reshef I, Thornton P, Erb K, Gong P, Tang H, van der Velde M, Ericksen P, McCallum I, Kraxner F, Obersteiner M (2015) Improved global cropland data as an essential ingredient for food security. Glob Food Secur 4:37–45
See L, Mooney P, Foody G, Bastin L, Comber A, Estima J, Fritz S, Kerle N, Jiang B, Laakso M, Liu HY, Milčinski G, Nikšic M, Painho M, Pödör A, Olteanu-Raimond AM, Rutzinger M (2016) Crowdsourcing, citizen science or volunteered geographic information? The current state of crowdsourced geographic information. Int J Geo Inf 5:55
Thenkabail PS (ed) (2015) Remotely sensed data characterization, classification, and accuracies (remote sensing handbook). CRC Press, Boca Raton
Wahyudi A, Bartzke M, Kuster E, Bogaert P (2013) Maximum entropy estimation of a Benzene contaminated plume using ecotoxicological assays. Environ Pollut 172:170–179
Waller LA (2005) Spatial models for categorical data. In: John Wiley and sons (ed) Encyclopedia of biostatistics. Wiley, Hoboken
Werner H, Hanke M, Neubauer A (2000) Regularization of inverse problems. Kluwer, Berlin
Whittaker J, McLennan B, Handmer J (2015) A review of informal volunteerism in emergencies and disasters: definition, opportunities and challenges. Int J Disaster Risk Reduct 13:358–368
Wrigley N (2002) Categorical data analysis for geographers and environmental scientists. Blackburn Press, Caldwell
Wu X (2003) Calculation of maximum entropy densities with application to income distribution. J Econom 115(2):347–354
Xu Y, Serre M, Reyes J, Vizuete W (2016) Bayesian maximum entropy integration of ozone observation and model prediction: a national application. Environ Sci Technol 50:4393–4400
Zook M, Graham M, Shelton T, Gorman S (2010) Volunteered geographic information and crowdsourcing disaster relief: a case study of the Haitian Earthquake. World Med Health Policy 2:6–32
Acknowledgements
We are indebted to two anonymous reviewers for their detailed and numerous comments that greatly helped improve the manuscript.
Appendices
Appendix A1: Proof for the concavity of the entropy
Let us consider the entropy \(H({\mathbf {p}})\) defined as in Eq. (2). It is easy to prove that \(H({\mathbf {p}})\) is concave everywhere with respect to \({\mathbf {p}}\), and general proofs can be found in the literature (see e.g. Jaynes 2003; Kapur 2009). We reproduce the proof here for the sake of completeness and in order to discuss the results for the corresponding Hessian matrix in our specific context.
Let us rewrite \(H({\mathbf {p}})\) as a function of the vector of the first \(n-1\) probabilities \({\mathbf {p}}_0=(p_1,\ldots ,p_{n-1})'\), where the last one is recovered from the condition \(\mathbf {1'p}_0+p_n=1\), so that
$$ H({\mathbf {p}})=-{\mathbf {p}}'_0\ln {\mathbf {p}}_0-(1-\mathbf {1'p}_0)\ln (1-\mathbf {1'p}_0). $$
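Taking the derivatives with respect to \({\mathbf {p}}_0\) yields
$$ \frac{\partial H({\mathbf {p}})}{\partial {\mathbf {p}}_0}=-\ln {\mathbf {p}}_0+{\mathbf {1}}\ln (1-\mathbf {1'p}_0), $$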
with derivatives equal to \(\mathbf {0}\) when \({\mathbf {p}}_0={\mathbf {1}}(1-\mathbf {1'p}_0)\); i.e., when
$$ (\mathbf {I+11'}){\mathbf {p}}_0={\mathbf {1}}. $$
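Using the fact that \((\mathbf {I+11'})^{-1}={\mathbf {I}}-(1/n)\mathbf {11'}\), it follows that
$$ {\mathbf {p}}_0=\left( {\mathbf {I}}-\frac{1}{n}\mathbf {11'}\right) {\mathbf {1}}=\frac{1}{n}{\mathbf {1}}, $$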
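where \({\mathbf {1}}\) is the vector of ones of length \((n-1)\) so that \(\mathbf {1'1}=n-1\). Taking additionally derivatives with respect to \({\mathbf {p}}'_0\) yields
$$ \frac{\partial ^2 H({\mathbf {p}})}{\partial {\mathbf {p}}_0\,\partial {\mathbf {p}}'_0}=-diag({\mathbf {p}}_0)^{-1}-\frac{1}{1-\mathbf {1'p}_0}\mathbf {11'}, $$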
where \(1-\mathbf {1'p}_0\ge 0\) and where \(\mathbf {11'}\) and \(diag({\mathbf {p}}_0)\) are positive semidefinite and positive definite matrices, respectively, so that the Hessian matrix is negative definite and the entropy is concave with respect to the probabilities as defined over the unit simplex. In particular, at the absolute maximum where \({\mathbf {p}}_0=(1/n){\mathbf {1}}\), the Hessian matrix is equal to
$$ -n(\mathbf {I+11'}). $$
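As a numerical sanity check of this argument, the following minimal Python sketch compares a finite-difference Hessian of the entropy with the closed form \(-[diag({\mathbf {p}}_0)^{-1}+\mathbf {11'}/(1-\mathbf {1'p}_0)]\) and tests negative definiteness through a Cholesky factorization of its negative. It assumes that Eq. (2) is the Shannon entropy with natural logarithms; the function names and the test point are illustrative choices, not part of the original paper.

```python
import math

def entropy(p0):
    """Shannon entropy as a function of the first n-1 probabilities (assumed form of Eq. 2)."""
    pn = 1.0 - sum(p0)
    return -sum(p * math.log(p) for p in p0) - pn * math.log(pn)

def analytic_hessian(p0):
    """Closed form -(diag(p0)^-1 + 11'/(1 - 1'p0))."""
    pn = 1.0 - sum(p0)
    k = len(p0)
    return [[-(1.0 / p0[i] if i == j else 0.0) - 1.0 / pn for j in range(k)]
            for i in range(k)]

def numeric_hessian(f, x, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at x."""
    k = len(x)

    def shifted(i, di, j, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)

    return [[(shifted(i, h, j, h) - shifted(i, h, j, -h)
              - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
             for j in range(k)] for i in range(k)]

def is_negative_definite(H):
    """Cholesky factorization of -H; it succeeds only if -H is positive definite."""
    k = len(H)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = -H[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Illustrative point with n = 4 categories (p_4 = 0.4).
p0 = [0.1, 0.2, 0.3]
H_num = numeric_hessian(entropy, p0)
H_ana = analytic_hessian(p0)
max_gap = max(abs(a - b) for ra, rb in zip(H_num, H_ana) for a, b in zip(ra, rb))
concave_here = is_negative_definite(H_ana)

# At the uniform table the closed form should reduce to -n (I + 11').
n = 4
H_uniform = analytic_hessian([1.0 / n] * (n - 1))
```

Under these assumptions, `max_gap` should be small, `concave_here` should be true, and `H_uniform` should have \(-2n\) on the diagonal and \(-n\) elsewhere.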
Appendix A2: Proof for the convexity of the squared norm
It can be proven that the squared norm \(||{\mathbf {p}}||^2={\mathbf {p'p}}\) subject to the constraint \(\mathbf {1'p}=1\) is convex everywhere, and so \(||{\mathbf {p}}||^2\) is also convex over the space restricted by the additional set of linear equations \(\mathbf {Ap=b}\). Thus, whenever these constraints are compatible, a minimum norm solution exists and is unique. To prove this, let us rewrite \(||{\mathbf {p}}||^2\) as a function of the vector of the first \(n-1\) probabilities \({\mathbf {p}}_0=(p_1,\ldots ,p_{n-1})'\), where the last one is recovered from the condition \(\mathbf {1'p}_0+p_n=1\), so that
$$ ||{\mathbf {p}}||^2={\mathbf {p}}'_0{\mathbf {p}}_0+(1-\mathbf {1'p}_0)^2. $$
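Taking the derivatives with respect to \({\mathbf {p}}_0\) yields
$$ \frac{\partial ||{\mathbf {p}}||^2}{\partial {\mathbf {p}}_0}=2{\mathbf {p}}_0-2(1-\mathbf {1'p}_0){\mathbf {1}}=2(\mathbf {I+11'}){\mathbf {p}}_0-2\,{\mathbf {1}}, $$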
with derivatives equal to \(\mathbf {0}\) when
$$ {\mathbf {p}}_0=(\mathbf {I+11'})^{-1}{\mathbf {1}}=\frac{1}{n}{\mathbf {1}}. $$
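Taking additionally derivatives with respect to \({\mathbf {p}}'_0\) yields
$$ \frac{\partial ^2 ||{\mathbf {p}}||^2}{\partial {\mathbf {p}}_0\,\partial {\mathbf {p}}'_0}=2(\mathbf {I+11'}), $$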
showing that the Hessian matrix does not depend on \({\mathbf {p}}_0\) and is equal up to a multiplicative constant to the MaxEnt Hessian matrix when \({\mathbf {p}}_0=(1/n){\mathbf {1}}\), as seen from Eq. (11).
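The uniqueness argument can be illustrated with the closed-form minimum norm solution \({\mathbf {p}}={\mathbf {B}}'({\mathbf {BB}}')^{-1}{\mathbf {c}}\), where \({\mathbf {B}}\) stacks \({\mathbf {1'}}\) on top of the rows of \({\mathbf {A}}\) and \({\mathbf {c}}\) stacks 1 on top of \({\mathbf {b}}\). This is the standard result for an equality-constrained least-norm problem and is only a sketch, not the authors' implementation: the names `solve` and `min_norm` and the numerical example are hypothetical, \({\mathbf {B}}\) is assumed to have full row rank, and non-negativity of \({\mathbf {p}}\) is not enforced.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination (M assumed nonsingular)."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def min_norm(B, c):
    """Minimum-norm p satisfying B p = c, computed as p = B'(BB')^{-1} c."""
    BBt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in B] for r1 in B]
    lam = solve(BBt, c)                      # multipliers, one per constraint row
    n = len(B[0])
    return [sum(lam[i] * B[i][j] for i in range(len(B))) for j in range(n)]

# Only the sum-to-one constraint, n = 4: the minimizer is the uniform table.
p_sum_only = min_norm([[1, 1, 1, 1]], [1])

# Adding one hypothetical linear constraint p1 + p2 = 0.7.
p_two = min_norm([[1, 1, 1, 1], [1, 1, 0, 0]], [1, Fraction(7, 10)])
```

With the sum-to-one constraint alone, the sketch returns \((1/n){\mathbf {1}}\), in agreement with the stationarity condition derived in this appendix; exact fractions are used so that the constraints can be checked without rounding.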
Appendix A3: Polytopes and simplices
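Let us define a convex polytope \(\overline{P}({\mathbf {V}})\) in \({\mathbb {R}}^n\) as
$$ \overline{P}({\mathbf {V}})=\{ {\mathbf {x}}\in {\mathbb {R}}^n : {\mathbf {x}}={\mathbf {V}}\varvec{\lambda },\ \varvec{\lambda }\ge {\mathbf {0}},\ {\mathbf {1'}}\varvec{\lambda }=1\} $$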
(with \(m<\infty \)) where \({\mathbf {V}}\) is the matrix specifying the m vertices of the polytope, and the coordinates in \({\mathbb {R}}^n\) of the ith vertex are given by the ith column of \({\mathbf {V}}\). \({\mathbf {V}}\varvec{\lambda }\) is a convex linear combination of these vertices, and thus, \(\overline{P}({\mathbf {V}})\) is the convex hull (i.e., a convex faceted solid – or convex bounded polyhedron – defined over \({\mathbb {R}}^n\)) generated by these linear combinations over \({\mathbb {R}}^n\). Alternatively, let us define
$$ P({\mathbf {A}},{\mathbf {b}})=\{ {\mathbf {x}}\in {\mathbb {R}}^n : {\mathbf {A}}{\mathbf {x}}\le {\mathbf {b}}\} $$
as the (possibly unbounded) convex polyhedron generated by the intersection of the set of half-spaces \({\mathbf {A}}{\mathbf {x}}\le {\mathbf {b}}\). If this polyhedron is bounded, then \(P({\mathbf {A}},{\mathbf {b}})\) is called the half-space (or H-) representation of the corresponding polytope \(\overline{P}({\mathbf {V}})\). Identifying the vertices \({\mathbf {V}}\) generated by this intersection of half-spaces is part of the so-called enumeration problem, for which an efficient algorithm is available. From the above, it is clear that any polytope can be uniquely defined either from its half-space definition or from its vertices. From topological properties, the intersection of two polytopes \(\overline{P}({\mathbf {V}}_1)\) and \(\overline{P}({\mathbf {V}}_2)\) is a new polytope, where each polytope can be specified by its H-representation if needed.
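To make the link between the two representations concrete, the sketch below recovers the vertices of a small polytope from its half-space description by brute force: every subset of n constraints is solved as a set of equalities, and only the solutions that satisfy all inequalities are kept. This is a naive illustration with combinatorial cost, not the efficient enumeration algorithm mentioned above; the function names and the example (the unit simplex in \({\mathbb {R}}^3\) cut by the hypothetical probability inequality \(p_1\ge p_2\)) are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination; return None if singular."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def enumerate_vertices(A, b):
    """Vertices of the bounded polyhedron {x : A x <= b}, found by brute force."""
    n = len(A[0])
    vertices = set()
    for rows in combinations(range(len(A)), n):
        x = solve([A[i] for i in rows], [b[i] for i in rows])
        if x is None:
            continue                          # the chosen constraints are dependent
        feasible = all(sum(a * xi for a, xi in zip(A[i], x)) <= b[i]
                       for i in range(len(A)))
        if feasible:
            vertices.add(tuple(x))
    return sorted(vertices)

# Unit simplex in R^3 (sum-to-one written as two inequalities, plus p >= 0),
# intersected with the half-space p1 >= p2, i.e. -p1 + p2 <= 0.
A = [[1, 1, 1], [-1, -1, -1], [-1, 0, 0], [0, -1, 0], [0, 0, -1], [-1, 1, 0]]
b = [1, -1, 0, 0, 0, 0]
vertices = enumerate_vertices(A, b)
```

For this example the sketch returns the three points \((0,0,1)\), \((1/2,1/2,0)\) and \((1,0,0)\), i.e. the part of the unit simplex on which \(p_1\ge p_2\).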
A specific polytope of interest here is the simplex \(S({\mathbf {W}})\equiv \overline{P}({\mathbf {W}})\) when \(m=n\), so that it corresponds to a polytope that has n vertices in \({\mathbb {R}}^n\); i.e., these vertices lie on the same hyperplane in \({\mathbb {R}}^n\) so the polytope is, in fact, \((n-1)\) dimensional. In particular, the unit simplex is defined as \(S({\mathbf {I}})\), where \({\mathbf {I}}\) is the orthonormal basis in \({\mathbb {R}}^n\). In light of the general results given above, it follows that the intersection of the unit simplex \(S({\mathbf {I}})\) (i.e., a polytope) with another polytope \(P({\mathbf {A}},{\mathbf {b}})\) yields another simplex \(S({\mathbf {W}})\) that is a subset of \(S({\mathbf {I}})\). From the topological properties again, it follows that
1. any simplex \(S({\mathbf {W}})\) can be represented as a simplicial complex, i.e. as the union of a finite set of simplices lying on the same hyperplane in \({\mathbb {R}}^n\), with
$$ S({\mathbf {W}})=\bigcup _i S({\mathbf {W}}_i) \quad {\text{where }}\dim S({\mathbf {W}}_i)\cap S({\mathbf {W}}_j)<n \quad \forall i\ne j $$
the condition on the intersection meaning that two distinct simplices \(S({\mathbf {W}}_i)\) and \(S({\mathbf {W}}_j)\) can, at most, share a common face (noting that the empty set is a face of every simplex, so that for simplices built on a disjoint set of vertices, their intersection, which is the empty set, obeys the previous definition);
2. for any arbitrary simplex \(S({\mathbf {W}}_i)\), there is always an affine transformation that allows us to map the vertices of the unit simplex \(S({\mathbf {I}})\) of the same dimension n to the vertices of \(S({\mathbf {W}}_i)\). That is, obviously there exists an infinite set of possible linear transformations such that
$$ {\mathbf {W}}_i=\mathbf {CI}+{\mathbf {D}}. $$
In particular, using, e.g., arbitrarily the first column \(\mathbf {w}_{i1}\) of \({\mathbf {W}}_i\),
$$ \mathbf {C=W}_i-{\mathbf {D}}\quad \mathbf {D=w}_{i1}{\mathbf {1'}}, $$
where \({\mathbf {W}}_i-{\mathbf {D}}\) is a translation of the simplex so that the first vertex is now at the origin of the orthonormal basis.
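Considering the particular case of \(P({\mathbf {A}},{\mathbf {b}})\) and \(S({\mathbf {I}})\), their intersection \(\varOmega \) is thus a simplicial complex, with
$$ \varOmega =P({\mathbf {A}},{\mathbf {b}})\cap S({\mathbf {I}})=\bigcup _i S({\mathbf {W}}_i), $$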
where (i) all vertices of all simplices \(S({\mathbf {W}}_i)\) lie on the same hyperplane, which is the hyperplane where the vertices of \(S({\mathbf {I}})\) lie, and (ii) each \(S({\mathbf {W}}_i)\) is an affine transform of \(S({\mathbf {I}})\) itself. Because all vertices lie on the same hyperplane, their projection \({\mathbf {W}}_{i,p}\) on the same \((n-1)\) dimensional subspace, as obtained by dropping one row of \({\mathbf {W}}_i\) (the same row for all \({\mathbf {W}}_i\), of course), yields a set of n points in an \((n-1)\) dimensional space, with the corresponding enclosed volumes \(v_i\) given by
$$ v_i=\frac{1}{(n-1)!}\left| \det \begin{pmatrix} {\mathbf {W}}_{i,p}\\ {\mathbf {1'}} \end{pmatrix}\right| . $$
These volumes are in the same ratios as the corresponding surfaces of the simplices over the hyperplane. In other words,
$$ \frac{v_i}{\sum _j v_j} $$
is the proportion of the surface of the simplicial complex over the hyperplane which is covered by the ith simplex.
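The volume ratios can be sketched numerically as follows. The code uses the standard determinant formula for the volume enclosed by n points in an \((n-1)\) dimensional space, namely the absolute determinant of the projected vertex matrix augmented with a row of ones, divided by \((n-1)!\). The function names and the two example simplices (the unit simplex in \({\mathbb {R}}^3\) split at the hypothetical point \((3/4,1/4,0)\) of one of its edges) are assumptions made for illustration, not taken from the paper.

```python
from fractions import Fraction
from math import factorial

def det(M):
    """Determinant by Laplace expansion (adequate for the tiny matrices used here)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def projected_volume(W):
    """Volume enclosed by the n vertices (columns of W) after dropping the last row."""
    Wp = W[:-1]                               # projected coordinates, (n-1) x n
    n = len(W[0])
    augmented = Wp + [[Fraction(1)] * n]      # append a row of ones, giving n x n
    return abs(det(augmented)) / factorial(n - 1)

half = Fraction(1, 4)
# Columns are vertices: e1, e3 and the split point (3/4, 1/4, 0).
W1 = [[Fraction(1), Fraction(0), 3 * half],
      [Fraction(0), Fraction(0), half],
      [Fraction(0), Fraction(1), Fraction(0)]]
# Columns are vertices: e2, e3 and the same split point.
W2 = [[Fraction(0), Fraction(0), 3 * half],
      [Fraction(1), Fraction(0), half],
      [Fraction(0), Fraction(1), Fraction(0)]]

volumes = [projected_volume(W) for W in (W1, W2)]
proportions = [v / sum(volumes) for v in volumes]
```

In this example the two projected volumes add up to 1/2, the projected area of the whole unit simplex, and the resulting proportions are 1/4 and 3/4, which matches the position of the split point along the edge.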
Cite this article
Bogaert, P., Gengler, S. Bayesian maximum entropy and data fusion for processing qualitative data: theory and application for crowdsourced cropland occurrences in Ethiopia. Stoch Environ Res Risk Assess 32, 815–831 (2018). https://doi.org/10.1007/s00477-017-1426-8
DOI: https://doi.org/10.1007/s00477-017-1426-8