
Sampling Strategies for Uncertainty Reduction in Categorical Random Fields: Formulation, Mathematical Analysis and Application to Multiple-Point Simulations


Abstract

The task of optimal sampling for the statistical simulation of a discrete random field is addressed from the perspective of minimizing the posterior uncertainty of non-sensed positions given the information of the sensed positions. In particular, information theoretic measures are adopted to formalize the problem of optimal sampling design for field characterization, where concepts such as information of the measurements, average posterior uncertainty, and the resolvability of the field are introduced. The use of the entropy and related information measures is justified by connecting the task of simulation with a source coding problem, where it is well known that entropy offers a fundamental performance limit. On the application side, a one-dimensional Markov chain model is first explored where the statistics of the random object are known, and then the more relevant case of multiple-point simulations of channelized facies fields is studied, adopting in this case a training image to infer the statistics of a non-parametric model. In both contexts, the superiority of information-driven sampling strategies over random and regular sampling is demonstrated in different settings and conditions.



References

  • Abellan A, Noetinger B (2010) Optimizing subsurface field data acquisition using information theory. Math Geosci 42(6):603–630. https://doi.org/10.1007/s11004-010-9285-6

  • Afshari S, Pishvaie M, Aminshahidy B (2014) Well placement optimization using a particle swarm optimization algorithm, a novel approach. Pet Sci Technol 32(2):170–179

  • Arpat B, Caers J (2007) Conditional simulations with patterns. Math Geol 39(2):177–203

  • Aspie D, Barnes RJ (1990) Infill-sampling design and the cost of classification errors. Math Geol 22(8):915–932

  • Bangerth W, Klie H, Matossian V, Parashar M, Wheeler M (2005) An autonomic reservoir framework for the stochastic optimization of well placement. Clust Comput 8:255–269

  • Bangerth W, Klie H, Wheeler MF, Stoffa P, Sen M (2006) On optimization algorithms for the reservoir oil well placement problem. Comput Geosci 10:303–319

  • Baraniuk RG, Davenport M, DeVore R, Wakin M (2008) A simple proof of the restricted isometry property for random matrices. Constr Approx 28(3):253–263

  • Bittencourt AC, Horne RN (1997) Reservoir development and design optimization. In: SPE annual technical conference and exhibition, society of petroleum engineers, San Antonio, Texas, SPE, vol 38895, pp 1–14

  • Boyko N, Karamemis G, Kuzmenko V, Uryasev S (2014) Sparse signal reconstruction: LASSO and cardinality approaches. Springer, Cham, pp 77–90

  • Brus DJ, Heuvelink GBM (2007) Optimization of sample patterns for universal kriging of environmental variables. Geoderma 138:86–95

  • Bui H, La C, Do M (2015) A fast tree-based algorithm for compressed sensing with sparse-tree prior. Signal Process 108:628–641. https://doi.org/10.1016/j.sigpro.2014.10.026

  • Candes EJ (2008) The restricted isometry property and its implications for compressed sensing. C R Acad Sci Paris I 346:589–592

  • Candes EJ, Romberg J, Tao T (2006a) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52(2):489–509

  • Candes EJ, Romberg J, Tao T (2006b) Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math 59:1207–1223

  • Christakos G, Killam BR (1993) Sampling design for classifying contaminant level using annealing search algorithms. Water Resour Res 29(12):4063–4076

  • Christodoulou S, Gagatsis A, Xanthos S, Kranioti S, Agathokleous A, Fragiadakis M (2013) Entropy-based sensor placement optimization for water loss detection in water distribution networks. Water Resour Manag 27(13):4443–4468. https://EconPapers.repec.org/RePEc:spr:waterr:v:27:y:2013:i:13:p:4443-4468

  • Cohen A, Dahmen W, DeVore R (2009) Compressed sensing and best \(k\)-term approximation. J Am Math Soc 22(1):211–231

  • Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley Interscience, New York

  • Cressie N, Gotway C, Grondona M (1990) Spatial prediction from networks. Chemometr Intell Lab Syst 7:251–271

  • Donoho DL (2006) Compressed sensing. IEEE Trans Inf Theory 52:1289–1306

  • Eldar YC (2015) Sampling theory: beyond bandlimited systems, 1st edn. Cambridge University Press, New York

  • Elfeki A, Dekking M (2001) A Markov chain model for subsurface characterization: theory and applications. Math Geol 33(5):569–589. https://doi.org/10.1023/A:1011044812133

  • Foucart S, Lai M (2009) Sparsest solutions of underdetermined linear systems via \(\ell _p\)-minimization. Appl Comput Harmon Anal 26:395–407

  • Gao H, Wang J, Zhao P (1996) The updated kriging variance and optimal sample design. Math Geol 28(3):295–313

  • Goodchild M, Buttenfield B, Wood J (1994) Introduction to visualizing data validity. In: Hearnshaw HM, Unwin DJ (eds) Visualization in geographic information systems. Wiley, Chichester, pp 141–149

  • Goovaerts P (2001) Geostatistical modelling of uncertainty in soil science. Geoderma 103:3–26

  • Gray R, Davisson LD (2004) Introduction to statistical signal processing. Cambridge University Press, Cambridge

  • Guardiano F, Srivastava M (1993) Multivariate geostatistics: beyond bivariate moments. Geostatistics-Troia. Kluwer Academic, Amsterdam, pp 133–144

  • Guestrin C, Krause A, Singh A (2005) Near-optimal sensor placements in Gaussian processes. In: International conference on machine learning (ICML)

  • Gutjahr A (1991) Geostatistics for sampling designs and analysis. In: Nash R (ed) Groundwater residue sampling design. American Chemical Society, ACS symposium series, Washington, DC, pp 48–90

  • Huang T, Lu DT, Li X, Wang L (2013) GPU-based SNESIM implementation for multiple-point statistical simulation. Comput Geosci 54:75–87. https://doi.org/10.1016/j.cageo.2012.11.022

  • Kennedy BA (1990) Surface mining, 2nd edn. Society for Mining, Metallurgy, and Exploration, Englewood

  • Krause A, Guestrin C, Gupta A, Kleinberg J (2006) Near-optimal sensor placements: maximizing information while minimizing communication cost. In: Proc. of information processing in sensor networks (IPSN)

  • Krause A, Leskovec J, Guestrin C, VanBriesen J, Faloutsos C (2008a) Efficient sensor placement optimization for securing large water distribution networks. J Water Resour Plan Manag 134(6):516–526

  • Krause A, Singh A, Guestrin C (2008b) Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res 9:235–284

  • Krause A, Guestrin C, Gupta A, Kleinberg J (2011) Robust sensor placements at informative and communication-efficient locations. ACM Trans Sens Netw. https://doi.org/10.1145/1921621.1921625

  • MacKay DJC (2002) Information theory, inference & learning algorithms. Cambridge University Press, New York

  • Magnant Z (2011) Numerical methods for optimal experimental design of ill-posed problems. PhD thesis, Emory University, https://search.proquest.com/docview/881634811?accountid=14621

  • Marchant B, Lark R (2007) Optimized sample schemes for geostatistical surveys. Math Geol 39:113–134

  • Mariethoz G, Caers J (2015) Multiple-point geostatistics. Wiley Blackwell, Hoboken

  • McBratney A, Webster R, Burgess T (1981a) The design of optimal sampling schemes for local estimation and mapping of regionalized variables—I: theory and method. Comput Geosci 7(4):331–334

  • McBratney A, Webster R, Burgess T (1981b) The design of optimal sampling schemes for local estimation and mapping of regionalized variables—II: program and examples. Comput Geosci 7(4):335–365

  • Norrena KP, Deutsch CV (2002) Automatic determination of well placement subject to geostatistical and economic constraints. In: SPE international thermal operations and heavy oil symposium and international horizontal well technology conference, society of petroleum engineers, Calgary, AB, Canada, SPE , vol 78996, pp 1–12

  • Norris J (1997) Markov chains. Cambridge University Press, Cambridge

  • Olea RA (1984) Sampling design optimization for spatial functions. Math Geol 16(4):369–392

  • Ortiz JM, Deutsch CV (2004) Indicator simulation accounting for multiple-point statistics. Math Geol 36(5):545–565

  • Ostroumov V, Rachold V, Vasiliev A, Sorokovikov V (2005) An application of a Markov-chain model of shore erosion for describing the dynamics of sediment flux. Geo-Mar Lett 25(2):196–203. https://doi.org/10.1007/s00367-004-0201-2

  • Peschel GJ, Mokosch M (1991) Interrelations between geostatistics and information theory and their practical use. Math Geol 23(1):3–7. https://doi.org/10.1007/BF02065960

  • Remy N, Boucher A, Wu J (2009) Applied geostatistics with SGeMS: a user’s guide. Cambridge University Press, Cambridge

  • Rossi ME, Deutsch CV (2014) Mineral resource estimation. Springer, Berlin

  • Scheidt C, Caers J (2009) Representing spatial uncertainty using distances and kernels. Math Geosci 41(4):397–419. https://doi.org/10.1007/s11004-008-9186-0

  • Schweizer D, Blum P, Butscher C (2017) Uncertainty assessment in 3-d geological models of increasing complexity. Solid Earth 8(2):515–530. https://doi.org/10.5194/se-8-515-2017, https://www.solid-earth.net/8/515/2017/

  • Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656

  • Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34(1):1–22

  • Strebelle S, Zhang T (2004) Non-stationary multiple-point geostatistical models. In: Leuangthong O, Deutsch CV (eds) Geostatistics Banff. Springer, Berlin, pp 235–244

  • van Groenigen J, Siderius W, Stein A (1999) Constrained optimisation of soil sampling for minimisation of the kriging variance. Geoderma 87:239–259

  • Vašat R, Heuvelink G, Borůvka L (2010) Sampling design optimization for multivariate soil mapping. Geoderma 155(3–4):147–153

  • Vershynin R (2012) Introduction to the non-asymptotic analysis of random matrices (chap 5). In: Eldar Y, Kutyniok G (eds) Compressed sensing, theory and applications, 1st edn. Cambridge University Press, Cambridge, pp 210–268

  • Wellmann JF (2013) Information theory for correlation analysis and estimation of uncertainty reduction in maps and models. Entropy 15:1464–1485

  • Wellmann JF, Regenauer-Lieb K (2012) Uncertainties have a meaning: Information entropy as a quality measure for 3-d geological models. Tectonophysics 526(Supplement C):207–216. https://doi.org/10.1016/j.tecto.2011.05.001. http://www.sciencedirect.com/science/article/pii/S0040195111001788, modelling in Geosciences

  • Wellmann JF, Horowitz FG, Schill E, Regenauer-Lieb K (2010) Towards incorporating uncertainty of structural data in 3d geological inversion. Tectonophysics 490(3):141–151. https://doi.org/10.1016/j.tecto.2010.04.022. http://www.sciencedirect.com/science/article/pii/S0040195110001691

  • Wellmer FW (1998) Statistical evaluations in exploration for mineral deposits. Springer, Berlin

  • Wu J, Boucher A, Zhang T (2008) A SGeMS code for pattern simulation of continuous and categorical variables: FILTERSIM. Comput Geosci 34(12):1863–1876

  • Xu C, Hu C, Liu X, Wang S (2017) Information entropy in predicting location of observation points for long tunnel. Entropy 19(7). https://doi.org/10.3390/e19070332. http://www.mdpi.com/1099-4300/19/7/332

  • Yeung RW (2002) A first course in information theory. Springer, Berlin

  • Zhang C, Li W (2008) A comparative study of nonlinear Markov chain models for conditional simulation of multinomial classes from regular samples. Stoch Environ Res Risk Assess 22(2):217–230. https://doi.org/10.1007/s00477-007-0109-2

  • Zidek J, Sun W, Le D (2000) Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. Appl Stat 49:63–79

Acknowledgements

This material is based on work supported by grants from Conicyt-Chile (PhD Scholarship 2013), Fondecyt Grants 1170854, 1151029 and 1181823, the Biomedical Neuroscience Institute (ICM, P09-015-F), and the Advanced Center for Electrical and Electronic Engineering (AC3E), Basal Project FB0008.

Author information

Corresponding author

Correspondence to Felipe Santibañez.

Appendices

Appendix A: Entropy as an Indicator of Simulation Complexity

Here, a formal interpretation of the Shannon entropy as an indicator of complexity is elaborated when the task is to simulate or create a collection of independent and identically distributed (i.i.d.) realizations of a finite alphabet random variable. For that, a connection between simulation and the task of almost lossless source coding (Cover and Thomas 2006) will be presented, for which the entropy is a fundamental performance indicator (Shannon 1948; Cover and Thomas 2006).

A.1 The Operational Complexity Indicator for Simulation

Considering a finite alphabet random variable X taking values in \({\mathcal {A}}\) with probability \(\mu _X\in {\mathcal {P}}({\mathcal {A}})\), the simulation problem of length \(n\ge 1\) corresponds to the generation of n-samples in \({\mathcal {A}}^n\) from the product (or n-fold) distribution \(\mu _X^n \equiv \mu _X\times \mu _X\times \cdots \times \mu _X\in {\mathcal {P}}({\mathcal {A}}^n)\).

A covering argument is considered to stipulate the complexity of simulating i.i.d. realizations of \(\mu _X^n\) in the product space \({\mathcal {A}}^n\). More precisely, the cardinality of the smallest subset of sequences in \({\mathcal {A}}^n\) that captures almost all the probability with respect to \(\mu _X^n\), is proposed as a natural indicator of the complexity of simulating \(\mu _X^n\) in \({\mathcal {A}}^n\). This notion can be captured with the following set of definitions:

Definition 1

For \(\epsilon >0\), \(B\subset {\mathcal {A}}^n\) is \(\epsilon \)-typical for \(\mu ^n_X\), if \(\mu ^n_X(B) \ge 1 - \epsilon \).

Definition 2

For \(\epsilon >0\), the size k is said to be \(\epsilon \)-feasible for \(\mu ^n_X\), if there is \(B\subset {\mathcal {A}}^n\) that is \(\epsilon \)-typical for \(\mu ^n_X\) and \(\left| B\right| \le k\).

Definition 3

Finally, considering n i.i.d. samples, an indicator of the complexity of \(\mu _x\) is given by

$$\begin{aligned} k(\epsilon , \mu ^n_x) \equiv \min \left\{ k: k\,\hbox {is}\,\epsilon \hbox {-}\hbox {feasible for}\,\mu ^n_X \right\} . \end{aligned}$$
(51)

Adopting these concepts, in particular \(k(\epsilon , \mu ^n_x)\), a scenario where a higher number of sequences is needed to capture almost all the probability is more complex (from the point of view of creating i.i.d. simulations) than a case where fewer sequences are needed for the same covering objective. This idea matches the notion of typical set proposed by Shannon to prove source coding theorems (Shannon 1948). In this context, an interesting aspect to pay attention to is the exponential growth of \(k(\epsilon , \mu ^n_x)\) as n becomes arbitrarily large. It is simple to note that \(k(\epsilon , \mu ^n_x)\) grows with n and is upper bounded by \(\left| {\mathcal {A}}\right| ^n\); therefore, \(\log _2 k(\epsilon , \mu ^n_x) \le n \log _2 \left| {\mathcal {A}}\right| \) and, consequently, taking the limit on the number of simulations, the basic upper bound is

$$\begin{aligned} \lim \sup _{n \longrightarrow \infty } \frac{1}{n} \log _2 k(\epsilon , \mu ^n_x) \le \log _2 \left| {\mathcal {A}}\right| . \end{aligned}$$
(52)

This means that \(k(\epsilon , \mu ^n_x)\) grows (with n) at most exponentially, with an exponential rate given by \(\log _2 \left| {\mathcal {A}}\right| \). The next result, in (53), stipulates that the precise exponential rate of \(k(\epsilon , \mu ^n_x)\) (as n goes to infinity) approaches \({\mathcal {H}}(\mu _X)\) when \(\epsilon \) is made arbitrarily small, in the sense that

$$\begin{aligned} \lim _{\epsilon \longrightarrow 0}\lim _{n \longrightarrow \infty } \frac{1}{n} \log _2 k(\epsilon , \mu ^n_x) = {\mathcal {H}}(\mu _X). \end{aligned}$$
(53)
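
As a purely illustrative complement (not part of the original analysis), the following Python sketch computes \(k(\epsilon , \mu ^n_X)\) by brute force for a small alphabet: the \(\left| {\mathcal {A}}\right| ^n\) sequences are sorted by decreasing probability and the smallest prefix whose mass reaches \(1-\epsilon \) is retained. The printed rate \(\frac{1}{n} \log _2 k(\epsilon , \mu ^n_X)\) can be compared with \({\mathcal {H}}(\mu _X)\), toward which it converges (slowly) as n grows; the distribution and the tolerance \(\epsilon \) are arbitrary choices made here for illustration.

```python
# Brute-force illustration of k(eps, mu^n): the smallest number of length-n
# sequences whose total probability under the i.i.d. product measure is at
# least 1 - eps, compared with the Shannon entropy H(mu) in bits.
import itertools
import numpy as np

mu = np.array([0.7, 0.2, 0.1])           # illustrative pmf on A = {0, 1, 2}
H = -np.sum(mu * np.log2(mu))            # Shannon entropy of mu (bits)
eps = 0.05

for n in range(2, 11, 2):
    # probability of every sequence in A^n under the product measure mu^n
    probs = np.array([np.prod(mu[list(seq)])
                      for seq in itertools.product(range(len(mu)), repeat=n)])
    # taking sequences in decreasing order of probability attains k(eps, mu^n)
    cumulative = np.cumsum(np.sort(probs)[::-1])
    k = int(np.searchsorted(cumulative, 1.0 - eps)) + 1
    print(f"n={n:2d}   (1/n) log2 k = {np.log2(k) / n:.3f}   H(mu) = {H:.3f}")
```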

A.2 Derivation of the Representation Result in (53)

To show the identity in (53), it is useful to introduce a stronger notion of achievable rate for the entire i.i.d. process \(\left\{ \mu _X^n: n\ge 1\right\} \) that is commonly used in the context of almost lossless fixed-rate source coding (Cover and Thomas 2006) and (Yeung 2002).

Definition 4

For the i.i.d. simulation of \(\mu _X\), the rate \(r>0\) is said to be achievable for \(\left\{ \mu _X^n: n\ge 1\right\} \), if there is a sequence of sets \(\left\{ B_n\subset {\mathcal {A}}^n: n\ge 1 \right\} \) that captures all the probability in the stronger sense that

$$\begin{aligned} \lim _{n \rightarrow \infty } \mu _X^n(B_n)=1, \end{aligned}$$

and

$$\begin{aligned} \lim \sup _{n \rightarrow \infty } \frac{1}{n}\log _2 \left| B_n \right| \le r. \end{aligned}$$

Definition 5

In this context, the minimum achievable rate for \(\left\{ \mu _X^n: n\ge 1\right\} \) is

$$\begin{aligned} R^*(\mu _X) \equiv \min \left\{ r: \hbox {} r>0\,\hbox {is achievable for} \left\{ \mu _X^n: n\ge 1\right\} \right\} . \end{aligned}$$

From the definitions, it follows directly that \( \lim \sup _{n \longrightarrow \infty } \frac{1}{n} \log _2 k(\epsilon , \mu ^n_x) \le R^*(\mu _X)\) for all \(\epsilon >0\), and consequently,

$$\begin{aligned} \lim \sup _{\epsilon \longrightarrow 0}\lim \sup _{n \longrightarrow \infty } \frac{1}{n} \log _2 k(\epsilon , \mu ^n_x) \le R^*(\mu _X). \end{aligned}$$
(54)

Remarkably, for this i.i.d. context, it is well known that the limit on the left-hand side of (54) is well defined and matches \(R^*(\mu _X)\). Furthermore, this expression reduces to the entropy of \(\mu _X\). This result was proved in the original paper of Shannon (Shannon 1948) and is highlighted in the following theorem:

Theorem 1

(Shannon 1948) \(\lim _{\epsilon \longrightarrow 0}\lim _{n \longrightarrow \infty } \frac{1}{n} \log _2 k(\epsilon , \mu ^n_x) =R^*(\mu _X)= {\mathcal {H}}(\mu _X).\)

Therefore, the main conclusion of this analysis is that the Shannon entropy determines the complexity of simulating a finite alphabet probability in the precise operational sense defined in Definitions 3 and 5.

The proof of Theorem 1 is a direct consequence of the celebrated (weak) asymptotic equipartition property (AEP), first stated by Shannon and proved for the i.i.d. case in Shannon (1948). A systematic and clear exposition of this property and the proof of Theorem 1 can be found in Cover and Thomas (2006) and Yeung (2002), in the context of what is known in information theory as the Shannon source coding theorem. To conclude this part, it is important to elaborate on some observations and implications of Theorem 1:

  1.

    This result shows that there is a collection of sequences \(\left\{ B_n\subset {\mathcal {A}}^n: n\ge 1\right\} \) that captures asymptotically all the probability, that is, the collection is typical in the sense that \(\lim _{n \rightarrow \infty } \mu _X^n(B_n)=1\), and its cardinality grows exponentially at a rate that is precisely the entropy of \(\mu _X\), that is, as n goes to infinity \(\left| B_n \right| \approx 2^{n\cdot {\mathcal {H}}(\mu _x)}\).

  2.

    If \({\mathcal {H}}(\mu _X) < \log _2 \left| {\mathcal {A}}\right| \), then the collection of typical sequences, mentioned in the previous point, is an arbitrarily small fraction of all the possible sequences, in the sense that

    $$\begin{aligned} \frac{\left| B_n \right| }{\left| {\mathcal {A}}^n \right| } \approx \frac{2^{n\cdot {\mathcal {H}}(\mu _x)}}{2^{n\cdot \log _2 \left| {\mathcal {A}}\right| }}= 2^{-n\cdot (\log _2 \left| {\mathcal {A}}\right| -{\mathcal {H}}(\mu _x) )} \longrightarrow 0. \end{aligned}$$
    (55)

    Note that the cardinality ratio in (55) goes to zero exponentially with n, at a rate given by \(\log _2 \left| {\mathcal {A}}\right| -{\mathcal {H}}(\mu _x)>0\). Then, for an i.i.d. process \(\left\{ \mu _X^n: n\ge 1\right\} \) with entropy strictly smaller than \(\log _2 \left| {\mathcal {A}}\right| \), a tiny fraction of sequences characterizes the i.i.d. process induced by \(\mu _X\). Here, \(\log _2 \left| {\mathcal {A}}\right| \) is the maximum entropy, achieved only by the uniform distribution on \({\mathcal {A}}\) (Cover and Thomas 2006).

  3.

    For the achievability part of this result, Shannon proposes a specific collection of typical sequences given by

    $$\begin{aligned} B_n(\epsilon )=\left\{ (x_1,..,x_n)\in {\mathcal {A}}^n: \left| -\frac{1}{n} \log _2 \mu ^n_X(x_1,..,x_n) - {\mathcal {H}}(\mu _X) \right| \le \epsilon \right\} , \end{aligned}$$
    (56)

    which, as n goes to infinity, has the following properties (Cover and Thomas 2006):

    (a)

      it is a typical set: \(\lim _{n \rightarrow \infty } \mu _X^n(B_n(\epsilon ))=1\).

    (b)

      if \((x_1,..,x_n) \in B_n(\epsilon )\) then

      $$\begin{aligned} 2^{-n\cdot ({\mathcal {H}}(\mu _X)+\epsilon )} \le \mu ^n_X(x_1,..,x_n) \le 2^{-n\cdot ({\mathcal {H}}(\mu _X)-\epsilon )}. \end{aligned}$$
    (c)

      \((1-\epsilon ) 2^{n\cdot ({\mathcal {H}}(\mu _X) -\epsilon )} \le \left| B_n(\epsilon ) \right| \le 2^{n\cdot ({\mathcal {H}}(\mu _X) +\epsilon )}\).

    Here, achievability refers to the construction of a sequence of sets that is typical and achieves a rate smaller than or equal to the entropy (Cover and Thomas 2006). Thus, the relevant aspect of this construction is that, as n goes to infinity, the elements of this typical set become essentially uniformly distributed. For \(\epsilon \) sufficiently small and for all \((x_1,..,x_n)\in B_n(\epsilon )\),

    $$\begin{aligned} \mu ^n_X(x_1,..,x_n)\approx 2^{-n\cdot {\mathcal {H}}(\mu _X)} \approx \frac{1}{ \left| B_n(\epsilon ) \right| }. \end{aligned}$$

    Then, within the typical set \(B_n(\epsilon )\), all elements have approximately the same probability. This means that, when drawing i.i.d. samples from the model \(\mu ^n_X\) with n sufficiently large, a sample falling in this typical set (which happens with very high probability) has essentially the same probability as any other element of the set. Consequently, the size of this typical set plays a major role in specifying the complexity of the simulation task, which is formalized in the concept of minimum achievable rate in Definition 5 and is precisely given by \({\mathcal {H}}(\mu _X)\). A small numerical illustration of this construction is given after this list.
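
To complement the discussion above, the following sketch (illustrative only, not part of the original paper) enumerates Shannon's typical set \(B_n(\epsilon )\) of (56) for a Bernoulli source with parameter p. Since the probability of a binary sequence depends only on its number of ones, the set can be enumerated by counts, which makes the asymptotics easy to inspect: the captured mass \(\mu ^n_X(B_n(\epsilon ))\) approaches 1 and the rate \(\frac{1}{n}\log _2\left| B_n(\epsilon ) \right| \) stays within \([{\mathcal {H}}(\mu _X)-\epsilon , {\mathcal {H}}(\mu _X)+\epsilon ]\), in line with properties (a) and (c). The values of p, \(\epsilon \), and n are arbitrary illustrative choices.

```python
# Enumeration of Shannon's typical set B_n(eps) of Eq. (56) for a Bernoulli(p)
# source, grouping sequences by their number of ones (their probability only
# depends on that count).
import math

p, eps = 0.2, 0.1
H = -(1 - p) * math.log2(1 - p) - p * math.log2(p)    # source entropy (bits)

for n in (20, 100, 500):
    size, mass = 0, 0.0
    for k in range(n + 1):
        # log2-probability of any individual sequence containing exactly k ones
        logprob = (n - k) * math.log2(1 - p) + k * math.log2(p)
        if abs(-logprob / n - H) <= eps:              # membership test of Eq. (56)
            size += math.comb(n, k)                   # number of such sequences
            mass += math.comb(n, k) * 2.0 ** logprob  # their total probability
    print(f"n={n:3d}  mass={mass:.3f}  (1/n) log2|B_n|={math.log2(size) / n:.3f}  "
          f"[H-eps, H+eps]=[{H - eps:.3f}, {H + eps:.3f}]")
```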

Appendix B: Proof of Proposition 1

Proof

After obtaining the sequential rule \({\tilde{f}}^*_k \in {\mathbf {F}}_k\) that solves the adaptive sensing problem of size k, the remaining posterior uncertainty is given by

$$\begin{aligned} H({\hat{X}}_{{\tilde{f}}^*_k}| {X}_{{\tilde{f}}^*_k}), \end{aligned}$$
(57)

and the reduction of uncertainty provided by the information of \({\tilde{f}}^*_k\) in resolving \({\bar{X}}\) is given by

$$\begin{aligned} I({\tilde{f}}^*_k) = H({\overline{X}}) - H({\hat{X}}_{{\tilde{f}}^*_k}| {X}_{{\tilde{f}}^*_k}) = H({X}_{{\tilde{f}}^*_k}) \ge 0. \end{aligned}$$
(58)

As the sets \({\hat{X}}_{{\tilde{f}}^*_k} \) and \( {X}_{{\tilde{f}}^*_k} \) form a partition of \({\overline{X}}\),

$$\begin{aligned} H({\overline{X}}) = H({\hat{X}}_{{\tilde{f}}^*_k} , {X}_{{\tilde{f}}^*_k}). \end{aligned}$$
(59)

In addition, by the chain rule of entropy

$$\begin{aligned} H({\overline{X}}) = H( {X}_{{\tilde{f}}^*_k}) + H({\hat{X}}_{{\tilde{f}}^*_k} | {X}_{{\tilde{f}}^*_k}). \end{aligned}$$
(60)

Applying (59) and (60), the information gain of the rule \({\tilde{f}}^*_k\) can be rewritten as

$$\begin{aligned} I({\tilde{f}}^*_k)&= H( {X}_{{\tilde{f}}^*_k}) + H({\hat{X}}_{{\tilde{f}}^*_k} | {X}_{{\tilde{f}}^*_k}) - H({\hat{X}}_{{\tilde{f}}^*_k}| {X}_{{\tilde{f}}^*_k}) \nonumber \\&= H( {X}_{{\tilde{f}}^*_k}) \nonumber \\&= H( {X}_{{\tilde{f}}^*_k(1)}, {X}_{{\tilde{f}}^*_k(2)}, .. , {X}_{{\tilde{f}}^*_k(k)}) \nonumber \\&= H( {X}_{(i_{1}^{*},j_{1}^{*})} , {X}_{(i_{2}^{*},j_{2}^{*})} , .. , {X}_{(i_{k}^{*},j_{k}^{*})}). \end{aligned}$$
(61)

The expression (61) can be rearranged by using the chain rule as

$$\begin{aligned} I({\tilde{f}}^*_k)&= H( {X}_{(i_{1}^{*},j_{1}^{*})} ) + \sum _{h = 2}^{k} H( {X}_{(i_{h}^{*},j_{h}^{*})} | {X}_{(i_{1}^{*},j_{1}^{*})} , .. , {X}_{(i_{h -1}^{*},j_{h -1}^{*})} ). \end{aligned}$$
(62)

Finally, using (62), the information gain difference between the sampling steps k and \(k-1\) is given by

$$\begin{aligned} I({\tilde{f}}^*_k) - I({\tilde{f}}^*_{k-1})&= H( {X}_{(i_{1}^{*},j_{1}^{*})} ) + \sum _{h = 2}^{k} H( {X}_{(i_{h}^{*},j_{h}^{*})} | {X}_{(i_{1}^{*},j_{1}^{*})} , .. , {X}_{(i_{h -1}^{*},j_{h -1}^{*})} ) \nonumber \\&\qquad - H( {X}_{(i_{1}^{*},j_{1}^{*})} ) - \sum _{h = 2}^{k-1} H( {X}_{(i_{h}^{*},j_{h}^{*})} | {X}_{(i_{1}^{*},j_{1}^{*})} , .. , {X}_{(i_{h -1}^{*},j_{h -1}^{*})} ) \nonumber \\&= H( {X}_{(i_{k}^{*},j_{k}^{*})} | {X}_{(i_{1}^{*},j_{1}^{*})}, .., {X}_{(i_{k-1}^{*},j_{k-1}^{*})} ). \end{aligned}$$
(63)
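
As an illustrative aside (not part of the original proof), the chain-rule identity behind (63) can be checked numerically on a toy joint distribution: for three categorical variables, the increment \(H(X_1,X_2,X_3)-H(X_1,X_2)\), which plays the role of \(I({\tilde{f}}^*_k) - I({\tilde{f}}^*_{k-1})\), equals the conditional entropy \(H(X_3|X_1,X_2)\) computed directly from the conditional distributions.

```python
# Numerical check of the chain rule used in Eq. (63) on a random joint pmf
# of three binary variables: H(X1,X2,X3) - H(X1,X2) == H(X3 | X1,X2).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
pmf = rng.random((2, 2, 2))
pmf /= pmf.sum()                                        # joint pmf of (X1, X2, X3)

increment = entropy(pmf) - entropy(pmf.sum(axis=2))     # H(X1,X2,X3) - H(X1,X2)

# conditional entropy H(X3 | X1, X2) computed directly from the conditionals
p12 = pmf.sum(axis=2)
H_cond = sum(p12[i, j] * entropy(pmf[i, j] / p12[i, j])
             for i in range(2) for j in range(2))

print(f"increment = {increment:.6f}, H(X3|X1,X2) = {H_cond:.6f}")
```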

Appendix C: Optimality of the Sequential Rule for Fields with no Inter-Pixel Dependency

Proof

For a field \({\bar{X}}\) with no inter-pixel dependency, every random variable \(X_{i,j}\) is statistically independent of the other variables, in the sense that \(H(X_{i_{1},j_{1}} | X_{i_{2},j_{2}}) = H(X_{i_{1},j_{1}}) \) for all \((i_{2},j_{2})\in [M]\times [M] \setminus \left\{ (i_{1},j_{1})\right\} \). Then, by the independence bound on entropy (Cover and Thomas 2006), the entropy of the positions sensed by any rule \(f\in {\mathbf {F}}_k\) is given by

$$\begin{aligned} H({X}_f)&= \sum _{(i,j) \in {f} } H(X_{i,j}). \end{aligned}$$
(64)

Since \(H({\hat{X}}_{f}| {X}_{f}) = H({\bar{X}}) - H({X}_{f})\) for any rule, minimizing the posterior uncertainty of the non-sensed positions is equivalent to maximizing the entropy of the sensed positions. Thus, for a field with no inter-pixel dependency (nipd), the optimal sampling rule of size k posed in (18) can be stated as

$$\begin{aligned} f^{*,nipd}_k = \arg \max _{f\in {\mathbf {F}}_k} \sum _{(i,j) \in {f} } H(X_{i,j}), \end{aligned}$$
(65)

which, by the non-negativity of the entropy, is the problem of choosing the k positions with the highest a priori entropy.

In the case of the iterative sequential rule, the k-th measurement is now given by

$$\begin{aligned} (i^{*,nipd}_k,j^{*,nipd}_k)&= \arg \max _{(i,j)\in [M]\times [M] \setminus \left\{ (i^{*,nipd}_l,j^{*,nipd}_l): l=1,..,k-1\right\} } H(X_{i,j}), \end{aligned}$$
(66)

which corresponds to choosing the location with the k-th highest a priori entropy. Therefore, the sequential rule \({\tilde{f}}^{*,nipd}_k \in {\mathbf {F}}_k\) can be obtained as

$$\begin{aligned} {\tilde{f}}^{*,nipd}_k(1)&=(i^{*,nipd}_1,j^{*,nipd}_1), \nonumber \\ {\tilde{f}}^{*,nipd}_k(2)&=(i^{*,nipd}_2,j^{*,nipd}_2), \nonumber \\ ..., \nonumber \\ {\tilde{f}}^{*,nipd}_k(k)&=(i^{*,nipd}_k,j^{*,nipd}_k). \end{aligned}$$
(67)

Finally, by construction, the optimal combinatorial rule is equal to the sequential approach under the assumption of statistical independence,

$$\begin{aligned} {\tilde{f}}^{*,nipd}_k = f^{*,nipd}_{k}, \end{aligned}$$
(68)

meaning that in this scenario the optimal sampling can be achieved by the sequential approach. This simple case summarizes the nature of the optimal sampling approach, which tries to resolve the locations with the highest uncertainty in order to increase the average knowledge of the global field. As the inter-pixel dependency increases and the multiple-point statistics of the field become more complex, taking into account the high-order conditional entropies becomes essential to avoid locations with redundant information.
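
For concreteness, the following minimal sketch (not the authors' implementation) realizes the rule in (65)–(67) for the independent case: given per-pixel categorical marginals, assumed here to be stored in an array marginals of shape (M, M, number of classes), it computes the a priori entropy of every location and selects the k most uncertain ones.

```python
# Optimal/sequential sampling rule for a field with no inter-pixel dependency:
# pick the k locations with the highest a priori (marginal) entropy.
import numpy as np

def entropy_map(marginals):
    """Per-pixel Shannon entropy (bits) of categorical marginals, shape (M, M, C)."""
    p = np.clip(marginals, 1e-12, 1.0)
    return -np.sum(p * np.log2(p), axis=-1)

def optimal_rule_nipd(marginals, k):
    """Return the k locations (i, j) with the highest a priori entropy."""
    H = entropy_map(marginals)
    flat = np.argsort(H, axis=None)[::-1][:k]        # indices of the k largest entropies
    return list(zip(*np.unravel_index(flat, H.shape)))

# toy usage: a 10 x 10 field with three categories and random marginals
rng = np.random.default_rng(1)
m = rng.random((10, 10, 3))
m /= m.sum(axis=-1, keepdims=True)
print(optimal_rule_nipd(m, k=5))
```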

Appendix D: Analysis of the Adaptive Sampling Scheme for the Markov Chain Case

Here, the AMIS \(\{ {\tilde{f}}^a_k(\cdot | \cdot ): k \in [N] \} \) in (36) is compared with the SMIS approach \(\{ {\tilde{f}}^*_k(\cdot ):k \in [N] \} \) in (28). This comparison is performed in the context of the finite-length one-dimensional Markov source presented in Sect. 6, in terms of field resolvability (resolution of uncertainty) and the estimation of non-sensed positions. To evaluate the quality of resolving the non-sensed positions from the information of the sensed positions, the conditional entropy given the data is used: for a sampling rule \(f_k\) and its sensed data \(x_f=(x_1,..,x_{k-1})\in {\mathcal {A}}^{k-1}\),

$$\begin{aligned} H( {\hat{X}}_{f_k} | X_{f} = x_{f}). \end{aligned}$$
(69)
Fig. 19 Remaining conditional entropy considering the previously sampled locations and their measurements. Symmetric transition matrix (\(\beta = 0.2\))

By extending the result in Proposition 4, there is a simple algorithm to compute (69) under the Markov assumption, not reported here due to space constraints. In addition, the problem of estimating non-sensed positions from the sensed data is considered, for which the minimum mean square error (MMSE) estimator, given by the conditional mean, is applied (Gray and Davisson 2004). More precisely, given \(f_k\) and its sensed data \(x_f=(x_1,..,x_{k-1})\in {\mathcal {A}}^{k-1}\), the MMSE estimator of \({\hat{X}}_{f_k}\) from \(X_f=x_f\) is given in closed form by \(\mathbb {E} ({{\hat{X}}_f| X_f=x_f})\in {\mathcal {A}}^{N-k}\), which is a function of \(x_f\). This problem reduces to computing point-wise the expectation of every non-sensed position given \(X_f=x_f\), which is a simple task under the Markov assumption.
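
As an illustration of this point-wise computation (a sketch under stated assumptions, not the authors' code), the snippet below considers a stationary first-order Markov chain with known transition matrix P and stationary distribution \(\pi \). By the Markov property, the posterior of a non-sensed position given the sensed data depends only on its nearest sensed neighbours, from which the conditional-mean (MMSE) estimate and a point-wise conditional entropy follow; the joint conditional entropy in (69) additionally chains these conditionals within each unsensed segment and is not computed here. The transition matrix, alphabet coding, and sensed data are illustrative choices.

```python
# Point-wise posterior, MMSE estimate, and per-position conditional entropy for
# a stationary first-order Markov chain given a set of sensed positions/values.
import numpy as np

def posterior_marginal(P, pi, sensed, t):
    """Posterior pmf of X_t given sensed data (dict position -> symbol index)."""
    left = max((s for s in sensed if s < t), default=None)
    right = min((s for s in sensed if s > t), default=None)
    post = pi.copy()
    if left is not None:                     # condition on the nearest sensed position to the left
        post = np.linalg.matrix_power(P, t - left)[sensed[left], :]
    if right is not None:                    # and on the nearest sensed position to the right
        post = post * np.linalg.matrix_power(P, right - t)[:, sensed[right]]
    return post / post.sum()

values = np.array([0.0, 1.0])                           # numeric coding of the binary alphabet
beta = 0.1
P = np.array([[1 - beta, beta], [beta, 1 - beta]])      # symmetric transition matrix
pi = np.array([0.5, 0.5])                               # its stationary distribution
sensed = {0: 1, 9: 0}                                   # e.g. x_0 = 1 and x_9 = 0 were measured

for t in range(10):
    if t in sensed:
        continue
    post = posterior_marginal(P, pi, sensed, t)
    mmse = post @ values                                # conditional mean E[X_t | X_f = x_f]
    pos = post[post > 0]
    H_t = -np.sum(pos * np.log2(pos))                   # point-wise conditional entropy (bits)
    print(f"t={t}: posterior={np.round(post, 3)}, MMSE={mmse:.3f}, H={H_t:.3f} bits")
```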

Fig. 20 Estimation error considering the previously sampled locations and their measurements. Symmetric transition matrix with \(\beta = 0.1\). Top: random sampling versus AMIS; bottom: SMIS versus AMIS

Appendix E: Analysis of the Partial Update of Conditional Entropies in (45)

It can be argued that the use of the partial update of the conditional probabilities proposed in Sect. 7.2 is justified in the context of SMIS in (27). Considering stages \(k-1\) and \(k\) of this algorithm, for an arbitrary unmeasured location, the focus is to evaluate the conditional entropy \(H(X_{i,j}|X_{i^*_1,j^*_1},..,X_{i^*_{k-2},j^*_{k-2}})\) at stage \(k-1\), and \(H(X_{i,j}|X_{i^*_1,j^*_1},..,X_{i^*_{k-2},j^*_{k-2}},X_{i^*_{k-1}, j^*_{k-1}})\) at stage \(k\). Thus, the point-wise reduction of conditional entropy for the location \((i,j)\) from stage \(k-1\) to stage \(k\) is given by

$$\begin{aligned} \Delta H^{(k-1) \rightharpoonup k}(X_{i,j})&= H(X_{i,j}|X_{i^*_1,j^*_1},..,X_{i^*_{k-2},j^*_{k-2}}) \nonumber \\&\quad - H(X_{i,j}|X_{i^*_1,j^*_1},..,X_{i^*_{k-2},j^*_{k-2}},X_{i^*_{k-1},j^*_{k-1}}) \nonumber \\&= I(X_{i,j} ; X_{i^*_{k-1},j^*_{k-1}} | X_{i^*_1,j^*_1},..,X_{i^*_{k-2},j^*_{k-2}}), \end{aligned}$$
(70)

where, for \(k = 2\), the reduction in entropy corresponds exactly to the mutual information

$$\begin{aligned} \Delta H^{(1) \rightharpoonup 2}(X_{i,j})&= H(X_{i,j}) - H(X_{i,j}|X_{i^*_1,j^*_1}) \nonumber \\&= I(X_{i,j} ; X_{i^*_1,j^*_1}). \end{aligned}$$
(71)

As the algorithm progresses in the search for new measurements, the mutual information becomes conditioned on an increasing number of previous measurements and, consequently, the approximation used in the entropy map update deteriorates. However, recalculating the entropy maps every 20 samples has provided satisfactory results (see Figs. 19, 20). A schematic of this strategy is sketched below.
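
The following schematic (hypothetical helpers, not the authors' implementation) illustrates one way such a strategy can be organized: between full recomputations, the conditional-entropy map is corrected only in a local window around the last measurement, where the new datum is most informative, and a full update is triggered every 20 samples. The window-based locality and the helper functions entropy_fn and sense_fn, along with their signatures, are assumptions made for the sketch.

```python
# Schematic SMIS loop with partial entropy-map updates and a periodic full
# recomputation (every `full_every` measurements).
import numpy as np

def smis_with_partial_updates(entropy_fn, sense_fn, M, k, window=10, full_every=20):
    """entropy_fn(sensed, region) is assumed to return the conditional-entropy values
    of the locations in `region` (or the full (M, M) map if region is None) given the
    sensed data; sense_fn(i, j) is assumed to return the measured category at (i, j)."""
    sensed = {}                                   # {(i, j): measured value}
    H = entropy_fn(sensed, None)                  # prior entropy map, shape (M, M)
    for step in range(1, k + 1):
        H_free = H.copy()
        for (si, sj) in sensed:
            H_free[si, sj] = -np.inf              # never revisit a sensed location
        i, j = np.unravel_index(np.argmax(H_free), (M, M))
        sensed[(i, j)] = sense_fn(i, j)
        if step % full_every == 0:
            H = entropy_fn(sensed, None)          # periodic full recomputation
        else:
            rows = slice(max(0, i - window), min(M, i + window + 1))
            cols = slice(max(0, j - window), min(M, j + window + 1))
            H[rows, cols] = entropy_fn(sensed, (rows, cols))   # local partial update
    return sensed
```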


Cite this article

Santibañez, F., Silva, J.F. & Ortiz, J.M. Sampling Strategies for Uncertainty Reduction in Categorical Random Fields: Formulation, Mathematical Analysis and Application to Multiple-Point Simulations. Math Geosci 51, 579–624 (2019). https://doi.org/10.1007/s11004-018-09777-2
