Properties of Sums of Geological Random Variables

In the absence of empirical data that allow resolution of the vexing problem of how to address probabilistic dependencies among elements of large sets of geologic random variables, we need methods that refocus and streamline expert geological judgment inputs, along with analytical methods for modeling dependencies that go beyond pairwise correlation and its cousins. Some possibilities are reviewed.


Introduction
Suppose that you are given the marginal distribution of each of a set of n random variables but no other information. What can be said about the behavior of their sum? This is an old problem, extensively studied by probability theorists and statisticians (Hoeffding 1940; Fréchet 1951). There is a rich probabilistic finance and actuarial risk analysis literature devoted to calculation of bounds on sums of random variables. This question motivates our review of state-of-the-art methods designed to reduce geologists' cognitive load when they are asked to assign judgmental probabilities to uncertain geologic variables.
In a wide range of settings geologists are asked to provide personal probability judgments about a collection of uncertain quantities and, in particular, about sums of them. Probabilistic assessments of oil and gas in unexplored petroleum plays and basins are recurring examples. In the absence of hard data they deal rather well with the cognitive task of providing personal judgments about marginal distributions of geologic attributes; i.e. their assessments are, in the large, reasonably well calibrated. Geologists' personal judgments about dependencies among uncertain geologic quantities are more problematic.
It is worthwhile to distinguish micro-assessments (assessment of dependencies among individual reservoir attributes, for example) from macro-assessments (assessment of dependencies among assessment units, each of which may be a collection of anomalies, reservoirs and fields). Measurable data bearing directly on probabilistic dependencies at the micro-assessment level are often available, but precise measurable data bearing on dependencies among elements in a macro-assessment are seldom available. Chen et al. (2012) point out that "Although efforts have been made to address variable dependence in both methodology and tool development, the greatest emphasis and attention have been given to resource aggregation. Until now, the impact of interdependencies among variables in volumetric resource calculations has been mostly ignored, and the implementation of variable dependency remains a challenge to petroleum resource appraisal. In practice, inadequate data commonly exist to either specify a standard multivariate distribution with an appropriate correlation structure or to quantify the resource aggregation correlation matrices. However, variable correlations are so common among geologic variables that ignoring their interdependence may lead to serious bias, affecting both the resulting resource potential estimation."
Most geologists with some training and experience in probability assessment can provide reasonable responses to questions about marginal distributions of individual attributes of a target entity. Few if any are well equipped to provide sharp, coherent judgments about possible dependencies among them. Some progress has been made in understanding how to elicit sensible, coherent judgments about second order co-variability of petroleum assessment units; the recent USGS study of CO2 sequestration in depleted oil and gas reservoirs is an example. However, specification of marginal distributions along with second order moments is not sufficient for identification of a joint distribution of a set of uncertain quantities. This matters when interest centers on the right tail of a sum of magnitudes of petroleum in assessment units. Excepting special cases (joint lognormality, for example), the right tail of a sum of jointly dependent uncertain quantities can, both in principle and in practice, differ meaningfully from the right tail of an approximation based on marginal distributions and second moment properties alone. Lillestøl and Sinding-Larsen's (2017) study of giant field probabilities based on 182 North Sea discoveries highlights the importance of accurate modeling of tail probabilities. For economists, bureaucrats and politicians, right tail probabilities are often the most interesting feature of a probabilistic oil and gas assessment. What, for example, is the probability of finding at least one more giant field in a given mature petroleum province?
Objectives here are, first, to outline how methods currently used by geologists to impute probabilistic dependencies among uncertain geologic quantities fit (or don't fit) into a conceptual framework developed by probabilists to answer the question posed at the outset and, second, to review how the probability distribution of a sum of such quantities can be bounded given knowledge of marginal distributions alone, assuming they are governed by a type of functional dependency called co-monotonicity. Co-monotonicity and copulas are conceptual twins. Section 5.2 lays out necessary theory and definitions and calls attention to co-monotonic upper bounds on sums of random variables and lower bounds expressed in terms of conditional expectations. Section 5.3 addresses geologic case studies, in two of which geologists compute a probability distribution of a sum of random geologic magnitudes in three steps: first, specify marginal distributions of each magnitude; second, elicit judgmental appraisals of pairwise correlations among magnitudes; and third, combine the two using Monte Carlo simulation to arrive at a distribution of the sum. This approach might be labelled "incomplete specification" (not to be confused with the econometric definitions of just-, over- and under-specification). Iman and Conover's (1982) ingenious method for imputing dependencies among a set of random variables, which requires only marginal distributions and pairwise correlations among elements of that set, is deployed in the CO2 sequestration study cited above (Sect. 5.3.2). Chen et al.'s (2012) use of copulas to capture probabilistic dependencies in geologic micro-assessments is reviewed in Sect. 5.3.3. Brief concluding remarks appear in Sect. 5.4.
Blondes et al. (2013a, b) offer a sensible rationale for careful attention to dependencies: "In the Circum-Arctic aggregation of the 48 AUs, the 90-percent uncertainty interval for recoverable gas is 1,471, 2,009, or 3,515 tcf for assumptions of independence, assessor specified dependency (correlation), or total dependence respectively. Clearly, decision makers who rely on assessment results need accurate interval projections. Too broad an interval provides little information; too narrow an interval gives a false sense of precision."
Spatial modeling provides important insights into the structure of probabilistic dependencies among petroleum play attributes and deserves careful attention in parallel with methods and models discussed here. It is a topic for another day.

Preliminaries
Define $F_X$ to be the distribution function of a random vector $X = (X_1, \ldots, X_n)^t$ with domain $\mathbb{R}^n$ and marginal distributions $F_i$, $i = 1, \ldots, n$. Set $F_X(x) = \mathrm{Prob}\{X_1 \le x_1, \ldots, X_n \le x_n\}$. Assume that each $F_i$ is continuous and possesses a one to one inverse. Define the $p$th fractile of $X_i$ as the value $x_i(p)$ in the domain of $X_i$ such that $\mathrm{Prob}\{X_i \le x_i(p)\} = p$, so that $F_i^{-1}(p) = x_i(p)$. In turn the $p$th fractile of the sum $S_n = X_1 + \cdots + X_n$ is $s_p$ such that $\mathrm{Prob}\{S_n \le s_p\} = p$, or $F_{S_n}^{-1}(p) = s_p$. What conditions guarantee that fractiles are strictly additive, that is, that $s_p = x_1(p) + \cdots + x_n(p)$ for all $p \in (0, 1)$? Imposition of functional dependencies among $X_1, \ldots, X_n$ is one route to sufficient conditions for this to be true. To divide difficulties suppose that $X_1, \ldots, X_n$ share a common domain $D_X$ and consider continuous, increasing, invertible functions $h_i$, $i = 2, \ldots, n$, each with domain $D_X$. Suppose that $x_i = h_i(x_1)$ for all $x_1 \in D_X$, $i = 2, \ldots, n$. Then $\mathrm{Prob}\{S_n < s\} = \mathrm{Prob}\{X_1 + h_2(X_1) + \cdots + h_n(X_1) < s\}$. The omnibus function $g(x_1) = x_1 + h_2(x_1) + \cdots + h_n(x_1)$, $x_1 \in D_X$, is continuous, increasing and invertible, so $\mathrm{Prob}\{g(X_1) < s\} = \mathrm{Prob}\{X_1 < g^{-1}(s)\}$. The $p$th fractile of $S_n$ is $s_p$ such that $\mathrm{Prob}\{g(X_1) < s_p\} = p$, or $\mathrm{Prob}\{X_1 < g^{-1}(s_p)\} = p$, leading to $x_1(p) = g^{-1}(s_p)$. Equivalently $g(x_1(p)) = s_p$. Because each increasing $h_i$ carries the $p$th fractile of $X_1$ to the $p$th fractile of $X_i$, this says that $s_p = x_1(p) + \cdots + x_n(p)$.
Functional dependencies of this type are too strong to survive the rigors of modeling most real world data. In the absence of complete knowledge of a joint distribution, co-monotonicity is a more flexible approach to modeling joint behavior of dependent random variables.
A random vector $X$ with marginal distributions $F_1, \ldots, F_n$ is co-monotonic if
$$X \stackrel{d}{=} \left(F_1^{-1}(U), \ldots, F_n^{-1}(U)\right)$$
for a random variable $U$ uniform on $(0, 1)$. Here $\stackrel{d}{=}$ means agreement in distribution. Intuitively, each element of a co-monotonic random vector is a functional of a single random variable $U$, so all elements of $X$ exhibit strong positive dependency. McNeil et al. (2005) provide a more general definition: $X$ is co-monotonic if and only if it agrees in distribution with a random vector, each of whose components is a non-decreasing function of a single random variable. If elements of $X$ are co-monotonic, an increase in one element of $X$ is accompanied by an increase (or at least no decrease) in all others. Goovaerts et al. (2000) provide a clear, readable account of properties of sums of co-monotonic random variables in an actuarial context. Deelstra et al. (2009) offer a literature review of co-monotonicity in financial economics.
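A short numerical check may help fix ideas. The sketch below assumes three lognormal marginals with illustrative (hypothetical) parameters, builds a co-monotonic vector by driving every component with one shared standard normal draw, and compares the fractile of the sum with the sum of the marginal fractiles. It is an illustration under these assumptions, not a procedure taken from any of the studies cited.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Hypothetical lognormal marginals (mu_i, sigma_i); values are illustrative only.
params = [(0.0, 0.5), (1.0, 1.0), (0.5, 1.5)]

def lognormal_fractile(p, mu, sigma):
    """Inverse distribution function F_i^{-1}(p) of a lognormal marginal."""
    return np.exp(mu + sigma * NormalDist().inv_cdf(p))

# Co-monotonic vector: every component is F_i^{-1}(U) for one shared U.
# Writing Z = Phi^{-1}(U), F_i^{-1}(U) = exp(mu_i + sigma_i * Z) for a lognormal.
z = rng.standard_normal(200_000)
components = np.column_stack([np.exp(mu + sigma * z) for mu, sigma in params])
comonotonic_sum = components.sum(axis=1)

p = 0.95
sum_of_fractiles = sum(lognormal_fractile(p, mu, sigma) for mu, sigma in params)
fractile_of_sum = np.quantile(comonotonic_sum, p)

print(f"sum of marginal P95 fractiles : {sum_of_fractiles:.2f}")
print(f"P95 fractile of simulated sum : {fractile_of_sum:.2f}")
```

Up to Monte Carlo error the two printed numbers agree, which is the additivity property in miniature. Replacing the shared draw `z` by independent draws for each component breaks the agreement.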
Geologists may object that, in their setting, some elements of X may be independent or possibly negatively dependent (rather rare). Co-monotonicity and its consequences nevertheless provide upper and lower bounds on a sum of random variables with specified marginal distributions that embrace a wide range of dependence structures. When these bounds are judged to be tight enough, reasonable projections of probability distributions of aggregates can be made using marginal distributions along with specification of certain conditional expectations (see 5.1, 5.5). The bounds provide useful information about projections based on information elicited from geologists about dependencies, and they police the reasonableness of probabilistic projections of uncertain geologic resources made using other methods.

Bounds
A random variable $X$ dominates a random variable $Y$ in convex order, denoted by $X \ge_{cx} Y$, if and only if $E(g(X)) \ge E(g(Y))$ for all real convex functions $g$ for which expectations are finite. Kaas et al. (2009) use convex order to show that fractiles of co-monotonic random variables can be added in the following sense: for any random vector $X = (X_1, \ldots, X_n)$ possessing marginal cumulative distribution functions $F_1, \ldots, F_n$ and $U$ a uniform (0, 1) random variable,
$$F_1^{-1}(U) + \cdots + F_n^{-1}(U) \ge_{cx} X_1 + \cdots + X_n, \qquad (5.1)$$
and the $p$th fractile of the co-monotonic sum on the left is $F_1^{-1}(p) + \cdots + F_n^{-1}(p)$ for all $p \in (0, 1)$. They point out that (5.1) is a supremum in terms of convex order and is a best bound for marginal distributions in a Fréchet space. It is well known that if a random vector $X$ with marginal distributions $F_1, \ldots, F_n$ belongs to a Fréchet space (the class of joint distributions with these marginals), the joint cumulative distribution function $\mathrm{Prob}\{X_1 \le x_1, \ldots, X_n \le x_n\}$ of $X$ is bounded from above by $M_n \equiv \min\{F_1(x_1), \ldots, F_n(x_n)\}$. Goovaerts et al. note that $M_n$ is reachable in that space.
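Convex order can be made tangible through one particular convex function, the "stop-loss" function $g(s) = \max(s - d, 0)$ familiar from the actuarial literature. The sketch below assumes three identically distributed lognormal magnitudes (hypothetical parameters) and compares Monte Carlo estimates of $E(g(S))$ when the magnitudes are independent and when they are co-monotonic. The thresholds `retentions` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 200_000
sigmas = np.array([1.0, 1.0, 1.0])  # hypothetical lognormal shape parameters

# Independent case: a separate standard normal drives each magnitude.
independent_sum = np.exp(sigmas * rng.standard_normal((n_draws, 3))).sum(axis=1)

# Co-monotonic case: one shared standard normal drives all magnitudes,
# which is F_i^{-1}(U) with U = Phi(Z) for lognormal marginals.
comonotonic_sum = np.exp(sigmas * rng.standard_normal((n_draws, 1))).sum(axis=1)

def stop_loss(total, retention):
    """Monte Carlo estimate of E[max(S - d, 0)], a convex function of S."""
    return np.maximum(total - retention, 0.0).mean()

retentions = [5.0, 10.0, 20.0]
stop_loss_independent = [stop_loss(independent_sum, d) for d in retentions]
stop_loss_comonotonic = [stop_loss(comonotonic_sum, d) for d in retentions]

for d, ind, com in zip(retentions, stop_loss_independent, stop_loss_comonotonic):
    print(f"d = {d:5.1f}  independent: {ind:.3f}  co-monotonic: {com:.3f}")
print("means:", independent_sum.mean(), comonotonic_sum.mean())
```

The two sums share the same mean, as convex order requires, but the co-monotonic sum carries more weight in the right tail at every threshold. This is the feature that matters for questions about giant fields and large aggregates.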
For sums of elements of $X$, introduction of a random variable $Z$ such that distribution functions of each $X_i$ given $Z$ are known with certainty leads to refined upper and lower bounds. In a geologic context $Z$ is interpretable as a latent (background) variable describing gross geologic characteristics of, for example, a petroleum assessment unit. The conditioning variable $Z$ might be regression dependent on geologic attributes of an assessment unit and need not be scalar. These authors define $F^{-1}_{X_i|Z}(U)$ to be a random variable $f_i(U, Z)$ that for $(U, Z) = (u, z)$ assumes value $F^{-1}_{X_i|Z=z}(u)$ and prove that for $U$ uniform $(0, 1)$ and $Z$ independent of $U$
$$\sum_{i=1}^{n} F^{-1}_{X_i}(U) \;\ge_{cx}\; \sum_{i=1}^{n} F^{-1}_{X_i|Z}(U) \;\ge_{cx}\; \sum_{i=1}^{n} X_i \;\ge_{cx}\; \sum_{i=1}^{n} E(X_i \mid Z).$$
More recent work (2017) shows that upper and lower Fréchet-Hoeffding bounds such as those described above can be tightened. It demonstrates that other types of information, knowledge of functionals of lower dimensional marginals of an n-dimensional copula for example, also lead to improvements. The tradeoff is that the improved bounds are quasi-copulas but not copulas. Comparison of predictive distributions of undiscovered mineral resources derived by conventional methods currently in use with co-monotonic bounds on them is a promising avenue of research.
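The role of the latent variable $Z$ can be illustrated with a toy model. The sketch below assumes, purely for illustration, that $\log X_i = \sigma_i(\rho_i Z + \sqrt{1 - \rho_i^2}\,\varepsilon_i)$ with $Z$ and the $\varepsilon_i$ independent standard normals; the loadings `rho` and spreads `sigma` are hypothetical. Under this assumption $E(X_i \mid Z)$ has a closed lognormal form, and the conditional co-monotonic bound is obtained by replacing the separate $\varepsilon_i$ with one shared draw. Since convex order implies equal means and ordered variances, the variances of the four sums should increase from the lower bound to the unconditional co-monotonic bound.

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws = 200_000
sigma = np.array([0.5, 0.5, 0.5])   # hypothetical log-scale spreads
rho = np.array([0.9, 0.5, 0.2])     # hypothetical loadings on the latent variable Z
resid = np.sqrt(1.0 - rho**2)

z = rng.standard_normal((n_draws, 1))        # latent background variable Z
eps = rng.standard_normal((n_draws, 3))      # unit-specific noise
shared = rng.standard_normal((n_draws, 1))   # Phi^{-1}(U), independent of Z

# Actual sum under the assumed model.
true_sum = np.exp(sigma * (rho * z + resid * eps)).sum(axis=1)
# Lower bound: sum of conditional expectations E(X_i | Z).
lower = np.exp(sigma * rho * z + 0.5 * (sigma * resid) ** 2).sum(axis=1)
# Improved upper bound: co-monotonic given Z (one shared residual draw).
improved_upper = np.exp(sigma * (rho * z + resid * shared)).sum(axis=1)
# Unconditional co-monotonic upper bound: one draw drives everything.
comonotonic_upper = np.exp(sigma * shared).sum(axis=1)

for name, total in [("lower", lower), ("true", true_sum),
                    ("improved upper", improved_upper),
                    ("co-monotonic upper", comonotonic_upper)]:
    print(f"{name:20s} mean {total.mean():.3f}  variance {total.var():.3f}")
```

How much the conditional bounds tighten depends on how informative $Z$ is about each magnitude. With all loadings equal, the improved upper bound collapses onto the unconditional one in this toy model.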

Thumbnail Case Studies
Thumbnail sketches of three case studies serve as a template for examining the probabilistic dependence issues discussed above: examples of the USGS approach to probabilistic dependencies among oil and gas assessment units, the USGS probabilistic assessment of CO2 sequestration in mature oil and gas reservoirs in the United States, and a Canadian Geological Survey study of the use of copulas to capture probabilistic dependencies among accumulations in individual oil and gas plays.

USGS Oil and Gas Resource Projections
The USGS developed an assessment system in the 1980s with the acronym FASP (fast appraisal system for petroleum resources). FASP incorporated perfect positive correlation between micro-level reservoir attributes but allowed specification of any positive correlation in the course of aggregating play resources. However, the USGS 2000 World Petroleum Assessment aggregates undiscovered resource volumes from assessment unit level to regional level using perfect correlation as the argument for adding assessment unit fractiles to arrive at regional level aggregates. Recognizing that at the global level large regional aggregates of resources are unlikely to be perfectly correlated, the authors adopt a pairwise correlation of 0.5 between pairs of eight regions. No sensitivity analysis of how aggregate projections vary with these particular choices is provided.
Many USGS assessment studies present tables of fractiles of individual assessment units and then add them to arrive at a fractile assessment of total resources (see Table 2 in Klett et al. 2005; Klett 2004). Addition is qualified by the statement that "Fractiles are additive under assumption of perfect positive correlation", allowing avoidance of direct assessment of dependencies among units. It is easy to show that "perfect correlation" is not robust to variations in specification of the functional form of marginal distributions elicited from geologists. Worse, addition of fractiles without careful attention to properties of the joint distribution of a set of uncertain quantities can lead to incoherence. On the other hand, mutual independence allows specification of arbitrary marginal probability distributions without doing violence to coherence but often leads to an unacceptably narrow probability projection of sums of oil and gas magnitudes.
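The contrast between the two extremes can be sketched numerically. The example below assumes four assessment units with lognormal marginals whose parameters are hypothetical, and compares the 90-percent interval of the aggregate obtained by adding unit fractiles with the interval obtained by Monte Carlo simulation under mutual independence. It is a toy illustration, not a reproduction of any USGS calculation.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n_draws = 200_000

# Hypothetical assessment units: lognormal (mu, sigma) on the log scale.
mus = np.array([3.0, 2.5, 3.5, 2.0])
sigmas = np.array([0.6, 0.9, 0.4, 1.1])

# Adding fractiles: the P5 and P95 of each unit are summed directly.
z05, z95 = NormalDist().inv_cdf(0.05), NormalDist().inv_cdf(0.95)
added_p05 = np.exp(mus + sigmas * z05).sum()
added_p95 = np.exp(mus + sigmas * z95).sum()

# Mutual independence: each unit gets its own random draw.
independent_total = np.exp(mus + sigmas * rng.standard_normal((n_draws, 4))).sum(axis=1)
indep_p05, indep_p95 = np.quantile(independent_total, [0.05, 0.95])

# Co-monotonic simulation: one shared draw; should reproduce the added fractiles.
como_total = np.exp(mus + sigmas * rng.standard_normal((n_draws, 1))).sum(axis=1)
como_p05, como_p95 = np.quantile(como_total, [0.05, 0.95])

print(f"added fractiles : {added_p05:7.1f} to {added_p95:7.1f}")
print(f"co-monotonic MC : {como_p05:7.1f} to {como_p95:7.1f}")
print(f"independent MC  : {indep_p05:7.1f} to {indep_p95:7.1f}")
```

Under these assumptions the added-fractile interval coincides with the co-monotonic one and is markedly wider than the independent one. Any assessor-specified dependency between the two extremes would be expected to produce an interval between them.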
A salient feature of Pearson's correlation coefficient is that random variables X and Y possess correlation 1.0 or −1.0 only if X and Y are linearly dependent. As Denuit and Dhaene (2003) point out, a limiting case is a bivariate normal pair of random variables for which the variance of one member of the pair is zero. If X and Y are jointly lognormal and log X is a linear function of log Y, the Pearson correlation of log X and log Y is either 1.0 or −1.0. However, the Pearson correlation of X and Y is then less than 1.0 in absolute value unless X is itself proportional to Y. Denuit and Dhaene provide a more nuanced treatment. Suppose $F_1$ and $F_2$ are marginal cumulative distribution functions of X and Y respectively, each concentrated on $(0, \infty)$, and U is a uniform (0, 1) random variable. Using super-modularity these authors prove that if $F_1$ and $F_2$ lie in a Fréchet space the Pearson correlation coefficient $r(X, Y)$ of X and Y is bounded by
$$\frac{\mathrm{Cov}\left(F_1^{-1}(U), F_2^{-1}(1-U)\right)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \;\le\; r(X, Y) \;\le\; \frac{\mathrm{Cov}\left(F_1^{-1}(U), F_2^{-1}(U)\right)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
In this setting perfect correlation is not achievable. They also prove that it is possible for a pair of co-monotonic lognormal random variables to have pairwise correlation close to zero, contradicting the intuitive notion that small correlation implies weak dependence. Denuit and Dhaene call attention to Shih and Huang's (1992) and Schechtman and Yitzhaki's (1999) observation that, for any two random variables, the achievable range of Pearson's correlation coefficient is [−1, 1] only if the functional forms of the two marginal distributions differ solely in values of location and/or scale parameters. If not, the range of Pearson's r is narrower than [−1, 1] and depends on the shape of the two marginal distributions.
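For lognormal marginals the attainable range of Pearson's r can be written down explicitly, which makes the point concrete. Taking $X = e^{\sigma_x Z}$ and $Y = e^{\pm\sigma_y Z}$ with $Z$ standard normal (the co-monotonic and counter-monotonic couplings; location parameters cancel), standard lognormal moment formulas give $r_{\max} = (e^{\sigma_x \sigma_y} - 1)/\sqrt{(e^{\sigma_x^2} - 1)(e^{\sigma_y^2} - 1)}$ and $r_{\min}$ with $-\sigma_x\sigma_y$ in the numerator. The function name below is hypothetical and the parameter values are illustrative.

```python
import math

def lognormal_pearson_bounds(sigma_x, sigma_y):
    """Attainable range of Pearson's r for lognormal marginals.

    r_max is reached by the co-monotonic pair, r_min by the
    counter-monotonic pair; both follow from lognormal moments.
    """
    denom = math.sqrt(math.expm1(sigma_x**2) * math.expm1(sigma_y**2))
    r_max = math.expm1(sigma_x * sigma_y) / denom
    r_min = math.expm1(-sigma_x * sigma_y) / denom
    return r_min, r_max

for sigma_y in (1.0, 2.0, 5.0):
    r_min, r_max = lognormal_pearson_bounds(1.0, sigma_y)
    print(f"sigma_x = 1, sigma_y = {sigma_y}: r in [{r_min:.4f}, {r_max:.4f}]")
```

With equal shape parameters the upper bound is 1, but as the shapes diverge the whole attainable interval shrinks toward zero even though the co-monotonic pair remains perfectly dependent. An elicited "correlation of 0.8" may therefore be unattainable for some pairs of elicited marginals.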
These authors document several important features of Kendall's τ and Spearman's ρ. (Spearman's ρ is at the center of the Iman and Conover method deployed in the USGS (2013) study of CO2 sequestration to compute predictive probability distributions of aggregates.) First, both are invariant with respect to strictly monotone transformations. Second, when one variable is a non-decreasing (non-increasing) transformation of the other they equal 1 (or −1), the value attained at the Fréchet upper (resp. lower) bound. According to them, Kendall's τ and Spearman's ρ are more desirable measures of association for non-normal multivariate distributions than Pearson's r because the latter does not share their invariance properties. These invariance properties come into play in Iman and Conover's method discussed below. Denuit and Dhaene prove the non-obvious fact that if positively or negatively quadrant dependent random couples are uncorrelated they are independent.
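The invariance property is easy to verify numerically. The sketch below computes Spearman's ρ as the Pearson correlation of ranks, using simulated data and two arbitrary strictly increasing transformations (both are assumptions made for illustration). Ties are ignored, which is harmless for continuous data.

```python
import numpy as np

def ranks(values):
    """Zero-based ranks of a 1-D array with no ties."""
    return np.argsort(np.argsort(values))

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

rng = np.random.default_rng(3)
x = rng.standard_normal(5_000)
y = 0.7 * x + np.sqrt(1.0 - 0.7**2) * rng.standard_normal(5_000)

# Strictly increasing transformations change the marginals, not the ranks.
x_transformed = np.exp(x)
y_transformed = y**3

print("Pearson  raw / transformed:", pearson(x, y), pearson(x_transformed, y_transformed))
print("Spearman raw / transformed:", spearman(x, y), spearman(x_transformed, y_transformed))
```

Pearson's r changes noticeably under the transformations while Spearman's ρ is unchanged. This is why a rank-based target can be imposed on marginals of arbitrary shape.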
All of this emphasizes that "perfect correlation" as an omnibus argument for adding fractiles has many pitfalls. Co-monotonic bounds on random sums are a conceptually satisfactory alternative that deserves much future study.

USGS Probabilistic Assessment of CO2 Storage Capacity
A recent USGS probabilistic assessment of CO2 sequestration in mature petroleum reservoirs (Blondes et al. 2013a, b) is based on both micro- and macro-assessments by geologists. Their macro-assessment aggregates storage assessment units (SAUs) at basin, regional and national levels. An objective was to provide probabilistic assessments that take into account dependencies among assessment units arising from "overlap of geologic analogs, assessment methods and assessors" using individual SAU marginal probability distributions and "…a correlation matrix obtained by expert elicitation describing interdependencies between pairs of SAUs". The correlation matrix dimension is 192 × 192. Because a menagerie of marginal distributions (Beta-PERT, lognormal, truncated lognormal) was deployed at the micro-level, use of standard multivariate distribution theory is not appropriate. Dependencies among storage capacity magnitudes are induced using an innovative distribution free method developed by Iman and Conover (1982) that allows marginal distribution shapes to be estimated from data sets distinct from data sets used to estimate dependency structure. Their method is designed to provide rank correlations that match assessed correlations and to translate the match into predictive probability distributions for individual assessment units and larger aggregates. (See Blondes et al. 2013a for informative examples.) How to aggregate from basin to region and then to a national scale is an issue. Should this be done in a single stage using the correlation matrix for all SAUs in the study, or by successively aggregating subsets of SAUs in multiple stages? Blondes et al. (2013b) conclude that "Although the single-stage approach requires determination of significantly more correlation coefficients, it captures geologic dependencies among similar units in different basins and it is less sensitive to fluctuations in low correlation coefficients than the multiple stage approach. Thus, subsets of one single-stage correlation matrix are used to aggregate to basin, regional, and national scales."
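The mechanics of rank-based dependency induction can be sketched in a few lines. What follows is a minimal illustration in the spirit of Iman and Conover (1982), not the USGS implementation: the function name, the three marginals and the target matrix are all hypothetical, and normal (van der Waerden) scores are assumed as the reference scores. The idea is to build a score matrix whose correlation equals the target, then reorder each column of independently sampled marginals so that its ranks follow the ranks of the corresponding score column.

```python
import numpy as np
from statistics import NormalDist

def rank_of(column):
    """Zero-based rank of each entry (continuous data, ties ignored)."""
    return np.argsort(np.argsort(column))

def iman_conover(samples, target_corr, rng):
    """Reorder columns of `samples` to approximate `target_corr` in rank terms."""
    n_draws, n_vars = samples.shape
    normal = NormalDist()
    base = np.array([normal.inv_cdf(i / (n_draws + 1)) for i in range(1, n_draws + 1)])
    # Independently shuffled normal scores, one column per variable.
    scores = np.column_stack([rng.permutation(base) for _ in range(n_vars)])
    # Remove the accidental sample correlation T = Q Q', impose C = P P'.
    q = np.linalg.cholesky(np.corrcoef(scores, rowvar=False))
    p = np.linalg.cholesky(target_corr)
    mixed = scores @ np.linalg.inv(q).T @ p.T
    # Reorder each marginal sample to follow the ranks of the mixed scores.
    reordered = np.empty_like(samples)
    for j in range(n_vars):
        reordered[:, j] = np.sort(samples[:, j])[rank_of(mixed[:, j])]
    return reordered

rng = np.random.default_rng(2013)
n_draws = 5_000
# Hypothetical marginals of differing shape, sampled independently.
marginals = np.column_stack([
    rng.lognormal(mean=2.0, sigma=0.8, size=n_draws),
    rng.beta(2.0, 5.0, size=n_draws) * 40.0,
    rng.triangular(1.0, 4.0, 15.0, size=n_draws),
])
target = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.0, 0.5],
                   [0.3, 0.5, 1.0]])

dependent = iman_conover(marginals, target, rng)
achieved = np.corrcoef(
    np.column_stack([rank_of(dependent[:, j]) for j in range(3)]), rowvar=False)
total = dependent.sum(axis=1)   # aggregate with induced dependency
print(np.round(achieved, 3))
```

The reordering leaves every marginal sample untouched as a set of values; only the pairing of values across columns changes. In this sketch the achieved rank correlations land close to, though not exactly on, the target, and the aggregate `total` has a wider spread than the sum of the independently paired columns.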
Successive aggregation in multiple stages drastically reduces the number of pairwise correlations that must be elicited from geologists at the expense of requiring each assessor to appraise pairwise correlations of sums of assessment unit magnitudes. Although there are no studies comparing how well geologists' assessments calibrate when asked to appraise dependencies among sums of SAU magnitudes relative to appraisal of dependencies among individual SAUs it is reasonable to conjecture that individual SAU appraisals are much more likely to be well calibrated. Properties of single and multi-stage appraisal methods are studied in Kaufman et al. (2018).
Chen et al. (2012) use four plays, Ivik, Taglu, Kugmallit (East) and Kugmallit (West), to illustrate how to incorporate dependencies among individual play resources. Although no systematic method for eliciting geologists' judgments about between-play dependencies is discussed, the authors motivate their choice of rather large correlations between plays (0.6) and perfect correlation (1.0) by noting that all four plays share the same source rock and petroleum system: "The resource richness of each play is basically a function of both the oil charge and the preservation of accumulations that are mostly controlled by common petroleum system elements… we infer that the resources in the four plays are highly correlated, although the pool size distributions among the four plays vary considerably." Pairwise correlations between area, net pay, porosity and oil saturation vary from a low of 0.20 to a high of 0.86. The authors call attention to the substantial difference between total ultimate oil resource medians under the assumption of independence and under the assumption of within- and between-play correlations: the latter is 1.6 times the former.
Principal messages are that, to be realistic, probabilistic appraisal of oil and gas resources in unexplored and partially explored regions must account for multiple sources of dependencies, and that copulas are useful for doing so.

Concluding Remarks
In the absence of empirical data that allows resolution of the vexing problem of how to address probabilistic dependencies among and between elements of large sets of geologic random variables we need methods that refocus and streamline expert geological judgment inputs as well as analytical methods for modeling dependencies that go beyond pairwise correlation and its cousins. One promising avenue is the theory of vines proposed by Bradford and (2002). Their theory broadens the range of allowable dependency structures beyond Bayesian belief networks and exploits properties of rank correlations in a fashion that leads to efficient computation.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.