Probabilistic Aggregation of Uncertain Geological Resources
Abstract
Commodities such as oil and gas occur in isolated reservoirs or accumulations, more generically called basic units here. To understand a study area’s economic potential and to craft plans for exploration and development, resource analysts often aggregate (sum, accumulate) basic unit magnitudes in distinct spatial subsets of the study area and then appraise the total area’s potential by summing these intermediate sums. In a probabilistic approach, magnitudes are modeled as random variables. Some have asked, “Do different methods of partitioning basic units into subsets lead to different probability distributions for the sum of all basic unit magnitudes?” They do not: any method of aggregation of basic unit magnitudes that obeys the rules of probability leads to the same probability distribution of the sum of all unit magnitudes as that computed by direct summation of all basic unit magnitudes. A Monte Carlo simulation of a synthetic example, in which the magnitude of resource in each unit is marginally lognormal and pairwise correlations among basic unit magnitudes are specified, illustrates key features of probabilistic aggregation. The joint distribution of certain pairs of aggregates is closely approximated by a bivariate lognormal distribution.
Keywords
Aggregation matrix · Single-stage aggregation · Multiple-stage aggregation · Lognormality
1 Introduction
Basic mineral commodities such as oil, gas, copper, silver and gold occur as isolated reservoirs (deposits, accumulations). A principal goal of studies that treat mineral magnitudes in individual accumulations as uncertain quantities, that is, as random variables (rvs), is to provide probabilistic projections of magnitude totals of selected subsets of the collection of all accumulations in a (spatial) assessment frame as well as of the total of all accumulation magnitudes in the frame.
Mineral resources of the type studied here are geographically discontinuous and disseminated over a study area, which can be as large as a country or a continent. These accumulations are three-dimensional objects whose spatial distribution often appears as a geographic map. Geologists customarily define a “basic unit” to be, say, a copper deposit or a petroleum reservoir, field or play, and assign a probability distribution to the magnitude of resources in each unit, leading to as many marginal probability distributions as basic units.
Direct probabilistic aggregation of a collection of basic unit magnitudes into a sum of all unit magnitudes requires specification of a joint probability distribution incorporating probabilistic dependencies among basic unit magnitudes, a daunting exercise when the number of units is large. Many resource assessment studies do specify a joint probability distribution of basic unit magnitudes (Carter and Morales 1998; Schuenemeyer 2005; Delfiner and Barrier 2008; Pike 2008; Schuenemeyer and Gautier 2010; Van Elk and Gupta 2010; Blondes et al. 2013a, c; U.S. Geological Survey Carbon Dioxide Storage Resources Assessment Team 2013). Crovelli and Balay (1991) characterize dependencies among basic units and between aggregates in terms of covariances and correlations.
Methodological issues that arise in the course of probabilistic aggregation are the subject of this contribution. In Sect. 2, two proofs show that if the rules of probability are obeyed, distinct partial intermediate aggregations of basic unit magnitudes lead to the same distribution of the sum of all basic unit magnitudes. Said differently, multiple levels of aggregation (multiple-stage aggregation) lead to the same probability distribution for the sum of all accumulation magnitudes as direct summation of all accumulation magnitudes (single-stage aggregation). Section 3 outlines properties of the data-generating process used in the numerical examples of Sect. 4. As a set of basic units is aggregated into a smaller number of larger sets, do (positive) pairwise correlations between sums of basic unit magnitudes increase, decrease or stay the same? Is there an ordinal ordering of pairwise correlations among these sums as the number of elements in them increases? Section 5 addresses these questions, presents easy-to-compute bounds on allowable background correlations and establishes useful inequalities governing differences between pairwise correlations between basic unit magnitudes and pairwise correlations among aggregates of them. Remarks about practical aspects of aggregation and elicitation of geological judgments appear in Sect. 6.
2 Aggregation
For many geological resources, such as oil and gas, there is a natural hierarchy of aggregation levels: individual accumulation magnitudes in an oil and gas field, the sum of individual magnitudes in a play, the sum of magnitudes in the collection of plays in a petroleum basin, and in turn, a regional basin aggregate.

If the laws of probability are obeyed, the probability distribution of the sum of all individual accumulation magnitudes in a sample frame is the same as the probability distribution for this sum computed by use of an aggregation scheme, no matter how one chooses to aggregate.
Define aggregation as follows: assume that the choice of labelling of magnitudes is noninformative. Partition a set of N uncertain magnitudes \( \{ X_{1} , \ldots ,X_{N} \} \) into \( K \) mutually exclusive and collectively exhaustive subsets \( A_{1} , \ldots ,A_{K} \). Define \( S_{k} \) to be the sum of elements in \( A_{k} ,k = 1, \ldots ,K \). Then \( \{ S_{1} , \ldots ,S_{K} \} \) is an aggregate of \( \{ X_{1} , \ldots ,X_{N} \} \).
Assertion 1: The cumulative distribution function of the sum \( S = X_{1} + \cdots + X_{N} \) of N uncertain accumulation magnitudes (rvs) is identical to that of the sum \( S_{1} + \cdots + S_{K} \) of aggregates \( S_{1} , \ldots ,S_{K} \).
The following simple proofs extend to successive levels of aggregation of \( S_{1} , \ldots ,S_{K} \).
First Proof: Each possible realization \( x_{1} , \ldots ,x_{N} \) of \( X_{1} , \ldots ,X_{N} \) is a set of N real numbers, each in \( ( - \infty ,\infty ) \). Use parentheses to partition \( x_{1} + \cdots + x_{N} \) as \( (x_{1} + \cdots + x_{{i_{1} }} ) + (x_{{i_{1} + 1}} + \cdots + x_{{i_{2} }} ) + \cdots + (x_{{i_{K - 1} + 1}} + \cdots + x_{N} ) \). Sum numbers within each pair of parentheses and set \( s_{1} = x_{1} + \cdots + x_{{i_{1} }} ,s_{2} = x_{{i_{1} + 1}} + \cdots + x_{{i_{2} }} , \ldots ,s_{K} = x_{{i_{K - 1} + 1}} + \cdots + x_{N} \). Numbers \( x_{1} , \ldots ,x_{N} \) obey the associative law of arithmetic, so \( s_{1} + s_{2} + \cdots + s_{K} = x_{1} + x_{2} + \cdots + x_{N} \) for any such partition of \( \{ x_{1} , \ldots ,x_{N} \} \). This obtains for each possible realization \( x_{1} , \ldots ,x_{N} \) of \( X_{1} , \ldots ,X_{N} \) and all possible partitions of \( \{ x_{1} , \ldots ,x_{N} \} \), so the associative law of arithmetic applies to \( X_{1} , \ldots ,X_{N} \) as well. Hence \( S_{1} + \cdots + S_{K} \) and \( X_{1} + \cdots + X_{N} \) are the same random variable and share one cumulative distribution function.
Second Proof: Suppose that the range of each of \( X_{1} , \ldots ,X_{N} \) is \( ( - \infty ,\infty ) \) and that the \( (N \times 1) \) array of uncertain quantities \( {\mathbf{X}}_{N} = (X_{1} , \ldots ,X_{N} ) \) possesses a continuous probability distribution (density) \( Prob\{ {\mathbf{X}}_{N} \in d{\mathbf{X}}\} = f({\mathbf{X}})d{\mathbf{X}} \) with respect to the Lebesgue measure on \( ( - \infty ,\infty )^{N} \). Compare the cumulative distribution function \( F(s) \equiv Prob\{ X_{1} + \cdots + X_{N} \le s\} \) of \( X_{1} + \cdots + X_{N} \) with the cumulative distribution function \( G_{K} (s) \equiv Prob\{ S_{1} + \cdots + S_{K} \le s\} \) of \( S_{1} + \cdots + S_{K} \). By construction \( S_{1} = (X_{1} + \cdots + X_{{n_{1} }} ),S_{2} = (X_{{n_{1} + 1}} + \cdots + X_{{n_{2} }} ), \ldots ,S_{K} = (X_{{n_{K - 1} + 1}} + \cdots + X_{N} ) \), so on applying the associative law of arithmetic to rvs \( S_{1} , \ldots ,S_{K} \), \( G_{K} (s) = F(s) \) for all \( s \in ( - \infty ,\infty ) \).
Assertion 1 obtains irrespective of the structure of dependencies assigned to \( X_{1} , \ldots ,X_{N} \). Distinct aggregation schemes, each of which obeys the rules of probability, lead to identical moments of all orders for the sum of all basic unit magnitudes (Assertion 2 below).
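The first proof can be illustrated numerically. The following is a minimal sketch, not part of the original study: the helper names `aggregate` and `random_partition`, the lognormal parameters and the choice of independent draws are all assumptions made for illustration only (Assertion 1 itself places no restriction on dependencies). For each simulated realization, the sum of the partial sums is compared with the direct sum under several random partitions.

```python
import random

def aggregate(values, partition):
    """Return the subset sums S_1..S_K for a partition given as lists of indices."""
    return [sum(values[i] for i in subset) for subset in partition]

def random_partition(n, k, rng):
    """Assign indices 0..n-1 to k non-empty subsets at random (hypothetical helper)."""
    indices = list(range(n))
    rng.shuffle(indices)
    subsets = [[indices[j]] for j in range(k)]   # seed each subset with one index
    for i in indices[k:]:
        subsets[rng.randrange(k)].append(i)
    return subsets

rng = random.Random(0)
N = 12
max_gap = 0.0
for _ in range(1000):
    # One realization x_1..x_N; the lognormal parameters are illustrative only.
    x = [rng.lognormvariate(3.5, 0.6) for _ in range(N)]
    direct = sum(x)
    for K in (2, 4, 6):
        s = aggregate(x, random_partition(N, K, rng))
        max_gap = max(max_gap, abs(sum(s) - direct))

# Any gap is floating-point rounding; mathematically the two sums coincide.
print(max_gap)
```

Because the two sums agree realization by realization, any empirical distribution built from them is the same, whatever partition is used.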
Blondes et al. (2013a, b, c) make three assertions: first, that the probability distribution of an aggregated sum using multiple stages of correlation matrices is strongly dependent on the number of aggregation stages, the size of the individual groups and the size of the total aggregation. Second, multiple-stage aggregation will, if correlation coefficients are positive, narrow aggregate distributions. Third, the choice of partition of units into groups can have a larger impact on the distribution of the sum of all unit magnitudes than choices of correlation coefficient selected by experts. All or any of these assertions may obtain if geologists’ probability judgments do not adhere to laws of probability. In contrast, Assertion 1 says that any aggregation scheme obeying the rules of probability leads to the same probability distribution as that for the sum of all basic unit magnitudes.
2.1 Moments and Aggregates
One tactic for simplifying the task of assessing properties of a large collection of basic units is to assume that the joint distribution of magnitudes \( X_{1} , \ldots ,X_{N} \) is a parametric distribution indexed by a mean vector, a variance matrix and possibly a small number of additional parameters. Even so, the task of specifying parameters can be daunting. If N is large, the covariance matrix of \( X_{1} , \ldots ,X_{N} \) possesses an intimidatingly large number of parameters. Aggregation of \( X_{1} , \ldots ,X_{N} \) into K ≪ N subsets helps in principle: an analyst must then assess or estimate from available data K(K + 1)/2 ≪ N(N + 1)/2 variances and covariances of sums.
Assertion 2: The variance of the sum of elements of \( {\mathbf{X}}_{N} \) equals the variance of the sum of aggregates of elements of \( {\mathbf{X}}_{N} \) for all possible partitions of elements of \( {\mathbf{X}}_{N} \) into nonnull subsets.
Assertion 2 is, of course, a direct consequence of Assertion 1.
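Assertion 2 can also be checked directly on second moments. The sketch below assumes that an aggregation matrix is a \( (K \times N) \) array of zeros and ones whose \( k^{th} \) row flags the units placed in subset \( k \); the \( (4 \times 4) \) variance matrix and the three partitions are hypothetical. The variance of a sum of components equals the sum of all entries of their variance matrix, so the total is compared before and after aggregation.

```python
def aggregation_matrix(partition, n):
    """K x N matrix of zeros and ones; row k flags the units placed in subset k."""
    return [[1 if i in subset else 0 for i in range(n)] for subset in partition]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def total(m):
    """Sum of all entries, i.e. the variance of the sum of the components."""
    return sum(sum(row) for row in m)

# Hypothetical 4 x 4 variance matrix (integers keep the arithmetic exact).
V = [[9, 3, 2, 1],
     [3, 8, 2, 1],
     [2, 2, 9, 4],
     [1, 1, 4, 7]]

partitions = [
    [[0, 1], [2, 3]],
    [[0, 3], [1], [2]],
    [[0, 1, 2, 3]],
]
variances = []
for p in partitions:
    A = aggregation_matrix(p, 4)
    V_agg = matmul(matmul(A, V), transpose(A))   # variance matrix of the aggregates
    variances.append(total(V_agg))

print(total(V), variances)   # → 59 [59, 59, 59]
```

The equality holds because every column of an aggregation matrix of this form contains exactly one 1, so summing the aggregates counts each variance and covariance of the basic units exactly once.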
2.2 Multiple-Stage Aggregation
Matrices of the type displayed in Eq. (2) yield a compact representation of means and variances of basic unit sums induced by multiple levels of aggregation. For any fixed ordering of elements of \( {\mathbf{X}}_{N} = (X_{1} , \ldots ,X_{N} )^{t} \), the matrix in Eq. (2) maps \( {\mathbf{X}}_{N} \) into a vector of aggregates \( {\mathbf{S}}_{K} = (S_{1} , \ldots ,S_{K} )^{t} \). Call \( {\mathbf{A}} \) as in Eq. (2) \( {\mathbf{A}}^{(1)} \) so that \( {\mathbf{A}}^{(1)} {\mathbf{X}}_{N} = {\mathbf{S}}_{K} \).
This leads to:
3 Dependencies
How best to appraise probabilistic dependencies among basic mineral resource units is a recurring issue, from the first large-scale exercise in subjective geological assessment of basic mineral resources (Miller et al. 1975) to recent attempts. Authors of the Circum-Arctic study (Schuenemeyer and Gautier 2010) make it clear that probabilistic projections of oil and gas in this very large region are sensitive to variations in covariabilities of basic unit magnitudes. They point out that when 48 Circum-Arctic assessment units are aggregated, 90% uncertainty intervals for recoverable gas range from 1471 TCF, to 2009 TCF, to 3515 TCF for assumptions of independence, assessor-specified dependencies (correlations), and functional dependence of all units (Pearson correlation coefficient 1.0), respectively. Decision makers who rely on assessment results need accurate interval estimates. Too broad an interval provides little information; too narrow an interval gives a false sense of precision.
To keep the assessment task within bounds, geologists often limit appraisal of dependencies to pairwise correlations among \( X_{1} , \ldots ,X_{N} \) or among aggregates of them. In most realistic geological assessment exercises, pairwise correlations range from close to zero to close to 1.0 with large subsets of correlations in between. Several U.S. Geological Survey (USGS) studies (Collett 2008; Klett and Gautier 2009) state that zero pairwise correlation implies probabilistic independence of a pair of uncertain quantities and, at the opposite extreme, claim that assignment of a correlation of 1.0—often mislabeled as “perfect correlation”— allows computation of fractiles of a sum of all basic unit magnitudes by addition of basic unit fractiles. Neither statement is true in general.
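A small counterexample shows why zero correlation does not imply independence. The discrete distribution below is hypothetical and chosen only because its arithmetic is exact: \( X \) is uniform on \( \{ -1, 0, 1\} \) and \( Y = X^{2} \), so \( Y \) is a deterministic function of \( X \) and yet the covariance of the pair is zero.

```python
from fractions import Fraction

# Hypothetical discrete example: X is uniform on {-1, 0, 1} and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

mean_x = sum(p * x for x in support)
mean_y = sum(p * x**2 for x in support)
cov_xy = sum(p * (x - mean_x) * (x**2 - mean_y) for x in support)

# Joint probability versus product of marginals at the point (X = 0, Y = 0).
p_joint = p      # X = 0 forces Y = 0
p_x0 = p         # Prob{X = 0}
p_y0 = p         # Y = 0 occurs only when X = 0
print(cov_xy, p_joint, p_x0 * p_y0)   # → 0 1/3 1/9
```

The joint probability (1/3) differs from the product of the marginals (1/9), so the pair is dependent even though its correlation is zero.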
3.1 Data-Generating Process
3.2 Basic Unit Magnitude Correlation Structure
The correlation structure of \( {\mathbf{X}}_{N} = (X_{1} , \ldots ,X_{N} )^{t} \) and that of \( \ln {\mathbf{X}}_{N} = (\ln X_{1} , \ldots ,\ln X_{N} )^{t} = {\mathbf{Y}}_{N} \) demand attention. First, the dispersion of a sum of lognormal rvs is a function of sums of pairwise covariances among them and small variations in covariances can lead to large differences in dispersion of this sum. Second, while in theory personal probability judgments about basic unit magnitudes do not depend on whether elicitation is done in units of magnitude or units of logarithms of magnitude, in practice distinct choices of scale and function often lead to distinct probability judgments about unit magnitudes even when they should not.
3.3 Covariance Structure of Aggregates
Basic unit magnitude correlation matrix
Basic unit variance matrix
Standard deviations of \( S_{1} = X_{1} + \cdots + X_{7} \) and \( S_{2} = X_{8} + \cdots + X_{12} \) are 184 and 150, respectively, \( Cov(S_{1} ,S_{2} ) \approx \) 15,500 and \( Corr(S_{1} ,S_{2} ) \) = 0.562. The variance of the sum of all 12 basic unit magnitudes is 87,381 and its standard deviation is 296.
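These second moments of the two aggregates can be reproduced, up to rounding, from the basic unit standard deviations and correlations. The sketch below rests on two assumptions: that the correlation matrix assigns a common within-cluster correlation of 0.7 to units 1 to 7, 0.6 to units 8 to 12, and a background correlation of 0.4 between clusters (the values quoted in Sect. 5.3), and that the rounded standard deviations tabulated in Sect. 4 are adequate inputs. The helper names are hypothetical.

```python
import math

# Rounded standard deviations of the 12 unit magnitudes (table in Sect. 4).
sd = [25.9, 33.4, 32.9, 34.9, 28.0, 25.4, 33.4, 41.8, 33.4, 31.8, 33.4, 40.8]
clusters = [range(0, 7), range(7, 12)]
theta = [0.7, 0.6]   # assumed within-cluster correlations
rho = 0.4            # assumed background correlation

def corr(i, j):
    """Pairwise correlation between unit i and unit j under the assumed structure."""
    if i == j:
        return 1.0
    for k, c in enumerate(clusters):
        if i in c and j in c:
            return theta[k]
    return rho

def cov_of_sums(a, b):
    """Covariance between the sum over index set a and the sum over index set b."""
    return sum(corr(i, j) * sd[i] * sd[j] for i in a for j in b)

var1 = cov_of_sums(clusters[0], clusters[0])
var2 = cov_of_sums(clusters[1], clusters[1])
cov12 = cov_of_sums(clusters[0], clusters[1])
sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
corr12 = cov12 / (sd1 * sd2)
var_total = var1 + var2 + 2 * cov12
print(round(sd1), round(sd2), round(corr12, 3), round(var_total))
```

Under these assumptions the computed standard deviations and correlation agree with the quoted 184, 150 and 0.562, and the total variance differs from 87,381 only through rounding of the inputs.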
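Variance and correlation matrices for clusters {1,2,3,4}, {5,6,7}, {8,9,10}, {11,12}
Variance matrix
Cluster  {1,2,3,4}  {5,6,7}  {8,9,10}  {11,12}
{1,2,3,4}  11948  7188  4809  3389
{5,6,7}  7188  5787  3252  2352
{8,9,10}  4809  3252  7523  4376
{11,12}  3389  2352  4376  4278
Correlation matrix
Cluster  {1,2,3,4}  {5,6,7}  {8,9,10}  {11,12}
{1,2,3,4}  1  0.864  0.507  0.474
{5,6,7}  0.864  1  0.493  0.473
{8,9,10}  0.507  0.493  1  0.771
{11,12}  0.474  0.473  0.771  1
Variance and correlation matrices for clusters {1,2,3,4,5,6,7} and {8,9,10,11,12}
Variance matrix
Cluster  {1,2,3,4,5,6,7}  {8,9,10,11,12}
{1,2,3,4,5,6,7}  32112  13802
{8,9,10,11,12}  13802  20554
Correlation matrix
Cluster  {1,2,3,4,5,6,7}  {8,9,10,11,12}
{1,2,3,4,5,6,7}  1  0.537
{8,9,10,11,12}  0.537  1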
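The two-cluster matrices follow from the four-cluster matrices by one further stage of aggregation. The sketch below copies the four-cluster variance matrix from the table above and applies a second-stage aggregation matrix that merges the first two and the last two clusters; the function name `aggregate_variance` is hypothetical, and small differences from the tabulated two-cluster entries are attributed to rounding of the published values.

```python
import math

# Variance matrix of the four cluster sums, copied from the table above.
V4 = [[11948, 7188, 4809, 3389],
      [7188, 5787, 3252, 2352],
      [4809, 3252, 7523, 4376],
      [3389, 2352, 4376, 4278]]

# Second-stage aggregation matrix: clusters 1 and 2 merge, clusters 3 and 4 merge.
A2 = [[1, 1, 0, 0],
      [0, 0, 1, 1]]

def aggregate_variance(A, V):
    """Return A V A^t for matrices stored as lists of rows."""
    K, N = len(A), len(V)
    AV = [[sum(A[k][i] * V[i][j] for i in range(N)) for j in range(N)] for k in range(K)]
    return [[sum(AV[k][j] * A[m][j] for j in range(N)) for m in range(K)] for k in range(K)]

V2 = aggregate_variance(A2, V4)
corr = V2[0][1] / math.sqrt(V2[0][0] * V2[1][1])
total4 = sum(map(sum, V4))   # variance of the grand total from four clusters
total2 = sum(map(sum, V2))   # variance of the grand total from two clusters
print(V2, round(corr, 3), total4, total2)
# → [[32111, 13802], [13802, 20553]] 0.537 80268 80268
```

The variance of the grand total is the same at both stages, as Assertion 2 requires, and the correlation between the two merged sums matches the tabulated 0.537.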
4 Simulation
Properties of marginal lognormal distributions
Unit number  1  2  3  4  5  6  7  8  9  10  11  12
Mean  27.6  42.3  61.8  73.8  32.5  35.2  49.0  55.9  67.4  40.2  45.5  59.8 
SD  25.9  33.4  32.9  34.9  28.0  25.4  33.4  41.8  33.4  31.8  33.4  40.8 
Median  20.16  33.18  54.61  66.70  24.60  28.54  40.49  44.77  60.35  31.56  36.65  49.45 
Mode  10.73  20.43  42.59  54.51  14.11  18.77  27.65  28.70  48.44  19.43  23.80  33.77 
0.9 Fractile  56  81  103  119  64  65  89  105  110  77  85  109 
0.1 Fractile  9.0  17.8  39.7  51.5  12.1  16.7  24.8  25.3  45.5  17.0  21.1  30.3 
μ  3.004  3.502  4.000  4.200  3.203  3.351  3.701  3.801  4.100  3.452  3.601  3.901 
σ  0.794  0.696  0.499  0.449  0.745  0.647  0.618  0.667  0.469  0.696  0.657  0.618 
σ²  0.631  0.485  0.249  0.202  0.556  0.419  0.381  0.445  0.220  0.485  0.432  0.381
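Several rows of this table follow from \( \mu \) and \( \sigma \) through standard lognormal identities: mean \( e^{\mu + \sigma^{2}/2} \), median \( e^{\mu} \), mode \( e^{\mu - \sigma^{2}} \), standard deviation equal to the mean times \( \sqrt{e^{\sigma^{2}} - 1} \), and upper fractile \( e^{\mu + z\sigma} \) with \( z \) the corresponding standard normal fractile. The sketch below applies them to unit 1; the function name `lognormal_summary` is hypothetical, and agreement with the table is only to the precision allowed by the rounded parameters.

```python
import math
from statistics import NormalDist

def lognormal_summary(mu, sigma):
    """Summary statistics of a lognormal rv whose logarithm is Normal(mu, sigma**2)."""
    z90 = NormalDist().inv_cdf(0.9)   # standard normal 0.9 fractile, about 1.2816
    mean = math.exp(mu + sigma**2 / 2)
    return {
        "mean": mean,
        "sd": mean * math.sqrt(math.exp(sigma**2) - 1),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma**2),
        "fractile_0.9": math.exp(mu + z90 * sigma),
    }

unit1 = lognormal_summary(3.004, 0.794)   # mu and sigma of unit 1 from the table
print({k: round(v, 2) for k, v in unit1.items()})
```

For unit 1 the computed mean, standard deviation, median, mode and 0.9 fractile agree with the tabulated 27.6, 25.9, 20.16, 10.73 and 56 to within rounding.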
4.1 Numerical Aggregation
4.2 Approximate Lognormality
5 Aggregation and Correlation
Do pairwise correlations between sums of basic unit magnitudes increase, decrease or stay the same as basic units are aggregated into smaller numbers of larger and larger sets? Is there an ordinal ordering of pairwise correlations among these sums as the number of elements in them increases? Answers to both questions are “No” in general. However, variance matrices structured as in the USGS Circum-Arctic study and in Table 1 lead to useful inequalities between pairwise correlations among individual unit magnitudes and correlations between sums of magnitudes. Partition the set of all basic units into two distinct subsets (clusters) \( A_{1} \) and \( A_{2} \) chosen so that the magnitude of any unit in \( A_{1} \) and that of any unit in \( A_{2} \) possess identical pairwise correlation. Section 5.2 provides a proof that, for positive background correlations, the pairwise correlation between the sum of unit magnitudes in \( A_{1} \) and the sum of unit magnitudes in \( A_{2} \) is uniformly larger than the common (background) correlation assigned to two individual units in distinct clusters. Section 5.1 sets the stage with presentation of properties of Schur complements used to show that, as the number of elements in \( A_{1} \) and the number of elements in \( A_{2} \) increase in accord with a uniform asymptotic regime described in Sect. 5.3, the pairwise correlation between \( A_{1} \) and \( A_{2} \) sums approaches a limit prescribed by a function of weighted averages of within-cluster correlations.
5.1 Schur Complements
Here, elements of \( {\mathbf{X}}_{1} \) are interpretable as magnitudes of a cluster of geologically similar units for which geologists provide sufficient information to pin down numerical values for components of \( {\mathbf{V}}_{11} \). Interpret \( {\mathbf{X}}_{2} \) and \( {\mathbf{V}}_{22} \) similarly. According to Schuenemeyer and Gautier (2010), correlations between two basic unit magnitudes lying in distinct clusters are not easy to pin down and the number of them in their study is large. To limit complexity, they assume that almost all pairwise correlations between two units in distinct clusters share a common value and call each such correlation “background correlation”. Table 1 is a simple example in which pairwise correlations between basic unit magnitudes within each of two distinct clusters share a common value. Assertion 4 below documents how allowable values of background correlation depend on variance matrices assigned to clusters.
A version of the following assertion appears in Kaufman (2016) along with tighter but more recondite inequalities for patterned variance matrices.
(1) The sum of elements of the inverse of any PDS correlation matrix is strictly greater than one.
(2) Consider the correlation matrix \( \left[ {\begin{array}{*{20}c} {{\mathbf{A}}_{11} } & {{\mathbf{A}}_{12} } \\ {{\mathbf{A}}_{21} } & {{\mathbf{A}}_{22} } \\ \end{array} } \right] \) associated with \( {\mathbf{V}} \) as in (17) when the \( (n \times m) \) block \( {\mathbf{A}}_{12} = {\mathbf{1}}_{n} {\mathbf{1}}_{m}^{t} \times \rho \) with \( m = N - n \), and \( {\mathbf{A}}_{11} ,\,{\mathbf{A}}_{22} > {\mathbf{0}} \) (positive definite). Define \( g_{i} \) to be the sum of elements of \( {\mathbf{A}}_{ii}^{ - 1} \). Then \( {\mathbf{V}} \) is positive definite symmetric if and only if \( \rho^{2} g_{1} g_{2} < 1 \).
(a) An \( (N \times N) \) intraclass correlation matrix with correlation coefficient \( \theta \) is PDS if and only if \( - \frac{1}{N - 1} < \theta < 1 \).
(b) The sum of elements of the inverse of an \( (N \times N) \) intraclass correlation matrix with correlation coefficient \( \theta \) is \( N/(1 + (N - 1)\theta ) \).
The sum of elements of the inverse of the green matrix in Table 1 is \( g_{1} = 1.346 \) and the sum of elements of the inverse of the blue matrix is \( g_{2} = 1.471 \). Pairwise background correlation \( \rho \) between clusters is restricted to lie in \( ( - 0.711,0.711) \). If the green matrix is replaced with a \( (7 \times 7) \) identity matrix and the blue matrix is replaced with a \( (5 \times 5) \) identity matrix, then \( \rho \) is restricted to lie in \( ( - 0.169,0.169) \).
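These values can be reproduced from the closed form for the sum of elements of the inverse of an intraclass correlation matrix. The sketch below makes two assumptions: that the green and blue matrices are \( (7 \times 7) \) and \( (5 \times 5) \) intraclass matrices with coefficients 0.7 and 0.6 (the values quoted in Sect. 5.3), and that the two-cluster positive definiteness condition is \( \rho^{2} g_{1} g_{2} < 1 \), which is the Schur complement condition for a block matrix of this form. Function names are hypothetical.

```python
import math

def g_intraclass(n, theta):
    """Sum of elements of the inverse of an n x n intraclass correlation matrix."""
    return n / (1 + (n - 1) * theta)

def background_bound(g1, g2):
    """Assumed two-cluster condition rho**2 * g1 * g2 < 1, i.e. |rho| < 1/sqrt(g1*g2)."""
    return 1 / math.sqrt(g1 * g2)

g1 = g_intraclass(7, 0.7)      # 7-unit cluster
g2 = g_intraclass(5, 0.6)      # 5-unit cluster
print(round(g1, 3), round(g2, 3), round(background_bound(g1, g2), 3))
# → 1.346 1.471 0.711

# Identity blocks correspond to theta = 0 within each cluster.
print(round(background_bound(g_intraclass(7, 0.0), g_intraclass(5, 0.0)), 3))
# → 0.169
```

Stronger positive correlation within clusters lowers \( g_{1} \) and \( g_{2} \) and therefore widens the interval of admissible background correlations.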
5.2 Background Correlation
Interpret \( {\mathbf{V}}_{ii} \) as the variance matrix assigned to magnitudes of geologically similar basic units assigned to the \( i^{th} \) cluster. In (21) \( {\mathbf{V}}_{ij} \) is the covariance of pairs of elements, one in cluster \( i \) and the other in cluster \( j,\,i \ne j \).
The Circum-Arctic study employs a version of (25) with a small number of off-block-diagonal correlations assigned values different from \( \rho \). In order for \( {\mathbf{C}} \) as in (25) to be PDS, the correlation coefficient \( \rho \) must lie in an interval \( (\rho^{ - } ,\rho^{ + } ) \) with \( \rho^{ - } \) the largest negative root and \( \rho^{ + } \) the smallest positive root of a polynomial \( P(\rho ) \) of degree \( K \) whose coefficients are composed of elementary symmetric functions of \( g_{1} , \ldots ,g_{K} \) with \( g_{k} \) the sum of elements of \( {\mathbf{C}}_{kk}^{ - 1} \). Alternatively, on applying a similarity transform to \( {\mathbf{C}} \) that maps it into block diagonal form, \( \rho^{ - } \) is the largest negative eigenvalue and \( \rho^{ + } \) is the smallest positive eigenvalue of a \( (K \times K) \) matrix appearing on the diagonal of transformed \( {\mathbf{C}} \) (Kaufman 2016).
5.3 Aggregation of Clusters
Assume that pairwise correlations between elements of \( {\mathbf{X}}_{1} \) and elements of \( {\mathbf{X}}_{2} \) are \( {\mathbf{C}}_{12} = {\mathbf{1}}_{1} {\mathbf{1}}_{2}^{t} \rho \) as in Eq. (25).
(a) The pairwise correlation of \( S_{1} \) and \( S_{2} \) is
(b) If variances \( v_{ii} \) are bounded away from zero and are finite, background correlation \( \rho \) is less than the geometric mean of \( \bar{c}_{n} \) and \( \overline{\overline{c}}_{N - n} \)
If pairwise correlations within clusters are bounded away from zero, then \( \bar{c}_{n} \) and \( \overline{\overline{c}}_{N - n} \) are both \( O(1) \). If Eq. (20) implies that within-cluster correlations are \( O(\frac{1}{N}) \), then \( \bar{c}_{n} \) and \( \overline{\overline{c}}_{N - n} \) are \( O(\frac{1}{N}) \).
Proof
The allowable range of \( \theta_{1} \) is \( ( - \frac{1}{n - 1},1) \) and that of \( \theta_{2} \) is \( ( - \frac{1}{N - n - 1},1) \). For \( \theta_{1} \) and \( \theta_{2} \) in their allowable ranges, \( \theta_{1} + \frac{{1 - \theta_{1} }}{n} \) and \( \theta_{2} + \frac{{1 - \theta_{2} }}{N - n} \) are positive. Because the sum of elements of the inverse of the green matrix in Table 1 is \( g_{1} = n/(1 + (n - 1)\theta_{1} ) \) and the sum of elements of the inverse of the blue matrix is \( g_{2} = (N - n)/(1 + (N - n - 1)\theta_{2} ) \), Assertion 4 says that the correlation coefficient \( \rho \,( > 0) \) must be less than the denominator in (43) in order for \( Var({\mathbf{X}}) \) with \( {\mathbf{X}} = ({\mathbf{X}}_{1}^{t} ,{\mathbf{X}}_{2}^{t} ) \) to be positive definite. (The Cauchy-Schwarz inequality says the same.) In this example, \( n = 7,N - n = 5, \) \( \bar{c}_{n} = \theta_{1} = 0.7 \), \( \overline{\overline{c}}_{N - n} = \theta_{2} = 0.6 \), \( \overline{{f_{n} }} = 1/n = 1/7 \) and \( \overline{\overline{f}}_{N - n} = 1/(N - n) = 1/5 \). For \( \rho = 0.4 \), \( Corr(S_{1} ,S_{2} ) = 0.563 \). The allowable range of \( \rho \) is \( ( - 0.711,0.711) \).
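The numerical example can be checked with a short calculation. The sketch below assumes intraclass correlation within each cluster and a common variance within each cluster, in which case \( Var(S_{1} ) \) is proportional to \( n^{2} (\theta_{1} + (1 - \theta_{1} )/n) \) and the correlation of the two sums reduces to \( \rho \) divided by the square root of the product of the two positive quantities named above. The function name `corr_of_sums` and the scaling loop are hypothetical illustrations.

```python
import math

def corr_of_sums(rho, theta1, n1, theta2, n2):
    """Corr(S1, S2) for two intraclass clusters, each with a common unit variance."""
    d1 = theta1 + (1 - theta1) / n1
    d2 = theta2 + (1 - theta2) / n2
    return rho / math.sqrt(d1 * d2)

rho, theta1, theta2 = 0.4, 0.7, 0.6
c = corr_of_sums(rho, theta1, 7, theta2, 5)
limit = rho / math.sqrt(theta1 * theta2)   # both cluster sizes grow without bound
print(round(c, 3), round(limit, 3))        # → 0.563 0.617

# Correlation of the sums as both clusters grow in the same proportion.
for scale in (1, 10, 100):
    print(scale, round(corr_of_sums(rho, theta1, 7 * scale, theta2, 5 * scale), 4))
```

In this setting the correlation of the sums exceeds the background correlation of 0.4 and rises toward \( \rho /\sqrt {\theta_{1} \theta_{2} } \) as the clusters grow.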
Consider an alternative partition of the 12 basic units in Table 1 into two subsets with labels \( \{ 1,2,3,4,5,6,7,8,9\} \) and \( \{ 10,11,12\} \). This partition “splits” clusters in such a way that common background correlations of 0.4 in Table 1 appear in the correlation matrix for units labelled \( \{ 1,2,3,4,5,6,7,8,9\} \) along with correlations of 0.7. Pairwise correlations between elements of \( \{ 10,11,12\} \) and elements of \( \{ 1,2,3,4,5,6,7,8,9\} \) are no longer identical, so \( S_{1}^{*} = X_{1} + X_{2} + \cdots + X_{9} \) and \( S_{2}^{*} = X_{10} + X_{11} + X_{12} \) are not block diagonal aggregates. In this particular case, the pairwise correlation of \( S_{1}^{*} \) and \( S_{2}^{*} \) is 0.635, substantially greater than the background correlation of 0.4. In general, when a partition of basic unit magnitudes splits clusters possessing common background correlations as in the above example, the pairwise correlation between resulting sums can be greater than, equal to or less than background correlation.
6 Assessment Tradeoffs
6.1 Parsimony
Direct assessment of second moments of N unit magnitudes requires appraisal of \( N \) variances and \( \frac{1}{2}N(N - 1) \) pairwise correlations. Partitioning basic units into a small number of clusters and directly assessing correlations between sums of basic unit magnitudes in each cluster (multiple-stage aggregation) is an attractive alternative because it modulates the tyranny of large numbers. An example is direct assessment of correlations between pairs of sums of oil equivalents in each of several petroleum plays instead of between individual prospects and accumulations. To a geologist assigned the task of assessing covariability among basic unit magnitudes, this sounds like a magically simple recipe! The number of pairwise correlation coefficients decreases at the expense of requiring subjective appraisal of covariability of sums of oil equivalents. Aggregation of \( X_{1} , \ldots ,X_{N} \) to \( S_{1} , \ldots ,S_{K} \), \( K < N \), requires specification of \( K(K - 1)/2 \) pairwise correlation coefficients and \( K \) variances. The reduction in number of parameters to assess is beguiling! For example, a \( (12 \times 12) \) variance matrix of basic unit magnitudes \( X_{1} , \ldots ,X_{12} \) requires specification of 12 variances and 66 pairwise covariances. If \( X_{1} , \ldots ,X_{12} \) are partitioned into two subsets \( \{ X_{1} , \ldots ,X_{7} \} \) and \( \{ X_{8} , \ldots ,X_{12} \} \), the variance matrix for sums \( S_{1} = X_{1} + \cdots + X_{7} \) and \( S_{2} = X_{8} + \cdots + X_{12} \) possesses only three parameters. Asking a geologist to assess \( Var(S_{1} ),Var(S_{2} ) \) and \( Cov(S_{1} ,S_{2} ) \) in place of parameters of the variance matrix of \( X_{1} , \ldots ,X_{12} \) greatly reduces the assessment burden, but shifts focus away from properties of basic unit magnitudes.
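The parameter counts in this paragraph follow from a one-line formula. The short sketch below, with the hypothetical helper `second_moment_parameters`, tabulates them for the 12 basic units and for the two cluster sums.

```python
def second_moment_parameters(n):
    """Number of variances and of pairwise correlations for n uncertain quantities."""
    return n, n * (n - 1) // 2

for label, n in (("basic units", 12), ("two cluster sums", 2)):
    variances, correlations = second_moment_parameters(n)
    print(label, variances, correlations, variances + correlations)
# → basic units 12 66 78
# → two cluster sums 2 1 3
```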
A modeling tactic that trims the number of parameters to assess is to assign common pairwise correlation of oil equivalent magnitudes to members of each petroleum play in a sample frame and to specify a common background correlation between oil equivalent magnitudes in distinct plays—as in Table 1 for example.
Because variances and covariances of aggregates of \( X_{1} , \ldots ,X_{N} \) are functionally dependent on all variances and covariances of \( X_{1} , \ldots ,X_{N} \), the introduction of multiple levels of aggregation reduces the number of parameters to assess but increases the number of constraints on second moments of aggregates. Assessment schemes must take these features of aggregates into account.
6.2 Elicitation of Dependencies and Correlation
When measured data with which to estimate oil and gas depositional model parameters are not available, the only way to proceed is to elicit geologists’ judgments about parameters and dependencies (Meyer and Booker 2001; O’Hagan et al. 2006; Delfiner and Barrier 2008; Daneshkhah and Oakley 2010). Geological analogy (a qualitative measure of similarity) plays an important role here. The choice of analogy is highly subjective, adding a layer of complexity to the assessment process. Team effects occur often: correlations among basic units in distinct areas assessed by a particular team are often larger than correlations between units assessed by that team and units assessed by a different team. In addition to these assessment issues, subjective appraisal of pairwise covariability by elicitation of judgments about pairwise correlations deserves particular attention. Appraisal of the impact of subjective assessment biases is important. If basic units are probabilistically dependent, assessment error at one unit can propagate to assessment errors at other units.
The Pearson correlation coefficient \( - 1 < \rho < 1 \) is a measure of the strength of linear association between two uncertain quantities. Although it can be computed for any pair of uncertain quantities whose joint distribution is known, an estimate of it computed from observed data is neither robust nor resistant to outliers (Wilcox 2016) and, if not carefully interpreted, can be misleading (Anscombe 1973). Here, pairwise correlations are not estimated from data, so estimation robustness is not an issue. However, use of pairwise correlation as a measure of dependence of one random variable \( Y \) on another random variable \( X \) can be misleading in other ways. How to interpret the meaning of \( \rho \) depends on the particular joint probability law governing \( X \) and \( Y \). If \( X \) and \( Y \) are bivariate normal, the expectation \( E(Y \mid X) \) of \( Y \) given \( X \) is a linear function of \( X \), so \( \rho \) is a sensible measure of the elasticity (variation) of \( Y \) with respect to \( X \) as well as of the dispersion of \( Y \) around the regression line \( E(Y \mid X) = a + bX \). If \( X \) and \( Y \) are bivariate lognormal, \( E(Y \mid X) \) is no longer a linear function of the pairwise correlation of \( X \) and \( Y \). More generally, the pairwise correlation between functions of two bivariate normal rvs is not a robust measure of dependency.
It is more natural for geologists to think about how the magnitude \( X_{i} \) of basic unit i varies with variations in \( X_{j} \) rather than how \( \ln X_{i} \) varies as \( \ln X_{j} \) varies. For \( i \ne j \), suppose that the \( (i,j)^{th} \) element \( c_{ij} \) of \( {\mathbf{C}} \) is the pairwise correlation between basic unit magnitudes. In general, \( Corr(X_{i} ,X_{j} ) \) is not equal to \( Corr(\ln X_{i} ,\ln X_{j} ) \). However, when \( {\mathbf{X}}_{N} \) is multivariate lognormal with \( Var(\ln X_{i} ) = \sigma_{ii}^{2} ,\,i = 1, \ldots ,N \) and \( Corr(\ln X_{i} ,\ln X_{j} ) = r \) fixed, for small \( \sigma_{ii}^{2} ,\,i = 1, \ldots ,N \),\( Corr(X_{i} ,X_{j} ) \approx r \). A protocol designed to elicit geologists’ judgments about degrees of dependencies among basic unit magnitudes assumed to be lognormal must take into account these facts. A review of some analytical methods for modeling dependencies that go beyond pairwise correlation and its cousins appears in Kaufman (2018).
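The relation between the two correlation scales can be made explicit. For a bivariate lognormal pair, a standard identity gives \( Corr(X_{i} ,X_{j} ) = (e^{{r\sigma_{i} \sigma_{j} }} - 1)/\sqrt {(e^{{\sigma_{i}^{2} }} - 1)(e^{{\sigma_{j}^{2} }} - 1)} \), where \( \sigma_{i} \) and \( \sigma_{j} \) are the standard deviations of the logarithms. The sketch below evaluates it for small log-standard deviations and for the \( \sigma \) values tabulated for units 1 and 4 in Sect. 4; the function name and the value \( r = 0.7 \) are assumptions made for illustration.

```python
import math

def lognormal_corr(r, sigma_i, sigma_j):
    """Corr(X_i, X_j) when (ln X_i, ln X_j) is bivariate normal with correlation r."""
    num = math.exp(r * sigma_i * sigma_j) - 1
    den = math.sqrt((math.exp(sigma_i**2) - 1) * (math.exp(sigma_j**2) - 1))
    return num / den

r = 0.7
print(round(lognormal_corr(r, 0.05, 0.05), 3))      # small log-variances: close to r
print(round(lognormal_corr(r, 0.794, 0.449), 3))    # sigma values of units 1 and 4
print(round(lognormal_corr(1.0, 0.794, 0.449), 3))  # r = 1 with unequal sigmas
```

With unequal log-standard deviations, the correlation of magnitudes stays below 1 even when the logarithms are perfectly correlated, which is one reason an elicited correlation must be tied to a stated scale.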
7 Conclusions
Any method of aggregation of basic unit magnitudes obeying the rules of probability leads to the same distribution of the sum of all unit magnitudes.
If not carefully policed, personal (subjective) judgments by geologists elicited at distinct levels of aggregation may or may not be coherent and may or may not lead to a distribution for the sum of all basic unit magnitudes identical to that computed by direct summation of all of them. This is an implementation, not a mathematical, problem. Multiple-stage aggregation requires fewer judgmental assessments about fewer parameters. The tradeoff is that multiple-stage aggregation directs geologists’ subjective probability assessments away from primitive geological attributes underpinning properties of basic unit magnitudes.
Probabilistic aggregation of resources that incorporates expert judgment is coherent if and only if judgments adhere to the rules of probability. Many issues that plague assessment practice remain to be studied and resolved.
Acknowledgments
The U. S. Geological Survey requires a preliminary internal review before any paper can be published in a scientific journal (http://pubs.usgs.gov/circ/1367/). We wish to thank Emil Attanasi, David Root and Peter Warwick for their insightful suggestions.
References
Anscombe FJ (1973) Graphics in statistical analysis. Am Stat 27(1):17–21
Blondes MS, Brennan ST, Merrill MD, Buursink ML, Warwick PD, Cahan SM, Cook TA, Corum MD, Craddock WH, DeVera CA, Drake RM, Drew LJ, Freeman PA, Lohr CD, Olea RA, Roberts-Ashby TL, Slucher R, Varela BA (2013a) National assessment of geologic carbon dioxide storage resources - methodology implementation: U.S. Geological Survey Open-File Report 2013-1055, p 26. http://pubs.usgs.gov/of/2013/1055/OF131055.pdf
Blondes MS, Schuenemeyer JH, Drew LJ, Warwick PD (2013b) Probabilistic aggregation of individual assessment units in the U.S. Geological Survey national CO_{2} sequestration assessment. Energy Proc 37:5110–5117
Blondes MS, Schuenemeyer JH, Olea RA, Drew LJ (2013c) Aggregation of carbon dioxide sequestration storage assessment units. Stoch Environ Res Risk Assess 27:1839–1859
Carter PJ, Morales E (1998) Probabilistic addition of gas reserves within a major gas project. In: Paper presented at the Society of Petroleum Engineers Asia Pacific Oil and Gas Conference and Exhibition, p 8. SPE paper 50113
Collett T (2008) Assessment of gas hydrates on the North Slope, Alaska, 2008. U.S. Geological Survey Fact Sheet 2008-3073, p 4. https://pubs.usgs.gov/fs/2008/3073/pdf/FS083073_508.pdf
Crovelli RA, Balay RH (1991) A microcomputer program for energy assessment and aggregation using the triangular probability distribution. Comput Geosci 17(2):197–225
Daneshkhah A, Oakley JE (2010) Eliciting multivariate probability distributions. In: Böcker K (ed) Rethinking Risk Measurement and Reporting, vol I. Risk Books, London
Delfiner P, Barrier R (2008) Partial probabilistic addition: a practical approach for aggregating resources. SPE Reserv Eval Eng 11(2):379–386
Kaufman GM (2016) Generalizations of intraclass correlation matrices (unpublished working paper)
Kaufman GM (2018) Properties of sums of geologic random variables. In: Daya Sagar BS, Cheng Q, Agterberg F (eds) Handbook of Mathematical Geosciences: fifty years of IAMG, Chapter 5. 50th Anniversary volume (forthcoming)
Klett TR, Gautier DL (2009) Assessment of undiscovered petroleum resources of the Barents Sea. U.S. Geological Survey Fact Sheet 2009-3037, p 4. http://pubs.usgs.gov/fs/2009/3037/pdf/FS093037.pdf
Meyer MA, Booker JM (2001) Eliciting and analyzing expert judgement: a practical guide. ASA-SIAM Series on Statistics and Applied Probability, Alexandria, VA, p 459
Miller BM, Thomsen HL, Dolton GL, Coury AB, Hendricks TA, Lennartz FE, Powers R, Sable EG, Varnes KI (1975) Geological estimates of undiscovered oil and gas resources in the United States. United States Geological Survey Circular 725, p 78, 3 maps
O’Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ, Oakley JE, Rakow T (2006) Uncertain judgements: eliciting experts’ probabilities. Wiley, Chichester, p 321
Pike R (2008) How much oil is really there? Making correct statistics bring reality to global planning. Significance 5:149–152
Schuenemeyer JH (2005) Methodology for the 2005 USGS assessment of undiscovered oil and gas resources, Central North Slope, Alaska. U.S. Geological Survey Open-File Report 2005-1410, p 82. https://pubs.usgs.gov/of/2005/1410/of20051410.pdf
Schuenemeyer JH, Gautier DL (2010) Aggregation methodology for the Circum-Arctic resource appraisal. Math Geosci 42(5):583–594
U.S. Geological Survey Geologic Carbon Dioxide Storage Resources Assessment Team (2013) National Assessment of Geologic Carbon Dioxide Storage Resources - Results: U.S. Geological Survey Circular 1386, p 41. http://pubs.usgs.gov/circ/1386/pdf/circular1386.pdf
Van Elk JF, Gupta R (2010) Probabilistic aggregation of oil and gas field resource estimates and project portfolio analysis. SPE Reserv Eval Eng 13(1):72–81
Wilcox RR (2016) Introduction to robust estimation and hypothesis testing, 4th edn. Academic Press, Cambridge, p 786