# Fast and Accurate Approximation to Kriging Using Common Data Neighborhoods

## Abstract

Unknown values of a random field can be predicted from observed data using kriging. As data sets grow in size, the computation times become large. To facilitate kriging with large data sets, an approximation where the kriging is performed in sub-segments with common data neighborhoods has been developed. It is shown how the accuracy of the approximation can be controlled by increasing the common data neighborhood. For four different variograms, it is shown how large the data neighborhoods must be to get an accuracy below a chosen threshold, and how much faster these calculations are compared to the kriging where all data are used. Provided that variogram ranges are small compared to the domain of interest, kriging with common data neighborhoods provides excellent speed-ups (2–40) while maintaining high numerical accuracy. Results are presented both for data neighborhoods where the neighborhoods are the same for all sub-segments, and data neighborhoods where the neighborhoods are adapted to fit the data densities around the sub-segments. Kriging in sub-segments with common data neighborhoods is well suited for parallelization and the speed-up is almost linear in the number of threads. A comparison is made to the widely used moving neighborhood approach. It is demonstrated that the accuracy of the moving neighborhood approach can be poor and that computational speed can be slow compared to kriging with common data neighborhoods.

### Keywords

Kriging · Fast approximation · Sub-segments · Data neighborhood · Parallelization

## 1 Introduction

Kriging is a method for predicting values of a random field at unobserved locations based on observed data scattered in space. Kriging requires the solution of a linear equation system whose size equals the number of observed data covered by the region or domain of interest. For large data sets, the computational cost becomes large and numerical instabilities may occur. This motivates the use of data subsets, that is, local neighborhoods of data relevant for the targeted kriging locations. One local method is the so-called moving neighborhood, a geometrically defined neighborhood that moves with the target location. Several approaches investigate a suitable balance between near and far observed data to be included in neighborhoods (Cressie 1993; Chilès and Delfiner 1999; Emery 2009).

This paper presents an approximation to kriging using common data neighborhoods, which is an extension of the methodology introduced in Vigsnes et al. (2015). In Sect. 2, kriging and its computational steps are presented and in Sect. 3 common data neighborhoods and sub-segments are introduced. Section 4 considers the accuracy of the approximation and discusses the relationship between computation time and accuracy. The methodology is furthermore extended to adaptive data neighborhoods in Sect. 5. The kriging approximation is well suited for parallelization and results demonstrating this potential are included in Sect. 6. Finally, a generalization to other forms of kriging and the prediction error is discussed in Sect. 7.

## 2 Kriging

Consider a regular grid in a hyperrectangle (orthotope) \(\mathcal {D}\) in \(\mathbb {R}^d\). Assume that the grid covers \(\mathcal {D}\) and contains *N* grid nodes. The objective is to predict a random field \(z(\mathbf {x})\) at each of the *N* grid nodes given *n* observations.

Here \(\mathbf {z}\) denotes the *n*-dimensional vector of observed values and \(\mathbf {m}\) the *n*-dimensional vector containing \(m(\mathbf {x})\) at the observation locations.

### 2.1 Computation Time

The four main steps in solving Eq. (2) are given by the following algorithm.

1. Assemble \(\mathbf {K}\).
2. Cholesky factorize \(\mathbf {K}\).
3. Solve for the weights: \(\mathbf {w}= \mathbf {K}^{-1}\, (\mathbf {z}-\mathbf {m})\).
4. Calculate \(z^{*}(\mathbf {x}) = m(\mathbf {x}) +\mathbf {k}'(\mathbf {x}) \cdot \mathbf {w}\) for every grid node.

The *T*’s are time constants that depend on hardware and implementation.

The two bottlenecks are steps 2 and 4. Step 2 becomes a bottleneck for large data sets, and step 4 when the number of grid nodes, *N*, is huge. To limit the computation time, *n* or *N* must be kept small. Keeping *N* small is trivial but uninteresting, since *N* is usually chosen as small as possible while still retaining an acceptable spatial resolution for \(z^{*}(\mathbf {x})\). Reducing *n* means leaving out observations, which usually entails a loss of information that can affect the quality of the result.
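As a concrete illustration, the four steps of the Kriging Algorithm can be sketched in Python/NumPy for simple kriging. The exponential covariance, the function names, and the small nugget added for numerical stability are illustrative choices, not specifics from the paper:

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=1.0):
    """Exponential covariance derived from the variogram: C(h) = sill * exp(-3h/range)."""
    return sill * np.exp(-3.0 * h / rng)

def simple_kriging(obs_xy, obs_z, grid_xy, mean=0.0, sill=1.0, rng=1.0):
    # Step 1: assemble the n x n data covariance matrix K.
    d = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    K = exp_cov(d, sill, rng)
    # Step 2: Cholesky factorize K (tiny nugget added for numerical stability).
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(obs_z)))
    # Step 3: solve for the weights w = K^{-1} (z - m) via two triangular solves.
    w = np.linalg.solve(L.T, np.linalg.solve(L, obs_z - mean))
    # Step 4: predict z*(x) = m + k'(x) . w at every grid node.
    d0 = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    return mean + exp_cov(d0, sill, rng) @ w
```

Steps 2 and 4 dominate for exactly the reasons stated above: the factorization scales with \(n^3\) and step 4 with \(N \cdot n\).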

## 3 Kriging in Data Neighborhoods

A common approach to limiting the number of data, *n*, is to use a moving neighborhood. This means that a subset of the data close to a grid node \(\mathbf {x}\) is chosen when predicting \(z(\mathbf {x})\). The number of data in the neighborhoods is usually chosen quite small (\({<}100\)) (Emery 2009). The downside of this approach is that all four steps in solving Eq. (2) must be carried out for every grid node, allowing for little reuse of computed vectors and matrices. This potentially makes the computation time very long for large grids, although the approach is highly parallelizable.

### 3.1 Common Data Neighborhoods

Instead of looking at grid node specific data neighborhoods, \(\mathcal {D}\) is divided into sub-segments where each sub-segment shares a common data neighborhood. This way the location-independent weights, \(\mathbf {w}\) in Eq. (3), can be reused for all the grid nodes inside the sub-segment. For simplicity, equally sized sub-segments \(\mathcal {D}_i\), hyper-rectangles in \(\mathbb {R}^d\), are used. The sub-segments divide \(\mathcal {D}\) into *M* disjoint sets.

Here *S* and *P* are dimensionless constants. The value of *P* gives the extension of the data neighborhood beyond the sub-segment, and determines the overlap between common data neighborhoods. The number of sub-segments, *M*, and the average number of data in a common data neighborhood, \(\bar{n}\), can be expressed as functions of the sub-segment size *S*, for a given overlap *P*, independently of the number of grid cells *N* and the number of data *n*. The sub-segments along the edges of \(\mathcal {D}\) will, on average, have fewer than \(n_R\) data points in the data neighborhood, so Eq. (6) over-estimates the true computation time. This over-estimation becomes more pronounced as variogram ranges increase.
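The construction of sub-segments and their common data neighborhoods can be sketched as follows. This is a hypothetical helper assuming *S* and *P* are measured in units of the variogram range; the paper's exact convention may differ:

```python
import numpy as np

def common_neighborhoods(data_xy, domain, S, P, vario_range):
    """Split a 2-D rectangular domain into sub-segments of side S*range and
    attach to each the indices of all data within P*range of the sub-segment
    (illustrative sketch; units of S and P are an assumption)."""
    (x0, x1), (y0, y1) = domain
    side = S * vario_range
    ext = P * vario_range
    nx = max(1, int(np.ceil((x1 - x0) / side)))
    ny = max(1, int(np.ceil((y1 - y0) / side)))
    segments = []
    for i in range(nx):
        for j in range(ny):
            lo = np.array([x0 + i * side, y0 + j * side])
            hi = np.minimum(lo + side, [x1, y1])
            # Common data neighborhood: sub-segment box grown by `ext` on every side.
            inside = np.all((data_xy >= lo - ext) & (data_xy <= hi + ext), axis=1)
            segments.append((lo, hi, np.flatnonzero(inside)))
    return segments
```

One kriging system is then solved per segment and its weights reused for every grid node inside that segment.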

## 4 Finding Optimal Sub-segment Size and Common Data Neighborhoods

### 4.1 Controlling the Accuracy

The choice of the overlap *P* is a compromise between acceptable accuracy and acceptable computation time. This choice is independent of, and valid for, any sub-segment size *S*. For a given overlap *P*, the computation time is optimized by selecting the optimal sub-segment size *S*.

For a sufficiently large overlap *P*, all data belong to all common data neighborhoods and the maximum relative approximation error is zero. For smaller overlaps, the maximum relative approximation error depends on the variogram type and the data density, \(n_R\). The empirical maximum relative approximation error for a given overlap *P* is found by comparing kriging using common data neighborhoods with kriging using all data. Kriging is repeated 100 times using simulated sets of observation values. An estimate of the maximum relative approximation error is obtained by taking the largest error over the 100 samples. Details are given in Appendix A.

Figure 2 shows the empirical relationship between the overlap *P* and the maximum relative approximation error for four variograms and three data densities of 5, 45 and 80. The three data densities correspond to correlation ranges 50, 150 and 200, respectively. For all variograms, an approximate log-linear relationship between the overlap *P* and the maximum relative approximation error can be observed. The maximum relative approximation error also appears independent of the data density \(n_R\) for the spherical variogram, whereas for the exponential variograms the error increases with decreasing data density. Figure 2 shows that the common data neighborhoods must extend significantly beyond one range to get a maximum relative approximation error as low as 1%. For the spherical variogram, for instance, the neighborhood must extend more than 3 ranges, which may seem counter-intuitive for a variogram that has a finite range. The reason is the relay effect (Chilès and Delfiner 1999), which is particularly strong for the spherical variogram.

Figure 3 shows examples of error maps \((z^{*}(\mathbf {x}) - z^{*}_{\text {CDN}}(\mathbf {x}) )/\sigma \) for two different variograms and three different values of overlap *P*. As the overlap increases, the error in regions with sparse data successively decreases. When increasing the overlap by 0.5 for the general exponential variogram, the maximum relative approximation error decreases by approximately one order of magnitude. To obtain the same decrease in maximum relative approximation error for the spherical variogram, the overlap has to be increased by 1. This is consistent with the steeper slope observed for the exponential variogram in Fig. 2.

### 4.2 Minimizing the Computation Time

For a given overlap *P*, the sub-segment size *S* that minimizes the computation time can be found by a one-dimensional minimization of Eq. (10). A simple half-interval search (binary search) has proved very effective.
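A minimal sketch of such a half-interval search, assuming only that the cost model (such as Eq. (10)) is unimodal in *S*; the function name and the finite-difference slope test are illustrative:

```python
def minimize_unimodal(cost, lo, hi, tol=1e-4):
    """Half-interval search for the minimizer of a unimodal cost function:
    bisect on the sign of a finite-difference slope at the midpoint
    (sketch; Eq. (10) itself is not reproduced here)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        eps = tol / 4
        # Positive slope at mid means the minimum lies to the left.
        if cost(mid + eps) > cost(mid - eps):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Each iteration halves the search interval, so only a few dozen evaluations of the cost model are needed.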

Figure 4 shows the computation time versus sub-segment size *S* for two different variograms and two values of overlap *P*, corresponding to maximum relative approximation errors of 1 and 0.1% for data density \(n_R= 45\) and \(N=10^6\) grid cells. The terms in Eq. (10) and the time constants from Table 1 are used to calculate the times in Fig. 4. The time constants have been found by direct time measurements of the separate steps in the Kriging Algorithm, and depend on implementation and hardware. This implementation is run on an Intel Xeon X5690 3.47 GHz, using the Intel Math Kernel Library for linear algebra.

The optimal sub-segment size *S* is found at the minimum of the total computation time (Fig. 4). When reducing the maximum relative approximation error from 1 to 0.1%, the optimal sub-segment size *S* increases from 0.35 to 0.45 for the general exponential variogram and from 1.5 to 2.3 for the spherical variogram. For the spherical variogram the total computation time is close to constant for a wide interval of sub-segment sizes, thus any choice of *S* within this interval is efficient. If, on the other hand, the sub-segment is chosen too small, the computation time will escalate. The optimal sub-segment size is higher for the spherical variogram than for the general exponential. A default value for the sub-segment size *S*, valid for all variograms and all overlaps, is, therefore, not favorable. However, values around 1 seem acceptable in most situations.

Computing the optimal overlap *P* and sub-segment size *S* is potentially a time-consuming task. For a given variogram, however, the values of *P* and the corresponding accuracies can be pre-tabulated as in Fig. 2. With a value of *P* selected according to the required accuracy, the optimal sub-segment size *S* can efficiently be found by minimizing Eq. (10). Accurate time constants in Eq. (10) are not crucial for obtaining a value of *S* that gives acceptable computation time, hence the constants from Table 1 can be applied.

The time constants in Eq. (10) for different variograms

| Variogram type | \(T_{\mathbf {K}}\) | \(T_{\text {Chol}}\) | \(T_{\text {Weight}}\) | \(T_{z^{*}}\) |
|---|---|---|---|---|
| Spherical | 6 | 0.028 | 0.56 | 10 |
| General exponential, power 1.0 | 24 | 0.030 | 0.60 | 48 |
| General exponential, power 1.5 | 54 | 0.028 | 0.56 | 106 |
| General exponential, power 1.99 | 57 | 0.083 | 0.56 | 113 |

### 4.3 Comparison with Moving Neighborhood

A moving neighborhood is equivalent to using a sub-segment size of one single cell, that is, by setting \(S= N_R^{-1/d}\). This choice of *S* is very inefficient, as can be illustrated by the two-dimensional case in Fig. 4a. With the optimal *S* of 0.35, the computation time is approximately 1 min, while the moving neighborhood approach (\(S = 0.0067\)) takes 2.7 h.

In practice, most moving neighborhood algorithms must sacrifice accuracy by using few data in the neighborhoods to get acceptable computation times. Using only the 100 closest data points reduces the computation time to 14.1 min, and reducing the number of data points to 20 drops it to 2.5 min. In these estimates, the times spent searching for the closest neighbors of all cells are 2.3 and 1.8 min, respectively. The maximum relative approximation errors for these calculations are 11 and 49%, respectively, compared to the 1% error of the 1 min common data neighborhoods approach. To illustrate this, consider the prediction of depth to a geological surface where the standard error (sill) of the variogram is 20 m. A maximum relative approximation error of 1% corresponds to a maximum numerical error of 0.2 m, which is acceptable in most situations. The moving neighborhood approaches would give maximum numerical depth errors of 2.2 and 9.8 m, respectively. These errors are probably too large to be acceptable in many situations.

For comparison, the KB2D executable in GSLIB (Deutsch and Journel 1998) has been run on the same example using the 100 and 20 closest data and the exponential variogram. This resulted in computation times of 26.8 and 1.1 min. Our implementation correspondingly took 8.1 and 2.0 min, of which 2.2 and 1.8 min were used for the neighbor search. The KB2D algorithm is twice as fast for the example with the 20 closest data, probably because it uses a more efficient approach to finding neighboring data. For the 100 closest data, however, the computation time is more than 3 times longer using GSLIB, probably due to different efficiencies in the linear algebra libraries.
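The neighbor-search component of the moving neighborhood timings can be illustrated with a brute-force k-nearest query. This is a sketch, not the paper's implementation; in practice a k-d tree replaces the quadratic scan:

```python
import numpy as np

def k_nearest(data_xy, grid_xy, k):
    """Brute-force k-nearest-neighbor search: for each grid node, return the
    indices of the k closest data points. The moving neighborhood approach
    then assembles and solves a small kriging system per grid node from
    these indices, which is why all four algorithm steps repeat N times."""
    d2 = np.sum((grid_xy[:, None, :] - data_xy[None, :, :]) ** 2, axis=-1)
    # argpartition avoids a full sort: only the k smallest distances matter.
    return np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
```

Even with a fast search, the per-node system solves dominate, which is the cost the common data neighborhood approach amortizes over whole sub-segments.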

## 5 Adaptive Data Neighborhoods

Minimizing the computation time according to Eq. (10) is performed under the assumption of uniformly distributed data, which is unrealistic for real cases. Sub-segments with a number of data much larger than the average, \(\bar{n}\), may increase the computation time significantly, since the Cholesky factorization (step 2 in the Kriging Algorithm) is proportional to \(\bar{n}^3\). On the other hand, sub-segments with a number of data much smaller than \(\bar{n}\) may give lower accuracy.

To handle local variations in data density, adaptive data neighborhoods can be used. The size of a specific data neighborhood is chosen such that the number of data in the neighborhood is close to \(\bar{n}\). This is best done by choosing an overlap *P* for each sub-segment using a half-interval (binary) search. It is reasonable to restrict the search to an interval such as [*P* / 2, 2*P*] to avoid too small and too large neighborhoods.
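A sketch of such a per-sub-segment search, assuming the neighborhood data count grows monotonically with *P*; the function names and the convention that the neighborhood extends the sub-segment box by *P* ranges are illustrative:

```python
import numpy as np

def adaptive_overlap(data_xy, lo, hi, P0, n_target, vario_range, tol=1e-3):
    """Binary search for the overlap P in [P0/2, 2*P0] giving a neighborhood
    data count close to n_target for the sub-segment box [lo, hi]
    (illustrative sketch; count is monotone non-decreasing in P)."""
    def count(P):
        ext = P * vario_range
        return int(np.sum(np.all((data_xy >= lo - ext) & (data_xy <= hi + ext), axis=1)))
    p_lo, p_hi = P0 / 2.0, 2.0 * P0
    # Clamp to the allowed interval when the target is unreachable within it.
    if count(p_lo) >= n_target:
        return p_lo
    if count(p_hi) <= n_target:
        return p_hi
    while p_hi - p_lo > tol:
        mid = 0.5 * (p_lo + p_hi)
        if count(mid) < n_target:
            p_lo = mid
        else:
            p_hi = mid
    return 0.5 * (p_lo + p_hi)
```

Since the count is a step function of *P*, the target is hit only approximately, which is sufficient for balancing the Cholesky cost across sub-segments.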

Table 2 specifies the values of overlap *P* and the number of data in the neighborhoods, in addition to error measures and computation times. Note that the average number of data, average *n*, is the actual average, while the targeted values are \(\bar{n} = 216\) and \(\bar{n}=904\) for the general exponential and spherical variograms, respectively. For the data used in this paper, there is a small decrease in maximum relative approximation error, from 5.1% using the common data neighborhood to 3.4% using the adaptive neighborhood, for the general exponential variogram with power 1.5 and data density \(n_R = 45\). The same level of maximum relative approximation error is achieved using the directional adaptive data neighborhood. No decrease in maximum relative approximation error is observed for the spherical variogram. However, for both variograms, the error in regions with sparse data is smaller using an adaptive neighborhood than using a common data neighborhood. This is also reflected by the decrease in the mean of the error maps, by factors of 3.3 and 6.2 for the general exponential variogram, and factors of 3 and 3.5 for the spherical variogram. Hence, a general improvement in accuracy is achieved with the adaptive data neighborhoods, although the maximum relative approximation error is hardly decreased. The computation time increases, mainly due to the increased average number of data in the adaptive neighborhoods.

Specifications of values of overlap *P*, number of data, error measures and computation times for the different data neighborhoods

| | Common | Adaptive | Directional adaptive |
|---|---|---|---|
| **Gen exp 1.5** | | | |
| \(\bar{P}\) | 1 | 1.14 | 1.57 |
| \([P_{\text {min}},P_{\text {max}}]\) | [1, 1] | [0.54, 2.00] | [0.64, 2.00] |
| Average *n* | 185 | 221 | 250 |
| \([n_{\text {min}}, n_{\text {max}}]\) | [17, 392] | [175, 230] | [175, 310] |
| MRAE | 5.1% | 3.4% | 3.0% |
| Mean error | 0.18% | 0.054% | 0.029% |
| Comp. time (s) | 23 | 28 | 32 |
| **Spherical** | | | |
| \(\bar{P}\) | 2 | 2.78 | 3.38 |
| \([P_{\text {min}},P_{\text {max}}]\) | [2, 2] | [1.80, 4.00] | [2.44, 4.00] |
| Average *n* | 625 | 918 | 972 |
| \([n_{\text {min}}, n_{\text {max}}]\) | [226, 1019] | [710, 929] | [710, 1099] |
| MRAE | 3.1% | 3.4% | 2.6% |
| Mean error | 0.24% | 0.080% | 0.068% |
| Comp. time (s) | 11 | 16 | 18 |

## 6 Parallelization

Kriging is well suited for parallelization, as the predicted values are calculated independent of each other. This also holds for sub-segments as the prediction in one sub-segment is independent of the other sub-segments. There are typically more than 100 sub-segments, and the amount of calculation for each sub-segment is large while the amount of overhead is small. The granularity is, therefore, good, and an efficient parallelization can be obtained. Hu and Shu (2015) present an MPI-based kriging algorithm where the grid cells to be estimated are split into a few blocks and assigned to each processor at once. This parallelization demonstrates significant speed-ups, but assumes a small number of observations. Here the OpenMP API (OpenMP 2008) is used for the parallelization. Using dynamic scheduling, a new sub-segment is assigned to a thread when the thread has finished predicting the previous sub-segment. Table 3 summarizes speed-up factors for different numbers of threads and different values of overlap *P*, predicting \(N = 10^6\) grid cells conditioned to \(n=2000\) data. The calculations have been performed on an Intel Xeon X5690 3.47 GHz based system with 76 Gb memory and 2 CPUs, each with 6 physical cores with hyper-threading enabled, resulting in 24 threads. The table shows results for the general exponential variogram with power 1.5 and the spherical variogram. The speed-up factors are relative to using 1 thread and can be seen to scale nearly linearly with the number of threads, up to the number of physical cores. A small additional gain is obtained with hyper-threading.
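In Python, the same dynamic scheduling over sub-segments can be sketched with a thread pool. This is illustrative: the paper's implementation uses OpenMP in compiled code, and in Python the threads only pay off when the per-segment work releases the GIL, as NumPy's linear algebra does:

```python
from concurrent.futures import ThreadPoolExecutor

def krige_all_segments(segments, krige_one, max_workers=4):
    """Dynamically schedule sub-segments onto worker threads, mirroring
    OpenMP dynamic scheduling: each worker grabs a new sub-segment as soon
    as it finishes the previous one. `krige_one` stands in for steps 1-4 of
    the Kriging Algorithm applied to a single sub-segment."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map distributes work as workers free up, but returns results
        # in the input order, so the predicted grid is assembled correctly.
        return list(pool.map(krige_one, segments))
```

Dynamic scheduling matters because sub-segments can contain very different numbers of data, so their solve times vary widely.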

Speed-up factors relative to using 1 thread, when predicting \(10^6\) cells conditioned to 2000 data using general exponential variogram with power 1.5 and the spherical variogram, and data density of 45

| # threads | Gen exp, *P*=0.5 | Gen exp, *P*=1.0 | Gen exp, *P*=1.5 | Gen exp, *P*=2.0 | Spherical, *P*=1.0 | Spherical, *P*=2.0 | Spherical, *P*=3.0 | Spherical, *P*=4.0 |
|---|---|---|---|---|---|---|---|---|
| 2 | 2.0 | 2.0 | 2.0 | 2.0 | 1.9 | 1.9 | 2.0 | 1.9 |
| 4 | 3.8 | 4.0 | 4.0 | 3.9 | 3.9 | 3.8 | 3.8 | 3.7 |
| 8 | 7.2 | 7.6 | 7.7 | 7.6 | 7.1 | 7.1 | 6.9 | 6.2 |
| 12 | 10.1 | 11.0 | 11.1 | 11.2 | 10.1 | 9.7 | 9.2 | 8.3 |
| 24 | 13.4 | 15.3 | 15.4 | 15.0 | 11.9 | 10.5 | 9.6 | 8.5 |
| Comp. time (s) | 6.6 | 23.1 | 46.1 | 72.8 | 4.0 | 10.7 | 17.8 | 23.7 |

## 7 Extension to Universal Kriging, Bayesian Kriging, Prediction Error and Conditional Simulations

The methodology carries over to universal kriging and Bayesian kriging (Omre and Halvorsen 1989). The sub-segments and common data neighborhoods are defined as before for each sub-segment *i*, but solving for the dual kriging weights in step 3 of the Kriging Algorithm must be replaced by solving for the kriging weights.

The classic approach to conditional simulations (Journel and Huijbregts 1978) is to generate an unconditional Gaussian random field and condition it to data using (simple) kriging. If an efficient approach to the unconditional simulations is used, like the Fast Fourier Transform based algorithms (Dietrich and Newsam 1993; Wood and Chan 1994), the time consuming part is still the conditioning, and the suggested approach will give significant speed-ups.
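The conditioning step described above can be sketched as follows, with `krige` standing for any simple-kriging routine applied to the data residuals. This is an illustrative helper, not the paper's code:

```python
import numpy as np

def condition_by_kriging(z_uncond_grid, z_uncond_obs, z_obs, krige):
    """Classic conditioning of an unconditional simulation:
        z_c(x) = z_u(x) + kriging of the residuals (z_obs - z_u at the data).
    `krige` maps an n-vector of residuals at the data locations to an
    N-vector of predictions on the grid. Where kriging interpolates the
    residuals exactly, the conditioned field honors the data exactly."""
    return z_uncond_grid + krige(z_obs - z_uncond_obs)
```

Because the expensive part is the kriging of the residuals, the common data neighborhood speed-ups apply directly to this conditioning step.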

## 8 Conclusions

An approximation to kriging has been proposed and tested. The idea is to divide the region of interest into rectangular sub-sets with overlapping data neighborhoods. The approximation gives significant speed-ups (\(\sim \)2–40) depending on data density and the chosen acceptable accuracy.

The accuracy is controlled by selecting the overlap *P* of the common data neighborhoods. Finding the relationship between variogram type, overlap *P* and accuracy requires time-consuming simulation experiments. Figure 2 summarizes results from such experiments for some widely used variograms, and these results can be used to select a reasonable value of *P* in many situations. For instance, a maximum relative approximation error of 5% can be obtained using \(P=1.5\) for the exponential variogram and \(P=2\) for the spherical variogram. Smoother variograms require a larger overlap *P* to obtain the same accuracy, especially when data are scarce. Choosing the overlap *P* is a trade-off between accuracy and speed. For a given overlap *P*, the sub-segment size *S* that minimizes the computation time can be found by minimizing Eq. (10). This is a simple one-dimensional search that takes hardly any time. Figure 4 shows that a default value of \(S=1\) gives good efficiency in most cases.

The approximation is further refined by introducing adaptive data neighborhoods, where the data neighborhoods are allowed to shrink and expand depending on the data density near the sub-segments. Adaptive neighborhoods improve the accuracy, in particular in areas of sparse data. The improvement is moderate and must be weighed against the added algorithm complexity. The approximation is well suited for parallelization, since the granularity of the tasks is such that the overhead is small. This is demonstrated by the almost linear scaling shown in Table 3. The speed-up from parallelization comes on top of the speed-up obtained from using sub-segments with overlapping data neighborhoods. Combined speed-ups of a factor of 400 are therefore possible when a 5% maximum relative approximation error is acceptable.

A comparison to the widespread moving neighborhood algorithm has been made. Using sub-segments with overlapping data neighborhoods is shown to be superior, both in computation time and in accuracy, when there are many data and the domain of interest is significantly larger than the volume defined by the variogram ranges. The analysis shows that moving neighborhood algorithms are poor approximations to kriging with all available data, since common practice is to limit the number of data in the neighborhood to a small number.

## Notes

### Acknowledgements

The paper was funded by the Research Council of Norway.

### References

- Chilès JP, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
- Cressie N (1993) Statistics for spatial data. Wiley, New York
- Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user’s guide. Oxford University Press, New York
- Dietrich CR, Newsam GN (1993) A fast and exact method for multidimensional Gaussian stochastic simulations. Water Resour Res 29(8):2861–2869. doi:10.1029/93WR01070
- Emery X (2009) The kriging update equations and their application to the selection of neighboring data. Comput Geosci 13(3):269–280. doi:10.1007/s10596-008-9116-8
- Hu H, Shu H (2015) An improved coarse-grained parallel algorithm for computational acceleration of ordinary Kriging interpolation. Comput Geosci 78:44–52. doi:10.1016/j.cageo.2015.02.011
- Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic Press, London
- Omre H, Halvorsen K (1989) The Bayesian bridge between simple and universal kriging. Math Geol 21(7):767–786. doi:10.1007/BF00893321
- OpenMP (2008) OpenMP application program interface version 3.0. http://www.openmp.org
- Vigsnes M, Abrahamsen P, Hauge VL, Kolbjørnsen O (2015) Efficient neighborhoods for kriging with numerous data. In: Proceedings of the third conference on petroleum geostatistics, EAGE. doi:10.3997/2214-4609.201413660
- Wood ATA, Chan G (1994) Simulation of stationary Gaussian processes in [0,1]\(^d\). J Comput Graph Stat 3(4):409–432. doi:10.1080/10618600.1994.10474655

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.