# Nested Kriging predictions for datasets with a large number of observations

## Abstract

This work falls within the context of predicting the value of a real function at some input locations given a limited number of observations of this function. The Kriging interpolation technique (or Gaussian process regression) is often considered to tackle such a problem, but the method suffers from its computational burden when the number of observation points is large. We introduce in this article nested Kriging predictors which are constructed by aggregating sub-models based on subsets of observation points. This approach is proven to have better theoretical properties than other aggregation methods that can be found in the literature. In particular, contrary to some other methods, the proposed aggregation method can be shown to be consistent. Finally, the practical interest of the proposed method is illustrated on simulated datasets and on an industrial test case with \(10^4\) observations in a 6-dimensional space.
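To make the aggregation idea concrete, the following is a minimal sketch, not the authors' implementation: each sub-model is a simple (zero-mean) Kriging predictor built on its own subset of observations, and the sub-model predictions at a point \(x\) are themselves treated as correlated observations of the underlying Gaussian process, so they can be recombined by a linear predictor using their cross-covariances. The squared-exponential kernel, the length-scale, and the jitter value are illustrative assumptions.

```python
import numpy as np

def kern(a, b, ell=0.3):
    # Squared-exponential covariance on 1-D inputs (illustrative choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def nested_kriging(x, X_groups, y_groups, jitter=1e-10):
    """Aggregate simple-Kriging sub-models at a single prediction point x.

    Each sub-model i predicts m_i = k(x, X_i) K_i^{-1} y_i.  The aggregation
    then computes the best linear predictor of Y(x) from (m_1, ..., m_p),
    using the covariances between sub-model predictions.
    """
    xv = np.array([x])
    p = len(X_groups)
    weights = []            # per sub-model: (X_i, K_i^{-1} k(X_i, x))
    m = np.empty(p)         # sub-model predictions at x
    for i, (Xi, yi) in enumerate(zip(X_groups, y_groups)):
        Ki = kern(Xi, Xi) + jitter * np.eye(len(Xi))
        ai = np.linalg.solve(Ki, kern(Xi, xv)).ravel()
        weights.append((Xi, ai))
        m[i] = ai @ yi
    # Cross-covariances of the sub-model predictions, and their
    # covariances with Y(x).
    KM = np.empty((p, p))
    kM = np.empty(p)
    for i, (Xi, ai) in enumerate(weights):
        kM[i] = ai @ kern(Xi, xv).ravel()
        for j, (Xj, aj) in enumerate(weights):
            KM[i, j] = ai @ kern(Xi, Xj) @ aj
    # Linear recombination of the sub-model predictions.
    w = np.linalg.solve(KM, kM)
    return w @ m
```

Note that with a single sub-model the aggregation weights reduce to 1 and the predictor coincides with ordinary simple Kriging on the full subset; with several sub-models, each covariance solve involves only a subset of the observations, which is the source of the computational savings for large datasets.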

### Keywords

Gaussian process regression · Big data · Aggregation methods · Best linear unbiased predictor · Spatial processes

## Notes

### Acknowledgements

Part of this research was conducted within the frame of the Chair in Applied Mathematics OQUAIDO, gathering partners in technological research (BRGM, CEA, IFPEN, IRSN, Safran, Storengy) and academia (Ecole Centrale de Lyon, Mines Saint-Etienne, University of Grenoble, University of Nice, University of Toulouse and CNRS) around advanced methods for Computer Experiments. The authors would like to warmly thank Dr. Géraud Blatman and EDF R&D for providing us with the industrial test case. They also thank both the editor and the reviewers for very precise and constructive comments on this paper. This paper was completed during a stay of D. Rullière at the Vietnam Institute for Advanced Study in Mathematics; the latter author thanks the VIASM institute and the DAMI research chair (Data Analytics & Models for Insurance) for their support.
