Statistics and Computing, Volume 28, Issue 4, pp 849–867

Nested Kriging predictions for datasets with a large number of observations

  • Didier Rullière
  • Nicolas Durrande
  • François Bachoc
  • Clément Chevalier


This work falls within the context of predicting the value of a real function at some input locations given a limited number of observations of this function. The Kriging interpolation technique (or Gaussian process regression) is often considered to tackle such a problem, but the method suffers from a heavy computational burden when the number of observation points is large. We introduce in this article nested Kriging predictors, which are constructed by aggregating sub-models based on subsets of the observation points. This approach is proven to have better theoretical properties than other aggregation methods found in the literature; in particular, contrary to some of these methods, the proposed aggregation method can be shown to be consistent. Finally, the practical interest of the proposed method is illustrated on simulated datasets and on an industrial test case with \(10^4\) observations in a 6-dimensional space.
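The aggregation idea described above can be illustrated with a small sketch. The setup below (NumPy, a squared-exponential kernel, 1-D toy data, and a split into two subsets) is my own illustrative choice, not the paper's experimental setup: each sub-model is an ordinary Kriging predictor built on its subset, and the sub-model predictions are then combined by a second Kriging step that uses the covariances between the sub-model predictors, which follow from standard Gaussian process algebra. This is a minimal sketch of the nested-aggregation idea, not a reproduction of the authors' exact estimator or code.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential covariance matrix between 1-D point sets a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 40))
y = np.sin(6.0 * x)                      # toy function to interpolate

# Split the n = 40 observations into two sub-designs.
subsets = [np.arange(0, 20), np.arange(20, 40)]
xt = np.array([0.5])                     # single prediction point
nugget = 1e-8                            # jitter for numerical stability

means, cross, solves = [], [], []
for idx in subsets:
    K = rbf(x[idx], x[idx]) + nugget * np.eye(len(idx))
    k = rbf(x[idx], xt).ravel()
    a = np.linalg.solve(K, k)            # K_i^{-1} k_i(x)
    means.append(a @ y[idx])             # sub-model mean M_i(x)
    cross.append(a @ k)                  # cov(Y(x), M_i(x))
    solves.append((idx, a))

M = np.array(means)                      # vector of sub-model predictions
kM = np.array(cross)                     # covariances between Y(x) and M(x)

# Covariance matrix of the sub-model predictors:
# cov(M_i(x), M_j(x)) = k_i(x)^T K_i^{-1} K_ij K_j^{-1} k_j(x).
KM = np.empty((2, 2))
for i, (idx_i, a_i) in enumerate(solves):
    for j, (idx_j, a_j) in enumerate(solves):
        KM[i, j] = a_i @ rbf(x[idx_i], x[idx_j]) @ a_j
KM += nugget * np.eye(2)

# Second Kriging step: treat the M_i(x) as observations of Y(x).
alpha = np.linalg.solve(KM, kM)
m_agg = alpha @ M                        # aggregated prediction at xt
print(m_agg)
```

Because the combination weights account for the correlation between sub-models, redundant sub-models are not double-counted, which is what a simple (e.g. precision-weighted) average of sub-model predictions fails to do.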


Keywords: Gaussian process regression · Big data · Aggregation methods · Best linear unbiased predictor · Spatial processes



Part of this research was conducted within the frame of the Chair in Applied Mathematics OQUAIDO, gathering partners in technological research (BRGM, CEA, IFPEN, IRSN, Safran, Storengy) and academia (Ecole Centrale de Lyon, Mines Saint-Etienne, University of Grenoble, University of Nice, University of Toulouse and CNRS) around advanced methods for Computer Experiments. The authors would like to warmly thank Dr. Géraud Blatman and EDF R&D for providing us with the industrial test case. They also thank both the editor and the reviewers for very precise and constructive comments on this paper. This paper was finished during a stay of D. Rullière at the Vietnam Institute for Advanced Study in Mathematics (VIASM); he thanks the VIASM institute and the DAMI research chair (Data Analytics & Models for Insurance) for their support.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Didier Rullière (1)
  • Nicolas Durrande (2)
  • François Bachoc (3)
  • Clément Chevalier (4)
  1. Laboratoire SAF, EA2429, ISFA, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
  2. Institut Fayol—LIMOS, Mines Saint-Étienne, Saint-Étienne, France
  3. Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France
  4. Institute of Statistics, University of Neuchâtel, Neuchâtel, Switzerland
