Nested Kriging predictions for datasets with a large number of observations

  • Didier Rullière
  • Nicolas Durrande
  • François Bachoc
  • Clément Chevalier

Abstract

This work addresses the problem of predicting the value of a real-valued function at some input locations, given a limited number of observations of this function. The Kriging interpolation technique (or Gaussian process regression) is often used for this task, but the method suffers from a heavy computational burden when the number of observation points is large. We introduce in this article nested Kriging predictors, which are constructed by aggregating sub-models based on subsets of the observation points. This approach is proven to have better theoretical properties than other aggregation methods found in the literature; in particular, contrary to some of these methods, the proposed aggregation method can be shown to be consistent. Finally, the practical interest of the proposed method is illustrated on simulated datasets and on an industrial test case with \(10^4\) observations in a 6-dimensional space.
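The aggregation idea described above can be sketched in a few lines: build a Kriging predictor on each subset of observations, then recombine the sub-model predictions with a second Kriging step that accounts for their cross-covariances. The sketch below is purely illustrative and is not taken from the authors' software; it assumes a centred, noise-free Gaussian process with a squared-exponential kernel, and all names and defaults are ours.

```python
import numpy as np

def kernel(A, B, ell=0.3):
    # Squared-exponential covariance (illustrative choice of kernel).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

def nested_kriging_mean(X, y, x, groups):
    """Aggregated Kriging mean at a single point x.

    Each sub-model k is the Kriging predictor built on the subset
    X[groups[k]]; the sub-model means are then recombined by a second
    Kriging step using their covariances with each other and with Y(x).
    """
    x = np.atleast_2d(x)
    p = len(groups)
    means = np.zeros(p)          # sub-model predictions M_k(x)
    alphas = []                  # Kriging weights alpha_k = K_k^{-1} k_k(x)
    for k, idx in enumerate(groups):
        Kk = kernel(X[idx], X[idx])
        ak = np.linalg.solve(Kk, kernel(X[idx], x)).ravel()
        alphas.append((idx, ak))
        means[k] = ak @ y[idx]
    # Covariances of the sub-model predictions with Y(x) and each other.
    KM = np.zeros((p, p))
    kM = np.zeros(p)
    for k, (ik, ak) in enumerate(alphas):
        kM[k] = ak @ kernel(X[ik], x).ravel()
        for l, (il, al) in enumerate(alphas):
            KM[k, l] = ak @ kernel(X[ik], X[il]) @ al
    w = np.linalg.solve(KM, kM)  # aggregation weights
    return float(w @ means)
```

With a single group containing all observations, this construction reduces to the usual full Kriging predictor; splitting the observations into groups trades a small approximation error for much smaller linear systems.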

Keywords

Gaussian process regression · Big data · Aggregation methods · Best linear unbiased predictor · Spatial processes


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Didier Rullière (1)
  • Nicolas Durrande (2)
  • François Bachoc (3)
  • Clément Chevalier (4)
  1. Laboratoire SAF, EA2429, ISFA, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
  2. Institut Fayol—LIMOS, Mines Saint-Étienne, Saint-Étienne, France
  3. Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France
  4. Institute of Statistics, University of Neuchâtel, Neuchâtel, Switzerland
