Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models

Abstract

Complex, mechanistic hydrological models can be computationally expensive, have large numbers of input parameters, and generate multivariate output. Model emulators can be constructed to approximate these complex models with substantial computational savings, making activities such as sensitivity analysis, calibration and uncertainty analysis feasible. Success in the use of an emulator relies on it making accurate and precise predictions of the model output. However, it is often unclear what type of emulation approach will be suitable. We present a comparison of reduced-rank, multivariate emulators built upon different ‘emulation engines’ and apply them to the Australian Water Resource Assessment System model. We examine first-order and second-order approaches which focus on specifying the mean and covariance, respectively. We also introduce a nonparametric approach for quantifying the uncertainty associated with the emulated prediction where this has bounded support. Our results demonstrate that emulation engines based on second-order approaches, such as Gaussian processes, can be computationally burdensome and may be comparable in performance to computationally efficient, first-order methods such as random forests.

Supplementary materials accompanying this paper appear online.
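As a rough illustration of the reduced-rank, first-order idea described in the abstract, the sketch below compresses multivariate simulator output onto a few singular vectors and then regresses the resulting scores on the input parameters. It is not the authors' implementation: the simulator `toy_simulator`, the helper names, the parameter ranges, and the rank of 3 are all hypothetical, and a quadratic least-squares fit stands in for the emulation engine (a random forest or Gaussian process would take its place in the comparison the paper describes).

```python
import numpy as np

def toy_simulator(theta, t):
    """Hypothetical stand-in for an expensive model: decaying plus seasonal signal."""
    a, b = theta
    return a * np.exp(-b * t) + 0.5 * a * np.sin(2 * np.pi * t)

def design_features(thetas):
    """Quadratic features of the two inputs: intercept, linear, squares, cross term."""
    a, b = thetas[:, 0], thetas[:, 1]
    return np.column_stack([np.ones_like(a), a, b, a * a, b * b, a * b])

def fit_emulator(thetas, Y, rank):
    """Reduce the output to `rank` components, then regress the scores on the inputs."""
    mean = Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:rank]                # (rank, n_out) orthonormal basis vectors
    scores = U[:, :rank] * s[:rank]  # (n_runs, rank) low-dimensional coordinates
    # Stand-in "engine": ordinary least squares on quadratic features (assumption).
    coef, *_ = np.linalg.lstsq(design_features(thetas), scores, rcond=None)
    return {"mean": mean, "basis": basis, "coef": coef}

def predict(emulator, thetas):
    """Predict scores from inputs and map them back to the full output space."""
    scores = design_features(thetas) @ emulator["coef"]
    return emulator["mean"] + scores @ emulator["basis"]

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)  # 50 output locations per simulator run

# Training design: 200 runs over an assumed parameter box.
train_thetas = rng.uniform([0.5, 0.5], [2.0, 3.0], size=(200, 2))
Y_train = np.array([toy_simulator(th, t) for th in train_thetas])
emulator = fit_emulator(train_thetas, Y_train, rank=3)

# Held-out runs to check how closely the emulator tracks the simulator.
test_thetas = rng.uniform([0.5, 0.5], [2.0, 3.0], size=(50, 2))
Y_test = np.array([toy_simulator(th, t) for th in test_thetas])
Y_hat = predict(emulator, test_thetas)

rmse = float(np.sqrt(np.mean((Y_hat - Y_test) ** 2)))
baseline = float(np.sqrt(np.mean((Y_train.mean(axis=0) - Y_test) ** 2)))
print(f"held-out RMSE: {rmse:.4f} (mean-only baseline: {baseline:.4f})")
```

Only the mean of the scores is modelled here, which is what makes the sketch first-order; it gives no predictive uncertainty. A second-order engine would additionally specify a covariance over the inputs, at a higher computational cost.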

Author information

Corresponding author

Correspondence to Daniel W. Gladish.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (R 8 KB)

Supplementary material 2 (pdf 48 KB)

About this article

Cite this article

Gladish, D.W., Pagendam, D.E., Peeters, L.J.M. et al. Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models. JABES 23, 39–62 (2018). https://doi.org/10.1007/s13253-017-0308-3

  • DOI: https://doi.org/10.1007/s13253-017-0308-3

Keywords

  • Surrogate model
  • Meta-model
  • Random forests
  • Gaussian processes
  • AWRA
  • Reduced-rank multivariate statistical emulator