Abstract
Complex, mechanistic hydrological models can be computationally expensive, have large numbers of input parameters, and generate multivariate output. Model emulators can be constructed to approximate these complex models with substantial computational savings, making activities such as sensitivity analysis, calibration and uncertainty analysis feasible. The success of an emulator relies on it making accurate and precise predictions of the model output. However, it is often unclear which type of emulation approach will be suitable. We present a comparison of reduced-rank, multivariate emulators built upon different ‘emulation engines’ and apply them to the Australian Water Resource Assessment System model. We examine first-order and second-order approaches, which focus on specifying the mean and covariance, respectively. We also introduce a nonparametric approach for quantifying the uncertainty associated with the emulated prediction where this has bounded support. Our results demonstrate that emulation engines based on second-order approaches, such as Gaussian processes, can be computationally burdensome and may be comparable in performance to computationally efficient, first-order methods such as random forests. Supplementary materials accompanying this paper appear online.
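As a rough illustration of the reduced-rank idea, the sketch below projects multivariate simulator output onto a few principal directions and then fits an emulation engine to the resulting low-dimensional scores. Everything in it is an assumption made for illustration: the toy simulator stands in for an expensive hydrological model (it is not the Australian Water Resource Assessment System model), and ordinary least squares is swapped in as a deliberately simple first-order engine where the paper compares engines such as random forests and Gaussian processes. The function names `toy_simulator`, `fit_emulator` and `predict` are hypothetical.

```python
import numpy as np


def toy_simulator(X, t):
    # Hypothetical stand-in for an expensive model: each row of X (two input
    # parameters) produces one output time series evaluated at times t.
    return X[:, [0]] * np.sin(t) + X[:, [1]] * np.cos(t)


def fit_emulator(X, Y, rank):
    # Reduced-rank step: centre the outputs and keep the leading `rank`
    # right singular vectors as basis functions.
    y_mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    basis = Vt[:rank]                      # shape (rank, q)
    scores = (Y - y_mean) @ basis.T        # shape (n, rank)

    # Emulation engine (assumption: linear least squares with an intercept,
    # used here only as a simple first-order placeholder).
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return y_mean, basis, coef


def predict(emulator, X_new):
    # Predict the scores at new inputs, then map back to the full output.
    y_mean, basis, coef = emulator
    design = np.column_stack([np.ones(len(X_new)), X_new])
    return y_mean + (design @ coef) @ basis


rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 50)

# Training runs of the toy simulator.
X_train = rng.uniform(size=(40, 2))
Y_train = toy_simulator(X_train, t)
emulator = fit_emulator(X_train, Y_train, rank=2)

# Held-out inputs for checking the emulator against the simulator.
X_test = rng.uniform(size=(5, 2))
Y_true = toy_simulator(X_test, t)
Y_pred = predict(emulator, X_test)
max_abs_error = float(np.max(np.abs(Y_pred - Y_true)))
```

Because the toy output is exactly rank two and linear in its inputs, this engine reproduces it almost exactly; a real simulator would typically need a more flexible engine and a held-out assessment of both accuracy and predictive uncertainty.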






Cite this article
Gladish, D.W., Pagendam, D.E., Peeters, L.J.M. et al. Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models. JABES 23, 39–62 (2018). https://doi.org/10.1007/s13253-017-0308-3

