Bayes Linear Emulation, History Matching, and Forecasting for Complex Computer Simulators

Reference work entry in:
Handbook of Uncertainty Quantification

Abstract

Computer simulators are a useful tool for understanding complicated systems. However, any inferences made from them should recognize the inherent limitations and approximations in the simulator’s predictions for reality, the data used to run and calibrate the simulator, and the lack of knowledge about the best inputs to use for the simulator. This article describes the methods of emulation and history matching, where fast statistical approximations to the computer simulator (emulators) are constructed and used to reject implausible choices of input (history matching). Also described is a simple and tractable approach to estimating the discrepancy between simulator and reality induced by certain intrinsic limitations and uncertainties in the simulator and input data. Finally, a method for forecasting based on this approach is presented. The analysis is based on the Bayes linear approach to uncertainty quantification, which is similar in spirit to the standard Bayesian approach but takes expectation, rather than probability, as the primitive for the theory, with consequent simplifications in the prior uncertainty specification and analysis.


References

  1. Bastos, L.S., O’Hagan, A.: Diagnostics for Gaussian process emulators. Technometrics 51, 425–438 (2009)

  2. Clark, M.P., Slater, A.G., Rupp, D.E., Woods, R.A., Vrugt, J.A., Gupta, H.V., Wagener, T., Hay, L.E.: Framework for Understanding Structural Errors (FUSE): a modular framework to diagnose differences between hydrological models. Water Resour. Res. 44, W00B02 (2008)

  3. Craig, P.S., Goldstein, M., Seheult, A.H., Smith, J.A.: Pressure matching for hydrocarbon reservoirs: a case study in the use of Bayes linear strategies for large computer experiments (with discussion). In: Gastonis, C., et al. (eds.) Case Studies in Bayesian Statistics, vol. III, pp. 37–93. Springer, New York (1997)

  4. Craig, P.S., Goldstein, M., Rougier, J.C., Seheult, A.H.: Bayesian forecasting using large computer models. JASA 96, 717–729 (2001)

  5. Cumming, J., Goldstein, M.: Small sample Bayesian designs for complex high-dimensional models based on information gained using fast approximations. Technometrics 51, 377–388 (2009)

  6. de Finetti, B.: Theory of Probability, vols. 1 & 2. Wiley, New York (1974, 1975)

  7. Goldstein, M.: Subjective Bayesian analysis: principles and practice. Bayesian Anal. 1, 403–420 (2006)

  8. Goldstein, M., Rougier, J.C.: Bayes linear calibrated prediction for complex systems. JASA 101, 1132–1143 (2006)

  9. Goldstein, M., Rougier, J.C.: Reified Bayesian modelling and inference for physical systems (with discussion). JSPI 139, 1221–1239 (2009)

  10. Goldstein, M., Seheult, A., Vernon, I.: Assessing model adequacy. In: Wainwright, J., Mulligan, M. (eds.) Environmental Modelling: Finding Simplicity in Complexity, 2nd edn., pp. 435–449. Wiley, Chichester (2010)

  11. Goldstein, M., Wooff, D.A.: Bayes Linear Statistics: Theory and Methods. Wiley, Chichester/Hoboken (2007)

  12. Monteith, J.L.: Evaporation and environment. Symp. Soc. Exp. Biol. 19, 205–224 (1965)

  13. O’Hagan, A.: Bayesian analysis of computer code outputs: a tutorial. Reliab. Eng. Syst. Saf. 91, 1290–1300 (2006)

  14. Pukelsheim, F.: The three sigma rule. Am. Stat. 48, 88–91 (1994)

  15. Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments. Springer, New York (2003)

  16. Vernon, I., Goldstein, M., Bower, R.: Galaxy formation: a Bayesian uncertainty analysis (with discussion). Bayesian Anal. 5, 619–670 (2010)

  17. Williamson, D., Goldstein, M., Allison, L., Blaker, A., Challenor, P., Jackson, L., Yamazaki, K.: History matching for exploring and reducing climate model parameter space using observations and a large perturbed physics ensemble. Clim. Dyn. 41(7–8), 1703–1729 (2013)


Author information

Correspondence to Nathan Huntley.

Appendix: Internal Discrepancy Perturbations

In this appendix, a description of the internal discrepancy experiments is provided. The first step is to identify potential quantities to perturb. For FUSE, the obvious quantities to choose are the two input time series and the initial condition. The parameters could also be perturbed at every time step, as in principle could the state vectors, although the latter was not feasible in FUSE. Other possibilities not considered here include the time scale of the simulator and the accuracy of the numerical solver.

The next step is to informally assess the potential influence of each quantity. For example, increasing all rainfall by 10 % makes a large difference to the output, whereas increasing all evapotranspiration by 10 % makes a smaller but noticeable difference. Meanwhile, making large changes to the initial condition leads to extremely small changes away from the start of the simulation (recall that the quantities of interest are near the end of the simulation). From these initial explorations, each quantity can be categorized: if it has very little influence, it may not be worth perturbing; if it has a small influence, it may be worth including but not expending much effort on; if it has a large influence, it is worth carefully modeling. The outcome of this exploration for FUSE suggested that the initial condition was hardly relevant, the evapotranspiration was worth including, and the rainfall and parameter perturbations deserved more attention.

The final step is to consider how to generate perturbations of each quantity. The initial condition is ignored. For evapotranspiration, good estimates of observation uncertainty are lacking, but given the low influence of this quantity this is not too worrying: any plausible perturbation should suffice. Each evapotranspiration observation was multiplied by a perturbation drawn from a log-normal distribution, such that most observations were perturbed by no more than 10 %. Correlation between observations within 24 h was also included, so that if a particular observation receives a large perturbation, nearby observations tend to as well. This is motivated by the daily period of the evapotranspiration time series.
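The evapotranspiration scheme just described can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the smoothing window, the log-normal scale `sigma`, and the stand-in data are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def et_perturbations(n_hours, daily_period=24, sigma=0.05):
    """Correlated log-normal multipliers for an hourly ET series.

    Gaussian noise is smoothed with a 24 h moving window so that
    perturbations within a day are correlated, then exponentiated.
    sigma = 0.05 is an illustrative choice keeping roughly 95 % of
    multipliers within about +/-10 % (since exp(2 * 0.05) ~ 1.10).
    """
    z = rng.normal(size=n_hours + daily_period - 1)
    # Dividing by sqrt(daily_period) keeps the smoothed noise at unit variance.
    kernel = np.ones(daily_period) / np.sqrt(daily_period)
    smooth = np.convolve(z, kernel, mode="valid")  # length n_hours
    return np.exp(sigma * smooth)

et_observed = rng.uniform(0.1, 0.5, size=240)   # stand-in hourly ET series
et_perturbed = et_observed * et_perturbations(240)
```

The log-normal form guarantees positive multipliers, and the shared smoothing window is one simple way to induce the within-24 h correlation described above.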

Parameter perturbations are performed by multiplying each initial parameter by a random perturbation at each time step, with nearby multipliers being correlated. The size of the perturbations was chosen such that the parameters rarely changed by much more than 10 % over the course of a simulation. This creates collections of perturbations that cause the parameters to evolve slowly, without sudden large changes and without a large overall change. The parameter perturbations have a significant effect on the output, but expert opinion about how the parameters are likely to change over time, and by how much, is lacking. In principle, in such a situation one should make the correlation and the magnitude of the perturbations configurable, so as to understand their influence. For this example, however, this complication is avoided. An example of the evolution of a particular choice of x^(1) for a particular perturbation can be seen in Fig. 2.6.

Fig. 2.6 The evolution of parameter x^(1) for a particular parameter perturbation
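One way such a slowly evolving multiplicative perturbation could be generated is sketched below, assuming an AR(1) process for the increments on the log scale. The correlation `rho` and step size `step_sd` are hypothetical tuning choices, not the values used for FUSE.

```python
import numpy as np

rng = np.random.default_rng(1)

def parameter_path(theta0, n_steps, step_sd=1e-4, rho=0.95):
    """Evolve one parameter by correlated multiplicative perturbations.

    Increments on the log scale follow an AR(1) process, so the
    parameter drifts smoothly rather than jumping; step_sd is tuned
    so that over a few hundred steps the total change rarely exceeds
    roughly 10 %. All numerical choices here are illustrative.
    """
    log_mult = np.zeros(n_steps)
    eps = 0.0
    for t in range(1, n_steps):
        eps = rho * eps + rng.normal(scale=step_sd)  # correlated increment
        log_mult[t] = log_mult[t - 1] + eps
    return theta0 * np.exp(log_mult)

path = parameter_path(2.0, 240)  # hypothetical parameter starting at 2.0
```

Because successive increments are positively correlated, the resulting path wanders gradually, matching the description above of slow evolution without sudden large changes.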

Perturbing the rainfall also has a significant effect on the output. In this case, however, there is more guidance on the perturbations required. Uncertainty in the rainfall was attributed to three significant sources: the “local” gauge measurement error, the process of aggregating readings to the nearest hour, and the process of averaging over the catchment by kriging. Suitable perturbations from these errors were generated and combined.
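A sketch of how the three error sources might be combined as independent multiplicative terms follows. The error magnitudes and the day-block structure of the kriging term are illustrative assumptions only, not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

def rainfall_perturbation(rain, gauge_sd=0.05, agg_sd=0.02, krig_sd=0.03):
    """Perturb an hourly rainfall series with three error sources.

    Illustrative model: independent per-hour gauge noise, independent
    per-hour aggregation error, and a kriging (catchment-averaging)
    error held constant over each day. All standard deviations are
    hypothetical placeholders.
    """
    n = rain.size
    gauge = rng.normal(0.0, gauge_sd, size=n)      # local gauge error
    agg = rng.normal(0.0, agg_sd, size=n)          # hourly aggregation error
    # One kriging error per day, repeated across that day's 24 hours.
    krig = np.repeat(rng.normal(0.0, krig_sd, size=n // 24 + 1), 24)[:n]
    return rain * np.exp(gauge + agg + krig)       # combined multiplier

rain_pert = rainfall_perturbation(np.full(240, 1.0))  # stand-in series
```

Summing the errors on the log scale before exponentiating combines the three sources while keeping the perturbed rainfall positive.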

The overall rainfall perturbations generated by this process typically display occasional noticeable differences but mostly small ones. This suggests that rainfall error could contribute significantly to the discrepancy for maximum stream flow, but less so to the discrepancy for average stream flow.

© 2017 Springer International Publishing Switzerland

Cite this entry

Goldstein, M., Huntley, N. (2017). Bayes Linear Emulation, History Matching, and Forecasting for Complex Computer Simulators. In: Ghanem, R., Higdon, D., Owhadi, H. (eds) Handbook of Uncertainty Quantification. Springer, Cham. https://doi.org/10.1007/978-3-319-12385-1_14