## Abstract

We apply an established statistical methodology called history matching to constrain the parameter space of a coupled non-flux-adjusted climate model (the third Hadley Centre Climate Model; HadCM3) by using a 10,000-member perturbed physics ensemble and observational metrics. History matching uses emulators (fast statistical representations of climate models that include a measure of uncertainty in the prediction of climate model output) to rule out regions of the parameter space of the climate model that are inconsistent with physical observations given the relevant uncertainties. Our methods rule out about half of the parameter space of the climate model even though we use only a small number of historical observations. We explore two-dimensional projections of the remaining space and observe a region whose shape mainly depends on parameters controlling cloud processes and one ocean mixing parameter. We find that global mean surface air temperature (SAT) is the dominant constraint of those used, and that the others provide little further constraint after matching to SAT. The Atlantic meridional overturning circulation (AMOC) has a non-linear relationship with SAT and is not a good proxy for the meridional heat transport in the unconstrained parameter space, but these relationships are linear in our reduced space. We find that the transient response of the AMOC to idealised CO_{2} forcing at 1 and 2 % per year shows a greater average reduction in strength in the constrained parameter space than in the unconstrained space. We test extended ranges of a number of parameters of HadCM3 and discover that no part of the extended ranges can be ruled out using any of our constraints. Constraining parameter space using easy-to-emulate observational metrics prior to analysis of more complex processes is an important and powerful tool.
It can remove complex and irrelevant behaviour in unrealistic parts of parameter space, allowing the processes in question to be more easily studied or emulated, perhaps as a precursor to the application of further relevant constraints.

## References

Acreman DM, Jeffery CD (2007) The use of Argo for validation and tuning of mixed layer models. Ocean Model 19:53–69

Berliner LM, Kim Y (2008) Bayesian design and analysis for superensemble-based climate forecasting. J Clim 21(9):1891–1910

Brierley CM, Collins M, Thorpe AJ (2010) The impact of perturbations to ocean-model parameters on climate and climate change in a coupled model. Clim Dyn 34:325–343

Broecker WS (1987) The biggest chill. Nat Hist Mag 97:74–82

Brohan P, Kennedy JJ, Harris I, Tett SFB, Jones PD (2006) Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J Geophys Res 111:D12106

Challenor P, McNeall D, Gattiker J (2009) Assessing the probability of rare climate events. In: O’Hagan A, West M (eds) The handbook of applied Bayesian analysis, chap. 10. Oxford University Press, Oxford

Collins M, Brierley CM, MacVean M, Booth BBB, Harris GR (2007) The sensitivity of the rate of transient climate change to ocean physics perturbations. J Clim 20:2315–2320

Collins M, Booth BBB, Bhaskaran B, Harris GR, Murphy JM, Sexton DMH, Webb MJ (2011) Climate model errors, feedbacks and forcings: a comparison of perturbed physics and multi-model experiments. Clim Dyn 36:1737–1766

Craig PS, Goldstein M, Seheult AH, Smith JA (1996) Bayes linear strategies for matching hydrocarbon reservoir history. In: Bernado JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics 5. Oxford University Press, Oxford, pp 69–95

Craig PS, Goldstein M, Seheult AH, Smith JA (1997) Pressure matching for hydrocarbon reservoirs: a case study in the use of Bayes linear strategies for large computer experiments. In: Gatsonis C, Hodges JS, Kass RE, McCulloch R, Rossi P, Singpurwalla ND (eds) Case studies in Bayesian statistics vol III. Springer, New York, pp 36–93

Craig PS, Goldstein M, Rougier JC, Seheult AH (2001) Bayesian forecasting for complex systems using computer simulators. J Am Stat Assoc 96:717–729

Cumming JA, Goldstein M (2010) Bayes linear uncertainty analysis for oil reservoirs based on multiscale computer experiments. In: O’Hagan A, West M (eds) The Oxford handbook of applied Bayesian analysis. Oxford University Press, Oxford, pp 241–270

de Finetti B (1974) Theory of probability, volume 1. Wiley, New York

de Finetti B (1975) Theory of probability, volume 2. Wiley, New York

Dickson RR, Brown J (1994) The production of North Atlantic deep water: sources, rates, and pathways. J Geophys Res Oceans 99:12,319–12,341

Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. Wiley, New York

Edwards NR, Cameron D, Rougier JC (2011) Precalibrating an intermediate complexity climate model. Clim Dyn 37:1469–1482

Frieler K, Meinshausen M, Schneider von Deimling T, Andrews T, Forster P (2011) Changes in global-mean precipitation in response to warming, greenhouse gas forcing and black carbon. Geophys Res Lett 38:L04702. doi:10.1029/2010GL045953

Furrer R, Sain SR, Nychka D, Meehl GA (2007) Multivariate Bayesian analysis of atmosphere–ocean general circulation models. Environ Ecol Stat 14:249–266

Goldstein M (1986) Exchangeable belief structures. J Am Stat Assoc 81:971–976

Goldstein M (1986) Prevision. In: Kotz S, Johnson NL (eds) Encyclopaedia of statistical sciences, vol 7. pp 175–176

Goldstein M, Rougier JC (2004) Probabilistic formulations for transferring inferences from mathematical models to physical systems. SIAM J Sci Comput 26(2):467–487

Goldstein M, Rougier JC (2009) Reified Bayesian modelling and inference for physical systems. J Stat Plan Inference 139:1221–1239

Goldstein M, Wooff D (2007) Bayes linear statistics theory and methods. Wiley, New York

Gordon C, Cooper C, Senior CA, Banks H, Gregory JM, Johns TC, Mitchell JFB, Wood RA (2000) The simulation of SST, sea ice extents and ocean heat transports in a version of the Hadley Centre coupled model without flux adjustments. Clim Dyn 16:147–168

Gregory JM, Dixon KW, Stouffer RJ, Weaver AJ, Driesschaert E, Eby M, Fichefet T, Hasumi H, Hu A, Jungclaus JH, Kamenkovich IV, Levermann A, Montoya M, Murakami S, Nawrath S, Oka A, Sokolov AP, Thorpe RB (2005) Subannual, seasonal and interannual variability of the North Atlantic meridional overturning circulation. Geophys Res Lett 32. doi:10.1029/2005GL023209

Huffman GJ, Adler RF, Bolvin DT, Gu G (2009) Improving the global precipitation record: GPCP Version 2.1. Geophys Res Lett 36:L17808. doi:10.1029/2009GL040000

Jackson L, Vellinga M, Harris G (2012) The sensitivity of the meridional overturning circulation to modelling uncertainty in a perturbed physics ensemble without flux adjustment. Clim Dyn. doi:10.1007/s00382-011-1110-5

Johns WE, Baringer MO, Beal LM, Cunningham SA, Kanzow T, Bryden HL, Hirschi JJM, Marotzke J, Meinen C, Shaw B, Curry R (2011) Continuous, array-based estimates of Atlantic Ocean heat transport at 26.5N. J Clim 24:2429–2449

Jones PD, New M, Parker DE, Martin S, Rigor IG (1999) Surface air temperature and its changes over the past 150 years. Rev Geophys 37(2):173–199

Joshi MM, Webb MJ, Maycock AC, Collins M (2010) Stratospheric water vapour and high climate sensitivity in a version of the HadSM3 climate model. Atmos Chem Phys 10:7161–7167

Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models. J R Stat Soc Ser B 63:425–464

Kraus EB, Turner J (1967) A one dimensional model of the seasonal thermocline II. The general theory and its consequences. Tellus 19:98–106

Legates DR, Willmott CJ (1990) Mean seasonal and spatial variability in global surface air temperature. Theor Appl Climatol 41:11–21

Kuhlbrodt T, Griesel A, Montoya M, Levermann A, Hofmann M, Rahmstorf S (2007) On the driving processes of the Atlantic meridional overturning circulation. Rev Geophys 45. doi:10.1029/2004RG000166

Liu C, Allan RP (2012) Multisatellite observed responses of precipitation and its extremes to interannual climate variability. J Geophys Res 117:D03101. doi:10.1029/2011JD016568

Manabe S, Stouffer RJ (1980) Sensitivity of a global climate model to an increase of CO2 concentration in the atmosphere. J Geophys Res 85:5529–5554

McManus JF, Francois R, Gherardi JM, Keigwin LD, Brown-Leger S (2004) Collapse and rapid resumption of Atlantic meridional circulation linked to deglacial climate changes. Nature 428:834–837. doi:10.1038/nature02494

Meehl GA, Covey C, Delworth T, Latif M, McAvaney B, Mitchell JFB, Stouffer RJ, Taylor KE (2007) The WCRP CMIP3 multi-model dataset: a new era in climate change research. Bull Am Meteorol Soc 88:1383–1394

Morris MD, Mitchell TJ (1995) Exploratory designs for computational experiments. J Stat Plan Inference 43:381–402

Murphy JM, Sexton DMH, Barnett DN, Jones GS, Webb MJ, Collins M, Stainforth DA (2004) Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature 430:768–772

Murphy JM, Sexton DMH, Jenkins GJ, Booth BBB, Brown CC, Clark RT, Collins M, Harris GR, Kendon EJ, Betts RA, Brown SJ, Humphrey KA, McCarthy MP, McDonald RE, Stephens A, Wallace C, Warren R, Wilby R, Wood R (2009) UK Climate Projections Science Report: Climate change projections. Met Office Hadley Centre, Exeter, UK. http://ukclimateprojections.defra.gov.uk/images/stories/projections_pdfs/UKCP09_Projections_V2.pdf

Pope VD, Gallani ML, Rowntree PR, Stratton RA (2000) The impact of new physical parameterizations in the Hadley Centre climate model: HadAM3. Clim Dyn 16:123–146

Pukelsheim F (1994) The three sigma rule. Am Stat 48:88–91

Rhines PB, Häkkinen S (2003) Is the oceanic heat transport in the North Atlantic irrelevant to the climate in Europe? ASOF Newsl 13–17

Rice JA (1995) Mathematical statistics and data analysis, 2nd edn. Duxbury Press, Wadsworth Publishing Company, Belmont, California

Rougier JC (2007) Probabilistic inference for future climate using an ensemble of climate model evaluations. Clim Change 81:247–264

Rougier JC, Sexton DMH, Murphy JM, Stainforth D (2009) Emulating the sensitivity of the HadSM3 climate model using ensembles from different but related experiments. J Clim 22:3540–3557

Rougier JC, Goldstein M, House L (2012) Second-order exchangeability analysis for multi-model ensembles. J Am Stat Assoc (to appear)

Rowlands DJ, Frame DJ, Ackerley D, Aina T, Booth BBB, Christensen C, Collins M, Faull N, Forest CE, Grandey BS, Gryspeerdt E, Highwood EJ, Ingram WJ, Knight S, Lopez A, Massey N, McNamara F, Meinshausen N, Piani C, Rosier SM, Sanderson BJ, Smith LA, Stone DA, Thurston M, Yamazaki K, Yamazaki YH, Allen MR (2012) Broad range of 2050 warming from an observationally constrained large climate model ensemble. Nat Geosci, published online. doi:10.1038/NGEO1430

Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4:409–435

Sanderson BM, Shell KM, Ingram W (2010) Climate feedbacks determined using radiative kernels in a multi-thousand member ensemble of AOGCMs. Clim Dyn 35:1219–1236

Santner TJ, Williams BJ, Notz WI (2003) The design and analysis of computer experiments. Springer, New York

Sexton DMH, Murphy JM, Collins M (2011) Multivariate probabilistic projections using imperfect climate models part 1: outline of methodology. Clim Dyn. doi:10.1007/s00382-011-1208-9

Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) (2007) Contribution of working group I to the fourth assessment report of the intergovernmental panel on climate change, 2007. Cambridge University Press, Cambridge

Stephens GL, Wild M, Stackhouse Jr PW, L’Ecuyer T, Kato S, Henderson DS (2012) The global character of the flux of downward longwave radiation. J Clim 25:2329–2340

Trenberth KE, Fasullo JT, Kiehl J (2009) Earth’s global energy budget. Bull Am Meteorol Soc 90:311–323. doi:10.1175/2008BAMS2634.1

Vernon I, Goldstein M, Bower RG (2010) Galaxy formation: a Bayesian uncertainty analysis. Bayesian Anal 5(4):619–846, with Discussion

Whittle P (1992) Probability via expectation, 3rd edn. Springer texts in statistics. Springer, New York

Williamson D, Goldstein M, Blaker A (2012) Fast linked analyses for scenario based hierarchies. J R Stat Soc Ser C 61(5):665–692

Williamson D, Blaker AT (2013) Evolving Bayesian emulators for structured chaotic time series, with application to large climate models. SIAM J Uncertain Quantification (resubmitted)

Zaglauer S (2012) The evolutionary algorithm SAMOA with use of design of experiments. In: Proceeding GECCO companion ’12. ACM, New York, pp 637–638

## Acknowledgments

This research was funded by the NERC RAPID-RAPIT project (NE/G015368/1), and was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101). We would like to thank the CPDN team, in particular Andy Bowery, for their work in submitting our ensemble to the CPDN users. We’d also like to thank Richard Allan for helpful discussions regarding precipitation estimates, and all of the CPDN users around the world who contributed their spare computing resource as part of the generation of our ensemble. Finally, we’d like to thank both referees for their thoughtful and detailed comments.

## Appendices

### Appendix A: The statistical model derivation and justification for model discrepancy

We begin this technical Appendix A by deriving Eq. (7) using the statistical model in Sect. 4.1 given by Eqs. (5) and (6). Under our statistical model, supposing \( f^{h}(x) \) is our climate model and is part of the MME, and given the relation between the observations and the true climate to be as in (3), if \( x_{0} \) were \( x^{*} \) we would have

The observation error, \( e \), is uncorrelated with the other terms by definition, and \( U \), the discrepancy between \( y \) and the expectation of the MME, and \( R_{h} \), the discrepancy between \( f^{h} \) and the expectation of the MME, are uncorrelated as a consequence of the representation theorem. We judge \( \xi(x_{0}) \) to be uncorrelated with the other terms, as it represents the error in our emulator based on a large ensemble at \( x_{0} \), which should have no relationship to these other errors. Hence, taking variances, Eq. (7) follows.

Our statistical model makes the explicit assumption that \( f^{h}(x^{*}_{[h]}) \), our climate model run at its best input, is second order exchangeable with the MME. If the MME members are second order exchangeable, this would be fine so long as they represented different climate models run at their best inputs. However, this is not the case. Each MME member is a ‘tuned run’ of a different climate model run at some setting of its parameters \( x^{t}_{[j]} \), where this setting is highly unlikely to be the best input, \( x^{*}_{[j]} \), because, if it were, we would not be interested in the parametric uncertainty of the climate model. The usually adopted statistical model assumes \( y = f^{j}(x^{*}_{[j]}) + \eta^{j} \), where we cannot learn about \( \eta^{j} \) using our climate model (Rougier 2007). If \( x^{t}_{[j]} \) were equivalent to \( x^{*}_{[j]} \), then, under this model, a PPE tells you nothing more about climate than you already knew from \( f^{j}(x^{*}_{[j]}) \). Hence, usually, we would expect \( f^{j}(x^{*}_{[j]}) \) to be closer to \( y \) than \( f^{j}(x^{t}_{[j]}) \). The implication of this assumption is to have derived a larger discrepancy than is, perhaps, required.

To illustrate, suppose that we did view the climate models that contribute to the MME at their best inputs as second order exchangeable, so that \({f^{j}(x^{*}_{[j]})={\mathcal{M}}\oplus R_{j}(f^{j}(x^{*}_{[j]}))={\mathcal{M}}\oplus R^{*}_{j}}\). Further, suppose that we have no a priori reason to believe that any of the MME members have been tuned closer to their best inputs than any other, so that

\( f^{j}(x^{t}_{[j]}) = f^{j}(x^{*}_{[j]}) \oplus R^{t}_{j}, \)

where \( R^{t}_{j} \) is a mean-zero residual and \( {\rm Cov}[R^{t}_{j}, R^{t}_{k}] = 0 \) for \( j \neq k \).

Then

\( f^{j}(x^{t}_{[j]}) = {\mathcal{M}} \oplus R^{*}_{j} \oplus R^{t}_{j}, \)

which, by ignoring tuning as we have in (5), implies that \( R_{j} = R^{*}_{j} + R^{t}_{j} \). This means that our discrepancy term has additional error, due to tuning, built in.

In practice it would be very difficult to assess the individual contribution of the tuning error \( R^{t}_{j} \) to the discrepancy without expert judgement from the climate modellers, even if the illustrative framework above were adopted.

However, by having a larger discrepancy we rule out less parameter space. This cautious approach is consistent with a first-wave history match, whose aim is to remove those regions of parameter space that correspond to extremely unphysical climates. Though some of the remaining error can be attributed to tuning, this statistical model does not specifically account for the fact that the MME values we obtain from CMIP3 are not, unlike our ensemble members, ten-year means from climate models with a short spin-up phase. This factor leads us to choose a statistical model with a larger discrepancy variance, ruling out less parameter space than might be obtained using expert judgement.

### Appendix B: Constraint emulators

In order to fit each emulator mean function, we used a stepwise selection method to add or subtract functions of the model parameters to our vector *g*(*x*). The allowed functions were linear, quadratic and cubic terms with up to third order interactions between all parameters. Switch parameters were treated as factors (variables with a small number of distinct possible “levels”) and factor interactions with all continuous parameters were allowed.

First we perform a forward selection procedure in which we permit each variable to be added to *g*(*x*) in its lowest available form. So if ct is not yet in *g*(*x*), ct can be added, but ct^{2} cannot. If ct is already in *g*(*x*) then ct^{2} can be added, but we simultaneously add all first order interactions with the other variables in *g*(*x*) so that the resulting statistical model will be robust to changes of scale (see Draper and Smith 1998, for discussion). We do similarly for third order interactions. The term (or group of terms) added at each stage is the one that reduces the residual sum of squares from the regression the most.

When it becomes clear that adding more terms is not improving the predictive power of the emulator (a judgement made by the statistician based on the proportion of variability explained by the regression and on plots of the residuals from the fit), we begin a backwards elimination algorithm. This removes terms, one at a time, starting with the one contributing least to the sum of squares explained, without compromising the model. Lower order terms are not permitted to be removed from *g*(*x*) whilst higher order terms remain. We stop when removing the next term chosen by the algorithm leads to a poorer statistical model. For more details on forwards selection methods and backwards elimination, see Draper and Smith (1998).
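As a rough illustration, the forward step of this stepwise procedure can be sketched in a few lines of Python. This is not the authors' code: the `rss` helper, the term names and the greedy loop are illustrative, and the heredity bookkeeping described above (simultaneously adding all first order interactions when a power term enters) is omitted for brevity.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_select(candidates, y, n_terms):
    """Greedy forward selection: at each stage add the candidate column that
    reduces the residual sum of squares the most.
    `candidates` maps term names (e.g. 'ct', 'ct^2') to data columns."""
    chosen = ["const"]
    X = np.ones((len(y), 1))  # start from the constant term in g(x)
    for _ in range(n_terms):
        best = min(
            (name for name in candidates if name not in chosen),
            key=lambda name: rss(np.column_stack([X, candidates[name]]), y),
        )
        chosen.append(best)
        X = np.column_stack([X, candidates[best]])
    return chosen, X
```

Backwards elimination would then repeatedly drop the retained term whose removal increases the residual sum of squares least, subject to the same heredity constraint.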

The emulators contained between 44 terms in *g*(*x*) (in the case of PRECIP) and 76 terms (in the case of SAT). For the SAT emulator, it was found that the response was easier to model as a function of log(entcoef), though this transformation produced inferior mean functions for each of the other three outputs. The terms for the SAT emulator can be seen in Table 3. Each header in the table refers to one of the parameters from Tables 5 and 6, shortened in an obvious way to save space. Numbers on the diagonal refer to power terms in *g*(*x*) in each of the relevant parameters. The number 1 on the diagonal implies only a linear term was included in *g*(*x*); the number 2 implies that both quadratic and linear terms were included; the number 3 implies cubic, quadratic and linear terms.

Numbers on the upper triangle refer to the inclusion or not of interactions between the two relevant variables. For example, reading from the table, the term (vf1*dyndiff) is included in *g*(*x*), but the term (vf1*g0) is not. Variables indicated in bold on the lower triangle refer to three way interactions that are present in *g*(*x*). For example, the terms (ent*vf1^{2}) and (vf1*ent^{2}) are both included in *g*(*x*). Note that dynd and r_lay are both handled by the model as factors with pre-specified levels. Our emulator does contain a constant term, so the vector *g*(*x*) includes the element 1.

Having fitted a mean function, the next step in emulation is to provide a statistical model for the residual \(\epsilon(x)\). Upon investigation, the residuals from the fitted mean functions appeared to behave like white noise. Hence we decided that it was not worth modelling \(\epsilon(x)\) in Eq. (1) as a weakly stationary Gaussian process, in which \(\epsilon(x)\) and \(\epsilon(x')\) are correlated via a covariance function *R*(|*x* − *x*′|) that depends on the distance between points in parameter space (see, for example, Williamson et al. 2012). We therefore opted to model \(\epsilon(x)\) as mean-zero uncorrelated error, as is done by Sexton et al. (2011) and Rougier et al. (2009). We let the variance of \(\epsilon(x)\) be equal to the variance of the residuals from each of the four regressions.
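To make this emulator form concrete, here is a minimal sketch (not the authors' released code) of an emulator with a regression mean function \( g(x)^{T}\beta \) and mean-zero uncorrelated residual error whose variance is set from the regression residuals, as described above. The basis function `g` is a placeholder the user supplies.

```python
import numpy as np

class LinearEmulator:
    """Sketch of an emulator: regression mean g(x)^T beta plus mean-zero
    uncorrelated residual noise with variance estimated from the fit."""

    def __init__(self, g):
        self.g = g  # maps a parameter vector x to its basis vector g(x)

    def fit(self, X_train, y_train):
        G = np.array([self.g(x) for x in X_train])
        self.beta, *_ = np.linalg.lstsq(G, y_train, rcond=None)
        resid = y_train - G @ self.beta
        # Residuals behave like white noise, so model eps(x) as uncorrelated
        # error with variance equal to the regression residual variance.
        self.sigma2 = float(resid @ resid) / (len(y_train) - len(self.beta))
        return self

    def predict(self, x):
        """Return (mean, variance) of the emulator prediction at x."""
        return float(self.g(x) @ self.beta), self.sigma2
```

A weakly stationary Gaussian process residual would instead make the predictive variance shrink near training points; the uncorrelated-error choice above keeps it constant across parameter space.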

To validate the emulators, we kept 100 randomly chosen ensemble members back as a validation set. The emulators were not trained using these members. We then predict the value of the constraints in each ensemble member using the emulator and examine the residuals (the difference between our prediction and the truth). The residuals are plotted against the predicted values in Fig. 9.

The emulator’s performance is acceptable, as Fig. 9 shows that our emulator predictions are consistent with our uncertainty specification. If anything, the validation suggests that we have over-specified our variance (at least in the case of SCYC and PRECIP), because no points fall outside 3 standard deviations. However, this will only lead to less parameter space being ruled out, in keeping with our cautious approach to deciding which parts of parameter space are ruled out and which are retained. We make the four emulators available to download along with our implausibility code so that our results may be replicated, and their sensitivity to any of our judgements explored.
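The implausibility calculation itself can be sketched as follows; the function names here are illustrative rather than taken from the released code, with the three-sigma cutoff following Pukelsheim (1994) and the denominator combining emulator variance, observation error variance and discrepancy variance.

```python
import numpy as np

def implausibility(z, em_mean, em_var, obs_var, disc_var):
    """Standardized distance between observation z and the emulator
    expectation, with all relevant variances in the denominator."""
    return abs(z - em_mean) / np.sqrt(em_var + obs_var + disc_var)

def nroy(z, em_means, em_vars, obs_var, disc_var, cutoff=3.0):
    """Boolean mask of Not Ruled Out Yet parameter choices: a point is
    retained when its implausibility is below the cutoff."""
    em_means = np.asarray(em_means, dtype=float)
    em_vars = np.asarray(em_vars, dtype=float)
    I = np.abs(z - em_means) / np.sqrt(em_vars + obs_var + disc_var)
    return I < cutoff
```

With several constraints, a point is retained only if it passes the cutoff for every constraint, which is how matching to SAT first can leave the others with little further space to remove.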

### Appendix C: Tables

Tables 4, 5 and 6 give descriptions and ranges for the parameters and the settings of switches used in our ensemble. Some parameters have relationships with other model parameters that were given to us by the Met Office so that a change in one leads to a derivable value for the other. CWland also determines CWsea, the cloud droplet to rain threshold over sea (kg/m^{3}), MinSIA also determines dtice (the ocean ice diffusion coefficient) and k_gwd also determines kay_lee_gwave (the trapped lee wave constant for surface gravity waves m^{3/2}).

## About this article

### Cite this article

Williamson, D., Goldstein, M., Allison, L. *et al.* History matching for exploring and reducing climate model parameter space using observations and a large perturbed physics ensemble.
*Clim Dyn* **41, **1703–1729 (2013). https://doi.org/10.1007/s00382-013-1896-4

### Keywords

- Bayesian uncertainty quantification
- History matching
- Implausibility
- Observations
- NROY space