# Bayesian multi-model projection of climate: bias assumptions and interannual variability

DOI: 10.1007/s00382-009-0588-6

- Cite this article as:
- Buser, C.M., Künsch, H.R., Lüthi, D. et al. Clim Dyn (2009) 33: 849. doi:10.1007/s00382-009-0588-6

## Abstract

Current climate change projections are based on comprehensive multi-model ensembles of global and regional climate simulations. Application of this information to impact studies requires a combined probabilistic estimate taking into account the different models and their performance under current climatic conditions. Here we present a Bayesian statistical model for the distribution of seasonal mean surface temperatures for control and scenario periods. The model combines observational data for the control period with the output of regional climate models (RCMs) driven by different global climate models (GCMs). The proposed Bayesian methodology addresses seasonal mean temperatures and considers both changes in mean temperature and interannual variability. In addition, unlike previous studies, our methodology explicitly considers model biases that are allowed to be time-dependent (i.e. change between control and scenario period). More specifically, the model considers additive and multiplicative model biases for each RCM and introduces two plausible assumptions (“constant bias” and “constant relationship”) about extrapolating the biases from the control to the scenario period. The resulting identifiability problem is resolved by using informative priors for the bias changes. A sensitivity analysis illustrates the role of the informative prior. As an example, we present results for Alpine winter and summer temperatures for control (1961–1990) and scenario periods (2071–2100) under the SRES A2 greenhouse gas scenario. For winter, both bias assumptions yield a comparable mean warming of 3.5–3.6°C. For summer, the two different assumptions have a strong influence on the probabilistic prediction of mean warming, which amounts to 5.4°C and 3.4°C for the “constant bias” and “constant relation” assumptions, respectively. 
Analysis shows that this large uncertainty stems from the overestimation of summer interannual variability in all models considered. Our results show the necessity of considering potential bias changes when projecting climate under an emission scenario. Further work is needed to determine how bias information can be exploited for this task.

### Keywords

Multi-model prediction, Bayesian, Model bias, Bias change, RCM, Alpine region

## 1 Introduction

Climate projections and associated applications in impact studies have become an important topic of scientific and public interest during the last decades. Several research teams around the world are developing models to simulate the current climate and its future evolution under several greenhouse gas and aerosol scenarios.

On the large scale, general circulation models (GCMs) are used with coarse horizontal resolution. While they are capable of effectively reproducing large-scale effects and circulation patterns, they cannot predict small-scale effects for a selected region. Information about regional climate can be obtained by dynamic down-scaling (Giorgi 1990). To this end, regional climate models (RCMs) use the GCM output as their driving boundary conditions. It is advantageous to combine different results of several climate models—both on the global and regional scale—to obtain a reliable data base. It is generally believed that multi-model ensembles are superior to single models, and that the ensemble may even outperform the best single participating model. Recent analysis indicates that much of this gain is due to the fact that single models are overconfident (Weigel et al. 2008). In relation to climate projections, combining different models exploits the strengths of diverse approaches and yields a more appropriate estimate of the uncertainties (Meehl et al. 2007). The combined GCM/RCM multi-model approach has been advanced by large international projects such as PRUDENCE (e.g. Christensen and Christensen 2007; Christensen et al. 2007b).

Once a large multi-model ensemble is available, one is left with the task of optimally combining this information into one probabilistic prediction of the anticipated changes in climate. In the case of medium-range weather forecasts and seasonal climate prediction, several methods exist (for an overview see Wilks 2006). Many of these methods address the task by assigning (equal) weights to all ensemble members and by subtracting the biases of each model, as known from past model performance. However, in a multi-model climate change ensemble, there are additional issues that should be considered. One would like to predict the whole climate distribution, in particular higher moments and quantiles, and there is the additional complication that the climate model biases can depend on the underlying climate, i.e. the biases are time- and state-dependent.

The last item appears particularly difficult. Indeed, the standard procedure in studies about climate change entails the implicit assumption that bias changes are negligible compared to changes in climate, i.e. “climate change” is defined simply as the difference between scenario and control climate. This important assumption is rarely discussed in depth (but see e.g. Shackley et al. 1998), and a thorough test appears elusive, as the changes in climate considered are of a magnitude that has not occurred in the instrumental past. Yet the assumption of a time-independent (or climate-state-independent) bias is crucial. Even with a model that perfectly reproduces the current climate, there is no guarantee that the model will exhibit the true climate sensitivity (Stainforth et al. 2005). Also from a physical viewpoint, it appears unlikely that the biases of a climate model should be state-independent, as the climate system entails many non-linearities and threshold processes (e.g. related to atmospheric humidity, freezing/melting, sea ice, soil moisture, clouds, convection, etc.). One method to address the role of these nonlinearities in the simulation of climate is to validate summer and winter seasons separately (e.g. Meehl et al. 2007) and to use the representation of the seasonal cycle as a measure of the model’s fidelity (Shukla et al. 2006).

The Bayesian framework is particularly attractive for combining several models. It decomposes the complicated relationship between the observations and the outputs of different models into simpler, hierarchical relationships that can be described in a reasonable and transparent way (Gelman et al. 2003). Although the necessary integrations cannot be done analytically, Markov Chain Monte Carlo methods make it possible to deal with complicated distributions (Gilks et al. 1996).

Tebaldi et al. (2005) were among the first to use the Bayesian framework to analyze multi-model climate predictions. They obtain a probability density function (PDF) for the mean temperature changes in 22 global regions and four seasons by combining observations and output from several GCMs of 30 year regional climate averages. Their approach can be viewed as a weighted average of the individual GCM results, with weights similar to those used by the reliability ensemble average (REA) of Giorgi and Mearns (2002). The framework of Tebaldi et al. (2005) has been generalized in many directions. Smith et al. (2008) study several regions simultaneously. Tebaldi and Sanso (2008) introduce a multivariate generalization for analyzing decadal averages of temperature and precipitation for 1955–2100. Furrer et al. (2007) analyze the spatial variability of the climate change signal. They use a multivariate hierarchical Bayes model to separate it into a large scale signal of climate change and an isotropic process representing small-scale variability among models. Jun et al. (2008) analyze the spatial variability of the additive bias in detail for the control climate. Min and Hense (2007) calculate Bayes factors for a weighted multi-model average. These Bayes factors are obtained by comparing the simulations to a reference model in terms of likelihood. Sain et al. (2008) provide a multivariate approach that takes into account the spatial structure of the data. Bayesian methods are also used to aggregate station data on a regular grid for an RCM validation (Snyder et al. 2007). A review of multi-model climate projections and the different types of uncertainty is given by Tebaldi and Knutti (2007). They also discuss the problems of model dependence, tuning and evaluation.

Our approach is a different extension of Tebaldi et al. (2005). We study RCMs instead of GCMs, but the main methodological difference is that we consider not only the long-term climate mean, but also the interannual variations, by focusing on the distribution of seasonal values of the variable of interest. A possible nonstationarity of the data is taken into account by including linear trends in the control and scenario periods. For simplicity, we assume that all models have the same underlying trend.

The reason for analyzing the distribution of seasonal values is two-fold. First, for impact studies, changes in both the mean and the variability of the climate variables are relevant (Katz and Brown 1992; Schär et al. 2004), and our approach provides both. Second, the broader approach allows us to study additive and multiplicative biases of the different RCMs in the Bayesian framework. We discuss two different assumptions for extrapolating the biases into the scenario period, both of which are plausible but lead to quite different conclusions about the likely climate changes. We can even allow these biases to be different in the control and scenario periods, but we have to assume in the prior distributions that the bias changes are small.

In this paper, our variable of interest is the seasonally and regionally averaged 2 m-temperature, but other variables could in principle be considered, e.g. the regional average of the maximum temperature within a season. However, complications will arise if the assumption of normal distributions of the variables is no longer valid. We will restrict attention to the target variable (i.e. temperature), and biases in other variables, e.g. precipitation, do not enter the analysis. This procedure has also been followed by other studies (Giorgi and Mearns 2002; Tebaldi et al. 2005), although it would be desirable to account for the overall performance of a model as in the multivariate extension of Tebaldi and Sanso (2008).

The paper is structured as follows. In Sect. 2 the data and the aggregation procedure are described. In Sect. 3 the methods and the Bayesian model setup are explained. In Sect. 4 results for the Alpine region are shown. In the final Sect. 5 we draw conclusions and discuss further extensions of our approach.

## 2 Data

In this paper both observational data and output from the RCMs are summarized by the term “data”. One has to distinguish between current climate data that comes from observations and model projections, and the future climate data that comes from models only. Our variable of interest is the 2 m-temperature, but the same methods apply to other variables in principle. Some of the problems that can arise for other variables are discussed in Sect. 5.

### 2.1 Regional climate model data

The statistical analysis uses the output of four RCMs (CHRM, CLM, HIRHAM, RCAO) and one high-resolution GCM with a stretched spectral discretization (Arpege). All simulations are part of the PRUDENCE project (http://prudence.dmi.dk) or use the PRUDENCE methodology in their set-up. Here we restrict attention to the most salient aspects and refer to the literature for a full documentation of the numerical experiments (Christensen et al. 2007a; Christensen and Christensen 2007).

PRUDENCE data overview: we use a subset of models that are driven by different atmospheric GCM runs

Institute | RCM | RCM reference | GCM | GCM reference |
---|---|---|---|---|
CNRM (Toulouse, France) | Arpege | (Gibelin and Déqué 2003) | | |
ETH (Zurich, Switzerland) | CHRM | (Vidale et al. 2003) | ECHAM5 | (Roeckner et al. 2003) |
GKSS (Geesthacht, Germany) | CLM | (Steppeler et al. 2003) | HadAM3H/1 | (Jones et al. 2001) |
DMI (Copenhagen, Denmark) | HIRHAM | (Christensen et al. 1996) | HadAM3H/2 | (Jones et al. 2001) |
SMHI (Norrköping, Sweden) | RCAO | (Jones et al. 2004) | ECHAM4/OPYC | (Roeckner et al. 1996) |

Note that 3 of the 5 simulations include the same sea-surface temperature and sea-ice distributions (i.e. Arpege, CHRM and CLM) stemming from a coupled HadCM3 simulation (for details see Rowell 2005). The HIRHAM simulation considered employs an independent HadCM3/HadAM3 ensemble member, and the RCAO another ocean model (see Räisänen et al. 2004). In addition RCAO is interactively coupled with a regional ocean model of the Baltic Sea.

The integration area of the models varies, but in all cases covers the larger part of Europe. The focus is on the Alpine region (AL: 44–48N, 5–15E) which is one of the standard regions of the PRUDENCE project (Christensen and Christensen 2007). This region lies in the center part of the integration area for all models. The spatial resolution of the data is around 0.5° (∼56 km). Model output has been interpolated on the regular CRU grid (see Sect. 2.2) so that it can easily be compared with observations from the control period.

### 2.2 Observational data

The observed temperature data are obtained from the Climatic Research Unit (CRU). The data are provided on a regular 0.5° lon × 0.5° lat grid. They are based on station data, interpolated as a function of latitude, longitude and elevation above sea level. New et al. (1999) give a detailed description of the data set and of the thin-plate spline that was used for interpolation. Data can be accessed via http://www.cru.uea.ac.uk. It is a widely established surface temperature data set covering the period 1901–2002. In the analysis we assume that the CRU observations represent the true climate.

### 2.3 Aggregation

For both seasons (winter: DJF and summer: JJA) the statistical analysis is done independently of the other season. We average the variable of interest both temporally over the 3 months of each season and spatially over all land grid points in the Alpine region. For the spatial average, a grid point has been considered as a land point if at least 50% of the corresponding area is landmass. Water grid points have been excluded from all models and the CRU data set to avoid a mixing of the sea and land temperatures.
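The aggregation step can be sketched in a few lines. This is a minimal illustration, not the authors' processing code: the function name `regional_seasonal_mean`, the nested-list data layout and the unweighted spatial mean (no grid-cell area weighting) are assumptions made for the example, and the temperatures are invented.

```python
def regional_seasonal_mean(monthly_fields, land_fraction, threshold=0.5):
    """Average three monthly temperature fields over time and over land points.

    monthly_fields: list of 3 grids (one per month of the season), each a list
                    of rows of temperatures in deg C.
    land_fraction:  grid of the same shape with the land share of each cell (0..1).
    threshold:      a cell counts as land if land_fraction >= threshold.
    """
    total, count = 0.0, 0
    for field in monthly_fields:
        for row_vals, row_land in zip(field, land_fraction):
            for value, frac in zip(row_vals, row_land):
                if frac >= threshold:      # keep land points only
                    total += value
                    count += 1
    if count == 0:
        raise ValueError("no land grid points in the region")
    return total / count


# Hypothetical 2 x 2 region: the lower-right cell is mostly water and is skipped.
land = [[1.0, 0.8], [0.5, 0.2]]
dec = [[-2.0, -1.0], [0.0, 9.0]]
jan = [[-3.0, -2.0], [-1.0, 9.0]]
feb = [[-1.0, 0.0], [1.0, 9.0]]
winter_mean = regional_seasonal_mean([dec, jan, feb], land)
print(winter_mean)
```

Because the water cell is excluded, its (deliberately implausible) value of 9.0 does not contaminate the land average.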

The spatial domain considered has a size of about 20 × 8 grid points and it is one of the standard domains used for the evaluation of RCMs (see Christensen and Christensen 2007). At the spatial scale considered, both elements of the GCM/RCM model chain are important. Déqué et al. (2007) have used the PRUDENCE archive to quantify whether the regional-scale uncertainties in climate projections stem from the GCM, the RCM or from internal variability. An important conclusion reached from their analysis is that the uncertainty due to the use of different RCMs can be as large as the uncertainty due to different GCMs. More specifically, the analysis showed that uncertainties in winter conditions were primarily affected by the GCMs (i.e. by large-scale circulations), while summer uncertainties were considerably affected by the RCMs (i.e. by parameterizations).

With this aggregation, one can ignore correlations and trends within a season and within the region. The limitation of spatial averaging is that small-scale features cannot be observed anymore since information is lost. In contrast to Tebaldi et al. (2005), we do not average over the years and retain the interannual variations of the climate which is our main interest. A potential difficulty of our approach is that trends during the periods 1961–1990 and 2071–2100 become confounded with the interannual variability. In order to avoid this, we will include linear trends in our model and integrate them out in the Bayesian framework.
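To see why an unmodelled trend would be confounded with interannual variability, note that a linear trend γ(*t* − *T*_{0}) over *T* years adds γ²(*T*² − 1)/12 to the variance of the raw series. The short check below uses a hypothetical trend value and assumes that the year index is centred at *T*_{0} = (*T* + 1)/2; neither is taken from the paper.

```python
T = 30                      # years per period, as in the paper
gamma = 0.05                # hypothetical trend in deg C per year (illustration only)
t0 = (T + 1) / 2            # assumed centring of the year index

trend = [gamma * (t - t0) for t in range(1, T + 1)]
# The centred trend has mean zero, so its variance is the mean of squares.
var_from_sum = sum(v * v for v in trend) / T
var_closed_form = gamma ** 2 * (T ** 2 - 1) / 12

print(var_from_sum, var_closed_form)
```

The two numbers agree, which shows that the variance contribution grows quadratically with both the trend and the period length.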

## 3 Methods

### 3.1 Notation

As explained in the previous section, the data consists of *T* = 30 observations for the variable of interest in the control period (1961–1990) and of *T* values of the same variable generated by *M* = 5 models both for the control and scenario periods (2071–2100) under an A2 emission scenario. Having the same number of values in the control and scenario periods is not essential. We denote by *X*_{0,t} the observations in year 1960 + *t*, by *X*_{i,t} the control output of model *i* in year 1960 + *t* and by *Y*_{i,t} the scenario output of model *i* in year 2070 + *t* with *t* = 1,…, *T* years. Although the observations *Y*_{0,t} for the years 2070 + *t* are not available, they are included as unobserved data in the model. This will make the interpretation of model parameters more transparent. Since separate analyses are conducted for each season, it is not necessary to add an index for the season.

### 3.2 Bayesian formalism

As mentioned in the introduction, we are going to use a Bayesian approach to construct a probability distribution for the scenario climate given all data. In this approach one has to specify the likelihood *p*(Data | Θ), that is the conditional probability density of the data given the parameters Θ (for details see Gelman et al. 2003, Sect. 1.3), and—because all parameters in the model are considered as random variables—a joint prior distribution *p*(Θ) of all parameters. In our context “parameters” denote quantities of interest like long-term climate means and variances, climate changes, biases, bias changes or trends that determine the distribution of the data. Other types of parameters that are used within the RCMs are not discussed in the paper. In Sect. 3.3 the likelihood is specified for this framework and in Sect. 3.5 the distribution of the priors will be discussed.

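From the likelihood and the prior one obtains the posterior distribution *p*(Θ | Data) of the parameters given the data by Bayes formula:

\[ p(\Theta \mid \text{Data}) \propto p(\text{Data} \mid \Theta)\, p(\Theta). \]

The posterior predictive density *p*(*Y*_{0,t} | Data) is of particular interest. This is the best estimate of the distribution of the scenario climate given all data. It is obtained by averaging the density of *Y*_{0,t} given the parameters Θ with respect to the posterior distribution:

\[ p(Y_{0,t} \mid \text{Data}) = \int p(Y_{0,t} \mid \Theta)\, p(\Theta \mid \text{Data})\, d\Theta. \]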
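In practice, averaging over the posterior is typically done by Monte Carlo: for each posterior draw of the parameters, one value of *Y*_{0,t} is simulated from its conditional distribution. The sketch below assumes that posterior draws of the scenario mean and standard deviation are already available. The draws used here are synthetic placeholders, and `predictive_sample` is a hypothetical helper name.

```python
import random


def predictive_sample(posterior_draws, rng):
    """Draw one Y value per posterior draw (mean, sd): a sample from p(Y | Data)."""
    return [rng.gauss(mean, sd) for mean, sd in posterior_draws]


rng = random.Random(0)
# Synthetic stand-in for MCMC output: (scenario mean, scenario sd) pairs.
posterior_draws = [(rng.gauss(3.0, 0.5), abs(rng.gauss(1.0, 0.1)))
                   for _ in range(20000)]

y = predictive_sample(posterior_draws, rng)
pred_mean = sum(y) / len(y)
pred_var = sum((v - pred_mean) ** 2 for v in y) / (len(y) - 1)
print(pred_mean, pred_var)
```

The predictive variance is larger than the typical conditional variance (about 1 here) because the uncertainty in the parameters is added to the interannual variability.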

### 3.3 Distribution of data

In our framework we make three main assumptions about the conditional distribution of the data given the parameters:

**Assumption 1**

Conditionally on the parameters, all data are independent.

Assumption 1 implies that the likelihood has a product form. Independence means that serial correlations in the time series and possible correlations between models are ignored. The autocorrelation plots of the series do not show significant correlations, and thus the first part does not seem problematic, though in general this depends upon the region considered. In order to fulfill the second part, different RCMs driven by different GCM simulations are used. Even then, the independence assumption may be questioned, as the GCMs and RCMs are based on the same scientific knowledge and thus are not completely independent (Tebaldi and Knutti 2007). This means that the PDFs do not represent all sources of uncertainty (e.g. Knutti et al. 2002).

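**Assumption 2**

The control data are normally distributed with a common linear trend:

\[ X_{0,t} \sim \mathcal{N}\big(\mu + \gamma (t - T_0),\ \sigma^2\big), \qquad X_{i,t} \sim \mathcal{N}\big(\mu + \beta_i + \gamma (t - T_0),\ \sigma^2 b_i^2\big), \]

for *t* = 1,…, *T* and *i* = 1,…, *M*. Centering the year index at *T*_{0} yields that the intercept μ can be interpreted as the mean value of the climate distribution. γ is a common linear trend that is estimated from all control simulations and the CRU data set together. This trend is not of main interest, but it should be removed to obtain stationary distributions. By introducing detrended data *X*_{i,t}^{det} = *X*_{i,t} − γ(*t* − *T*_{0}), independent and identically distributed (i.i.d.) data are obtained for the control climate and the outputs of each model:

\[ X_{0,t}^{\text{det}} \sim F_0^c = \mathcal{N}(\mu, \sigma^2), \qquad X_{i,t}^{\text{det}} \sim F_i^c = \mathcal{N}(\mu + \beta_i,\ \sigma^2 b_i^2). \]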

We denote distributions that describe the control climate by a superscript *c*, and those that describe the scenario period by a superscript *s*. The parameters μ and σ are the expectation value and standard deviation of the control climate, β_{i} is an additive bias of the climate mean in model *i*, and *b*_{i} is a multiplicative bias. In other words, we assume that model projections only imply a change in the location and spread, but not in the shape, of the distribution.

Independence and identical distributions imply in particular that the detrended data are exchangeable over time, that is, their distribution is independent of permutations of the year index. In other words, a model output *X*_{i,t}^{det} is not supposed to be close to the observation *X*_{0,t}^{det} for the same year *t*, and two model outputs *X*_{i,t}^{det} and *X*_{j,t}^{det} for *i* ≠ *j* need not be close for the same *t*. This reflects the fact that the different data series stem from independent realizations of the (same) climate state. However, if model *i* is good, then the distribution *F*_{i}^{c} of *X*_{i,t}^{det} should be close to the distribution *F*_{0}^{c}.
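A small simulation can make Assumption 2 concrete. The sketch below generates synthetic control-period series in which the observations have mean μ + γ(*t* − *T*_{0}) and standard deviation σ, and the model output has mean μ + β_{i} + γ(*t* − *T*_{0}) and standard deviation σ*b*_{i}. It then removes the trend and recovers the additive and multiplicative bias from sample moments. All numerical values, the centring *T*_{0} = (*T* + 1)/2 and the helper names are assumptions for illustration. The trend is treated as known here, whereas the paper estimates all quantities jointly in the Bayesian model.

```python
import random
import statistics


def simulate_control(mu, sigma, beta, b, gamma, T, rng):
    """Return (observations, model output) for t = 1..T under the normal model."""
    t0 = (T + 1) / 2                       # assumed centring of the year index
    obs = [rng.gauss(mu + gamma * (t - t0), sigma) for t in range(1, T + 1)]
    mod = [rng.gauss(mu + beta + gamma * (t - t0), sigma * b)
           for t in range(1, T + 1)]
    return obs, mod


def detrend(series, gamma):
    """Subtract the linear trend gamma * (t - T0) from a series indexed t = 1..T."""
    t0 = (len(series) + 1) / 2
    return [x - gamma * (t - t0) for t, x in enumerate(series, start=1)]


rng = random.Random(1)
# Hypothetical values; T is set far above 30 so that the moment estimates are stable.
mu, sigma, beta, b, gamma, T = 15.0, 1.0, 1.5, 1.4, 0.001, 2000
obs, mod = simulate_control(mu, sigma, beta, b, gamma, T, rng)
obs_det, mod_det = detrend(obs, gamma), detrend(mod, gamma)

beta_hat = statistics.mean(mod_det) - statistics.mean(obs_det)   # additive bias
b_hat = statistics.stdev(mod_det) / statistics.stdev(obs_det)    # multiplicative bias
print(round(beta_hat, 2), round(b_hat, 2))
```

Note that the estimates compare distributions, not individual years: the two series are generated independently, so a good model need not match the observation in any particular year.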

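**Assumption 3a**

The scenario data are normally distributed as

\[
\begin{aligned}
Y_{0,t} &\sim \mathcal{N}\big(\mu + \Delta\mu + (\gamma + \Delta\gamma)(t - T_0),\ \sigma^2 q^2\big), \\
Y_{i,t} &\sim \mathcal{N}\big(\mu + \Delta\mu + \beta_i + \Delta\beta_i + (\gamma + \Delta\gamma)(t - T_0),\ \sigma^2 q^2 b_i^2 q_{b_i}^2\big),
\end{aligned}
\tag{6a}
\]

for 1 ≤ *i* ≤ *M*.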

This means that a mean shift Δμ and a multiplicative change *q* in the variability of the scenario climate are allowed. Δγ represents a change in the trend for the scenario data. Moreover, with the parameters Δβ_{i} and \(q_{b_i}\) the additive and multiplicative biases can change between the control and scenario periods. A model may reproduce the climate well today, but an increased bias in the scenario is possible due to incorrectly parameterized or simplified physical processes. Note that the components “true change”, “bias” and “bias change” are combined additively for the mean, and multiplicatively for the standard deviation.

The assumption of normal distributions is reasonable due to the aggregation over a season and within the Alpine region. In addition, quantile plots of observations and model data against the theoretical normal distribution do not show strong discrepancies (see Sect. 4.2). In principle the normal assumption can be relaxed using either more general distribution families or a non-parametric approach. But even with the restriction to the normal distribution the problem is still somewhat ill-posed as we will see in the next section.

### 3.4 Identifiability

For the control climate, there are model values from the RCM control runs and observations from the CRU data set. Therefore it is possible to estimate both the mean value μ of the climate and the individual biases β_{i} for each model.

Since there are no observations *Y*_{0,t}, Δμ and Δβ_{i} cannot be estimated separately from the data alone; they are confounded. The model is not identifiable, that is, two different parameter sets with identical sums Δμ + Δβ_{i} lead to the same distribution for all data. A large value of Δμ could in principle be compensated by opposite model bias changes Δβ_{i} for each model. This is a general problem in statistical and dynamic down-scaling. One needs observations to calibrate and validate a model and to verify model assumptions. These observations are only available for the control climate. Therefore one has to accept the assumption that a (statistical) relationship also holds for the scenario climate, or that parameters calibrated in the control period remain valid in the scenario period. In our context we are facing the same problem when trying to separate the climate change Δμ and the change of the model bias Δβ_{i} of the *i*-th model.

Four alternatives for dealing with this non-identifiability can be considered:

- (i) One assumes that the model bias does not change, that is, Δβ_{i} = 0.
- (ii) One puts restrictions on the bias change, e.g. ∑_{i}Δβ_{i} = 0, that is, the average of the model biases does not change in the scenario period.
- (iii) One introduces a soft restriction that ∑_{i}Δβ_{i}^{2} is small, that is, the changes of model biases cannot be too large, where “not too large” will be defined more thoroughly later.
- (iv) One reparameterizes the model by defining new parameters ν_{i} ≔ Δμ + Δβ_{i}, which then are identifiable.

With the second alternative, a large bias change of one model forces either a large bias change of another model in the opposite direction, or many smaller compensations by the other models. In addition it does not allow the total bias to become larger (or smaller) due to a climate shift.

Although the re-parameterization in the fourth alternative solves the identifiability problem, it does not allow one to distinguish between model biases and climate change. Since the aim is a climate projection that corrects for individual model biases, this is not a real alternative to the problem.

The third solution is a regularisation of the over-parameterized problem. In a Bayesian context it can be implemented with specific choices of the priors for the affected parameters Δβ_{i}. Equation 6a together with alternative (iii) will subsequently be referred to as the “constant bias” assumption and later be contrasted with an alternative “constant relation” assumption. The term “constant bias” is somewhat misleading, since bias changes are actually allowed, but alternative (iii) will tend to keep the bias changes small, to a degree that depends upon the prior distribution. In the next sections we will describe these assumptions and their interpretation in more detail.

The same problem as for Δμ and Δβ_{i} appears for *q* and \(q_{b_i}\). Because these parameters represent multiplicative biases, only the products \(q \cdot q_{b_i}\) are identifiable. Again this problem is solved by forcing the sum of the \(\log(q_{b_i})^2\)-terms to be small. This regularisation is achieved by the choice of the prior distribution of \(q_{b_i}.\)
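The non-identifiability can be checked numerically. Under the “constant bias” form of the scenario distribution, the model output *Y*_{i,t} depends on Δμ and Δβ_{i} only through their sum, and on *q* and \(q_{b_i}\) only through their product, so two different parameter sets give the same log-likelihood. The sketch below uses hypothetical parameter values and synthetic data, and omits the trend term for brevity.

```python
import math
import random


def scenario_loglik(y, mu, beta, d_mu, d_beta, sigma, b, q, q_b):
    """Normal log-likelihood of scenario output with mean mu + d_mu + beta + d_beta
    and standard deviation sigma * q * b * q_b (trend omitted for brevity)."""
    mean = mu + d_mu + beta + d_beta
    sd = sigma * q * b * q_b
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (v - mean) ** 2 / (2 * sd ** 2) for v in y)


rng = random.Random(2)
y = [rng.gauss(20.0, 1.5) for _ in range(30)]      # synthetic scenario output

# Parameter set A: all of the shift is climate change.
ll_a = scenario_loglik(y, mu=15.0, beta=1.0, d_mu=4.0, d_beta=0.0,
                       sigma=1.0, b=1.2, q=1.25, q_b=1.0)
# Parameter set B: same sum and product, but part of the shift is bias change.
ll_b = scenario_loglik(y, mu=15.0, beta=1.0, d_mu=2.5, d_beta=1.5,
                       sigma=1.0, b=1.2, q=1.0, q_b=1.25)
print(ll_a, ll_b)
```

Since the data cannot distinguish between sets A and B, any preference between them has to come from the prior distributions.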

### 3.5 Choice of priors

For all parameters one has to choose prior distributions. We assume that all parameters are a priori independent so that only the marginal prior distributions are needed. There are two classes of parameters. The first class, μ, Δμ, β_{i}, Δβ_{i}, γ and Δγ, is related to the mean values of the assumed normal distributions of the data. It is common to take normal priors for these parameters since this simplifies the computations. The other class of parameters consists of σ^{2}, *q*^{2}, *b*_{i}^{2} and \(q_{b_i}^2\), which are variances or multiplicative changes of the variances. It is a common procedure (Gelman et al. 2003) to work with the precision, which is defined as the inverse of the variance, and to choose a Gamma distribution for the prior of the precision. The same procedure is used for the multiplicative change factors. Note that this reparametrization in terms of precision does not affect the results; it is used only for computational reasons. Hence we will later show the posterior for the standard deviation σ and the scale factors *q*, *b*_{i} and \(q_{b_i}\), which have a more direct physical interpretation.

For the parameters μ, Δμ, β_{i}, γ, Δγ, σ^{−2}, *q*^{−2} and *b*_{i}^{−2}, we choose the hyper-parameters so that the priors are flat and thus carry little information. In particular, the prior variances are chosen such that only values which are far away from physical plausibility are excluded. This means the posterior distribution will be mainly determined by the likelihood, that is, the data. The reason for this is that in this case little is to be gained by using expert knowledge and that we want to avoid controversies.

Hyper-parameters for the prior distributions: for normal distributions hyper-parameters for the expectation (μ_{0}) and the variance (σ_{0}^{2}) are given

Parameter | Distribution | Hyper-parameter 1 (μ_{0}) | Hyper-parameter 2 (σ_{0}^{2}) | 95% Confidence interval |
---|---|---|---|---|
μ (°C) | Normal | 0 (Winter) | 25 | [−9.8, 9.8] |
| | 15 (Summer) | | [5.2, 24.8] |
Δμ (°C) | Normal | 0 | 16 | [−7.8, 7.8] |
β_{i} (°C) | Normal | 0 | 16 | [−7.8, 7.8] |
Δβ_{i} (°C) | Normal | 0 | 0.5 | [−1.4, 1.4] |
γ (°C year^{−1}) | Normal | 0 | 16 | [−7.8, 7.8] |
Δγ (°C year^{−1}) | Normal | 0 | 16 | [−7.8, 7.8] |
σ^{−2} | Gamma | 0.1 | 0.1 | [0, 9.8] |
*q*^{−2} | Gamma | 0.1 | 0.1 | [0, 9.8] |
*b*_{i}^{−2} | Gamma | 0.1 | 0.1 | [0, 9.8] |
\(q_{b_i}^{-2}\) | Gamma | 3 | 3 | [0.2, 2.4] |

The situation for the parameters Δβ_{i} and \(q_{b_i}^{-2}\) is different. For the reasons discussed in Sect. 3.4, we take informative priors with small variances that are concentrated around zero and one, respectively. This choice of hyper-parameters means, for instance, that the bias change Δβ_{i} lies between −1.4°C and 1.4°C with a probability of 95%. Although this assumption seems somewhat restrictive, one has to keep in mind that there are no future observations with which to strictly separate climate shift and bias change. Therefore one is forced to accept an assumption about a possible bias change. Our approach is reasonable: it assumes a priori that the bias change Δβ_{i} is comparable to or smaller than typical biases β_{i} in the control period, because otherwise the scenario runs would be of little use. Since one can estimate the biases β_{i} from the data *X*_{0,t} and *X*_{i,t}, there is a rational basis for choosing the variance of the prior for Δβ_{i}.
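The intervals implied by the normal priors follow directly from the prior variances. As a quick check, the sketch below computes the central 95% interval of a zero-mean normal prior from its variance using Python's standard library. The helper name `prior_interval` is hypothetical; the variances 25, 16 and 0.5 are the values quoted for μ (winter), Δμ and Δβ_{i}.

```python
from statistics import NormalDist


def prior_interval(mean, variance, level=0.95):
    """Central interval of a normal prior given its mean and variance."""
    dist = NormalDist(mean, variance ** 0.5)
    tail = (1 - level) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Prior variances quoted for mu (winter), delta_mu and delta_beta_i.
for name, var in [("mu (winter)", 25), ("delta_mu", 16), ("delta_beta", 0.5)]:
    lower, upper = prior_interval(0.0, var)
    print(f"{name}: [{lower:.1f}, {upper:.1f}]")
```

Rounded to one decimal, the half-widths are 9.8, 7.8 and 1.4, in agreement with the intervals stated for these parameters.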

Only the parameters ν_{i} = Δμ + Δβ_{i} (climate shift plus additional scenario bias of model *i*) are identifiable. The prior assumptions above imply that (ν_{1}, …, ν_{M}) are a priori jointly normally distributed, where all ν_{i} have mean zero and variance σ_{Δμ}^{2} + σ_{Δβ}^{2} and all pairs ν_{i}, ν_{j} (*i* ≠ *j*) have a correlation \(\sigma_{\Delta\mu}^2(\sigma_{\Delta\mu}^2 + \sigma_{\Delta\beta}^2)^{-1}.\) In other words, the correlation matrix has constant off-diagonal entries. Hence a small σ_{Δβ}^{2} corresponds to the a priori belief that all ν_{i} are similar (highly correlated).
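The implied prior correlation between ν_{i} and ν_{j} can be verified by simulation. The sketch below draws a shared Δμ and two independent bias changes from zero-mean normal priors with variances 16 and 0.5, and compares the sample correlation with the ratio of the variance of Δμ to the sum of both variances. The sample size and seed are arbitrary choices.

```python
import random

rng = random.Random(3)
var_dmu, var_dbeta = 16.0, 0.5          # prior variances of delta_mu and delta_beta_i
n = 100000

nu1, nu2 = [], []
for _ in range(n):
    d_mu = rng.gauss(0.0, var_dmu ** 0.5)                 # shared climate shift
    nu1.append(d_mu + rng.gauss(0.0, var_dbeta ** 0.5))   # model 1: shift + bias change
    nu2.append(d_mu + rng.gauss(0.0, var_dbeta ** 0.5))   # model 2: shift + bias change


def corr(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


theory = var_dmu / (var_dmu + var_dbeta)
sample = corr(nu1, nu2)
print(round(theory, 3), round(sample, 3))
```

With these variances the correlation is close to 0.97, which expresses the prior belief that all models see nearly the same climate shift.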

It is important to check the sensitivity of the results to the choice of the prior distributions and the hyper-parameters. This is especially important here, since the hyper-parameters are specified in order to solve the identifiability problem, and are not based on prior expert knowledge. This sensitivity analysis will be done in Sect. 4.4, and we will describe separately how the hyper-parameters have been varied.

### 3.6 Computation of the posterior

By Bayes formula the joint posterior density of all parameters given the data is proportional to the prior density multiplied by the likelihood of the data.

Hence, in principle, the posterior is known, but this is of little practical use. In order to deduce information about the marginal posteriors of the two main parameters of interest, Δμ and *q*, and in order to compute posterior predictive densities, high-dimensional integration would be needed, which is difficult. Common practice in modern statistics is to rely on Markov Chain Monte Carlo methods instead. Monte Carlo methods replace analytical calculations by empirical estimates computed with an artificially generated sample from the posterior distribution. For complicated high-dimensional distributions it is not feasible to generate an independent sample, but it is possible to generate a dependent sample with a suitable Markov Chain. This means that each member of the sample is constructed recursively from its predecessor (see e.g. Gilks et al. 1996). For our analysis, we use the standard Gibbs sampler, which updates a single component at a time, because the so-called full conditionals have a standard form. Results are based on a single Markov Chain of length 550,000, of which the first 50,000 iterations are discarded as a burn-in period. The remaining 500,000 samples were thinned to a sample of 5,000 by taking only every hundredth point. The thinning largely removes the dependency within the Markov Chain, so that the 5,000 remaining points can be treated as an approximately independent sample from the distribution of interest. To check the convergence of the chain, diagnostics such as autocorrelation and effective sample size were calculated. None of these diagnostic tools showed any indication that the chain has not converged. Moreover, additional simulations not shown here confirmed the results.
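The mechanics of Gibbs sampling with burn-in and thinning can be illustrated on a much smaller problem than the full model of this paper. The sketch below samples the mean and precision of a single normal series, with a normal prior on the mean and a Gamma prior on the precision, for which both full conditionals have standard forms. The data are synthetic, the prior settings and the chain length are illustrative choices (far shorter than the chain described above), and `gibbs_normal` is a hypothetical name.

```python
import random


def gibbs_normal(x, m0, v0, a, rate, n_iter, burn_in, thin, rng):
    """Gibbs sampler for x_t ~ N(mu, 1/tau), mu ~ N(m0, v0), tau ~ Gamma(a, rate)."""
    n, sx = len(x), sum(x)
    mu, tau = sx / n, 1.0                       # starting values
    kept = []
    for it in range(n_iter):
        # mu | tau, x is normal with precision 1/v0 + n*tau
        prec = 1.0 / v0 + n * tau
        mu = rng.gauss((m0 / v0 + tau * sx) / prec, prec ** -0.5)
        # tau | mu, x is Gamma(a + n/2, rate + 0.5 * sum of squared residuals);
        # random.gammavariate expects a scale parameter, i.e. 1 / rate
        ss = sum((v - mu) ** 2 for v in x)
        tau = rng.gammavariate(a + n / 2, 1.0 / (rate + 0.5 * ss))
        # discard the burn-in, then keep every `thin`-th state
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append((mu, tau ** -0.5))      # store mean and standard deviation
    return kept


rng = random.Random(4)
data = [rng.gauss(2.0, 1.0) for _ in range(30)]   # synthetic series of 30 "years"
draws = gibbs_normal(data, m0=0.0, v0=25.0, a=0.1, rate=0.1,
                     n_iter=6000, burn_in=1000, thin=10, rng=rng)
post_mean_mu = sum(d[0] for d in draws) / len(draws)
print(len(draws), round(post_mean_mu, 2))
```

Thinning trades chain length for lower autocorrelation between retained draws; whether the retained sample is close enough to independent should still be checked with diagnostics such as the autocorrelation function.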

### 3.7 An alternative assumption for scenario period values

Even under the assumption that climate change and model error affect only location and scale, but not the shape of the distribution, there is at least one additional way to specify the distribution of scenario period values that can also be regarded as plausible. The “constant bias” assumption in Eq. 6a means that the difference between the expected values of the control and the scenario periods in model *i* is equal to Δμ + Δβ_{i}. Hence up to small bias changes (alternative (iii) in Sect. 3.4), all models are assumed to predict the climate scenario shift correctly.

The alternative “constant relation” assumption says that a model over- or underestimates the climate scenario shift by approximately the same factor by which it over- or underestimates the interannual variability within a season in the control period. The latter factor is equal to *b*_{i}. Allowing such an additional bias change means thus replacing Eq. 6a by

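**Assumption 3b**

For each model *i* ≤ *M*, the difference between the expected values of the scenario and the control periods is *b*_{i}Δμ + Δβ_{i} instead of Δμ + Δβ_{i}.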

The specification of the priors of the parameters with this alternative “constant relation” assumption is done as before. In particular, an informative prior is used, forcing Δβ_{i} to be near zero and \(q_{b_i}\) near one (alternative (iii) in Sect. 3.4). In this way, we avoid the analogous basic non-identifiability problem discussed in Sect. 3.4.

Under the “constant bias” assumption (Assumption 3a; left figure in Fig. 1), an additive bias change Δβ_{i} and a multiplicative change of the variability \(q_{b_i}\) allow for some changes in the bias. The result is shown with the black solid line. The bias changes are restricted to be small by the informative priors on these parameters. The slightly adapted relationship between the quantiles of today’s observations and the control model output is used to estimate the new climate mean μ + Δμ. Since there are no future observations, no points can be drawn for the quantiles in the scenario period on the x-axis.

With the “constant relation” assumption (Assumption 3b) in the right figure, one can extrapolate the bias relationship observed today (red dashed line) into the scenario period. This results in two different parts of the model bias change. The first part is systematic: if the slope of the line is larger than one, a systematic bias increase of (*b*_{i} − 1)Δμ is expected. The second part of the bias change, Δβ_{i}, is restricted to be small by the informative priors, as with the “constant bias” assumption. Note that with the “constant relation” assumption the bias change can be quite large due to the systematic part, since the restriction imposed by the informative prior only influences the second part of the bias change. This results in a different estimate of the climate shift Δμ, because part of the signal is attributed to the bias change. This can be seen in Fig. 1: with the same observations and model projections, Δμ in the right figure is smaller than in the left figure. However, because *Y*_{0,t} is not available, it is difficult to distinguish between the two models from the data alone.

… *F*^{c}_{i} and *F*^{s}_{i}. Remember that the superscript *c* is used for distributions and quantiles that are related to the control climate, while *s* stands for the scenario climate. The α-quantile *z*(α) of a distribution is the value that divides the mass of the distribution into the ratio α:(1−α). In other words, the probability that a random draw from this distribution is below *z*(α) is equal to α. The *k*-th smallest among *T* data points is an estimate of the α = *k*(*T* + 1)^{−1} quantile. Then by Eqs. 3 and 5, for *i* = 1, …, *M*, … Δβ_{i} ≈ 0 and \(q_{b_i}\) ≈ 1. In other words, the relation between the true quantiles and the quantiles of the model output has a similar structure in the control and the scenario periods (if Δβ_{i} = 0 and \(q_{b_i}=1\), the structure is identical).

*z*(0.5) is the median (typical year) and in case of the normal distribution, it is equal to the mean. By using *z*^{c}_{0}(0.5) = μ in Eqs. 8, 9 and 10, one obtains

Hence Assumption 3a says in particular that the difference between model and observation for a typical year is similar in both the control and the scenario period, regardless of how warm a typical year is. This can be justified by saying that the physical relationships are still valid for a changed forcing and thus have about the same error for a typical year.

Note that Eqs. 12 and 9 are similar. Assumption 3b postulates therefore that one can use the same linear relation between *z*^{c}_{i}(α) and *z*^{c}_{0}(α) in the control period and between *z*^{s}_{i}(α) and *z*^{s}_{0}(α) in the scenario period. Hence if the temperature of a warm year in the control period is similar to that of a cold year in the scenario, then the difference between model and observations is about the same in both cases. This explains the name “constant relation” that describes Assumption 3b.
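The quantile relations discussed above can be sketched for the normal case. The derivation below is a reconstruction under stated assumptions, not a reproduction of the numbered equations of this paper. It ignores the linear trend, writes \(\Phi^{-1}\) for the standard normal quantile function, and assumes that the control observations have mean μ and standard deviation σ, that model *i* adds the bias β_{i} and scales the standard deviation by *b*_{i}, and that the scenario climate has mean μ + Δμ and standard deviation *q*σ.

```latex
\begin{aligned}
z^{c}_{0}(\alpha) &= \mu + \sigma\,\Phi^{-1}(\alpha), \\
z^{c}_{i}(\alpha) &= \mu + \beta_i + b_i\,\sigma\,\Phi^{-1}(\alpha)
                   = \mu + \beta_i + b_i\bigl(z^{c}_{0}(\alpha) - \mu\bigr), \\
z^{s}_{0}(\alpha) &= \mu + \Delta\mu + q\,\sigma\,\Phi^{-1}(\alpha), \\
\text{(3a)}\quad z^{s}_{i}(\alpha) &= \mu + \Delta\mu + \beta_i + \Delta\beta_i
      + b_i\,q_{b_i}\bigl(z^{s}_{0}(\alpha) - \mu - \Delta\mu\bigr), \\
\text{(3b)}\quad z^{s}_{i}(\alpha) &= \mu + \beta_i + b_i\,\Delta\mu + \Delta\beta_i
      + b_i\,q_{b_i}\bigl(z^{s}_{0}(\alpha) - \mu - \Delta\mu\bigr).
\end{aligned}
```

Under these assumptions, setting Δβ_{i} = 0 and \(q_{b_i}=1\) in the last line gives \(z^{s}_{i}(\alpha) = \mu + \beta_i + b_i\bigl(z^{s}_{0}(\alpha) - \mu\bigr)\), which is the same linear map as in the control period. Setting α = 0.5 in the line for Assumption 3a gives a typical-year difference of β_{i} + Δβ_{i} between model and scenario climate.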

It is important to note that both assumptions have been made in distinct areas of climate research. Christensen et al. (2008) suggest that temperature and precipitation biases grow in a global warming scenario. As mentioned in the introduction, the “constant bias” assumption is implicit in the consideration of the “scenario minus control” signal in climate projections, and is made throughout the IPCC report (Meehl et al. 2007). Likewise, the “constant relation” assumption is made in many statistical evaluations of seasonal forecasting, where a forecasted anomaly is considered relative to the model’s representation of the observed variability (e.g. Kharin and Zwiers 2003). It can thus be argued that the “constant relation” assumption is the more natural assumption for near-term climate change (e.g. the next 20 years), as we would expect the error structure of the models to be approximately conserved over shorter time periods, when the climate shifts can be considered comparatively small. Likewise, it can be argued that the “constant bias” assumption is the more natural assumption in longer-term climate change studies (e.g. 100 years), as the anticipated changes are considerably larger than the currently observed interannual variability. Further work is needed to determine how biases in the control period can be used for the estimation of biases in the scenario period.

## 4 Results

### 4.1 Climate prediction: “Constant bias” versus “constant relation” assumption

#### 4.1.1 Summer temperature

In the upper row of Fig. 2, the posterior distributions of Δμ, *q*, γ and γ + Δγ are given under the two assumptions “constant bias” (black solid line) and “constant relation” (red dashed line), respectively. Our method predicts an expected increase of the average temperature of 5.4°C for the “constant bias” assumption and of 3.4°C for the “constant relation” assumption. This difference is quite large and will be discussed in more detail in Sect. 4.2.

Values above and below 1 are plausible for *q*, hence the RCMs considered are not able to decide whether the variability of the mean summer temperatures will increase or decrease in the future, although there is a small tendency towards an increase in variability (see also Fig. 3). Previous research revealed that there might be considerable increases in interannual summer variability over Central Europe (Schär et al. 2004). The aforementioned study assessed one single model chain (the CHRM driven by HadAM3H), but recent model intercomparisons indicate that this result qualitatively agrees with most RCMs (Giorgi et al. 2004; Giorgi and Bi 2005; Vidale et al. 2007; Lenderink et al. 2007) and GCMs (Seneviratne et al. 2006). The absence of a pronounced variability increase in our analysis appears mostly related to the consideration of the Alpine region, which is situated to the south of the region of maximum variability increase.

The trend lies, with posterior probability higher than 99%, between −0.01 and 0.04°C per year for the period 1961–1990 and between 0.06 and 0.12°C per year for the period 2071–2100. In comparison, the global mean surface temperature trend of the A2 scenario for the 2071–2100 period amounts to 0.05°C per year (Meehl et al. 2007, see their Fig. 10.4). The larger trend over the Alpine region revealed above can be explained by two reasons. First, the regional warming over continental land surfaces considerably exceeds the global mean warming, which is moderated by the presence of large ocean surfaces. Second, it is possible that the RCMs overestimate the trend during the scenario period, as the respective simulations use a spin-up period of merely one year and are initialized from a soil-moisture distribution that is not in complete balance with the scenario climate. However, as the RCM trend exceeds the global trend by merely a factor of 2, we believe that the former reason dominates.

In Fig. 3, the posterior predictive density given all data is shown with a dashed red line for the mean temperatures *X*_{0,t}^{det} in the control period and with a red solid line for the predicted mean temperatures *Y*_{0,t}^{det} in the scenario period. The posterior predictive densities for the output *Y*_{i,t}^{det} of individual RCMs are given with black dotted lines. In addition to the trend, the individual biases β_{i} and *b*_{i} are also removed, but not the bias changes of the scenario period. The additive bias change for the scenario is Δβ_{i} under the “constant bias” assumption and (*b*_{i} − 1)Δμ + Δβ_{i} under the “constant relation” assumption. As we will see in Sect. 4.3, the *b*_{i}’s are quite large for all models in the summer season. This explains why under the “constant relation” assumption the expected value of the multi-model ensemble projection is smaller than all individual model projections of the scenario period.
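The effect of the two additive bias changes on the inferred climate shift can be illustrated with a small calculation. This is a sketch with hypothetical numbers, not output of the Bayesian model: the function names and the values 5.0°C and *b*_{i} = 1.5 are made up for illustration, and the trend and all uncertainties are ignored.

```python
def additive_bias_change(b_i, delta_mu, delta_beta, assumption):
    """Additive bias change of model i between control and scenario period."""
    if assumption == "constant bias":
        return delta_beta
    if assumption == "constant relation":
        # Systematic part (b_i - 1) * delta_mu plus the small residual part.
        return (b_i - 1.0) * delta_mu + delta_beta
    raise ValueError(f"unknown assumption: {assumption!r}")


def implied_climate_shift(model_shift, b_i, delta_beta, assumption):
    """Climate shift delta_mu implied by a model's scenario-minus-control shift.

    Uses model_shift = delta_mu + additive bias change, solved for delta_mu.
    """
    if assumption == "constant bias":
        return model_shift - delta_beta
    if assumption == "constant relation":
        return (model_shift - delta_beta) / b_i
    raise ValueError(f"unknown assumption: {assumption!r}")


# Hypothetical model: warms by 5.0 degC, overestimates interannual
# variability by 50 % (b_i = 1.5), and has no residual bias change.
shift_cb = implied_climate_shift(5.0, 1.5, 0.0, "constant bias")
shift_cr = implied_climate_shift(5.0, 1.5, 0.0, "constant relation")
print(shift_cb, shift_cr)
```

With *b*_{i} larger than one, the same simulated warming translates into a smaller climate shift under the “constant relation” assumption, because part of the warming is attributed to the systematic bias change.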

Recall that in the posterior predictive density uncertainty about the parameters has been taken into account by integrating with respect to the posterior distribution of the parameters. Hence the individual model projections depend also on other models through integration over the posterior distribution of the parameters given the data. Therefore they influence each other to some extent.

For both assumptions, the range of the different models is quite large. The combined Bayesian prediction density is much narrower than an equally weighted average of the prediction densities of the 5 models. This is due to the inclusion of additive bias changes for the individual RCMs in the model. Note that the biases are not estimated by aligning the black curves as well as possible. For the control climate, they are essentially estimated by comparing the control simulations and the observed climate, and for the scenario climate the estimate depends upon the assumption. For the “constant bias” assumption, they are assumed to be similar because the prior for Δβ_{i} is concentrated around zero. For the “constant relation” assumption we assume that the biases show a linear relationship where the intercept and slope are determined by comparing the control simulations and the observed climate. The size and uncertainty of estimated biases for both assumptions will be discussed in Sect. 4.3.

#### 4.1.2 Winter temperature

In the lower row of Fig. 2, the posterior distributions of Δμ, *q*, γ and γ + Δγ are given under the two assumptions “constant bias” and “constant relation”, respectively. In contrast to the summer, the results are quite consistent under both assumptions and an expected increase of the mean temperature of around 3.5–3.6°C is observed. The uncertainty about the climate shift Δμ is larger under the “constant relation” assumption.

As in summer, the posterior for the other three parameters is similar under both assumptions. Values above and below 1 are plausible for *q*, hence the RCMs considered are not able to decide whether the variability of the mean winter temperatures will increase or decrease in the future.

### 4.2 Diagnostic check of assumptions

Although our results are reasonable and consistent with the literature, we have to verify several assumptions.

#### 4.2.1 Normal distribution, independence

… *p*-value was 0.058. Quantile plots and goodness-of-fit tests show no obvious violation of the normality assumption.

Furthermore, to examine the temporal independence between the different years, we computed the autocorrelation for each model and for the observations, assuming a stationary time series model. Even at lag one, no significant autocorrelation could be observed. There is no strong correlation between the different RCMs either. Such correlations are avoided by not including additional RCMs that are driven by the same GCM run. Since on a large scale the RCM reproduces the year-to-year process of the GCM, such correlations would be quite high. In the PRUDENCE project some RCMs are driven by the same GCM run and have a correlation between 0.8 and 0.95. We are currently studying a possible extension of our model that incorporates a GCM effect and thereby relaxes the restriction that all model chains consider a different GCM simulation.
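A check of this kind can be sketched as follows. The code is a minimal illustration, not the procedure used for the results above: it assumes an already detrended series and uses the common approximate 95% bound ±1.96/√*T* for the autocorrelation of white noise. The function names are hypothetical.

```python
import numpy as np


def lag_autocorr(series, lag=1):
    """Sample autocorrelation of a (detrended) series at a positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def is_significant(r, n, z=1.96):
    """Compare |r| with the approximate white-noise bound z / sqrt(n)."""
    return abs(r) > z / np.sqrt(n)


# Synthetic 30-year series of seasonal mean anomalies (made-up values).
anomalies = np.random.default_rng(0).normal(0.0, 1.0, size=30)
r1 = lag_autocorr(anomalies, lag=1)
print(r1, is_significant(r1, anomalies.size))
```

For a 30-year period the bound is about 0.36, so only fairly strong year-to-year dependence would be flagged by this simple criterion.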

#### 4.2.2 Relation between model output and observations

These results have different implications under the “constant bias” and the “constant relation” assumption, as we have seen in Sect. 4.1. In Sect. 4.3 we will examine the biases and bias changes in more detail to explain the behaviour of the two assumptions.

### 4.3 Model biases

#### 4.3.1 Summer temperature

… β_{i} and the dashed red line the scenario bias. Under the “constant bias” assumption, the scenario bias is β_{i} + Δβ_{i}, whereas for the “constant relation” assumption it is β_{i} + (*b*_{i} − 1)Δμ + Δβ_{i}.

With the “constant bias” assumption the biases for control and scenario periods are generally similar, but the uncertainty about the biases increases in the scenario period. This was to be expected. There is no systematic increase or decrease of the biases in all models. The RCAO model has the biggest bias change. The situation changes for the “constant relation” assumption. The biases for the control period are similar to those under the “constant bias” assumption, but there is a systematic increase in the scenario biases of all models. In all models, the scenario bias is now clearly positive and the uncertainty is larger than under the “constant bias” assumption. Again it is largest for the RCAO model. Since under the “constant relation” assumption all RCMs have large positive biases, the climate shift remaining after bias correction is smaller than under the “constant bias” assumption. This results in a posterior predictive density for *Y*_{0,t}^{det} whose expected value is smaller than that of each RCM, as observed in Fig. 3.

… (*b*_{i} − 1)Δμ. A simple point estimate of *b*_{i} is given by the slopes of the straight lines in Fig. 6, which are clearly greater than one. This is confirmed by Fig. 8, which shows the posterior distributions of multiplicative variability biases in summer. The solid line represents the control bias *b*_{i} and the dashed line describes the scenario bias \(b_{i}q_{b_i}\). Under both assumptions, the control bias *b*_{i} is larger than one for all models, and this explains why the scenario biases are substantially larger under the “constant relation” assumption. In other words, the reason for the difference between the results under the two assumptions is the overestimation of the year-to-year variability in the summer by most models (see Vidale et al. 2007; Lenderink et al. 2007). Figure 8 also shows that the scenario multiplicative bias is, under both assumptions, not much different from the control multiplicative bias. The difference is largest for the CLM model.

The ability to estimate biases of individual models both for the control and scenario period is a clear advantage of our approach. Assuming that there are no biases or that the biases remain constant over time would lead to incorrect quantifications of uncertainty.

#### 4.3.2 Winter temperature

In all models the uncertainty about the additive bias increases from control to scenario period. For some models the posterior mean of the bias remains unchanged and only the spread is larger. For other models the posterior mean also changes. Compared to the biases of the summer, the biases are slightly smaller in winter. Again, the RCAO model yields the largest bias and the largest bias change, but they are smaller than in the summer for the same model.

### 4.4 Sensitivity analysis

In Sect. 3.4 we described an identifiability problem of our model setup. Our solution in the Bayesian framework has been to choose informative priors for the two parameters Δβ_{i} and \(q_{b_i}\). We used a normal distribution with expectation 0°C and variance 0.5°C^{2} for Δβ_{i} and an inverse Gamma distribution with expectation 1 and variance 0.33 for \(q_{b_i}\) (see Table 2). Although these choices are based on some qualitative knowledge about the behaviour of the model biases, there is an additional uncertainty in this prior distribution that is difficult to quantify. We therefore vary the variances of these prior distributions over a large spectrum of possible values to examine the sensitivity of our model to the prior distributions. It would be desirable to vary several of the hyper-parameters simultaneously since there are also possible interactions between the parameters, but this is computationally expensive to do. As a compromise between varying only the hyper-parameters of one single parameter and varying all hyper-parameters together we simultaneously varied the hyper-parameters of two parameters and kept the others fixed. This has been done for all possible pairs of parameters.
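The hyper-parameters that correspond to a given prior expectation and variance can be obtained by moment matching. The sketch below assumes the common shape/scale parametrization of the inverse Gamma distribution, with mean scale/(shape − 1) and variance scale²/((shape − 1)²(shape − 2)); the parametrization used in Table 2 may differ, and the function names are hypothetical.

```python
def inv_gamma_from_moments(mean, var):
    """Shape and scale of an inverse Gamma with the given mean and variance.

    Uses var = mean**2 / (shape - 2), which requires shape > 2.
    """
    shape = 2.0 + mean ** 2 / var
    scale = mean * (shape - 1.0)
    return shape, scale


def inv_gamma_moments(shape, scale):
    """Mean and variance of an inverse Gamma (valid for shape > 2)."""
    mean = scale / (shape - 1.0)
    var = scale ** 2 / ((shape - 1.0) ** 2 * (shape - 2.0))
    return mean, var


# Prior for the multiplicative bias change: expectation 1, variance 0.33.
shape, scale = inv_gamma_from_moments(1.0, 0.33)
print(shape, scale)

# Prior standard deviation of the additive bias change (variance 0.5 degC^2).
sd_delta_beta = 0.5 ** 0.5
print(sd_delta_beta)
```

Under this parametrization, expectation 1 and variance 0.33 correspond to a shape of roughly 5 and a scale of roughly 4, and the normal prior for Δβ_{i} has a standard deviation of about 0.7°C.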

… σ_{Δβ}^{2} and σ_{Δμ}^{2}. In Fig. 11 we show results of different prior distributions for one single parameter, the additive bias change Δβ_{i}. For Δβ_{i} we varied the hyper-parameter σ_{Δβ}^{2} of the prior distribution. Plots of the effect on the posterior for the additive bias change Δβ_{i} of the CHRM model and the corresponding climate shift Δμ are shown for the “constant bias” assumption, but the plots for the “constant relation” assumption look similar.

In the upper row of Fig. 11 the dashed red lines show the prior distributions and the solid black lines the a posteriori distributions of Δβ_{i}. Different values of the prior variance σ_{Δβ}^{2} are used. For large values one can see the identifiability problem. There is a lot of uncertainty and the gain of knowledge by the observations is small. For small values the prior and the posterior distributions are nearly identical. In such cases we assume that there is essentially no bias change and therefore the identifiability problem disappears. These different prior distributions for the bias change affect not only the posterior of the bias change, but also the posterior of the parameter Δμ that describes the climate shift. In the lower row of Fig. 11, it is shown how the posterior of Δμ (solid black line) changes by varying the prior variance σ_{Δβ}^{2}. Note that the prior distribution of Δμ is fixed (dashed red line) and only the prior distribution of Δβ_{i} is changed. Furthermore if there is an uninformative prior for Δβ_{i}, the correlation between Δμ and Δβ_{i} gets higher as expected. Nevertheless if one only considers the sum Δμ + Δβ_{i} as proposed in Sect. 3.4 in alternative (iv), the identifiability problem disappears. For all σ_{Δβ}^{2} in Fig. 11 one obtains the same distribution for this sum (plots not shown). As indicated in Sect. 3.4 this is not a true solution to our problem since the estimation and separation of the climate shift and model bias is our main purpose.
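The mechanism behind this sensitivity can be reproduced in a one-model toy version of the problem. The sketch below is not the model of this paper: it assumes that the data inform only the sum Δμ + Δβ, with a known sampling variance `noise_var` (a made-up value), a flat prior on Δμ and a normal prior with mean zero and variance `prior_var_dbeta` on Δβ. Under these assumptions the posterior of the sum is N(estimate, noise_var), the posterior variance of Δμ is noise_var + prior_var_dbeta, and the posterior correlation between Δμ and Δβ is negative.

```python
import math


def toy_posterior(prior_var_dbeta, noise_var=0.05):
    """Posterior spread in a one-model toy version of the problem.

    Returns the posterior standard deviation of dmu, the posterior standard
    deviation of the sum dmu + dbeta, and the posterior correlation between
    dmu and dbeta.
    """
    sd_sum = math.sqrt(noise_var)  # does not depend on the prior of dbeta
    sd_dmu = math.sqrt(noise_var + prior_var_dbeta)
    corr = -math.sqrt(prior_var_dbeta / (noise_var + prior_var_dbeta))
    return sd_dmu, sd_sum, corr


# Prior variances spanning concentrated to nearly uninformative choices.
results = {v: toy_posterior(v) for v in (0.01, 0.5, 4.0, 16.0)}
for v, (sd_dmu, sd_sum, corr) in results.items():
    print(f"prior var {v:5.2f}: sd(dmu)={sd_dmu:.2f}, "
          f"sd(sum)={sd_sum:.2f}, corr={corr:.2f}")
```

In this toy setting the posterior of the sum is unaffected by the prior variance, whereas the uncertainty about Δμ and the magnitude of its correlation with Δβ grow as the prior becomes less informative.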

Having a prior distribution for the Δβ_{i}’s that is very concentrated around 0 means that there is essentially no bias change. In that situation the a posteriori distribution for the climate shift is also very concentrated around 5°C (mean summer temperature increase). With a totally uninformative prior for Δβ_{i}, the uncertainty about the climate shift increases, with Δμ lying somewhere between 2 and 7°C. This behaviour of the climate shift has also been observed by Lopez et al. (2006, see their Fig. 3) when using different priors for the change of the variability for the scenario runs. Including the year-to-year variability, the uncertainty about the predicted mean summer temperatures would be even larger. But such an uninformative prior for Δβ_{i} with σ_{Δβ}^{2} = 4 or 16 is not a reasonable choice in our view. The biases |β_{i}| are with high probability less than 3°C (see Fig. 7) for all models, and therefore one would expect the bias changes |Δβ_{i}| to be smaller than 3°C as well.

Having more uncertainty in the scenario runs implies increased uncertainty in the climate shift. The sensitivity shown here is not a disadvantage of the Bayesian approach, but it highlights a more general problem. Assuming a constant bias over time leads to overconfident conclusions about the precision of a prediction. The other extreme is the assumption that there is no knowledge about the change of the bias at all. Then practically no conclusion can be drawn from the model outputs. Hence one should make a reasonable choice of the size of possible bias changes Δβ_{i}. Note that informative priors have been used only for the additive and multiplicative bias changes. To validate this statement we have run a simulation in which all other priors were chosen to be completely uninformative (improper priors). The results did not change.

## 5 Conclusions and outlook

The new methodology is successfully applied to temperature changes as simulated by five GCM/RCM model chains, and it yields a single probabilistic estimate of climate change under an SRES A2 scenario. We can consider the predictive density of the resulting temperature changes as a kind of weighted average of shifted and scaled versions of the individual RCM predictions. The Bayesian approach incorporates a statistical way for deriving the weights, shifts and scale factors. We start with equal prior weights and the same priors for shifts and scale factors for all models. In principle, with the Bayesian approach it would also be possible to include qualitative a priori knowledge about different model behaviour in an easy way.

The methodology does not make any a priori assumptions regarding climate change. In particular, the priors for the parameters describing the climate change signal are non-informative. A more comprehensive sensitivity analysis (not included in the paper) confirms that the choice of these priors does not influence the results.

Our analysis does, however, show that there is an intrinsic identifiability problem, as the data do not allow a clear separation between bias changes and climate changes. Some additional assumptions are thus inevitable. We resolved this identifiability problem by using informative priors for the bias changes. The choice of these priors influences the results, but we believe that our choice is reasonable, and we show that the sensitivity is small as long as we avoid extreme choices. Also, the use of informative priors is well justified, as there is both established trust in climate models and justified doubts about the stationarity of model biases. In effect, our approach constrains the bias changes to be smaller in magnitude than the climate changes by about a factor of 3.

The study demonstrates that assumptions about the extrapolation of the model biases from the control into the scenario period are crucial, at least for the situation considered (Alpine summer surface temperatures). To arrive at this conclusion, we have made two different assumptions about the behaviour of the model bias, referred to as the “constant bias” and “constant relation” assumptions. Both assumptions appear plausible and both have (implicitly or explicitly) been used in climate studies, yet the two assumptions yield different estimates of future summer mean temperatures. Indeed, with one of the two assumptions, the strong summer mean warming exhibited by most models is reduced from an ensemble mean of 5.4°C to 3.4°C, thus becoming smaller than the ensemble mean warming for the winter season. By contrast, winter temperature estimates are not affected by the bias assumptions, and this difference is explained by the difficulties (success) of the models in reproducing the observed interannual variability of the summer (winter) season. Although the current paper restricts its attention to Alpine temperatures, we note in passing that similar conclusions can be drawn if the model is applied to larger areas, e.g. Central Europe.

The aforementioned result is of general interest, as it questions an important implicit assumption of current scenario models, namely that the model bias will not significantly depend upon the climate state. This assumption is implicitly buried in the consideration of “changes in climate”, which are defined as the difference between scenario and control climate.

Distinguishing in an objective way between the two aforesaid bias assumptions seems difficult. The decision cannot be made by statistical methods alone, but needs expert knowledge. Additional information about the behaviour of model biases may be gained by considering one model in different climatic regions or under different emission scenarios. Longer time series for the control runs and observations may also help to determine the behaviour of the biases and would also enable the consideration and exploitation of different variability measures (e.g. interannual versus decadal variability).

There are several extensions of our methodology beyond the current study. Since spatial and temporal aggregation is a limitation of this study, one could consider spatial averages over smaller regions (e.g. station rather than domain-averaged data), temporal averages over shorter periods (e.g. monthly rather than seasonal means), other variables (e.g. precipitation), or replace the temporal averages by a measure that considers extremes (e.g. number of days above a 90th percentile). Applying the current methodology to other models and data sets (e.g. global mean surface temperature) would also be of considerable interest. Some of these extensions would presumably require us to consider non-normal distributions. Extensions to other location-scale families of distributions (univariate distributions that are parameterized by a location parameter μ and a scale parameter σ) are straightforward, but things become more complicated when different shapes of the distribution are also involved. Other potential extensions deal with the separation of GCM and RCM uncertainties and with an individual treatment of the different RCMs’ trends. For the former, one would include RCMs that are based on the same GCM simulation and model the correlations with hierarchical random effects. For the latter, one would replace the common slope γ in Assumption 2 by a model-specific slope γ + δ_{i} for model *i*. Another question is the treatment of spatial correlations if no aggregation is done. Some of these extensions will be considered in the PhD thesis of the first author.

## Acknowledgements

The ETH climate simulations were conducted on the computing facilities of ETH and the Swiss Center for Scientific Computing (CSCS), and the associated research was supported by the Fifth and Sixth Framework Programme of the European Union (projects PRUDENCE and ENSEMBLES), by the Swiss Ministry for Education and Research, and by the Swiss National Science Foundation (NCCR Climate). Data have been provided through the PRUDENCE data archive, funded by the EU through contract EVK2-CT2001-00132. We thank three referees for constructive comments.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.