The UKCP18 projections exemplify key characteristics of state-of-the-art information about future regional climate. Here, we assess the extent to which different strands of the UKCP18 land projections (Murphy et al. 2018) satisfy the quality dimensions of the framework. The probabilistic projections combine multi-model ensembles (MMEs) and perturbed-physics ensembles (PPEs) to provide a probabilistic estimate of the uncertainties tied to future changes in regional climate. The global projections provide model-derived trajectories for future climate that aim to sample a broad range of possible future responses to anthropogenic forcing (Murphy et al. 2018, p. 38). The regional projections include dynamical downscaling using a PPE of regional climate models (RCMs).
We apply the quality assessment framework to these three strands of UKCP18 and assess how well they satisfy its dimensions. Where appropriate, we show whether quality varies with the variable of interest within a particular strand or across strands. For example, the theory dimension highlights that quality is better satisfied for estimates about variables governed by thermodynamic principles (such as global average temperature) than for those governed by fluid dynamics (such as regional precipitation) (see, e.g., Risbey and O’Kane 2011), independently of the strand under assessment. Table 2 provides a summary of the products of the UKCP18 land projections.
The probabilistic projections provide probabilistic estimates for potential future climate over the UK, based on an assessment of model uncertainties (Murphy et al. 2018).
The probabilities can be interpreted as an outcome of the methodology used. The authors of UKCP18 say that “the available models are sufficiently skillful that the conditional probabilistic projections…provide useful advice about known uncertainties in future changes” (Murphy et al. 2018, p. 10) but recognize that “systematic errors represent an important but unavoidable caveat” (Murphy et al. 2018, p. 10). Furthermore, they warn the user that the probabilities do not reflect the confidence the scientists have in the strength of the evidence (see, e.g., Murphy et al. 2018, p. 9). This implies that the probabilities do not provide a measure of what can be concluded from the evidence.
These statements do not clarify how to interpret the usefulness of the information provided. If the uncertainty ranges do not represent the possible ranges of future climate but rather are conditional on the particular methodology and the evidence used, what are the consequences for the statements about future climate? A non-expert user would probably not be able to use this information to assess the consequences for the epistemic reliability of the probabilistic estimates and therefore for the suitability of the information for their particular purpose.
The decision-relevance of the information and the expertise required by a user to assess the epistemic reliability of the uncertainty estimates are not clarified by the additional available documents. For example, consider the following:
“We have designed the probabilistic projections to provide the primary tool for assessments of the ranges of uncertainties in UKCP18. However, they may not capture all possible future outcomes.” (Fung et al. 2018, p. 3)
“The future probabilistic projections in UKCP18 are an update of those produced for UKCP09. You should interpret the probabilities as being an indication of how much the evidence from models and observations taken together in our methodology support a particular future climate outcome. […] The relative probabilities indicate how strongly the evidence from models and observations, taken together in our methodology, support alternative future climate outcomes.” (Ibid.)
These statements show that the evaluation of the merits of a complex methodology is left to the user to decipher. It is unclear how a user who is not an expert in uncertainty assessments could assess the extent to which these estimates are suitable for their purposes. So, while the availability of multiple reports and guidance notes would suggest that the probabilistic projections satisfy the transparency dimension, the opacity of the method used to derive the projections and the lack of explanation of how this affects the statements about future climate indicate that the probabilistic projections only minimally satisfy this dimension (score: 1). To score higher along this dimension, the projections should clearly state what it means for the uncertainty ranges to be conditional on the evidence and methodology, and what the consequences of this conditionality are. For example, it could be specified how much wider the uncertainty range could be, and what kind of information the probabilistic estimates can provide: do they represent the degree of belief UKCP18 scientists have regarding future regional climate?
Theoretical understanding is an important component of climate information for adaptation, and models do not directly encapsulate all theoretical knowledge (Baldissera Pacchetti et al. 2021). To show how epistemically reliable the results are, model output should be assessed against scientists’ theoretical understanding of climatic processes and the theoretical justification for how the model output is processed. The theory dimension of the framework addresses not only the process understanding of the mechanisms responsible for observed and future climate, but also the methodology used to process model output. Here we focus on methodology.
Murphy et al. (2018) use the Bayesian framework of Goldstein and Rougier (2004) to derive probabilities. The probabilistic projections are constructed mainly from three PPEs. Two of these are updated with observational constraints and combined with an MME obtained from CMIP5 “to achieve a combined sampling of parametric and structural uncertainties in physical and carbon cycle responses” (Murphy et al. 2018, p. 13). The model output is then further downscaled with an RCM PPE to produce projections at 25-km resolution. There are several issues with this methodology.
While Murphy et al. (2018) state that the probabilities do not reflect their confidence in the evidence, the probabilities are presented as some kind of knowledge claim about future climate. The main issue here is that probabilities cannot be interpreted as a measure of likely futures—not even subjective probabilities as intended by the original methodology introduced by Goldstein and Rougier (2004)—unless the subjective nature of this approach is made explicit and discussed in more detail. These probabilities are a quantified measure resulting from the methodology and the modeling choices, but it is unclear whether they are a measure of uncertainty about future climate. We further substantiate this claim below.
Murphy et al. (2018) do not usefully discuss how UKCP18 addresses the issues raised in Frigg et al. (2015), who argue that the use of the discrepancy term to generate decision-relevant probabilities is problematic. The use of the discrepancy term rests on the informativeness assumption, i.e., the assumption that the distance between the model and the truth is small (Frigg et al. 2015, p. 3993).
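To make the assumption concrete, the following is a minimal sketch of the best-input formulation on which the discrepancy term rests. The notation is the generic one found in presentations of this approach (following Goldstein and Rougier 2004), not the specific UKCP18 implementation.

```latex
% Best-input formulation with a discrepancy term (generic notation; a sketch,
% not the exact UKCP18 implementation).
\[
  y = f(x^{*}) + \delta, \qquad z = y + e
\]
% y      : the actual value of the climate variable of interest
% f(x^*) : the simulator evaluated at its unknown "best" input x^*
% \delta : the discrepancy term, capturing structural model inadequacy
% e      : observational error on the measurements z
```

Under this formulation, the informativeness assumption amounts to treating \(\delta\) as small and statistically well characterized, which is precisely the assumption that Frigg et al. (2015) call into question.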
Murphy et al. (2018) assume that the MME from CMIP5 can be an adequate proxy for estimating this distance, but CMIP5 output cannot be considered a representative sample of the real world, and thus is not a good basis for assessing structural model uncertainty. This assumption is flawed because of the shared assumptions and shared biases of models (see Masson and Knutti 2011; Knutti et al. 2013; and the discussion in Baldissera Pacchetti et al. 2021, p. 481).
While these criticisms are acknowledged in UKCP18, it is not explained how UKCP18 overcomes their consequences for generating decision-relevant knowledge, so the concerns over the informativeness of the discrepancy term identified by Frigg et al. in UKCP09 persist in UKCP18. Probabilistic estimates would be better justified if supplemented with a physical interpretation of the model output. As we and others have argued elsewhere (Stainforth et al. 2007a; Frigg et al. 2015; Thompson et al. 2016; Baldissera Pacchetti et al. 2021), extrapolatory inferences can be unreliable for complex, nonlinear systems like the climate system, and certain methodological assumptions used to produce probabilistic estimates about future regional climate do not warrant the claims of decision-relevance for the information obtained from these projections. Furthermore, these estimates cannot be considered to represent the subjective credences of a group of experts, since the authors of the technical report themselves state that “the probabilistic format should not be misinterpreted as an indication of high confidence in the weight of evidence behind specific outcomes” (Murphy et al. 2018, p. 9). The probabilistic projections therefore do not satisfy the theory dimension (score: 0). To improve along this dimension, the subjective nature of these probabilities should be fully embraced, the justification for the informativeness assumption and its limitations should be described, and alternative methodologies for aggregating model output should be taken into consideration (e.g., Stainforth et al. 2007b).
Diversity and completeness
Diversity and completeness assess some key characteristics of the evidence and how the evidence is analyzed. These dimensions are subdivided into independence, number, and comprehensiveness, which respectively assess the shared assumptions and origin, the number of different types of evidence, and the extent to which individual types of evidence are explored.
The main lines of evidence used are an MME, three PPEs (the output of which is augmented with a statistical emulator), and observational data. To assess the diversity of this evidence, we discuss the extent to which these sources of evidence differ from one another and, relatedly, whether they share substantive assumptions. In addition, expert knowledge is used to estimate the ranges of the parameters of the PPEs (Murphy et al. 2018, p. 13). However, the process for extracting this knowledge, and its implications for the uncertainty in the probabilistic projections, are unclear. The UKCP18 science reports (Murphy et al. 2018; Lowe et al. 2018) do not reveal any other sources of evidence for the probabilistic projections. The lack of a thorough description of the use of expert judgment to select the parameter ranges is problematic because the methodology used to process the PPEs was designed as an approach for quantifying expert knowledge (Goldstein and Rougier 2004). It is unclear, however, whether Murphy et al. (2018) intend their methodology to represent expert judgment (or expert knowledge). Moreover, it has been argued that probabilistic expert elicitation can be ambiguous and can underestimate the uncertainty associated with the knowledge claims of groups of scientists (Millner et al. 2013). The consequences of such issues are impossible to assess because the expert judgment aspect of the approach is not described and is indeed undermined by various caveats (see above and Murphy et al. 2018, p. 9). We cannot therefore assess the role expert knowledge plays as a source of evidence, so the discussion below focuses on model-based and observational evidence.
Independence is somewhat satisfied (score: 2) with respect to model-based and observational evidence. We consider the MME and PPEs to be one type of evidence. In principle, these ensembles explore different sources of uncertainty: the MME explores structural uncertainty, whereas the PPE explores parameter uncertainty. Nevertheless, there is considerable overlap in model structure and, consequently, shared biases in model output (Masson and Knutti 2011; Knutti et al. 2013). However, we can consider observations to be a different type of evidence. Take the HadCRUT3 dataset (Brohan et al. 2006) used for temperature as an example. This dataset is evaluated with reanalysis data, but the overlap in model-based assumptions is not considerable (Parker 2016). Number is minimally satisfied (score: 1), as few types of evidence are taken into account. Comprehensiveness is somewhat satisfied (score: 2) with respect to model-based and observational evidence: structural model uncertainties are explored with an MME that is large by today’s standards, and the uncertainties regarding the choice of parameters within one of the models are also explored extensively by today’s standards, although climateprediction.net demonstrated that a wider range of behavior can be found with much bigger ensembles (Stainforth et al. 2005).
Since the probabilistic projections aim to provide an estimate of uncertainty, there is one more way in which comprehensiveness should be assessed. Singh and AchutaRao (2020) show that observational uncertainty can affect estimates of future change, because the assessment of model performance varies depending on the observational dataset used. This uncertainty may be minimal for variables that have an extensive record in space and time, and bias may be easily removed for variables that are well understood, such as temperature. However, the uncertainty may become severe for other variables of interest and can change depending on the metric used (Kennedy-Asser et al. 2021); this difficulty should be explicitly acknowledged to provide epistemically reliable information.
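As a concrete illustration of this dependence, the following minimal sketch (using synthetic numbers, not UKCP18 data) shows how a model’s measured skill, and hence any performance-based weighting or observational constraint, can shift with the choice of reference dataset (cf. Singh and AchutaRao 2020).

```python
# Illustrative sketch with synthetic data: model skill scores, and hence any
# performance-based weighting, can depend on which observational dataset is
# taken as the reference (cf. Singh and AchutaRao 2020).
import numpy as np

rng = np.random.default_rng(0)
truth = rng.normal(size=100)                  # unobservable "true" climate signal
obs_a = truth + rng.normal(0.0, 0.3, 100)     # dataset A: unbiased, small error
obs_b = truth + rng.normal(0.2, 0.5, 100)     # dataset B: biased, noisier record

model_1 = truth + rng.normal(0.1, 0.4, 100)   # two hypothetical simulations
model_2 = truth + rng.normal(0.25, 0.2, 100)

def rmse(sim, ref):
    """Root-mean-square error of a simulation against a reference dataset."""
    return np.sqrt(np.mean((sim - ref) ** 2))

for name, ref in [("dataset A", obs_a), ("dataset B", obs_b)]:
    scores = {m: round(rmse(sim, ref), 3)
              for m, sim in [("model 1", model_1), ("model 2", model_2)]}
    print(name, scores)   # which model scores better may differ by reference
```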
To improve quality along these dimensions, the expert elicitation should be thoroughly documented, a wider range of models from different modeling centers should be taken into account, and parametric uncertainty should be systematically explored across different models. Reanalysis data could also be drawn from different centers, since European and global reanalysis datasets are produced by several international research centers. This diversity could help control for some of the idiosyncrasies in modeling assumptions and data-processing methodologies tied to each center.
Historical empirical adequacy
Historical empirical adequacy assesses whether statements about future regional climate intended for climate change adaptation decisions have been subjected to adequate empirical tests. Empirical adequacy for the variables for which probabilistic estimates are provided is not itself an indicator that the probabilistic estimates will be epistemically reliable, but if they are not empirically adequate it is a strong indicator that they will not be epistemically reliable. In this sense, empirical adequacy for the purpose of evaluating model behavior for variables of interest is a minimal requirement for quality. The importance of empirical adequacy for evaluating models has been stressed recently by Nissan et al. (2020). The following analysis is based only on the information that can be accessed.
The output of the probabilistic projections is assessed and updated mostly by studying anomalies in key variables. For example, Murphy et al. (2018, Fig. 2.4a, p. 20 and Fig. 2.5, p. 25) assess temperature changes with respect to a chosen baseline period. This way of evaluating the empirical adequacy of a model or group of models does not satisfy historical empirical adequacy. While anomalies may be useful for supporting a strong inference about the need for mitigation, they do not adequately support epistemically reliable estimates about future climate for adaptation. We motivate this claim below.
Empirical adequacy with respect to an anomaly is only a measure relative to a chosen baseline; it makes strong implicit assumptions about the linearity of the climate system, and it can be achieved without a good representation of some of the details of the earth system. Take the time series of global mean surface temperature (GMST) for the 1900–2000 period from CMIP5, shown alongside a time series of observations in Frigg et al. (2015, p. 3994). While the warming signal appears consistent across model output, there is a considerable difference across models in the absolute value of GMST. As Frigg et al. (2015, p. 3994) note, these differences, albeit only of a few degrees Celsius, are an indication that different models represent the earth system differently: the location of sea ice, vegetation, etc., varies across models, and so do the associated feedbacks. While this may matter less for evaluating the historical empirical adequacy of a global signal of climate change and related uncertainties, estimating how much temperature will change locally needs to rely on an adequate representation of the relevant earth system components and their associated processes and feedbacks, which is not captured by the empirical adequacy of anomalies.
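The following minimal sketch (synthetic series, not CMIP5 output) illustrates why anomaly agreement is a weak test: two simulated climates that disagree by 2 °C in absolute GMST produce essentially identical anomalies once each is referenced to its own baseline.

```python
# Illustrative sketch with synthetic series (not CMIP5 output): models that
# disagree substantially in absolute GMST can agree almost perfectly in
# anomaly space once each is referenced to its own baseline period.
import numpy as np

years = np.arange(1900, 2001)
trend = 0.007 * (years - 1900)                # shared warming signal (deg C)
model_a = 13.5 + trend                        # hypothetical model A: cool base state
model_b = 15.5 + trend                        # hypothetical model B: 2 C warmer base state

baseline = (years >= 1961) & (years <= 1990)  # a conventional baseline period
anom_a = model_a - model_a[baseline].mean()   # anomaly relative to A's own baseline
anom_b = model_b - model_b[baseline].mean()   # anomaly relative to B's own baseline

print(np.abs(model_a - model_b).max())        # 2.0: large absolute disagreement
print(np.abs(anom_a - anom_b).max())          # ~0.0: anomalies agree by construction
```

Anomaly agreement is therefore compatible with models that place sea ice, vegetation, and the associated feedbacks quite differently, which is exactly the concern raised above.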
This issue is particularly relevant when information is downscaled: heterogeneities across models in the representation of physical features of the earth system and associated processes and feedbacks may not matter when model output is averaged globally, but they are of crucial importance when evaluating model performance at regional scales (Ekström et al. 2015). Because historical empirical adequacy must be evaluated in terms of absolute values of the relevant variables for the purpose of informing decision-making, historical empirical adequacy is not satisfied for the probabilistic projections (score: 0). To improve along this dimension, model performance should be evaluated (and shown to be evaluated) for absolute values of the variables provided.
The focus of the global projections is on estimates and statements about future climate derived directly from individual CMIP5 and HadGEM-GC3.05 simulations rather than processed ensemble output. This also shifts the focus of the quality assessment. These projections aim to show “how the 21st century climate may evolve under the highest emission scenario RCP8.5” (Lowe et al. 2018, p. 1). The purpose of these projections is to provide “a multi-variable dataset for impacts analysis … and [to support the] development of storylines relating to future climate variability and extremes on a broad range of timescales” (Murphy et al. 2018, p. 35). Further details about the global projections can be found in Table 2.
The global projections provide information on most of the sources of evidence and describe their methodology, but some components of the evidence, and of how the evidence is analyzed, are not accessible or traceable. Again, the user is left to assess certain key features of the quality of the projections with little support from the UKCP18 documents or user interface.
There are various instances where this occurs. For example, as we discuss below, the user is left to assess which models perform best and what this implies for the epistemic reliability of the information. Moreover, the UKCP18 user interface does not aid in the evaluation of the performance of models against observations. Take the time series data for precipitation from the global projections (Fig. 1). When producing these images through the user interface, one can highlight up to 5 members of the ensemble, but one cannot distinguish between PPE and CMIP5 members. Furthermore, one cannot compare the model output with observations through the user interface. Unless the user has the skills to download the relevant data and process it themselves, they cannot easily assess the historical empirical adequacy dimension.
Furthermore, while most of the data sources are cited, it is not always clear which datasets are used at various stages of the projection development process. For example, Murphy et al. (2018) cite the paper from which they borrow the methodology for model evaluation using 5-day simulations as the source of their data, but that paper only vaguely references the dataset used (Williams et al. 2013, p. 3259). Another example of a lack of transparency in the model development process is the use of expert elicitation in the construction of the PPE: Murphy et al. (2018) do not specify who the experts are or how they were chosen.
These considerations indicate that the global projections somewhat satisfy the transparency dimension (score: 2). The raw data can be downloaded from the interface, but the user would need to have high numerical literacy and programming skills to fully trace the model output. To improve transparency, the origin of the output of the global projections and the data sources used for the model verification should be fully traceable through the user interface and, ideally, thoroughly described in the supporting documents.
The theoretical underpinning of how global atmospheric circulation patterns can affect UK weather is discussed at various points in relation to the global projections (Murphy et al. 2018). For example, theoretical understanding of key processes is taken into consideration when choosing which parameters to perturb in the PPE and when choosing which synoptic system metrics to use to assess the performance of the simulations. However, this use of theoretical understanding is not explored in much depth in the science report.
The overview report of the scientific output (Lowe et al. 2018, p. 35) provides some further insight into how this theoretical understanding can be used. For instance, theory about large-scale circulation patterns and their effect on local weather is combined with model output to provide statements about possible future climate over the UK. While this use of theoretical insight contributes to satisfying the theory dimension of the quality framework, the overview report exemplifies the use of theory only for pressure; there is no discussion of how it affects temperature or other variables. These considerations suggest that the global projections somewhat satisfy the theory quality dimension (score: 2). To improve quality along this dimension, there should be better integration between the theoretical evaluation of the physical processes represented by models and how this evaluation bears on the epistemic reliability of model output for individual variables.
Diversity and completeness
There are several different sources of evidence for the global projections: MME, PPE, expert elicitation in building the PPEs, reanalysis data, and observations (Murphy et al. 2018). As we have discussed in the evaluation of the probabilistic projections, MME and PPE count as one type of evidence.
Model output is derived from both a PPE and an MME. The MME output is similar to that used for the probabilistic projections, but the PPE is constructed and forced differently (see Murphy et al. 2018, Section 3). The model output is assessed here as a source of evidence because it is used at various points in the filtering process to satisfy the principles of “plausibility and diversity” that drive the projection development process (Murphy et al. 2018, p. 37).
Expert elicitation follows Sexton et al. (2019), which is itself partly based on the Sheffield Elicitation Framework (SHELF) method of Oakley and O’Hagan (2010). Expert elicitation is used to set up the parameter space for the PPE. The parameters and the respective ranges are elicited from experts following the protocol suggested by SHELF but not using the software. The experts were advised “to base their ranges on their own sensitivity analyses, theoretical understanding, or empirical evidence excluding any knowledge they had of the effects of the parameters in climate simulations.” (Sexton et al. 2019, p. 995). The experts also provided guidance on selecting the shape of the distribution.
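For illustration, the sketch below shows one common way of converting elicited quantiles into a parameter distribution for ensemble design. The numbers and the choice of a normal distribution are hypothetical; this is neither the SHELF software nor the exact protocol used for UKCP18.

```python
# Illustrative sketch with hypothetical values: fitting a distribution to
# elicited quantiles for a model parameter, in the spirit of SHELF-style
# elicitation. Not the SHELF software or the UKCP18 protocol.
import numpy as np
from scipy import optimize, stats

probs = np.array([0.05, 0.50, 0.95])       # probability levels put to the expert
elicited = np.array([0.2, 0.5, 0.9])       # hypothetical elicited parameter values

def quantile_mismatch(theta):
    """Squared mismatch between fitted and elicited quantiles."""
    mu, log_sigma = theta
    q = stats.norm.ppf(probs, loc=mu, scale=np.exp(log_sigma))
    return np.sum((q - elicited) ** 2)

res = optimize.minimize(quantile_mismatch, x0=[0.5, np.log(0.2)])
mu, sigma = res.x[0], np.exp(res.x[1])
print(mu, sigma)                           # fitted prior to feed into the PPE design
```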
Observations are used at various stages of the production process. First, they are used to filter the PPE to extract the most plausible and diverse set of models. Reanalysis datasets from the ECMWF are used to assess the short-term (5-day) hindcasts (see Williams et al. 2013, p. 3259), and the Met Office HadISST2 data (Titchner and Rayner 2014) are used for the longer-term (5-year) simulations (see Murphy et al. 2018, pp. 41–45). Observations are also used to assess how the PPE performs in simulating large-scale circulation features, such as the Atlantic Meridional Overturning Circulation (AMOC).
So, the global projections draw on three different types of evidence and generally satisfy the “number” component of diversity and completeness (score: 3). We note that the score of this component depends on the variable in question. For example, for global projections of mean temperature, the level of theoretical understanding of the thermodynamic response to greenhouse gas (GHG) concentrations means that fewer types of evidence are needed to achieve the same score than for model-derived statements about regional precipitation patterns.
We can now evaluate the independence and comprehensiveness of the evidence. Independence cannot be assessed between expert elicitation and model-based evidence, because the origin of the experts is not disclosed (score: 0), but it is generally satisfied between model-based evidence and observations (score: 3). For any variable, the PPE represents a more comprehensive evaluation than the MME, because the “plausibility and diversity” principles are applied only in developing the PPE, not the MME. Nevertheless, both ensembles contribute to the overall projections, and overall comprehensiveness is therefore somewhat satisfied (score: 2). To improve along both diversity and completeness, the source of the experts should be revealed, and experts should be sought from international research centers. Moreover, the “plausibility and diversity” principles could also be applied to the evaluation and selection of components of the MME.
Historical empirical adequacy
Different datasets are used to assess the historical performance of models at different timescales (e.g., the 5-day and 5-year evaluations described in Murphy et al. 2018, p. 41). The discussion in Murphy et al. (2018) does not provide information about the empirical adequacy of the output of individual models, but the agreement between model output and observations is discussed with examples in Lowe et al. (2018).
Figure 2 shows the output of two randomly chosen models from the global projections (model A and model B) and the NCIC observations for temperature anomaly, wind speed anomaly, and precipitation rate. There are several problems with this evaluation of empirical adequacy. First, the issues tied to using anomalies to assess the empirical adequacy of models, discussed above, are also relevant here. Second, the comparison of observations and model output for wind speed anomaly and precipitation does not support a high score on this dimension. The models illustrated do not appear to capture enough of the variability in wind speed anomaly, although whether this is an artifact of model selection or a more general issue is unclear. The precipitation rate output shows considerable variation between models, but there is no guidance on how to interpret this variation. Understanding these issues is important because the features of atmospheric systems that influence variables such as wind speed and precipitation are not as well understood as those that influence temperature (see Risbey and O’Kane 2011), so the theory quality dimension cannot take up the slack for limited empirical adequacy.
There are further issues with how observations are used to assess model output. The global projections pass through two filtering stages in which hindcasts are assessed over 5-day and 5-year periods. The selection of these periods is not described in much detail. For example, 5-day hindcasts are only performed for data within the 2008/09 period (Williams et al. 2013, p. 3259), and the science report of Murphy et al. (2018) does not specify the years for which the 5-year simulations have been performed. Furthermore, the empirical adequacy of the global projection output cannot be assessed for many of the variables of interest. Moreover, Fig. 2 suggests that empirical adequacy is not satisfied for variables such as wind speed anomaly and precipitation by some or all of the models. The historical empirical adequacy dimension is therefore not satisfied (score: 0). To improve this score, the performance of individual models with respect to absolute values of the variables of interest should be explicitly discussed for each model in the ensemble.
The regional projections serve the same purpose as the global ones and follow a similar methodology (Murphy et al. 2018). There is therefore considerable overlap between the assessment and recommendations for improvement of these projections and those for the global projections above. There are, however, two main differences. First, the regional projections only use models from the Hadley Centre (no CMIP5 data). Second, the regional projections are developed using a one-way nesting approach, dynamically downscaling the projections over the UK by forcing a PPE of regional climate models with a PPE of global models.
The regional projections somewhat satisfy the transparency dimension (score 2) for similar reasons as the global projections. As we will discuss below, some of the dimensions are difficult to assess either because the sources of evidence are not easily accessible or because accessing them would require a user to have the skills to analyze the data themselves. For example, the analysis given by Murphy et al. (2018, pp. 95–107) only shows model performance with respect to temperature and precipitation, while many other variables (such as wind speed, cloud cover, relative humidity) are available through the user interface (Fung 2018). Higher transparency could be achieved by following the same recommendations that were given for the global projections above.
While the regional projections share many methodological assumptions with the global projections, the evaluation of the regional projections includes some additional theoretical considerations. For example, model performance in reproducing European climatology is part of the assessment process. As with the global projections, model performance in reproducing past climatology and major synoptic systems does not guarantee that the models can predict future changes. Theoretical support is needed to relate past model performance to key processes and to how these processes might respond to higher GHG concentrations. There are many difficulties in making such an assessment. For instance, the extent to which large-scale systems such as “atmospheric blocks” will affect temperature extremes over Europe and nearby regions such as the UK is still a matter of debate (Voosen 2020).
These considerations are important for the global projections but are magnified in the case of downscaled information. Possible biases introduced by downscaling are assessed for temperature and precipitation (Murphy et al. 2018, pp. 95–107). However, Giorgi (2020, p. 435) notes that the dynamical components of climate models are not well understood, and downscaling adds complexity to the evaluation of the model. Hence, as in the case of the probabilistic projections, reliance on only one modeling strategy may hide significant biases, the consequences of which are not explicitly addressed. The theory dimension is therefore only minimally satisfied by the regional projections (score: 1). To improve the theory dimension, more explicit justification for the choice of downscaling method (see, e.g., Rummukainen 2010, 2016; Ekström et al. 2015) and its possible consequences for model output should be included in the documents.
Diversity and completeness
Observations, model output, and expert elicitation are the three main types of evidence used here. So, like the global projections, the regional projections generally satisfy number (score: 3) and somewhat satisfy comprehensiveness (score: 2).
However, the regional projections only minimally satisfy independence (score: 1). First, the models used for the regional projections are all from the Hadley Centre. Watterson et al. (2014, pp. 607–698) show that CMIP models have an advantage in simulating temperature, precipitation, and pressure levels over their home territory. But skill in reproducing past data does not directly imply a good representation of the underlying physical processes, and global-scale phenomena and/or teleconnections may influence future changes in the UK climate. So, the exclusion of CMIP5 models may undermine the principles of “plausibility and diversity” that guide the production of the global projections. Second, as discussed above, the downscaling step adds complexity, introducing further assumptions into the modeling process. To improve independence and comprehensiveness, more models not developed by the Hadley Centre should be taken into consideration. The provenance of the experts involved in the elicitation process should be diverse, too.
Historical empirical adequacy
The empirical adequacy of the regional projections is assessed by evaluating the performance of the regional models in reproducing European climatology, surface temperature, precipitation, and AMOC strength, using the NCIC dataset and the standard configuration of the GCM used for the global projections. Murphy et al. (2018) claim that model performance is also assessed for other variables, but this is not discussed in detail in the report and so cannot be evaluated here.
The empirical adequacy of the regional projections is described more thoroughly than for the global projections, and as discussed above, there is an extensive discussion of how data and model output are compared to observations to eliminate models with possible biases. The acknowledgement of biases in model performance for absolute values of temperature and precipitation at different spatial resolutions (see, e.g., Fig. 4.5a in Murphy et al. 2018) suggests that the regional projections generally satisfy empirical adequacy (score: 3) for some of the variables of interest. However, there are some important caveats. First, empirical adequacy cannot be assessed for all variables available in the regional projections. Second, a higher historical empirical adequacy score does not imply a higher overall quality of the information: even though the regional projections score higher on historical empirical adequacy than the global projections, they cannot have an overall higher quality, due to the additional assumptions introduced by the downscaling step. Historical empirical adequacy could be improved by explicitly discussing model performance for each variable provided.
The overall quality of a product cannot be assessed as the sum of the individual evaluations along the different dimensions (Baldissera Pacchetti et al. 2021). Interdependencies among the assessed products and the quality dimensions, and their relation to statements about different variables, make overall quality comparisons difficult. Nevertheless, the dimensions highlight the major strengths and weaknesses of the projections and how these are related to features of the projection construction process. Figure 3 provides a visualization of the scores of the quality assessment for the different projections. It shows that the probabilistic projections have the lowest quality and that their main shortcomings derive from a lack of transparency, of theoretical support, and of adequate empirical tests. The global projections have higher quality but also lack historical empirical adequacy.
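For readers working from the text alone, the following minimal sketch collects the scores reported in the preceding sections into a grouped bar chart in the spirit of Fig. 3. For independence in the global projections, the model/observation score of 3 is plotted; the expert-elicitation component scored 0.

```python
# Minimal sketch: the dimension scores reported in the text, visualized as
# grouped bars (cf. Fig. 3). Independence for the global projections uses the
# model/observation score of 3; the expert-elicitation component scored 0.
import numpy as np
import matplotlib.pyplot as plt

dims = ["Transparency", "Theory", "Independence", "Number",
        "Comprehensiveness", "Hist. emp. adequacy"]
scores = {
    "Probabilistic": [1, 0, 2, 1, 2, 0],
    "Global":        [2, 2, 3, 3, 2, 0],
    "Regional":      [2, 1, 1, 3, 2, 3],
}

x = np.arange(len(dims))
width = 0.25
fig, ax = plt.subplots(figsize=(9, 4))
for i, (strand, vals) in enumerate(scores.items()):
    ax.bar(x + (i - 1) * width, vals, width, label=strand)
ax.set_xticks(x)
ax.set_xticklabels(dims, rotation=30, ha="right")
ax.set_ylabel("Score (0 to 3)")
ax.legend()
plt.tight_layout()
plt.show()
```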
The higher quality of the global projections derives from two key differences. First, the global projections are not concerned with probabilistic estimates of future climate but rather with individual model simulations and potential future trajectories. This means that the evidential standards for achieving epistemic reliability are different. Second, the theoretical component, both in terms of the underlying physical theory and of the justification of the methodology, is better supported in the global projections. The importance of synoptic weather systems and their role in understanding changes in regional weather is acknowledged, and the “plausibility” principle draws explicit attention to the physically meaningful representation of the processes that drive regional changes. Nevertheless, the above analysis shows that one cannot adequately assess the extent to which the global projections satisfy key dimensions such as historical empirical adequacy.
The regional projections have slightly lower quality than the global projections. There is little independence between sources of evidence, and the additional downscaling step, while thoroughly explained, requires additional theoretical justification before the regional projections can be adequately assessed as epistemically reliable. Moreover, the reliance on mostly nationally produced models raises questions about the context in which these models are granted epistemic authority (see, e.g., Mahony and Hulme 2016).