Our aim in this section is to understand how performance filtering of the Wave-1 PPE affected its feedback component distributions (Sects. 3.2, 3.3), and to analyse the relationships and processes driving these constraints (Sects. 3.4, 3.5). To provide context for the filtering analysis, we start with a summary of the distributions of the feedback components from the Wave-1 PPE (before filtering), along with a comparison to CMIP5 atmosphere-only models.
Summary of feedbacks in the atmosphere-only PPE
Global climate feedback values from the Wave-1 PPE are summarised in Table 1 and shown in Fig. 1. Net feedback values vary between −1.80 and −0.93 Wm−2 K−1, covering a much larger range than the stochastic ensemble, which varies between −1.30 and −1.22 Wm−2 K−1. Similar effects are seen across all the feedback components (comparing green crosses to black box-and-whiskers in Fig. 1), indicating that the parameter perturbations, rather than internal variability, are the main source of the spread in feedbacks across the PPE.
Table 1 Descriptions of the perturbed parameters in the coupled and atmosphere-only PPEs which are most prominent in this analysis

The range of net feedbacks explored in the PPE is comparable to that from a set of 11 CMIP5 atmosphere-only models, which are based on amipFuture and amip experiments (Ringer et al. 2014). The CMIP5 values sample a similar range to the PPE (−1.85 to −1.04 Wm−2 K−1), but this similarity masks some clear differences between the ensembles. For example, Fig. 1 shows that the high end of the CMIP5 range is dominated by a single model (IPSL-CM5A-LR), with the next highest value (−1.42 Wm−2 K−1) lying close to the PPE median. This means that over 50% of PPE members have less negative values (i.e. higher climate sensitivities) than 10 of the 11 CMIP5 models, and that the PPE provides a much more thorough sampling of the upper end of the net feedbacks.
A similar, but more pronounced, situation is seen for the net CRE component, where the PPE does not sample any negative feedback values, in contrast to CMIP5. This is a result of a partial cancellation of large, but oppositely signed, LW and SW CRE feedbacks, with the positive net CRE feedback values resulting from the larger magnitude of the SW CRE component. (We discuss this cancellation, which is due to the effect of high cloud, in more detail in Sect. 3.5.2.) The differences from the CMIP5 ensemble are even larger here: the stochastic ensemble lies outside the CMIP5 ranges in both components, while there is only minimal overlap between the PPE and CMIP5 (indeed there is no overlap in the LW CRE feedbacks). This has been a feature of recent Hadley Centre models (Senior et al. 2016; Andrews et al. 2019), and Bodas-Salcedo et al. (2019) have also shown an increase (decrease) in SW (LW) CRE feedbacks in HadGEM3-GA7.1 compared to its predecessor, HadGEM3-GA6. The latter study attributed the SW CRE feedback changes to new aerosol and mixed-phase cloud schemes (Mann et al. 2010; Furtado et al. 2016), both of which are included in the GA7.05 configuration, highlighting the importance of structural model changes in the representation of cloud feedbacks.
For net clear-sky feedbacks, the stochastic ensemble and the bulk of PPE members agree well with the CMIP5 ensemble. However, there is a clear tail to the PPE distribution sampling large negative feedback values that are well outside the CMIP5 range. In the PPE, this is mainly due to the LW component, since sea ice is the same in the ATMOS and SSTfuture experiments. In contrast, there are variations in the CMIP5 SW clear-sky feedbacks that may be due to differences in how sea ice is treated in the amipFuture experiment.
The impact of filtering on PPE feedbacks
The 5-year climate performance filtering described in Sect. 2.4 ruled out 365 of the 406 Wave-1 PPE members, leaving 41 plausible variants. We refer to this as the ‘filtered’ Wave-1 PPE. The impact of this filtering on the feedback components is summarised by the black and red box-and-whisker plots in Fig. 1, whilst Fig. 2 shows the normalised histograms of these distributions. Median, maximum and minimum values are given in Table 2.
Table 2 Summary of feedback component values

For the net feedback, the distribution changes show a reduction in the range across the PPE after filtering (as highlighted in Sexton et al. 2020, submitted), accompanied by a shift towards less negative feedback values (i.e. towards higher climate sensitivities). Clearer changes are seen in the cloud components, where variants with weaker negative LW CRE feedbacks and weaker positive SW CRE feedbacks are ruled out. Consequently, the distributions for these components shift to even more negative and positive feedback values, respectively, enhancing the already substantial differences from the CMIP5 models. The impact of performance filtering is also apparent in other components: for example, weakly positive net CRE feedbacks are ruled out, as are the strongly negative net clear-sky values.
We tested whether the apparent shifts of the filtered Wave-1 feedback distributions, relative to the full (406-member) Wave-1 distributions, could have been caused by random sampling of the members. Using a bootstrapping method, we tested the null hypothesis that the filtered distributions are random samples of the full distributions. For each feedback component we created 10⁴ 41-member samples, drawn randomly from the full Wave-1 distribution, and then performed a 2-sample Kolmogorov-Smirnov (KS) test of each against the full Wave-1 distribution. The resulting bootstrapped KS test statistics are shown in Fig. 3, where they are compared to values found using the true filtered Wave-1 distributions. The impact of performance filtering is clearly significant for the SW and LW CRE feedbacks. Changes in the net and net CRE feedback distributions are also distinct from the random samples, but less clearly so, while the result for the net clear-sky component is more marginal.
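For concreteness, the bootstrap test can be summarised in a short Python sketch (a minimal illustration under stated assumptions, not the study's code; the names `full` and `filtered`, holding one component's 406 Wave-1 and 41 filtered feedback values, are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

def bootstrap_ks(full, filtered, n_boot=10_000, seed=0):
    """Bootstrap null distribution of 2-sample KS statistics for random
    len(filtered)-member subsamples of `full`, plus the observed
    statistic for the true filtered distribution."""
    rng = np.random.default_rng(seed)
    n_keep = len(filtered)
    # Null hypothesis: the filtered set is a random sample of the full set
    boot = np.array([
        ks_2samp(rng.choice(full, size=n_keep, replace=False), full).statistic
        for _ in range(n_boot)
    ])
    obs = ks_2samp(filtered, full).statistic
    # Fraction of random subsamples at least as extreme as the true filter
    p_value = (boot >= obs).mean()
    return obs, boot, p_value
```

Comparing `obs` against the `boot` distribution (as in Fig. 3) indicates whether the filtering-induced shift exceeds what random sampling alone would produce.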
Key performance metric clusters
Having established that performance filtering drives significant changes to the PPE's feedback distributions, we now identify groups of related performance metrics which are key to these changes. We refer to these groups as 'clusters'. We use clusters, rather than focusing on individual metrics, because the full set of performance metrics (described in Sexton et al. 2020, submitted) was chosen to sample a wide range of regional and seasonal climate performance across many variables. Consequently, many of the metrics share common driving processes and their MSE values can be highly correlated. Additionally, the effectiveness of a performance metric depends on several factors (e.g. the spread in performance across the PPE and the standard model's behaviour; see Sect. 3.4.1), meaning that the number of models ruled out by these correlated metrics can vary substantially.
Although the changes for the net clear-sky feedbacks were not as clearly distinct from the sampling noise as those for the other components, we still include this component in the analysis for completeness. Here, we use emulators (described in Sect. 2.5) to analyse the role of the performance metrics. Compared to using the actual simulations, emulators provide a more thorough sampling of emergent relationships and avoid issues with noise due to small sample sizes (since only 41 actual simulations remain after filtering).
The first stage in building the clusters was to determine, for each feedback component separately, the individual performance metric with the largest impact on the Wave-1 distribution. We did this using the following methodology. Firstly, for every performance metric, filtering (using the 'T' threshold described in Sect. 2.5.2) was applied to the emulated Wave-1 distributions of log(MSE) values, creating a constrained sample in each case. Filtered distributions of feedbacks were then generated from each of the constrained samples and compared to the full Wave-1 distribution using 2-sample KS tests. For each feedback component, the performance metric producing the largest KS test statistic was then selected; we refer to this as the 'principal metric' of the cluster. The cluster was then formed by selecting the performance metrics whose emulated log(MSE) values correlate strongly (r > 0.8) with those of the principal metric.
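This selection step can be sketched as follows (an illustrative sketch only; it assumes the emulated log(MSE) values are held in a samples × metrics array and that filtering retains samples at or below each metric's 'T' threshold, both of which are our assumptions about the layout rather than details given in the paper):

```python
import numpy as np
from scipy.stats import ks_2samp

def build_cluster(log_mse, feedbacks, thresholds, r_min=0.8):
    """Pick the principal metric (largest KS impact on the emulated
    feedback distribution after threshold filtering) and cluster with it
    all metrics whose log(MSE) values correlate at r > r_min.

    log_mse    : (n_samples, n_metrics) emulated log(MSE) values
    feedbacks  : (n_samples,) emulated feedback component values
    thresholds : (n_metrics,) 'T' filtering thresholds per metric
    """
    n_metrics = log_mse.shape[1]
    ks_stats = np.empty(n_metrics)
    for j in range(n_metrics):
        keep = log_mse[:, j] <= thresholds[j]   # constrained sample for metric j
        ks_stats[j] = ks_2samp(feedbacks[keep], feedbacks).statistic
    principal = int(np.argmax(ks_stats))
    # Cluster: metrics strongly correlated with the principal metric
    r = np.corrcoef(log_mse, rowvar=False)[principal]
    return principal, np.flatnonzero(r > r_min)
```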
The first clusters and their principal metrics are given in Table 3 for each feedback component. Since each cluster represents the collective effect of the metrics within it, we name the clusters using shortened descriptions that summarise these effects. We find that extratropical LW cloud forcing (and associated) metrics provide key constraints across the feedback components. The same cluster, 'Extratropical_LWcloudforcing', is selected for the LW CRE, SW CRE and net feedbacks, whilst an almost identical cluster is chosen for the net CRE feedback ('Extratropical_LWcloudforcing-2'; see Table 4 in Appendix 2 for details of the individual metrics in these clusters). This high level of overlap indicates that key constraints across the feedback components are driven by common processes. In contrast, Table 3 shows that a distinct cluster, representing tropical LW cloud forcing variables, is chosen for the net clear-sky feedback component. However, we do not analyse the role of this cluster any further, due to the relatively weak changes in this feedback component shown in Sect. 3.2.
Table 3 Summary of the clusters, and their principal metrics, with leading impacts on the emulated Wave-1 feedback distributions

Figure 4 shows how filtering using these clusters impacts the emulated feedback distributions. For context, the full Wave-1 (N = 64,572) distributions are shown in black, while the 'complete' effect of filtering using all 925 performance metrics (leaving a sample of 24,280) is shown in red. Across the feedback components, the qualitative impact of the complete filtering is mirrored by that from the first cluster (shown in blue in Fig. 4, except for the top-right panel), but the degree to which these changes are captured varies. Specifically, the first cluster explains much of the impact from the complete filtering across the cloud feedback components, particularly for the LW and SW CRE distributions.
The net feedback changes, however, are only partially captured, as evidenced by the remaining differences between the blue and red histograms for this component in Fig. 4. This is likely because the net feedback combines a wider range of distinct processes than any of the cloud components.
To analyse these net feedback changes further, we determine a second cluster of performance metrics which is quasi-independent of the first cluster. We do this by repeating the clustering methodology described above, but after the removal of the impact of filtering by the first cluster on the feedback distribution. A full set of quasi-independent clusters could be defined this way until we run out of performance metrics. However, we limit our analysis here to the top two clusters.
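Continuing the build_cluster sketch shown earlier, the quasi-independent second cluster can be found by first removing the effect of filtering by the first cluster and then repeating the search on the remaining samples (again illustrative only; `prev_cluster` is the index array returned by build_cluster):

```python
def next_cluster(log_mse, feedbacks, thresholds, prev_cluster, r_min=0.8):
    """Repeat the cluster search after removing the impact of an earlier
    cluster: keep only samples that pass all of that cluster's metrics,
    then rerun build_cluster on what remains."""
    keep = (log_mse[:, prev_cluster] <= thresholds[prev_cluster]).all(axis=1)
    return build_cluster(log_mse[keep], feedbacks[keep], thresholds, r_min)
```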
The second cluster for the net feedback ('SWclear-sky_ocean') comprises metrics relating to SW clear-sky fluxes over the ocean, covering all the large-scale regions (tropics, extratropics and global) and seasons (see Table 4 in Appendix 2). The impact on the net feedbacks of combining the top two clusters is shown in the top-right panel of Fig. 4. A small but notable change in the distribution can be seen, bringing it closer to the complete filtering than the first cluster alone achieves. This suggests that distinct processes and relationships lie behind the selection of these two clusters.
Emergent relationships and constraints
Section 3.3 highlighted extratropical LW cloud forcing performance as an important driver of feedback changes in the Wave-1 PPE. We now explore these relationships in more detail, with the role of the parameter perturbations being considered further in Sect. 3.5.
In Fig. 5 we show the emergent relationships for each feedback component against MSEs for the principal metric of the most prominent cluster found in Sect. 3.3: the annual mean LW cloud forcing over NH extratropical oceans (‘lcf_nhext_ocean_Annual’; see Table 3). We use the principal metric because it highlights the salient features of the constraints imposed by the associated cluster, due to the high correlations between the cluster’s metrics. This is much simpler than trying to define a combined metric based on all the metrics in the cluster. Also, we are ultimately trying to understand the constraints placed on the actual simulations. Therefore, having established a robust relationship across parameter space using the emulators, we now focus on the actual simulations.
The impact of an emergent constraint is controlled by two factors: the effectiveness of the filtering (i.e. the number of variants ruled out) and the strength of the relationship between the performance metric and the feedback components. We consider each of these in turn below for the lcf_nhext_ocean_Annual performance metric.
Effectiveness of filtering by the extratropical LW cloud forcing cluster
The top-left panel of Fig. 5 shows the distribution of lcf_nhext_ocean_Annual MSE values for the Wave-1 PPE, along with the threshold value above which models are ruled out. As described in Karmalkar et al. (2019), the effectiveness of a performance metric at ruling out parts of parameter space is determined by the relative size of the spread in MSE values against the size of the normalisation term in Eq. 2. In this study, this is largely dictated by the size of the structural error, which is based on the MSE of the standard variant. Figure 5 highlights why filtering with lcf_nhext_ocean_Annual is so effective: the good performance of the standard member (see the pink lines) sets an MSE threshold value that is relatively low compared to the spread across the PPE and consequently rules out a significant number of members.
The strength of this filtering is enhanced by the fact that the standard member is in the low tail of the MSE distribution, lying at the 9th percentile. As a result, many members lie close to the MSE threshold, with a relatively high proportion of them being ruled out (34% for this metric alone). The disparity between the standard variant and the bulk of the PPE for this metric raises a question over how confident we can be that the standard model is truly indicative of the structural uncertainty in HadGEM3-GA7.05 for extratropical LW cloud forcing. We explore this question further in Sect. 4.1.
Feedback constraints from the extratropical LW cloud forcing cluster
The second key aspect required for an effective emergent constraint is a clear relationship between the performance metric and the feedback components. These emergent relationships are shown in the remaining panels of Fig. 5 (with green crosses for the Wave-1 PPE). Very clear relationships can be seen for the LW and SW CRE feedbacks, where model variants with poorer performance are associated with weaker negative LW and weaker positive SW CRE feedbacks. When combined with the effective filtering by lcf_nhext_ocean_Annual, these relationships drive a key component of the constraints on the LW and SW CRE feedbacks (as seen in Sect. 3.3).
The emergent relationship for the net CRE feedback is noticeably less clear—consistent with the smaller changes in the filtered distributions seen in Sect. 3.2. This weaker relationship is a result of an anti-correlation between the LW and SW CRE feedbacks, shown in the bottom panel of Fig. 6, which results from the competing effects of high cloud on these components. (We explore this in more detail in Sect. 3.5.) Despite the weaker relationship, however, the lcf_nhext_ocean_Annual metric does still preferentially rule out lower net CRE feedbacks. This is due to a slightly stronger emergent relationship for the SW compared to the LW CRE feedback, which can be seen by comparing the bottom two panels in Fig. 5.
No discernible relationship is seen for the net clear-sky feedback (middle-left panel of Fig. 5). However, this is not surprising because a distinct cluster was chosen for this component (‘Tropical_LWcloudforcing’; see Table 3). Consequently, the constraint on the net feedback (the sum of the net clear-sky and net CRE components) is qualitatively similar to that for the net CRE: a weak constraint that preferentially rules out more negative feedback values (i.e. smaller climate sensitivities). As was the case for the LW and SW CRE feedbacks, these behaviours are captured in the emulated distributions shown in Fig. 4, giving us confidence in their use in our analysis.
Net feedback constraint from the SW clear-sky ocean cluster
Figure 7 shows equivalent plots to Fig. 5, but for the principal metric of the net feedback’s second cluster: the annual SW clear-sky flux over SH extratropical oceans (‘rsutcs_shext_ocean_Annual’). The distribution of MSE values (left-hand panel) shows that the effective filtering by this metric arises for similar reasons to those for lcf_nhext_ocean_Annual—namely that the relative spread in MSE performance across the PPE is large compared to that of the standard member.
However, there are also clear differences: most notably, the rsutcs_shext_ocean_Annual MSE distribution is positively skewed. As a result, the bulk of PPE members (including the standard) lie at the lower end of the MSE range explored by the PPE, with only a poorer-performing tail being ruled out. This not only means that filtering with the rsutcs_shext_ocean_Annual metric is less effective than with lcf_nhext_ocean_Annual (11% vs 34% ruled out, respectively), but also contrasts with the fact that the standard member's better performance for lcf_nhext_ocean_Annual placed it in the low tail of that MSE distribution. The implications of these differences for our structural uncertainty estimates are discussed further in Sect. 4.
No obvious relationship is seen between rsutcs_shext_ocean_Annual MSEs and net feedbacks for the Wave-1 PPE (right-hand panel of Fig. 7). This is perhaps unsurprising, given the relatively small changes in the emulated distribution (top two panels of Fig. 4) and the sparser sampling of parameter space provided by the actual Wave-1 PPE. This second emergent constraint is therefore based on the relationships that emerge from our emulators and the more thorough sampling of parameter space that they allow.
The role of parameters
A key advantage of PPEs is the ability to assess how parameter perturbations affect model processes and outputs. In this section we use this capability to explore the role of our parameter perturbations in the Extratropical_LWcloudforcing cluster, in the feedback components, and finally in driving the emergent relationships described in Sect. 3.4.
Extratropical LW cloud forcing cluster
To assess the connection between parameters and the performance metrics, we first tested the impact of filtering on the 47 parameter distributions, using the Extratropical_LWcloudforcing cluster. We did this by performing 2-sample KS tests for each parameter, where the full Wave-1 distributions (406 members) were compared to those filtered by the cluster (223 members). We rejected the null hypothesis of no distribution change at the 5% level for only 2 parameters: ai, a parameter controlling the mass-diameter relationship for cloud ice; and m_ci, the cloud ice fall speed (see descriptions in Table 1).
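A minimal sketch of this per-parameter screening (assuming `params` is a (406, 47) array of parameter values and `keep_mask` flags the 223 members retained by the cluster; both names are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

def changed_parameters(params, keep_mask, alpha=0.05):
    """Indices of parameters whose filtered distribution differs from
    the full Wave-1 distribution at the `alpha` level (2-sample KS)."""
    pvals = np.array([
        ks_2samp(params[keep_mask, j], params[:, j]).pvalue
        for j in range(params.shape[1])
    ])
    return np.flatnonzero(pvals < alpha)
```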
Figure 8 shows the impact of this filtering on the ai and m_ci distributions: low ai values and high m_ci values are preferentially ruled out. We show later that these poorly performing models have weaker LW cloud forcing (see the members with higher MSE values in the top panel of Fig. 11), which is consistent with the physical meaning of these parameters. For example, high m_ci values (i.e. faster ice fall speeds) decrease cloud ice water content, which leads to less high cloud. Consequently, the LW cloud forcing (i.e. the difference between the clear-sky and total outgoing LW flux) is reduced. This effect is especially sensitive to high cloud, since the cooler cloud-top temperatures provide a greater contrast to the surface than cloud at lower altitudes. A similar effect results from smaller ai values, which indirectly drive faster ice fall speeds (see Table 1).
A parameter sensitivity analysis (Saltelli et al. 1999; described in Sect. 2.5) for the principal metric of the Extratropical_LWcloudforcing cluster (lcf_nhext_ocean_Annual) is shown in the right panel of Fig. 8. This also indicates important roles for ai and m_ci: together they explain 60% of the variance in these MSE values. The consistency of these results highlights the suitability of using the lcf_nhext_ocean_Annual metric as a proxy for understanding the behaviour of the whole cluster. We do note, however, the sensitivity to other parameters, e.g. two_d_fsd_factor (a scaling factor for cloud condensate variability) and dp_corr_strat (vertical cloud overlap), which explain 21% and 17% of the variance, respectively, and may also merit further investigation.
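A variance decomposition of this kind can be sketched with the SALib package's Sobol workflow; this is an illustration in the spirit of the Saltelli et al. (1999) method rather than the study's actual implementation, and the parameter subset, bounds and emulator stand-in below are all placeholders:

```python
from SALib.sample import saltelli
from SALib.analyze import sobol

# Placeholder problem definition: a three-parameter subset with
# normalised ranges, purely for illustration.
problem = {
    "num_vars": 3,
    "names": ["ai", "m_ci", "two_d_fsd_factor"],
    "bounds": [[0.0, 1.0]] * 3,
}

# Toy stand-in for the trained log(MSE) emulator used in the paper.
def emulate_log_mse(X):
    return X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 1] * X[:, 2]

X = saltelli.sample(problem, 1024)   # Saltelli sampling design
Y = emulate_log_mse(X)               # emulated metric values at each sample
Si = sobol.analyze(problem, Y)       # first-order ('S1') and total ('ST') indices
print(dict(zip(problem["names"], Si["S1"])))
```

The first-order indices ('S1') correspond to the fractions of variance attributed to individual parameters, as quoted in the text.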
Tsushima et al. (2020, submitted) have explored the spatial sensitivity of LW cloud forcing to parameters, using the same PPE we have analysed in this paper. The parameter sensitivities of zonal mean LW cloud forcing shown in Fig. 2a of that paper are consistent with our findings that ai and m_ci play a key role in the NH extratropics, with two_d_fsd_factor and dp_corr_strat also playing a role.
Feedback components
We show sensitivity analyses for the main parameters driving variations in the feedback components in Fig. 9. We start by considering the cloud feedbacks. Parameters such as ai, m_ci, qlmin (the minimum critical cloud condensate) and dp_corr_strat show clear impacts on the LW and SW CRE feedbacks, but their roles are diminished for the net CRE feedback, suggesting a cancellation of the LW and SW CRE effects. For m_ci and qlmin there is an almost total cancellation, whilst for ai and dp_corr_strat a residual impact remains on the net CRE feedback. The parameter with the strongest influence on net CRE feedback variations is ent_fac_dp (the deep entrainment amplitude). Whilst this does not have a strong influence on either the LW or SW CRE feedback components, its relative impact on the net CRE feedback is increased due to the strong cancellation for other parameters. Similar signatures are also seen for the parameter dbsdtbs_turb_0 (the cloud erosion rate).
The cancellation of global LW and SW CRE feedbacks is shown explicitly in the left-hand panel of Fig. 10. There is a clear anti-correlation between the components across the Wave-1 PPE (green crosses) resulting in a relatively reduced spread across the diagonal lines that represent the net CRE feedbacks. This reduced spread can also be seen in Fig. 1.
We highlight the role of the ai and m_ci parameters here using simple one-at-a-time (OAT) sensitivity tests. To do this, we separately sample the emulated Wave-1 distributions of these two parameters at 7 equally spaced percentiles, from the 5th to the 95th percentile (inclusive), holding all other parameters at their standard (unperturbed) values. We then use emulators to predict feedback component values at these sampled points, as sketched below. We acknowledge that focusing only on ai and m_ci will not capture the full complexity introduced by the other parameters noted above (e.g. qlmin, dp_corr_strat). However, these two parameters have the strongest influence on cloud feedbacks, and they capture the different cancellation behaviours highlighted above (as well as being key drivers of the Extratropical_LWcloudforcing cluster, as shown in Sect. 3.5.1). As such, they should provide a good illustration of the role of the driving parameters.
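A minimal sketch of the OAT sampling (assuming `wave1_params` holds the emulated Wave-1 parameter sample and `standard` the standard-variant parameter vector; both names, and the emulator call in the usage comment, are illustrative):

```python
import numpy as np

PCTS = np.linspace(5, 95, 7)   # 7 equally spaced percentiles, 5th-95th inclusive

def oat_design(wave1_params, standard, j):
    """One-at-a-time design: vary parameter column j across PCTS of its
    emulated Wave-1 distribution, holding all others at standard values."""
    values = np.percentile(wave1_params[:, j], PCTS)
    X = np.tile(standard, (len(values), 1))   # every row starts at standard
    X[:, j] = values                          # then vary parameter j only
    return X

# e.g. emulated feedbacks along the m_ci axis (column index assumed):
# lw_cre = feedback_emulator.predict(oat_design(wave1_params, standard, J_MCI))
```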
The results show a near perfect anti-correlation along lines of constant net CRE feedback for variations in m_ci (r = − 0.999, slope = − 0.882; red points), whilst for variations in ai (blue points) there is a small cross-diagonal component driving spread in the net CRE feedback. This is consistent with the cancelling parameter sensitivities discussed above.
We find that this cancellation is particularly strong in the tropics (middle panel of Fig. 10), where the clouds of points for the Wave-1 PPE and OAT tests are aligned with lines of constant net CRE feedback. Such a strong cancellation is indicative of high cloud feedbacks, which are known to have compensating LW and SW CRE changes in the tropics (Kiehl 1994). The key roles for m_ci and ai are also consistent with high cloud changes: as described in Sect. 3.5.1, these parameters drive changes in high cloud amounts and so are likely to be important for the responses of high cloud.
A weaker level of cancellation is seen in the extratropics (right-hand panel of Fig. 10), with larger variations in SW CRE feedbacks relative to the LW CRE component. A possible reason for this is that the altitude of the freezing level is lower in the extratropics than in the tropics, and consequently the ‘high’ ice clouds form at a lower altitude. Here, they have less of an impact on LW CRE fluxes since the cloud-top temperatures are closer to those of the surface, but maintain a substantial impact on SW CRE fluxes through cloud albedo effects.
The OAT tests highlight that ai has a stronger influence in this region than m_ci. It is this sensitivity of the extratropical SW CRE feedback to ai that appears to drive the residual sensitivity seen in the global net CRE feedback. The reasons behind these regional variations in parameter sensitivities are likely to be complex, involving the details of how these parameters influence cloud height, albedo and amount, for both low and high cloud. A detailed examination of these processes is beyond the scope of this paper, so we will not explore it further here.
We note, however, that the results above are again consistent with the parameter sensitivities of zonal mean CRE responses given in Tsushima et al. (2020, submitted). Figure 2b, d in that paper show that the LW CRE and SW CRE responses are particularly sensitive to m_ci in the tropics and ai in the extratropics, which is consistent with our OAT sensitivity tests shown in the middle and right-hand panel of Fig. 10. For the net CRE response, the cancellation of the sensitivity to m_ci in the tropics is seen clearly in Fig. 1b of Tsushima et al. (2020, submitted), as is the residual impact of ai in the extratropics.
The middle-left panel of Fig. 9 shows that variations in net clear-sky feedbacks are dominated by ent_fac_dp, with around 60% of the variance being explained by this parameter. As discussed above, this parameter is also the most important for net CRE feedback variations. However, for the net feedback (top panel of Fig. 9), this parameter explains a smaller fraction of the total variance. This is due to an anti-correlation in its influence on the net clear-sky and net CRE feedbacks, as shown in Fig. 6. Other key parameters for both net CRE and net clear-sky feedbacks are also important for the net feedback, highlighting that it is influenced by a wide range of processes.
Emergent relationships and constraints
The emergent relationships and constraints shown in Fig. 5 (and discussed in Sect. 3.4.2) result from a shared dependence of the lcf_nhext_ocean_Annual performance metric and the feedback components on the same parameters. It is important to note, however, that this does not necessarily imply a causal link: the performance filtering constrains parameters, which may subsequently impact the feedback distributions, but the performance and the feedbacks may not be driven by the same process.
Here, we analyse these shared dependences by focusing on the two key parameters highlighted above: ai and m_ci. We carried out equivalent OAT analyses to those described for Fig. 10 for all feedback components, as well as for lcf_nhext_ocean_Annual log(MSE) values. The latter were subsequently transformed to MSEs, to allow comparison with the actual Wave-1 PPE. We have additionally evaluated a joint sample of ai and m_ci, due to the similarities in their effects on the emergent relationships. The results of these tests are plotted as blue, red and purple points in Fig. 5.
Key roles for ai and m_ci in the emergent relationships for the LW and SW CRE feedbacks are clearly seen, with high m_ci and low ai values being associated with poorer performance (higher MSE values), weaker negative LW and weaker positive SW CRE feedbacks. The joint sample (purple dots) also suggests a role for ai and m_ci in the net CRE and net feedback relationships, but these are clearly affected by the cancellation effects described in Sect. 3.5.2.
As noted above, these associations do not necessarily imply causal links. The emergent relationship for the LW CRE feedback component provides an example: it is the performance of LW cloud forcing in the extratropics which constrains ai and m_ci but, as shown in Fig. 10, the influence of these parameters on the LW CRE feedbacks comes mainly from the tropics, i.e. the key processes are not co-located. There may be more potential to explore a causal link for the net CRE feedbacks, since it is the influence of these parameters (particularly ai) in the extratropics which appears to be important for this component. However, given the potential complexities of the cloud processes in this region (as noted in Sect. 3.5.2), such an analysis is beyond the scope of this paper.
Other parameters highlighted in Sect. 3.5.2 may also play a role, but we note here the case of ent_fac_dp, for which we also carried out an OAT test (orange points in Fig. 5). Whilst several feedback components are influenced by this parameter (e.g. net CRE and net clear-sky), it does not drive variations in lcf_nhext_ocean_Annual MSE values and so has little impact on the constraints.