## Abstract

Decadal prediction results are analyzed for the predictability and skill of annual mean temperature. Forecast skill is assessed in terms of correlation, mean square error (MSE) and mean square skill score. The predictability of the forecast system is assessed by calculating the corresponding “potential” skill measures based on the behaviour of the forecast ensemble. The expectation is that potential skill, where the model predicts its own evolution, will be greater than the actual skill, where the model predicts the evolution of the real system, and that the difference is an indication of the potential for forecast improvement. This will depend, however, on the agreement of the second order climate statistics of the forecasts with those of the climate system. In this study the forecast variance differs from the variance of the verifying observations over non-trivial parts of the globe. Observation-based values of variance from different sources also differ non-trivially. This is an area of difficulty independent of the forecasting system and also affects the comparison of actual and potential mean square error. It is possible to scale the forecast variance estimate to match that of the verifying data so as to avoid this consequence but a variance mismatch, whatever its source, remains a difficulty when considering forecast system improvements. Maps of actual and potential correlation indicate that over most of the globe potential correlation is greater than actual correlation, as expected, with the difference suggesting, but not demonstrating, that it might be possible to improve skill. There are exceptions, mainly over some land areas in the Northern Hemisphere and at later forecast ranges, where actual correlation can exceed potential correlation, and this behaviour is ascribed to excessive noise variance in the forecasts, at least as compared to the verifying data. 
Sampling error can also play a role, but significance testing suggests it is not sufficient to explain the results. Similar results are obtained for MSE but only after scaling the forecasts to match the variance of the verifying observations. It is immediately clear that the forecast system is deficient, independent of other considerations, if the actual correlation is greater than the potential correlation and/or the actual MSE is less than the potential MSE and this gives some indication of the nature of the deficiency in the forecasts in these regions. The predictable and noise components of an ensemble of forecasts can be estimated but this is not the case for the actual system. The degree to which the difference between actual and potential skill indicates the potential for improvement of the forecasting can only be judged indirectly. At a minimum the variances of the forecasts and of the verifying data should be in reasonable accord. If the potential skill is greater than the actual skill for a forecasting system based on a well behaved model it suggests, as a working hypothesis, that forecast skill can be improved so as to more closely approach potential skill.
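
As an illustration of the actual and potential correlation measures described above, the following Python sketch uses synthetic data, not the forecasts analyzed here. It assumes a simple signal-plus-noise model in which the observations and every ensemble member share a predictable component. The function names, the leave-one-out estimate of potential correlation, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def actual_and_potential_correlation(obs, members):
    """obs: n verifying values; members: m lists of n forecasts each."""
    m, n = len(members), len(obs)
    total = [sum(mem[t] for mem in members) for t in range(n)]
    # Actual skill: the ensemble mean predicts the verifying data.
    r = corr(obs, [s / m for s in total])
    # Potential skill (assumed estimator): each member in turn plays the
    # role of "truth" and is predicted by the mean of the other members.
    rhos = []
    for mem in members:
        others = [(total[t] - mem[t]) / (m - 1) for t in range(n)]
        rhos.append(corr(mem, others))
    return r, sum(rhos) / m


def synthetic_case(sig_x, sig_y, n=2000, m=10, seed=0):
    """Shared unit-variance signal plus independent noise (hypothetical setup)."""
    rng = random.Random(seed)
    chi = [rng.gauss(0.0, 1.0) for _ in range(n)]        # predictable component
    obs = [c + rng.gauss(0.0, sig_x) for c in chi]       # noise std sig_x
    members = [[c + rng.gauss(0.0, sig_y) for c in chi]  # noise std sig_y
               for _ in range(m)]
    return obs, members


# Forecast noise larger than the noise in the verifying data.
r_noisy, rho_noisy = actual_and_potential_correlation(
    *synthetic_case(sig_x=1.0, sig_y=2.0))
# Forecast noise smaller than the noise in the verifying data.
r_quiet, rho_quiet = actual_and_potential_correlation(
    *synthetic_case(sig_x=2.0, sig_y=1.0))

print(f"excess forecast noise: r={r_noisy:.2f}, rho={rho_noisy:.2f}")
print(f"modest forecast noise: r={r_quiet:.2f}, rho={rho_quiet:.2f}")
```

In this toy setting the actual correlation exceeds the potential correlation when the forecast noise is excessive relative to the verifying data, and the expected ordering returns when the noise levels are reversed.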


## References

Baker LH, Shaffrey LC, Sutton RT, Weisheimer A, Scaife AA (2018) An intercomparison of skill and over/underconfidence of the wintertime North Atlantic Oscillation in multi-model seasonal forecasts. Geophys Res Lett. https://doi.org/10.1029/2018GL078838

Boer GJ, Merryfield WJ, Kharin VV (2018) Relationships between potential, attainable, and actual skill in a decadal prediction experiment. Clim Dyn. https://doi.org/10.1007/s00382-018-4417-7

Boer GJ, Kharin VV, Merryfield WJ (2013) Decadal predictability and forecast skill. Clim Dyn 41:1817–1833. https://doi.org/10.1007/s00382-013-1705-0

Boer GJ (2009) Climate trends in a seasonal forecasting system. Atmos-Ocean 47:123–138. https://doi.org/10.3137/AO1002.2009

Boer GJ, Lambert SJ (2008) Multi-model decadal potential predictability of precipitation and temperature. Geophys Res Lett. https://doi.org/10.1029/2008GL033234

Boer GJ (2004) Long time-scale potential predictability in an ensemble of coupled climate models. Clim Dyn 23:29–44

Boer GJ, Lambert SJ (2001) Second order space-time climate difference statistics. Clim Dyn 17:213–218

Dee DP (2011) The ERA-interim reanalysis: configuration and performance of the data assimilation system. Q J R Meteorol Soc 137:553–597

Deque M (2003) Continuous variables. In: Jolliffe IT, Stephenson DB (eds) Forecast verification: a practitioner’s guide in atmospheric science. Wiley, Chichester, p 240

Dunstone NJ, Smith DM, Scaife AA, Hermanson L, Eade R, Robinson N, Andrews M, Knight J (2016) Skilful predictions of the winter North Atlantic Oscillation one year ahead. Nat Geosci. https://doi.org/10.1038/NGEO2824

Eade R, Smith D, Scaife A, Wallace E, Dunstone N, Hermanson L, Robinson N (2014) Do seasonal-to-decadal climate predictions under-estimate the predictability of the real world? Geophys Res Lett. https://doi.org/10.1002/2014GL061146

Hansen J, Ruedy R, Sato M, Lo K (2010) Global surface temperature change. Rev Geophys 48:RG4004. https://doi.org/10.1029/2010RG000345

Jin Y, Rong X, Liu Z (2017) Potential predictability and forecast skill in ensemble climate forecast: a skill-persistence rule. Clim Dyn. https://doi.org/10.1007/s00382-017-4040-z

Kumar A, Peng P, Chen M (2014) Is there a relationship between potential and actual skill? Mon Weather Rev 142:2220–2227. https://doi.org/10.1175/MWR-D-13-00287.1

Pitman JG (1939) A note on normal correlation. Biometrika 31:9–12

Pohlmann H, Botzet M, Latif M, Roesch A, Wild M, Tschuck P (2004) Estimating the decadal predictability of a coupled AOGCM. J Clim 17:4463–4472. https://doi.org/10.1175/3209.1

Scaife AA (2014) Skillful long-range prediction of European and North American winters. Geophys Res Lett 41:2514–2519. https://doi.org/10.1002/2014GL059637

Scaife AA, Smith DM (2018) A signal-to-noise paradox in climate science. npj Clim Atmos Sci. https://doi.org/10.1038/s41612-018-0038-4

Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res 106:7183–7192

Merryfield WJ et al (2013) The Canadian seasonal to interannual prediction system. Part I: Models and initialization. Mon Weather Rev 141:2910–2945. https://doi.org/10.1175/MWR-D-12-00216.1

Stockdale TN, Molteni F, Ferranti L (2015) Atmospheric initial conditions and the predictability of the Arctic Oscillation. Geophys Res Lett 42:1173–1179. https://doi.org/10.1002/2014GL062681

Uppala SM (2005) The ERA-40 re-analysis. Q J R Meteorol Soc 131:2961–3012

## Acknowledgements

We acknowledge the important contributions of many members of the CCCma team in developing the model and the forecasting system, and of Woo-Sung Lee in producing the forecasts.

## Appendix

Kumar et al. (2014), hereinafter KEA, consider the relationship between potential and actual skill for DJF seasonal predictions made with two forecasting systems. It may be worth considering their results in terms of the formalism developed here. The approaches are similar in some ways, although the notation differs and several explicit and implicit assumptions are made in KEA which strongly condition their conclusions.

In the Appendix of KEA two assumptions are introduced which considerably alter the relationships in Table 1. In the notation used here, the assumptions are that the predictable component of the forecast \(\psi\) is proportional to that of the observations \(\chi\), that the forecast and observation-based variances are the same and, implicitly, that *m* is large. That is

\[\psi =\alpha \chi ,\qquad \sigma _{Y}^{2}=\sigma _{X}^{2},\qquad m\gg 1 \qquad (8)\]

where \(\alpha\) is a constant. These assumptions imply that

where the broad arrows indicate the result of imposing (8). KEA also consider lag 1 autocorrelations which, following the same reasoning as for the statistics in Table 1, give

with \({\mathcal {A}}_{\psi }\Rightarrow {\mathcal {A}}_{\chi }\) for the lag 1 autocorrelations of the predictable components \(\psi\) and \(\chi\). We note in passing that equations (1) and (2) in KEA have misprints.

KEA compare *r* and \(\rho\) for DJF forecasts from two different forecast models (their Figs. 1 and 2), differences in \(\rho\) and \(A_{Y}\) between the two models (Figs. 3 and 4), and the RMSE \(\sqrt{e}\) and \(\sqrt{\xi }\) for the models (Fig. 5). From Table 1

where the broad arrow is again the result of imposing assumptions (8). If assumptions (8) were to hold then *r* would be proportional to \(\rho\) and scatter plots of local values of *r* against \(\rho\), as in KEA Fig. 2, would fall on the diagonal. However, in the general case all of *p*, *q* and \(R_{\chi \psi }\) vary with location over the globe as well as with forecast range so that

would not be expected to be constant and off-diagonal points in Fig. 2 would be expected. It is certainly the case, as noted by KEA and discussed in detail above, that for local values of \(r>\rho\), i.e. for points below the diagonal in Fig. 2, \(\rho\) is clearly an underestimate of potentially available skill. However, the converse is not the case, and \(\rho>r\) does not imply that \(\rho\) is an overestimate of potentially available skill.
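
The ratio of actual to potential correlation discussed above can be made explicit under a simple signal-plus-noise model. The following is a sketch, not a reproduction of Table 1 or of the numbered equations. It assumes \(X=\chi +x\) and \(Y_{k}=\psi +y_{k}\), with noise uncorrelated with the signals and between ensemble members, takes *p* and *q* to be the predictable variance fractions of the observations and forecasts, writes \(\overline{Y}\) for the ensemble mean, and lets *m* be large.

```latex
\begin{align*}
p &= \sigma_{\chi}^{2}/\sigma_{X}^{2}, \qquad
q = \sigma_{\psi}^{2}/\sigma_{Y}^{2}, \\
r &= \mathrm{corr}(X,\overline{Y})
   \approx R_{\chi\psi}\,\frac{\sigma_{\chi}}{\sigma_{X}}
   = R_{\chi\psi}\sqrt{p}, \\
\rho &= \mathrm{corr}(Y_{k},\overline{Y})
   \approx \frac{\sigma_{\psi}}{\sigma_{Y}}
   = \sqrt{q}, \\
\frac{r}{\rho} &\approx R_{\chi\psi}\sqrt{p/q}.
\end{align*}
```

Under this sketch, \(\psi =\alpha \chi\) with \(\alpha >0\) and equal total variances gives \(R_{\chi \psi }=1\) and \(q=\alpha ^{2}p\), so the ratio reduces to the constant \(1/\alpha\). Spatial variation in any of the three factors would spread the points in a scatter plot of *r* against \(\rho\).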

The lag 1 autocorrelation can be represented in several ways in the notation used here

but from Table 1 the simplest relationship between \(A_{Y}\) and \(\rho\) is

which is independent of (8) (presuming large *m*). Taking differentials to represent differences between the results from two different models, as plotted in KEA Figs. 3 and 4, gives

so that a larger \(A_{Y}\) in one model than in the other contributes proportionally, but not linearly, to a larger \(\rho\), while the reverse is true for the difference in \({\mathcal {A}}_{\psi }\). The general expectation is that larger \(\delta A_{Y}\) will be associated with larger \(\delta \rho\), as broadly seen in KEA Fig. 4. However, this may be offset locally by \(\delta {\mathcal {A}}_{\psi }\). If (8) were strictly the case, including the constancy of \(\alpha\), then (12) would become

which would be zero provided that the same verification data *X* were used in each case.

From Table 1, differences in actual and potential MSE are

and, as noted elsewhere, \(e>\xi\) in (13) does not necessarily mean that \(\sigma _{y}^{2}<\sigma _{x}^{2}\) and with assumptions (8) this would not be the case unless \(\alpha =1\) everywhere.
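
A small numerical experiment can illustrate why \(e>\xi\) need not imply \(\sigma _{y}^{2}<\sigma _{x}^{2}\). The Python sketch below uses synthetic data under an assumed signal-plus-noise model with \(\psi =\alpha \chi\). The function names, the leave-one-out estimate of potential MSE, and the parameter values are illustrative assumptions, and the total variances are deliberately left unmatched.

```python
import random


def mse(a, b):
    """Mean square difference of two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def actual_and_potential_mse(obs, members):
    """obs: n verifying values; members: m lists of n forecasts each."""
    m, n = len(members), len(obs)
    total = [sum(mem[t] for mem in members) for t in range(n)]
    # Actual MSE: ensemble mean against the verifying data.
    e = mse(obs, [s / m for s in total])
    # Potential MSE (assumed estimator): each member against the mean
    # of the remaining members, averaged over members.
    xi = sum(
        mse(mem, [(total[t] - mem[t]) / (m - 1) for t in range(n)])
        for mem in members
    ) / m
    return e, xi


rng = random.Random(1)
n, m = 4000, 10
alpha, sig_x, sig_y = 0.2, 1.0, 1.1   # forecast noise exceeds observed noise

chi = [rng.gauss(0.0, 1.0) for _ in range(n)]            # observed signal
obs = [c + rng.gauss(0.0, sig_x) for c in chi]
members = [[alpha * c + rng.gauss(0.0, sig_y) for c in chi]  # psi = alpha*chi
           for _ in range(m)]

e, xi = actual_and_potential_mse(obs, members)
print(f"actual MSE e = {e:.2f}, potential MSE xi = {xi:.2f}")
```

Here the forecast noise variance is larger than that of the verifying data, yet the actual MSE still exceeds the potential MSE because the damped forecast signal (\(\alpha <1\)) adds a signal error term to *e*.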

These more general relationships offer interpretations of KEA’s results which differ from theirs to a greater or lesser extent. It is certainly the case that potential skill and other second order statistics must be interpreted with care, as discussed in KEA and at considerable length in Boer et al. (2018). The hope is that investigations of these kinds can provide new information on forecast system deficiencies and remedies.

## About this article

### Cite this article

Boer, G.J., Kharin, V.V. & Merryfield, W.J. Differences in potential and actual skill in a decadal prediction experiment. *Clim Dyn* **52**, 6619–6631 (2019). https://doi.org/10.1007/s00382-018-4533-4

DOI: https://doi.org/10.1007/s00382-018-4533-4

### Keywords

- Decadal prediction
- Predictability
- Skill
- Potential skill