1 Introduction

Stratosphere–troposphere coupling, associated with polar vortex variability, is a prominent dynamical process in the extratropics. In the Southern Hemisphere (SH), such downward coupling occurs in austral spring and significantly modulates the zonal-mean tropospheric circulation, as evidenced by the southern annular mode (SAM) (Baldwin et al. 2003; Byrne et al. 2018; Hio and Yoden, 2005; Lim et al. 2018; Seviour et al. 2014; Thompson et al. 2005). As such, it is considered one of the most important sources of tropospheric predictability on the subseasonal-to-seasonal (S2S) timescale. Seviour et al. (2014) and Byrne et al. (2019), for instance, showed that significant prediction skill of the SAM re-emerges in early October after a period of no skill in their model simulations. This skill re-emergence has been attributed to polar vortex variability and its downward influence (Seviour et al. 2014). The models participating in the S2S prediction project (http://s2sprediction.net) also show enhanced prediction skill in early October, although the re-emergence is not always statistically significant (not shown).

It is well documented that stratospheric ozone (hereafter referred to as ozone unless otherwise specified) undergoes substantial interannual variation (e.g., Salby et al. 2011; Son et al. 2013). This variation can affect the polar vortex and its downward coupling. Son et al. (2013) showed that the September polar ozone anomaly is significantly correlated with the October SAM on the interannual timescale: the chances of hot and dry weather in southern Australia increase when polar stratospheric ozone is anomalously high in austral spring. However, this result does not necessarily indicate that the ozone anomaly drives the tropospheric circulation change, because the ozone anomaly itself is primarily determined by the polar vortex. In general, strong upward-propagating waves act to weaken and warm the polar vortex in late winter. The associated meridional circulation transports ozone from low latitudes to the pole, increasing ozone concentration over the Antarctic stratosphere from late winter to early spring (Lim et al. 2018; Salby et al. 2011; Shaw et al. 2011; Solomon, 1999). The increased ozone can then further weaken the polar vortex through enhanced shortwave radiative heating and possibly strengthen the downward coupling. This vortex–ozone relationship is clearly observed during stratospheric sudden warming events: in the springs of 2002 and 2019, when the polar vortex broke down, total column ozone (TCO) increased sharply (Hendon et al. 2020; Lim et al. 2019; Son et al. 2013).

Although ozone radiative feedback could help strengthen the downward coupling by modulating the polar vortex, its importance relative to the dynamical coupling has only recently been examined. Hendon et al. (2020) showed in their model simulation that prescribing the observed ozone during the 2002 SH sudden warming event, when ozone over the SH polar cap was anomalously abundant, can facilitate vortex weakening via radiative heating and enhance the surface response (negative SAM) in October. Yook et al. (2020) more generally showed that interactive ozone acts to increase stratospheric variability. Although they did not discuss its impact on tropospheric predictability, they showed that interactive ozone enhances the persistence of stratospheric variability, which implies that it may also increase the predictability of the stratosphere and possibly of the troposphere. These results suggest that realistic ozone could improve S2S prediction in the SH in October–November, when the downward coupling is prominent. However, most operational S2S prediction models use zonally and monthly averaged climatological ozone and hence ignore its year-to-year variation (Domeisen et al. 2020).

In this study, we evaluate the impact of year-to-year varying ozone on S2S prediction using the Global Seasonal Forecasting System version 5 (GloSea5, MacLachlan et al. 2015). To extend and generalize Hendon et al. (2020), multi-year model experiments are conducted. Specifically, two sets of reforecast experiments are carried out, prescribing either climatological ozone or year-to-year varying ozone for 2004–2020. As shown below, the reforecasts with time-varying ozone show improved prediction skill in October compared to those with climatological ozone.

2 Methods/experimental

GloSea5, the operational ensemble seasonal prediction system of the UK Met Office (UKMO; MacLachlan et al. 2015), is used in this study. It became operational in January 2014 as the joint seasonal prediction system of the UKMO and its partner, the Korea Meteorological Administration (KMA). GloSea5 is a fully coupled atmosphere–land–ocean–sea ice model. The horizontal resolution of the atmosphere is ~0.83° in longitude and ~0.56° in latitude. In the vertical, 85 levels are used, with the model top at 0.01 hPa. The ocean and sea ice are initialized with the Forecast Ocean Assimilation Model Ocean Analysis (Blockley et al. 2014). The atmospheric and land surface initial conditions are obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim reanalysis (ERA-Interim, Dee et al. 2011) for 2004–2016. From 2017 onward, the initial conditions are taken from the analysis of the KMA/UKMO numerical weather prediction data assimilation system. See MacLachlan et al. (2015) for full details of GloSea5.

A monthly, zonally averaged ozone climatology is prescribed in the model. By default, the operational configuration uses the Atmospheric Chemistry and Climate (AC&C)/Stratospheric Processes and their Role in Climate (SPARC) ozone for 1994–2005. In this study, ozone from the Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) dataset at a horizontal resolution of 2.5° is used instead. SWOOSH combines multiple satellite observations, including SAGE-II and III, HALOE, UARS MLS, and EOS Aura MLS (Davis et al. 2016). Because SWOOSH ozone is available only on 12 pressure levels from 261 to 1 hPa, it is combined with AC&C/SPARC ozone below 261 hPa and above 1 hPa.
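The splice of SWOOSH (261–1 hPa) into the AC&C/SPARC background can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: both profiles are zonal-mean values at a single latitude, SWOOSH is interpolated to the model levels linearly in log-pressure, and the switch at 261 and 1 hPa is abrupt rather than blended. `merge_ozone_profile` is a hypothetical helper, not part of GloSea5.

```python
import numpy as np


def merge_ozone_profile(p_target, o3_background, p_swoosh, o3_swoosh,
                        p_bottom=261.0, p_top=1.0):
    """Splice SWOOSH ozone into a background (AC&C/SPARC-like) profile.

    Hypothetical helper for illustration. Levels between p_bottom and p_top
    (inclusive) take SWOOSH values interpolated linearly in log-pressure;
    all other levels keep the background values.
    """
    p_target = np.asarray(p_target, dtype=float)
    merged = np.array(o3_background, dtype=float)  # copy of the background profile
    p_swoosh = np.asarray(p_swoosh, dtype=float)
    o3_swoosh = np.asarray(o3_swoosh, dtype=float)

    # np.interp needs increasing x, so sort the SWOOSH levels by log(p).
    order = np.argsort(np.log(p_swoosh))
    x = np.log(p_swoosh)[order]
    y = o3_swoosh[order]

    # Only levels inside the SWOOSH pressure range are replaced.
    inside = (p_target <= p_bottom) & (p_target >= p_top)
    merged[inside] = np.interp(np.log(p_target[inside]), x, y)
    return merged


p_model = [500.0, 261.0, 100.0, 10.0, 1.0, 0.1]      # hPa, synthetic model levels
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]           # synthetic background ozone
merged = merge_ozone_profile(p_model, background,
                             [261.0, 100.0, 10.0, 1.0], [1.0, 2.0, 3.0, 4.0])
```

In this example the 500 and 0.1 hPa levels keep their background values and the four levels inside 261–1 hPa take the SWOOSH values. An operational setup might taper the transition over a few levels to avoid a discontinuity in the radiative heating profile; the sketch keeps the switch abrupt for clarity.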

All experiments are initialized on 1 September of each year from 2004 to 2020 and integrated for 61 days. Although SWOOSH ozone is available from 1984, its spatial and temporal coverage is coarse until 2003. Hence, only the last 17 years (2004–2020) are considered in this study. Because of limited computing resources, only 18 ensemble members are used. In all reforecasts, the monthly zonal-mean SWOOSH ozone is interpolated to daily values to allow a smooth transition from one month to the next. In the operational version of GloSea5, ozone is prescribed on a 360-day calendar. This could lead to radiative heating or cooling errors in the polar regions in spring, when ozone concentration changes rapidly (Hendon et al. 2020). In this study, ozone is instead prescribed on the Gregorian calendar and updated every day.
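A minimal sketch of monthly-to-daily interpolation on the Gregorian calendar follows. Anchoring each monthly mean on the 15th day of its month and interpolating linearly between anchors are assumptions for illustration; the text specifies only that the monthly zonal-mean values are interpolated to daily values and updated every day. `monthly_to_daily` is a hypothetical helper. Using `datetime.date` ordinals means month lengths and leap years follow the real calendar rather than a 360-day one.

```python
import datetime as dt

import numpy as np


def monthly_to_daily(months, monthly_values, start, end):
    """Linearly interpolate monthly-mean values to daily values.

    Assumptions (illustrative only): each monthly mean is anchored on the
    15th of its month, and days are counted on the Gregorian calendar via
    date ordinals, so month lengths and leap years are handled automatically.

    months         : list of (year, month) tuples in chronological order
    monthly_values : values for those months (e.g. ozone at one grid point)
    start, end     : datetime.date bounds of the output (end is exclusive)
    """
    x_anchor = np.array([dt.date(y, m, 15).toordinal() for y, m in months])
    days = np.arange(start.toordinal(), end.toordinal())
    # np.interp holds values constant outside the first/last anchor date.
    return days, np.interp(days, x_anchor, np.asarray(monthly_values, dtype=float))


# Aug-Nov 2004 anchors; the output window covers 1 Sep to 31 Oct (61 days).
months = [(2004, 8), (2004, 9), (2004, 10), (2004, 11)]
days, daily = monthly_to_daily(months, [0.0, 31.0, 61.0, 92.0],
                               dt.date(2004, 9, 1), dt.date(2004, 11, 1))
```

The anchor values in the example are day counts from 15 August, so the interpolated series simply counts days: 17 on 1 September and 77 on 31 October, across a window matching the 61-day reforecast length.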

Here, it should be emphasized that only zonal-mean ozone is considered in this study. It has been reported that zonally asymmetric ozone can affect the zonal-mean circulation in both the stratosphere and the troposphere (e.g., Rae et al. 2019). However, the three-dimensional ozone distribution is not considered here, because the operational GloSea5 prescribes zonally averaged ozone. More importantly, the ozone datasets used in this study, i.e., AC&C/SPARC and SWOOSH, provide only zonal-mean values.

Two sets of experiments are conducted with different stratospheric ozone. The reference run prescribes climatological zonal-mean SWOOSH ozone (COZ), while the sensitivity run uses year-to-year varying zonal-mean SWOOSH ozone (YOZ) for the 17 years. Apart from the interannual variation of stratospheric ozone between 261 and 1 hPa, all other configurations are identical between the two experiments. Note that the COZ climatology is derived from 2004–2018. The YOZ ozone climatology therefore differs subtly from the COZ ozone, as the former also includes the two most recent years (2019 and 2020).

The reference meteorological fields for model evaluation, such as geopotential height and surface air temperature, are obtained from the fifth-generation ECMWF atmospheric reanalysis (ERA5, Hersbach and Dee, 2016; Copernicus Climate Change Service, 2017; https://climate.copernicus.eu). The extratropical circulation is quantified by the polar cap index (PCI), defined at each vertical level as the geopotential height anomaly integrated over the polar cap south of 60°S. Because the PCI closely corresponds to the SAM index (with opposite sign) not only in the troposphere but also in the stratosphere (Baldwin and Thompson 2009), it is a useful metric for diagnosing polar cap stratospheric variability and its downward coupling to the troposphere.
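The PCI can be sketched as below. Assumptions: the input is a zonal-mean geopotential height anomaly with latitude as the last axis, and the polar-cap integral is normalized by the cap area using cos(latitude) weights, so the result is an area-weighted mean. That normalization only rescales the index by a constant factor, which does not affect the correlations used later. `polar_cap_index` is a hypothetical name.

```python
import numpy as np


def polar_cap_index(z_anom, lat, lat_edge=-60.0):
    """Area-weighted polar-cap mean of zonal-mean geopotential height anomalies.

    Sketch under assumptions: latitude is the last axis of z_anom, the input
    is already zonally averaged, and area weights are proportional to
    cos(latitude). Leading axes (e.g. time, pressure level) are preserved.
    """
    lat = np.asarray(lat, dtype=float)
    cap = lat <= lat_edge                        # latitude rows south of 60S
    w = np.cos(np.deg2rad(lat[cap]))             # relative area of each band
    z_cap = np.asarray(z_anom, dtype=float)[..., cap]
    return (z_cap * w).sum(axis=-1) / w.sum()


lat = np.arange(-90.0, 90.1, 2.5)                # synthetic 2.5-degree grid
z = np.where(lat <= -60.0, 50.0, -10.0)          # +50 m over the cap, -10 m elsewhere
z_levels = np.stack([z, 2.0 * z])                # two synthetic pressure levels
pci = polar_cap_index(z_levels, lat)             # one value per level
```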

Model prediction skill is quantified by the temporal anomaly correlation coefficient (ACC) between the GloSea5 ensemble-mean prediction and ERA5, defined as follows:

$$\mathrm{ACC}(\tau) = \frac{\sum_{i=1}^{n} \left(f_{i,\tau} - \overline{f_{\tau}}\right)\left(O_{i,\tau} - \overline{O_{\tau}}\right)}{\sqrt{\sum_{i=1}^{n} \left(f_{i,\tau} - \overline{f_{\tau}}\right)^{2}}\;\sqrt{\sum_{i=1}^{n} \left(O_{i,\tau} - \overline{O_{\tau}}\right)^{2}}}$$

where i is the year, n is the total number of years (n = 17), and τ is the forecast lead time from τ = 0 to τ = 61 days. The ensemble-mean prediction and ERA5 are denoted by $f_{i,\tau}$ and $O_{i,\tau}$, respectively, and the overbar indicates the mean over the 17 years. In general, ACC decreases rapidly with forecast lead time. Statistically significant ACC typically persists for less than 2 weeks in the troposphere but for more than 20 days in the stratosphere (Mariotti et al. 2018; Son et al. 2020).
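The ACC equation above translates directly into array code. The sketch below, with a hypothetical function name and synthetic data standing in for GloSea5 and ERA5, computes one ACC per lead time by taking the overbar as the mean over years (axis 0) separately at each lead.

```python
import numpy as np


def anomaly_correlation(f, o):
    """ACC(tau) as in the equation above, one value per lead time.

    f : ensemble-mean forecasts, shape (n_years, n_leads)
    o : verifying analyses (e.g. ERA5), same shape
    The overbar is the mean over years (axis 0), taken separately per lead.
    """
    f = np.asarray(f, dtype=float)
    o = np.asarray(o, dtype=float)
    fa = f - f.mean(axis=0)                       # f_{i,tau} minus mean over years
    oa = o - o.mean(axis=0)                       # O_{i,tau} minus mean over years
    num = (fa * oa).sum(axis=0)
    den = np.sqrt((fa ** 2).sum(axis=0)) * np.sqrt((oa ** 2).sum(axis=0))
    return num / den


rng = np.random.default_rng(1)
obs = rng.standard_normal((17, 62))               # synthetic: 17 years x leads 0..61 days
fcst = 0.5 * obs + rng.standard_normal((17, 62))  # synthetic, partly skilful forecast
acc = anomaly_correlation(fcst, obs)
```

At each lead this equals the Pearson correlation across years, so `np.corrcoef` on one column gives the same number.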

The significance of ACC is tested with a nonparametric bootstrap method (Goddard et al. 2013; Smith et al. 2013; Wilks, 2006), which is widely used in seasonal-to-decadal prediction when the sample size is limited. We randomly select 17 years, with replacement, from the paired reanalysis and ensemble-mean forecasts and then calculate ACC at each lead time. Repeating this process 1000 times yields a probability distribution of ACC. The p value is defined as the fraction of the 1000 bootstrap ACCs that are negative, corresponding to a one-tailed test of the hypothesis that ACC is greater than 0. If the p value is smaller than 0.05, the prediction skill is considered statistically significant at the 95% confidence level.

To evaluate the statistical significance of the YOZ–COZ skill difference, the same approach is applied to the ACC difference (YOZ minus COZ); the fraction of negative bootstrap ACC differences serves as the p value (Goddard et al. 2013).
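The bootstrap tests described above can be sketched as follows. Assumptions beyond the text: forecast and reanalysis values are resampled together as year pairs, the same resampled years are used for YOZ and COZ so that the difference test is paired, and draws in which every selected year is identical (giving 0/0 in the correlation) are counted as non-negative. Function names and the synthetic data are illustrative only.

```python
import numpy as np


def acc(f, o):
    """ACC over years (axis 0), one value per lead time."""
    fa = f - f.mean(axis=0)
    oa = o - o.mean(axis=0)
    return (fa * oa).sum(axis=0) / np.sqrt((fa ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))


def bootstrap_pvalues(f_yoz, f_coz, o, n_boot=1000, seed=0):
    """One-tailed bootstrap p values for ACC > 0 in each experiment and for
    ACC(YOZ) - ACC(COZ) > 0. Inputs have shape (n_years, n_leads)."""
    rng = np.random.default_rng(seed)
    n_years = o.shape[0]
    acc_yoz = np.empty((n_boot,) + o.shape[1:])
    acc_coz = np.empty_like(acc_yoz)
    # A draw in which every year is the same gives 0/0 = NaN; silence the warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        for b in range(n_boot):
            idx = rng.integers(0, n_years, size=n_years)  # resample years with replacement
            acc_yoz[b] = acc(f_yoz[idx], o[idx])          # same years for both experiments
            acc_coz[b] = acc(f_coz[idx], o[idx])
    # p value = fraction of bootstrap values below zero (NaN compares as False)
    p_yoz = (acc_yoz < 0).mean(axis=0)
    p_coz = (acc_coz < 0).mean(axis=0)
    p_diff = ((acc_yoz - acc_coz) < 0).mean(axis=0)
    return p_yoz, p_coz, p_diff


rng = np.random.default_rng(2)
obs = rng.standard_normal((17, 4))                # synthetic: 17 years x 4 leads
f_yoz = obs + 0.3 * rng.standard_normal((17, 4))  # skilful synthetic forecast
f_coz = rng.standard_normal((17, 4))              # unskilful synthetic forecast
p_yoz, p_coz, p_diff = bootstrap_pvalues(f_yoz, f_coz, obs)
significant = p_yoz < 0.05                        # 95% confidence level
```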

3 Results and discussion

The Antarctic springtime ozone anomaly is significantly linked to the tropospheric circulation anomaly on the interannual timescale (Son et al. 2013). Figure 1 shows this connection over the last three decades: it depicts the time–pressure evolution of lagged correlation coefficients between September-mean TCO and daily ERA5 PCI at all levels, as a function of calendar day, for 1991–2020. Detrended data are used. To focus on the S2S timescale, ERA5 PCI is smoothed with a 14-day moving average. Here, ERA5 TCO is used instead of SWOOSH ozone to extend the analysis period. ERA5 TCO is derived from a modified version of the ozone parameterization of Cariolle and Deque (1986), as delineated by Cariolle and Teyssedre (2007). Various satellite observations covering different periods, such as MIPAS, MLS, OMI, GOME, GOME-2, and SBUV, are assimilated with variational bias corrections (Hersbach et al. 2019, 2020). Although TCO represents the ozone in a column of air extending from the surface to the top of the atmosphere, it is dominated by stratospheric ozone.
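The lagged-correlation diagnostic of Fig. 1 can be sketched as below: linear detrending across years, a 14-day moving average along calendar days, and a Pearson correlation across years at every level and day. How each smoothed value is assigned to a calendar day (centered or trailing window) is not specified in the text; the sketch uses `numpy.convolve` in "valid" mode and leaves that labeling to the caller. Function names and the synthetic example are hypothetical.

```python
import numpy as np


def detrend(x):
    """Remove a least-squares linear trend along axis 0 (years)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.shape[0])
    flat = x.reshape(x.shape[0], -1)
    slope, intercept = np.polyfit(t, flat, 1)    # one fit per column
    trend = np.outer(t, slope) + intercept
    return (flat - trend).reshape(x.shape)


def running_mean(x, window=14):
    """Moving average along the last axis (days); 'valid' mode drops the edges."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), -1, x)


def lagged_correlation(tco_sep, pci_daily, window=14):
    """Correlation across years between September TCO and smoothed daily PCI.

    tco_sep   : (n_years,) September-mean total column ozone
    pci_daily : (n_years, n_levels, n_days) daily PCI
    returns   : (n_levels, n_days - window + 1) correlation coefficients
    """
    x = detrend(tco_sep)
    y = detrend(running_mean(pci_daily, window))
    xa = x - x.mean()
    ya = y - y.mean(axis=0)
    num = np.tensordot(xa, ya, axes=(0, 0))
    den = np.sqrt((xa ** 2).sum()) * np.sqrt((ya ** 2).sum(axis=0))
    return num / den


rng = np.random.default_rng(3)
years = np.arange(30)                                    # e.g. 1991-2020
tco = 0.5 * years + rng.standard_normal(30)              # synthetic September TCO with a trend
pci = (2.0 * tco + 0.1 * years)[:, None, None] * np.ones((30, 3, 120))
r = lagged_correlation(tco, pci)                         # shape (3 levels, 107 days)
```

Because detrending across years and smoothing across days are both linear operations on different axes, their order does not change the result. In the synthetic example the PCI is a linear function of TCO plus its own trend, so every correlation is 1.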

Fig. 1

a Time–pressure distribution of lagged correlation coefficients between the detrended September total column ozone anomaly and 14-day running-averaged ERA5 PCI for 1991–2020. Values that are statistically significant at the 95% confidence level are hatched. b Same as (a) but for PCI at 700 hPa. Open circles denote values that are statistically significant at the 95% confidence level. c, d Same as (a, b) but for PCI predicted by a linear regression model based on 10-hPa PCI on 1 September

Figure 1a shows that September TCO is positively correlated with September PCI in the stratosphere. This relationship can be explained by the stratospheric circulation described in the Introduction. A positive PCI (i.e., a negative SAM index) is typically associated with a weak polar vortex resulting from enhanced upward wave propagation from the troposphere. The associated meridional circulation transports ozone from low latitudes to the pole, increasing polar stratospheric ozone.

Unlike in the stratosphere, a negative correlation appears in the troposphere in early to mid-September (Fig. 1a). Although not statistically significant, this vertical dipole, i.e., a positive correlation in the stratosphere and a negative correlation in the troposphere, could be partly explained by ozone radiative forcing in the stratosphere. Jucker and Goyal (2022) argued that enhanced static stability in the lower stratosphere at mid- to high latitudes, due to ozone shortwave heating, could deflect waves equatorward at the tropopause level. The resulting wave divergence at mid- to high latitudes could then drive a thermally indirect circulation in the troposphere through downward control (Haynes et al. 1991), causing adiabatic cooling in the subpolar region. The net result is a negative PCI at high latitudes in the troposphere (see Fig. 4 of Jucker and Goyal, 2022).

The vertical dipole structure, referred to as the “fast response” by Jucker and Goyal (2022), disappears in late September. A statistically significant positive correlation then emerges in the troposphere from weeks 5 to 8 (29 September to 19 October; Fig. 1a, b), while the positive correlation is maintained in the stratosphere (Fig. 1a). This re-emergence of the tropospheric correlation results from the downward coupling: Fig. 1a shows a positive correlation propagating downward in time from the upper stratosphere to the tropopause and connecting to the troposphere. Such a time-lagged downward connection, referred to as the “slow response” by Jucker and Goyal (2022), represents canonical stratosphere–troposphere dynamical coupling (e.g., Seviour et al., 2014; Saggioro and Shepherd, 2019).

Seviour et al. (2014) showed that a simple regression model based on polar vortex variability can explain this skill re-emergence (see Fig. 7b of Seviour et al. 2014). When a similar regression model, based on 10-hPa PCI on 1 September, is applied to ERA5 data, the downward coupling is well captured at weeks 6 to 7 (Fig. 1c, d). However, compared to Fig. 1a, b, the vertical dipole in early to mid-September is weak, and the re-emergence of the tropospheric signal is delayed to mid-October. This result may indicate that ozone radiative forcing leads to a stronger fast response and an earlier slow response.

It is worth noting that the SAM index exhibits its longest timescale in November (Baldwin et al. 2003; Gerber et al. 2012). This November peak has typically been explained by stratospheric variability and its downward coupling. However, the downward coupling shown in Fig. 1 starts to appear in October. This is consistent with Lim et al. (2019), who showed that stratosphere–troposphere coupling in the SH is strongest in October (see their Fig. 1b, c). Lim et al. (2019) suggested that the timing of the downward coupling is determined by polar vortex weakening: when the polar vortex weakens substantially earlier than normal in spring, the resulting wind anomaly tends to propagate downward to the surface in October and November, although the mechanism of this downward propagation remains to be determined.

The downward coupling shown in Fig. 1 is well captured by the reforecast experiments. Figure 2a shows the ACCs of PCI in the COZ experiment. High ACCs, statistically significant at the 95% confidence level, are maintained at all lead times in the stratosphere, indicating persistent stratospheric anomalies. However, tropospheric ACCs decrease rapidly after week 3. This is consistent with Son et al. (2020), who reported that stratospheric prediction skill is much higher than tropospheric skill in austral spring.

Fig. 2

Anomaly correlation coefficients (ACCs) of 14-day running-averaged PCI as a function of forecast lead time in a COZ and b YOZ experiments for 2004–2020. Values that are statistically significant at the 95% confidence level are dotted. c ACCs of 700-hPa PCI in the COZ (black) and YOZ (red) experiments. Open circles denote values that are statistically significant at the 95% confidence level. Forecast times at which the ACC difference (YOZ minus COZ) is statistically significant at the 95% confidence level are shaded in pink. Note that 6 October on the x-axis corresponds to the ACC at forecast weeks 6–7

In the troposphere, significant prediction skill re-emerges later in the COZ experiment (i.e., about 6 weeks after initialization). This skill re-emergence in mid-October is not sensitive to ensemble size or the choice of variable; the same result is obtained when only half of the ensemble members are randomly selected. Although not shown, similar results are robustly found in other S2S prediction models that differ in horizontal and vertical resolution, ensemble size, and reforecast period (see also Seviour et al. 2014; Byrne et al. 2019). We note that week 6 is not necessarily the optimal lead time for the tropospheric skill re-emergence; it may shift by about 1 week when the initialization date is varied. For instance, the skill re-emergence appears in late September in S2S prediction models initialized in late August or early September (not shown).

Although Fig. 2a clearly shows a downward influence in prediction skill, it does not isolate the role of ozone radiative forcing. To quantify the ozone impact on S2S prediction, we compare the COZ experiment with the YOZ experiment, whose ACCs are shown in Fig. 2b. As described in the Methods section, the two experiments are identical except that stratospheric ozone in YOZ is prescribed from year-to-year varying observations. The overall prediction skill does not change much in the first few weeks. However, a notable difference appears in late September and October, when the downward influence is pronounced in the COZ experiment (Fig. 2a). Most importantly, the tropospheric prediction skill of the YOZ experiment is higher than that of the COZ experiment at all forecast lead times (Fig. 2c). Statistically significant ACC differences between the YOZ and COZ experiments (95% confidence level) appear at weeks 1–2 and weeks 6–7 (pink shading). It is noteworthy from Fig. 2a that the lowest tropospheric prediction skill is found in mid- to late September, when the vertical dipole disappears, i.e., when the fast response gives way to the slow response (Fig. 1a). Although the improvement is statistically insignificant, this skill minimum is also raised in the YOZ experiment, presumably owing to a more realistic representation of ozone-related downward coupling.

This result indicates that more realistic ozone helps to improve tropospheric prediction in October. Lim et al. (2018) showed that the SH stratosphere–troposphere coupling mode is highly correlated with spring ozone concentration and could affect tropospheric predictability in October. Using a model similar to the one used in this study, Hendon et al. (2020) showed that the SH circulation in October is better captured when realistic ozone is prescribed during the 2002 stratospheric sudden warming event. In this regard, our result confirms the finding of Lim et al. (2018) and generalizes the case study of Hendon et al. (2020).

The skill difference between the COZ and YOZ experiments at weeks 6–7 is further quantified for 700-hPa PCI in Fig. 3a, b. The root-mean-squared error (RMSE) is significantly smaller in YOZ than in COZ, while the ACC is significantly larger. The sensitivity of the ACC difference to ensemble size and to the number of validation years is further tested in Fig. 3c, d using nonparametric bootstrap resampling. The sensitivity to ensemble size (M) is tested first by randomly selecting M ensemble members when computing the ensemble-mean prediction. This resampling is repeated 1,000 times with replacement, and the average ACC is taken as the skill score for a given ensemble size M. The YOZ experiment shows a higher ACC than the COZ experiment for all ensemble sizes from M = 1 to M = 18 (Fig. 3c), and the difference becomes statistically significant when the ensemble size is 10 or greater. Note from Fig. 3c that the bootstrap ACC difference for M = 18 is slightly smaller than the actual ACC difference (compare lines and squares), because resampling with replacement can select the same member more than once, reducing the effective ensemble size.
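The ensemble-size test can be sketched as follows, together with the RMSE used in Fig. 3a. The assumptions are that M member indices are drawn with replacement for each bootstrap sample and that the same member indices are used in every year; the text does not state how members are matched across years. The synthetic members stand in for week 6–7 PCI at 700 hPa, and all names are hypothetical.

```python
import numpy as np


def corr(a, b):
    """Pearson correlation across years of two 1-D series."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())


def rmse(f, o):
    """Root-mean-squared error across years."""
    return np.sqrt(np.mean((f - o) ** 2))


def acc_vs_ensemble_size(members, obs, sizes, n_boot=1000, seed=0):
    """Mean bootstrap ACC of the ensemble mean for each ensemble size M.

    members : (n_years, n_members) forecasts, e.g. week 6-7 PCI at 700 hPa
    obs     : (n_years,) verifying values
    Assumption: M member indices are drawn with replacement per bootstrap
    sample, and the same indices are used in every year.
    """
    rng = np.random.default_rng(seed)
    n_members = members.shape[1]
    result = {}
    for m in sizes:
        accs = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n_members, size=m)      # duplicates allowed
            accs[b] = corr(members[:, idx].mean(axis=1), obs)
        result[m] = accs.mean()
    return result


rng = np.random.default_rng(4)
obs = rng.standard_normal(17)                                 # synthetic verification
members = obs[:, None] + 2.0 * rng.standard_normal((17, 18))  # 18 noisy synthetic members
scores = acc_vs_ensemble_size(members, obs, sizes=[1, 10, 18], n_boot=200)
```

Running the same function on both experiments and comparing the curves mirrors the lines in Fig. 3c; the duplicates allowed by `rng.integers` are what pull the bootstrap value for M = 18 slightly below the full-ensemble ACC.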

Fig. 3

a Root-mean-squared error (RMSE; m) and b ACC of 700-hPa PCI at forecast weeks 6–7 in the COZ and YOZ experiments. c, d ACCs of week 6–7 PCI700 as a function of c ensemble size and d number of validation years. Each line indicates the average of 1000 bootstrap-generated ACCs. Squares indicate the actual ACCs shown in (b). Bars represent p values for the ACC difference (YOZ minus COZ). The ACC difference is statistically significant at the 95% confidence level when the p value is less than 0.05, denoted in red

A similar test is conducted for the number of validation years: N years are randomly selected from the 17-year period, and this process is repeated 1,000 times, with the average ACC taken as the skill score for a given N. Figure 3d shows that the YOZ–COZ skill difference is robust for N = 8 to N = 17 and becomes statistically significant when the number of validation years is 14 or greater. These results suggest that the ACC difference between the COZ and YOZ experiments shown in Fig. 2c is robust and unlikely to be caused by chance.
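A corresponding sketch for the number of validation years is given below. The text does not say whether years are drawn with or without replacement; this sketch assumes replacement (otherwise every draw with N = 17 would be identical) and applies the same draw to both experiments. Names and data are hypothetical.

```python
import numpy as np


def corr(a, b):
    """Pearson correlation across years of two 1-D series."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())


def acc_diff_vs_years(f_yoz, f_coz, obs, n_list, n_boot=1000, seed=0):
    """Mean bootstrap ACCs and the p value of ACC(YOZ) - ACC(COZ) for each N.

    Assumption: N years are drawn with replacement and the same draw is used
    for both experiments. Degenerate draws (all years identical) give NaN
    and are excluded from the averages and the p value.
    """
    rng = np.random.default_rng(seed)
    n_total = obs.shape[0]
    out = {}
    with np.errstate(invalid="ignore", divide="ignore"):
        for n in n_list:
            a_yoz = np.empty(n_boot)
            a_coz = np.empty(n_boot)
            for b in range(n_boot):
                idx = rng.integers(0, n_total, size=n)
                a_yoz[b] = corr(f_yoz[idx], obs[idx])
                a_coz[b] = corr(f_coz[idx], obs[idx])
            diff = a_yoz - a_coz
            valid = ~np.isnan(diff)
            p = (diff[valid] < 0).mean()                  # one-tailed p value
            out[n] = (np.nanmean(a_yoz), np.nanmean(a_coz), p)
    return out


rng = np.random.default_rng(5)
obs = rng.standard_normal(17)                         # synthetic verification
f_yoz = obs + 0.5 * rng.standard_normal(17)           # more skilful synthetic forecast
f_coz = obs + 3.0 * rng.standard_normal(17)           # less skilful synthetic forecast
res = acc_diff_vs_years(f_yoz, f_coz, obs, n_list=[8, 14, 17], n_boot=500)
```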

Improved prediction skill in the YOZ experiment is also evident at the surface in regions strongly affected by polar vortex variability (e.g., Lim et al. 2019). Figure 4 illustrates the spatial distribution of RMSEs and ACCs of week 6–7 maximum surface air temperature over Australia. In the COZ experiment, reliable prediction skill appears in some regions (Fig. 4a), consistent with SAM-related surface air temperature variability (e.g., Hendon et al. 2007). The prediction skill is enhanced in most regions when year-to-year varying ozone is prescribed (Fig. 4b). Large error reductions are found especially in eastern and southern Australia, whereas the skill improvement is negligible in western and northern Australia.

Fig. 4

RMSE and ACC of week 6–7 maximum surface air temperature (T2Max) in a COZ and b YOZ experiments. In the right column, the values which are statistically significant at the 95% confidence level are dotted

4 Conclusions

It has been suggested that S2S prediction could be improved if the stratospheric state were represented more realistically (Seviour et al. 2014; Domeisen et al. 2020; Hendon et al. 2007). However, few modeling studies have explored the impact of stratospheric conditions on SH S2S prediction, and the possible impact of stratospheric ozone in particular has rarely been reported. Hendon et al. (2020) recently showed that stratospheric ozone can affect surface prediction skill for the 2002 stratospheric sudden warming event, the first major sudden warming event in the SH. However, their results remain to be generalized with long-term reforecasts.

To assess the impact of stratospheric ozone on S2S prediction, we performed two sets of GloSea5 reforecasts in which stratospheric ozone is prescribed either from the long-term climatology (COZ) or from year-to-year varying observations (YOZ). While the skill difference between the two experiments is relatively minor in the stratosphere, it is significant in the troposphere. Most importantly, the YOZ experiment outperforms the COZ experiment at all forecast lead times in October. This improvement is also evident at the surface over Australia.

Our result confirms that ozone radiative forcing plays a critical role in S2S prediction in austral spring, generalizing Hendon et al. (2020). This finding suggests that more realistic ozone is critical not only for long-term climate simulations (e.g., Haase et al. 2020; Ivanciu et al. 2021; Li et al. 2016) but also for operational S2S prediction, particularly in austral spring, when ozone radiative forcing is important.

Here, it should be stated that the ozone radiative feedback is not fully included in this study. The stratospheric ozone anomalies are prescribed as an external forcing to explore their thermodynamical and dynamical effects. The interplay between ozone and circulation is not taken into account. To evaluate the role of interactive ozone in the S2S prediction, further studies are required.

What determines the timing of the downward coupling (e.g., October) remains an open question, although observational data and model simulations show a lag of about one month (e.g., Seviour et al. 2014; Shaw et al. 2011). The related physical process is presumably similar to that of the long-term tropospheric response to ozone depletion (e.g., Son et al. 2018), which is not yet fully understood. Proposed mechanisms include downward control, eddy–mean flow interaction involving planetary- and synoptic-scale waves, eddy–eddy interactions, or a combination of these (e.g., Hitchcock and Haynes, 2016; Orr et al. 2012; Yang et al. 2015; Martineau and Son, 2015). This topic requires more comprehensive analyses beyond ozone sensitivity tests.