1 Introduction

Thompson (1957) and Lorenz (1963) first proposed the predictability limit of deterministic systems, and since then the predictability of the atmosphere has been extensively studied using theoretical, numerical, and statistical models (Chou 1989; Simmons and Hollingsworth 2002). In particular, the study of error growth and propagation has become an important part of predictability theory (Dalcher and Kalnay 1987; Farrell 1990; Palmer 2006; Li and Wang 2008). The ocean and atmosphere, as nonlinear dynamical systems, have complex deterministic features that are sensitive to initial values and boundary conditions. It has long been recognized that the upper limit of weather predictability for the synoptic and larger scales is about 2 weeks (Lorenz 1965).

In previous studies, Lorenz (1969) sought to calculate the predictability limit by using the doubling time of an initial infinitesimally small error in the model, but more recent research has shown that such a method has a dependence on the numerical model used (Dalcher and Kalnay 1987; Li et al. 2000). The Lyapunov exponent is a physical quantity to measure the long-term average exponential divergence or convergence of initial nearby orbits in phase space, which can estimate the mean growth rate of the initial infinitesimal error (Oseledec 1968). The nonlinear local Lyapunov exponent (NLLE; Ding and Li 2007; Li et al. 2008; Li and Wang 2008; Li and Ding 2011) was introduced to quantitatively measure the predictability limits of nonlinear systems. Using reanalysis data, local dynamical analogs (LDA) are applied to study the predictability limits of the geopotential height, vector wind field, and sea surface temperature. (SST; Li and Ding 2011, 2013, 2015). However, the NLLE method cannot overcome the uncertainties associated with numerical prediction models. According to the definition of atmospheric attractors, Li et al. (2018) proposed two invariant statistical quantities, the attractor radius (AR) and the global attractor radius (GAR), to define the geometric characteristics, average behavior of a chaotic system and its error growth. Moreover, GAR measures the average distance between two randomly selected states on an attractor, while AR quantifies the average distance of all states on the attractor from the average state. Both the AR and GAR are intrinsic properties of a chaotic system and independent of the forecast model and model errors, and thus provide more accurate, objective threshold to assess the global predictability limits of forecast models. Feng et al. (2019) used the AR and GAR to study the predictability limits of deterministic and ensemble mean forecasts, focusing on quantifying the relationship between forecast errors, especially in individual prediction cases.

The ocean and atmosphere are coupled dynamically and thermodynamically by the exchange of fluxes, including heat, freshwater and momentum fluxes, at the air-sea interface (Webster and Lukas 1992; Saenko et al. 2002). An atmospheric general circulation model (AGCM) simulates the atmosphere using a specified SST, while a coupled general circulation model (CGCM) couples an AGCM and an oceanic general circulation model (OGCM). Some pioneering works (Nese and Dutton 1993; Nese et al. 1996) recognized that the average predictability is significantly improved when an ocean circulation is coupled to the atmosphere. There is large literature discussing predictability of the atmosphere induced by boundary conditions, especially in the tropics (Charney and Shukla 1981; Shukla 1998; Palmer 1994; Reichler and Roads 2003; Bach et al. 2019). Boundary conditions can provide long-term predictability in the tropics, because baroclinic instability is less significant there (Charney and Shukla 1981). Shukla (1998) found that the flow and rainfall of the tropical atmosphere are more deeply influenced by SST than the extratropical atmosphere. Bach et al. (2019) used SST and low-level atmospheric variables to calculate the detailed spatial structure of ocean-to-atmosphere predictability. The ocean improves the atmosphere predictability most significantly in the tropics.

Moving from an AGCM to a CGCM, the initial-value predictability problem becomes a joint initial-value and boundary-value problem, so evaluating the effect of boundary conditions on the predictability limit of the atmosphere remains a crucial issue. This study attempts to examine quantitatively the practical predictability limits and decorrelation limits of the atmosphere in a CGCM system using the AR and GAR, compared with an AGCM.

The remainder of this paper is organized as follows. Section 2 describes the methodology, including simple conceptual coupled models and AR methods, as well as the CGCM and AGCM reanalysis and prediction data used in this study. Section 3 first systematically examines the predictability of the atmospheric and oceanic variables in simple conceptual coupled models using AR methods, and then derives the theory of the practical predictability limit of the ocean–atmosphere coupled system. Section 4 analyzes the global practical predictability limits of both the coupled Climate Forecast System (CFS) and the uncoupled Global Ensemble Forecast System (GEFS), illustrating the influence of low-frequency signals in a coupled system on the predictability limit. Finally, a summary and discussion are given in Sect. 5.

2 Model, methodology, and data

2.1 Conceptual coupled model

Lorenz (1963) proposed a simple model with only three variables to represent the chaotic characteristics of the atmosphere. The uncoupled Lorenz63 model cannot reflect the coupling processes between the atmosphere and the ocean. Zhang et al. (2011) added a slowly changing variable w, which is coupled with the Lorenz 3-variable (x, y, and z are the high frequency variables of atmosphere) chaotic model to simulate the interaction of the fast system with the slow upper ocean. We call this the 4-variable coupled model (4VCM):

$$ \begin{array}{l} {{\dot{\text{x}}} = - \sigma {\text{x}} + \sigma {\text{y}}} \\ {\dot{y} = - xz + (1 + c_{1} w)\kappa x - y} \\ {\dot{z} = xy - bz} \\ {{\text{O}}_{m} \dot{w} = c_{2} {\text{y}} - O_{d} w + {\text{S}}({\text{t}})} \\ \end{array} $$
(1)

At the same time, Zhang (2011) derived a conceptual pycnocline prediction model, which added the deep ocean pycnocline (η) on the 4VCM. We call this a 5-variable coupled model (5VCM):

$$ \begin{array}{l} {{\dot{\text{x}}} = - \sigma x + \sigma y} \\ {\dot{y} = - xz + (1 + c_{1} w)\kappa x - y} \\ {\dot{z} = xy - bz} \\ {O_{m} \dot{w} = c_{2} y + c_{3} \eta + c_{4} w\eta - O_{d} w + {\text{S}}_{{\text{m}}} + S_{s} \cos (2\pi t/S_{pd} )} \\ {\Gamma \dot{\eta } = c_{5} w + c_{6} w\eta - O_{d} \eta } \\ \end{array} $$
(2)

where setting the external forcing as Sm + Ss cos (2πt/Spd) simulates the constant and seasonal forcing for the “climate” system. The ``ocean'' can get the energy from the external forcing when the linear damping -Odw dissipate energy continuously. The parameters Spd, Sm and Ss define the period of seasonal cycle, the magnitudes of the annual mean and seasonal cycle of the forcing respectively Zhang (2011). described the values of the model parameters in detail. Using a fourth-order Runge–Kutta time stepping scheme with \(\Delta {\text{t = }}0.01\) dimensionless time units (TU), where 1 TU is the time taken for the model to pass through an attractive lobe, the model is specified with parameter values: \((\sigma , \, \kappa , \, b, \, c_{1} , \, c_{2} , \, O_{m} , \, O_{d} , \, S_{m} , \, S_{s} , \, S_{pd} , \, \Gamma , \, c_{3} , \, c_{4} , \, c_{5} , \, c_{6} ) = (9.95, \, 28, \, 8/3,10^{{{ - }1}} , \, 1, \, 10, \, 1, \, 10, \, 1, \, 10, \, 100, \, 10^{ - 2} , \, 10^{ - 2} , \, 1, \, 10^{ - 3} )\) . The model is spun up for an initial 104 TU starting from \(({\text{x}}, \, y, \, z, \, w, \, \eta ) = (0, \, 1, \, 0, \, 0, \, 0)\).

2.2 Attractor radius and global attractor radius

Li et al. (2018) proposed an attractor theory to measure the predictability of nonlinear dynamical systems. Three statistical quantities are used to describe the attractor: Attractor radius (AR), Global attractor radius (GAR) and Global average distance (GAD). First, consider x to be the state column vector on a compact attractor, the average root-mean-square distance between any point in the attractor and the center of the attractor is defined as the attractor radius (AR, RE) of a compact attractor Θ1:

$$ {\text{R}}_{{\text{E}}} = \sqrt {E(\left\| {x - E(x)} \right\|^{2} )} \, ,{\text{ x}} \in \Theta_{1} , $$
(3)

where E is the expectation of the time series and ‖·‖ is the L2-norm of a vector.

Second, the global attractor radius (GAR, RG) represents the average root-mean-square distance between any two points in the attractor:

$$ {\text{R}}_{{\text{G}}} = \sqrt {E(R_{L}^{2} )} = \sqrt {E(\left\| {x - {\text{y}}} \right\|^{2} )} , {\text{x, y}} \in \Theta_{1} $$
(4)

where x and y are two randomly selected state vectors from Θ1. There is a constant proportional relationship between \({\text{R}}_{{\text{G}}}\) and \({\text{R}}_{{\text{E}}}\) of a compact attractor Θ1: \({\text{R}}_{{\text{G}}} { = }\sqrt 2 R_{E}\). It is also consistent with the conclusions in Kalnay (2002). Third, the global average distance (GAD) between two attractors can be calculated from the AR:

$$ {\text{R}}_{{\text{G}}} (\Theta_{1} ,\Theta_{2} ) = \sqrt {E(\left\| {x - y} \right\|^{2} )} = \sqrt {R_{E}^{2} (\Theta_{1} ){ + }R_{E}^{2} (\Theta_{2} ){\text{ + d}}^{2} (x_{E} ,y_{E} )} ,{\text{x}} \in \Theta_{1} ,y \in \Theta_{2} $$
(5)

where xE (yE) is referred to as the center of the compact attractor Θ1 (Θ2). And d (xE, yE) is defined as the distance between the mean states of Θ1 and Θ2.

While the detailed and exhaustive mathematical derivations can be referred to the aforementioned literature (Li et al. 2018), thus we mainly comment on the computational implementation. For a real n-dimensional nonlinear dynamical system,

$$ {{{\text{d}}x(t)} \mathord{\left/ {\vphantom {{{\text{d}}x(t)} {{\text{d}}t}}} \right. \kern-\nulldelimiterspace} {{\text{d}}t}} = {\text{F}}(x(t)) $$
(6)

where x and F are n-dimensional nonlinear column vectors. Since the forecast model error can’t be ignored, the dynamical system is redefined as \({{{\text{d}}x(t)} \mathord{\left/ {\vphantom {{{\text{d}}x(t)} {{\text{d}}t}}} \right. \kern-\nulldelimiterspace} {{\text{d}}t}} = {\tilde{\text{F}}}(x(t))\). Let \(\Theta\) and \(\Theta_{{\text{M}}}\) correspond to the attractor of the real system and its approximate forecast model. Typically, they are different and thus their predictabilities are also different. x(t)and y(t) are fiducial orbits on Θ and ΘM, respectively. For an imperfect model, there are both initial error and model error in forecasting. And the initial value \({\text{y}}_{0} = x_{0} + \delta_{0}\), where \(\delta_{0}\) is a small initial error. Then, we can measure the RMSE between \(x(t)\) and \({\text{y}}(t)\),

$$ {\text{e}}_{{\text{M}}} (x_{0} ,\delta_{0} ,t) = \parallel \delta_{{\text{M}}} (t)\parallel = \parallel y(t) - x(t)\parallel ,{\text{x}} \in \Theta ,y \in \Theta_{M} $$
(7)

For global predictability, the global ensemble average of error growth can be defined as.

$$ {\overline{\text{e}}}_{{\text{M}}} (\delta_{0} ,t) = \sqrt {\left\langle {e_{{\text{M}}}^{2} (x_{0} ,\delta_{0} ,t)} \right\rangle_{N \to \infty } } { = }\sqrt {\int_{\Theta } {e_{M}^{2} (x(t_{0} ),\delta (t_{0} ),t)dx} } $$
(8)

where \(\left\langle {} \right\rangle_{{{\text{N}} \to \infty }}\) denotes the ensemble average of all samples of sufficiently large number N. For the chaotic system, as the integral step increases, \({\overline{\text{e}}}_{{\text{M}}} (\delta_{0} ,t)\) will converge to the error saturation.

According to the definition of GAD in Eq. (5), it’s the expectation of the RMS distance between two different attractors. Hence, the global ensemble RMSE between the real attractor Θ and the model attractor ΘM tends to the GAD with time. However, the GAD cannot directly be used as an objective threshold to determine the predictability limit, because the forecast model has associated model errors. Following the definition of GAD, to avoid the effect of the model errors, let \({\text{R}}_{{\text{e}}} = \min (R_{E} (\Theta_{M} ),R_{E} (\Theta ))\), and \({\text{R}}_{{\text{g}}} = \min (R_{G} (\Theta_{M} ),R_{G} (\Theta ))\), then the global practical predictability limits (\({\text{T}}_{{{\text{pr}}}}^{M}\)) and decorrelation limits (\({\text{T}}_{{{\text{de}}}}^{M}\)) of the forecast model are determined by.

$$ \overline{e}_{{\text{M}}} (\delta_{0} ,t)|_{{t < T_{{{\text{pr}}}}^{M} }} < R_{e} \,{\text{and}}\,\overline{e}_{{\text{M}}} (\delta_{0} ,t)|_{{t{ = }T_{{{\text{pr}}}}^{M} }} { = }R_{e} $$
(9)
$$ \overline{e}_{{\text{M}}} (\delta_{0} ,t)|_{{t < T_{{{\text{de}}}}^{M} }} < R_{g} \,and\,\overline{e}_{{\text{M}}} (\delta_{0} ,t)|_{{t{ = }T_{{{\text{de}}}}^{M} }} { = }R_{g} $$
(10)

where \({\text{T}}_{{{\text{pr}}}}^{M}\) is the time at which the error reaches the saturation value Re, and \({\text{T}}_{{{\text{de}}}}^{M}\) is the time at which the error reaches the saturation value Rg.

The global ensemble average of error growth (Eq. (8)) tends to a saturation value, which depend on model error. In most studies, the predictability limit has been defined as the time when the forecast error exceeds 95% of the saturation value. This means that the saturation value of forecast error depends on the used numerical model, and thus is not an objective threshold to quantify and compare the predictability limits between different models. In contrast, the AR and GAR is calculated using the ‘perfect’ model as an objective threshold, which only depend on the real attractor described by the ‘perfect’ model and are not affected by model drift. Thus, the AR and GAR are more suitable metrics to measure the predictability limit of a forecast model.

When the forecasting error has surpassed the AR but has not yet reached the GAR, the single model has lost its forecasting skill compared with the climate mean states. Leith (1974) suggested that the error variance of a single deterministic forecast is twice that of a climate prediction in the long-range forecast. The anomaly field is used to calulate a final best estimate of the state of the atmosphere using a linear regression of dynamic forecasting. Hence, the range of prediction error between AR and GAR is still related to predictability. Meanwhile, Feng et al. (2019) pointed out that the forecast error asymptotes to a saturation value of the GAR in a single forecast, while the error saturation level will be the AR in an ensemble forecast. This result is consistent with the conclusions in Leith (1974) and Kalnay (2002). In this work, we only use the single forecast data to study the deterministic forecast.

2.3 Data for the atmosphere-only and coupled model forecasts

In addition to the three simple models, the 45-day reforecasts of the Climate Forecast System (CFS) and the 35-day reforecasts of the Global Ensemble Forecast System (GEFS) are used to study the predictability limit of earth system models. This makes it possible to extend the results from simple conceptual models to operational forecasting models. Details of the CFS and GEFS models are given in Table 1.

Table 1 Some details about CFS and GEFS

The GEFS and CFS data are the medium- and long-range meteorological forecasts produced by NCEP, respectively. CFS is produced by a fully coupled model that represents the interaction between the ocean and atmosphere, while the GEFS is produced by an uncoupled model. We use a common period (1999–2010) of the CFS and GEFS reforecast datasets (note that only the control forecast of the GEFS system is used). The prediction results are verified by using the 00Z initial conditions of NCEP Climate Forecast system reanalysis (CFSR) from Jan 1999 to Dec 2010. When we calculated the AR and GAR using the CFSR, the annual cycle of the time series has been removed.

3 Results of the conceptual coupled model

3.1 Uncertainties in coupled vs. uncoupled systems

AR being the expected RMSE if the mean state of the attractor is chosen as a predictor, while GAR is the expected RMSE if a random point is chosen. Compared with the standard deviation (SD), AR has the same form to measure the variability of variables. AR (GAR) is objective threshold to determine the practical predictability limits and decorrelation limits of a forecast model. Table 2 lists the AR and GAR for the variables x, y, z and the vector (x, y, z) in the Lorenz63, 4VCM, and 5VCM perfect models. The results reveal that the 5VCM has the largest values of AR and GAR for three atmospheric variables in these models. As more coupling processes are added, the natural variability of the chaotic system gradually increase.

Table 2 The AR (GAR) of the variables x, y, z and the vector (x, y, z) in different simple conceptual models

The predictability limit is affected by the magnitude of initial errors (Ding and Li 2007). Figure 1a gives the forecast RMS error statistics of the vector for three atmospheric variables for these three simple conceptual models. Based on the perfect models of Lorenz63, 4VCM, and 5VCM, a set of initial errors with magnitudes 10–2, 10–3, 10–4, 10–5, 10–6, and 10–7 is generated with white noise superimposed on the atmospheric variables (x, y, z) of the unperturbed run to produce a set of initial conditions (total of 2 × 107). Forecasts for 50 TUs are then carried out. Figure 1b shows the global practical predictability limit (Tpr) of these three perfect conceptual models. As the magnitude of the initial errors decreases, the practical predictability limit increases. For a given magnitude of initial errors, it is clear that the practical predictability limit of the5VCM is shorter than that of the 4VCM, and the Lorenz63 model has the maximum predictability limit. In the 4VCM and 5VCM, the initial error superimposed on the atmospheric variable is gradually introduced into the ocean through a forcing term from the ‘atmosphere’ (c2y). In return, the error growth of the ocean component accelerates the error amplification of the atmosphere component. Finally, the practical predictability limit decreases with the increase of variables.

Fig. 1
figure 1

a Time series of global ensemble mean RMSE of vector (x, y, z) in different simple models and the magnitude of initial errors. The magnitudes of initial perturbation are 10–2, 10–3, 10–4, 10–5, 10–6, and 10–7, from left to right with same color. The solid line and the dashed line represent the corresponding GAR and AR of each model. b Global practical predictability limit Tpr of those three models with respect to initial error. The red, blue and green colors correspond to the Lorenz63, 4VCM and 5VCM models

According to the AR theory, the global practical predictability limit Tpr can be regarded as the objective criteria to measure the time when model forecasts have skill. Meehl et al. (2014) point out that predictability is the ability that a system signal can be predicted rather than the ability of current human being to predict some feature or quantity. Hence, we also use the NLLE (Ding and Li 2007; Li and Ding 2011) method to calculate the intrinsic predictability limits of those three perfect conceptual models based on their chaotic error growth. The intrinsic predictability limit is defined as the timescale on which the error reaches 98% of its saturation level (Li and Ding 2011). Figure 2 shows the intrinsic predictability limit of vector (x, y, z) using the NLLE method, which is consistent with the Tpr in Fig. 1. The 5VCM has the maximum error saturation, while the Lorenz63 has the minimum error saturation. The predictability limits for the Lorenz63, 4VCM, and 5VCM models of the vector (x, y, z) are 13.69, 10.07, and 9.50, respectively, while the average initial error of those three models are 10–2,10–1 and 10–1.

Fig. 2
figure 2

a Time series of the error growth of vector (x, y, z) in different simple models as estimated by NLLE. b Predictability limit of those three models of Lorenz63, 4VCM and 5VCM. The red, blue and green colors correspond to these three models

It is of great significance to objectively measure the forecasting skills of operational forecast systems and provide guidance for their development. As stated by Li et al. (2018), the AR method is more suitable than the NLLE to quantitatively measure the time when model forecasts have skill. And compared with the asymptotic value (AV), the AR method has the advantage that it can objectively use a criterion to measure the practical predictability limit instead of using forecast models that have a strong dependence on model errors. Hence, we will only use the AR method to measure the time when model forecasts have skill in the following research.

3.2 Predictability of coupled vs. uncoupled systems

When an uncoupled model is developed as a coupled model, the initial-value predictability problem becomes a joint initial-value and boundary-value problem. Here we assume that the 5VCM includes ‘perfect’ coupling processes between the ‘atmosphere’ and ‘ocean’ of the ‘real world.’ Let Θ and ΘM correspond to the attractor of the perfect 5VCM and its imperfect model with missing one or more coupling processes. Then, we’ll measure how well the imperfect model can predict the 5VCM.

To compare the effects of coupling processes on the predictability limit, several sets of experiments are designed, as listed in Table 3. By changing the equations of both w and η in 5VCM, the first four experiments are designed to study the impact of the coupling mechanism on the predictability limit of the vector (x, y, z). The interaction between the slab ocean and the deep ocean pycnocline is ignored in experiment Case1, where w is regarded as a mean value of the time series (w = 11.3854), calculated by the ‘perfect model’ (Fig. 3a). This experiment represents an atmosphere forced by constant bottom boundary conditions in an uncoupled model. In Case2, we retain the interaction between the high-frequency variables and the slab ocean, but neglect the impact of the deep ocean pycnocline. Based on Case2 (η = 0), the η value is replaced by the climatological mean 11.5196 to form Case3 (Fig. 3b). Case4 is a series of perturbation experiments, with initial errors of magnitudes from 10–2 to 10–7 superimposed on the initial conditions of the control run (or ‘truth’ run) to calculate the growth of forecast RMSEs, as in Sect. 3.1.

Table 3 The control equations of simple models from Case1 to Case7
Fig. 3
figure 3

Time series of (a) the upper-ocean variable w (black line) and (b) the deep ocean pycnocline depth anomaly η (black line) in the 5VCM. The red dashed line represents the corresponding mean value of each variable

Figure 4a shows the forecast RMS error statistics of the vector for three atmospheric variables between perturbed runs and the control run. The global practical predictability limit (Tpr) of the forecast model is shown in Fig. 4b. The first three experiments represent the influence of boundary conditions on the predictability limit. The global practical predictability limits for three atmospheric variables in Case1, Case2, and Case3 are 0.84, 3.73, and 6.43, respectively. As the boundary conditions become more and more accurate, the global practical predictability limit of the nonlinear dynamical system increases significantly. The Case4 experiments reflect the impact of initial errors with different magnitude. From Case4-1 to 4–6, the global practical predictability limits for three atmospheric variables are 4.44, 6.05, 7.68, 9.29, 10.94, and 12.55, respectively. When the magnitude of the initial error is smaller than 10–3, the predictability limit of the initial-value is longer than that of the boundary-value.

Fig. 4
figure 4

a Time series of global ensemble mean RMSE of vector (x, y, z) between the 5VCM ‘real model’ and imperfect model. The yellow, green, red, and blue lines correspond to the Case1, Case2, Case3, and Case4. Especially the magnitudes of initial perturbation are 10–2, 10–3, 10–4, 10–5, 10–6, and 10–7, from left to right with blue color. The solid line and the dashed line represent the corresponding GAR and AR of the 5VCM ‘real model’. b Global practical predictability limit Tpr of the vector (x, y, z) with respect to model errors or initial errors. The Case1, Case2 and Case3 represent the influence of model errors, while the Case4 reflect the impact of initial errors. The magnitudes of initial perturbation are 10–2, 10–3, 10–4, 10–5, 10–6, and 10–7 from Case4-1 to Case4-6

The last three experiments (Case5, Case6, and Case7) in Table 3 are designed to study the predictability of the slab ocean w. In Case5 and Case6, the equations of high-frequency variables are different from the ‘real model’ since we set c1wκx = 0, meaning that the atmosphere is not affected by the bottom boundary condition. Consequently, the ‘ocean’ components in these two experiments look like an ocean-only model which has forcing from the atmosphere but without feedback to the atmosphere. In Case5, we ignore the influence of the deep ocean pycnocline by setting η = 0, while the interaction between the slab ocean and the deep ocean pycnocline is retained in Case6. Case7 is designed as the coupled model to study the initial-value predictability for w, just as in Case4.

Figure 5a shows the forecast RMS error statistics of w for the imperfect model relative to the ‘real model’, and the global practical predictability limit of the forecast model is shown in Fig. 5b. The global practical predictability limit of the variable w is 1.92 in Case5 and 1.94 in Case6. Compared with the Case7, the predictability limit of the initial-value problem is longer than that of the boundary-value problem when the upper boundary forcing is closed.

Fig. 5
figure 5

Same as in Fig. 4, but for the variable w in the Case5, Case6, and Case7. The yellow, green, and blue lines correspond to the Case5, Case6, and Case7. The magnitudes of initial perturbation are 10–2, 10–3, 10–4, 10–5, 10–6, and 10–7 from Case7-1 to Case7-6

From an atmosphere-only system to an ocean–atmosphere coupled system, the initial-value predictability problem becomes a joint initial-value and boundary-value. Both boundary conditions and initial conditions can affect the predictability of the system. Improving the accuracy of the boundary conditions and reducing the initial errors of the models are key to improving forecasting skills in the future.

4 Results of CGCM predictions system

4.1 Practical predictability limit of CFS vs. GEFS

From the results of simple conceptual models, we learned that the predictability limit of coupled models is longer than that of uncoupled models. However, we still need to examine operational forecasting systems to further validate conclusions and understand the mechanisms. We first compare the CFS and GEFS data to verify the conclusions and use CFSR data to verify the forecasts and calculate the AR. According to the AR theory, first, we assume that the "true" state is described by the NCEP CFSR. Then the prediction results are verified by using CFSR to get the global mean RMSE. In the following, the AR and GAR are calculated using the CFSR. And the AR is an objective threshold to determine the practical limits of different models, where AR is independent of the model errors and initial errors.

Figure 6 illustrates the spatial distribution of the AR of the geopotential height (GHT) at 850-hPa, 500-hPa, and 200-hPa from Jan 1999 to Dec 2010. The AR of CFSR appears to have a regional characteristic on various pressure levels, with a maximum over mid-latitudes in the Southern Hemisphere (SH), and the lowest values over the tropics. Meanwhile, the spatial distribution of the AR is consistent with the trough and ridge positions in the Northern Hemisphere (NH). The centers of the maximum AR are observed at the positions of the East Asian trough, the North American trough, and the European trough. Baroclinic instability is generally negligible in the tropics, where barotropic and convective instabilities and their interactions are dominant. In the midlatitudes, baroclinicity is the dominant instability responsible for the growth of small errors at the synoptic (weather) scales. The existence of storm tracks over the midlatitudes explains the higher variability in these regions.

Fig. 6
figure 6

Spatial distribution of (left) the AR of the GHT from Jan 1999 to Dec 2010 and (right) their zonal mean profiles. Upper, middle, and lower panels are for 850-hPa, 500-hPa, and 200-hPa, respectively

Theoretically, the model attractor radius should be calculated using the model long free forecasts. But we cannot get sufficiently long free forecast time-series of the model. Considering that the reforecast frequency of GEFS is once a week, we use the first 7 days of each forecast case to form a continuous time series as the model long free forecasts to calculate the model attractor radius. And the model attractor radius for CFS is calculated similarly. The model attractor radius of CFS and GEFS are greater than the real AR in most areas on these three levels (not shown).

Figure 7 shows the spatial distribution of the mean practical predictability limit of CFS and GEFS at three pressure layers. In Fig. 7a, the practical predictability limits of the CFS at 850 hPa appear to have a regional characteristic; there are many maximum or minimum centers, with a maximum of 11–13 days over the Western Pacific Warm Pool area and a lowest limit of 2–3 days in the Andes, the Tibetan Plateau, and the Sahara. However, the practical predictability limits of the GEFS at 850 hPa have a significant difference (Fig. 7b), with 9–11 days over the North Pacific and of 3–7 days over the tropics. The zonal mean variation of practical predictability limits with latitude shows that the practical predictability limits of CFS over the tropics are significantly longer than those of GEFS (Fig. 7c).

Fig. 7
figure 7

Spatial distribution of the practical predictability limit of GHT for CFS (left panel) and GEFS (right panel). The black and red lines in the middle panels of zonal mean profile are practical predictability limit of CFS and GEFS, respectively. Upper, middle, and lower panels are for 850-hPa, 500-hPa, and 200-hPa, respectively

As shown in Fig. 7d, the practical predictability limits of 500 hPa GHT in CFS appear to have a zonal distribution. The practical predictability limits of 500 hPa GHT in CFS appear to be highest over the NH midlatitudes. However, relatively low predictability of GHT is observed over the tropics between 10°S and 10°N. In contrast, the practical predictability limits of the GEFS have a significant difference (Fig. 7e), with 10–12 days over the tropics, especially in the central equatorial Pacific. Over the NH mid–high latitudes, the practical predictability limits can reach 9–11 days. Predictability limits of geopotential height on 200 hPa are markedly longer than those on 500 hPa over the middle-east equatorial Pacific Ocean.

Figure 8 shows the difference in the practical predictability limit of GHT between CFS and GEFS. CFS has longer practical predictability limit than GEFS at 850 hPa over the tropics (Fig. 8a), while GEFS is more predictable than CFS at 500 hPa and 200 hPa (Fig. 8b, c). From 500 to 200 hPa, the difference between GEFS and CFS increased significantly in the tropics, especially over the middle-east equatorial Pacific Ocean. However, coupling process has a limited effect on the practical predictability limit in mid-high latitudes.

Fig. 8
figure 8

Difference (CFS–GEFS) in the practical predictability limit of GHT (left panels) between CFS and GEFS and their zonal mean profiles (right panels). Upper, middle, and lower panels are for 850-hPa, 500-hPa, and 200-hPa, respectively

The spatial distribution of practical predictability limits on 200 hPa varies with the season (Fig. 9). In the boreal spring, the practical predictability limit of both GEFS and CFS in the Northern Hemisphere is significantly longer than in the southern hemisphere. The situation in the boreal autumn is the reverse of that in the boreal spring. For those four seasons, there is little difference in the practical predictability limit of the zonal mean between CFS and GEFS over the mid-high latitudes. But GEFS is more predictable in the tropics than CFS, especially in boreal spring. In the boreal spring, the 200-hPa practical predictability limits over the middle-east equatorial Pacific Ocean of GEFS can be more than 22 days. And the seasonal difference of the practical predictability limits between CFS and GEFS are shown in Fig. 10. The practical predictability limits of GEFS have similar distributions to those of CFS, but with much higher values over the tropics, especially over the middle-east equatorial Pacific Ocean. It is worth noting that the difference in the practical predictability between two models over the tropics varies with the seasons: increasing in the winter and reaching its peak in spring, then decreasing and achieving its minimum in autumn finally.

Fig. 9
figure 9

Spatial distribution of the seasonal mean practical predictability limit of GHT at 200-hPa for CFS (left panel) and GEFS (right panel). The black and red lines in the middle panels of zonal mean profile are seasonal mean practical predictability limit of CFS and GEFS, respectively. First, second, third, and lower panels are for spring, summer, autumn and winter, respectively

Fig. 10
figure 10

Difference (CFS–GEFS) in the practical predictability limit of GHT (left panels) between CFS and GEFS and their zonal mean profiles (right panels). First, second, third, and lower panels are for spring, summer, autumn and winter, respectively

As shown in Figs. 7 and 9, the CFS has longer practical predictability limit than GEFS at 850 hPa over the tropics, while it is opposite at higher levels. The tropical oceans are crucial for large-scale ocean–atmosphere interactions, and have a profound impact on global climate variability. As the bottom boundary conditions of the atmosphere, the low-frequency signals generated by the ocean–atmosphere interaction can extend the predictability limits.

Previous studies have shown that the ocean has an important effect on the predictability of the lower atmosphere (Bach et al. 2019; Kumar et al. 2013; Peña et al. 2003; Wu and Kirtman 2007). Bach et al. (2019) found that the ocean improves predictions of the low-level atmosphere most significantly in the tropics, which is consistent with our results on 850 hPa. As Pegion et al. (2019) pointed out, the uncoupled atmosphere model could have higher predictability than a coupled atmosphere model with the ocean if the atmosphere model itself has a larger model error. They compared the 2-m temperature and precipitation with five coupled models and two uncoupled models using sub-seasonal prediction experiments. Their results show that the prediction skill of 2-m temperature of GEFS is better than that of all other models. Thus, we believe that there are two reasons for the difference in the predictability between CFS and GEFS over the tropics: coupling processes and atmosphere model. We will discuss these two points in details next.

First, the CFS is a coupled model. A complete reforecast of CFS have been made with the T126L64 GFS with half-hourly coupling to the ocean (MOM4 at 1/4° equatorial, 1/2° global) (https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/climate-forecast-system-version2-cfsv2). But GEFS is an uncoupled model. In the Environmental Modeling Center (EMC)-GEFS forecast system, SSTs are specified by relaxing the SST analysis to a combination of climatological SST and bias-corrected SST from operational NCEP-CFSv2 forecasts. The longer the lead time, the more weight given to the bias-corrected NCEP-CFSv2 forecast SST (Pegion et al. 2019).

Second, the GEFS atmosphere model has better performance than the CFS atmosphere model. The 35 days reforecasts of GEFS used the new operational Global Ensemble Forecast System version (GEFS; version11). In contrast to the atmospheric component of the CFS, there are several advantages in version11 (Zhou et al. 2017), including: (1) improved initial perturbations using an ensemble Kalman filter (EnkF) data assimilation system; (2) increased horizontal resolution to TL574(34 km)L64 in the first 192 h for weather, to better represent the interaction around multiple scales; (3) Eulerian dynamics is replaced with two time-level semi-implicit semi-Lagrangian dynamic Zhou et al. (2017). found that the RMSE of 500-hPa geopotential height in the new GEFS version (v11) is significantly smaller in the Northern Hemisphere than the old version.

Although the new GEFS version (v11) outperformed the T126L64 GFS in the geopotential height, the coupling process in the lower atmosphere results in a higher predictability limit of CFS than GEFS, especially in equatorial regions. However, in the middle and upper atmosphere, the coupling process has a smaller impact on the atmosphere, while the atmosphere model errors play an important role in the forecasting. As a result, at 200 hPa and 500 hPa, the practical predictability limit in GEFS over the tropics is longer than that of CFS.

4.2 Influence of low-frequency signals on predictability limit

The AR and GAR are introduced to quantitatively estimate the predictability limit of strange attractors. According to the attractor theory, error growth curves will converge to the GAR. Over the 200-hPa central-east equatorial Pacific of the GEFS, GAR is obviously larger than the error saturation, which leads to the long predictability limits in this region. What causes the contradiction between this result and attractor theory? Compared with the atmosphere, the ocean varies slowly. The ocean can suppress the fast-varying signals and enhance the low-frequency signals in the climate system through air–sea interaction, thus lengthening the predictability limit. The ocean influence on atmospheric components may cause the changes in the properties of the attractors. We will use 5VCM to explore the mechanism.

For atmospheric components x, y, and z in the 5VCM, we add a low-frequency term in the x equation to simulate the effects of the periodic signals, so the conceptual pycnocline prediction model becomes

$$ \begin{array}{l} {{\dot{\text{x}}} = - \sigma x + \sigma y + {\text{T}}_{l} ({\text{t}})} \\ {\dot{y} = - xz + (1 + c_{1} w)\kappa x - y} \\ {\dot{z} = xy - bz} \\ {O_{m} \dot{w} = c_{2} y + c_{3} \eta + c_{4} w\eta - O_{d} w + {\text{S}}({\text{t}})} \\ {\Gamma \dot{\eta } = c_{5} w + c_{6} w\eta - O_{d} \eta } \\ \end{array} $$
(10)

We simply set \({\text{T}}_{l} ({\text{t}}){\text{ = T}}_{0} {\text{ + A}}\sin (2\pi t/T_{pd} )\) to simulate a low-frequency forcing, where Tpd and T0 are set as 10 and 20, respectively. To compare the effects of signal magnitudes, we run three experiments: the magnitude of A is set to 10 in Case9 and 70 in Case10, while Case8 is defined as the control experiment, in which atmospheric components are not affected by the low-frequency signal.

For these three experiments, projections on the xz plane in the phase space of the 5VCM are presented in Fig. 11. In Case8 the atmospheric components are not affected by the low-frequency signals and the attractors are distributed symmetrically on the xz plane. By adding a low-frequency forcing to the variable x, the spatial distribution of attractor described by Case10 is significantly different from that of Case8 (Fig. 9a, e). Figures 11b,d,f give the forecast RMS error statistics of the vector for three atmospheric variables in these three experiments. The GAR in Case10 is larger than the error saturation, unlike in Case8 and Case9. This phenomenon is consistent with the situation in the GEFS over the middle-east equatorial Pacific. Compared with the results of the standard 5VCM, the low-frequency forcing of the chaotic system has an important effect on the properties of the atmospheric attractors. The addition of an external periodic forcing causes long-term trajectories to be closer than any two random points on the attractor (i.e., for the GAR to be higher than the error saturation), since the forcing can cause the trajectories to experience similar ‘weather’.

Fig. 11
figure 11

Projections (a) Case8, (c) Case9, and (e) Case10 on the x–z plane in the phase space of the 5VCM. 1 × 105 points are plotted. Time series of global ensemble mean RMSE of vector (x, y, z) in different experiments (b) Case8, (d) Case9, and (f) Case10 in the 5VCM with respect to the magnitude of initial error. The magnitudes of initial perturbation are 10–2, 10–3, 10–4, 10–5, 10–6, and 10–7, from left to right. Blue and red dashed lines are the AR and the GAR

The distribution of the practical predictability limits of GHT at 200 hPa shows that both the CFS and GEFS have the longest predictability limit over the middle-east equatorial Pacific compared with other regions in the tropics. This phenomenon varies seasonally, and the practical predictability limits in spring show larger spatial variability than in autumn. Compared with other tropical regions, the middle-east equatorial Pacific Ocean has very long predictability limits, but these long predictability limits are not seen over the midlatitudes. We use the Ensemble Empirical Mode Decomposition (EEMD) to investigate this phenomenon further. We analyze the effect of significant low-frequency signals for three regions: A (155°W–135°W, 10°S–10°N), B (120°E–140°E, 10°S–10°N), and C (170°E–170°W, 30°N–50°N), and find the significant low-frequency signals lengthen the predictability limits. Those three regions have been drawn in Fig. 7g, h.

The Ensemble Empirical Mode Decomposition (EEMD; Wu and Huang 2008) is a method of separating data into different components by their scales. Each component is defined as an intrinsic mode function (IMF). Compared with the Empirical Mode Decomposition (EMD), it overcome the scale mixing problem through a new noise-assisted data analysis method. It can be seen in Fig. 7 that the AR and GAR are calculated by CFSR from Jan 1999 to Dec 2010. However, in order to calculate the significant periods, we use both the CFSR (Jan 1979 to May 2011) and CFSv2 (Apr 2011 to Dec 2017).

Figure 12 shows the significant period calculated by the EEMD method. At the 200 hPa level, the low-frequency signals in regions A and B play a dominant role compared with region C, especially those in region A. The quasi-biennial oscillation (QBO) and quasi-4-year period contribute a greater variance than the seasonal oscillation. In contrast, the quasi-2-week period is the most significant period in region C. However, in the lower troposphere the main periods in the three regions are all intra-seasonal oscillations (ISO), where the largest variance contribution is greater than 20%.

Fig. 12
figure 12

Periods through 95% significance level, obtained using EEMD method. The periods of mean GHT from Jan 1979 to Dec 2017 for CFSR averaged region A (155°W–135°W, 10°S–10°N) in (a) 850-hPa and (d) 200-hPa, region B (120°E–140°E, 10°S–10°N) in (b) 850-hPa and (e) 200-hPa, and region C (170°E–170°W, 30°N–50°N) in (c) 850-hPa and (f) 200-hPa

To determine the influence of low-frequency signals on the predictability limits, a high-pass Butterworth filter of order 1 is used to filter signals longer than two years. A Butterworth filter can extract the significant period of a timeseries and obtain its main modes. Murakami (1979) first used the band-pass Butterworth filter to study the Madden–Julian oscillation (MJO), which is the largest element of intraseasonal (30–90 day) variability in the tropical atmosphere. Since then, it has been widely used in meteorology. Here, we define AR2 as the attractor radius obtained by filtered data, while AR1 is the attractor radius calculated with the original data, shown in Fig. 13. There is a very small difference between AR1 and AR2 for region C at 200 hPa and 850 hPa, while AR2 is significantly smaller than AR1 over regions A and B at 200 hPa. The results indicate that the low frequency signals in the atmosphere have an important influence on the attractor radius. When we use AR2 as a criterion to measure the global practical predictability limits, both CFS and GEFS have significantly shorter predictability limits over regions A and B in 200 hPa. Comparing regions of A, B, and C, we find that the low-frequency signals have a greater impact on the global practical predictability limits at 200 hPa over the tropics than those over the mid-latitude regions, especially for GEFS.

Fig. 13
figure 13

Time series of mean RMSE from Jan 1999 to Dec 2010 of GHT at 850-hPa and 200-hPa for CFS and GEFS operational forecast data averaged the Region A (155°W–135°W, 10°S–10°N) (left panel), the Region B (120°E–140°E, 10°S–10°N) (middle panel), and the Region C (170°E–170°W, 30°N–50°N) (right panel). Upper and lower panels are for GHT at 850-hPa and 200-hPa, respectively. Black and red solid lines respectively represent the RMSE of the CFS and GEFS. Green dashed lines are AR1 calculated by CFSR, while blue lines are AR2 where the signals over 2 years are filtered by high pass Butterworth filter of order 1. Green and blue solid lines are GAR1 and GAR2

Some previous studies have shown that the variability and predictability of the 200-hPa seasonal mean GHT can be divided into external and internal variability (Kumar et al. 2003). The seasonality of internal variability depends on the seasonality of the atmospheric mean state alone, while the seasonality in the external variability depends further on the seasonality of the SST forcing, especially the tropical SST variability associated with the El Niño–Southern Oscillation (ENSO) (Hoerling and Kumar 2002). In addition, Baldwin et al. (2003) reported that persistent circulation anomalies in the lowermost stratosphere could affect the troposphere through changes in waves in the upper troposphere. Abnormal signals in the stratosphere can travel down to the troposphere and influence its weather and climate. Because the general circulation of the stratosphere has a long time-scale, it may enhance the forecast skill in the troposphere.

5 Summary and discussion

We compared the difference in the global practical predictability limits of coupled and uncoupled models, using the attractor radius (AR), the global attractor radius (GAR), and the nonlinear local Lyapunov exponent (NLLE). We first used three simple models to systematically study the characteristics of predictability of a coupled system. Starting from the Lorenz butterfly model (Lorenz63), a slab ‘ocean’ variable and a ‘pycnocline’ variable are combined with the Lorenz63 through coupling processes (the 4VCM and 5VCM models). The coupling processes include the air-sea interaction and the coupling between the upper and deep oceans. Compared with the Lorenz63 model, the 4-variable coupled model (4VCM) and 5-variable coupled model (5VCM) have larger natural variability due to the low-frequency boundary signals from air-sea interactions. The 5VCM has the maximum AR and GAR. From an atmosphere-only system to an ocean–atmosphere coupled system, the initial-value predictability problem becomes a joint initial-value and boundary-value problem. The results show that both boundary condition errors and initial condition errors play important roles in growing forecast errors.

We performed quantitative analysis of the spatiotemporal distributions of the practical predictability limits of geopotential height (GHT), comparing the coupled Climate Forecast System (CFS) and the uncoupled Global Ensemble Forecast System (GEFS) at 850 hPa, 500 hPa, and 200 hPa. Retrospective forecasts of the CFS and GEFS were used to evaluate the practical predictability limit. In the low-level atmosphere, the ocean improves prediction of the atmosphere most significantly in the tropics. The practical predictability limits of the CFS at 850 hPa appear to have a regional characteristic, with many maxima and minima, with a maximum of 11–13 days over the Western Pacific Warm Pool and 3–5 days over the Antarctic. In contrast, the practical predictability limits of the GEFS at 850 hPa have a significant difference, with a maximum of 9–11 days over the North Pacific and 3–7 days over the tropics. Note that the predictability limits of coupled models at 850-hPa geopotential height are higher than those of uncoupled models, especially in the tropics.

CFS has longer practical predictability limit than GEFS at 850 hPa over the tropics, while GEFS is more predictable than CFS at 500 hPa and 200 hPa. From 500 to 200 hPa, the difference between GEFS and CFS increased significantly in the tropics, especially over the middle-east equatorial Pacific Ocean. Moreover, it is worth noting that the difference in the practical predictability at 200 hPa between two models over the tropics varies with the seasons: increasing in the winter and reaching its peak in spring, then decreasing and achieving its minimum in autumn finally. The tropical oceans are crucial areas where large-scale air–sea interactions occur that have profound impacts on global climate variability. Using air–sea interactions as the atmospheric bottom boundary conditions could extend the predictability limits for geopotential height in lower troposphere. In addition, based on the 5VCM, we find that low-frequency forcing of atmospheric components has an important effect on the properties of atmospheric attractors, which may enhance the forecast skill in the upper troposphere.

We believe that there are two reasons for the difference in the predictability between CFS and GEFS over the tropics: coupling processes and atmosphere model. The coupling process in the lower atmosphere results in a higher predictability limit of CFS than GEFS, especially in equatorial regions. However, in the middle and upper atmosphere, the coupling process has a smaller impact on the atmosphere, while the atmosphere model errors play an important role in the forecasting. As a result, at 200 hPa and 500 hPa, the predictability limit in GEFS over the tropics is longer than that of CFS.

We have only taken the first step in predictability studies for the coupled system. Compared with an uncoupled system, we find that a coupled system has longer practical predictability limits in the lower troposphere. However, further studies are required to clarify the physical processes at the air–sea interface on different temporal and space scales that extend the predictability limits in the coupled system. In addition to its application to the atmosphere within a coupled system, the concept of predictability limits can be extended to the ocean, land, and sea ice. In this paper, the retrospective forecasts of geopotential height in both the CFS and GEFS were used to evaluate predictability of atmospheric systems. In the future, more coupled models need to be employed to further explore the predictability limits of other components of coupled systems.