1 Introduction

Forecasts of sea surface height (SSH) on interannual timescales are affected by several sources of uncertainty. Assessing such uncertainty and the associated mechanisms is crucial to provide reliable forecasts and potentially mitigate the effects of coastal flooding in certain regions. Furthermore, this can have implications for devising strategies on how best to design ocean observing systems and initialise climate models (Zanna et al. 2018). Finally, more skilful SSH predictions could help improve predictions of other parts of the climate system; for example, increased skill in predicting the Gulf Stream’s meridional position could impact predictions of air–sea heat fluxes and atmospheric blocking (Scaife et al. 2011).

Increased understanding of the ocean-atmosphere processes which modulate interannual sea level variability could aid in improving SSH forecasts. Cabanes et al. (2006) found that the mechanisms which contribute to observed interannual SSH variability are regionally dependent, with the majority of the interannual variability controlled by both the local steric response to heat fluxes and the large-scale oceanic adjustments to variations in the wind stress. In modelling studies (e.g., Roberts et al. 2016), the SSH interannual variability in the subpolar gyre has been found to be mostly buoyancy driven through variations in both thermosteric and halosteric SSH components. In the subtropical gyre, by contrast, variability is primarily driven by the momentum forcing and variations in the thermosteric SSH component (e.g., Roberts et al. 2016). In the Gulf Stream region, intrinsic ocean processes (which are defined as ocean processes generated in the absence of atmospheric forcing variability) are responsible for the majority of the SSH interannual variability (Penduff et al. 2011). Such intrinsic ocean processes include mesoscale eddies.

Several studies have investigated seasonal to interannual SSH predictability (Chowdhury et al. 2007; Wang et al. 2013; Qiu et al. 2014) with some key mechanisms identified in general circulation models (GCMs) (e.g., Roberts et al. 2016). Nonaka et al. (2016) and Roberts et al. (2016) focused on interannual SSH variability in eddy-resolving and eddy-permitting ocean models, respectively. Nonaka et al. (2016) investigated the predictability of mid-latitude ocean currents, using an ensemble of eddy-resolving ocean GCM (OGCM) experiments with horizontal resolution of \(\sim 1/10{^{\circ }}\). They found a lack of predictability in the jet regions due to the contribution from the mesoscale eddy field. Roberts et al. (2016) examined interannual SSH predictability using the eddy-permitting (\(\sim 1/4{^{\circ }}\)) Hadley Centre Global Environment Model version 3 (HadGEM3) and found predictive skill in the Tropics on timescales of several years with a lack of skill in jet regions. Ensemble forecasts of SSH did not exhibit skill on timescales of 2–5 years that could beat persistence forecasts.

Other predictability studies have used low resolution OGCMs (\(1{^{\circ }}\)–\(2{^{\circ }}\)) to investigate dynamic and steric SSH predictability (Schneider and Griffies 1999; Miles et al. 2014; Polkova et al. 2015). Polkova et al. (2015) found predictive skill in interannual steric SSH predictions in the subtropics on timescales of 2–5 years. Such skill was related to adjustments due to baroclinic Rossby waves. Skill was also found in the North Atlantic subpolar gyre on timescales of 2–5 years, which was related to changes in spiciness along isopycnals. Schneider and Griffies (1999) found SSH predictability in the North Atlantic on timescales of up to 17 years, using an ensemble of coupled climate model runs, related to the large-scale ocean circulation.

The work presented here focuses on the predictability of dynamic sea level, i.e. variations which arise from ocean processes. These variations are linked to changes in ocean circulation, large scale heat transports, position of gyre boundaries and thus air–sea interactions.

The aims of this study are to

  • Establish the timescales of predictability due to internal variability of SSH in the North Atlantic;

  • Identify the regions where forecasts are most sensitive to perturbations in the initial conditions;

  • Evaluate the spatio-temporal characteristics relevant to assessing North Atlantic SSH predictability;

  • Relate any internally generated predictability to large-scale ocean characteristics, with an emphasis on both mid-latitude jets and the gyre circulations.

We focus our analysis on statistical methods, which are used to evaluate predictability generated via internal variability in a fully coupled climate model. We will make use of an extended model control run, without interannual variations in the model’s boundary conditions. This should enable us to isolate any interannual predictability related to the model’s internal variability. A perfect model approach and a combination of Linear Inverse Modeling (LIM), non-normal mode analysis, and average predictability time (APT) are used to evaluate how initial conditions influence error growth (Penland 1989; Farrell and Ioannou 1996; DelSole and Tippett 2009a).

This paper is organised as follows. Section 2 details the model set-up and the interannual SSH variability present. Section 3 contains information on the statistical methods used to evaluate predictability and an investigation into the influence of eddy-mean flow interactions on forecast skill. Section 4 examines the predictability related to the initial conditions of the ocean model through both the optimal initial conditions of SSH and the predictable components diagnosed by evaluation of the APT of the system. The final section contains a discussion of the results.

2 Characterising interannual sea surface height variability in the North Atlantic in HadGEM3

2.1 HadGEM3 model description

We use the output from a 150-year free-running control simulation of a coupled climate model, HadGEM3 GC2.0 (Williams et al. 2015). This control simulation has repeated-year radiative forcings (e.g., aerosols and greenhouse gases) taken from the year 2000 (identical to experiment 2 in the Coupled Model Inter-comparison Project Phase 3, CMIP3). The ocean component of HadGEM3, GO5.0, is based on version 3.4 of NEMO (Nucleus for European Modelling of the Ocean) and is described in detail in Megann et al. (2014). The model is on an ORCA025 horizontal grid, which uses an eddy-permitting \(1/4{^{\circ }}\) horizontal resolution and 75 vertical levels, with the vertical level spacings increasing from 1 m at the surface to 200 m at depth. The vertical level spacing provides high resolution near the surface for short to mid-range forecasting. It uses both a linear free surface and an energy conserving momentum advection scheme. The vertical mixing of tracers and momentum is parameterised by a turbulent closure scheme (Gaspar et al. 1990; Madec 2008). The horizontal viscosity used is bi-Laplacian and the bottom friction is quadratic (Megann et al. 2014).

Monthly mean SSH anomalies are defined as departures from a time-mean, constructed from the entire model run. These anomalies are then used to create the statistical forecast models in Sects. 3 and 4. In addition, monthly mean fields of surface net heat fluxes and both the zonal and meridional wind stress are used in Sect. 4.4 in the analysis of the mechanisms responsible for the predictable components. All the fields are linearly detrended and have their seasonal cycles removed. Two independent methods were trialled to remove the seasonal variability. In the first method, the seasonal cycle is removed through Fourier filtering, by fitting cosine and sine waves to the annual and semiannual harmonics and removing variability at these frequencies. The second method removes the seasonal cycle by subtracting the monthly climatology from each month. The results are insensitive to the method used and so the second method is chosen for simplicity.
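The second method can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions, not the processing code applied to the model output: the function name `detrend_and_deseason` is hypothetical, and the series is assumed to be a single grid-point time series covering a whole number of years.

```python
import numpy as np


def detrend_and_deseason(x):
    """Remove a linear trend and the monthly climatology from a monthly series.

    x: 1-D array whose length is a multiple of 12 (assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    if x.size % 12 != 0:
        raise ValueError("series length must be a multiple of 12")
    t = np.arange(x.size)
    # Least-squares straight line, subtracted to detrend the series.
    slope, intercept = np.polyfit(t, x, 1)
    detrended = x - (slope * t + intercept)
    # Monthly climatology: mean of each calendar month over all years.
    by_year = detrended.reshape(-1, 12)
    climatology = by_year.mean(axis=0)
    # Subtract the climatology from every year and flatten back to 1-D.
    return (by_year - climatology).ravel()
```

By construction, each calendar month of the returned series has zero mean over the record.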

2.2 Evaluation of HadGEM3 against observational references

Previous model verification experiments by Williams et al. (2015) have shown that there are some small model biases in SST, primarily located just to the south of Greenland. However, HadGEM3 fields are significantly improved relative to its predecessor HadGEM2-ES. HadGEM3 has a more accurate Gulf Stream path, which leads to improved atmospheric blocking statistics for the UK and Europe (Scaife et al. 2011; Williams et al. 2015).

The time-mean SSH and its variability in the HadGEM3 model are compared to observations to ensure the model demonstrates a sufficient level of realism. An AVISO mean dynamic topography estimate for the period 1993–2012 (Rio et al. 2014) is used to evaluate the HadGEM3 time-mean SSH for the full model simulation. A uniform offset is applied to the AVISO data to give a similar domain average to the HadGEM3 simulation. The mean SSH fields of the observation-based data and the model output are shown in Fig. 1a, b. There is reasonable agreement between the model and the observations in both the magnitude and spatial pattern of the mean field. However, there are some slight discrepancies between the two: the observations have more prominent recirculation gyres flanking the Gulf Stream near to its detachment point, and the mean SSH is slightly lower in the west of the subpolar gyre in the model.

The observational estimate of sea level variability comes from the European Space Agency (ESA) Climate Change Initiative (CCI) version 2.0 monthly gridded fields of surface height anomaly (Legeais et al. 2018). Monthly data for the period 1993–2012 are aggregated into annual mean values before computing the standard deviation for each grid box. For comparison, we select the last 20 years of the HadGEM3 simulation (representative of the simulation as a whole) to compute the standard deviation of annual mean values. The standard deviation of the interannual SSH anomalies of both the observations and the HadGEM3 output are shown in Fig. 1c, d. The patterns are in good agreement both in magnitude and spatially. In the observations there is a higher amount of interannual variability located at the Gulf Stream’s detachment point (5 cm). This is unsurprising, as the \(1/4{^{\circ }}\) resolution model will be missing some variability due to eddy-mean flow interactions in the most turbulent regions.

Fig. 1

A comparison of mean SSH from a AVISO mean dynamic topography and b the HadGEM3 model output. The standard deviation of the interannual SSH anomalies calculated from c the ESA CCI version 2.0 monthly gridded fields of surface height anomaly and d the HadGEM3 model output. Both observation-based plots are based on the 20-year period of 1993–2012 inclusive. The mean HadGEM3 SSH is calculated from the entire 150-year model run, whereas the model SSH anomaly standard deviation is calculated from a representative 20-year period of the model run

2.3 Interannual sea surface height variability

Figure 2a shows the time-mean SSH of the control run. The characteristic double-gyre structure is evident, and the strong SSH gradient is indicative of the location of the Gulf Stream and its extension. The power spectra of SSH anomalies at several locations in the domain are shown in Fig. 2b. The power spectral density measured along the Gulf Stream (black and blue lines) is larger at all timescales than that within the gyre regions (red and green lines). The largest spectral power at interannual frequencies is near the Gulf Stream’s detachment point (blue line). As a reference, the spectra are compared to the Zang and Wunsch (2001) canonical frequency-wavenumber spectrum (gray dashed line). As a function of frequency (\(\sigma \)), this spectrum is proportional to \(\sigma ^{-1/2}\) on periods longer than 100 days, whereas for periods shorter than 100 days it is proportional to \(\sigma ^{-2}\) (i.e. red noise). Hughes and Williams (2010) also highlight regional deviations from this canonical spectrum. The spectra taken in the vicinity of the Gulf Stream display approximately red noise profiles up to timescales of a year, with whiter noise profiles on longer timescales. This whitening indicates that the predictability of SSH in the Gulf Stream may be limited on interannual timescales. In contrast, the profiles taken in the subpolar and subtropical gyres (red and green lines) are closer to red noise for all time periods. This indicates that skillful interannual SSH forecasts can potentially be made in these regions.
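For readers who wish to reproduce this kind of diagnostic, the sketch below estimates a one-sided power spectral density in units of \({\text {m}}^{2}\)/cpy from a monthly SSH anomaly series using a plain periodogram. The estimator actually used for Fig. 2b is not specified here, so the choice of an unsmoothed, unwindowed periodogram and the function name `periodogram_cpy` are assumptions of this example.

```python
import numpy as np


def periodogram_cpy(x, samples_per_year=12):
    """One-sided periodogram of a regularly sampled series.

    Returns frequencies in cycles per year (cpy) and the power spectral
    density in (units of x)^2 per cpy, normalised so that the integral of
    the density over frequency equals the variance of x.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    dt = 1.0 / samples_per_year           # sampling interval in years
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=dt)      # cycles per year
    psd = (np.abs(spec) ** 2) * dt / n
    psd[1:] *= 2.0                        # fold negative frequencies
    if n % 2 == 0:
        psd[-1] /= 2.0                    # the Nyquist bin is not doubled
    return freqs, psd
```

In practice some smoothing (for example averaging over segments) would normally be applied before comparing spectral slopes such as \(\sigma ^{-1/2}\) and \(\sigma ^{-2}\).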

Fig. 2

HadGEM3 150-year control run: a Time mean SSH; b Power spectra of SSH anomalies taken at the four locations in the North Atlantic as indicated in a: near the detachment point of the Gulf Stream (blue, \(35.3{^{\circ }} \hbox {N}\), \(72.0{^{\circ }} \hbox {W}\)), in the subtropical gyre (green, \(24.8{^{\circ }} \hbox {N}\), \(40.8{^{\circ }} \hbox {W}\)), in the Gulf Stream extension region (black, \(40.7{^{\circ }} \hbox {N}\), \(40.8{^{\circ }} \hbox {W}\)) and in the subpolar gyre (red, \(56.5{^{\circ }} \hbox {N}\), \(40.8{^{\circ }} \hbox {W}\)). The gray dashed line represents the Zang and Wunsch (2001) spectrum at an arbitrarily chosen amplitude. The power spectral density is in units of (\({\text {m}}^{2}\)/cpy) where cpy denotes cycles per year

Figure 3a shows the standard deviation of annual mean SSH anomalies. This again shows that most of the interannual variability is located in the vicinity of the Gulf Stream’s extension and in the subpolar gyre. There are several potential mechanisms for such interannual variability in the Gulf Stream’s extension, including: baroclinic Rossby waves directly modulating the jet extension (Sasaki and Schneider 2011; Qiu et al. 2014); variations in the western boundary currents due to changes in wind forcing (Andres 2016); and modulation by the mesoscale eddy field (Spall 1996; Berloff et al. 2007).

The signal to noise ratio of interannual variability in the North Atlantic is investigated by diagnosing \(\frac{\sigma _{N}}{\sigma _{1}}\), where \(\sigma _{N}\) represents the standard deviation of N-year means of SSH. This measure is often referred to as potential predictability (Boer 2004; Hawkins et al. 2011). Figure 3b, c show the potential predictability associated with \(\sigma _3\) and \(\sigma _5\). Regions with weak interannual variability (standard deviations \(<0.02\) m) are masked (white regions in Fig. 3b, c). These regions of low variability are located along the eastern edge of the basin. The US east coast south of \(45{^{\circ }} \hbox {N}\) stands out as the only portion of coastline where any potential interannual predictability is present (Fig. 3b, c). The largest potential predictability is located in the subpolar gyre. The dynamics of this region are likely to be relatively linear, dominated by the mixed layer’s response to forcing and mean advection, and not heavily influenced by the effects of turbulent mesoscale eddies (Sérazin et al. 2015). However, such relatively linear dynamics in the model may also arise because the \(1/4{^{\circ }}\) model resolution will not fully resolve the small internal deformation radius in the subpolar gyre and will therefore likely underestimate effects related to baroclinic instability. Figure 3a shows large values of interannual variability in the Gulf Stream extension; however, Fig. 3b, c demonstrate that this pattern of large variability does not translate into a comparable pattern of large potential predictability. Nevertheless, although the potential predictability is lower in the Gulf Stream region it is still non-zero, in agreement with the power spectra on interannual timescales in Fig. 2b (black and blue profiles). Therefore, even in the eddy-active Gulf Stream region, there appears to be some potentially predictable interannual variability.
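The ratio \(\sigma _{N}/\sigma _{1}\) can be computed at a single grid point along the following lines. This is a minimal sketch: the use of non-overlapping N-year blocks and the function names are assumptions of the example, since the text does not state how the N-year means are formed.

```python
import numpy as np


def n_year_mean_std(monthly, n_years):
    """Standard deviation of non-overlapping n-year means of a monthly series."""
    monthly = np.asarray(monthly, dtype=float)
    block = 12 * n_years
    n_blocks = monthly.size // block
    # Discard any incomplete block at the end of the record.
    means = monthly[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)
    return means.std(ddof=1)


def potential_predictability(monthly, n_years):
    """Ratio sigma_N / sigma_1 for a monthly anomaly series."""
    return n_year_mean_std(monthly, n_years) / n_year_mean_std(monthly, 1)
```

For a series with no memory between years the ratio tends to \(1/\sqrt{N}\) (about 0.58 for \(N=3\)), which gives a useful baseline when reading maps such as Fig. 3b, c.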

Fig. 3

a Standard deviation of SSH anomalies of the control run created with 1 year means. Ratio of standard deviations of the b 3 and c 5 year means to standard deviation based on 1-year means

3 LIM forecast analysis: influence of eddy field initialisation on interannual forecasts

Although the SSH variability analysis hints at the presence of predictability on interannual timescales, an investigation of SSH forecasts is needed to evaluate any interannual SSH predictability present. Traditionally the statistics needed to evaluate predictability in a GCM are generated by creating an ensemble of model simulations. Depending on the model used, and the size of the ensemble, this process can be computationally expensive (Collins 2007). In this paper we use a contrasting approach whereby the methods used to evaluate predictability are based on statistical models created from one long dynamical model run. The statistical models used here have the benefit of being computationally cheap to run and can still provide insights into the dynamics of the system due to their simplicity (e.g., Sonnewald et al. 2018).

Linear Inverse Modeling (LIM; Penland 1989) has previously been used to evaluate predictability in sea surface temperature, in both models and observations (e.g., Penland 1989; Hawkins et al. 2011; Zanna 2012; Huddart et al. 2017; Dias et al. 2018). The method models the evolution of the desired fields as a linear process forced by white noise. In doing so, the linear inverse model gives information about the predictability of the fluctuations in the system. To enable this calculation, the SSH anomalies are decomposed into empirical orthogonal functions (EOFs) and their related principal components (PCs),

$$\begin{aligned} SSH(x,y,t)=\sum _{i}EOF_i(x,y)PC_i(t). \end{aligned}$$
(1)

The EOFs are constructed using monthly-mean SSH model output. The EOFs are also weighted by the area of their grid boxes as the NEMO grid is irregular. In the calculation of the EOFs, SSH in the Gulf of Mexico is not used as we wished to focus on the SSH predictability in the main ocean basin.
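A common way to obtain area-weighted EOFs and PCs is a singular value decomposition of the anomaly matrix after scaling each grid box by the square root of its area. The sketch below follows that route; it is an illustration rather than the code used here, and the function name, the array layout (time by space) and the normalisation of the weights are assumptions.

```python
import numpy as np


def area_weighted_eofs(anom, area, n_modes):
    """EOFs and PCs of an anomaly matrix with area weighting.

    anom: (n_time, n_space) anomalies with the time mean removed.
    area: (n_space,) grid-box areas (any consistent units).
    Returns (eofs, pcs, var_frac) with eofs of shape (n_modes, n_space),
    pcs of shape (n_time, n_modes), and the variance fraction per mode.
    """
    anom = np.asarray(anom, dtype=float)
    area = np.asarray(area, dtype=float)
    # sqrt(area) weights make the decomposed variance area-weighted.
    w = np.sqrt(area / area.sum())
    u, s, vt = np.linalg.svd(anom * w, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]
    # Undo the weighting so that pcs @ eofs approximates anom, as in Eq. 1.
    eofs = vt[:n_modes] / w
    var_frac = (s ** 2 / np.sum(s ** 2))[:n_modes]
    return eofs, pcs, var_frac
```

Grid boxes excluded from the analysis (for example the Gulf of Mexico) would simply be dropped from the spatial dimension before calling the function.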

In the following analysis we use 25 EOFs explaining 63% of the variance; the leading three EOFs (responsible for 9.9%, 5.2% and 5.0% of the variance, respectively) are shown in Fig. 4. The leading EOF has a spatial structure reminiscent of that calculated using observations (Häkkinen and Rhines 2004; Häkkinen et al. 2013). Häkkinen et al. (2011) attributed this pattern of SSH variability to variability in the wind stress curl. As the wind stress curl varies, there are associated variations in the strength and sizes of the subpolar and subtropical gyres and a resultant change in SSH (Häkkinen et al. 2013). This ‘gyre mode’ varies between a state with a small subpolar gyre and a large, eastward-extended subtropical gyre, and a state with a large, eastward-extended subpolar gyre and a small, contracted subtropical gyre. A similar variation and associated dependence on the wind stress curl has been identified in this model (shown in the supplementary material). Moreover, there is a lagged response of the first principal component of SSH to a leading principal component of the wind stress, again in agreement with Häkkinen et al. (2011).

Fig. 4

a The fraction of variance each of the leading 25 EOFs explains, calculated from 150 years of monthly mean SSH model output. The error bars represent the one standard deviation error related to sampling (calculated using equation 24 in North et al. (1982)). Timeseries of the: b first, c second and d third principal components. The spatial components of the: e first, f second and g third EOFs

The evolution of the PCs of SSH anomalies is approximated by a linear stochastic model (Penland and Sardeshmukh 1995)

$$\begin{aligned} \frac{d\mathbf {P}}{dt} = \mathbf {A}\mathbf {P}(t) + \xi , \end{aligned}$$
(2)

where \(\mathbf {P}\) is the vector of n-PCs, with dimensions of n by 1, \(\xi \) is a stochastic forcing term, and A is a linear n by n matrix which controls the temporal evolution of the n-PCs. The linear operator

$$\begin{aligned} \mathbf A = \frac{1}{\tau _0}\ln [\mathbf C (\tau _0)\mathbf C (0)^{-1}], \end{aligned}$$
(3)

contains dynamical information about the variability of the system as it is constructed using the, n by n, covariance matrices at lag-\(\tau _0\) and lag-0, which are calculated from the PCs as

$$\begin{aligned} \begin{aligned} \mathbf C (\tau _0)&= \langle \mathbf P (t+ \tau _0)\mathbf P ^{T}(t) \rangle , \\ \mathbf C (0)&= \langle \mathbf P (t)\mathbf P ^{T}(t) \rangle . \end{aligned} \end{aligned}$$
(4)

Here \(\langle \rangle \) indicates an average over all times.
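Equations 3 and 4 translate almost directly into code. The sketch below estimates A from a matrix of PCs; it is an illustrative implementation under stated assumptions (the PCs have zero mean, the lag is given in time steps, and the matrix logarithm is evaluated through an eigen-decomposition, which presumes that \(\mathbf C (\tau _0)\mathbf C (0)^{-1}\) is diagonalisable). The function name `lim_operator` is hypothetical.

```python
import numpy as np


def lim_operator(pcs, lag):
    """Estimate the LIM operator A (per time step) from principal components.

    pcs: (n_time, n) array of zero-mean PCs.
    lag: training lag tau_0 in time steps (integer >= 1).
    """
    pcs = np.asarray(pcs, dtype=float)
    if lag < 1:
        raise ValueError("lag must be at least one time step")
    p0, p1 = pcs[:-lag], pcs[lag:]
    n_pairs = p0.shape[0]
    c0 = p0.T @ p0 / n_pairs        # C(0)     = <P(t) P(t)^T>
    ctau = p1.T @ p0 / n_pairs      # C(tau_0) = <P(t + tau_0) P(t)^T>
    g = ctau @ np.linalg.inv(c0)
    # Matrix logarithm via eigen-decomposition (assumes g is diagonalisable).
    vals, vecs = np.linalg.eig(g)
    log_g = vecs @ np.diag(np.log(vals.astype(complex))) @ np.linalg.inv(vecs)
    a = log_g / lag
    if not np.allclose(a.imag, 0.0, atol=1e-8):
        # Happens when g has negative real eigenvalues: no real logarithm.
        raise ValueError("operator has no real logarithm at this lag")
    return a.real
```

The returned operator has units of inverse time steps, so with monthly PCs a lag of 6 corresponds to \(\tau _0 = 6\) months.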

Forecasts, \(\hat{\mathbf {P}}\), are then generated using the model such that, with respect to the initial time, t,

$$\begin{aligned} \hat{\mathbf{P }}(t+\tau )=\mathbf B (\tau )\mathbf P (t), \end{aligned}$$
(5)

where \(\tau \) is the forecast lead time and the n by n matrix B is the forecast propagator,

$$\begin{aligned} \mathbf B (\tau )=\exp (\mathbf A \tau )= \exp \bigg [\frac{\tau }{\tau _0}\ln [\mathbf C (\tau _0)\mathbf C (0)^{-1}]\bigg ]. \end{aligned}$$
(6)
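Given an estimate of A, Eqs. 5 and 6 can be evaluated as in the sketch below. The matrix exponential is computed from the eigen-decomposition of A, which assumes A is diagonalisable; the function names are illustrative and not taken from any particular library.

```python
import numpy as np


def propagator(a, tau):
    """Forecast propagator B(tau) = exp(A tau) for a diagonalisable A."""
    vals, vecs = np.linalg.eig(np.asarray(a, dtype=float))
    b = vecs @ np.diag(np.exp(vals * tau)) @ np.linalg.inv(vecs)
    # For a real A the imaginary part is round-off only.
    return b.real


def lim_forecast(a, p_init, tau):
    """Forecast of the PCs at lead time tau from the initial state p_init."""
    return propagator(a, tau) @ np.asarray(p_init, dtype=float)
```

A forecast SSH field then follows by multiplying the forecast PCs by the corresponding EOFs, as in Eq. 1.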

Predictability can then be evaluated by examining the difference between the probability distribution of the predictions and that of the climatology. For LIM to be applicable, the system being examined is required to possess several characteristics (Penland and Sardeshmukh 1995):

  • it can be described by Gaussian statistics;

  • A is independent of the time lag, \(\tau _0\), used to calculate it;

  • the real parts of all eigenvalues of A must be negative, so that all modes decay.

Finally, to prevent overfitting of the linear models, the data used in each experiment is separated into a training and a verification data set. In the presented results the training and verification data sets are 140 years and 10 years long respectively. The linear model is constructed using the training data set. This model is then used to make predictions for the verification set, and the skill of these predictions is used to evaluate the system’s predictability.
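One simple way to implement such a split is a boolean mask over the monthly time axis that withholds a contiguous ten-year block for verification. The sketch below assumes the withheld block is centred on a chosen month and is shifted inwards near the ends of the record so that exactly ten years are always withheld; both choices, and the function name, are assumptions of this example.

```python
import numpy as np


def training_mask(n_months, centre_index, window_years=10):
    """Boolean mask that is True for training months, False for withheld ones."""
    window = 12 * window_years
    if window > n_months:
        raise ValueError("window is longer than the record")
    # Centre the withheld block, then shift it to stay inside the record.
    start = centre_index - window // 2
    start = min(max(start, 0), n_months - window)
    mask = np.ones(n_months, dtype=bool)
    mask[start:start + window] = False
    return mask
```

For a 150-year monthly record (1800 months) the mask always leaves 1680 months, i.e. 140 years, for training.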

Tests to assess how well these conditions are met for SSH anomalies in the North Atlantic are shown in Fig. 5. Panel a shows a comparison of the cumulative distribution function of the 150 years of SSH anomalies with that of an idealised Gaussian distribution with the mean and variance of the model output, all calculated in the area used to calculate the EOFs (shown in Fig. 4e). The agreement between the two profiles demonstrates that the system is well described by Gaussian statistics. To examine the influence of mesoscale eddies on the skill of the forecasts, two different linear operators are constructed. In each experiment, the same reduced basis, consisting of 25 EOFs and PCs, is used to create each propagator. The first propagator contains all available frequencies. A second, temporally smoothed propagator is constructed by applying an 18-month running mean filter to the PCs. Figure 5b shows the Frobenius norm of A as a function of different lag times, \(\tau _0\), for these two operators. Both operators possess regions where the norm of A only varies by a small amount as a function of \(\tau _0\). However, for the 18-month filter there is strong variation in the Frobenius norm of A when \(\tau _0\) is 15 months. This large variation indicates that in this parameter range the model output may not be well represented by a system of the form shown in Eq. 2. When using the operator constructed with monthly means a \(\tau _0\) of 6 months is chosen, and when using the smoothed operator a \(\tau _0\) of 2 months is used. In both cases, all the real parts of the eigenvalues of A are negative and thus satisfy the final necessary condition.
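The Gaussianity comparison in panel a can be summarised by a single number, the largest gap between the empirical cumulative distribution function and that of a Gaussian with the same mean and standard deviation. The sketch below computes this distance; it is offered as one possible quantification and is not a statistic reported in this study. The function name is hypothetical.

```python
import math

import numpy as np


def gaussian_cdf_distance(samples):
    """Maximum absolute gap between the empirical CDF of the samples and the
    CDF of a Gaussian with the sample mean and standard deviation."""
    x = np.sort(np.asarray(samples, dtype=float).ravel())
    n = x.size
    mu, sigma = x.mean(), x.std()
    # Empirical CDF evaluated at each sorted sample.
    empirical = np.arange(1, n + 1) / n
    # Gaussian CDF via the error function.
    z = (x - mu) / (sigma * math.sqrt(2.0))
    gaussian = 0.5 * (1.0 + np.array([math.erf(v) for v in z]))
    return float(np.max(np.abs(empirical - gaussian)))
```

Small values indicate close agreement between the two curves; strongly skewed data give much larger values.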

Fig. 5

a A comparison of the cumulative distribution functions of the monthly mean SSH model output and an idealised Gaussian distribution constructed using the mean and standard deviation of the SSH model output. b The Frobenius norm of the operator A as a function of different lag times, \(\tau _0\). The two operators are constructed with monthly mean and 18 month filtered principal components respectively

We can now create forecasts of SSH anomalies using the viable LIMs. Forecasts are initialised every 6 months throughout the 150 years of model output, creating 300 forecasts in total. When creating these forecasts it is crucial to keep the training and test datasets separate. At each forecast initialisation date the LIM propagator used to generate the forecasts is trained on 140 years of model data. In each case these 140 years comprise the data which are not within a ten-year window centred on the forecast initialisation date. In order to create a benchmark for the forecasts made using the LIM models, lagged correlation forecasts are also made (Lorenz 1963),

$$\begin{aligned} x(t_{0}+\tau )=\beta (\tau )x(t_{0}), \end{aligned}$$
(7)

where \(\beta \) is the auto-correlation of the time series at a point in space, x is the time series of the quantity being predicted and \(\tau \) is the lag time of the forecast. This is a type of ‘damped persistence’ forecast which may provide forecasts better than climatology (\(\beta =0\)) and persistence (\(\beta =1\)) forecasts. The model output used to construct these damped persistence forecasts is reconstructed from the same EOFs and principal components used to construct the LIM models, to allow for a fair comparison.
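Equation 7 can be illustrated with a few lines of NumPy. The sketch estimates \(\beta (\tau )\) as the sample autocorrelation of a training series at a single grid point and scales the initial anomaly by it; the particular autocorrelation estimator and the function names are assumptions of this example.

```python
import numpy as np


def lag_autocorrelation(x, lag):
    """Sample autocorrelation of a series at the given lag (in time steps)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))


def damped_persistence_forecast(training_series, x_init, lag):
    """Forecast x(t0 + lag) = beta(lag) * x(t0), with beta from the training data."""
    return lag_autocorrelation(training_series, lag) * x_init
```

Setting \(\beta \) to 0 or 1 in place of the estimated autocorrelation recovers the climatological and persistence forecasts mentioned above.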

To evaluate the skill of the statistical models relative to climatological forecasts, we use a root mean square error metric (\(RMSE_{Relative}\)) (Hawkins et al. 2011):

$$\begin{aligned} RMSE_{Relative}=\frac{RMSE_{pred}}{RMSE_{clim}} \end{aligned}$$
(8)

where \(RMSE_{clim}\) is the root mean square error of a climatological forecast over the forecast period and \(RMSE_{pred}\) is the root mean square error of the predicted field. The climatological forecast assumes that the SSH anomalies are zero, i.e. \(\hat{\mathbf{P }}(t+\tau )=\mathbf 0\), and is evaluated using the same EOFs and PCs used to construct the LIM models (i.e. the time-filtered forecasts are verified against the time-filtered truth). The RMSEs are calculated relative to the control run solution, which has been reconstructed using the same EOFs and PCs used to construct the LIM models. A value greater than unity indicates that the model’s forecasts are inferior to those generated using the climatology, whereas a value less than unity demonstrates forecast skill superior to climatology.
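The metric in Eq. 8 is straightforward to compute once the forecast, the reference forecast and the verifying truth are available on the same grid. In the sketch below the reference forecast is passed in explicitly so that the same function can be used with any baseline; the function name and argument layout are assumptions of this example.

```python
import numpy as np


def relative_rmse(forecast, reference, truth):
    """RMSE of a forecast divided by the RMSE of a reference forecast."""
    forecast = np.asarray(forecast, dtype=float)
    reference = np.asarray(reference, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rmse_pred = np.sqrt(np.mean((forecast - truth) ** 2))
    rmse_ref = np.sqrt(np.mean((reference - truth) ** 2))
    return float(rmse_pred / rmse_ref)
```

With anomalies defined relative to the time mean, a climatological reference in the \(\beta =0\) sense corresponds to passing an array of zeros as `reference`.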

Fig. 6

RMSEs relative to the climatology in three different regions, all with longitudes \(18{^{\circ }}\)–\(74{^{\circ }}\) W: a ‘Subpolar Gyre’, \(46{^{\circ }}\)–\(65{^{\circ }}\)N, b ‘Subtropical Gyre’, \(18{^{\circ }}\)–\(37{^{\circ }}\)N, c ‘Gulf Stream region’, \(37{^{\circ }}\)–\(42{^{\circ }}\)N, created using both LIM and damped persistence models. Two differing temporal smoothings are used to construct the models shown: monthly means (blue and yellow), and monthly means with an 18-month running mean applied (red and purple). The remaining panels show maps of relative RMSE at given lead times. Forecasts made using the LIM model trained on: d–f monthly means and g–i 18-month temporally smoothed principal components. Forecasts made with damped persistence models constructed with: j–l monthly means and m–o an EOF reconstruction made with 18-month temporally smoothed principal components

The forecast error maps for the LIM model trained on monthly data are shown in Fig. 6d–f. Errors emerge rapidly in the subpolar and subtropical gyres (seen in panels a and b). Only in the Gulf Stream region and southern part of the domain are any areas of skillful forecasts seen (confirmed in panel c). The damped persistence forecasts created with the monthly mean model output (panels j, k, and l) exhibit small errors in the subpolar gyre and parts of the subtropical gyre (panels a and b), coinciding with regions of large potential predictability (Fig. 3). The forecasts created with the LIM model trained on monthly mean SSH anomalies are less skillful than those produced with the damped persistence model. The inclusion of the high-frequency components of the SSH in the construction of the LIM model means that predictability is not exhibited on timescales longer than a year.

The forecasts which are subject to 18-month filtering produce smaller errors in all regions (Fig. 6a–c, g–i, and m–o). These models again display the smallest relative RMSEs in the subpolar and subtropical gyres, with more substantial errors in the Gulf Stream region. The LIM model outperforms the damped persistence forecasts in the majority of areas and timescales [the exception being in the subtropical gyre (panel b)]. The subpolar gyre emerges as the region with the largest amount of predictability on timescales longer than a year (panels a and i). Small errors are also exhibited in the tropics, extending eastwards towards the Iberian Peninsula. The US east coast stands out as the only section of coastline which borders a region with relative RMSEs of less than 0.8 on interannual timescales. These results are in agreement with those found by Nonaka et al. (2016), where a lack of any predictability on timescales longer than a few months is also found in the Kuroshio.

4 Predictable patterns: optimal initial conditions and average predictability time

The spatio-temporal structure of the predictability can also be analysed by explicitly identifying any patterns which are predictable on interannual timescales. Two methods are now used: (1) an examination of the growth of optimal initial conditions leading to a maximum increase of variance and (2) a decomposition of the system into predictable components, ranked by their relative contributions to the total average predictability time present.

4.1 Non-normal mode analysis and optimal initial conditions

The characteristics of the trained linear model can be used to infer information about a system’s sensitivity to initial conditions. In a series of papers, Farrell and co-authors developed a methodology, generalised linear stability theory, to investigate the transient behaviour resulting from initial perturbations to a system’s mean state (Farrell and Ioannou 1996). This methodology has been used to examine a range of geophysical problems including: Couette flow (Farrell 1982), atmospheric forecast error growth (Farrell 1990), quasi-geostrophic turbulence (Farrell and Ioannou 1995), the El Niño Southern Oscillation (Penland and Sardeshmukh 1995), Gulf Stream dynamics (Farrell and Moore 1992) and the Atlantic meridional overturning circulation (Zanna and Tziperman 2005, 2008; Hawkins and Sutton 2009).

This analysis investigates the transient growth in linearly stable fluid dynamical systems. It may appear counter-intuitive that there can exist disturbances which lead to growth in a stable system. However, when the operator A is non-normal, i.e. \(\mathbf{AA}^{\mathbf{T }}\ne \mathbf{A}^{\mathbf{T }}\mathbf{A }\), it is possible for the eigenmodes of the system to interact and give a large amplification of variance at a finite time (Farrell and Ioannou 1996). The solutions to

$$\begin{aligned} \frac{d\mathbf {P}}{dt} = \mathbf {A}\mathbf {P}(t), \end{aligned}$$
(9)

can be written in terms of the eigenvectors, \(\mathbf {e}_i\) as

$$\begin{aligned} \mathbf {P}(t)=\sum _{i}\mathbf {e}_ia_i\exp (\lambda _i t), \end{aligned}$$
(10)

where \(\lambda _i\) are the eigenvalues of \(\mathbf {A}\) and the \(a_i\) are complex constants. The SSH anomaly growth at time \(\tau \) by non-normal eigenmode interference is given by

$$\begin{aligned} \begin{aligned} \mu (\tau )= \frac{\mathbf{P}(\tau )^{\mathbf {T}} \mathbf {P}(\tau )}{\mathbf {P}(0)^{\mathbf {T}} \mathbf {P}(0)}. \end{aligned} \end{aligned}$$
(11)

The longest timescale on which this growth occurs can be thought of as an optimistic upper bound on the predictability of linear events without forcing. The corresponding spatial patterns, which lead to a maximum growth at a time \(\tau \), are called the optimal initial conditions and are given by calculating the leading singular vector of \(\mathbf B (\tau )\). In this section, the LIM constructed using 18 month temporally smoothed principal components is used as it exhibits skillful forecasts on interannual timescales.
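The calculation described above can be sketched with a singular value decomposition of the propagator. In the example below the growth is measured with the Euclidean norm in PC space, as in Eq. 11; the optimal initial condition is then the leading right singular vector of \(\mathbf B (\tau )\) and the maximum amplification is the square of the leading singular value. The function name is illustrative.

```python
import numpy as np


def optimal_initial_condition(b):
    """Maximum amplification and optimal pattern for a propagator B(tau).

    Returns (mu, p_init, p_final) where mu is the largest possible value of
    |P(tau)|^2 / |P(0)|^2, p_init is the unit-norm initial condition that
    achieves it, and p_final = B(tau) @ p_init is its evolved state.
    """
    b = np.asarray(b, dtype=float)
    u, s, vt = np.linalg.svd(b)
    p_init = vt[0]              # leading right singular vector
    p_final = b @ p_init        # equals s[0] * u[:, 0]
    return float(s[0] ** 2), p_init, p_final
```

Repeating the calculation over a range of \(\tau \) gives a maximum amplification curve. The sign of the optimal pattern is arbitrary, since the negative of a singular vector yields the same growth.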

Fig. 7

a The maximum amplification curve (Eq. 11). b The optimal initial condition with its tripolar pattern. c The optimal at 10 months. d The optimal at 20 months, its state of maximal growth. e The initial optimal in just the area bordering the US east coast. f The optimal at 20 months in just the area bordering the US east coast. The black and green dotted ellipses indicate the regions which are correlated with monthly means from the model output. The ellipses are characteristic anomalies described in the text. The black dotted lines indicate the 0 m contour in the time mean SSH

Fig. 8

a The initial optimal added to the mean field. b The initial optimal propagated forward in time by 20 months added to the mean field. c The negative version of the initial optimal added to the mean field. d The negative version of the initial optimal propagated forward in time by 20 months added to the mean field. The black dotted lines indicate the 0 m contour in the time mean SSH. The green lines denote the SSH 0 m contour after 20 months when an optimal initial condition of double the magnitude of that shown in Fig. 7b evolves

Fig. 9

Anomalies in the zonal geostrophic velocities calculated from the SSH of: a the initial optimal, b the initial optimal propagated 10 months forward in time, c the initial optimal propagated 20 months forward in time (all with the same magnitudes as the patterns shown in Fig. 7)

The curve depicting the growth of SSH anomalies, \(\mu (\tau )\), is shown in Fig. 7a. The perturbations can grow through non-normal interactions on timescales of up to 100 months, with the maximum growth occurring at 20 months. The optimal initial condition pattern in SSH, which leads to the largest growth in SSH anomalies after 20 months, is shown in Fig. 7b. This pattern has a very weak gyre-scale tripolar structure, reminiscent of EOF 1 (shown in Fig. 4e). It has two notable smaller-scale features: a tripolar structure off Cape Hatteras (situated at \(32.5{^{\circ }}\)–\(42.5{^{\circ }} \hbox {N}\), \(67{^{\circ }}\)–\(74.55{^{\circ }} \hbox {W}\), shown by the green ellipse in panel e) and a single-sign SSH anomaly along the US east coast (black ellipse, panel e). The propagated optimal initial condition is shown in Fig. 7c, d, at 10 and 20 months, respectively. After 10 months, the SSH anomaly along the boundary no longer has a single sign. There is an increase in SSH along the path of the Gulf Stream and in the subtropical recirculation gyre, and a decrease in SSH in the subpolar gyre. After 20 months, the SSH anomaly along the Gulf Stream path has grown to double its initial magnitude. The magnitude of the SSH anomalies in the subpolar and subtropical gyres is also seen to increase significantly. One interpretation of these optimal initial conditions is that it is especially important to constrain the position of the Gulf Stream separation in the initial conditions, as initial errors in this region lead to gyre-scale errors within 10–20 months. However, it is also possible that it is the weaker gyre-scale pattern present in the optimal initial conditions which leads to this growth, as SSH anomalies can be integrated by the gyre circulation on interannual timescales.

Figure 8 shows the positive optimal initial condition (of the same magnitude as that shown in Fig. 7b) and its evolution after 20 months when it is added to the mean SSH field. It also shows the negative version of the optimal initial condition added to the mean field, which is an equally valid solution since the evolution is linear. The initial and propagated versions of the positive optimal initial condition (Fig. 8a, b) demonstrate an increase in the strength of the subpolar gyre, as well as an increase in the SSH gradient across the Gulf Stream. The change in the SSH gradient is linked to variations in the geostrophic transport along the Gulf Stream path, shown in Fig. 9. The resultant geostrophic velocity anomalies act in different directions in the two gyres and are particularly evident in the subtropical gyre. The SSH 0 m contour is also seen to be shifted to a higher latitude. However, this is a marginal effect, as shown by the contours in Fig. 8b (less than a degree in latitude, for an initial perturbation with double the magnitude of that shown in Fig. 7b). The evolution of the negative optimal initial conditions, shown in Fig. 8c, d, demonstrates an increase in SSH along the US east coast north of Cape Hatteras, as well as a southward-shifted Gulf Stream detachment point. The SSH gradient across the Gulf Stream is also lower, indicating a decrease in Gulf Stream transport. Panel d shows that the SSH 0 m contour’s position can move significantly southward (approximately \(5{^{\circ }}\) in latitude, for an initial perturbation with double the magnitude of that shown in Fig. 7b) and that the subtropical gyre contracts to the west of the basin.

The initial conditions associated with timescales (\(\tau \)) ranging from 10 to 30 months are also calculated and compared to the optimal calculated at the maximum amplification time. The spatial correlations between these initial patterns are found to be at least 0.8, and the patterns behave in a qualitatively similar manner when propagated in time. The optimal initial conditions calculated from similar models with differing numbers of EOFs exhibit small-scale (\(1/2{^{\circ }}\)) spatial differences in the Gulf Stream’s extension; however, both the signal along the US east coast and the tripolar pattern appear robust. Moreover, the propagated optimals all resemble that shown in Fig. 7d.

4.2 Optimal initial conditions occurring in the model output

It is important to determine how often the optimal initial conditions and their evolved patterns are realised in the model output. Figure 10a shows the projections of the initial states on the model output (at each time t, the projection is the product of \(\mathbf {P}(t)\) and the optimal initial condition, i.e. the leading singular vector of \(\mathbf {B}\)), as well as the projections of the evolved initial conditions 20 months later. The growth in these projections is seen to be close to that predicted by the maximum amplification curve (Fig. 7a). The occurrences of the tripolar SSH pattern, seen in the optimal initial condition, are detected by using an algorithm which calculates the 2D correlation coefficients between the monthly mean SSH anomalies and the optimal initial condition in the relevant area (the green ellipse in Fig. 7e). Monthly means with spatial correlations greater than 0.8 are retained. Out of the 1800 monthly-mean SSH anomalies comprising the model output, 404 are found to display a tripolar anomaly structure off Cape Hatteras. Of those 404 tripolar anomaly patterns, 310 (77%) lead to SSH anomaly growth along the US east coast 15–20 months later (as in Fig. 7f, green ellipse). This growth is detected by evaluating the change in sign of the SSH anomalies (in the green ellipse in Fig. 7f). About 140 (out of 310) also display a change in sign of SSH along the coast (as in Fig. 7f, black ellipse). The occurrence of these optimal evolutions in the model output indicates that changes in the SSH in the Gulf Stream near its detachment point are potentially important in predicting SSH variations along the US east coast.
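
The pattern-matching step described above can be sketched as follows. This is a minimal illustration rather than the algorithm actually used: it assumes a centred (Pearson) correlation over the grid points inside a region mask, and the function names and the list-of-lists data layout are hypothetical.

```python
import math


def pattern_correlation(field, template, mask):
    """Pearson correlation between two 2D maps over points where mask is True."""
    x = [f for row_f, row_m in zip(field, mask)
         for f, m in zip(row_f, row_m) if m]
    y = [t for row_t, row_m in zip(template, mask)
         for t, m in zip(row_t, row_m) if m]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # correlation is undefined for a constant map; treat as no match
    return cov / math.sqrt(var_x * var_y)


def matching_months(anomalies, template, mask, threshold=0.8):
    """Indices of monthly anomaly maps whose masked correlation exceeds threshold."""
    return [i for i, field in enumerate(anomalies)
            if pattern_correlation(field, template, mask) > threshold]
```

Here `template` would hold the optimal initial condition, `mask` the region of interest (for example the green ellipse), and `anomalies` the sequence of monthly mean SSH anomaly maps.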

Fig. 10

Projections of the initial and final states on the model output. The black line is a linear fit to these projections, and the red line is a line with a slope corresponding to \(\mu ^{1/2}\) (\(\tau = 20\) months)

4.3 Average predictability time

We complement the analysis of the optimal patterns, which depends on the target timescale, by examining predictable patterns that persist over all timescales, and are therefore the most predictable over a range of target times (DelSole and Tippett 2007). This is done by calculating the average predictability time (APT) (DelSole and Tippett 2009a). This index of predictability is based on the Mahalanobis signal (DelSole and Tippett 2007),

$$\begin{aligned} S(\tau ) = \frac{1}{k} tr[({\varvec{\varSigma }}_\infty - {\varvec{\varSigma }}_\tau ) {\varvec{\varSigma }}_\infty ^{-1}], \end{aligned}$$
(12)

where k is a constant related to the number of principal components used in the analysis, tr is the trace of the matrix, \({\varvec{\varSigma }}_\tau \) is the covariance matrix of the forecast error at lead time \(\tau \), and \({\varvec{\varSigma }}_\infty \) is the covariance matrix of the forecast distribution at long lead times (the climatological covariance matrix). Here, \(S(\tau )\) has a value of 1 when the system is completely predictable, and a value of 0 when the forecast covariance matrix is the same as the climatological covariance matrix, meaning the system is unpredictable. This method has been used before to examine the predictability of several geophysical fields, including the upper ocean temperature and the AMOC (Branstator et al. 2012; Branstator and Teng 2014).

The APT can be defined by integrating the Mahalanobis signal over all lead times (DelSole and Tippett 2009a), leading to

$$\begin{aligned} APT=2 \sum _{\tau =1}^{\infty } S(\tau ). \end{aligned}$$
(13)

The factor of two makes APT agree with the e-folding time in the univariate case. In one dimension, APT resembles a root mean square error and is given by (DelSole and Tippett 2009b)

$$\begin{aligned} APT=2\sum _{\tau =1}^{\infty } \frac{\sigma _{\infty }^{2}-\sigma _{\tau }^{2}}{\sigma _{\infty }^{2}} =2\sum _{\tau =1}^{\infty } \bigg ( 1-\frac{\sigma _{\tau }^{2}}{\sigma _{\infty }^{2}} \bigg ). \end{aligned}$$
(14)
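
As a worked illustration (not part of the original analysis), consider a first-order autoregressive process \(x_{t+1}=\phi x_t+\epsilon _t\) with \(0<\phi <1\). The symbols \(\phi \) and \(T\), with \(\phi =\exp (-1/T)\) so that \(T\) is the e-folding time of the autocorrelation, are introduced here only for this example. The lead-\(\tau \) regression forecast is \(\phi ^{\tau }x_t\), so that \(\sigma _{\tau }^{2}=\sigma _{\infty }^{2}(1-\phi ^{2\tau })\), and Eq. 14 gives

$$\begin{aligned} APT=2\sum _{\tau =1}^{\infty }\phi ^{2\tau }=\frac{2\phi ^{2}}{1-\phi ^{2}}=\frac{2}{\exp (2/T)-1}\approx T-1 \quad \text {for } T\gg 1, \end{aligned}$$

which is close to the e-folding time \(T\). In the continuous-time limit the sum becomes \(2\int _0^\infty \exp (-2t/T)\,dt=T\) exactly.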

Since APT is the integral of predictability over all lead times, it is independent of the chosen lead time. This measure can also be used to define predictable components by finding the projection vectors \(\mathbf {q}\) that maximize APT. In this case, the component \({\mathbf{q }^{\mathbf{T }}\mathbf{P }}\), with \(\mathbf {P}\) being the principal component state vector, has forecast and climatological variances given by \( \sigma _{\tau }^{2}=\mathbf {q}^{\mathbf {T}} {\varvec{\varSigma }}_{\tau }\mathbf {q}\) and \(\sigma _{\infty }^{2}=\mathbf {q}^{\mathbf {T}} {\varvec{\varSigma }}_{\infty }\mathbf {q}\), respectively.

In this study, the APT of the whole system and of the leading predictable components are calculated using the method of DelSole and Tippett (2009b). Firstly, to prevent overfitting, the data are separated into training and verification data sets, in the same manner as described previously. Forecasts are then generated by forming linear regression models from the training data (DelSole and Tippett 2009b), i.e. the projections \(\hat{\mathbf {P_L}}(t+\tau )\) are given by

$$\begin{aligned} \hat{\mathbf {P_L}}(t+\tau )=\mathbf {C}(\tau )\mathbf {C}(0)^{-1}\mathbf {P}(t). \end{aligned}$$
(15)

Using such models and in the case of a zero-mean stationary process, meaning \(\mathbf {C}(0)={\varvec{\varSigma _{\infty }}}\), the forecast error covariance matrix is given by

$$\begin{aligned} {\varvec{\varSigma }}_\tau =\mathbf {C}(0)- \mathbf {C}(\tau )\mathbf {C}(0)^{-1}\mathbf {C}(\tau )^{T}. \end{aligned}$$
(16)

These values for \({\varvec{\varSigma }}_\infty \) and \({\varvec{\varSigma }}_\tau \) can be substituted into Eq. 12, and the resulting signal summed as in Eq. 13, to calculate the APT of the entire system. Maximizing the APT in Eq. 14 reduces to solving the generalized eigenvalue problem (see DelSole and Tippett (2009b) for a full derivation),

$$\begin{aligned} \mathbf {Gq}=\lambda \mathbf {C}(0)\mathbf {q} \end{aligned}$$
(17)

where

$$\begin{aligned} \mathbf {G}=\sum _{\tau =1}^{\infty } \mathbf {C}(\tau )\mathbf {C}(0)^{-1}\mathbf {C}(\tau )^{T}. \end{aligned}$$
(18)
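
The step from Eq. 14 to Eq. 17 can be made explicit; the following is a sketch that uses only the definitions given above. Substituting \(\sigma _{\tau }^{2}=\mathbf {q}^{T}{\varvec{\varSigma }}_{\tau }\mathbf {q}\), \(\sigma _{\infty }^{2}=\mathbf {q}^{T}\mathbf {C}(0)\mathbf {q}\) and Eq. 16 into Eq. 14 gives

$$\begin{aligned} APT(\mathbf {q})=2\sum _{\tau =1}^{\infty }\frac{\mathbf {q}^{T}\mathbf {C}(\tau )\mathbf {C}(0)^{-1}\mathbf {C}(\tau )^{T}\mathbf {q}}{\mathbf {q}^{T}\mathbf {C}(0)\mathbf {q}} =\frac{2\,\mathbf {q}^{T}\mathbf {G}\mathbf {q}}{\mathbf {q}^{T}\mathbf {C}(0)\mathbf {q}}, \end{aligned}$$

which is a Rayleigh quotient. Its stationary points are the solutions of Eq. 17, and the APT associated with each eigenvector is \(2\lambda \), so ranking the eigenvalues ranks the components by their predictability.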

The components associated with the projection vectors \(\mathbf {q}\) are uncorrelated with each other because \(\mathbf {G}\) and \({\varvec{\varSigma _{\infty }}}\) are symmetric. The spatial patterns, \(\mathbf {p}\), associated with the projection vectors \(\mathbf {q}\) are found by using

$$\begin{aligned} \mathbf{p}={\langle \mathbf{PP }^{\mathbf{T }}\mathbf{q }\rangle } =\varvec{\varSigma }_\infty \mathbf{q}; \end{aligned}$$
(19)

these spatial patterns can be projected back onto the EOFs and are referred to as the predictable components. The predictable components of the system are calculated using only the training data set. To prevent overfitting and calculate the APT of each predictable component, the projection vector, \(\mathbf {q}\), calculated from the training data is applied to the verification data set. Thus, the squared multiple correlation between the component time series and the verification data is

$$\begin{aligned} \mathbf {R}_{\tau }^2=\frac{\mathbf {q}^{T}\mathbf {C}(\tau )\mathbf {C}(0)^{-1}\mathbf{C}(\tau )^{T}\mathbf {q}}{\mathbf {q}^{T}\mathbf {C}(0)\mathbf {q}}, \end{aligned}$$
(20)

where \(\mathbf{q}\) is calculated from the training data set and the correlations are calculated from the verification set. Therefore, \(\mathbf {R}_{\tau }^2\) can be interpreted as the fraction of the variance of the predictable component time series that is explained by a linear regression prediction at time lag \(\tau \). The predictability time of each component, \(APT_p\), is then calculated as

$$\begin{aligned} APT_p=2\sum _{\tau =1}^{\infty } \mathbf {R}_{\tau }^2. \end{aligned}$$
(21)
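
For intuition, Eqs. 20 and 21 can be evaluated for a single time series, in which case \(\mathbf {C}(\tau )\) is a scalar autocovariance and \(\mathbf {R}_{\tau }^2\) reduces to the squared lag-\(\tau \) autocorrelation. The sketch below covers only this univariate special case, not the multivariate calculation with separate training and verification sets used in the study. The truncation of the infinite sum at `max_lag` and the function names are assumptions made for illustration.

```python
def autocovariance(x, lag):
    """Biased sample autocovariance of a sequence at a non-negative lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def apt_univariate(x, max_lag):
    """Truncated Eq. 21 for one time series.

    With a single variable, Eq. 20 becomes R_tau^2 = (c(tau) / c(0)) ** 2.
    Returns the truncated APT and the list of R_tau^2 values.
    """
    c0 = autocovariance(x, 0)
    r2 = [(autocovariance(x, lag) / c0) ** 2 for lag in range(1, max_lag + 1)]
    return 2.0 * sum(r2), r2
```

In practice `max_lag` would need to be large enough for the squared correlations to have decayed, but short enough that the sample estimates are still reliable.
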
Fig. 11

a Predictability measured by the Mahalanobis signal (\(S_{\tau }\) as defined in Eq. 12) of the whole system (blue line), created using linear regression models and the leading 25 principal components. The envelope denotes the spread in predictability measured by the Mahalanobis signal (\(S_{\tau }\)) for each of the leading predictable components. The solid black line represents the 5% significance level calculated using a Student’s t-test. b The average predictability time (APT) for the first 25 predictable components. The orange line indicates the 5% significance level estimated using the Monte Carlo method discussed in the Appendix. The spatial patterns of the c first, d second and e third predictable components (\(\mathbf {p}\)), ranked in order of their values of APT. The associated time series for the f first, g second and h third predictable components (\(\mathbf {q}^{\mathbf {T}}\mathbf {P}\))

Figure 11a shows that the predictability of the SSH of the whole system, measured by the Mahalanobis signal (solid blue line), diminishes rapidly, reaching a value of approximately 0.25 after 10 months. This is in approximate agreement with the timescale found in the LIM study. However, Fig. 11a (blue shaded region) also shows that several of the individual components of the system have Mahalanobis signals which decay on longer timescales. The APT of the leading 25 predictable components, shown in panel b, confirms that several components demonstrate predictability on timescales longer than 2 years. The three leading predictable components (those which have the largest values of APT) have average predictability times of 26–28 months. The corresponding spatial patterns of the leading three components are shown in panels c, d, and e. The pattern which is associated with the largest value of APT (panel c) has large SSH magnitudes in the jet extension region. The second pattern (panel d) is localised mainly to the US east coast and Gulf Stream extension region, whereas the third pattern (panel e) is similar to the evolved optimal initial conditions (Fig. 7) and EOF1. The time series related to each of these spatial patterns, shown in panels f, g and h, all display interannual variability and have autocorrelation times of 30–60 months.

The similarities between the third leading predictable component, the evolved optimal initial conditions, and EOF1 indicate some robustness of the constructed predictability patterns. The time series associated with the third predictable component correlates strongly with the leading principal component at zero lag. The leading predictable patterns (1 and 2) are not merely EOF1, highlighting that the mode capturing most of the variance is not necessarily the most predictable.

4.4 The influence of atmospheric forcings on the predictable components

The steric component dominates interannual variability in SSH in the North Atlantic. Roberts et al. (2016) confirmed that this is also the case in an ocean only component of HadGEM3 (forced NEMO simulation) and that it is the thermosteric and wind-driven components which contribute most to the interannual variability in the subtropical gyre. In the subpolar gyre, the variability is caused by both the thermosteric and halosteric components and is dominated by the response of the ocean to variations in the buoyancy forcings. Furthermore, in the Gulf Stream region, the variations due to intrinsic ocean processes are also important (Penduff et al. 2011; Sérazin et al. 2015).

Attempts are now made to establish the dynamical origin of the predictable patterns by examining their relationships with fields relating to the wind and buoyancy driven circulations, namely the Ekman components of SSH and the net heat fluxes. The interannual variability detected is likely related to the oceanic adjustment to variations in these forcings. The net heat fluxes contribute to the thermosteric buoyancy forced component of SSH, and the wind stresses contribute to the steric advective components.

The SSH, meridional and zonal wind stress fields are used to decompose the ocean currents into the associated geostrophic \(\mathbf {u_g}=(u_g,v_g)\) and Ekman components \(\mathbf {u_e}=(u_e,v_e)\),

$$\begin{aligned} \mathbf {u}=\mathbf {u_{g}}+\mathbf {u_{e}}. \end{aligned}$$
(22)

The Ekman components are calculated from,

$$\begin{aligned} v_{e} =-\frac{\tau ^{x}_{s}}{f\rho _{0}d_{Ek}} \quad \text {and}\quad u_{e} =\frac{\tau ^{y}_{s}}{f\rho _{0}d_{Ek}}, \end{aligned}$$
(23)

where f is the Coriolis parameter, and \(\tau ^{x}_{s}\) and \(\tau ^{y}_{s}\) are the zonal and meridional components of the wind stress, \({\varvec{\tau _s}}\), at the ocean’s surface. The density, \(\rho _{0}\), and the Ekman depth, \(d_{Ek}\), are taken to be constants of \(1025\, {\text {kg/m}}^{3}\) (a typical value at the surface of the North Atlantic; Wang et al. 2010) and 100 m (a typical value in the subtropical gyre in winter; Stommel 1979), respectively, as most of the variability in the Ekman velocity component is due to variations in the wind stress. The Ekman pumping velocity is also calculated as

$$\begin{aligned} w_{e} = \frac{1}{f\rho _{0}}(\nabla \times {\varvec{\tau _s}}), \end{aligned}$$
(24)

and the geostrophic currents are calculated as

$$\begin{aligned} v_{g} =\frac{g}{af\cos \phi }\frac{\partial \eta }{\partial \lambda } \quad \text {and}\quad u_{g} =-\frac{g}{af}\frac{\partial \eta }{\partial \phi }, \end{aligned}$$
(25)

where \(\eta \) is the monthly mean SSH, a is the radius of the Earth, \(\phi \) is latitude, \(\lambda \) is longitude and g is the gravitational acceleration.
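
Equations 23 and 25 can be evaluated directly, as in the following minimal sketch. The values of the Earth's rotation rate, gravitational acceleration and radius are standard constants assumed here, the centred finite differences on a regular latitude-longitude grid are an illustrative choice, and the function names are hypothetical.

```python
import math

OMEGA = 7.2921e-5        # Earth's rotation rate (rad/s); assumed standard value
G = 9.81                 # gravitational acceleration (m/s^2); assumed value
EARTH_RADIUS = 6.371e6   # radius of the Earth (m); assumed value
RHO_0 = 1025.0           # reference density (kg/m^3), as in the text
D_EK = 100.0             # Ekman depth (m), as in the text


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def ekman_velocity(tau_x, tau_y, lat_deg):
    """Ekman velocities (u_e, v_e) in m/s from wind stress in N/m^2 (Eq. 23)."""
    scale = coriolis(lat_deg) * RHO_0 * D_EK
    return tau_y / scale, -tau_x / scale


def geostrophic_velocity(eta, lats_deg, lons_deg, j, i):
    """Geostrophic (u_g, v_g) in m/s at interior grid point (j, i) (Eq. 25).

    eta[j][i] is SSH in metres; derivatives use centred differences.
    """
    phi = math.radians(lats_deg[j])
    f = coriolis(lats_deg[j])
    dphi = math.radians(lats_deg[j + 1] - lats_deg[j - 1])
    dlam = math.radians(lons_deg[i + 1] - lons_deg[i - 1])
    deta_dphi = (eta[j + 1][i] - eta[j - 1][i]) / dphi
    deta_dlam = (eta[j][i + 1] - eta[j][i - 1]) / dlam
    u_g = -G / (EARTH_RADIUS * f) * deta_dphi
    v_g = G / (EARTH_RADIUS * f * math.cos(phi)) * deta_dlam
    return u_g, v_g
```

For example, under these assumptions an SSH drop of 0.1 m over two degrees of latitude centred on \(38{^{\circ }} \hbox {N}\) corresponds to an eastward geostrophic velocity of roughly 5 cm/s.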

These fields and the net heat fluxes are then regressed against the normalized time series of the three leading predictable components (Fig. 11). The fields are smoothed with a 6-month running mean before regression, to focus on the interannual variability. The regression coefficients of the geostrophic currents and the first predictable component are shown in Fig. 12. The regression coefficients with SSH are also shown as contours. These show a westward propagation of SSH in the subtropical gyre. The meridional geostrophic velocities have large regression coefficients with the leading predictable component in the subpolar gyre along the Canadian east coast. At lags where the current leads the predictable component time series, there is also a positive signal at the Gulf Stream’s detachment point. The lack of a clear lead–lag relationship here makes causality hard to distinguish. However, from these strong correlations, it is apparent that there is significant interannual predictability present in the western boundary current in the subpolar gyre. There are no apparent changes in the large-scale patterns of the zonal geostrophic current regression coefficients. These coefficients are large in the subtropical gyre and in the Gulf Stream region. The regression coefficients relating to the net heat fluxes have a sizeable dipolar pattern in the Gulf Stream region at all times. However, in the east of the subpolar gyre, a strong signal appears at lead times of 15 months.
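
The smoothing and time-lagged regression at a single grid point can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used: the sign convention for `lead`, the use of a population standard deviation for normalization and the function names are all hypothetical, and the significance filtering (p values below 0.05) is omitted.

```python
import math


def running_mean(x, window):
    """Mean over every full window of `window` consecutive values."""
    return [sum(x[k:k + window]) / window for k in range(len(x) - window + 1)]


def normalize(x):
    """Remove the mean and scale to unit (population) standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x]


def lagged_regression(field, index, lead):
    """Least-squares slope of field(t) regressed on index(t + lead).

    lead > 0: the field leads the index by `lead` time steps;
    lead < 0: the field lags the index.
    Both sequences must have the same length.
    """
    if lead >= 0:
        y, x = field[:len(field) - lead], index[lead:]
    else:
        y, x = field[-lead:], index[:len(index) + lead]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var
```

Because the index is normalized, the slope has the units of the field (for example m/s per standard deviation of the predictable component), which matches the units quoted for the regression maps.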

The regression coefficients of the leading predictable component with the Ekman currents are shown in Fig. 13. There is an apparent time-lagged relationship present, with variations in the Ekman currents leading the predictable component strongly on timescales of up to 15 months. The Ekman currents are associated with the large-scale Sverdrup transport within the wind-driven gyres. These regression coefficients imply that wind-driven variations in the gyre circulations lead to a predictable change in SSH on interannual timescales. The associated gyre-scale variations in the Ekman currents translate to variations in SSH in the Gulf Stream extension and subpolar gyre regions. This result also indicates that interannual forecasts of SSH could be improved by better representing the zonal and meridional wind stress fields on longer than monthly timescales.

It is difficult to discern anything about the variability of the time-lagged geostrophic regression coefficients and predictable component 2 (the figures relating to the regression analysis for the second and third predictable components are contained in the supplementary information). However, there is a signal in the regression coefficients of the net heat fluxes which leads the predictable component by 8–15 months. This signal is located in the Gulf Stream extension region. The regression coefficients relating to the zonal and meridional Ekman components also strongly lead the predictable component and are related to changes in the wind stress in the east of the North Atlantic and the subpolar gyre.

The third predictable component’s regression with the Ekman components demonstrates a clear lead–lag relationship. The Ekman components are seen to lead variations in the predictable component. The patterns of the regression coefficients are large scale and sizeable in the west of the basin. The meridional components of the Ekman currents can be interpreted as causing a convergence or divergence of SSH in the Gulf Stream region as the gyres react to changes in the wind stress. There are also variations in the net heat fluxes in the subpolar gyre, which lead a change in the predictable component.

Therefore, it is concluded that predictable component one is largely a response to variations in the wind at the latitudes of the Gulf Stream. Predictable component two is likely due to variations in both the net heat fluxes and the wind stress in the subpolar gyre and in the Gulf Stream extension region. Finally, predictable component three is likely due to the oceanic adjustment resulting from a combination of both variations in the wind stress in the subpolar gyre and east of the ocean basin, and the response to variations in the net heat fluxes in the subpolar gyre. All three patterns show that changes in the atmospheric forcings lead large-scale predictable patterns of SSH. Even though the variations in the wind stress and net heat fluxes are unpredictable at times longer than a few months, the ocean’s adjustment to them appears to be predictable on timescales of approximately 1–2 years. However, further experiments and analysis are needed to determine how the dynamical processes present generate the diagnosed predictable components.

Fig. 12

Time-lagged linear regression coefficients between the time series of the first predictable component and monthly mean zonal and meridional geostrophic currents, created using 150 years of control run output. Only regression coefficients with p values of \(< 0.05\) are retained. Units of the regression are m/s for those involving the geostrophic components and \({\text {W/m}}^{2}\) for those investigating the net heat fluxes. The contours show the time-lagged regression coefficients of the predictable component and SSH at 0.1 m intervals. The maximum regression coefficient magnitude is 0.3 m. The green contours denote positive regression coefficients, whereas the purple contours denote negative coefficients

Fig. 13

Time-lagged linear regression coefficients between the time series of the first predictable component and monthly mean zonal and meridional Ekman currents and the Ekman pumping/suction velocity, created using 150 years of control run output. Only regression coefficients with p values of \(< 0.05\) are retained. Units of the regression are m/s. The contours show the time-lagged regression coefficients of the predictable component and SSH at 0.1 m intervals. The maximum regression coefficient magnitude is 0.3 m. The green contours denote positive regression coefficients, whereas the purple contours denote negative coefficients

5 Summary and discussion

The predictability of SSH in the North Atlantic in a control run of a fully coupled model (HadGEM3) was evaluated using methods based on linear inverse modeling and average predictability time. The key findings from this study include:

  • Predictability of SSH in the subpolar gyre and along the west coast of the Atlantic basin on timescales of up to 20 months (using LIM).

  • Short predictability times in the Gulf Stream extension region (5–10 months).

  • Optimal initial conditions resulting in regional SSH changes in the subpolar and subtropical gyres and a change in SSH gradient along the Gulf Stream’s extension over a timescale of approximately 20 months. The optimals consist of a weak large-scale SSH tripole, with a stronger signal at the Gulf Stream’s detachment point.

  • Large-scale predictable patterns, characterized by SSH variations of order 5–10 cm along the US east coast and extending to the gyre scale, which are predictable on timescales of 26–28 months. These predictable patterns are calculated using a complementary method to LIM, namely APT, which is independent of target time.

  • These predictable components correlate significantly with persistent, large-scale, evolving features in SSH, which appear to be induced by wind and heat flux forcing in the preceding 8–15 months.

To our knowledge, this is the first analysis of SSH using such a comprehensive set of linear predictability methods. These linear methods provide a computationally inexpensive alternative to ensemble modelling techniques (Hawkins and Sutton 2009). There is an expected trade-off between a linear approximation of the dynamical system and computational savings. However, the use of temporal filtering or averaging appears to improve interannual predictions, most likely because the direct influence on the variability from the strongly nonlinear ocean mesoscale is removed. For example, optimal initial conditions of SSH identified by the non-normal mode analysis, when pattern matched in the full, nonlinear forward model, evolve at 15–20 months as predicted by the linear method for almost 80 % of events.

The interannual ocean variability in mid-latitude jet extensions is dominated by the intrinsic component (Sérazin et al. 2015). Our study shows that the variability in the Gulf Stream extension is not generally predictable on interannual timescales, as the predictability of SSH in the turbulent jet extension regions is limited to less than 5–10 months (in agreement with Nonaka et al. (2016) and Roberts et al. (2016)). However, in the subpolar gyre and in areas of the subtropical gyre, significant predictive skill was found on timescales of over a year. In these regions, predictability in SSH and ocean dynamics might be enhanced via atmospheric forcing integrated over large-scale regions (Cabanes et al. 2006). In addition, SSH predictability along the US east coast might be influenced by the position and strength of the Gulf Stream. The timescales and patterns of predictability of SSH in the North Atlantic derived from statistical forecasts trained on model output are comparable to those in Roberts et al. (2016), diagnosed using multiple runs of a fully dynamical model, indicating that the results are robust to the method chosen.

The maximum amplification of the optimal initial conditions occurs 20 months after initialisation, which can be used as a predictability timescale and results in several different effects. Firstly, the amplification can lead to a doubling in the magnitude of the initial SSH anomalies in the Gulf Stream region (e.g., an optimal initial condition perturbation with such a tripolar structure and amplitude 3 cm can propagate to give anomalies of 6 cm along the Gulf Stream path). Secondly, the amplification acts to increase (or decrease, depending on the sign) the SSH gradient across the Gulf Stream, leading to a geostrophic velocity anomaly of the order of 10 cm/s (for an optimal perturbation, P, with pattern and magnitude as in Fig. 7b). Moreover, these changes can lead to a meridional shift of the SSH 0 m contour by several degrees in latitude (a southward shift of approximately \(5{^{\circ }}\) for a perturbation \(-2 P\); see also Fig. 8b, d). The optimal perturbation, P, results in a 5 cm increase in SSH along the US east coast at latitudes of \(30{^{\circ }}\)–\(40{^{\circ }} \hbox {N}\) (5 cm decrease for a perturbation \(-P\)). This change is large compared to the recent (1993–2009) rate of global mean SSH rise from satellite altimetry, which is \(3.2\pm 0.4\, {\text {mm}}\, {\text {year}}^{-1}\) (Church and White 2011): 5 cm corresponds to about 16 years of rise at that rate.

In order to investigate the dynamical evolution of the optimal initial condition the climate model could be restarted with the optimal initial conditions, and with parts of the optimal initial conditions masked (however, restarting the climate model with optimal initial conditions would require multivariate 3D restarts). Such experiments would help examine the role of the oceanic dynamical processes which lead to the growth of the initial conditions, and the relative importance of atmospheric noise and model error.

The optimal initial conditions also have implications for observations. The initial conditions found indicate that, to better constrain interannual predictions of SSH in the North Atlantic, it would be beneficial to incorporate a higher number of ocean observations (SSH, temperature and velocity fields) in the region near to the Gulf Stream’s detachment point. This area has already been the subject of many observational studies (e.g., Line W; Toole et al. 2011) and is well observed by altimetry (see Lillibridge and Mariano (2013) and references therein). Furthermore, initialisation of GCM ensemble simulations with the optimal initial conditions could provide a better estimate of initial condition uncertainty in SSH prediction (Zanna et al. 2018).

This study did not entirely decouple the effects of applying interannual external forcings from the intrinsic variability due to the mesoscale eddy fields. It also made use of a single model, and therefore the optimal initial conditions presented may be model specific. It would therefore be beneficial to examine the SSH predictability with a more extensive ensemble of model simulations, including those which isolate the effects of intrinsic processes (e.g., Sérazin et al. (2015) and Zanna et al. (2018)). Moreover, a comparison with altimetry or higher resolution model output may further elucidate the effects of the eddy field on interannual variability. It would be interesting to assess how a change in spatial resolution affects the diagnosed optimal initial conditions. Such a resolution change might impact the rectification and behaviour of the jets, and therefore the diagnosed mode of Gulf Stream variability. Finally, a series of idealised simulations, with selective timescales of the wind and buoyancy forcings, may aid in explaining the dynamical origin of the predictable components. Such ensembles already exist for a range of applications (Gregory et al. 2016; Roberts et al. 2016; Meyssignac et al. 2017). Alternatively, a probabilistic approach as described by Bessières et al. (2017) could be used to disentangle the forced and intrinsic variability components, thus better explaining the dynamical origin of the predictable patterns.