1 Introduction

Simulations performed with Ocean–Atmosphere General Circulation Models (OAGCMs) under the Coupled Model Intercomparison Project, phase 5 (CMIP5) (Taylor et al. 2012), provide climate predictions for the upcoming 100 years under the so-called Representative Concentration Pathway (RCP) emission scenarios. However, CMIP simulations suffer from severe limitations in predicting climate at short time horizons (< 10 years), as highlighted by the “hiatus” in global-mean surface temperature rise (Watanabe et al. 2013; Kosaka and Xie 2013; Meehl et al. 2014). To fill this gap, initialized near-term (or decadal) prediction systems have been developed (Meehl et al. 2009) to provide better predictions than uninitialized simulations (Bellucci et al. 2013; Karspeck et al. 2015) on seasonal-to-decadal timescales. Decadal prediction systems are initialized frequently (generally from every year to every 5 years) to improve the simulation of internal variability. The added value of the initialization stands out over the North Atlantic Ocean, where an accurate initialization of the Atlantic Meridional Overturning Circulation (AMOC) is thought to allow Atlantic Multidecadal Variability (AMV) to be predicted years in advance (Griffies and Bryan 1997; Boer 2000; Collins et al. 2006; Pohlmann et al. 2013). Thus, near-term climate prediction systems are useful tools to provide predictions at interannual-to-decadal timescales, which help to calibrate plans and actions related to climatic events due to climate variability and change (Hibbard et al. 2007; Cox and Stephenson 2007).

An additional motivation for using initialized decadal predictions comes from various studies that have successfully predicted regional climate on decadal time scales (Smith et al. 2007; Keenlyside et al. 2008; Pohlmann et al. 2009; Mochizuki et al. 2012; Bellucci et al. 2013, 2015; Chikamoto et al. 2013; Doblas-Reyes et al. 2013; Karspeck et al. 2015; García-Serrano et al. 2015; Monerie et al. 2017). The skill in predicting climate is primarily determined by the external forcing (i.e. the increase in the atmospheric well-mixed greenhouse gas concentration), especially for lead-times greater than 2 years (van Oldenborgh et al. 2012). Major volcanic events induce an abrupt cooling through the ejected volcanic aerosols (Guemas et al. 2013; Mehta et al. 2013) and are also a source of additional skill in retrospectively predicting temperature (Timmreck et al. 2016). The skill found in hindcasts owing to volcanic eruptions is, however, artificial since these events cannot be predicted, as highlighted by Meehl et al. (2009). A second source of skill is the ocean initialization, which can lead to additional skill, especially for the first years of the hindcasts (Yeager et al. 2012; Robson et al. 2012; Matei et al. 2012; Chikamoto et al. 2013; Doblas-Reyes et al. 2013; Monerie et al. 2017).

The skill in predicting surface air temperature (SAT) over land is generally weaker than for sea surface temperatures (SSTs) (as seen in Bellucci et al. 2013; among others) and oceanic heat content (Chikamoto et al. 2013) because SAT changes are associated with stochastic perturbations of the atmosphere while SST is associated with the higher inertia of the ocean. However, predicting SAT and precipitation over land is of particular interest for decision makers. Besides the impact of external forcings, evidence suggests that skill in predicting precipitation and SAT years ahead is linked to the ability of a climate model to simulate the interdecadal evolution of the SSTs and their associated remote impacts over land. For instance, high skill in predicting the West African Monsoon (WAM) evolution can be found due to a high skill in predicting the AMV (Gaetani and Mohino 2013; Mohino et al. 2016; Sheen et al. 2017). More precisely, Sheen et al. (2017) have shown that the DePreSys3 prediction system is able to predict the multidecadal evolution of the WAM, including the large drought of the 70s and 80s and the recovery of the 90s, due to the SST change over the North Atlantic Ocean and the Mediterranean Sea. The skill is thereby due to the ability of models to simulate the remote impact of the SSTs on a decadal time-scale. Such a mechanism implies that precipitation and SAT over land may be predictable owing to the potential predictability of the North Atlantic Ocean temperature.

SAT warmed substantially in North East Asia (NEA) during the mid-1990s, and more strongly than over the surrounding regions (Chen and Lu 2014), along with a decrease in precipitation (Zhu et al. 2011; Qi and Wang 2012; Chen and Lu 2014). The warming was about 1 °C in summer, which is larger than the standard deviation of summer SAT over NEA (0.6 °C), implying a strong signal-to-noise ratio; it is therefore an interesting case study to test the ability of a prediction system to predict SAT over land. The warming over NEA has been linked to a circulation change, in particular a positive anomaly of upper-level geopotential height (Zhu et al. 2011, 2012; Chen and Lu 2014). The change in atmospheric circulation is thought to have been influenced by remote SSTs. For example, Wu et al. (2016a, b) and Lin et al. (2016) have highlighted that the North Atlantic Ocean can influence East Asian climate through the modulation of the atmosphere by a Rossby wave that propagates eastward across the Northern Hemisphere, leading to a SAT change over East Asia. This mechanism is called the circumglobal teleconnection (CGT) pattern and is one of the two dominant modes of Northern Hemisphere extratropical upper-tropospheric atmospheric variability (Ding and Wang 2005). At the interannual time-scale the CGT pattern is associated with the El Niño–Southern Oscillation and the Indian summer monsoon (Ding and Wang 2005; Lin 2009; Ding et al. 2011), while at the interdecadal time-scale the CGT pattern is associated with the North Atlantic Ocean (Wu et al. 2016a; Lin et al. 2016). The wave-like structure of the CGT pattern is maintained by extratropical atmospheric internal dynamics (Yasui and Watanabe 2009; Ding et al. 2011). This mechanism may be very relevant for the recent NEA warming since Qian et al. (2014) have reported that the influence of the AMV on northern China has become stronger since the mid-1990s.

There is also evidence that the recent negative phase of the Pacific Decadal Oscillation (PDO) is associated with changes over eastern Asia (Zhu et al. 2011). Therefore, variability in ocean SST appears to be of particular importance, as highlighted by Dong et al. (2016), who found that 76% of the warming signal in model simulations is explained by SST/sea-ice extent changes, while 24% of the warming is due to increased concentrations of greenhouse gases (GHG) and changes in anthropogenic aerosol emissions.

Other regional changes can also lead to changes of the atmospheric circulation over NEA. For example, Wu et al. (2010) showed that snow cover changes over the Tibetan Plateau contributed to recent changes over NEA by altering the contrast in temperature between the plateau and surrounding regions and, hence, leading to changes in sea-level pressure and winds. In addition Kwon et al. (2007) pointed out that a recent increased number of typhoons over Southern China has contributed to the upper-level circulation change through diabatic heating of the atmosphere. Precipitation has indeed changed following a shift to a dipole in precipitation anomalies: the South-Flood-North-Drought (SFND) pattern, which has emerged particularly strongly since the mid-90s (Kwon et al. 2007; Qian et al. 2014; Ueda et al. 2015; Han et al. 2015).

Among the aforementioned changes, potential predictability of the NEA’s climate is underlined by (1) the change in external forcings (which are prescribed in near-term climate predictions), and (2) changes in SSTs via the remote atmospheric teleconnections. The main aim of this study is to assess the skill of state-of-the-art climate predictions over Asia, with a focus on the SAT and precipitation, and to understand the source of the predictive skill. The structure of the paper is summarized as follows: Sect. 2 describes the data used and the decadal prediction system. In Sect. 3 the skill in predicting surface-air temperature is presented and its source investigated. A discussion is given in Sect. 4 and a conclusion in Sect. 5.

2 Data and method

2.1 DePreSys3

We use the 3rd version of the UK Met Office Decadal Prediction System (DePreSys3; Dunstone et al. 2016). DePreSys3 is based on the Hadley Centre Global Environment Model version 3, global coupled configuration v2 (HadGEM3-GC2; Williams et al. 2015). The atmosphere model is the Global Atmosphere version 6.0 of the Met Office Unified Model and is run at N216 resolution (~ 60 km in mid-latitudes) with 85 vertical levels, ensuring a resolved stratosphere. The ocean model is the Global Ocean version 5.0 (Megann et al. 2014), which is based on version 3.4 of the Nucleus for European Modelling of the Ocean model (NEMO; Madec 2008). The ocean is run at a quarter-degree resolution using the NEMO tri-polar grid with 75 vertical levels (the ORCA025L75 grid; Bernard et al. 2006). The land surface model is the Global Land version 6.0 of the Joint UK Land Environment Simulator (JULES; Best et al. 2011) and the sea-ice model is CICE version 4.1 (Hunke and Lipscomb 2004) from the United States Los Alamos National Laboratory. These models are coupled with OASIS3 (Valcke 2013). Further details on the UM-JULES and NEMO-CICE coupling are given in Walters et al. (2014) and Megann et al. (2014), respectively.

DePreSys3 is full-field initialized by relaxing a coupled integration of HadGEM3-GC2 towards gridded observations. Ocean temperature and salinity are relaxed toward the Met Office global statistical ocean reanalysis (MOSORA; Smith and Murphy 2007; Smith et al. 2015) with a 10-day relaxation timescale. The sea-ice concentration is relaxed toward the HadISST sea-ice concentration (Rayner et al. 2003) with a 1-day relaxation timescale. The atmosphere model is initialized from ERA-Interim (Dee et al. 2011) atmospheric temperature and winds, using a 6-h relaxation timescale.

Hindcasts are forced by the historical evolution of external forcings (GHG, aerosols, ozone, solar radiation and volcanoes) and follow RCP4.5 after 2005 as in the CMIP5 protocol (Taylor et al. 2012). Ten-member ensemble hindcasts are started every 2–3 years between 1960 and 2008, and every year from 2009 to 2014, giving a total of 26 start-dates (1960, 1962, 1965, 1968, 1970, 1972, 1975, 1978, 1980, 1982, 1985, 1988, 1990, 1992, 1995, 1998, 2000, 2002, 2005, 2008, 2009, 2010, 2011, 2012, 2013, 2014) that cover the 1960–2014 period. Each hindcast lasts for 5 years. Members are initialized on the 1st of November and the ten members are generated using different seeds for a stochastic physics scheme (MacLachlan et al. 2015).

2.2 Observations/reanalysis

To evaluate model skill we selected the National Centers for Environmental Prediction (NCEP) (R-2) reanalysis (hereafter referred to as NCEP), which is more accurate than the NCEP (R-1) reanalysis owing to the removal of several errors (Kanamitsu et al. 2002). This reanalysis offers a 2.5° resolution (144 longitude grid points and 72 latitude grid points) with 17 altitude levels. The selected variables are air temperature, wind, specific humidity, geopotential height and surface air temperature (2 m above the ground). NCEP spans a long period (1948 to present), allowing the skill to be evaluated using all DePreSys3 hindcasts.

For precipitation we used the Global Precipitation Climatology Centre (GPCC) full data product version 7 (Schneider et al. 2014), available from 1901 to 2013 on a global 0.5° × 0.5° latitude–longitude grid (720 × 360 grid points). The full data product v7 is based on quality-controlled data from 67,200 stations world-wide.

2.3 Bias adjustment

Climate models do not perfectly simulate the observed climate. When initialized with observations, models drift toward their preferred imperfect climatology, which can lead to biases in the forecasts. Therefore, the drift has to be removed before assessing the ability of a prediction system to simulate climate. Here we used the standard procedure following the World Climate Research Programme recommendations (ICPO 2011) to remove the drift, a posteriori and in a linear way: for each lead-time \(\tau\), the drift \(dr(\tau)\) is computed as the average of the DePreSys3 hindcasts \(Y_{j}^{i}(\tau)\) over all members (i) and start-dates (j), minus the average of the corresponding observations/reanalysis \(X_{j}(\tau)\) over all start-dates, i.e. \(dr(\tau)= \frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m}Y_{j}^{i}(\tau)-\frac{1}{n}\sum_{j=1}^{n}X_{j}(\tau)\) for \(n\) start-dates and \(m\) members.

The drift thereby depends only on the lead-time, and we assume it to be independent of the start-date. Although some evidence highlights that the drift can be non-stationary and that the result may be sensitive to this methodology (where there are strong non-stationary drifts it could be preferable to use the method developed in Kruschke et al. 2016), we assume here the ICPO method to be a reliable bias adjustment method. The drift is then removed from the hindcasts at each lead-time. We estimate the drift in surface-air temperature, geopotential height and wind from NCEP, and in precipitation from GPCC. The mean surface air temperature bias and the drift of SAT in several areas (global mean, subpolar gyre, North East Asia, North Atlantic) are shown in Figure S1.
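As an illustration of the drift formula above, the following is a minimal sketch in Python with NumPy. The array layout (start-date, member, lead-time) and the function names `lead_time_drift` and `remove_drift` are assumptions made for this example and are not taken from the DePreSys3 processing chain.

```python
import numpy as np

def lead_time_drift(hindcasts, observations):
    """Estimate the lead-time dependent drift dr(tau).

    hindcasts:    array of shape (n_start, n_member, n_lead)  (assumed layout)
    observations: array of shape (n_start, n_lead)
    """
    # Average over start dates (j) and members (i), minus the observed
    # average over start dates, separately for each lead time tau.
    return hindcasts.mean(axis=(0, 1)) - observations.mean(axis=0)

def remove_drift(hindcasts, observations):
    """Subtract the same drift from every member and start date."""
    drift = lead_time_drift(hindcasts, observations)
    # Broadcasting applies drift[tau] along the last (lead-time) axis.
    return hindcasts - drift
```

Because the drift is a function of lead-time only, the adjustment shifts the hindcast mean at each lead-time onto the observed mean but leaves the ensemble spread and the start-date to start-date variability unchanged.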

2.4 Evaluation of hindcast skill

We evaluate the skill of DePreSys3 to predict climate with the Anomaly Correlation Coefficient (ACC), r, given by

$$r\left(\tau \right)=\frac{{\sum }_{j=1}^{n}\left[{X}_{j}\left(\tau \right){Y}_{j}\left(\tau \right)\right]}{\sqrt{\sum _{j=1}^{n}{\left[{X}_{j}\left(\tau \right)\right]}^{2}\sum _{j=1}^{n}{\left[{Y}_{j}\left(\tau \right)\right]}^{2}}}$$

where \({Y}_{j}\) is the ensemble mean anomaly for the jth hindcast starting in year j; \({X}_{j}\) is the observation anomaly for the corresponding starting date j. τ is the lead-time and n the number of start dates. Anomalies for \({X}_{j}\) and \({Y}_{j}\) are calculated independently to have a zero mean over the hindcast period.
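The ACC defined above can be sketched in a few lines of Python with NumPy. The function name `anomaly_correlation` is hypothetical; the inputs are assumed to be one value per start date for a fixed lead-time, and anomalies are formed by removing each series' own mean over the hindcast period, as described in the text.

```python
import numpy as np

def anomaly_correlation(obs, ens_mean):
    """ACC between observed and ensemble-mean values at one lead time.

    obs, ens_mean: 1-D arrays with one entry per start date (assumed layout).
    """
    # Anomalies are computed independently so that each has zero mean.
    x = obs - obs.mean()
    y = ens_mean - ens_mean.mean()
    return float(np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2)))
```

With zero-mean anomalies this expression coincides with the Pearson correlation coefficient, so it is insensitive to any difference in amplitude between hindcasts and observations.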

The significance of the ACC is estimated through re-sampling (5000 permutations) in a Monte Carlo framework. For each grid point, DePreSys3 time-series are randomly re-sampled and the correlations between DePreSys3 and NCEP are recalculated. Synthetic time-series are reconstructed using successive randomly selected 3 year periods until the size of the original time-series (the number of start dates) is reached, to preserve the multi-annual variability. The obtained correlations follow a Gaussian distribution. Correlations are then judged significant at the 5% level when stronger than 97.5% of the randomly obtained correlation values. The persistence is computed using reanalysis from the years of and before the model initialization (on the 1st of November). The n-year persistence is computed based on the observed values in the n years prior to the start date. We computed a 1 year and a 4 year persistence.
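The re-sampling test can be sketched as follows. This is one possible reading of the procedure, not the authors' code: it assumes that a "3 year period" corresponds to a block of three consecutive entries of the hindcast series, and the names `block_resample` and `acc_significance` are hypothetical.

```python
import numpy as np

def block_resample(series, block_len, rng):
    """Rebuild a series of the original length from randomly chosen blocks."""
    n = series.size
    blocks = []
    while sum(b.size for b in blocks) < n:
        # Pick the start of a block of consecutive values (assumed block definition).
        start = rng.integers(0, n - block_len + 1)
        blocks.append(series[start:start + block_len])
    return np.concatenate(blocks)[:n]

def acc_significance(obs, hind, n_perm=5000, block_len=3, seed=0):
    """Return (ACC, 97.5% threshold of the null distribution, significance flag)."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(obs, hind)[0, 1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        # Correlation between observations and a scrambled hindcast series.
        null[k] = np.corrcoef(obs, block_resample(hind, block_len, rng))[0, 1]
    threshold = np.quantile(null, 0.975)
    return r_obs, threshold, bool(r_obs > threshold)
```

Re-sampling in blocks rather than single values keeps part of the serial correlation of the series in the null distribution, which is the stated purpose of using multi-year periods.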

2.5 Wave activity flux

The propagation features of the Rossby wave are analyzed by computing the wave activity flux, following the formulation by Takaya and Nakamura (2001):

$$W=\frac{1}{{2\left| {\bar {{\varvec{u}}}} \right|}}\left\{ {\begin{array}{*{20}{c}} {\bar {u}\left( {\psi _{x}^{{\prime 2}} - {\psi ^\prime }\psi _{{xx}}^{\prime }} \right)+~\bar {v}\left( {\psi _{x}^{\prime }\psi _{y}^{\prime } - {\psi ^\prime }\psi _{{xy}}^{\prime }} \right)} \\ {\bar {u}\left( {\psi _{x}^{\prime }\psi _{y}^{\prime } - {\psi ^\prime }\psi _{{xy}}^{\prime }} \right)+~\bar {v}\left( {\psi _{y}^{{\prime 2}} - {\psi ^\prime }\psi _{{yy}}^{\prime }} \right)} \end{array}} \right\}$$

where \(u\) is the zonal wind velocity, \(v\) is the meridional wind velocity, and subscripts x and y represent zonal and meridional gradients. \(\varvec{u}\) is the horizontal wind velocity vector, \(\varvec{u} = (u, v)\), and \(\psi\) represents the eddy stream function. Overbars represent the climatology and primes the perturbation (deviation from the climatology). The wave activity flux is computed with monthly fields.
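The expression above can be transcribed directly into Python with NumPy. The sketch below is a simplified illustration only: it assumes a flat Cartesian grid with uniform spacings `dx` and `dy`, whereas an application to reanalysis data would need spherical geometry and the pressure scaling of the full Takaya and Nakamura (2001) formulation. The function name `wave_activity_flux` is hypothetical.

```python
import numpy as np

def wave_activity_flux(psi_prime, u_bar, v_bar, dx, dy):
    """Horizontal wave activity flux (Wx, Wy) on a Cartesian grid (assumption).

    All fields have shape (ny, nx); axis 0 is y and axis 1 is x.
    The climatological wind speed must be non-zero everywhere.
    """
    # First derivatives of the perturbation stream function.
    psi_y, psi_x = np.gradient(psi_prime, dy, dx)
    # Second derivatives obtained by differentiating the first derivatives.
    psi_xy, psi_xx = np.gradient(psi_x, dy, dx)
    psi_yy, _ = np.gradient(psi_y, dy, dx)

    speed = np.hypot(u_bar, v_bar)  # |u_bar|
    cross = psi_x * psi_y - psi_prime * psi_xy
    wx = (u_bar * (psi_x**2 - psi_prime * psi_xx) + v_bar * cross) / (2.0 * speed)
    wy = (u_bar * cross + v_bar * (psi_y**2 - psi_prime * psi_yy)) / (2.0 * speed)
    return wx, wy
```

For a plane wave \(\psi^{\prime} = A\sin(kx)\) embedded in a purely zonal mean flow, the expression reduces to \(W_x = A^{2}k^{2}/2\) and \(W_y = 0\), which gives a simple check of an implementation.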

3 Results

3.1 Skill in predicting surface-air temperature

First we assess the prediction system's ability to predict SAT by computing the ACC from DePreSys3 and NCEP for the first year and the 2–5 year lead-time. The skill of the first year represents the seasonal-to-interannual predictions; the 2–5 year lead-time represents the interannual timescale (Goddard et al. 2013). In summer (JJAS), the ACC for DePreSys3 is high over the equatorial and subtropical Atlantic Ocean, the Pacific Ocean, the Indian Ocean and over Eastern Europe, the Arabian Peninsula, northern China and Mongolia (Fig. 1a). The ACC is larger for the 2–5 year lead-time, especially over the North Atlantic Ocean (subpolar gyre), the Mediterranean Sea and surrounding regions, and over Russia (Fig. 1b). The larger ACC for the 2–5 year lead-time, compared to the 1 year lead-time, is likely due to the temporal smoothing (4-year average) and to the long-term trend induced by external forcing (Goddard et al. 2013).

Fig. 1

Anomaly correlation coefficient skill score (ACC) for SAT in DePreSys3 hindcasts (using NCEP as observations) in summer (JJAS) for a year 1, b year 2–5 lead-times. Also shown is the ACC calculated after a linear trend is removed at each grid-point for c year 1 and d year 2–5. Stippling indicates that the ACC is different from zero at the 95% confidence level according to a Monte-Carlo procedure (see text for details)

van Oldenborgh et al. (2012) have shown that skill in predicting SAT is primarily determined by the response to external forcings. Since the main aim of a decadal prediction system is to provide information to decision makers on short time horizons, a particular focus is placed on the skill at capturing changes relative to the long-term trend. We thus removed the long-term trend by subtracting a linear trend from both hindcasts and observations/reanalysis. A linear trend may not be the best estimate of the impact of increased GHG on climate, hence leading to uncertainties in assessing the skill in predicting temperature relative to the long-term trend. We thus also calculated skill in predicting SAT after regressing out the radiative forcing of GHG (in CO2 equivalent) from surface air temperature, following van Oldenborgh et al. (2012). However, we found very similar results in predicting SAT (Fig. S2) because the radiative forcing due to GHG emissions was approximately linear over the 1960–2014 time-period (as seen in Meinshausen et al. 2011). The detrended SAT skill is presented in the bottom panels of Fig. 1 for the 1 year lead-time (Fig. 1c) and the 2–5 year lead-time (Fig. 1d). The ACC is weaker when the time-series have been detrended, compared to the non-detrended time-series (compare Fig. 1a, c). Although the ACC is reduced overall, we find that tropical Pacific SSTs are predictable for the first summer, as well as temperature over the equatorial Atlantic Ocean and the eastern subpolar gyre (Fig. 1c). We also find significant and positive correlations over NEA (especially northern China and Mongolia). The 2–5 year lead-time also exhibits a decrease in the skill in predicting SAT after removing the linear trend. However, we still find significant skill at capturing SAT over the subpolar gyre, the North Pacific Ocean, Northern Africa, China, Western Canada and Alaska (Fig. 1d). ENSO is not predictable for the 2–5 year lead-time, as underlined by the lack of skill in predicting temperature over the equatorial Pacific Ocean. Similar results are found over NEA, the SPG and Eurasia when the ACC is calculated using COWT (Cowtan and Way 2014) (Fig. S3).
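Both detrending variants discussed above amount to removing a least-squares regression on a single predictor: the year for the linear trend, or the GHG radiative forcing for the alternative estimate. A minimal sketch in Python with NumPy follows; the function name `remove_regression` and the one-dimensional input layout are assumptions made for illustration.

```python
import numpy as np

def remove_regression(predictor, values):
    """Remove the least-squares linear dependence of `values` on `predictor`.

    predictor: 1-D array, e.g. start years (linear trend) or GHG forcing.
    values:    1-D array of the same length, e.g. SAT at one grid point.
    """
    predictor = np.asarray(predictor, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centre the predictor to keep the fit well conditioned for large years.
    centred = predictor - predictor.mean()
    slope, intercept = np.polyfit(centred, values, 1)
    return values - (slope * centred + intercept)
```

The fit uses the actual predictor values, so irregularly spaced start dates (every 2–3 years, then yearly) are handled without interpolation. The residual has zero mean and is uncorrelated with the predictor by construction.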

The high values of ACC for predictions of SAT over the subpolar gyre are consistent with the high potential predictability of the Atlantic Ocean owing to the memory of the ocean, including the inertia of the AMOC (Pohlmann et al. 2004, 2013). Indeed, the skill in predicting the evolution of the subpolar gyre has been linked to the ocean initialization (Yeager et al. 2012; Robson et al. 2012; Msadek et al. 2014; Monerie et al. 2017).

We now focus the analysis of skill on SAT averaged over four regions, shown in Fig. 1d: the subpolar gyre [SPG; 50°N–65°N; 60°W–10°W; as defined in Robson et al. (2012)], the Atlantic Multidecadal Variability region [AMV; 0°N–60°N; 80°W–0°, as defined in Trenberth and Shea (2006)], NEA [40°N–50°N; 90°E–130°E; as defined in Chen and Lu (2014)] and the global mean surface temperature (GMST; 90°S–90°N; 180°W–180°E). Results are presented in Fig. 2. When including trends, the skill in predicting GMST is high for both DePreSys3 and the persistence (> 0.9, see Fig. 2a). When the trend is removed the ACC for DePreSys3 is reduced to ~ 0.5, but remains larger than for the persistence (except for the first summer). Thus, the prediction system is able to retrospectively predict GMST, giving confidence that DePreSys3 can broadly simulate and predict climate variability.

Fig. 2

Anomaly correlation coefficient (ACC) for SAT in DePreSys3 hindcasts (using NCEP as observations) in summer (JJAS) for a the global mean surface temperature (90°S–90°N; 180°W–180°E), b the subpolar gyre (50°N–65°N; 60°W–10°W), c the Atlantic Multidecadal Variability (0°N–60°N; 80°W–0°) and d North East Asia (40°N–50°N; 90°E–130°E), for different lead-times (the first summer, 2–3, 3–4, 4–5 and 2–5 year). The red (black) line represents the skill score of DePreSys3 (persistence). The dashed (continuous) line is used for linearly detrended values (raw values). A red circle indicates that the ACC is significantly different from zero at the 95% confidence level according to a Monte-Carlo procedure (see text), and larger than the corresponding ACC for persistence

For the SPG, the ACC for DePreSys3 is significant for all lead-times when the linear trend is removed (Fig. 2b). This is consistent with Menary et al. (2016), who found skill in predicting the top 500 m Labrador Sea temperature in DePreSys3. The skill is particularly high for the 2–5 year lead-time, when the time-series are smoothed. Removing the trend does not drastically alter the ability of DePreSys3 to predict SAT over the SPG since the long-term warming is weak.

Although there is substantial skill at capturing the AMV (ACC > 0.8), the skill reduces to ~ 0.5 after detrending (Fig. 2c). The relatively low skill in predicting the AMV contrasts somewhat with the high skill in predicting detrended SAT variability over the subpolar gyre. However, it is consistent with the result of Fig. 1d, i.e. stronger values of the ACC are found over the North Atlantic than over the equatorial and subtropical Atlantic Ocean (except for the 1 year lead-time).

Over NEA the skill is high (> 0.6 and significant) for all the lead times (Fig. 2d). The skill in predicting the temperature over NEA is not mainly due to a long-term warming since the detrended values are almost as strong as the non-detrended ones.

While the ACC skill score measures only the phase difference between observations and hindcasts, the root-mean square error (RMSE) measures the magnitude of the error between hindcasts and observations. Therefore, these metrics provide complementary information on the prediction skill of DePreSys3. RMSE also shows that DePreSys3 provides better skill than the persistence, and that skill in predicting SAT over the SPG and NEA is improved for the 2–5 year lead-time. Over both NEA and the SPG the RMSE is smaller when the trend is removed (Fig. S4).
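To make the complementarity of the two metrics concrete, a minimal RMSE sketch in Python with NumPy is given below. The function name `rmse` and the one-value-per-start-date layout are assumptions made for illustration.

```python
import numpy as np

def rmse(obs, hind):
    """Root-mean-square error between hindcasts and observations.

    obs, hind: 1-D arrays with one entry per start date (assumed layout).
    """
    return float(np.sqrt(np.mean((hind - obs) ** 2)))
```

A hindcast series with exactly twice the observed amplitude is perfectly in phase with the observations (ACC of 1) yet has a non-zero RMSE, which is why the two scores are reported together.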

Similar levels of skill in predicting the AMV, SPG and NEA indices do not by themselves imply a relationship between the North Atlantic and NEA. However, the SAT has undergone a similar multidecadal evolution in each of these regions, with a cooling over the 1960s and a warming during the 1990s, as shown by numerous authors for the AMV (Martin and Thorncroft 2014; among others), the SPG (Robson et al. 2012) and NEA (Zhao et al. 2014; Chen and Lu 2014; Gao et al. 2014; Dong et al. 2016) (see also Fig. S5a–c). Hence, the similarity between detrended SAT in Asia and over the North Atlantic is assessed in Sect. 3.2.

3.2 Evolution of surface-air temperature over the subpolar gyre, North East Asia and the Atlantic multidecadal variability

We assess the ability of DePreSys3 to simulate the low- and high-frequency SAT variability for the 1 year lead-time (Fig. 3) and the 2–5 year lead-time (Fig. 4).

Fig. 3

Summer (JJAS) SAT evolution of NCEP (black line) and DePreSys3 ensemble-means (red line) for the 1 year lead-time (top row), the low-frequency part of the SAT evolution (i.e. with a 5 year running mean; middle row) and the high-frequency part of the SAT evolution (top minus middle row; bottom row), for the SPG (left column), the AMV (middle column) and NEA (right column). The spread is computed from the ten members as ± 1 standard deviation (red shading). All time-series have been linearly detrended. The correlation between NCEP and DePreSys3 time-series is shown in the top left of each panel. The significance of the correlation has been assessed through a Monte Carlo framework. One star indicates that the correlation is significant at the 90% confidence level. Two stars indicate that the correlation is significant at the 95% confidence level

Fig. 4

As in Fig. 3a–c but for the 2–5 year lead-time

Figure 3 shows the SAT evolution over the SPG, the AMV and NEA for summer (JJAS). Note that NCEP lies between the lower and upper bounds of the DePreSys3 inter-ensemble standard deviation, which further suggests that DePreSys3 is able to simulate the observed climate variability in these regions (Fig. 3a–c, 1 year lead-time). We decompose the time-series of 1 year lead-time predictions into low-frequency (by using a 5-year running mean) and high-frequency components (defined as the residual departure from the low-frequency component).
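The decomposition can be sketched as below in Python with NumPy. This is an illustration under stated assumptions: the running mean is taken as a centred window of five consecutive entries of a regularly sampled series, and end points without a full window are left undefined; how the irregular start dates are handled in the original analysis is not specified here. The names `running_mean` and `split_frequencies` are hypothetical.

```python
import numpy as np

def running_mean(series, window=5):
    """Centred running mean; entries without a full window are NaN."""
    series = np.asarray(series, dtype=float)
    half = window // 2
    smooth = np.full(series.shape, np.nan)
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the window fits entirely.
    smooth[half:series.size - half] = np.convolve(series, kernel, mode="valid")
    return smooth

def split_frequencies(series, window=5):
    """Return (low, high): the running mean and the residual around it."""
    series = np.asarray(series, dtype=float)
    low = running_mean(series, window)
    high = series - low
    return low, high
```

By construction the two components sum back to the original series wherever the running mean is defined, so no variance is lost in the split.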

For the SPG and the 1 year lead-time, the multidecadal evolution of the low-frequency component is well simulated (r = 0.84) (Fig. 3d). Over the subpolar gyre the temperature decreased during the 1960s and experienced an abrupt warming during the mid-1990s in observations (Fig. 3a) (Robson et al. 2012, 2014). DePreSys3 is able to reproduce this multidecadal SAT variability. The low-frequency component of the NEA index is also well simulated (r = 0.90) (Fig. 3f) and resembles the evolution of the SPG index (Fig. 3d). The low-frequency component of the AMV index is also well simulated (r = 0.65). However, DePreSys3 appears to simulate the AMV index with a slight delay in phase in comparison with NCEP (the weakest values occurred in the mid-90s in NCEP and in the mid-1980s in DePreSys3). The delay in phase is not obtained when analyzing the whole time-series (Fig. S5), and is due to the removal of the trend from time-series with poor sampling (one start-date every 2–3 years), which highlights the cooling of the early 1990s in NCEP more strongly.

In contrast, the decomposition shows that the weak overall correlation for the 1 year lead-time SPG index is due to the inability of DePreSys3 to predict the high-frequency component of the SPG variability (Fig. 3g). Note that we are dealing with SAT, which is noisier than oceanic heat content as it is more easily impacted by surface winds. The skill in predicting the subpolar gyre could be higher using heat content (Chikamoto et al. 2013), as found by Menary et al. (2016) with DePreSys3. For NEA the high-frequency index is noisier and less predictable than the low-frequency component of the SAT evolution (Fig. 3i). The AMV index does exhibit a good correlation for the high-frequency component (r = 0.64) (Fig. 3h).

Time series showing the 2–5 year anomalies are shown in Fig. 4. Correlations are strong and significant for the SPG, the AMV and NEA indices (0.82, 0.52 and 0.73 respectively). The mid-1990s shift is also reproduced for both SPG and NEA indices, but with a weaker intensity in DePreSys3 (Fig. 4a, c).

Interestingly, the low-frequency temperature evolution over NEA resembles the temperature evolution over the North Atlantic, with a decrease in temperature over the 1960s and a strong warming over the 1990s (Fig. 3c). The latter is consistent with the rapid warming which occurred there during the mid-1990s (Chen and Lu 2014). We do not obtain a strong warming over NEA for the other seasons (not shown), consistent with Gao et al. (2014), who showed that the “shift” was stronger in summer than in winter. In both observations and DePreSys3, the SAT evolution over the North Atlantic (SPG and AMV) is similar to the evolution of the SAT over the NEA box.

As there are uncertainties in observational reanalysis we have also used several other observed datasets of near-surface air temperature (COWT, Cowtan and Way 2014; BEST, Rohde et al. 2013; GISTEMP, Hansen et al. 2010; MLOST, Vose et al. 2012). Results indicate that the different data sets give very similar time evolutions of the SAT indices over the SPG, AMV and NEA (Fig. S5). The skill of DePreSys3 in simulating the low-frequency SAT variability over these three regions is thus robust and not sensitive to the particular observational data set used for model evaluation.

Since we found strong similarities between SAT variability over NEA and the North Atlantic we assess whether DePreSys3 simulates a link between these areas in the next section.

3.3 Changes in the atmospheric circulation associated with the mid-90s North-east Asian warming

We now explore the mechanism associated with the abrupt summer (JJAS) warming that occurred over the SPG and NEA by computing the difference between two periods: 1995–2010 minus 1979–1994, i.e. the years after and the years before the mid-1990s warming (as seen in Fig. 4d with the blue vertical lines). We used the low-frequency component of SAT for the 1 year lead-time, and the non-smoothed 2–5 year lead-time time-series. Time-series are linearly detrended prior to computing the anomalies. We found that DePreSys3 exhibits a strong interhemispheric warming after the mid-1990s, with a more homogeneous warming over the Northern Hemisphere than in NCEP (Fig. S6). Therefore, to highlight the modulation of SAT over land in DePreSys3 and in NCEP, we remove the spatial average of temperature over the plotting region at each grid-point in Fig. 5 (see Sects. 3.3.1 and 3.3.2). For consistency, a spatial average over the same plotting region was also removed for the sea-level pressure and the low-level streamfunctions. For geopotential height we removed the zonal mean to highlight gradients associated with circulation change.
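The composite difference and the two mean-removal steps can be sketched as follows in Python with NumPy. The (time, latitude, longitude) array layout, the function names and the cosine-of-latitude area weighting are assumptions made for this example; the text does not state whether the spatial average was area-weighted.

```python
import numpy as np

def composite_difference(field, years, late=(1995, 2010), early=(1979, 1994)):
    """Mean over the late period minus mean over the early period.

    field: array of shape (n_time, n_lat, n_lon); years: array of shape (n_time,).
    """
    late_mask = (years >= late[0]) & (years <= late[1])
    early_mask = (years >= early[0]) & (years <= early[1])
    return field[late_mask].mean(axis=0) - field[early_mask].mean(axis=0)

def remove_area_mean(diff, lat):
    """Subtract the area-weighted mean over the plotted region (weights assumed)."""
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(diff)
    return diff - np.average(diff, weights=weights)

def remove_zonal_mean(diff):
    """Subtract the mean along longitude at each latitude."""
    return diff - diff.mean(axis=1, keepdims=True)
```

Removing the regional mean leaves the spatial pattern untouched, so a spatially uniform warming drops out and only the contrasts between regions remain.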

Fig. 5

The difference in temperature (°C) in summer (JJAS) between the 1995–2010 and the 1979–1994 periods for lead-times of 1 year and 2–5 years, for both NCEP (a, b) and DePreSys3 (c, d). We removed a spatial temperature average (from 20°S to 90°N and 180°W to 180°E) to highlight changes occurring over land in DePreSys3. All grid-points were linearly detrended before the composite was computed. North East Asia (NEA) is defined as the box represented in black: (90°E–130°E; 40°N–50°N). Dots indicate that anomalies are significant at the 95% confidence level according to a Student’s t-test. Note the different colour bars for NCEP (a, b) and DePreSys3 (c, d)

3.3.1 Impact of the observed circumglobal teleconnection pattern

At both the 1 year and 2–5 year lead-times, NCEP shows a significant increase in SAT over NEA, North Russia, West of the Pacific Ocean, the Northern Atlantic Ocean and Northern Europe, Western North America and Alaska (Fig. 5a). The Pacific Ocean exhibits a shift to a negative phase of the Interdecadal Pacific Oscillation, as indicated by the cooling over the eastern Pacific Ocean and the warming over the subtropical north and South Pacific Ocean. There is a dipole over north Africa with a warming over the Saharan desert and a cooling over the Sahel, denoting a northward shift of the monsoon cell and an increase in Sahel precipitation, i.e. the Sahel precipitation recovery (Nicholson 2013; Sanogo et al. 2015). The warming is stronger over the North Atlantic Ocean and NEA at the 2–5 year lead-time (Fig. 5b).

In NCEP the composite analysis exhibits a modulation of SAT with zonally distributed, successive patterns of positive and negative anomalies, which seem to be associated with atmospheric circulation changes related to a large-scale process: the circumglobal teleconnection pattern (CGT) (Ding and Wang 2005). Indeed, these positive and negative SAT anomalies are associated, successively, with an increase (over the western part of the Northern Atlantic Ocean, the subtropical Atlantic Ocean, Northern Europe and NEA) and a decrease (over the eastern North Atlantic Ocean, central Eurasia and North West America) in 250 hPa geopotential height (ZG250) (Fig. 6a, b). The Rossby wave propagates along the subtropical westerly jet (blue lines), which acts as a wave-guide, as indicated by the wave activity flux (red arrows). The wave path differs between the 1 year and the 2–5 year lead-times. For the 1 year lead-time, the observed wave propagates over northern Russia and then moves southward along the polar front to reach eastern Asia and Japan, propagating from (90°E, 80°N) to (110°E, 45°N) (Fig. 6a). For the 2–5 year lead-time the wave propagates further south, staying south of 70°N.
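The ZG250 anomalies discussed here are shown with the zonal mean removed. A minimal sketch of that step, assuming the field is stored as a list of latitude rows on a regular longitude grid (the function name is hypothetical):

```python
def remove_zonal_mean(field):
    """Subtract, at each latitude, the mean over all longitudes."""
    result = []
    for row in field:
        zonal_mean = sum(row) / len(row)
        result.append([value - zonal_mean for value in row])
    return result

# Two latitude rows, four longitudes each (geopotential height in m).
zg = [[10.0, 12.0, 8.0, 10.0],
      [50.0, 55.0, 45.0, 50.0]]
print(remove_zonal_mean(zg))
```

Removing the zonal mean leaves only the wave-like departures along each latitude circle, which is what makes the successive positive and negative centres of a Rossby wave train visible.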

Fig. 6

As in Fig. 5 but for the geopotential height anomalies at 250 hPa (m; colors) and the wave activity fluxes (m² s⁻²; vectors). The zonal mean was removed to highlight gradients in geopotential heights. North East Asia (NEA) is defined as the box represented in black: (90°E–130°E; 40°N–50°N). Dots indicate that anomalies of geopotential height are significant at the 95% confidence level according to a Student’s t-test. Note the different colour bars for NCEP (a, b) and DePreSys3 (c, d)

Changes in the upper-level atmosphere project onto the surface: observed changes in sea-level pressure (SLP) highlight the eastward progression of a Rossby wave, with SLP decreasing over the Sahel, the northern subtropical Atlantic Ocean, Western Europe, East Asia and Siberia for both the 1 year and the 2–5 year lead-times (Fig. 7a, b). A decrease in SLP is associated with negative values of the streamfunction and thus with cyclonic circulation anomalies. An increase in SLP is associated with positive values of the streamfunction and thus with an anomalously anticyclonic circulation at the surface. Over East Asia SLP decreases and there is a cyclonic circulation. Over NEA, the increase in SAT is located eastward of the decrease in SLP and of the low-level cyclonic circulation. Thus, the southerlies appear to contribute to the SAT warming by advecting heat from the south and east of NEA.
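The link between the sign of the streamfunction anomaly, the sense of rotation and the meridional wind can be checked on an idealised example. The sketch below assumes the standard convention u = −∂ψ/∂y, v = ∂ψ/∂x on a flat plane in the Northern Hemisphere; the Gaussian anomaly, its amplitude and its length scale are arbitrary illustrative choices, not values taken from Fig. 7.

```python
import math

def psi(x, y, amplitude=-1.0e6, scale=1.0e6):
    """Idealised Gaussian streamfunction anomaly (m^2 s^-1).
    A negative amplitude gives a streamfunction minimum at the origin."""
    return amplitude * math.exp(-(x * x + y * y) / scale ** 2)

def relative_vorticity(x, y, h=1.0e4):
    """Laplacian of psi by centred finite differences (s^-1)."""
    return (psi(x + h, y) + psi(x - h, y) + psi(x, y + h) + psi(x, y - h)
            - 4.0 * psi(x, y)) / h ** 2

def wind(x, y, h=1.0e4):
    """Non-divergent wind (u, v) with u = -dpsi/dy and v = dpsi/dx (m s^-1)."""
    u = -(psi(x, y + h) - psi(x, y - h)) / (2.0 * h)
    v = (psi(x + h, y) - psi(x - h, y)) / (2.0 * h)
    return u, v

# Positive relative vorticity at the streamfunction minimum:
# cyclonic rotation in the Northern Hemisphere.
print(relative_vorticity(0.0, 0.0) > 0.0)
# Positive v (a southerly wind) 500 km east of the minimum.
print(wind(5.0e5, 0.0)[1] > 0.0)
```

In this idealised case the flow east of the negative streamfunction centre is southerly, which is the configuration invoked above for the warming located east of the SLP decrease over NEA.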

Fig. 7

As in Fig. 5 but for the sea-level pressure anomalies (hPa; colors) and low-level streamfunction (in 10⁶ m² s⁻¹; red continuous contours for positive values; blue discontinuous contours for negative values; black contours for the zero value). For NCEP contours are displayed every 0.20 × 10⁶ m² s⁻¹; for DePreSys3 contours are displayed every 0.02 × 10⁶ m² s⁻¹. We removed a spatial average (from 20°S to 90°N and 180°W to 180°E). North East Asia (NEA) is defined as the box represented in red: (90°E–130°E; 40°N–50°N). Dots indicate that anomalies of sea-level pressure are significant at the 95% confidence level according to a Student’s t-test. Note the different colour bars for NCEP (a, b) and DePreSys3 (c, d)

The geopotential height anomalies are also analyzed at 500 and 850 hPa. The results reveal barotropic anomalies over the North Atlantic Ocean and Europe, and baroclinic anomalies over Asia and the subtropical Atlantic Ocean (Fig. S7). The baroclinic structure over NEA is also shown by the cyclonic anomaly obtained at low level (Fig. 7a, b) and the anticyclonic anomaly obtained at upper level (Fig. 6a, b). Changes in the low-level atmospheric circulation do not directly follow the changes obtained at upper level. This is likely due to the strong surface warming (Fig. 5), which induces a decrease in SLP and is associated with cyclonic circulation anomalies (Fig. 7).
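The distinction used here between barotropic and baroclinic anomalies can be stated as a simple sign rule: same-signed anomalies at upper and lower levels versus opposite-signed ones. The sketch below is only that rule, with hypothetical names; it is a simplification and not a formal diagnostic of the vertical structure.

```python
def vertical_structure(upper_anomaly, lower_anomaly):
    """Classify a column from the signs of its upper- and lower-level
    height (or streamfunction) anomalies."""
    product = upper_anomaly * lower_anomaly
    if product > 0:
        return "barotropic"   # same sign throughout the column
    if product < 0:
        return "baroclinic"   # sign reversal between levels
    return "undetermined"     # at least one anomaly is zero

# Anticyclonic (positive) aloft over a cyclonic (negative) low-level anomaly,
# the configuration described for NEA.
print(vertical_structure(+20.0, -5.0))
```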

The results above, which show that Asia’s climate was modulated by a wave pattern, are consistent with a growing body of evidence that highlights the importance of Rossby waves for this area (Lu et al. 2002; Enomoto et al. 2003; Enomoto 2004; Ding et al. 2011; Kosaka et al. 2011; Huang et al. 2012; Wu et al. 2016a, b; Lin et al. 2016; Wang et al. 2017). The associated pattern is known as the “Silk Road” or CGT pattern, depending on the path of the wave.

3.3.2 Impact of the simulated circumglobal teleconnection pattern

At the 2–5 year lead-time DePreSys3 exhibits a wave pattern, which is associated with a significant warming over Western Europe, from central Russia to East Asia (from 70°E to 120°E, centred around 50°N) and north of Japan (from 150°E to 180°E) (Fig. 5d). In comparison, temperature over Western Russia (from 40°E to 60°E) and Eastern Russia (from 120°E to 150°E) warmed less. Similar to the observed changes, regions of weak change in SAT are associated with negative anomalies in ZG250 and regions with a strong surface warming are associated with positive anomalies of ZG250 (Fig. 6d). The wave activity flux indicates that a Rossby wave propagates from the Atlantic Ocean to East Asia, following the subtropical westerly jet. As for NCEP, anomalies in ZG250 in DePreSys3 weaken as the wave propagates to the east (i.e. ZG250 changes are stronger over Europe than over East Asia).

Along with the changes in the upper atmosphere, SLP decreases from the subtropical Atlantic Ocean to eastern Asia and increases over northern Europe (Fig. 7d). As for NCEP, the SLP decrease is associated with a negative anomaly in streamfunction over East Asia and over the western subtropical Pacific Ocean, indicating cyclonic circulation anomalies at low level. The anomalies over the western subtropical Pacific Ocean are in line with the negative anomalies of ZG250 at upper level and hence exhibit a barotropic structure. Over Asia the large extent of negative SLP anomalies is associated with the strong warming over land and reveals a complex relationship between changes occurring at the surface and throughout the atmospheric column, which leads to a baroclinic structure over East Asia.

At the 2–5 year lead-time the composite analysis reveals strong resemblances between NCEP and DePreSys3, despite a slight phase shift in ZG250 anomalies, highlighting the importance of an accurate simulation of atmospheric teleconnections to predict temperature over land. However, note that SAT and atmospheric changes are weaker in DePreSys3 than in NCEP, which is likely due to the average over ten members with a large internal variability. For instance, changes in upper-level geopotential height are five times stronger in NCEP than in DePreSys3 (Fig. 6).
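The damping effect of ensemble averaging can be illustrated with synthetic data. The sketch below assumes ten statistically independent members containing only Gaussian internal variability (no forced signal); under that assumption the standard deviation of the ensemble mean is about 1/√10 ≈ 0.32 of that of a single member. It illustrates the direction of the effect only and is not a quantitative account of the factor of five noted above.

```python
import random
import statistics

def ensemble_mean_spread(n_members=10, n_years=2000, noise_std=1.0, seed=0):
    """Return (std of one member, std of the ensemble mean) for synthetic,
    independent Gaussian internal variability."""
    rng = random.Random(seed)
    members = [[rng.gauss(0.0, noise_std) for _ in range(n_years)]
               for _ in range(n_members)]
    ensemble_mean = [sum(values) / n_members for values in zip(*members)]
    return statistics.pstdev(members[0]), statistics.pstdev(ensemble_mean)

single, averaged = ensemble_mean_spread()
print(averaged / single)  # close to 1 / sqrt(10)
```

A single observed realisation such as NCEP retains its full internal variability, whereas the ensemble mean retains mostly the component common to all members.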

Although there is strong evidence for a CGT pattern at the 2–5 year lead-time, the 1 year lead-time ZG250 anomalies do not clearly indicate a CGT pattern, since there is no obvious eastward propagation between 40°E and 100°E in the Northern Hemisphere (Fig. 6c). However, the wave activity flux indicates that the wave propagates further south, following the southern boundary of the subtropical westerly jet, as in the “Silk Road” pattern of Enomoto et al. (2003). As in Enomoto et al. (2003), the ZG250 anomalies are found over the Eastern Mediterranean Sea (negative anomaly) and northeast (negative anomaly) and west (positive anomaly) of northern India. The ZG250 change over East Asia is also associated with changes occurring over the North Pacific Ocean, where the south-to-north zonal anomalies are similar to the Pacific-Japan (PJ) pattern (Nitta 1987), which is known to impact eastern Asia and Japan (Wu et al. 2016b). The PJ pattern has been associated with SSTs over the Pacific Ocean during the preceding winter (Kosaka et al. 2012). DePreSys3 is able to predict SAT over the Pacific Ocean in winter for the 1 year lead-time (Figs. S8 and S9); the Pacific Ocean could thus also be a source of predictability for East Asia. However, Kosaka and Nakamura (2010) have shown that the PJ pattern is better defined at low levels than at upper levels, which is not the case in DePreSys3 (Fig. S7), revealing that such a teleconnection is not clear here.

Interestingly, DePreSys3 exhibits high skill in predicting SAT over the SPG, Eastern Europe, NEA and the Northwestern Pacific for the 2–5 year lead-time (Fig. 1d), where we found positive anomalies of ZG250 in both NCEP and DePreSys3 (Fig. 6). Thus, prediction of the SAT changes over these areas appears to be related to the ability of DePreSys3 to simulate a large-scale atmospheric teleconnection pattern that links these regional SAT changes to remote SST anomalies. The mechanism involving a Rossby wave that modulates the change in temperature over NEA is consistent with Lin et al. (2016) and Wu et al. (2016a, b), who highlighted the impact of the interdecadal circumglobal teleconnection pattern (CGT) over Eastern Asia. However, the source of the wave has still to be defined.

3.4 The origin of the circumglobal teleconnection

Rossby waves that impact eastern China have been associated with changes over (1) the Atlantic Ocean (Wu et al. 2016a, b; Lin et al. 2016; Wang et al. 2017), (2) the Pacific Ocean (Ding et al. 2011; Lin et al. 2016) and (3) the Indian summer monsoon (Enomoto et al. 2003). Diabatic heating due to increased precipitation over these areas can lead to disturbances at upper level and to a propagation of energy via a Rossby wave (Hoskins et al. 1977; Hoskins and Karoly 1981; Lau 1997). However, in DePreSys3 we cannot propose a role for the Indian monsoon, since we do not see changes in precipitation at either the 1 year or the 2–5 year lead-time (Fig. 8c, d). Moreover, skill in predicting summer Indian precipitation is not significant (Fig. S10). The PJ pattern is not well defined for the 1 year lead-time and DePreSys3 does not exhibit skill in predicting SAT over the Pacific Ocean for the 2–5 year lead-time. In addition, the simulated and observed wave activity fluxes do not clearly show a wave propagation from the Pacific Ocean to East Asia (Fig. 6). Therefore, we focus on changes over the Atlantic Ocean.

Fig. 8

The difference in linearly detrended summer (JJAS) precipitation (mm day⁻¹) between the 1995–2014 and 1974–1994 periods (former minus latter) for lead times of 1 year and 2–5 years, for both GPCP (a, b) and DePreSys3 (c, d). North East Asia (NEA) is defined as the box represented in black: (90°E–130°E; 40°N–50°N). Dots indicate that anomalies are different from zero at the 95% confidence level according to a Student’s t-test. Note the different colour bars for NCEP (a, b) and DePreSys3 (c, d)

In DePreSys3, positive anomalies of precipitation over the northern subtropical Atlantic and negative anomalies of precipitation over the southern subtropical Atlantic (Fig. 8c, d) are indicative of a northward shift of the ITCZ [which is also clearly seen in GPCP (Fig. S11), albeit using a shorter period]. The Sahel precipitation recovery is also evident in GPCC (Fig. 8a, b) and DePreSys3 (Fig. 8c, d). This northward shift in the ITCZ is consistent with Sheen et al. (2017), who highlighted the skill of DePreSys3 in predicting Sahel precipitation, mainly due to the ability of DePreSys3 to simulate the AMV. Indeed, the AMV is known to strongly impact Sahel precipitation (Knight et al. 2006; Martin and Thorncroft 2014). The strengthening of the ITCZ over the subtropical Atlantic is also associated with a precipitation decrease over northern Brazil and South America (an area which also shows high skill for precipitation in summer; Fig. S10).

The ability of DePreSys3 to simulate a northward shift of the ITCZ along with increased Sahel precipitation is likely to be associated with its ability to simulate the SAT decadal variability over the North Atlantic, and in particular the SPG. Many studies have shown that extratropical SST anomalies can affect the tropical circulation through changes in atmospheric heat transport (Kang et al. 2008; Smith et al. 2010). This was supported in particular by Dunstone et al. (2011), who showed that an accurate simulation of SPG temperature variability is key to obtaining skill in predicting temperature and precipitation over the subtropical North Atlantic Ocean in idealized experiments.

Associated with both the observed and simulated changes in precipitation, the subtropical North Atlantic experiences a shift to an anomalously cyclonic (anticyclonic) circulation at low level (upper level), as indicated by Figs. 6 and 7, for both the 1 year and 2–5 year lead-times. This baroclinic mode of the atmospheric circulation is consistent with a response to increased tropical precipitation through diabatic heating of the atmosphere (Hoskins et al. 1977; Hoskins and Karoly 1981; Lau 1997). We also found an increase in wind divergence at upper level over the subtropical Atlantic Ocean (and over the Sahel in NCEP) in association with increased precipitation (not shown). Therefore, the results suggest that the warming over the North Atlantic Ocean excited the CGT pattern in DePreSys3 and hence modulated the surface temperature over Eurasia. Such a mechanism is consistent with Liu and Chiang (2012), who highlighted the role of the North Atlantic in triggering the CGT pattern.

4 Discussion

In this paper we have assessed the predictability of SAT over NEA and found substantial skill in DePreSys3 owing to the ability of the decadal prediction system to simulate the circumglobal teleconnection pattern variability. DePreSys3 is in good agreement with NCEP, especially for the 2–5 year lead-time. There are, however, some uncertainties, especially for the 1 year lead-time, for which the change in atmospheric circulation at upper level is not clearly apparent. These uncertainties are due to (1) the experimental methodology (i.e. sampling errors due to the initialization strategy), (2) the fidelity of the simulation and (3) other processes that may be important. These issues are discussed below:

  • (1) The initialization of hindcasts every 2–3 years can lead to uncertainties in the representation of climate anomalies (e.g. differences with the AMV index in NCEP; as seen in Fig. 3e). Furthermore, Lin et al. (2016) found that the interdecadal and interannual CGT present different vertical structures, although they both lead to a strong warming over East Asia. However, in the present study we cannot properly separate the interannual and interdecadal evolution of the CGT since hindcasts are initialized every 2–3 years (from 1960 to 2008), instead of every year, which limits our ability to elucidate the source of the CGT at either time-scale.

  • We obtain a delay in phase in the AMV index in DePreSys3. We computed anomalies between two periods (1995–2010 minus 1979–1994, as in Sect. 3.3), using NCEP and the Cowtan and Way (2014) dataset (CWT), with the entire time series and with a sub-sample of the data, and found that the sampling can impact the obtained changes in temperature (Fig. S12). Therefore, the number of hindcast start dates (i.e. the sampling) is found to be an important issue in the assessment of the model performance. However, we assume that it is better to compare DePreSys3 and NCEP by analyzing them in a similar way, i.e. with the sub-sampled start dates.

  • (2) Kosaka et al. (2012) showed that an accurate simulation of the Rossby waves is crucial for predicting SAT over East Asia. We tested the robustness of the possible impact of the North Atlantic on East Asia through the circumglobal teleconnection pattern by repeating the analysis for winter. In winter we found similar results with DePreSys3 (e.g. increased precipitation over the subtropical North Atlantic Ocean associated with a Rossby wave, which led to increased geopotential height at upper level over NEA) (not shown). This is, as in summer, associated with a warming over the North Atlantic Ocean, particularly the subpolar gyre. However, this teleconnection is not found in NCEP in winter for either the 1 year or the 2–5 year lead-time. This difference may contribute to the low skill in predicting SAT over NEA in winter (Figs. S8 and S9). Furthermore, we highlighted that in summer the observed and simulated CGT are slightly out of phase, particularly over Europe, which could affect the skill scores (see Fig. 6). Therefore, an accurate representation of the CGT pattern is important for predicting SAT.

  • GPCC and GPCP show a decrease in precipitation over NEA after the mid-1990s, leading to strengthened subsidence and decreased cloud cover, which in turn leads to an increased incoming solar radiative flux (not shown) that contributes to the surface warming. Over NEA such a precipitation change is not seen in DePreSys3 (Fig. 8) (moreover, DePreSys3 has low skill in predicting precipitation over NEA). This observed-modelled difference in precipitation can, in part, explain the difference in the magnitude of circulation change between NCEP and DePreSys3.

  • (3) Extratropical SSTs could also impact Europe by triggering Rossby waves (Ghosh et al. 2016). The wave can then propagate further east, reaching Asia. However, in this study we found a barotropic change over the northern Atlantic Ocean, suggesting that the Rossby wave is unlikely to be due to extratropical SST changes. The impact of the extratropics on East Asia will be investigated in a further study (see conclusion).

  • The complexity of the climate system means that teleconnections are not straightforward, and climate anomalies could have many competing causes. For instance, the change in AMV phase in the 1990s does not appear to explain the Sahel precipitation recovery alone, since Dong and Sutton (2015) have pointed out a role for greenhouse gases as well as anthropogenic aerosol (AA) emissions. Dong et al. (2016) have also highlighted the role of European anthropogenic aerosol emissions in the recent summer warming over NEA. Indeed, over recent decades AA emissions have decreased over Western Europe, leading to a surface warming from Europe to Asia (Bauer and Menon 2012; Dong et al. 2016). We analyzed the incoming shortwave radiation and found that the overestimated Northern Hemisphere warming (Fig. 5) can be due to anomalously strong incoming shortwave radiation caused by decreased AA emissions over Europe and the North Atlantic (Fig. S13). This is in line with Booth et al. (2012), who suggested that anthropogenic aerosol emissions contribute to the observed variability of the North Atlantic Ocean. This warming over Europe can also be responsible for a part of the ZG250 change, and can, therefore, also impact East Asia in DePreSys3. Kosaka et al. (2012) have, for instance, also highlighted that weather regimes (blocking) over Europe are additional sources of Rossby waves that can impact East Asia.

  • For the 1 year lead-time we highlight a potential role of the Pacific Ocean in leading to skill in predicting SAT over Northeast Asia. Changes in precipitation and temperature over the western Pacific Ocean were found to be able to trigger the Pacific-Japan pattern, which can strongly impact East Asia’s climate (Kosaka et al. 2011). It is interesting to note that the high-frequency evolution of the CGT pattern has been associated with the PDO (Zhu et al. 2011; Nakamura and Miyama 2014; Lin et al. 2016). However, the composite analysis does not show a PDO pattern for DePreSys3 at the 1 year lead-time (Fig. 5).
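The sampling issue raised under point (1) can be illustrated by computing the same composite difference from a full yearly series and from a sub-sample restricted to a set of start years. The sketch below is illustrative only: the function names are hypothetical, and the synthetic series and the set of retained years are invented for the example, not the DePreSys3 start dates.

```python
def period_mean(years, values, first, last, keep=None):
    """Mean over [first, last], optionally restricted to the years in `keep`."""
    selected = [v for y, v in zip(years, values)
                if first <= y <= last and (keep is None or y in keep)]
    return sum(selected) / len(selected)

def composite(years, values, keep=None,
              late=(1995, 2010), early=(1979, 1994)):
    """Late-period mean minus early-period mean."""
    return (period_mean(years, values, late[0], late[1], keep)
            - period_mean(years, values, early[0], early[1], keep))

years = list(range(1979, 2011))
# Synthetic series: unit step in 1995 plus one anomalously warm year (1996).
values = [(1.0 if y >= 1995 else 0.0) + (1.0 if y == 1996 else 0.0)
          for y in years]
# Hypothetical sub-sample of years (not the actual hindcast start dates).
kept_years = {1980, 1985, 1990, 1995, 2000, 2005, 2010}

print(composite(years, values))              # full series
print(composite(years, values, kept_years))  # sub-sampled series
```

Here the warm year 1996 falls between retained years, so the sub-sampled composite differs from the full one, which is the kind of sensitivity that motivates analysing model and reanalysis with the same sub-sampled start dates.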

5 Conclusion

In this study we examine skill at capturing surface air temperature (SAT) in the DePreSys3 decadal prediction system. DePreSys3 is based on a high-resolution ocean–atmosphere general circulation model, HadGEM3-GC2, and is full-field initialized with start dates every 2–3 years from 1960 to 2008, and every year from 2009 to 2014 (a total of 26 start dates). We focus on the skill at predicting SAT over Northeast Asia (NEA) for 1 and 2–5 year lead-times, and on understanding the source of this prediction skill.
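The start-date counts quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the first and last of the sparse start dates fall in 1960 and 2008 respectively; the individual start years are not listed here and are not reconstructed.

```python
TOTAL_START_DATES = 26

# Yearly start dates from 2009 to 2014 inclusive.
yearly_start_dates = len(range(2009, 2015))

# Remaining start dates, spread over 1960-2008.
sparse_start_dates = TOTAL_START_DATES - yearly_start_dates

# Mean spacing if the first date is in 1960 and the last in 2008 (assumption).
mean_spacing_years = (2008 - 1960) / (sparse_start_dates - 1)

print(yearly_start_dates, sparse_start_dates, round(mean_spacing_years, 2))
```

The resulting mean spacing of roughly 2.5 years is consistent with start dates every 2–3 years over 1960–2008.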

We find that DePreSys3 has significant skill at capturing surface temperature evolution over NEA, including skill when trends in SAT are removed. Furthermore, we find that the skill in predicting SAT over NEA is mainly due to the successful simulation of the strong multidecadal variability, rather than the inter-annual variability.
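Skill with the trend removed can be illustrated by correlating linearly detrended hindcast and observed series. The sketch below uses the Pearson correlation as the skill measure, which is an assumption made for illustration; the function names are hypothetical and the series are synthetic.

```python
import math

def detrend(series):
    """Remove the mean and the least-squares linear trend."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def detrended_correlation(hindcast, observed):
    """Correlation between the detrended hindcast and observed series."""
    return pearson(detrend(hindcast), detrend(observed))

# Synthetic case: a shared trend with opposite year-to-year variability.
wiggle = [0.1 if t % 2 == 0 else -0.1 for t in range(20)]
observed = [t + w for t, w in enumerate(wiggle)]
hindcast = [t - w for t, w in enumerate(wiggle)]
print(pearson(hindcast, observed))                # high: dominated by the trend
print(detrended_correlation(hindcast, observed))  # negative once detrended
```

The synthetic case shows why the distinction matters: a shared trend alone can produce a high raw correlation, so skill that survives detrending points to predictable variability around the trend.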

The multi-decadal evolution of SAT over NEA, which decreases during the 1960s and 1970s and increases during the 1990s, is similar to that over the North Atlantic subpolar gyre, where DePreSys3 also demonstrates skill in predicting SAT.

By focusing on the mid-1990s warming in both regions, we highlight that the multi-decadal variability in SAT in the North Atlantic subpolar gyre and NEA appears to be linked via an upper-level atmospheric Rossby wave. Thus, we propose that skill in predicting low-frequency SAT variability over NEA is due to a mechanism that can be summarized as follows:

  1. The warming of the North Atlantic Ocean is associated with an increase in precipitation over the Sahel and the subtropical Atlantic Ocean.

  2. The increase in precipitation triggers a Rossby wave through diabatic heating of the atmosphere.

  3. The Rossby wave propagates eastward, following the upper-level subtropical jet, and modulates the upper-level geopotential height from Europe to East Asia.

  4. Anomalously positive geopotential height over NEA and associated upper-level anticyclonic circulation anomalies are associated with the warming at the surface over North-east Asia.

We show that NEA SAT can be modulated by variability in the North Atlantic via the CGT pattern. Hence, we argue that the high potential predictability of the SSTs over the North Atlantic Ocean (Pohlmann et al. 2004) is a source of predictability for the decadal to multi-decadal evolution of SAT over NEA. This is a promising result since changes of SAT relative to a long-term trend are difficult to predict over land (Bellucci et al. 2013; among others). This mechanism is found to be robust at the 2–5 year lead-time, is consistent with the literature (Wu et al. 2016a, b; Lin et al. 2016) and has been associated with changes over the Atlantic Ocean (Lu et al. 2006; Lin et al. 2016; Wu et al. 2016a, b; Wang et al. 2017). We also tested this mechanism in observations with a comparison between the previous positive phase of the AMV (1950–1965) and the last negative phase of the AMV (1979–1993) and found a similar result but with an opposite sign: a Rossby wave extends from the Atlantic Ocean, leading to negative SAT anomalies over NEA (Fig. S14). This teleconnection between the AMV and Asia has been highlighted in paleoclimate studies (Feng and Hu 2008; Zhao et al. 2014) and appears to have become stronger since the mid-1990s (Qian et al. 2014). This is also in line with Dunstone et al. (2011), who showed that decadal Atlantic SPG variability can drive tropical Atlantic variability (precipitation, ITCZ shift), highlighting the importance of an accurate initialization and prediction of temperature over the North Atlantic Ocean in decadal prediction systems.

As described in Sect. 4, the simulation of the CGT differs slightly between DePreSys3 and NCEP, especially for the 1 year lead-time, for which the CGT pattern does not stand out. Therefore, better predictions could be provided by improving the simulation of the Atlantic Ocean dynamics [e.g. the Atlantic Meridional Overturning Circulation, as discussed in Menary et al. (2016)] and of the teleconnection between the Atlantic Ocean and North-east Asia. Moreover, the source of the observed and simulated CGT pattern is still uncertain due to the complexity of the climate system. Hence, the change of the CGT pattern has to be assessed in a further study by conducting controlled sensitivity experiments to further probe the impact of a warming Atlantic Ocean on North East Asia, following the protocol described in Boer et al. (2016) for the Decadal Climate Prediction Project of the Climate Model Intercomparison Project, phase 6.