1 Introduction

Low prediction skill in the tropical Pacific is a common problem in decadal prediction systems, especially for lead years (LY) 2–5. Multi-model studies of decadal prediction systems generally reveal high prediction skill over ocean regions where low-frequency climate variability is dominant. However, low prediction skill is often present over the equatorial and North Pacific for LY 2–5, which also affects the prediction skill of the Pacific Decadal Oscillation (PDO) (Kim et al. 2012; Doblas-Reyes et al. 2013). In the tropical East Pacific, the prediction skill in many systems is even lower than in uninitialized experiments (Doblas-Reyes et al. 2013; Kirtman et al. 2013; Mignot et al. 2016). While the low prediction skill in the North Pacific is assumed to result from biases in the representation of ocean mixing processes in the climate models (Guemas et al. 2012), the reason for the low prediction skill in the tropical Pacific is unknown.

A problem related to the low prediction skill in the tropical Pacific is the failure to hindcast hiatuses in global warming, such as the one that occurred between 1998 and 2014 (Easterling and Wehner 2009). It is thought that the global warming hiatus was strongly affected by a cooling in tropical Pacific sea surface temperature (SST) associated with the PDO (Meehl et al. 2011; Kosaka and Xie 2013; England et al. 2014). Compared to uninitialized simulations, initialized hindcasts have shown some improvement in the prediction skill for global mean SST at lead times of up to 5 years, due to improved heat uptake mainly in the tropical Pacific and Atlantic (Guemas et al. 2013). However, a successful hindcast of the global warming hiatus requires a successful hindcast of the tropical Pacific SST (Meehl et al. 2014), in addition to that of other ocean regions.

As in other decadal prediction systems, low prediction skill in the tropical Pacific is also present in the decadal hindcasts that were produced with the Max Planck Institute Earth System Model (MPI-ESM) (Stevens et al. 2013) for the fifth phase of the Coupled Model Intercomparison Project (CMIP5) (Taylor et al. 2012). In the tropical Pacific, the prediction skill is even lower than in uninitialized experiments (Müller et al. 2012). This decadal prediction system forms the basis for the national German “Mittelfristige Klimaprognosen” (MiKlip) project (http://www.fona-miklip.de). Although the prediction skill was enhanced by the MiKlip project through the initialization from an ocean reanalysis product, the problem of low prediction skill in the tropical East Pacific is still present (Pohlmann et al. 2013).

A promising improvement in the prediction skill in the tropical East Pacific was achieved with an alternative initialization method using only wind stress applied to the MPI-ESM (hereafter referred to as Modini) (Thoma et al. 2015). However, these hindcasts are restricted to the period 1990–2006 due to the availability of the surface wind stress product from the National Centers for Environmental Prediction Climate Forecast System (NCEPcfs) reanalysis (Saha et al. 2010), which Thoma et al. (2015) chose for its high resolution and quality. Wind stress is an especially important part of an initialization system in the equatorial region, since there is no Coriolis term to balance the zonal pressure gradient; rather, the zonal pressure gradient in the ocean is in a dynamically evolving balance with the zonal wind stress (Bell et al. 2004). Wind stress also matters for the initialization in the tropics off the equator, since anomalous Ekman pumping drives heat content anomalies, and these heat content anomalies matter, e.g., for the prediction of transitions in the PDO (Meehl and Hu 2006; Meehl et al. 2009, 2016; Ding et al. 2013).

The promising Modini results motivated us to extend the Modini method back to 1960 by using wind data from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis (Kalnay et al. 1996) to construct the wind stress fields needed for the initialization. Although the prediction skill in this extended-Modini experiment is no better than in the MPI-ESM hindcasts for CMIP5, the sensitivity experiment gives us the unique possibility to pin down the reason for the low prediction skill, namely problems with the wind stress used for the initialization.

This paper is organized as follows: The climate model and the prediction systems are presented in Sect. 2. The problem of the initialization shock leading to the low prediction skill in the tropics is presented in Sect. 3. This study closes with a discussion in Sect. 4.

2 The decadal prediction systems

This study compares two decadal prediction systems: (1) the MPI-ESM hindcasts for CMIP5 (called baseline0) and (2) the extended-Modini system. Both systems use the coupled climate model MPI-ESM version 1.0 (Stevens et al. 2013) in low resolution (LR). The atmospheric model resolution is T63 with 47 levels and the oceanic model resolution is 1.5° with 40 levels. In each system, an ensemble of 3 members is produced by starting the hindcasts in each year from consecutive days of an assimilation run after the 1st of January. We apply a lead-time dependent bias correction method following the recommendations of WCRP (International CLIVAR Project Office 2011). The bias correction method adjusts only systematic, lead-time dependent offsets but not trends, since trends also include genuine climate signals, such as global warming, which would otherwise be removed.
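The lead-time dependent offset correction described above can be illustrated with a minimal sketch. It is not the implementation used for the hindcasts: the function names, the dictionary layout and the convention that LY 1 verifies in the start year are assumptions made here for illustration only.

```python
from statistics import fmean


def lead_time_bias(hindcasts, observations):
    """Return the mean hindcast-minus-observation offset for each lead year.

    hindcasts:    {start_year: [value at LY1, value at LY2, ...]}
    observations: {calendar_year: observed value}
    Assumption: lead year k of a hindcast started in year s verifies in
    calendar year s + k - 1.
    """
    n_leads = len(next(iter(hindcasts.values())))
    bias = []
    for k in range(n_leads):  # k = 0 corresponds to LY1
        errors = [
            values[k] - observations[start + k]
            for start, values in hindcasts.items()
            if start + k in observations  # skip years without observations
        ]
        bias.append(fmean(errors))
    return bias


def remove_bias(hindcasts, bias):
    """Subtract the lead-dependent offset; trends are left untouched."""
    return {
        start: [value - b for value, b in zip(values, bias)]
        for start, values in hindcasts.items()
    }
```

Because one constant is subtracted per lead year, any trend across start years at a fixed lead time survives the correction, which is the property the text emphasizes.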

The decadal hindcasts with MPI-ESM for CMIP5 (Müller et al. 2012) are initialized from an ocean-only model run forced by a selection of key variables from the NCEP/NCAR reanalysis (Kalnay et al. 1996), such as surface winds, air temperature, short-wave radiation, humidity, cloud cover, and precipitation (e.g., Haak et al. 2003). In a second step, an MPI-ESM assimilation run is produced over the period 1960–2013 by relaxing temperature and salinity in MPI-ESM towards the anomalies from the ocean-only run added to a long-term MPI-ESM climatology. Ten-year-long (decadal) hindcast simulations are started from the assimilation run in each year.
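The anomaly relaxation step can be sketched for a single grid-point value as follows. This is only an illustration under stated assumptions: the function names, the explicit time stepping and the relaxation time scale `tau` are hypothetical and are not taken from the MPI-ESM set-up.

```python
def nudging_target(ocean_only_value, ocean_only_clim, coupled_clim):
    """Anomaly from the ocean-only run added to the coupled-model climatology."""
    return coupled_clim + (ocean_only_value - ocean_only_clim)


def relax(model_value, target, dt, tau):
    """One explicit Newtonian relaxation step towards the target.

    dt and tau must share the same time unit; tau is a hypothetical
    relaxation time scale chosen for illustration.
    """
    return model_value + (dt / tau) * (target - model_value)
```

For example, `relax(27.0, nudging_target(28.5, 28.0, 27.0), dt=1.0, tau=10.0)` moves the model temperature one tenth of the way from 27.0 towards the target of 27.5. Relaxing towards anomalies rather than full values keeps the coupled model close to its own climatology.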

For the extended-Modini system, the method of Thoma et al. (2015) [based on methods of Cane et al. (1986) and Chen et al. (1997)] is applied, forcing the ocean component of the coupled model MPI-ESM directly with wind stress anomalies for the period 1950–2013. The wind stress anomalies used for extended-Modini are derived using a bulk formula applied to 6-hourly surface wind data from the NCEP/NCAR (Kalnay et al. 1996) reanalysis and are added to the model climatology. This means that the two systems use the same wind stress data from the NCEP/NCAR reanalysis for the initialization. An assimilation run is produced over the period 1960–2010, from which an ensemble of (unforced) decadal hindcasts is started.
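A bulk formula of the kind mentioned above can be sketched as follows. The text does not state which formula or coefficients were used, so the constant air density and constant drag coefficient below are assumptions for illustration; operational bulk formulae often use a wind-speed dependent drag coefficient.

```python
import math

RHO_AIR = 1.2            # air density in kg/m^3 (assumed constant)
DRAG_COEFFICIENT = 1.3e-3  # dimensionless drag coefficient (assumed constant)


def wind_stress(u, v, rho_air=RHO_AIR, drag=DRAG_COEFFICIENT):
    """Return (tau_x, tau_y) in N/m^2 from surface wind components in m/s."""
    speed = math.hypot(u, v)
    return rho_air * drag * speed * u, rho_air * drag * speed * v


def anomaly_forcing(stress, reanalysis_clim, model_clim):
    """Reanalysis stress anomaly added to the model's own climatology."""
    return model_clim + (stress - reanalysis_clim)
```

With these assumed constants, a wind of (-3, 4) m/s gives a stress of about (-0.0234, 0.0312) N/m², where the negative zonal component denotes a westward stress.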

MPI has produced many MPI-ESM hindcasts for CMIP5, some with a higher number of ensemble members and higher resolution than used here. However, results from these hindcasts are not considered here, since they were mostly initialized only every 5 years, which increases sampling uncertainty and hampers the detection of prediction skill (Sienz et al. 2016). It should be noted, however, that we make use of 3 uninitialized ensemble members from the standard CMIP5 set-up with the MPI-ESM, referred to here as the historical runs.

3 Results

Surface air temperature (SAT) in the baseline0 system has relatively high correlation values in the assimilation run and also in LY 1 (Fig. 1a, b). However, from LY 2 onward, negative correlation skill is present, particularly in the tropical Pacific (Fig. 1c), which points to dynamical problems in this region. The extended-Modini system starts from lower correlation values in the assimilation run (Fig. 1d) compared to baseline0 due to the use of only wind stress for the initialization. The prediction skill is also reduced in many regions for LY 1 (Fig. 1e) compared to the baseline0 results. However, for LY 2–5, a similar deterioration in skill, mainly in the tropical Pacific, can be seen in the extended-Modini system (Fig. 1f) as in the baseline0 system.

Fig. 1 Anomaly correlation of surface air temperature against observations from HadCRUT3 (Brohan et al. 2006) over the period (start years) 1961–2010 in the baseline0 system (a–c) and the extended-Modini system (d–f) for the assimilation run (a, d), lead year 1 (b, e) and lead year 2–5 (c, f). Dots denote correlations exceeding the 5–95% confidence level and gray areas denote regions with sparse/missing data
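The skill measure used in these maps can be illustrated at a single grid point with a minimal sketch. It assumes that anomalies are taken with respect to each series' own time mean over the verification period; the function name is hypothetical, and the significance test behind the dots is not reproduced here.

```python
import math
from statistics import fmean


def anomaly_correlation(forecast, observed):
    """Correlation between forecast and observed anomalies at one location.

    Both inputs are equally long sequences, one value per start year.
    """
    f_mean, o_mean = fmean(forecast), fmean(observed)
    f_anom = [f - f_mean for f in forecast]
    o_anom = [o - o_mean for o in observed]
    covariance = sum(f * o for f, o in zip(f_anom, o_anom))
    norm = math.sqrt(sum(f * f for f in f_anom) * sum(o * o for o in o_anom))
    return covariance / norm
```

A value near 1 indicates that the hindcast anomalies vary in phase with the observed ones, whereas negative values indicate anomalies of predominantly opposite sign.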

The problem of the low correlation skill is now analyzed in more detail by examining the time-series of sea surface temperature (SST) in the NINO3.4 region (Fig. 2a, c). The time-series of the observations and of the 3-member ensemble mean from the uninitialized historical simulations both show a small positive linear trend over the past 50 years. In contrast, the trend of the time-series resulting from the ensemble mean averaged over LY 2 for each hindcast is strongly reversed in the baseline0 system (Fig. 2a) and in the extended-Modini system (Fig. 2c). This spurious trend is the main problem that is causing negative correlation values for LY 2–4 (Fig. 2b, d).
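The linear trends compared here can be estimated with an ordinary least-squares fit, as in the following minimal sketch. The function name is hypothetical, and the fitting method is an assumption, since the text does not state how the trends were computed.

```python
from statistics import fmean


def linear_trend(years, values):
    """Least-squares slope of values against years (units per year)."""
    y_mean, v_mean = fmean(years), fmean(values)
    covariance = sum((y - y_mean) * (v - v_mean) for y, v in zip(years, values))
    variance = sum((y - y_mean) ** 2 for y in years)
    return covariance / variance
```

When an observed series and a hindcast series are each dominated by trends of opposite sign, their correlation tends to be negative even if the year-to-year variability is unrelated.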

Fig. 2 Time-series of annual mean NINO3.4 (5°S–5°N, 120°W–170°W) SST for HadISST (Rayner et al. 2003) observations (black), historical (purple) and lead year (LY) 2 of the hindcasts (red) for baseline0 (a) and extended-Modini (c), and their correlation skill for different LYs for baseline0 (b) and extended-Modini (d)

The NINO3.4 LY2 time-series of baseline0 (Fig. 2a) and extended-Modini (Fig. 2c) are on average relatively warm in the early period (1960s and 1970s) and cold in the later period (mid 1980s until 2000s) with respect to the observations and the historical runs. We now analyze the spatial structure of the drift caused by the initialization shock by showing the SST difference between LY2 and the assimilation run averaged over the periods 1961–1970 (Fig. 3a, c) and 1995–2004 (Fig. 3b, d). On average, the hindcasts starting in the first period reveal a strong warming in the tropical East Pacific (Fig. 3a, c), while those starting in the later period show a strong cooling in this region (Fig. 3b, d), with stronger maxima and minima in the extended-Modini than in the baseline0 system. The SST differences in Fig. 3 resemble El Niño and La Niña structures. The El Niño/Southern Oscillation (ENSO) is a coupled atmosphere–ocean phenomenon. During the El Niño phase, the trade winds are weakened and the SST in the tropical East Pacific is relatively warm. During the La Niña phase, the trade winds are enhanced and the SST in the tropical East Pacific is relatively cool. The phase shift to El Niño is accompanied by collapsing trade winds triggering Kelvin waves, which deepen the thermocline in the tropical East Pacific and increase the SST there (e.g., Sheinbaum 2003).

Fig. 3 SST difference between LY2 averaged over the experiments starting in the periods 1961–1970 (a, c) and 1995–2004 (b, d) and the corresponding years of the assimilation run for baseline0 (a, b) and extended-Modini (c, d)

We now analyze the wind stress fields that are assimilated into the decadal prediction systems. We find a strong change in the eastward wind stress over the past 50 years in the region 150°–120°W, 10°S–10°N in the NCEP/NCAR wind stress data (Fig. 4a). The average zonal wind stress over this region computed from the NCEP/NCAR reanalysis is strongly westward before the 1970s and strongly eastward after 1980 compared to other reanalyses, such as ERA-20C (Poli et al. 2015), ERA-40 (Uppala et al. 2005) and NCEPcfs (Saha et al. 2010). The change of wind stress in the NCEP/NCAR reanalysis is also of much larger amplitude than the variability seen in the historical runs with MPI-ESM, which show little change in this region during the past 50 years (Fig. 4b). It is therefore possible that using the NCEP/NCAR wind stress as a forcing for the initialization could cause an imbalance when the model is run freely in hindcast mode. Additionally, the meridional component of the NCEP/NCAR wind stress shows a reduction in convergence in the tropical East Pacific (Fig. 4c, d). Artificial trends in NCEP/NCAR wind (and hence wind stress) data have also been reported in other studies (Wittenberg 2004; Monahan 2006; McGregor et al. 2012).
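A regional mean such as the one over 150°–120°W, 10°S–10°N can be sketched as an area-weighted box average on a regular latitude-longitude grid. The cosine-of-latitude weighting, the use of longitudes in degrees east (so that the box spans 210°–240°E) and the function name are assumptions for illustration; the text does not describe how the averages were formed.

```python
import math


def box_mean(field, lats, lons, lat_range, lon_range):
    """Area-weighted mean of field[j][i] over a latitude-longitude box.

    lats and lons are grid-point centers in degrees (lons in degrees east);
    weights are proportional to cos(latitude) on a regular grid.
    """
    total = 0.0
    weight_sum = 0.0
    for j, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        weight = math.cos(math.radians(lat))
        for i, lon in enumerate(lons):
            if lon_range[0] <= lon <= lon_range[1]:
                total += weight * field[j][i]
                weight_sum += weight
    return total / weight_sum
```

Applying such an average to each annual-mean field yields one value per year, i.e. a time series of the kind compared between the reanalyses and the historical runs.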

Fig. 4 a Linear trend in eastward zonal wind stress (N/m²) computed from the NCEP/NCAR reanalysis over the period 1961–2010. b Time series of annual and zonal mean wind stress (positive eastward) in the region 150°–120°W, 10°S–10°N from different reanalyses (see text) together with 3 members of the historical runs (thin purple) and their ensemble mean (thick purple). c As in a but for the northward meridional wind stress. d As in b but for the meridional wind stress in the region (150°–120°W, 0°–10°N) minus (150°–120°W, 0°–10°S). Note that convergent fluxes at the equator in the east Pacific become weaker in the NCEP/NCAR reanalysis over the past 50 years

We continue by analyzing the mechanisms causing the artificial El Niño and La Niña events in the extended-Modini hindcasts. (An almost identical mechanism is found in baseline0 and is therefore not shown.) Key to the onset of ENSO events are Kelvin waves, which can be identified by thermocline depth perturbations (e.g., Sheinbaum 2003). We therefore analyze the zonal wind stress and the thermocline depth near the equator (Fig. 5). In the period 1961–1970, the westward zonal wind stress in the assimilation run is enhanced in the central tropical Pacific compared to the uninitialized historical runs, causing the thermocline in the central and west Pacific to be systematically deeper than in the historical runs (Fig. 5a). When the climate model is switched into its forecast mode, the wind stress weakens, and the thermocline becomes shallower in the west and deeper in the east in the following years (Fig. 5b, c). Eventually, after 5 years, the wind stress strength and thermocline depth are back to their neutral conditions (Fig. 5d). By contrast, in the period 1995–2004, the westward zonal wind stress in the initialization is reduced in the central Pacific and enhanced in the west Pacific compared to the historical runs, causing the thermocline in the central and west Pacific to be systematically shallower than in the historical runs (Fig. 5e). When the climate model is switched into its forecast mode in this period, the change in wind stress causes the thermocline to become shallower everywhere in LY1 (Fig. 5f) and then to deepen in the west in LY2 while remaining shallow in the east (Fig. 5g). Again, after about 5 years, the wind stress strength and thermocline depth are back to their neutral conditions (Fig. 5h). These thermocline deviations trigger the artificial El Niño and La Niña events at the surface.

Fig. 5 Results from hindcasts using the extended-Modini initialization. Surface wind stress (positive eastward) along the equator (black and red) and thermocline depth (purple and blue) averaged over periods (a–d) 1961–1970 and (e–h) 1995–2004 from the assimilation run (a, e; full lines), and hindcasts lead year (LY) 1 (b, f; long-dashed), LY2 (c, g; short-dashed), and LY5 (d, h; dotted). Data of the historical runs are shown as a reference in (a–d) and (e–h), respectively, for surface wind stress (black) and thermocline depth (purple)

4 Discussion

We have addressed the question of why reduced prediction skill in the tropical Pacific is present in the MPI-ESM decadal hindcasts for CMIP5 (baseline0) in LY 2–5, and have performed a sensitivity experiment (extended-Modini) that is initialized using only wind stress data. We have identified an initialization shock in both systems arising from a trend in the wind stress computed from the NCEP/NCAR reanalysis that was used for the initialization. The trend is much larger than similar trends found in other reanalysis products or, indeed, in the freely running MPI-ESM. As a consequence, the thermocline is displaced unrealistically from its preferred state during the initialization procedure. This leads to a spurious equatorial adjustment involving Kelvin and Rossby waves (Gill 1982) once the climate model is switched into its forecast mode. Thereafter, the thermocline returns to its long-term average state over the following 3–5 years. The thermocline deviations cause large-scale sea surface temperature anomalies that resemble El Niño and La Niña events.

The skill reduction in extended-Modini is similar to the one seen in baseline0. While the assimilation procedure in baseline0 uses many variables, the extended-Modini experiment uses only wind stress. We therefore conclude that the problem with the initialization shock in the MPI-ESM hindcasts for CMIP5 is related to the wind stress data used for the initialization and, in particular, to the trends we have identified in the wind stress data computed from the NCEP/NCAR reanalysis, which appear to be spurious. This insight demonstrates the importance of using high-quality observational estimates of surface winds for the initialization.

A similar problem is also reported for the Community Climate System Model version 4 (CCSM4) decadal prediction system (Karspeck et al. 2014). The hindcasts in the CCSM4 system (which were initialized, as in the MPI-ESM hindcasts for CMIP5, using data from an ocean-only run) drift into El Niño states during the period 1961–1976 and La Niña states thereafter. Karspeck et al. (2014) speculate that the spin-up of their system could be the cause of the drift. Several cycles of ocean-only runs were used to produce the initial conditions for the hindcasts, with each cycle starting at the end of the previous one. Anomalies from the end of the cycle could have continued to propagate through the Pacific basin and influenced the hindcasts in the first two decades (Karspeck et al. 2014). In our extended-Modini system, no such cycles are involved since we start immediately from a historical run. We can also rule out the impact of variables other than wind stress in degrading the prediction skill in the extended-Modini system, since wind stress is the only variable used for the initialization. In fact, utilizing the extended-Modini runs enables us to pin down the reason for the failure of the earlier MPI-ESM hindcasts for CMIP5, providing new insights that can help to improve future decadal prediction systems.