1 Introduction

Natural decadal climate variability (DCV) at oscillation periods (or timescales) of 10–20 years is one of the oldest areas of climate research. For centuries, DCV was associated with decadal solar and lunar cycles, giving rise to a voluminous published literature and motivating attempts to predict climate using sunspot numbers as a predictor. These largely statistical studies had very limited success in unraveling the physics of DCV or in advancing the field of climate prediction, perhaps due to a lack of physically meaningful solar, oceanic, and atmospheric observations and of climate models based on fundamental principles of physics. These historical studies, however, provided strong and credible indications of DCV associations with meteorological, hydrological, and agricultural variability in many areas of the world, thus providing a strong motivation to understand the causes of DCV, and to predict DCV and its impacts, once the necessary observations and climate models became available towards the end of the twentieth century.

Due to the availability of archives of multidecades-long oceanic observations since the end of the Cold War, quality-checked and model-assimilated global atmospheric and oceanic observations, and the development of climate models incorporating increasingly realistic physical processes, a substantial body of research has emerged in the last two to three decades, focused on understanding causes and mechanisms of DCV (see, for example, Meehl et al. (2009), Mehta et al. (2011a) for reviews). Several DCV phenomena—the Pacific Decadal Oscillation (PDO) (Mantua et al. 1997; Minobe 1997) or the Interdecadal Pacific Oscillation (IPO) (Power et al. 1999), tropical Atlantic sea-surface temperature (SST) gradient variability (TAG for brevity) (Houghton and Tourre 1992; Mehta and Delworth 1995; Chang et al. 1997; Mehta 1998), West Pacific Warm Pool SST variability (WPWP) (Wang and Mehta, 2008), and decadal variability of El Niño–La Niña (EN–LN) events (Balmaseda et al. 1995; Kestin et al. 1998; Power et al. 1999; An and Wang 2000)—have been identified in observational records and associated with the occurrence of decadal hydrologic cycles (DHCs) on land (Mehta 1998, 2017; Nigam et al. 1999; Enfield et al. 1999; Hidalgo 2004; McCabe et al. 2004, 2007; Meehl and Hu 2006; Schubert et al. 2009; Mehta et al. 2011b, 2014, 2016; Hannaford et al. 2013; Watras et al. 2013; Biron et al. 2014; and references therein).

Accurate simulations of a climate phenomenon by a model based on principles of physics, chemistry, and biology are important to establish that the phenomenon is not an artefact of statistical analysis, to understand the underlying physical processes, and to establish the model’s suitability to predict the phenomenon. If skillful decadal climate predictions can be made with these models, they can greatly benefit planning in many societal sectors, such as agriculture, reservoir operations, municipal water supply and drainage systems, hydro-electricity generation, thermal and nuclear power plant operations, water-borne transportation, fisheries and wildlife habitat maintenance, forest fires, river- and reservoir-based recreation industry, and state and national government decisions [see, for example, Meehl et al. (2009), Mehta et al. (2013a), Mehta (2017)]. In addition to the importance of DCV prediction for societal impacts prediction and planning, skillful predictions are also important for understanding and attributing observed past, current, and future climate to natural DCV or anthropogenic climate change. In order to develop models and methodologies of climate simulation and prediction, the World Climate Research Programme organized the Coupled Model Intercomparison Project 5 (CMIP5; Taylor et al. 2012) to assess the ability of the current generation of Earth System Models (ESMs) used in climate and impacts assessments by the Intergovernmental Panel on Climate Change to simulate past and present climate, and to predict/project past, present, and future climate.

The present study is a part of a program to develop a decadal climate and impacts simulation and prediction system for the Missouri River Basin (MRB), to develop adaptation options for water and agriculture sectors in the MRB using decadal climate and impacts information, and to develop a methodology to estimate the value of decadal climate and impacts information to the agriculture sector. Global ESMs, a very high-resolution land use–hydrology–crop model, and an economic impacts model are being used in this program. From this program, preliminary results on decadal predictability of ocean basin averaged SSTs in decadal hindcast experiments with the Geophysical Fluid Dynamics Laboratory (GFDL)–CM2.1, the UK Meteorological Office (UKMO) HadCM3, the Japanese Model for Interdisciplinary Research On Climate 5 (MIROC5), and the National Center for Atmospheric Research (NCAR)–CCSM4 ESMs were reported in Mehta et al. (2013b); and a dynamical–statistical technique being developed for decadal hydro-meteorological predictions (applied to southern Africa as a test case) was reported in Mehta et al. (2014). Research designed to simulate impacts of DCV phenomena on surface and ground water in the MRB was reported in Daggupati et al. (2016) and Mehta et al. (2016), and on spring and winter wheat in the MRB is reported in Mehta et al. (2017a). The methodology developed to estimate the value of decadal climate information to the agriculture sector in the MRB is described in Fernandez et al. (2016). The ability of these four ESMs to simulate important attributes of three DCV phenomena (the PDO, the TAG variability, and the WPWP variability) is addressed in the present paper. Research addressing the ability of the CM2.1, HadCM3, MIROC5, and CCSM4 ESMs in CMIP5 to hindcast attributes of the same three phenomena is described in a companion paper (Mehta et al. 2017b; hereafter referred to as MMW).
These four ESMs were selected because it is important to assess simulation and hindcast skills of the same ESMs in the same experimental framework to establish their suitability for prediction/projection. The modeling groups who have developed these four ESMs conducted CMIP5 experiments with generally the same model configurations. Also, simulation and hindcast/forecast experiments with these four ESMs were run in CMIP5 in the ensemble mode with up to 10 ensemble members in each ensemble.

1.1 Review of previous research

A brief review of the history of DCV and predictability research, especially that associated with the 11- and 22-year sunspot cycles, is given in MMW. Therefore, only a brief review of modeling of DCV as internally-generated climate variability is given here. Also, as explained in MMW, the focus in these two papers is on approximately 10–20 years oscillation periods (or timescales).

Simulations of DCV in the Pacific and the Atlantic regions in the last 25 years have generally invoked two major hypotheses about the mechanisms of DCV. One is the so-called “red noise” hypothesis in which low(er)-frequency variability of oceanic temperatures is generated by the integration of atmospheric “white noise” or high-frequency forcing by the ocean which has high thermal and mechanical inertia [see, for example, Hasselmann (1976), Frankignoul and Hasselmann (1977)]. According to this hypothesis, temperature variance increases at lower frequencies or longer oscillation periods, among them decadal-multidecadal, and there are no preferred timescales of variability. The second hypothesis invokes two-way, coupled ocean–atmosphere interactions in which ocean circulations in the horizontal and/or vertical planes and oceanic Rossby waves interact with the atmosphere, and either the circuit times of the ocean circulations in the zonal-meridional plane or the meridional-vertical plane, or propagation times of Rossby waves provide distinct decadal-multidecadal timescales [see, for example, Mehta (1991, 1992), Weaver and Sarachik (1991), Latif and Barnett (1994), and Gu and Philander (1997) for an early history].
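The “red noise” hypothesis can be illustrated with a first-order autoregressive (AR(1)) process, which is a common discrete-time stand-in for an ocean mixed layer integrating atmospheric white noise. The following is a minimal sketch, not a calculation from this study; the persistence parameter `a`, the monthly time step, and the function name are illustrative assumptions. It evaluates the theoretical AR(1) power spectrum and shows that spectral density rises steadily towards longer periods without any preferred timescale.

```python
import math


def ar1_power_spectrum(a, noise_var, freq):
    """Theoretical spectral density of x[n+1] = a * x[n] + e[n].

    `a` is the lag-1 persistence, `noise_var` the variance of the white
    noise e[n], and `freq` the frequency in cycles per time step.
    """
    return noise_var / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * freq))


# Hypothetical month-to-month persistence; not a value taken from the paper.
a = 0.95
periods_years = [1, 2, 5, 10, 20, 50]

# Frequencies are in cycles per month, so a p-year period is 1 / (12 * p).
spectrum = [ar1_power_spectrum(a, 1.0, 1.0 / (12 * p)) for p in periods_years]

for p, s in zip(periods_years, spectrum):
    print(f"{p:>3d}-year period: spectral density = {s:8.2f}")
```

Under this hypothesis the density levels off at the longest periods but never forms a peak, which is why a significant decadal spectral peak above such a background would point to a timescale-selection mechanism of the kind invoked by the second hypothesis.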

1.1.1 The Pacific decadal oscillation

The PDO in its original identification (Mantua et al. 1997) is a primarily North Pacific pattern with one sign of SST anomalies in the central North Pacific and opposite-sign anomalies around the central North Pacific in a horseshoe pattern. This SST pattern oscillates primarily at decadal-multidecadal timescales although oscillation periods from a few months to a few years are also present. Another version of this meridionally-broad SST pattern oscillating at decadal timescales and straddling the equator was identified by Zhang et al. (1997). The IPO (Power et al. 1999) encompasses the North and South Pacific Oceans, with SST anomalies of one sign in the tropical-subtropical Pacific regions and of opposite sign in midlatitude Pacific regions. The IPO SST pattern, by construction and definition, varies only at decadal and longer timescales. Decadal–multidecadal variability, however, is nearly identical in the PDO and the IPO.

The North Pacific part of the PDO has been the subject of many studies with non-coupled ocean models and coupled ocean–atmosphere models. Liu (2012) and Zhang and Delworth (2015, 2016) have reviewed many of these studies, so only a brief overview of important conclusions is given here. Possible roles of tropical and extratropical Pacific Ocean, coupled ocean–atmosphere interactions in the Kuroshio-Oyashio Extension region, sub- and extra-tropical oceanic Rossby waves, and interactions between the tropical and extra-tropical Pacific regions have been addressed in such studies. Both non-coupled ocean models and coupled ocean–atmosphere models show PDO-like SST patterns which oscillate at decadal–multidecadal time scales. Stochastic atmospheric noise appears to be the dominant forcing for generating the PDO-like variability according to many of the studies [see, for example, Jin (1997), Pierce et al. (2001), Newman et al. (2003), and the review in Liu (2012)]. Newman et al. (2003) also suggested that the interannual, tropical El Niño Southern Oscillation (ENSO) phenomenon may be forcing the PDO in extratropical Pacific. These studies, however, are not conclusive about the presence or absence of a distinct spectral peak or peaks above a red noise background at 10 to 20 years oscillation periods in observations and model simulations of the PDO. A distinct peak would imply that there is a selection mechanism which prefers a particular decadal timescale out of all the possible timescales which can be generated by the processes mentioned above, especially east to west transit times of oceanic Rossby waves across a part of or entire Pacific Ocean.

1.1.2 The tropical Atlantic SST gradient variability

Another DCV phenomenon which has attracted attention of researchers for over 30 years is the TAG SST variability [see, for example, Folland et al. (1986, 1991), Semazzi et al. (1988), Hastenrath (1990), Ward and Folland (1991), Hastenrath (1991), and Nobre and Shukla (1996) for an early history]. The TAG SST pattern of multiyear to decadal variations has nearly constant amplitude with respect to longitude in the tropical Atlantic and opposite signs on the two sides of the equator, with the maximum amplitudes at approximately 15°N and 15°S. This north–south, cross-equatorial pattern also emerges from empirical orthogonal function (EOF)—principal component (PC) analysis of tropical Atlantic SST anomalies and was characterized by some researchers as a dipole mode of tropical Atlantic SST variability. This type of SST pattern also emerged when wind or rainfall anomalies over the tropical Atlantic and some neighboring continental regions were regressed against SST anomalies. A controversy raged in the 1990s CE about the physics of this SST pattern (see, for example, Houghton and Tourre (1992), Kawamura (1994), Mehta and Delworth (1995), Chang et al. (1997), Mehta (1998), Rajagopalan et al. (1998), Delworth and Mehta (1998), and Enfield et al. (1999)), especially whether this pattern is a dynamical and/or thermo-dynamical dipole mode of variability of the tropical Atlantic Ocean–atmosphere system or it represents variability of the cross-equatorial, tropical Atlantic SST gradient. Assuming this SST pattern to be a dipole mode, Chang et al. (1997) proposed a mechanism in which the wind speed—evaporation—SST feedback is positive and heat advection feedback by the Brazil Current along the South American coast is delayed and negative. In their model with these feedbacks, Chang et al. (1997) found self-sustaining dipole oscillations across the equator. Xie and Carton (2004) have reviewed the tropical Atlantic SST variability and its possible mechanisms in detail. 
Regardless of the physical interpretation of the tropical Atlantic SST pattern and its mechanisms, its association with climate variability in West Africa, Brazil, the U.S.A., and other regions of the world is incontrovertible as studies cited earlier and Mehta (2017) show.

1.1.3 The West Pacific warm pool SST variability

The WPWP stands out in global SST maps due to its very large area of warmest temperatures. The WPWP SSTs exhibit large variations on intraseasonal, seasonal to interannual, decadal, and longer time scales. The WPWP SSTs have been warming over the twentieth century and into the twenty-first century. The WPWP indices both with and without the linear trend show interannual to decadal variability of the WPWP SSTs, as shown later in this paper. They also show that the WPWP SST can persist in warmer or less warm conditions for several years to a decade or longer. These decadal WPWP SST variations influence worldwide atmospheric circulations (Wang and Mehta 2008); and hydro-meteorology, river flows, agriculture, and fish and crustacean catches (Mehta 2017). The formation of the WPWP is driven by ocean dynamics and thermodynamics. Penetrative solar radiation, and atmospheric processes such as evaporative cooling and cloud-radiation feedback also play an important role. The WPWP is a region of net atmospheric freshwater flux (precipitation minus evaporation) from the atmosphere into the ocean, which significantly affects the dynamics and thermodynamics of the WPWP via its effect on salinity and ocean–atmosphere heat exchanges. Natural and/or human-induced perturbations in some or all of these processes can cause the observed interannual to decadal variability of the WPWP SSTs (see, for example, Huang and Mehta (2004) for a simulation of the WPWP variability as a response to freshwater forcing variability). The warming trend of the WPWP SST may be a manifestation of human-induced global warming or due to as-yet undetermined natural process(es).

1.2 Present study

A physical climate phenomenon has several important spatial and temporal attributes. In order to establish a model’s ability to simulate an observed phenomenon, it is very important to compare, to the extent possible, important attributes of the observed phenomenon with the corresponding ones of the same phenomenon simulated by a model. Four major attributes (spatial patterns, times of the year of the maxima and of the minima of the average annual cycle of each pattern, and oscillation periods of significant spectral peak(s), if any, of each pattern’s variability) are traditionally considered important for assessing a climate model’s simulations. These attributes are important not only for understanding a phenomenon’s physics, but also for using an ESM to predict the phenomenon. Also, as noted by several studies [see, for example, Mehta et al. (2013b), Ding et al. (2014), and references therein], volcanic eruptions appear to influence the evolution and predictability of area-averaged SSTs. Therefore, possible influences of volcanic eruptions on the DCV phenomena should also be analyzed. In view of this brief background, the following questions are addressed in this paper using data from simulation experiments with four ESMs in CMIP5: What are the SST patterns of the PDO, the TAG variability, and the WPWP variability in selected ESMs? How do the SST patterns compare with observations? Are there substantial and significant simulation skills of these three DCV phenomena in the selected ESMs? Is there a substantial influence of volcanic eruptions on the simulated DCV phenomena?

This paper is organized in the following sections: CMIP5 and other data used in this study and analysis techniques are described in Sect. 2. Then, simulated DCV phenomena, their simulation skills, and apparent influences of volcanic eruptions are described in Sect. 3. Finally, results are summarized and discussed in Sect. 4.

2 Materials and methods

2.1 CMIP5 and observed data sets

In CMIP5, a set of simulation experiments and two sets of core decadal prediction experiments were conducted (Taylor et al. 2012). These simulation (or, historical) experiments, some starting from 1856 CE, were forced with estimated carbon dioxide concentrations in the atmosphere, aerosol optical depths (AODs)—including those due to volcanic eruptions—and solar radiation, and were initialized with random initial conditions.

We used simulated SST and specified AOD data for 1861 to 2005 CE from HadCM3, CM2.1, CCSM4, and MIROC5. The four selected ESMs were developed at four major climate research institutions in Asia, Europe, and North America. Table 1 summarizes major attributes of these four ESMs and the CMIP5 experiments conducted with them. All available ensemble members from each ESM were used. In all CMIP5 experiments with these ESMs, Northern Hemisphere and Southern Hemisphere time series of AOD, based on observations [Amman et al. (2003) in CCSM4, and Sato et al. (1993) and Hansen et al. (2002) in the other three ESMs], were specified. These data provide zonal-average, vertically-resolved AOD for visible wavelengths and column-average effective radii of aerosols (Stenchikov et al. 2006). The AOD estimates used in CCSM4 are 20–30% larger than those used in the other three ESMs (Ding et al. 2014). We also combined data from the four ESMs as a multi-model ensemble (MME; Krishnamurti et al. 2000). The MME in this study is an average of the ensemble-average data from each ESM. In this way, each ESM is treated equally in the MME. We used the Extended Reconstructed SSTs (ERSST; Reynolds et al. 2002) for comparison with simulated SSTs.
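The equal-weight construction of the MME described above (average each ESM’s members first, then average the four ensemble averages) can be sketched as follows. This is a toy illustration under stated assumptions: the function names are hypothetical, and the member counts and values are made up rather than taken from the CMIP5 archive.

```python
def mean_series(series_list):
    """Element-wise mean of equally long time series."""
    n = len(series_list)
    return [sum(values) / n for values in zip(*series_list)]


def multi_model_ensemble(members_by_model):
    """Average each model's members first, then average the model means.

    Every model therefore gets the same weight, regardless of how many
    ensemble members it contributes.
    """
    model_means = [mean_series(members) for members in members_by_model.values()]
    return mean_series(model_means)


# Toy data: the model names are real, the numbers and member counts are not.
toy = {
    "CM2.1": [[1.0, 2.0], [3.0, 4.0]],  # two members, two time steps
    "MIROC5": [[10.0, 20.0]],           # one member
}

mme = multi_model_ensemble(toy)

# For contrast: pooling all members would weight CM2.1 twice as heavily.
pooled = mean_series([m for members in toy.values() for m in members])

print("equal-weight MME:", mme)
print("pooled members  :", pooled)
```

The contrast with the pooled average shows the design choice: with unequal ensemble sizes, pooling would let the ESMs with more members dominate the MME.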

Table 1 CMIP5 experiments with Earth System Models used in this study

2.2 Analysis techniques

Two SST indices were constructed by averaging SST over fixed regions. The WPWP index is the average over the Western Pacific Warm Pool (20°S–20°N, 90°E–180°). The TAG index is the difference between the averages over the tropical North (5°–20°N, 30°–60°W) and South (0°–20°S, 30°W–10°E) Atlantic Ocean, that is, north minus south. Following Mantua et al. (1997), the PDO index is defined in this study as the normalized PC time series of the first EOF (Lorenz 1956) of the Pacific SST anomalies within the 20°N to 65°N, 125°E to 100°W domain. ESM data from 1861 to 2005 CE were used in these analyses.
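The two box-averaged indices can be sketched for a single SST map on a regular latitude–longitude grid as below. The box limits are those given in the text; the cosine-of-latitude area weighting, the longitude convention (degrees east in [-180, 180)), the toy grid, and all function names are assumptions made for illustration, since the text does not specify how the averages were weighted.

```python
import math


def box_mean(field, lats, lons, lat_range, lon_range):
    """Cosine-latitude weighted mean of field[i][j] over a lat-lon box.

    Longitudes are assumed to be in degrees east within [-180, 180).
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        w = math.cos(math.radians(lat))  # grid-cell area shrinks with latitude
        for j, lon in enumerate(lons):
            if lon_range[0] <= lon <= lon_range[1]:
                total += w * field[i][j]
                weight_sum += w
    return total / weight_sum


def tag_index(field, lats, lons):
    """Tropical North Atlantic average minus tropical South Atlantic average."""
    north = box_mean(field, lats, lons, (5, 20), (-60, -30))
    south = box_mean(field, lats, lons, (-20, 0), (-30, 10))
    return north - south


def wpwp_index(field, lats, lons):
    """Average over the West Pacific Warm Pool box."""
    return box_mean(field, lats, lons, (-20, 20), (90, 180))


# Toy field: +1 north of the equator, -1 on and south of it (made-up data).
lats = list(range(-30, 31, 5))
lons = list(range(-180, 180, 10))
field = [[1.0 if lat > 0 else -1.0 for _ in lons] for lat in lats]

tag = tag_index(field, lats, lons)
wpwp = wpwp_index(field, lats, lons)
print(f"TAG = {tag:.3f}, WPWP = {wpwp:.3f}")
```

Applying these functions to each monthly map in turn would give the index time series; the PDO index requires an EOF decomposition instead and is not covered by this sketch.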

Various statistical analysis techniques were used to isolate and analyze attributes of simulated and observed SST patterns of the PDO, the TAG, and the WPWP SST variability. The EOF-PC analysis was used to isolate spatial patterns of PDO SST anomalies. The TAG and WPWP spatial patterns were isolated by regressing the two indices against global SST anomalies. The Fourier spectrum analysis was used to identify preferred timescales, if any, of variability of the isolated PDO SST patterns and the TAG and WPWP indices; the statistical significance of spectral peaks was estimated via a comparison with corresponding red noise spectra (Wilks 1995).
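The comparison of a Fourier spectrum with a red noise background can be sketched as follows. This is a simplified illustration, not the procedure used in this study: it computes a raw periodogram, fits an AR(1) spectral shape from the lag-1 autocorrelation, and scales that shape to the mean of the periodogram. A formal significance test would additionally scale the background by a chi-squared quantile, which is omitted here. The synthetic index (a 12-year cycle plus white noise) and all function names are assumptions.

```python
import cmath
import math
import random


def periodogram(x):
    """Raw periodogram at the Fourier frequencies k/n, k = 1 .. n//2."""
    n = len(x)
    mean = sum(x) / n
    anomalies = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(a * cmath.exp(-2j * math.pi * k * t / n)
                    for t, a in enumerate(anomalies))
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series about its mean."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    return sum(a * b for a, b in zip(d[:-1], d[1:])) / sum(a * a for a in d)


def red_noise_background(freqs, power, r1):
    """AR(1) spectral shape, scaled to the mean of the periodogram."""
    shape = [(1 - r1 * r1) / (1 + r1 * r1 - 2 * r1 * math.cos(2 * math.pi * f))
             for f in freqs]
    scale = (sum(power) / len(power)) / (sum(shape) / len(shape))
    return [scale * s for s in shape]


# Synthetic annual index: a 12-year cycle plus white noise (made-up data).
random.seed(0)
n_years = 144
index = [math.sin(2 * math.pi * t / 12) + random.gauss(0, 0.5)
         for t in range(n_years)]

freqs, power = periodogram(index)
background = red_noise_background(freqs, power, lag1_autocorrelation(index))

k_max = max(range(len(power)), key=power.__getitem__)
peak_period = 1 / freqs[k_max]
print(f"largest peak at {peak_period:.1f} years; "
      f"peak/background = {power[k_max] / background[k_max]:.1f}")
```

In this toy case the imposed 12-year cycle stands far above the fitted background, which is the kind of evidence for a preferred timescale that the analysis looks for in the PDO, TAG, and WPWP indices.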

Following the definition of decadal hindcast skill by Smith et al. (2007), Keenlyside et al. (2008), Pohlmann et al. (2009), and Mehta et al. (2013b), the correlation coefficient between observed and simulated DCV SST indices was defined as the simulation skill estimator. The observed and simulated data for the 1961–2005 CE period were also used to estimate simulation skill to compare with decadal hindcast skills reported in MMW. The correlation coefficients were evaluated based on the ensemble-average and monthly average data from each ESM and also the data from the MME. Prior to calculating correlation coefficients, linear trends over the 1961–2005 CE period were removed from all data because of the comparison with decadal hindcast skills in MMW over this period; due to a substantial linear trend in the WPWP indices, correlation coefficients between observed and simulated indices were calculated both without and with linear trends. The Monte Carlo technique (Wilks 1995) was used to estimate the statistical significance of correlation coefficients, with those equal to or greater than the 95% confidence limit referred to as statistically significant in this paper. Negative correlation coefficients are referred to as no skill.
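The skill estimate described above (detrend, correlate, then test against a Monte Carlo null distribution) can be sketched as follows. The text does not specify how the Monte Carlo samples were generated, so this sketch assumes a simple random-shuffle null, which ignores serial correlation and is therefore likely to be more permissive than a test designed for autocorrelated decadal indices. The synthetic series, the number of trials, and all function names are illustrative assumptions.

```python
import math
import random


def detrend(x):
    """Remove the mean and the least-squares linear trend from a series."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - x_mean - slope * (t - t_mean) for t, v in enumerate(x)]


def correlation(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def monte_carlo_threshold(obs, sim, n_trials=2000, level=0.95, seed=0):
    """Correlation reached by chance in only (1 - level) of shuffled trials."""
    rng = random.Random(seed)
    shuffled = list(sim)
    null = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # assumed null: time order carries no information
        null.append(correlation(obs, shuffled))
    null.sort()
    return null[int(level * n_trials)]


# Made-up 45-year indices sharing a 12-year cycle, with different trends.
rng = random.Random(1)
signal = [math.sin(2 * math.pi * t / 12) for t in range(45)]
obs = detrend([s + 0.02 * t + rng.gauss(0, 0.3) for t, s in enumerate(signal)])
sim = detrend([s + 0.01 * t + rng.gauss(0, 0.3) for t, s in enumerate(signal)])

skill = correlation(obs, sim)
threshold = monte_carlo_threshold(obs, sim)
is_significant = skill >= threshold
print(f"skill = {skill:.2f}, 95% threshold = {threshold:.2f}, "
      f"significant = {is_significant}")
```

A negative `skill` would be reported as no skill under the convention stated above.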

3 Results

3.1 Simulated decadal climate variability phenomena

Major attributes of the PDO, the TAG variability, and the WPWP variability simulated by the four ESMs and the MME are described in the following three sub-Sections. Results are summarized at the end of each sub-Section.

3.1.1 The Pacific decadal oscillation

The spatial patterns of the PDO SST anomalies simulated by the four ESMs have approximately the same share of the total SST anomaly variance, shown in Table 2, as the ERSST PDO anomaly pattern. The ERSST PDO pattern (Fig. 1a) has the same sign in the tropical Pacific region and along the North American coast to the Gulf of Alaska; and opposite sign in mid-latitude Pacific regions centered at approximately 30°–40° latitudes. The amplitude of the North Pacific SST anomalies associated with the ERSST PDO is larger than that of the tropical SST anomalies. The CM2.1 PDO pattern (Fig. 1b) has the same sign in the tropical Pacific region and along the North American coast, especially in the Gulf of Alaska; and opposite sign in central and western North Pacific region centered at approximately 45°N, with a broad pattern extending from the WPWP to central Pacific in the Southern Hemisphere. The HadCM3 PDO pattern (Fig. 1c) has the same sign in the central tropical Pacific region and between 30°N and 60°N along the North American coast; and opposite sign in central and western North Pacific region centered at approximately 30°N with the maximum amplitude near the coast of central Japan. The MIROC5 PDO pattern (Fig. 1d) is generally similar to the ERSST pattern, but the equatorial part is more in the central and western Pacific compared to the ERSST pattern, with a somewhat larger variance in the Gulf of Alaska–Barents Sea region and in the Southern Hemisphere from the WPWP to central Pacific. The CCSM4 PDO pattern (Fig. 1e) is generally similar to the ERSST pattern, with a somewhat larger variance near the coast of Japan. The MME PDO pattern (Fig. 1f) is generally similar to the ERSST PDO pattern, but with the North Pacific center closer to Japan and in the Southern Hemisphere from the WPWP to central Pacific.
Spatial correlation coefficients between the ensemble-average simulated and the ERSST PDO patterns are 0.8 for CM2.1, 0.72 for HadCM3, 0.82 for MIROC5, 0.88 for CCSM4, and 0.88 for the MME over the North Pacific region. Thus, ensemble-average, simulated PDO patterns are not too dissimilar compared to the observed pattern. It is important to note here that there are variations among the PDO spatial patterns of individual members even in the same ESM ensemble.
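A spatial correlation coefficient between two maps on a common grid can be computed as sketched below. The text does not state whether the coefficients above were centered or area weighted, so the centered, cosine-latitude weighted form used here is an assumption, as are the function name and the small made-up maps.

```python
import math


def pattern_correlation(a, b, lats):
    """Centered, cosine-latitude weighted correlation of two lat-lon maps.

    `a` and `b` are lists of rows (one row per latitude in `lats`) on the
    same grid.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats]
    w_sum = sum(w * len(row) for w, row in zip(weights, a))

    def weighted_mean(field):
        return sum(w * sum(row) for w, row in zip(weights, field)) / w_sum

    mean_a, mean_b = weighted_mean(a), weighted_mean(b)
    cov = var_a = var_b = 0.0
    for w, row_a, row_b in zip(weights, a, b):
        for x, y in zip(row_a, row_b):
            cov += w * (x - mean_a) * (y - mean_b)
            var_a += w * (x - mean_a) ** 2
            var_b += w * (y - mean_b) ** 2
    return cov / math.sqrt(var_a * var_b)


# Made-up anomaly maps (degrees C) on a 4 x 3 grid; not data from the paper.
lats = [25.0, 35.0, 45.0, 55.0]
observed = [[0.2, 0.1, -0.1],
            [0.4, 0.3, -0.2],
            [0.5, 0.2, -0.3],
            [0.1, 0.0, -0.2]]
simulated = [[0.1, 0.2, -0.1],
             [0.3, 0.4, -0.1],
             [0.6, 0.1, -0.2],
             [0.2, 0.1, -0.3]]

r = pattern_correlation(observed, simulated, lats)
print(f"pattern correlation = {r:.2f}")
```

Because the sign of an EOF is arbitrary, a simulated pattern may need to be multiplied by -1 before it is compared with the observed one.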

Table 2 Major attributes of observed (ERSST) and simulated (CMIP5 earth system models) Pacific decadal oscillation
Fig. 1 SST anomalies (°C) associated with 1 standard deviation of PC1 of North Pacific SST anomalies over the 1861–2005 CE period derived from (a) observations (ERSST) and from historical runs with (b) CM2.1, (c) HadCM3, (d) MIROC5, (e) CCSM4, and (f) MME

The average annual cycles of the PDO index in ERSST and the ESMs are described here (not shown), with months of maximum and minimum given in Table 2. The ERSST PDO pattern peaks in June–July, with the minimum in October. Table 2 shows that there are phase differences of a few months between the average annual cycles of the PDO in ERSST and ESMs; the ERSST and the MME annual cycles are the closest in amplitude and phase.

The Fourier spectrum of the ERSST PDO (Fig. 2a, red) has major significant peaks at approximately 5.5-year and 16–20-year periods, with minor peaks at approximately 8.5- and 12-year periods. The ensemble-average Fourier spectrum of the CM2.1 PDO (Fig. 2a, blue) has major significant peaks at approximately 3.5–6- and 16–19-year periods. Both these peaks coincide with the corresponding peaks in the ERSST data, but the shorter timescale peak has much smaller spectral density and is much broader in the CM2.1 data. Incidentally, both the PDO spatial pattern and the spectral peaks in the CMIP5 simulations with CM2.1 are generally similar to those in other simulation experiments with CM2.1 by Zhang and Delworth (2015). The ensemble-average Fourier spectrum of the HadCM3 PDO (Fig. 2b, blue) has major significant peaks at approximately 8.5–15-year and longer than 19-year periods, with a minor significant peak at approximately 6-year period. The decadal peaks are close to or overlap with the corresponding peaks in the ERSST data, but the former have much larger spectral density than the latter, reflecting much stronger decadal variability of the PDO in the HadCM3 data than in the ERSST data. The ensemble-average Fourier spectrum of the MIROC5 PDO (Fig. 2c, blue) has a broad, significant peak from approximately 4 to 10 years, with a sharp peak at 5–6 years coincident with the corresponding ERSST peak. Thus, the dominant PDO variability in MIROC5 data is at considerably shorter period than in the ERSST data. The ensemble-average Fourier spectrum of the CCSM4 PDO (Fig. 2d, blue) has major significant peaks at approximately 4.5–5.5-year periods, with minor but significant peaks at approximately 7- and 12–14-year periods. The MME PDO spectrum (Fig. 2e, blue) has peaks at the same periods as the corresponding ERSST PDO spectrum between 4 and 12 years, but does not show significant decadal variability.
As mentioned in Table 1, CM2.1 and HadCM3 ensembles have ten members each, MIROC5 ensemble has four members, and CCSM4 ensemble has six members in the CMIP5 simulation mode; the MME, as formed in this study, has 30 members. Figure 2a to d show that there is substantial variation among the Fourier spectra of members of each ensemble in preferred periods as well as spectral densities. Although the average spectrum of each ESM ensemble has some resemblance to the ERSST PDO spectrum, the closest at sub-decadal periods is the MME PDO spectrum which is the average of 30 individual member spectra. This shows the importance of conducting ensembles of simulation experiments with the same ESM even if the external forcings are the same in all ensemble members. The simulated decadal variability of the PDO, however, is much closer to the observed in the CM2.1, HadCM3, and CCSM4 data.

Fig. 2 Estimated spectra of the PDO index (normalized PC1) derived from ERSST (red) and historical runs (dark blue for ensemble mean and light blue for ensemble members) with (a) CM2.1, (b) HadCM3, (c) MIROC5, (d) CCSM4, and (e) MME. Dashed lines are corresponding red-noise spectra

As the foregoing description shows, the ensemble-average simulated and observed PDO patterns have approximately the same share of total SST anomaly variance in each respective data set. There are regional differences among the simulated and observed PDO patterns, but the overall patterns are not too dissimilar. It is noteworthy that the observed PDO pattern has the maximum amplitude in the eastern and central North Pacific region, whereas the PDO patterns of the CM2.1, HadCM3, and CCSM4 ESMs and of the MME have the maximum amplitude over the Kuroshio-Oyashio Extension (KOE) region in the western North Pacific region; the MIROC5 ESM’s simulated PDO pattern is the closest to the observed PDO pattern in the North Pacific region. It is also noteworthy that all ESMs and the MME show an El Niño-like SST pattern in the equatorial Pacific region, but the amplitudes of the equatorial SST anomalies differ in the ESMs and observation. The average annual cycles of the PDO patterns in ERSST and the ESMs have comparable amplitudes and maximum/minimum phases differing by 1–2 months among the various ESM data sets. The Fourier spectrum of the ERSST PDO has major and minor significant peaks at decadal periods (8.5 and 12 years), reflecting the importance of decadal variability of the PDO pattern; the strongest peak, however, is at 5.5 years. All ESMs except MIROC5 have ensemble-average spectral peaks at generally decadal timescales, with variations in spectral densities of these peaks among the ESMs; the MME spectrum is the closest to the ERSST spectrum at sub-decadal timescales. Thus, there are substantial variations in spatial patterns, annual cycle, and Fourier spectra among members of each ESM’s ensemble. Major attributes of the observed and simulated PDO are summarized in Table 2.

3.1.2 The tropical Atlantic SST gradient variability

The dominant pattern of regression between the TAG SST index and Atlantic SST anomalies in ERSST is the bipolar pattern (Fig. 3a), with centers at approximately 10°–15°N in an east–west band and approximately 10°–15°S near the African coast. There is also a center of variance south of Greenland with the same sign as the tropical North Atlantic. The ensemble-average TAG pattern in CM2.1 (Fig. 3b) is similar to the ERSST pattern, but the South Atlantic center is located closer to the equator rather than at 10°–15°S as in the ERSST TAG pattern. Also, associated with the CM2.1 TAG is a large-scale tropical Pacific pattern with the same sign as the tropical North Atlantic and an opposite-sign center in mid-latitude North Pacific. The Pacific pattern appears somewhat like an El Niño pattern, including a center of variance in the western Indian Ocean. While the ensemble-average pattern of TAG in HadCM3 (Fig. 3c) is similar to the ERSST pattern, the centers in the tropical North and South Atlantic are further westward compared to the ERSST TAG pattern. The ensemble-average TAG pattern in MIROC5 (Fig. 3d) is similar to the ERSST TAG pattern. Also, associated with the MIROC5 TAG is a large-scale tropical Pacific pattern with the same sign as the tropical North Atlantic as in CM2.1. The ensemble-average TAG pattern in CCSM4 (Fig. 3e) is similar to the ERSST TAG pattern. Also, associated with the CCSM4 TAG is a large-scale tropical Pacific pattern with the same sign as the tropical North Atlantic as in CM2.1 and MIROC5. The MME TAG pattern (Fig. 3f) is a combination of the TAG patterns from the four individual ESMs and, therefore, has a Pacific component associated with the TAG pattern, but the Atlantic part is not very different from the ERSST TAG pattern.

Fig. 3 Sea-surface temperature (SST) anomalies associated with 1 °C tropical Atlantic SST gradient index based on linear regressions of monthly average SST over the 1861–2005 period for (a) observations (ERSST) and historical simulations with (b) CM2.1, (c) HadCM3, (d) MIROC5, (e) CCSM4, and (f) MME

The average annual cycle of the monthly TAG index in ERSST and the ESMs is described here (not shown), with months of maximum and minimum given in Table 3. The average annual cycle has a maximum in September and a minimum in March. Phases of the average annual cycle are remarkably similar in ERSST and the ESMs, but the average TAG index is 1–2 °C cooler throughout the year in the ESMs.

Table 3 Major attributes of observed (ERSST) and simulated (CMIP5 earth system models) tropical Atlantic sea-surface temperature gradient (TAG) variability

The Fourier spectrum (Fig. 4a) of the ERSST TAG pattern has major significant peaks at approximately 4-, 9-, 10–13-year and longer than 19-year periods; the 10–13-year peak is, by far, the most prominent. The Fourier spectrum (Fig. 4a) of the CM2.1 TAG pattern has major significant peaks at approximately 3.5–6.5-year periods coincident with ERSST peaks. The ensemble-average Fourier spectrum (Fig. 4b) of the HadCM3 TAG pattern has major significant peaks at approximately 6-, 8–10-, and 14–17-year periods. The ensemble-average Fourier spectrum (Fig. 4c) of the MIROC5 TAG pattern has a somewhat significant but weak peak at approximately 9–13-year period, coincident with the ERSST TAG spectral peak. The Fourier spectrum (Fig. 4d) of the CCSM4 TAG pattern has a “white noise” appearance, with minor peaks at approximately 4–6-, 10-year, and longer than 16-year periods. The MME Fourier spectrum (Fig. 4e) has barely significant and very small peaks at 6, 9.5, and 15.5 years that are not coincident with the ERSST TAG spectral peaks. Thus, none of the ESMs’ TAG spectra has a significant peak at the 10–13-year ERSST TAG spectral peak, and the decadal ensemble-average MIROC5 peak is much weaker than the ERSST peak. Figure 4a–d also show that there are large variations among member spectra within each ESM’s ensemble, with the result that ensemble-average spectra have generally low spectral density and are devoid of significant peaks.

Fig. 4

Estimated spectra of the tropical Atlantic sea-surface temperature index derived from ERSST (red) and historical runs (dark blue for ensemble mean and light blue for ensemble members) with a CM2.1, b HadCM3, c MIROC5, d CCSM4, and e MME. Dashed lines are corresponding red-noise spectra

Thus, simulated and observed TAG patterns are generally similar except in the exact locations of variance centers. The simulated TAG patterns, however, have a very substantial El Niño-like signature in the Pacific in all four ESMs and the MME whereas the ERSST TAG pattern does not. Thus, the tropical Pacific and Atlantic are too strongly coupled in the ESMs compared to observations. The average annual cycles of the TAG index in ERSST and the ESMs have exactly the same phases, with the ESMs’ TAG index generally cooler throughout the year. The estimated Fourier spectrum of the ERSST TAG has several major and minor significant peaks at decadal periods (9 years, 10–13 years, and longer than 19 years), reflecting the dominance of decadal timescales in the TAG variability. Large variations among simulated TAG variability within each ESM’s ensemble render their ensemble-average spectra generally featureless and dissimilar to the ERSST TAG spectrum. The earlier mentioned strong coupling between the El Niño-like tropical Pacific pattern and the TAG pattern in the ESMs may be a (or, the) cause of the absence of decadal spectral peaks in the simulated TAG variability. Major attributes of the observed and simulated TAG variability are summarized in Table 3.

3.1.3 The West Pacific warm pool variability

The dominant pattern of WPWP SST anomalies in ERSST (Fig. 5a) has the same sign of anomalies in the East Indian Ocean Warm Pool, in the tropical West Pacific Ocean, along the South Pacific Convergence Zone, along the East Asia coast, in the Barents Sea, near the west coast of tropical Africa, and near the west coast of southern South America. In the CM2.1 ESM, the ensemble-average WPWP SST pattern (Fig. 5b) is zonally uniform in the tropical Pacific and the tropical Indian Ocean, but the amplitude is much larger in the western Pacific. The sign of SST anomalies in the tropical Indian and Atlantic Oceans is the same as in the WPWP region. The ensemble-average WPWP SST pattern in HadCM3 (Fig. 5c) is zonally uniform from the coastal eastern Pacific to the coastal western Indian Ocean, the Arabian Sea, and the Bay of Bengal. This pattern has amplitude maxima north and south of the equator in the western Pacific. Outside of the western Pacific region, the pattern has same-sign anomalies along the west coasts of tropical North and South America, same-sign anomalies in the tropical Atlantic, and opposite-sign anomalies in the central mid-latitude North and South Pacific. Thus, the ensemble-average HadCM3 WPWP pattern is very different from that in ERSST. In the MIROC5 ESM, the ensemble-average WPWP SST pattern (Fig. 5d) looks similar to an El Niño pattern spanning the entire equatorial Pacific Ocean, with smaller amplitudes in the tropical North and South Pacific, along the North American coast to the Gulf of Alaska, and along the tropical South American coast. There are opposite-sign SST anomalies centered at 40°–45°N and 20°–25°S in the Pacific Ocean. The ensemble-average WPWP SST pattern in the CCSM4 ensemble (Fig. 5e) is generally similar to the ERSST pattern in the western Pacific, but has a strong La Niña-like pattern in the central and eastern equatorial Pacific Ocean. The MME WPWP pattern (Fig. 5f) has its largest amplitude in the equatorial western Pacific, but also has substantial amplitudes in the central and eastern tropical Pacific regions.

Fig. 5

Sea-surface temperature (SST) anomalies associated with 0.5 °C West Pacific Warm Pool index (detrended) based on linear regressions of monthly average SST over the 1861–2005 period for a observations (ERSST) and historical simulations with b CM2.1, c HadCM3, d MIROC5, e CCSM4, and f MME
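The regression patterns of this kind can be illustrated at a single grid point. The following is a minimal sketch, assuming an ordinary least-squares regression of the local SST series on the index and a scaling of the slope to a 0.5 °C index anomaly; the function name and argument names are hypothetical, and any preprocessing beyond what the caption states is not represented.

```python
def regression_anomaly(sst_series, index, index_amplitude=0.5):
    """SST anomaly at one grid point associated with a given index anomaly.

    sst_series and index are equally long monthly series. The result is the
    least-squares regression slope of SST on the index, multiplied by
    index_amplitude (0.5 degC here, following the figure caption).
    """
    n = len(index)
    mean_index = sum(index) / n
    mean_sst = sum(sst_series) / n
    covariance = sum((i - mean_index) * (s - mean_sst)
                     for i, s in zip(index, sst_series))
    variance = sum((i - mean_index) ** 2 for i in index)
    return index_amplitude * covariance / variance
```

Applying the function to every grid point of a gridded SST data set would yield a map comparable in form to the panels of the figure.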

The average annual cycle of amplitude of the ERSST WPWP index (not shown) has maxima in May and November, and minima in February and August (Table 4); the average WPWP SST index is approximately 28.1 °C. Although there are minor differences in the amplitudes of the average annual cycles of the WPWP indices in ERSST and the ESMs, they all have maxima and minima at the same times as Table 4 indicates. There is a pronounced linear trend in the ERSST WPWP index. After fitting and removing the trend in the entire index time series, the annual cycle phases remain the same as without detrending (not shown), but the annual cycle amplitudes in ERSST and the ESMs become (nearly) identical.
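The two operations used in this paragraph, forming an average annual cycle and removing a linear trend from the entire index, can be sketched as follows. This is a minimal illustration that assumes the monthly series starts in January and that the trend is an ordinary least-squares line; the function names are hypothetical.

```python
def average_annual_cycle(monthly):
    """Mean value for each calendar month; monthly[0] is assumed to be a January."""
    sums = [0.0] * 12
    counts = [0] * 12
    for t, value in enumerate(monthly):
        sums[t % 12] += value
        counts[t % 12] += 1
    return [s / c for s, c in zip(sums, counts)]


def detrend(series):
    """Residuals about the least-squares line (the mean is removed as well)."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    x_mean = sum(series) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(series)]


def extreme_months(cycle):
    """Calendar months (1 = January) of the maximum and minimum of a 12-value cycle."""
    return cycle.index(max(cycle)) + 1, cycle.index(min(cycle)) + 1
```

Comparing `extreme_months(average_annual_cycle(x))` with the same quantity computed from `detrend(x)` is one way to check whether the phases of the annual cycle survive detrending.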

Table 4 Major attributes of observed (ERSST) and simulated (CMIP5 Earth System Models) West Pacific Warm Pool (WPWP) sea-surface temperature variability

The Fourier spectrum of the ERSST WPWP index (Fig. 6a, red) has a major significant peak at approximately 8-year period, with minor peaks at less than 4-year and 12–14-year periods. The ensemble-average Fourier spectrum of the CM2.1 WPWP index shows that there are significant ensemble-average peaks at approximately 9.5-year and 13–14-year periods (Fig. 6a, blue). In addition, the CM2.1 WPWP index spectrum has a large and significant increase in spectral density at periods longer than 18 years. The ensemble-average Fourier spectrum of the HadCM3 WPWP index is shown in Fig. 6b (blue). There are significant peaks in the HadCM3 index at almost the same periods as in the ERSST index spectrum. The MIROC5 ensemble-average WPWP Fourier spectrum is dominated by large and significant spectral peaks at 4–7 and 8–10 years (Fig. 6c, blue). The latter peak is nearly coincident with a significant peak in the ERSST index spectrum at the 8–9-year period. The CCSM4 ensemble-average WPWP index time series has an upward trend after the 1960s, and some of the maxima and minima of low-frequency variability in the model index are in phase with those in the ERSST index before and after removing the overall linear trend. It is remarkable that the Fourier spectra (Fig. 6d, blue) of the ERSST and CCSM4 ensemble-average WPWP index time series have peaks at 2–4-, 5–7-, 8–9-, and 13–15-year periods; the last three peaks in the model spectra, however, are substantial but not significant. The Fourier spectrum of the detrended WPWP index time series from the MME (Fig. 6e, blue) has peaks at approximately 9–10 and 12–14 years. While the former peak in the corresponding ERSST spectrum is at a slightly shorter period, periods of the latter peak from the MME and ERSST are identical.
Although Fourier spectra of members of each ensemble are substantially different, it is remarkable that ensemble-average spectra of all WPWP simulations have prominent peaks at almost the same decadal periods as the ERSST WPWP index peaks.

Fig. 6

Estimated Fourier spectra of the West Pacific Warm Pool index (detrended) derived from ERSST (red) and historical simulations (dark blue for ensemble average and light blue for ensemble members) with a CM2.1, b HadCM3, c MIROC5, d CCSM4, and e MME. Dashed lines are corresponding red-noise spectra

Thus, simulated and observed WPWP SST patterns are very different, as described. None of the four ESMs simulates the WPWP variability confined to the western Pacific Ocean, the eastern Indian Ocean, and the South Pacific Convergence Zone. The simulated WPWP variability in three ESMs (CM2.1, HadCM3, and MIROC5) spans the entire tropical Pacific Ocean, and CCSM4 simulates a very strong, opposite-phase signal in the eastern equatorial Pacific Ocean. The average annual cycles of the WPWP pattern in ERSST and all four ESMs and the MME have perfectly synchronized extreme phases with slightly different amplitudes. The estimated Fourier spectrum of the ERSST WPWP pattern has major spectral peaks at 8–9 and 13–14 years. Ensemble-average spectra of WPWP indices in all ESMs also have spectral peaks in these two period ranges, with variations in spectral densities of these peaks among the ESMs. The four major attributes of the WPWP variability in ERSST and the four ESMs are summarized in Table 4.

3.2 Simulation skill of decadal climate variability phenomena

A free-running ESM with fixed and constant external forcings would not be able to simulate the time history of any observed phenomenon because of natural variability. In the CMIP5 simulation experiments, however, the external forcings (carbon dioxide, AOD, solar radiation) themselves have time histories based on observed estimates of these forcings since the 1850s CE. Therefore, there is a reason to expect that the time history of the simulated climate should resemble, to some extent, that of the observed climate for the same period. One should, however, expect model-to-model variations and variations even among members of the ensemble simulated with the same model. Also, the simulation skill would depend substantially on the amount of natural variability due to internal processes in the ESMs and the sensitivity of each ESM to variations in external forcings. Finally, it is important to estimate simulation skill so that it can be compared with the hindcast skill of the same ESMs initialized with observed data at specific times, and thereby to evaluate impacts of ESM initialization with actual, observed data. Quantitative assessments of overall simulation skill of SST indices of the three DCV phenomena by the four ESMs and the MME are described in this Section. Impacts of AOD changes are described in the next Section.

Correlation coefficients were estimated between simulated and observed monthly DCV indices over only the 1961 to 2005 CE period to compare with decadal hindcast skills for the 1961 to 2010 CE period described in MMW. Simulation skill of each DCV index in each ESM ensemble was also estimated over each decade of this period to compare with the decadal hindcast skills of the same ESMs described in MMW. Both groups of simulation skills (correlation coefficients), with 95% and 99% confidence limits, are shown in Fig. 7. The simulation skill for the three DCV indices over the 1961 to 2005 CE period, shown in Fig. 7a, is significant for the WPWP index for all four ESMs and the MME, followed by TAG (three out of four ESMs, and the MME), and PDO (two out of four ESMs, and the MME) indices; correlation coefficients between simulated and observed WPWP indices are as high as 0.45 to 0.7 over the 45-year period, implying that a substantial component of WPWP SST evolution over the 45-year period is due to external forcings. This is clearly not the case for the PDO and TAG indices.
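The skill measure described above can be sketched as a Pearson correlation computed over the whole period and over consecutive decades. This is a minimal illustration, assuming monthly series that start in January 1961 and decades defined as 1961–1970, 1971–1980, and so on, with a shorter final block; the function names are hypothetical, and the confidence limits shown in Fig. 7 are not reproduced because the procedure used to derive them is not given here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def skill_by_period(obs, sim, first_year=1961, last_year=2005, block_years=10):
    """Correlation over the full period and over each block of block_years.

    obs and sim are monthly series whose first value is January of first_year.
    Returns a dict mapping a 'start-end' label to the correlation coefficient.
    """
    n_months = (last_year - first_year + 1) * 12
    if len(obs) < n_months or len(sim) < n_months:
        raise ValueError("series shorter than the requested period")
    skills = {f"{first_year}-{last_year}": pearson(obs[:n_months], sim[:n_months])}
    for start in range(first_year, last_year + 1, block_years):
        end = min(start + block_years - 1, last_year)  # last block may be shorter
        i0 = (start - first_year) * 12
        i1 = (end - first_year + 1) * 12
        skills[f"{start}-{end}"] = pearson(obs[i0:i1], sim[i0:i1])
    return skills
```

For the default arguments the function returns six values per index: one for 1961–2005 and one for each of the five blocks, the last of which covers only 2001–2005.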

Fig. 7

Correlation coefficients between monthly indices derived from ERSST and ensemble average historical simulations a over the 1961–2005 period and over each decade for b Pacific Decadal Oscillation (PDO), c tropical Atlantic sea-surface temperature gradient (TAG), and d West Pacific Warm Pool (WPWP). Solid (dashed) gray lines denote the 99% (95%) significance level. The last decade only has 5 years’ data from 2001 to 2005

The decade-by-decade simulation skills are shown in Fig. 7b–d. In general, simulation skills of all indices fluctuate from decade to decade. The simulation skill of the PDO index (Fig. 7b) is not significant or is even zero in all four ESMs in the 1960s CE and the first decade of the twenty-first century (“the oughts”). Only MIROC5 has significant PDO simulation skill in the 1970s CE. CM2.1, MIROC5, and the MME have significant skill in the 1980s CE; and CCSM4, CM2.1, HadCM3, and the MME have significant PDO simulation skill in the 1990s CE. CM2.1 and HadCM3 have significant TAG simulation skill (Fig. 7c) in four out of five decades, followed by the MME (three decades) and MIROC5 (two decades). CCSM4 does not show significant TAG simulation skill in any decade. Except for HadCM3 in the 1980s, all four ESMs and the MME perform well in simulating WPWP index variability (Fig. 7d). In the 1960s and the 1990s CE, average (of four ESMs, and the MME) skill exceeded 0.5 and the skill of all ESMs and the MME reached 0.8 in the oughts.

Overall, of the 18 combinations of DCV indices and time periods shown in Fig. 7, there is significant simulation skill in 14 combinations in CM2.1, 12 combinations in MIROC5, 11 combinations in HadCM3, 7 combinations in CCSM4, and 13 combinations in the MME. Thus, although model-dependent, there is significant, and occasionally substantial, simulation skill of the three DCV indices in this subset of CMIP5 ESMs.

To separate effects of linear trends and variability on simulation skill, all observed and simulated DCV index time series were detrended over the 1961 to 2005 CE period. The biggest effect of removing linear trends is on the WPWP index simulation skill in the 1970s of all four ESMs and the MME as shown in Fig. 8. A comparison of Figs. 7d and 8d shows that the linear trend increases the simulation skill of the WPWP index substantially, especially in the 1960s and the 1970s CE. The trend removal has minor effects on simulation skills of the PDO and TAG indices; HadCM3 and CM2.1 lose PDO simulation skill in the 1960s and the oughts, respectively, and HadCM3 loses significant TAG simulation skill in the oughts.

Fig. 8

As in Fig. 7, but with detrended indices

These estimates of simulation skill imply interesting physics at work. As explained in the beginning of this Section, the simulated climate has variability/changes forced by variability/changes in external forcings, intrinsic variability/changes due to interactions among earth system components, and interactions between the two types. Since the PDO and TAG are expressions mainly of intrinsic variability, a moderate to high simulation skill of these phenomena would not be expected, and whatever skill there may be would fluctuate over time as the simulated intrinsic variability of these phenomena goes in and out of phase with the observed phenomena. The results in Figs. 7 and 8 show exactly this type of behavior of these two phenomena over the 45-year period of the simulation skill analysis. These figures also show that the simulation skill is not changed substantially by the removal of linear trends from the indices of these two phenomena because the trends are insignificantly small over the 45-year period. Some of the decade-to-decade fluctuations in skill can be related to fluctuations in AOD associated with volcanic eruptions. This aspect is further explored in the next Section. The simulation skill of the WPWP variability is substantially large and nearly constant in all four ESMs and throughout the analysis period. The removal of the linear trend, however, removes a substantial deterministic component from the simulation of the WPWP index, with the result that the overall skill is reduced and natural variability’s role becomes more prominent (Fig. 8). The linear trend in WPWP SST, especially since the 1970s CE, and possible causes of the trend are being investigated and will be reported elsewhere.

3.3 Volcanic eruptions and simulation skill

As mentioned in Sect. 2, the CMIP5 experiments were run with prescribed carbon dioxide, AODs, and solar radiation; the AODs included those due to volcanic eruptions. In this Section, associations between low-latitude volcanic eruptions and observed and simulated DCV indices are described, and simulation skills of the four ESMs and the MME are interpreted accordingly. The full time series of observed and simulated SST indices from the four ESMs and the ERSST from 1861 to 2005 CE are used in this analysis to include as many volcanic eruption events as possible. There were over 30 low-latitude volcanic eruptions during the simulation period. On the Volcanic Explosivity Index (VEI) scale (Newhall and Self 1982), the most explosive eruptions during the simulation period were the Krakatoa, Indonesia, event in June–August 1883 (VEI 6); the Santa Maria, Guatemala, event in October 1902 (VEI 6); the Mount Pinatubo, Philippines, event in June 1991 (VEI 6); the Volcán de Colima, Mexico, event in January 1912–1913 (VEI 5); the Mount Agung (Bali), Indonesia, event in February–May 1963 (VEI 5); the El Chichón (Chiapas), Mexico, event in March–April 1982 (VEI 5); and the Volcán de Fuego (Guatemala) event in 1974–1975 (VEI 4). Apparent effects of these eruptions on net surface heat flux and its individual component fluxes, surface and sub-surface ocean temperatures, and indices of the PDO, the TAG variability, and the WPWP variability were analyzed and inferences were drawn about possible physics of the effects. In these ESMs, potential temperatures from 0 m to approximately 5 m depth are defined as SSTs, and potential temperatures below the top 5 m are defined at approximately 100 m, 200 m, 300 m, and greater depths. Exact depths at which these potential temperatures are defined vary among ESMs. CM2.1 CMIP5 data do not include latent heat flux, so the net surface heat flux computed from this ESM does not have all components.

3.3.1 The West Pacific warm pool variability

In the time series of observed and simulated WPWP indices, and AOD, a decrease in the observed WPWP index can be seen immediately or a few months after each of the major eruptions (not shown). Impacts of these eruptions were felt by the ensemble-average WPWP index simulated by all ESMs and the MME to various extents. After removing overall linear trends from the observed and simulated WPWP time series, the association between major volcanic eruptions and decreases in the WPWP index becomes even clearer (not shown).

Composites of AOD and observed WPWP SST anomaly; and anomalous net surface heat flux, SST, and sub-surface ocean temperatures from each of the four ESMs during three VEI 6 events and four VEI 4 and VEI 5 events (hereafter referred to as VEI 4 + 5) led to insights into how and how much these events influenced the WPWP SST index. These composites from 12 months before eruption to 60 months after eruption are shown in Fig. 9; please note that the x-axis scale is the same for all ESMs and variables, but y-axis scales for each variable are not the same for all ESMs due to different sensitivities of the ESMs to AOD changes. In the VEI 6 eruptions, the observed WPWP SST index (Fig. 9, second row) begins to decrease as the AOD begins to build up in the atmosphere approximately 6–8 months before eruption peak. Although variability due to other dynamical forcings increases the WPWP SST index for a few months, the overall decreasing trend continues to 12 months after eruption peak when the index reaches the minimum (−0.28 °C) and stays there for 8 months, subsequently increasing in the next 36–40 months. The composite, observed WPWP SST index in the VEI 4 + 5 eruptions shows a general decrease 12 to 20 months after eruption peak, but it is less (−0.1 to −0.2 °C) than the decrease for the VEI 6 eruptions; as for the VEI 6 eruptions, effects of other dynamical forcings on the WPWP SST index are evident (Fig. 9, second row). We now describe how each of the ESMs responded to the composite AOD changes.
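The compositing procedure described above can be sketched as an average over windows aligned on the eruption peak. This is a minimal illustration, assuming a monthly anomaly series and a list of positions of the eruption peaks within that series; the function name and arguments are hypothetical, and the peak months of the individual events are deliberately left as inputs.

```python
def composite(series, event_indices, months_before=12, months_after=60):
    """Average a monthly series over windows aligned on each event.

    event_indices are positions in `series` of the eruption peaks (lag 0).
    Returns months_before + months_after + 1 values; element months_before
    corresponds to lag 0. Window months that fall outside the series are
    skipped, and a lag with no data at all is returned as NaN.
    """
    window = months_before + months_after + 1
    sums = [0.0] * window
    counts = [0] * window
    for event in event_indices:
        for k in range(window):
            t = event - months_before + k
            if 0 <= t < len(series):
                sums[k] += series[t]
                counts[k] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

Separate calls with the peak positions of the VEI 6 events and of the VEI 4 + 5 events would give the two composites compared in the text.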

Fig. 9

Composite evolutions of net surface heat flux anomalies (W m−2) averaged in the West Pacific Warm Pool (WPWP) region, observed and simulated WPWP sea-surface temperature anomalies, and sea-surface temperature (0–5 m depth) and potential temperature anomalies (°C) at approximately 100 and 200 m depths averaged in the WPWP region simulated by the HadCM3, MIROC5, CCSM4, and CM2.1 Earth System Models (ESMs) in volcanic eruption events. Monthly anomalies with respect to 1861 to 2005 CE average from 12 months before to 60 months after eruptions, simulated by HadCM3 (boxes a–d), MIROC5 (boxes e–h), CCSM4 (boxes i–l), and CM2.1 (boxes m–p) ESMs, are depicted. Composites contain averages of three eruption events of Volcanic Explosivity Index (VEI) 6 and four eruption events of VEI 4 and 5. Vertical dashed line at 0 month indicates peak eruption time. Legends for each row are shown in boxes i–l. Each ESM’s name is shown above each column. See text for details

The top row in Fig. 9 shows monthly, net surface heat flux anomalies from each ESM, averaged in the WPWP region, from 12 months before to 60 months after composite AOD peaks in VEI 6 and VEI 4 + 5 events. All ESMs begin responding to AOD increases as eruptions begin and the net surface heat flux anomalies decrease to a maximum negative value approximately 4 months before the AOD peak. The duration and magnitude of the flux decrease vary among ESMs, with maximum magnitude (−8 to −10 W m−2) in CCSM4 (Fig. 9i) and maximum duration (12–16 months) in CM2.1 (Fig. 9m) for the VEI 6 eruptions; the largest contribution to net surface heat flux anomalies is from decreasing downward shortwave radiation due to increased albedo. For the VEI 4 + 5 eruptions, the maximum magnitude of decrease (average −3 W m−2) is in CCSM4 (Fig. 9i) and the maximum duration of the decrease (12–14 months) is in CM2.1 (Fig. 9m). SSTs and sub-surface ocean temperatures in the ESMs respond to various extents to these net surface heat flux changes. SST decreases begin as the net surface heat flux begins to decrease, but the minimum SSTs are reached 4–8 months after AOD peaks, except in MIROC5, where the minimum SST occurs almost at the time of the AOD peak. The SST decreases in VEI 6 eruptions range from −0.2 °C in HadCM3 (Fig. 9b) to −0.55 °C in CCSM4 (Fig. 9j). Sub-surface temperature anomaly composites for VEI 6 eruptions (Fig. 9, third row), averaged in the WPWP region, show that the volcanic signal is mixed substantially to approximately 200 m and then decreases by one or more orders of magnitude at greater depths (not shown). This depth of signal penetration is consistent with 150–200 m mixing depths estimated from anomalous net surface heat flux and time rate of change of temperature shown in Fig. 9. In VEI 4 + 5 eruptions, the depth of substantial penetration of volcanic signal is the same as in VEI 6 eruptions, but the temperature anomalies are smaller (Fig. 9, fourth row).
Thus, vertical mixing of heat, forced by reduced net surface heat flux, seems to be the primary driver of WPWP SST response during and after volcanic eruptions. The delay and longer duration of temperature anomalies than those of net surface heat flux are due to vertical mixing of the heat anomaly in the upper ocean. It is interesting to observe in Fig. 9 (second row) that although the observed WPWP SST index shows effects of other variability, the underlying cooling of the observed index lasts as long as the cooling in the ESM WPWP SST index; the magnitude of the cooling in the ESMs, as mentioned earlier, depends on the net heat flux anomaly generated by each ESM and the response of the ESM’s oceanic component to the generated flux anomaly. In the observed WPWP index and in upper ocean ESM temperatures in the WPWP region, the recovery to average conditions appears to take 4–5 years. Thus, VEI 4 and higher explosivity volcanic eruptions clearly and substantially change the phase of the WPWP variability from positive to negative, as these results show. It appears that the higher correlation coefficients between observed and simulated WPWP indices may have a substantial contribution from volcanic eruptions.
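An estimate of mixing depth from a net surface heat flux anomaly and a rate of temperature change can be sketched with a slab heat budget, ρ c_p h dT/dt = Q_net, solved for the depth h. This is a minimal illustration under stated assumptions: the seawater density and specific heat are typical textbook values rather than values taken from the ESMs, the function name is hypothetical, and the numbers in the usage note are illustrative rather than read from Fig. 9.

```python
SECONDS_PER_MONTH = 365.25 * 86400.0 / 12.0


def mixing_depth(q_net, delta_t, months, rho=1025.0, cp=3990.0):
    """Depth (m) over which a heat flux anomaly must be mixed.

    q_net   : net surface heat flux anomaly (W m-2), negative for heat loss
    delta_t : temperature change (degC) of the mixed column over `months`
    rho, cp : assumed seawater density (kg m-3) and specific heat (J kg-1 K-1)

    Solves the slab budget rho * cp * h * dT/dt = q_net for h.
    """
    rate = delta_t / (months * SECONDS_PER_MONTH)  # degC per second
    return q_net / (rho * cp * rate)
```

For example, `mixing_depth(-5.0, -0.2, 12)` gives a depth of roughly 190 m, that is, a sustained 5 W m−2 heat loss cooling the column by 0.2 °C over one year implies mixing over about that depth under these assumptions.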

3.3.2 The Pacific decadal oscillation

Time series of observed and simulated PDO indices, and the AOD from 1861 to 2005 CE; and time series of individual members and averages of ensembles from the simulations for each of the four ESMs and the MME were compared (not shown). The VEI 4 to VEI 6 events can be clearly seen in the AOD time series. A decrease in the observed PDO index can also be seen immediately or a few months after many of these eruptions. The tropical–subtropical Pacific SSTs cooled so much after the 1883 CE (Krakatoa, Indonesia), 1902 CE (Santa Maria, Guatemala), 1912–1913 CE (Volcán de Colima, Mexico), 1982 CE (El Chichón, Mexico), and 1991 CE (Mount Pinatubo, Philippines) eruptions that the observed PDO index changed sign from positive to negative for several months to a year or longer after each eruption. As for the WPWP variability, composites of observed and ESM variables from 12 months before to 60 months after the eruption events led to insights into how and how much these events influenced the PDO evolution; ensemble-average variables were averaged in tropical Pacific and mid-latitude North Pacific regions where the simulated PDO patterns show largest amplitudes (Fig. 1). Such composites were also made of eruption events which began when the PDO was in negative phase and when it was in positive phase to gain further insights into the physics of eruption effects on PDO evolution. Due to different PDO states (signs and magnitudes) at the time of each eruption of the same VEI and substantial natural variability of the PDO, the physics of the ESMs’ responses appear clearer in individual events than in composites. It was also found in these analyses that CCSM4’s simulated PDO’s responses to volcanic eruptions are largest in magnitude and the ensemble-average CCSM4 PDO index is the closest to the observed PDO index among the four ESMs; this may be due to 20–30% larger values of the AOD used in CCSM4 as mentioned in Sect. 2.1. 
CCSM4’s larger response to volcanic eruptions is consistent with the analyses by Ding et al. (2014) of the responses of global-average SSTs to volcanic eruptions as simulated by several CMIP5 ESMs. Therefore, the evolutions of the observed and ensemble-average simulated PDO index; and ensemble-average, anomalous net surface heat flux, SST, and sub-surface potential temperatures during two VEI 6 events (1883 CE and 1991 CE) and two VEI 5 events (1912 CE and 1982 CE) in CCSM4 experiments are shown in Fig. 10 and described here.

Fig. 10

Evolutions of net surface heat flux anomalies (W m−2) averaged in the tropical Pacific and mid-latitude North Pacific, observed and simulated Pacific Decadal Oscillation (PDO) indices, and sea-surface temperature (0–5 m depth) and potential temperature anomalies (°C) at approximately 100 and 200 m depths averaged in the tropical Pacific and mid-latitude North Pacific simulated by the CCSM4 Earth System Model in four volcanic eruption events. Monthly anomalies with respect to 1861 to 2005 CE average from 12 months before to 60 months after eruptions in 1883 (boxes a–d), 1991 (boxes e–h), 1912 (boxes i–l), and 1982 (boxes m–p) CE are depicted. Vertical dashed line at 0 month indicates peak eruption time. Legends for each row are shown in boxes e–h. Volcanic Explosivity Index of each eruption is shown above each column. See text for details

In all four eruption events, there is a 5–10 W m−2 decrease in net surface heat flux in the tropical Pacific and North Pacific a few months before or at the time of AOD maximum as Fig. 10a–e, i–m show. Anomalous downward shortwave radiation is the largest contributor to net surface heat flux anomalies due to an increased albedo associated with volcanic matter injected into the atmosphere, especially the stratosphere. Then, as downward shortwave radiation slowly recovered after the AOD peak, downward longwave radiation continued to decrease and latent heat flux also began to decrease, perhaps as a result of initial SST cooling. Figure 10a–e, i–m also show that a substantial minimum in net surface heat flux occurred approximately 12 months after the AOD peak, largely as a result of minima in downward longwave radiation and latent heat. Annual cycles of SST, surface winds, and atmospheric humidity and temperatures, in conjunction with decreasing downward longwave radiation and latent heat and the remaining AOD in the atmosphere, are perhaps responsible for the substantial decrease in net surface heat flux in all four eruption events approximately 12 months after peak eruption. The ensemble-average PDO index in all four events felt effects of these two net surface heat flux anomalies at 12-month intervals as Fig. 10b–f, j–n show; the observed PDO index is also shown in these four figures. As Fig. 10b shows, the observed index was negative when the 1883 eruption of Krakatoa began. At the time of peak AOD, both the observed and model indices decreased slightly in response to the net surface heat flux decrease and then both indices increased and became positive, perhaps due to internal ocean–atmosphere dynamics. The initial decrease and subsequent increase are evident in SST (0–5 m) and 100 m potential temperature anomalies averaged in the tropical Pacific (Fig. 10c).
Ocean potential temperatures in the North Pacific felt the heat flux decrease 2 months after the AOD maximum and perhaps responded more slowly due to deeper mixing of the heat anomaly. This relative warming of the tropical Pacific with respect to the North Pacific is reflected in the observed and model PDO indices becoming and staying positive for over a year after the AOD maximum. The 100 and 200 m potential temperature anomalies were negative when the eruption began and were increasing, perhaps due to internal dynamics. Then, approximately 4–6 months after the AOD maximum, they began to decrease as Fig. 10c shows. It is possible that these upper ocean thermodynamics influenced the 0–5 m temperature 12 months after the AOD maximum and, combined with the second decrease in net surface heat flux 12 months after the first, caused the 0–5 m temperature to begin decreasing at this time. This decrease was 0.9 °C in 24 months (36 months after the AOD maximum). A few months after the tropical 0–5 m temperature began to decrease, the North Pacific upper ocean temperatures began to recover from the cooling they had experienced due to net surface heat flux anomalies in the first 12 months after the AOD maximum. The combination of the cooling upper ocean in the tropical Pacific and the warming upper ocean in the North Pacific caused the model PDO index to decrease substantially over almost 2 years as Fig. 10b shows. Effects of natural variability are also evident in the observed PDO index. Three years after the AOD maximum, the tropical Pacific upper ocean potential temperatures began to recover from the cooling and mid-latitude North Pacific upper ocean potential temperatures began to cool again; this combination caused the model PDO index to increase to positive values, which happened in the observed index a few months later. The VEI 6 eruption event of Mount Pinatubo in 1991 CE had impacts on net surface heat flux (Fig. 10e), tropical Pacific and North Pacific upper ocean temperatures (Fig. 10g, h), and the simulated PDO index (Fig. 10f) that were generally similar to those of the 1883 CE Krakatoa event. The VEI 5 events in 1912 CE (Volcán de Colima, Mexico) and 1982 CE (El Chichón, Mexico) made smaller magnitude impacts on net surface heat flux, tropical Pacific and North Pacific upper ocean temperatures, and the simulated PDO index than the VEI 6 events, but the similarity of the simulated impacts on the PDO index in both VEI 5 events and their general similarity with the two VEI 6 events are unmistakable (Fig. 10i–l for the 1912 CE event and Fig. 10m–n, o–p for the 1982 CE event). It is, therefore, not surprising that average composites of all VEI 6 events and all VEI 4 and VEI 5 events simulated by CCSM4 depict generally similar evolutions of fluxes, temperatures, and PDO indices as in the individual cases shown and described here. Among the other ESMs, CM2.1’s simulations of impacts of volcanic eruptions on the PDO are closest to CCSM4’s and the observed PDO, followed by MIROC5 (not shown). HadCM3’s simulations, even though substantially responsive in net surface heat flux in tune with the eruptions, have much smaller and noisier ocean temperature and PDO index responses.

It can be speculated that the generally higher correlation coefficients between observed and simulated PDO indices in the 1980s and 1990s CE (Fig. 7b) could be partially due to the response of the ESMs to the 1982 and 1991 CE eruptions and subsequent recovery from them. It can also be speculated that the negative correlation coefficients in the oughts (Fig. 7b) reflected the absence of one or more major volcanic eruptions in this decade.

3.3.3 The tropical Atlantic SST gradient variability

Responses of the TAG index simulated by the four ESMs to VEI 4, 5, and 6 eruptions are an enigma. The observed and ensemble-average simulated TAG indices from the four ESMs, along with the AOD, are shown in Fig. 11 from 1861 to 2005 CE. A sharp decrease in the observed TAG index can be seen immediately or a few months after some of the eruption events, especially the 1982 and 1991 CE events. As Fig. 11 shows, however, none of these events evokes a substantial response in the TAG index simulated by any of the four ESMs. Analyses of net surface heat flux, ocean temperatures in the tropical South Atlantic and tropical North Atlantic, and the TAG index simulated by each ESM show that, although quite large heat flux anomalies are generated in response to AOD changes in some of the eruption events, the ocean temperature responses they evoke are very different from the observed TAG index. Even when there are net surface heat flux anomalies of opposite signs in the tropical North and South Atlantic, temperature responses do not correctly simulate the observed TAG responses. So, it appears that these ESMs are deficient in their simulations of tropical Atlantic Ocean temperature variability and its response to external forcing such as volcanic eruptions. Such a conclusion is consistent with similar conclusions reached by Wang et al. (2013) and Xue et al. (2012) regarding tropical Atlantic Ocean simulation by ocean models.

Fig. 11

Annual average tropical Atlantic sea-surface temperature gradient index (°C) (1861 to 2005) derived from ERSST (red) and historical simulations (dark blue for ensemble average and light blue for ensemble members) with a CM2.1, b HadCM3, c MIROC5, d CCSM4, and e MME. Green line shows annual average volcanic aerosol optical depth

4 Summary and discussion

We analyzed simulations of three DCV phenomena—the PDO, the TAG variability, and the WPWP variability—by the CCSM4, CM2.1, HadCM3, and MIROC5 ESMs, and an MME formed by combining data from the four ESMs; and compared them with observed data from ERSST. We estimated simulation skill as indicated by correlation coefficients between simulated and observed data. We also compared the simulated and observed DCV indices with AODs, net surface heat flux, surface and sub-surface ocean temperatures, and time (months, years) of volcanic eruptions. In these analyses, data from 1861 to 2005 CE were used. We found that:

  • Simulated and observed PDO spatial patterns have approximately the same share of total SST anomaly variance, and the overall patterns are not too dissimilar as indicated by spatial correlation coefficients and visual inspections. However, there are notable differences between the observed and simulated PDO patterns, especially in the North Pacific region. The average annual cycles of the PDO patterns have comparable amplitudes and extrema of phases 1 or 2 months earlier/later than the observed PDO’s extrema. The estimated Fourier spectrum of the ERSST PDO has a significant sub-decadal peak at 8.5 years and a decadal peak at 12 years. Spectra of simulated PDOs of all ESMs except MIROC5 show significant peaks near the ERSST’s decadal spectral peak, with variations in spectral densities of the decadal peaks among the ESMs.

  • Simulated and observed TAG spatial patterns are generally similar in the Atlantic except in the locations of SST variability centers; all simulated TAG patterns, however, have an El Niño-like signature in the tropical Pacific which the observed pattern does not. The average annual cycles of the TAG patterns have comparable amplitudes. Extreme phases occur at different times, with the phases simulated by HadCM3 and CM2.1 shifted by nearly 180° compared to the extreme phases of the ERSST TAG pattern. The estimated Fourier spectrum of the ERSST TAG has significant peaks at periods of 9 years, 10–13 years, and greater than 19 years. Ensemble-average spectra of simulated TAG variability in all ESMs are generally featureless, which may be due to the strong coupling with the interannual tropical Pacific variability.

  • Simulated and observed patterns of WPWP SST variability are very different. None of the four ESMs, nor the MME, simulates the observed WPWP SST variability pattern. The average annual cycles of the WPWP pattern in ERSST, all four ESMs, and the MME have comparable extreme phases with almost the same amplitudes. The estimated Fourier spectrum of the WPWP SST pattern has a significant spectral peak at 8–9 years in ERSST, HadCM3, and MIROC5, with variations in spectral densities of these peaks among the ESMs. CM2.1 and CCSM4 simulations of the WPWP variability show insignificant spectral peaks at decadal timescales.

  • The ERSST PDO index decreased immediately or a few months after major volcanic eruptions. The index changed sign from positive to negative for several months to a year or longer after very powerful eruptions in 1883 CE, 1912–1913 CE, 1982 CE, and 1991 CE. CCSM4 net surface heat flux responded the most strongly to these four events, perhaps because of the 20–30% larger AODs specified in this ESM. The evolution of the ensemble-average PDO index simulated by CCSM4 was the closest to that of the observed PDO index in each of these four eruptions. Analyses of surface and sub-surface ocean temperatures showed that immediate and delayed responses of the tropical Pacific and mid-latitude North Pacific temperatures and, consequently, of the simulated PDO index were due to vertical mixing of net surface heat flux anomalies down to 150–200 m below the surface. The ensemble-average PDO indices from CM2.1 (5 events), HadCM3 (3 events), MIROC5 (5 events), and the MME (3 events) also felt effects of some of the major eruptions to various degrees. The PDO index state (amplitude and sign) at the time of an eruption appeared to be important in determining the magnitude and delay of the response. The ERSST WPWP index decreased after all major eruptions, as did the ensemble-average WPWP SST indices from all ESMs and the MME. Immediate responses of the ESMs’ net surface heat fluxes to major eruptions and long-term responses of surface and sub-surface ocean temperatures were generally similar to observed responses of the WPWP SST index. Although the observed TAG index showed a decrease after some of the eruption events, the ensemble-average TAG indices from all four ESMs did not show responses similar to the observed changes, even though all ESMs’ net surface heat flux responses in the tropical North and South Atlantic were substantial.

  • There is significant, but model-dependent, simulation skill in this sub-set of CMIP5 ESMs and the MME, especially in the years following moderate to major volcanic eruptions.

In response to the four questions posed in Sect. 1.2, we found that major attributes of the simulated PDO in the CMIP5 historical simulations with the selected ESMs and the MME bear substantial resemblance to their counterparts in observed data, including spatial patterns, average annual cycle, and significant spectral peaks from interannual to decadal timescales. The simulated TAG variability has no preferred timescale, unlike the observed variability which has very prominent decadal timescales, and the spatial pattern of the simulated WPWP variability is very different compared to the observed pattern. There is, however, an intriguing similarity of preferred timescales in the observed and simulated WPWP SST variability. There are substantial and significant, but fluctuating, simulation skills of all three DCV phenomena in all four ESMs and the MME. The WPWP variability simulations show moderate to high skills, suggesting that external forcings, rather than internal variability, determine long-term evolution of the WPWP SST in observations and simulations. Preliminary analyses of surface energy balance in the WPWP region in the ESMs suggest that increasing net downward heat flux, perhaps due to increasing carbon dioxide, is forcing the long-term SST evolution in this region. Further analyses are in progress and will be reported elsewhere.

Volcanic eruptions, especially the more explosive ones, appear to make substantial impacts on observed and simulated DCV indices. These impacts imply that volcanic eruptions can influence global atmospheric dynamics and climate not only directly, via interactions between ejected material in the atmosphere and short- and long-wave radiation, but also by influencing DCV phenomena’s impacts on global climate. While the similarities in observed and simulated PDO variability imply that these ESMs are suitable for conducting PDO predictability and prediction experiments, the differences between observed and simulated TAG and WPWP variability are not negligible and their potential impacts on global climate simulation and predictability require further analysis. Impacts of these similarities and differences on decadal predictability are further discussed in MMW. From the point of view of understanding the physics of the DCV phenomena in these ESMs, further diagnoses and assessment of the ESMs’ ability to reproduce forcing mechanisms, as in Yim et al. (2014) and Zhang and Delworth (2015), are needed. Finally, in view of the remarkable ability of CCSM4, with its 20–30% larger AODs, to simulate impacts of volcanic eruptions on the PDO and the WPWP variability, and also on global- and hemispheric-average SSTs (Ding et al. 2014), it is very important to arrive at a consensus on the AODs to be prescribed in all future ESM experiments, including CMIP6, for a fair comparison of ESMs’ ability to simulate and predict climate and its variability and changes.