1 Introduction

The Pacific Decadal Oscillation (PDO) is a dominant pattern of decadal sea surface temperature (SST) variability over the mid-latitude Pacific basin and has well-defined spatial characteristics. It can affect the marine and terrestrial ecosystems (Mantua et al. 1997; Newman et al. 2016) and drive climate changes in many regions such as North America and East Asia (Zhou and Wu 2017). This decadal variability also has a marked influence on the marine ecosystems, climate and vegetation conditions in the pan-Pacific regions, including changes in precipitation and air temperature (Mochizuki et al. 2010). For example, the PDO regime shift in the late 1990s led to a devastating and profound drought in California (Li et al. 2020). Also, evidence has shown that the PDO also plays an important role in modulating the global warming trend, contributing to the global warming hiatus since the late 1990s (Xie et al. 2016). Consequently, successfully predicting the state of the PDO several years in advance has substantial benefits in many industries including fishery, agriculture and water resource management (Wen et al. 2014).

However, the PDO is highly unpredictable because it combines processes of different origins on various time scales, instead of being a single physical mode (Cassou et al. 2018). The processes that contribute to the PDO variability, dynamics and regime shifts mainly include: (1) local stochastic forcings induced by fluctuations in the Aleutian low; (2) local ocean dynamics in North Pacific; (3) tropical forcings which propagate into the North Pacific through the atmospheric bridge (Newman et al. 2016; Farneti 2017). Almost all models tend to underestimate the PDO variability, associated with negligible advancement from CMIP3 to CMIP5 (Farneti 2017). Developing the skill of North Pacific decadal prediction has so far been a vital target of substantial ongoing research (Meehl et al. 2009; Ding et al. 2013; Farneti 2017; Cassou et al. 2018).

From the theoretical perspective, the decadal prediction has some predictable signals in the initial state, from which we can exploit more prediction potential (Meehl et al. 2009). This is because, on the decadal range, the prediction encompasses aspects of both an initial condition and a forced boundary condition problem. A successful decadal prediction requires a good simulation of both the external forced (both anthropogenic and natural) and internally generated components of the system (Meehl et al. 2009; Boer et al. 2016). The initialization can, not only provide information about the phase of the unforced internally generated climate variability, but also correct the response to external forcings distorted by systematic errors in the model (Doblas-Reyes et al. 2013). This gives birth to the emergence of the “Near-term Decadal” experiments in CMIP5 and the subsequent counterpart in CMIP6 – the component A of Decadal Climate Prediction Project (DCPP-A) experiments as a contribution to CMIP6 (Boer et al. 2016). DCPP-A comprises an ensemble of decadal retrospective forecasts, initialized at yearly interval and forced by prescribed CMIP6 historical external forcings. It is designed to assess the effect of initialization on the climate predictions that has been ignored by traditional climate projections (Doblas-Reyes et al. 2013; Boer et al. 2016). In recent years, more and more research has investigated the possible improvement of predictive skills with initialization (Doblas-Reyes et al. 2013; Kim et al. 2014; Wang et al. 2019; Bethke et al. 2021). Results based on CMIP5 hindcasts revealed that, there is a skill in decadal predictions over certain regions including the North Atlantic, the northwestern Pacific, and the Indian Oceans. Most of the skill comes from the time-varying external radiative forcings, both natural and anthropogenic (Doblas-Reyes et al. 2013). The initialization displaces robust improvement for a couple of forecast years on SAT predictions over limited ocean areas (e.g., the North Atlantic, the southeast Pacific and the Indian Ocean) and an overall improvement in the global-mean SAT prediction (Doblas-Reyes et al. 2013; Meehl et al. 2014). Also the benefits of initialization are most remarkable in the first year of forecast and decline at multi-year forecast range where skill is mostly provided by external forcings (Corti et al. 2015; Boer et al. 2016; Bethke et al. 2021). However, over the North Pacific, the initialization shows no robust skill improvement in SAT hindcast due to the comparably un-prominent predictable linear trend in the North Pacific, the failure in hindcasting the most extensive warming events. The inconsistency of SST and upper ocean heat-content predictions also accounts for the negligible effect of initialization on SAT forecast over the North Pacific (Doblas-Reyes et al. 2013). Although a few studies argue that the initialization of the upper-ocean state works effectively in the successful prediction of the PDO state or Pacific temperature based on the simulation of specific models (Mochizuki et al. 2010; Wang et al. 2013; Wu et al. 2015, 2018), current models generally have limited skill in reproducing spatial patterns of the PDO variability over the northern part of Pacific basin (Farneti 2017). Regarding this, we propose two questions: Is the predictive skill on PDO improved in CMIP6? Can initialization help to promote predictive skill?

A recent work based on climate network analysis has provided a novel perspective for the early warning of the PDO regime shift (Lu et al. 2021). The climate network analysis is an emerging approach that has been shown to have predictive power for many climate phenomena such as El Nino events (Ludescher et al. 2013; Meng et al. 2020), extreme droughts (Ciemer et al. 2020), extreme rainfall events (Boers et al. 2019), the onset and withdrawal of Indian summer monsoon (Stolbova et al. 2016), etc. Lu et al. (2021) presented that an early warning signal preceding the PDO regime shift can be detected using the climate network method, which successfully forewarned all 6 PDO regime shifts between 1890 and 2000s. This signal virtually represents the cooperative behaviours of the SST variations at different grid cells in the North Pacific, denoted by a sudden enhancement of negative correlations between the nodes in the northwestern Pacific and northeastern Pacific several years before the regime shift occurrence. It is also closely associated with the upper ocean heat-content variation in the northwestern off-equatorial Pacific (Lu et al. 2021). In this work, we will investigate if the same warning signal exists in CMIP6 simulations, with special attention paid to the near-term decadal predictions of the two major PDO regime shifts in the late-1970s and late-1990s.

The rest of the paper is organized as follows. In Section 2 the data and methodology used in this article are described. In Section 3, the performance of multiple models participating in CMIP6 on simulating PDO and the early warning signal without initialization is first evaluated. Then the improvement of hindcast skill by the initialized process using initialized decadal hindcasts from the DCPP experiment is assessed. Section 4 summarizes this paper and discusses some possible ways for further improvement on the PDO prediction using climate models.

2 Materials and Methods

2.1 Datasets

The monthly SSTs from historical simulations of 39 CMIP6 models (listed in the legend of Figs. 1, 2) over the period 1854–2014 are used here to evaluate the performance of current CMIP6 models in reproducing the SST over the North Pacific. The Version 4 of the Extended Reconstructed Sea Surface Temperature (ERSST-V4; Huang et al. 2015) from the National Oceanic and Atmospheric Administration (NOAA) with a period of 1854–2019 and a 2° × 2° horizontal resolution is also used in this article for the validation of simulations.

Fig. 1
figure 1

a The investigated climate network over the North Pacific using ERSST-V4 Sea Surface Temperature (SST) gridded data. The black line separates the regions for positive and negative anomalies of PDO spatial pattern. (b) The same as (a) but using the multi-model mean of 39 models’ historical simulations from CMIP6. (c) Relationships between the Pacific Decadal Oscillation (PDO) index (red line) and total degrees (TDA→B; blue line) from Region A (red dots) to Region B (blue dots) using ERSST-V4 SST; only negative links with 0 < \({\theta }_{max}\)<20 are considered while calculating the TDA→B. The red spots located on the black line indicate the time when the PDO regime shifts occur. The dashed blue line represents the threshold (2 times the TDA→B STD) identifying if the TDA→B decrease is significant. The Heidke Skill Score (HSS) index on the right corner is calculated with the number of PDO regime shifts spanning from 1869 to 2014 and the impulses in the TD series using Eq. (3). (d) The PDO index for each ensemble member (grey line) and the multi-model mean (black line); the red line marks 0 value. The PDO indices have been calculated as 11-year running means

Fig. 2
figure 2

Taylor diagram of the PDO spatial pattern series for 39 ensemble members compared with that for ERSST-V4. Here the figure for ERSST-V4 is calculated using the data from 1854 to 2014, which is consistent with the period of historical simulations. The star signal denotes the figure for the multi-model mean

After the evaluation of historical simulations, we then estimate the effect of initialization on the hindcasts by comparing the results obtained from uninitialized historical simulations with those from the initialized decadal hindcast simulations (dcppA-hindcast; Boer et al. 2016). The decadal hindcasts utilized here are from two ensemble members of the CMIP6 dcppA-hindcast experiment, whose detailed information are summarized in Table 1. The dcppA-hindcast experiment is a set of multi-model ensembles of hindcasts initialized at yearly intervals from 1960 onward, with a duration of approximately 10 years. The initial conditions of dcppA-hindcast simulations are provided by assimilation simulations (dcppA-assim), in which oceanic profile and atmospheric (not necessary) observations are assimilated by a dynamical model adopting sophisticated initialization strategies (the third column of Table 1). In fact, dcppA-hindcast SST of 9 models are available but we only utilize that of 2 models here, because only the SST reanalysis observations of these two models for generating the initial conditions in the assimilation process are available. Other models either assimilate only the potential temperature without making use of SST data or have too many default values in their SST observations.

Table 1 The ensemble members of CMIP6 dcppA-hindcast experiment used in this article

Since the PDO has decadal scale climate variability, a long time series spanning at least several decades are required to investigate its temporal-spatial variation characteristics. Therefore, we reconstruct SST series of longer range by connecting the observational SST (only data after 1890 is used because of a large uncertainty in the earlier years) with the hindcast SST time series at the starting date of hindcasts. Since the initial condition of the 10-year-long decadal hindcast from BCC-CSM2-MR is derived from ERSST-V4 SST observation, the ERSST-V4 is used to connect the BCC-CSM2-MR hindcasts. Using the observations participating in the initializing processes of hindcasts can make sure that discontinuities between the observations and the retrospective forecasts are avoided, ensuring that they are of the same phase and have no significant change at the linking time point. The two time series have also been standardized before the conjunction to ensure they have the same magnitude. Similarly, we concatenate the corresponding observation with the NorCPM1 decadal hindcast. Monthly SST observation for the initialization of NorCPM1 is from the version 2 of HadISST dataset of the Met Office Hadley Centre (HadISST2.1.0.0; Rayner et al. 2003), which offers 10 realizations of monthly SST with a 1° × 1° resolution over the period of 1850–2010. The average of the 10 realizations is considered as the SST observation.

2.2 Methods

We apply the climate network techniques to the North Pacific SST network and produce evolving climate networks to investigate the performance of different models in CMIP6. The nodes of the networks are identified with grid points in the studied region (e.g., Fig. 1a, b). The network set up with observation-BCC-CSM2-MR decadal hindcasts concatenated series has a horizontal resolution of 4° × 4°, and that of observation-NorCPM1 decadal retrospective predictions is 5° × 5°. This article defines the link strengths between different nodes using the same method as in Lu et al. (2021). The detailed procedures are as follows:

First, calculate the sea surface temperature anomaly (SSTA). For each time point t, we subtract the 30-year seasonal cycle before time t from the SST value at time t and repeat it at all gridded nodes, such that the SSTA series Tk(t) over the considered area are achieved.

Second, investigate the correlation structure of certain predefined regions. Here the North Pacific is divided into two regions based on the sign of the spatial pattern of the leading Empirical Orthogonal Function (EOF) on monthly SSTs; the region beside eastern Asia is Region A and the region along the North American coast is Region B (e.g., Fig. 1a). For each pair of nodes i and j located respectively in Region A and Region B, define their link for each month using the cross-correlations \({C}_{i,j}^{t}(\theta )\) over a window of 66 months after the considered time point t, with time lags \(\theta\) varying from -100 to 100 months:

$${C}_{i,j}^{t}(\theta )=\left\{\begin{array}{c}\frac{\langle {T}_{i}(t-\theta ){T}_{j}(t)\rangle -\langle {T}_{i}(t-\theta )\rangle \langle {T}_{j}(t)\rangle }{\sqrt{\langle {({T}_{i}(t-\theta )-\langle {T}_{i}(t-\theta )\rangle )}^{2}\rangle \langle {({T}_{j}(t)-\langle {T}_{j}(t)\rangle )}^{2}\rangle }}, \theta >0, A\to B\\ \frac{\langle {T}_{i}(t){T}_{j}(t+\theta )\rangle -\langle {T}_{i}(t)\rangle \langle {T}_{j}(t+\theta )\rangle }{\sqrt{\langle {({T}_{i}(t)-\langle {T}_{i}(t)\rangle )}^{2}\rangle \langle {({T}_{j}(t+\theta )-\langle {T}_{j}(t+\theta )\rangle )}^{2}\rangle }}, \theta <0, B\to A\end{array}\right.$$
(1)

where the bracket represents a temporal average over the current window. As such, for each pair of nodes from Region A and Region B, we obtain the \({C}_{i,j}^{t}(\theta )\) series varying from t and \(\uptheta\).

Third, choose the maximum absolute correlation within all time lags (-100 < \(\uptheta\)<100 months) as the link strength \({C}_{i,j}^{t}({\uptheta }_{\mathrm{max}})\). Here the \({\uptheta }_{\mathrm{max}}\) can represent the link direction. \({\Theta }_{\mathrm{max}}>0\) indicates the link direction pointing from I to j, and vice versa for \({\uptheta }_{\mathrm{max}}<0\).

After constructing the network, we calculate the node degree \({K}_{i}(t)\) by summing up all the nodes in Region B that are connected with node I in Region A. The total degree (TD) can be obtained by accumulating all the node degrees in Region A:

$$TD(t)=\sum \nolimits_{i\in A, j\in B}{C}_{i, j}^{t}({\theta }_{max})$$
(2)

By accumulating the links with \({\uptheta }_{\mathrm{max}}>0\), the total degree from Region A to Region B (TDA→B) is computed. This article considers only the negative links and the links with 0 < \({\uptheta }_{\mathrm{max}}\)<20 months while calculating the TD(t). When the TD(t) exceeds the threshold TDH(t), which is defined as two times the standard deviation (STD) of TD(t) in this article, a warning is considered to happen at time point t. Lu et al. (2021) employed this method to uncover 7 early warnings from 1890 to 2000s, which successfully forewarn all the 6 PDO regime shifts occurred during this period.

In order to assess the performance of models/observations in simulating/representing the early warning signal, we employ the Heidke Skill Score (HSS; Boers et al. 2014), a score widely used to measure the accuracy of forecasts relative to that of random chance. It is defined as:

$$HSS=\frac{2(ad-bc)}{(a+c)(c+d)+(a+b)(b+d)}$$
(3)

where a stands for the number of events with correct forecasting alarms, b measures the number of alarms with no following event, c counts the times where no alarm was issued but an event occurred, and d is the number of instances of no alarm and no event. A score of HSS = 0 represents the skill of a random forecast, whereas a value of HSS = 1 indicates a perfect forecast.

To quantify the ability of models to represent the PDO spatial pattern, the S-index is applied to measure the similarity of observed and predicted patterns (Taylor 2001). This score is defined as follows:

$$S=\frac{4(1+R)}{{(\sigma +\frac{1}{\sigma })}^{2}(1+{R}_{0})}$$
(4)

where R0 denotes the maximum correlation attainable (set to 1 in this article), R denotes the correlation between observation and simulation, and \(\sigma\) denotes the ratio of the STD of simulation against that of the observation. As the simulated spatial pattern approaching the observed one, R \(\to\) 1 and \(\sigma \to\) 1, and S \(\to\) 1.

3 Results

3.1 Performance of CMIP6 Models in Simulating the PDO and the Early Warning Signal

In order to assess the performance of current state-of-art models in simulating the PDO, we first employ the EOF on the monthly SSTA from historical simulations of 39 ensemble members in CMIP6. The obtained time series of the leading principal component is identified as the PDO index. Figure 1b depicts the average spatial pattern for all ensemble members, and each grey line in Fig. 1d is the PDO index for each ensemble member. The black line in Fig. 1d indicates the multi-model mean for the PDO index. It should be noted that the PDO indices have been calculated as 11-year running means to extract the variability of PDO on a decadal time scale, during which the average value is put in the intermediate time point of each sliding window. The values at the beginning and at the end of the 1854–2014 time period (60 months long for each) are not shown. As a comparison, we also apply the same method to the observational data (ERSST-V4), whose PDO spatial pattern and PDO index are shown in Fig. 1a and the red line in Fig. 1c, respectively. The spatial patterns of the PDO simulated by most models are well-defined, characterizing a horseshoe pattern of SSTA with one sign in the central North Pacific surrounded by SSTA of opposite sign (figures not shown). Based on the sign of the spatial pattern of PDO, the North Pacific is separated into Region A and Region B (Fig. 1a, b). The border between Region A and B for the ensemble members (Fig. 1b) is mainly located at the same place as that for the observation (Fig. 1a), indicating that models have a good skill in resolving the PDO spatial pattern in general, in reasonable agreement with Farneti (2017).

However, basically no member can represent the decadal variation of the PDO index and the regime shifts correctly (Fig. 1d), consistent with the results of prior research (Doblas-Reyes et al. 2013; Farneti 2017; Cassou et al. 2018). There are huge differences among the PDO indices of all the ensemble members, regardless if their amplitudes or phases are considered (Fig. 1d), and their average index shows nothing besides values approximately equal to zero.

Using Eq. (4), the S-index for each ensemble member is computed to further examine each member’s skill in simulating the PDO spatial pattern. The investigated members are listed in order by their score in the legend of Fig. 2. The Taylor diagram shows that the PDO spatial patterns of most members are quite similar to that of the observation in terms of their STDs and their correlation coefficients, except a few members, including IPSL-CM6A-LR-INCA, EC-Earth3-CC, NESM3, CanESM5, MPI-ESM-1–2-HAM, EC-Earth3-Veg. Then we pick out the top 6 scored members and further examine their simulated PDO index and their ability to simulate the early warning signals, namely CESM2, ACCESS-CM2, NorESM2-MM, CMCC-ESM2, CESM2-WACCM and SAM0-UNICON.

Although these models have skill in simulating the spatial pattern of the PDO, their simulated PDO indices (Fig. 3, red lines) are far away from the observational one (Fig. 1c, red line) despite their phases. Although there seems to be decadal variability in these indices associated with several regime shifts, the regime shifts are not evenly distributed over the late-1890s to 2015 as that of the observation. The resolved regime shifts are either too sparsely distributed or too tightly distributed over specific periods (Fig. 3a-f). The outlines of these simulated PDO indices are also not successfully predicted. There are huge differences between the simulations of different ensemble members, revealing a high level of uncertainty in the PDO index forecast. Actually, the uncertainty in the assessment of the PDO index is also found in the observational data (Wen et al. 2014).

Fig. 3
figure 3

Relationships between the PDO index (red line) and TDA→B (blue line) using the historical simulation of the top 6 scored ensemble members. (a) CESM2 (b) ACCESS-CM2 (c) NorESM2-MM (d) CMCC-ESM2 (e) CESM2-WACCM (f) SAM0-UNICON. The historical simulations were only initialized once in 1850 and have bad behaviour in predicting the PDO index and TD index. Only negative links with 0 < \({\theta }_{max}\)<20 are considered while calculating the TDA→B. Only PDO regime shifts from 1869 to 2009 are considered while calculating the HSS

In order to examine if the model can simulate the early warning signal before the regime shift, we first construct the climate network between nodes in Region A and nodes in Region B and define the link strength using Eq. (1). Then by applying Eq. (2), we sum all the negative links (\({C}_{i,j}^{t}\left({\uptheta }_{\mathrm{max}}\right)<0\)) pointing from Region A to Region B with 0 \({<\uptheta }_{\mathrm{max}}<20\) and obtain the TDA→B (Lu et al. 2021). The TDA→B for observational SST is shown in Fig. 1c (blue solid line), the warning signals revealed by which can forewarn the regime shifts of the PDO. Note that Lu et al. (2021) used the ERSST-V3b data (Smith et al. 2008), while in this study we analyze the ERSST-V4 data, which gives similar results. The TDA→B for the 6 ensemble members with the best performance in resolving the PDO spatial pattern is shown in Fig. 3a-f (solid blue lines). The total regime shifts count throughout 1869–2009 for the top 6 scored members are listed in the second column of Table S1, and the number of alarms correctly forewarn the regime shift no more than 10 years in advance is listed in the second column of Table S1. Figure 4 illustrates the false alarm rates and the hit rates of the top 6 scored ensemble members, in which the hit rate indicates the ratio of the correctly alerted PDO regime shift events, and the false alarm rate indicates the proportion of the false alarm numbers to the total alarm numbers. Among these 6 members, the HSS of CESM2-WACCM is the highest (Fig. 3e), indicating that CESM2-WACCM has the best performance in capturing the correspondence of early warning signals and the PDO regime shifts. In addition, the HSS of NorESM2-MM and CMCC-ESM2 is close to 0.5 (Fig. 3c, d). These indicate that some models to some extents have the skill to simulate the early warning signal corresponding to the subsequent regime shift, despite of the chronological accuracy. In other words, a few models can represent the cooperative behaviours of nodes in the North Pacific.

Fig. 4
figure 4

The hit rate and false alarm rate for ERSST-V4 and the historical simulations of the top 6 scored (based on their spatial patterns) ensemble members for regime shifts occurred in 1869–2014

3.2 Effect of Initialization on Hindcasts of PDO and Early Warning Signals

The result above using historical simulations from CMIP6 reveals that the predictive skill of current state-of-the-art models is only limited in simulating the PDO spatial pattern and nearly absent regarding the PDO index. In this sector, research will be conducted on whether the initial state can improve models’ predictive skill using SST from the DCPP experiment. Note that the decadal hindcast experiments use the same external forcings as for the historical simulations. Therefore, the difference between the decadal retrospective forecasts and the historical simulations merely represents the additional information from the initial states. Since the DCPP data is only available from 1960, we only focus on two PDO regime shifts in the late-1970s and the late-1990s.

Consistent with Ding et al. (2013), our results indicate that initial-condition information is conducive to successful hindcasts of the PDO compared with uninitialized historical simulations, most significant in the simulation of the PDO index (Figs. 5, 6). In terms of the PDO spatial pattern, models considering initial state also perform as well as the historical simulations without initialization do (Figures not shown). Regarding the PDO index, the non-initialized simulated PDO indices are far from the observed one, and so are the simulated regime shifts in the late-1970s and late-1990s (as shown in Figs. 1d, 5a and 6a), implying a small contribution from forced boundary conditions. While after initialization, hindcasts of the PDO index have considerable skill for a couple of years, as shown in Figs. 5 and 6, based on the output of 2 ensemble members participating in CMIP6 DCPP, namely the NorCPM1 and the BCC-CSM2-MR. It is clear that the initialized hindcasts of NorCPM1 and BCC-CSM2-MR have shown some success in the simulation of the PDO cool phase in the mid-1970s, and the warm phase in the mid-1990s (red lines in Figs. 5, 6), with NorCPM1 hindcasts initialized from the observed state in 1971 beginning to successfully represent the late-1970s PDO regime shift (Fig. 5c), and NorCPM1 hindcasts starting to succeed in simulating the late-1990s regime shifts from the initialization in 1992 (Fig. 5f). The performance of BCC-CSM2-MR is relatively worse than that of NorCPM1. Not until the initialization in 1974 and 1994 can it successfully simulate the PDO regime shift in the 1970s and 1990s, respectively (Fig. 6c, f), but its retrospective forecast also shows significant improvement after incorporating the information of the initial state. The reason for this improvement by initialization on the predictive skill is that the initialized process can not only put the model in phase with observed internal variability (as shown in Figs. 5, 6) but also correct the systematic error to that point (Bilbao et al. 2021). However, as forecasts evolve, the forecast system gradually loses the initial state information and approaches a forced state (Boer et al. 2016). Therefore, the improvement of initialization is most significant in the first few forecast years (Bilbao et al. 2021). It is worth noting that the red and yellow lines in Figs. 5, 6b-g are not identical even in early periods before the model outputs are connected with the observations, i.e., 1960s in Figs. 5, 6b-d, 1980s in Figs. 5, 6e-g. This is mainly due to the nature of the EOF analysis. The inclusion of ten years hindcast data, which is different from the ten years observations, can affect the PDO indices even in earlier periods. In addition, a large difference between the yellow and the red lines can be observed in 1968 in Fig. 6b, indicating a potential initialization shock for the hindcast initialized in 1973, since values of the first few months from the 1973-intialized hindcast are included in the calculation of the 11-year sliding mean PDO index in 1968. For more details about this issue, please refer to the supplementary information (Figs. S1-S5).

Fig. 5
figure 5

Relationships between the PDO index (red line) and TDA→B (solid blue line) calculated with non-initialized/initialized simulated SST anomalies from the ensemble member NorCPM1. (a) The PDO index calculated with non-initialized historical simulation SST (red line) and observational HadISST2 SST (yellow line) over the period of 1890–2009, and TDA→B time series (blue solid line) calculated with Non-initialized historical simulation SST over the same time period; the rest subpanels describe the PDO indices (red lines) of dcppA-hindcast SST initialized in October in (b) 1970; (c) 1971; (d) 1972; (e) 1991; (f) 1992; (g) 1993. The TDA→B and PDO indices in (b)-(g) are calculated with SST time series concatenated by the standardized HadISST2 observation starting from 1890 and 10-year-long decadal hindcasts from NorCPM1. The yellow lines in (b)-(g) denote the HadISST2 observation with the same period of the concatenated observation-hindcast time series as comparisons to the red lines. Only negative links with 0 < \({\theta }_{max}\)<20 are considered while calculating the TDA→B. The dashed blue lines represent the threshold (2 times the TDA→B STD) identifying if the TDA→B decrease is significant. The PDO indices have been calculated as 11-year running means

Fig. 6
figure 6

The same as Fig. 5 except that the SST anomalies are from the ensemble member BCC-CSM2-MR. Subpanels (b)-(g) describe the PDO indices (red lines) of dcppA-hindcast SST initialized in January in (b) 1973; (c) 1974; (d) 1975; (e) 1993; (f) 1994; (g) 1995. The TDA→B and PDO indices in (b)-(g) are calculated with SST time series concatenated by the standardized ERSST-V4 observation starting from 1890 and 10-year-long decadal hindcasts from BCC-CSM2-MR. Compared with Fig. 5, the initialized decadal hindcasts from BCC-CSM2-MR show lower skill regarding both the PDO index and TD index hindcasts

The early warning signals for PDO regime shifts can be manifested in decadal hindcasts with initialization. The dashed lines in Figs. 5, 6 denote the threshold of TDA→B(t). When the TDA→B(t) is lower than the threshold, an early signal is considered to happen at time point t. Note that the calculation of TDA→B(t) at time point t mainly involves 66-month long SST data after t, so if a decrease of TD(t) happens after the initial year, this signal is contributed by the model data. Without initialization, NorCPM1 and BCC-CSM2-MR can not capture the impulse of TD series both in the 1970s and 1990s (Figs. 5a, 6a), which is reasonable because they can even not simulate the PDO state and regime shifts correctly. Figure 5b-d shows that initialized NorCPM1 can simulate the warning signal for the late-1970s shift 6–7 years in advance. Also, in the 1991-initialized case, NorCPM1 can not simulate the significant decrease of TD, whereas after the 1992-initialization, the signal manifests in 1992, approximately 5 years before the 1990s shift. Compared to NorCPM1, BCC-CSM2-MR has worse skill in representing the warning signals, both in the 1970s and 1990s (Fig. 6b-g). However, there is also a marked tendency of decrease in TD in 1976–1978 (solid blue lines in Fig. 6c, d). Although the significant TD impulse in the 1974-initialized case (Fig. 6c) no longer exceeds the 2*STD TDA→B threshold in the 1975-initialized case (Fig. 6d), it can still be seen as a significant warning signal under a slightly less strict definition for the TD threshold (e.g., 1*STD). Regarding the late-1990s regime shift, the early warning signal is absent in the simulated TD series of BCC-CSM2-MR, probably due to the relatively worse performance in representing the PDO state and regime shift in the 1990s (red lines in Fig. 6e-g). Moreover, results of other initialized years for 1970–1979 and 1991–2000 are shown in the supplementary information (Figs. S6-S33).

The discrepancy in the representations of TD impulse between the two ensemble members indicates that NorCPM1 somewhat outperforms BCC-CSM2-MR in representing the North Pacific SST variations. Suppose a model can successfully simulate the sudden strengthening of negative connections between the northwestern Pacific and the northeastern Pacific before the regime shift. It means that the model is able to reproduce the cooperative behavior of SST in these regions. Therefore, the early warning signal in the TD time series can be seen as a verification to identify the performance of models. In this sense, NorCPM1 has better hindcast skill than BCC-CSM2-MR, since the latter one has worse performance in resolving the early warning signals for the late-1990s regime shift in its TD time series. Since the two members are under the same external forcings, the better skill of NorCPM1 in PDO simulation can only be attributed to its better initial state, or better capture of the climate response to the external forcings which is related to the ability of a model to resolve the dynamics and processes underlying the PDO. In terms of the advancement in the initial state, NorCPM1 has an outstanding coupled initialization capability adopting the multivariate Ensemble Kalman Filter (EnKF; Evensen, 2003) data assimilation method which has been proven by some research to precede the SST nudging scheme adopted by BCC-CSM2-MR for the North Pacific SST simulation (Wang et al. 2019). This is because, the multivariate EnKF method performs better in regions (e.g., the extra-tropics) with weak air–sea coupling and little consistence between the surface and subsurface variability (Wang et al. 2019). Also, it can skilfully constrain the upper ocean heat content in the North Pacific (Counillon et al. 2016) and is beneficial to reduce initialization shock and forecast drift by utilizing anomaly assimilation (Wang et al. 2019).

4 Discussion and Conclusions

This article has assessed the performance of state-of-the-art climate models from CMIP6 in simulating the PDO, with an emphasis on the evaluation of CMIP6 models in simulating the recently detected early warning signals for the PDO regime shift using the climate network analysis as Lu et al. (2021). In non-initialized historical simulations, the predictive skill for SSTs in the North Pacific remains at a low level. Models can only reproduce the spatial pattern of PDO, with nearly no skill in simulating the PDO index and the regime shifts correctly, in agreement with the prior research using models from CMIP5 (Doblas-Reyes et al. 2013; Zhou and Wu 2017). The failure of resolving the PDO index is due to the complexity of dynamical processes or drivers governing the PDO (Newman et al. 2016), limited understanding of the physical mechanisms involved, and the inability in resolving all physical processes (Farneti 2017). Although the models without initialization can not resolve the correct phase and amplitude of the PDO index (Fig. 1d), some of them can still represent the corresponding relationship between the early warning signal and the PDO regime shift, regardless of the chronological accuracy (Fig. 3c-e).

Whether the initial procession can improve the predictive skill of decadal prediction is considered to be the core of the decadal predictability problem (Mochizuki et al. 2010; Zhou and Wu 2017). Our results suggest that the initial state can indeed help improve the hindcast skill of PDO index and regime shifts for a few years and contribute to the successful hindcast of regime shifts, consistent with prior research (Mochizuki et al. 2010), because the initial condition can provide skillful information about external forced climate change and the state of the internal climate variability (Meehl et al. 2009; Zhou and Wu 2017). It is consistent with most prior results shown in decadal retrospective forecast experiments of CMIP5 (Ding et al. 2013). Due to the limitation of data, only decadal hindcasts from two initialized models and two major regime shifts (the late-1970s and late-1990s) have been investigated. Among them, the initialized NorCPM1 has better skill than BCC-CSM2-MR in the PDO retrospective forecast. The former can correctly simulate the 1970s/1990s regime shifts with 1971/1992-initialization (Fig. 5c, f), whereas the latter can not correctly predict the shifts until be provided with the 1974/1994-initialization (Fig. 6c, f). Also, NorCPM1 can successfully represent and capture the early warning signals of both the late-1970s and late-1990s regime shifts (Fig. 5), whereas BCC-CSM2-MR can only reproduce a tendency of the warning signal for the late-1970s regime shift (Fig. 6c, d). In general, the decadal hindcasts of NorCPM1 and BCC-CSM2-MR demonstrate that the initialization is conducive to a successful simulation of early warning signals (Figs. 5, 6). This result also proves that this warning signal detected by the climate network is somewhat robust, both in a proportion of simulations or observations, confirming the corresponding relationship between TD sudden decrease and regime shift suggested by Lu et al. (2021). However, we have to admit that this work only focuses on the decadal hindcasts of two ensemble members due to a lack of SST observations for other members. A more reliable assessment of the influence of initialization needs further investigation on more ensemble members in the DCPP experiment. Also, the effect of initialization on SSTs in Pacific Ocean warrants a more specific assessment on different forecast lead time. For example, Bethke et al. (2021) argued that the initialization even has a detrimental effect on SST prediction specifically in eastern Pacific for lead time of more than 2 years, despite a beneficial effect for lead time of 1 year.

In general, this study gives a hint on an effective way to further exploit the potential for PDO decadal prediction. Since the initial state can provide valuable information for the PDO prediction, the quality of initial conditions needs to be attached more importance on in the future. The improvement of initial conditions can be achieved by assimilating suitable observations, especially on the ocean, in the initialization. Besides, adopting a better initialized scheme, like the EnKF data assimilation method, can also improve the quality of initial conditions (Cassou et al. 2018; Kushnir et al. 2019). We underline the contribution of the DCPP experiment that is designed to evaluate the effect of initialization, indicating that the DCPP is of great importance in promoting the decadal prediction and warrants continually development in the future CMIP experiments. This work also first manifests that the early warning signal may serve as a test-bed for models to testify their performance in representing the cooperative behaviours of nodes in the North Pacific. The underlying mechanisms of the early warning signals warrants further research, which would be beneficial to the improvement of model prediction skill for PDO.