1 Introduction

On seasonal time scales, the El Niño-Southern Oscillation (ENSO) is a major source of predictive skill for atmospheric circulation anomalies (McPhaden et al. 2006; Timmermann et al. 2018). Therefore, assessing and improving ENSO prediction is one of the most important challenges in the modeling and seasonal forecasting community (e.g., Barnston et al. 2012). There are various dynamical prediction systems (e.g., ECMWF; NCEP; Meteo-France and others participating in NMME and C3S multi-model projects) available in the community that provide seasonal forecasts on a regular basis (L’Heureux et al. 2020). Most of the models perform reasonably well for ENSO prediction with some limitations (Risbey et al. 2021; Tippett et al. 2012). Since March 2017, Saudi-KAU CGCM is a part of the ENSO plume published monthly at the International Research Institute for Climate and Society (IRI).Footnote 1 Therefore, it is important to analyze and document the ENSO prediction skill of this model.

ENSO is a naturally occurring atmosphere/ocean coupled phenomenon that originates in the tropical Pacific region. The warm (El Niño) and cold (La Niña) phases of ENSO are associated with the above and below normal SST anomalies in the central and eastern equatorial Pacific regions (Yang and DelSole, 2012; Yeh et al. 2018). The seesaw of warm and cold phases of ENSO has a large impact on the temperature and precipitation of different regions around the globe (e.g., Mason and Goddard, 2001; Diaz et al. 2001; Ehsan et al. 2013, 2020a, 2020a; Abid et al. 2016; Almazroui et al. 2019; Atif et al. 2019; Ehsan 2020) and modulates the occurrence and intensity of heat waves, floods, droughts, severe storms and tropical cyclones (Camargo and Sobel, 2005; Curtis et al. 2007; Camargo et al. 2007; Allen et al. 2015; Sobel et al. 2016). Since ENSO is an important predictor for climate extremes and for the regional and global climate, it is important that models predict the ENSO reasonably well in advance.

There is a long history in the literature of assessing various aspects of the skill of ENSO forecasts. Recently, Tippett et al. (2019) investigated probabilistic forecasts of ENSO phases and strength from the North American Multi-Model Ensemble (NMME), complementary to the examination by Barnston et al. (2019) of the skill of deterministic ENSO forecasts from the NMME. Similarly, Trenary et al. (2018) explored the ENSO forecast skill and the optimal lagged ensemble size to reduce the forecast error of the Climate Forecast System version 2 (CFSv2). Furthermore, Jin et al. (2008) analyzed the ENSO prediction skill by using the datasets of “APCC-CliPAS” and “DEMETER” projects. Other studies highlight particular aspects of ENSO prediction skill. For example, the evaluation of ENSO predictability up to 2–4 years has been discussed in Chen et al. (2004) and Luo et al. (2008). The initialization of ENSO prediction prior to March/April in any year leads to a sharp decrease in skill, which is referred to as a “spring predictability barrier” (Webster and Yang, 1992; Zhu et al. 2015). This spring predictability barrier typically restricts the seasonal prediction based on the coupled atmosphere–ocean model up to 1 year lead time (Graham et al. 2011). Larson and Kirtman (2017) investigated the dynamical processes that are responsible for the growth of certain errors and the sources of error which limits the ENSO predictability.

The Saudi-KAU Atmospheric Global Climate Model (AGCM) forced with observed SSTs has been used in many studies to understand the potential predictability due to ENSO and the influence of ENSO on regional climate (e.g., Ehsan et al. 2017a, b; Abid et al. 2018; Rahman et al. 2018; Kamil et al. 2019; Almazroui et al. 2019). However, to date, there has been no study available that analyses the skill of the Saudi-KAU initialized coupled model (CGCM) in forecasting ENSO. The article is organized as follows. Section 2 introduces the Saudi-KAU model, prediction, and observational datasets and methods. Observed and simulated mean and variability analysis of SST, and skill assessment are presented in Sect. 3. A summary and conclusions are given in Sect. 4.

2 Model, Experiment Design, and Data

2.1 Description of the Saudi-KAU Coupled GCM

The prediction data used in this study comes from the Saudi-KAU CGCM. Briefly, the numerical design of Saudi-KAU spectral dynamical core can be traced back to Bourke (1974). The code solves prognostic equations for vorticity and divergence, from which the zonal and meridional components of the wind are derived diagnostically. Fast Fourier transform and Legendre transforms are performed at each time step so that all linear calculations are done in spectral or wave space, while all nonlinear calculations are performed in (physical) grid space. To advance the model state forward in time, a semi-implicit time integration scheme is adopted. The atmospheric model can be used at several horizontal and vertical resolutions, but in this work we use spectral T42 (2.8° × 2.8°) horizontal resolution with 20 vertical levels (For details please see Almazroui et al. 2017). The atmospheric model is coupled with the modular ocean model version 2.2 (MOM2.2), developed at the Geophysical Fluid Dynamics Laboratory, (GFDL; Pacanowski 1995). MOM2.2 uses the Arakawa B-grid to solve the primitive equations and the hydrostatic and Boussinesq approximations. In MOM2.2, the zonal grid spacing is 1.0°, while the (1/3)° meridional grid spacing between 8°S and 8°N increases gradually to 3.0° between 30° S and 30° N, being fixed at 3.0° for the higher latitudes, and it has 32 vertical levels. In MOM2.2, a mixed layer model developed by Noh and Kim (1999) is implanted to improve the vertical structure of the upper ocean. The convection, radiation, planetary boundary layer (PBL), and Land–Surface Model (LSM) are based on the Simplified Arakawa–Schubert Cumulus (SAS) scheme by Moorthi and Suarez (1992), Nakajima and Tanaka (1986), Holtslag and Boville (1993), and Bonan (1998) parametrizations, respectively. The physical schemes used in the initial conditions and hindcast simulations are identical.

2.2 Experimental Design

The initialization method described in Fig. 1a is based on an atmosphere and ocean nudging performed at every model time step. Zonal and meridional winds, specific humidity, temperature and surface pressure are nudged in the AGCM, while potential temperature and salinity are nudged in the OGCM. A typical equation used for nudging is:

$$\frac{\partial T}{{\partial t}} = - \overline{v} \cdot \nabla T + \frac{Q}{{\rho C_{P} H}} + \frac{{T_{{{\text{obs}}}} - T}}{{\tau_{{}} }}.$$
Fig. 1
figure 1

Schematic diagrams showing the (a) nudging initialization process and (b) lagged average forecast (LAF) method

Here, “T” is the potential temperature, Cp is the specific heat capacity at constant pressure, “\(\rho\)” is the density, “Q” the heat energy, and “H” the enthalpy; the last term on the right-hand side of the above equation is the nudging term. The main parameter that controls the nudging strength is τ, also known as the relaxation time. This factor is chosen based on empirical considerations and differs from variable to variable. For example, τ is set to 5 days (432,000 s) for the temperature and salinity in case of ocean models, and for atmospheric variables, the atmospheric nudging time scale “τ” is fixed at 6 h (21,600 s). If τ is small, the solution converges toward the observations quickly, and the dynamics does not have time to adjust. That is, nudging terms contribute significantly to prognostic tendencies. In this case, the dynamical balance may not be satisfied when the nudging terms are removed and the analysis field is used as the initial condition (e.g., initial shocks). If τ is large, the model can grow too far away from observations before the nudging becomes effective.

Initial conditions are available for the period 01 Mar 1980 to 31 Dec 2019 with 6-hourly output frequency. The lagged average forecast (LAF) is used as a perturbation method to generate the initial conditions for the ensemble hindcast simulations (Hoffman and Kalnay 1983). Importantly, the atmospheric initial conditions are perturbed with a 6-h interval while the ocean initial condition remains the same for all ensemble members (Fig. 1b). The retrospective ensemble forecast dataset from Saudi-KAU CGCM with 20 ensemble members starts on the first of each month and extends 12 months beyond the start day.

2.3 Observational Data

The NCEP Reanalysis-2 (Kanamitsu et al. 2002) for the atmospheric nudging is at 2.5° × 2.5° horizontal resolution and 17 vertical levels (ftp://ftp.cpc.ncep.noaa.gov/wd51we/reanalysis-2/6hr/pgb) and the GODAS pentad data (Behringer and Xue 2004; https://cfs.ncep.noaa.gov/cfs/godas/pentad/) for ocean nudging has 1.0° × 0.3° horizontal resolution and 40 vertical levels. For comparison, the observed SST used is from the global monthly Extended Reconstructed Sea Surface Temperature version 5 (ERSSTv5: Huang et al. 2017).

2.4 NMME and C3S Models Data

In this study, we analyze the ENSO forecast skill of the Saudi-KAU CGCM and compare it to that of operational models in the NMME and C3S projects. The NMME project contains models from operational forecast centers and research institutes in the USA and Canada, while C3S contains models from European countries. The exceptional feature of the “NMME” is the accessibility of consistent hindcasts on seasonal to sub-seasonal time scales. For details, please see Kirtman et al. (2014). The real-time forecasts are generated on the 8th day of every month to use in operational sub-seasonal to seasonal climate prediction. From C3S, we utilized the forecast data from the European Centre for Medium-Range Weather Forecasts (ECMWF), and Météo-France. Forecasts from these models are launched on the 1st day of the month and released on the 13th day by C3S, which causes a slight delay in the forecast availability. Recently, Ehsan et al. 2020b explored the ENSO forecast skill in ECMWF SEAS5 during the boreal summer season of Jul-–Aug (JA) for June, May and April start. They found a very high correlation at Lead-1 (CC = 0.90) in the Niño3.4 region, while skill decreases as the lead time increases. For further details about the performance of SEAS5 and other models, readers are encouraged to explore Johnson et al. (2019) and https://cds.climate.copernicus.eu/. Table 1 provides an overview of the different models along with Saudi-KAU used in this study including native atmospheric and oceanic resolutions, number of ensemble members, the time period employed in this study, as well as the related references.

Table 1 Overview of the models used in this study

2.5 Methods

Saudi-KAU model provides the reforecast for all the calendar months. To compare with our model, we selected only those models from NMME and C3S which are operational and most importantly have reforecast available for all calendar months. In this setup, we have different time periods for different models. In this regard, the lead seasonal (December to February: DJF, March to May: MAM, June to August: JJA, and September to November: SON) or monthly ensemble mean climatologies for each model were constructed over the period mentioned in Table 1, while the respective anomalies were computed by removing these climatologies from the corresponding model climate. The observed seasonal and monthly climatologies and anomalies match those for individual models. For all analysis, the ensemble mean is utilized for Saudi-KAU as well as NMME and C3S models. The Niño 3.4 index (Barnston et al. 1997) is the spatially averaged SST anomaly over the equatorial Pacific region (5°S–5°N and 170°W–120°W). Two other ENSO indices are used together with Niño 3.4, to represent the ENSO diversity (Capotondi et al. 2015), namely Niño 3 (5°S–5°N and 150°–90°W) and Niño 4 (5°S–5°N and 160°E–150°W). For the analysis purpose, we re-gridded the model and observation datasets to 1.0° × 1.0° resolution by using the bi-linear re-gridding technique.

3 Results

First, we compare the mean state and variability of SST in the tropical Pacific Ocean, followed by the skill assessment, and we finish by comparing the skill of the Saudi-KAU CGCM with other models in predicting Niño 3.4 index.

3.1 Mean and Variability of SSTs in Tropical Pacific Ocean: Saudi-KAU vs Observation

The seasonal climatology of the predicted SSTs from the Saudi-KAU model is estimated based on the 20-ensemble members and compared with the observed SSTs shown in Fig. 2. The model predicted seasonal mean SSTs for each season (i.e., DJF, MAM, JJA, and SON) is at Lead-1 forecasts (i.e., initialization dates in the preceding November, February, May and August respectively), over the 1982–2019 study period. Figure 2 shows that the zonal gradient of the mean SSTs in the tropical Pacific is well simulated by the model compared to the observations. The Saudi-KAU model captures well the equatorial cold tongue along the equator over the eastern Pacific. However, the cold tongue extends westward more than in observations, which introduces a cold bias in the model. To explain it further, supplementary Fig. S1 shows the SST bias (negative in all seasons, but varying in magnitude) over the tropical Pacific domain during all four seasons. The analysis reveals the time when the errors in the SST in the tropical Pacific Ocean are greatest among the seasons, and when they are the smallest as simulated by the Saudi-KAU model. Typically, we can see cold bias in all seasons, and the highest cold bias (~ 1.0 to 1.5 °C) is observed during SON, while the lowest (0.5–1.0 °C) cold bias is observed in boreal winter time (Fig. S1-a) in the central-eastern equatorial Pacific. Moreover, a strong positive bias is observed near the coasts of South America in the Niño 1 + 2 region during all seasons (Fig. S1). This positive bias is also highest during boreal fall (SON) (Fig. S1-d).

Fig. 2
figure 2

Climatology of SST in the tropical Pacific Ocean in observation (Left Column) and Saudi-KAU model forecast (Right Column) at Lead-1 for DJF (a, b), MAM (c, d), JJA (e, f), and SON (g, h) seasons. The period of analysis is 1982 to 2019. The unit of SST climatology is oC

Figure 3 shows the interannual variability of the predicted SSTs as measured by the standard deviation of each seasonal pattern in the tropical Pacific region, compared to observations for DJF, MAM, JJA, and SON at Lead-1. The overall structure of the SST variability in observations is well captured by the Saudi-KAU model in all seasons. Generally, tropical Pacific SST anomalies tend to have a maximum variance during boreal fall and winter time, and minimum values during boreal spring and summer, and the Saudi-KAU model simulates this feature quite well (Fig. 3), with reduced latitudinal extent in all seasons. In observations, the highest variability (~ 1.5 to 1.75 °C) is observed during SON and DJF, while MAM and JJA have lower standard deviation values (~ 0.75 to 1.0 °C) in the central-eastern equatorial Pacific. The Saudi-KAU model is able to capture the variability of the most variable seasons (DJF and SON) with good accuracy relative to observations. However, the longitudinal extent of the variability is small compared to observations in both seasons (Fig. 3a-b for DJF, and 3 g-h for SON). Spring (MAM) and summer (JJA) are the seasons with low variability, and the model also has low variability during these two seasons with a reduced longitudinal extent (Fig. 3c, d for MAM, and 3e-f for JJA).

Fig. 3
figure 3

Standard deviation of SST in the tropical Pacific Ocean in observation (Left Column) and Saudi-KAU model forecast (Right Column) at Lead-1 for DJF (a, b), MAM (c, d), JJA (e, f), and SON (g, h) seasons. The period of analysis is 1982 to 2019. The unit of SST standard deviation is oC

3.2 Trajectory Maps and Skill Assessment

Figure 4 shows the trajectory maps of the Niño3.4 index, comparing the predictions of the Saudi-KAU CGCM with the observations. The prediction trajectories are shown as lines from each consecutive start month extended to the maximum lead, which is 12 months in this case. Generally, the prediction anomalies closely match the observations. However, this correspondence weakens with increasing lead time. Some ENSO events (e.g., 1998, and 2015) were relatively very well predicted even at long leads; on the other hand, several other specific events were less well predicted (Tippett et al. 2020). These results are largely consistent with the results from Barnston et al. (2019), who compared the Niño3.4 trajectories in CMC11-CanCM3, NCEP-CFSv2, as well as a multi-model ensemble (MME) composed of eight individual models. Importantly, the predictions started during El Niño years have less spread compared to those in normal years.

Fig. 4
figure 4

Observed (black) and forecasted (Saudi-KAU model: blue) SST anomalies spatially averaged over Niño3.4 region during the period (1982–2019). A forecast trajectory (12 month’s forecast) is shown for each calendar month. The unit of SST anomalies is oC

In Fig. 5, we compare the warm (El Niño: 13 events) and cold (La Niña: 13 events) episodes in observations and model (at six leads) during the period 1982–2019, for the boreal winter/DJF season. Again, Niño 3.4 is used to define the warm (SST anomaly in the Niño 3.4 region must be greater than or equal to + 0.5 C) and cold (SST anomaly in the Niño 3.4 region must be less than or equal to − 0.5 C) episodes. 11 out of 13 warm episodes were well predicted by the Saudi-KAU model up to 6 months in advance. However, all forecasts missed the1987–1988 warm event. While the warm event of 1994–95 was missed at longer leads (Lead-5 and 6), the model was able to capture this event at shorter lead times, but with a lower amplitude than in observations. Turning our attention to cold episodes, the model is somewhat able to capture 12 out of 13 events quite well, with the exception of the last (2000-01) of the triplet La Niña event (1999–2001). In addition, the model amplitude of Niño 3.4 during cold episodes is typically higher than in observations.

Fig. 5
figure 5

Observed and predicted SST anomalies spatially averaged over Niño3.4 region during a the warm and b the cold episodes during the boreal winter season (DJF) at six different lead times. The unit of SST anomalies is oC

Figure 6 shows the anomaly correlation coefficients (ACC) of the three ENSO indices for four seasons and for lead time up to 6 months. Among the three ENSO indices, the Niño 3.4 shows a comparatively higher prediction skill for all seasons. By season, the lowest correlation skill is observed during boreal summer. For instance, the correlation coefficient of the Niño 3 index during JJA at Lead-1 is about 0.82, and at a 6-month lead time, its value reduces to 0.35. These findings are in line with the Kug et al. (2005) results that show reasonably good forecast skill up to 6 months lead time in an intermediate El Niño prediction model.

Fig. 6
figure 6

Anomaly correlation coefficients (ACC) averaged over the three ENSO ((a) Niño 3, (b) Niño 3.4, (c) Niño 4) regions for four target seasons and six starts. On average, ACC is usually above significant level for three ENSO regions (except for Niño 3 region at lead 6 and target season JJA) and four target seasons, representing the positive skill of the Saudi-KAU seasonal forecasting system

Next, the anomaly correlation skill of the model-predicted SSTs in the tropical Pacific region is estimated for different seasons (i.e., DJF, MAM, JJA and SON) at Lead-1 as shown in Fig. 7. Prediction skill for other lead times is depicted in the supplementary figures S2 to S6. The Saudi-KAU model highest values (greater than 0.9) occur in DJF (Fig. 7a), similar to Kim et al. (2012) findings that the ENSO prediction skill is higher during the winter. In the case of MAM (Fig. 7b) and JJA (Fig. 7c), the highest values are located slightly off the equator (north-central and southwestern); however, in case of SON (Fig. 7d) the highest contour values are located right at the central-eastern equatorial Pacific. Furthermore, we see a sharp drop in the correlation values during JJA and MAM at longer lead times as compared to DJF and SON (supplementary Figs. 2, 4, 5, 6). The lowest skill values occur during JJA at longer leads and MAM for February initial conditions, which may be due to the spring predictability barrier, as this remains a large challenge that limits the models’ skill in forecasting ENSO (e.g., Chen et al. 2020; Zhang et al. 2021). Nonetheless, the correlation analysis indicates that the Saudi-KAU CGCM skill remains statistically significant at approximately the 6-month lead time, and therefore, it can be potentially useful for regional climate services providing early warning of seasonal climate forecasts for precipitation and temperature that depend on the predictions of the ENSO (e.g., Timmermann et al. 2018).

Fig. 7
figure 7

Saudi-KAU model forecast skill at Lead-1 for Sea surface temperature (SST) over the tropical Pacific Ocean measured as the temporal anomaly correlation between the observed and ensemble mean (ad) DJF, MAM, JJA, and SON SST anomalies respectively, during 1982–2019. Here, the color bar is adjusted to minimum value 0.3, which is the 95% confidence level threshold

3.3 Saudi-KAU CGCM vs NMME and C3S Models

In this section, we compute the temporal anomaly correlation coefficients between the ensemble mean of the models’ forecasts and observations for each target calendar month and lead separately for Saudi-KAU CGCM and other eight models from the NMME and C3S projects. Figure 8 shows the temporal anomaly correlation between the ensemble mean of the models’ forecast and observations as a function of target month and lead. The correlation tables for different models can be used to compare the predictive potential in each model. CanSIPSv2, CanCM4i, GEM-NEMO, Meteo-France-Sys7, and ECMWF are among those models that show the highest skill (greater than 0.90), which can persist for several months. It is important to mention here that CanSIPSv2 that is basically the multi-model mean of CanCM4i and GEM-NEMO (Lin et al. 2020) shows the highest skill as compared to individual models, which provides further evidence that a multi-model approach is better than using the single model (e.g., Palmer et al. 2004; Hagedorn et al. 2005; DelSole and Tippett, 2014; Delsole et al. 2014). Saudi-KAU, NCEP-CFSv2, COLA-RSMAS-CCSM4, and GFDL-SPEAR, also show high skill for short leads. However, this skill diminishes quickly as the lead time increases. Nevertheless, the Saudi-KAU CGCM predictions are at approximately the same skill level as the other state-of-the-art models.

Fig. 8
figure 8

Anomaly correlation of ensemble mean forecast and observed Niño3.4 SST index as a function of target month and lead time up to 6 months

4 Summary and Conclusions

The Saudi-KAU CGCM skillfully predicts ENSO-related SST anomalies. The hindcast from the Saudi-KAU CGCM is based on 20 ensemble members for the period 1982–2019, while the other models have different hindcast periods. The verification measures used here for the skill assessment are the mean, variability, and the temporal anomaly correlation coefficients of the Saudi-KAU CGCM. Overall, the Saudi-KAU CGCM is able to capture the observed SST climatological pattern and interannual variability (measured as the standard deviation) over the tropical Pacific region during four standard seasons. The Saudi-KAU CGCM shows a cold bias (0.5–1 °C) in the equatorial Pacific and a warm bias near the coast of South America in almost all seasons (SON shows highest bias). Trajectory maps of the Nino3.4 index from the Saudi-KAU CGCM generally follow the observed pattern, particularly during the strong ENSO events. A statistically significant positive prediction skill of the SST anomalies in the tropical Pacific Ocean during all seasons is noted up to six months in advance of the forecasts. Comparatively, the skill is lower in spring and summer than fall and winter varying by year and lead times. A multi-model comparison of Niño3.4 index skill shows that the Saudi-KAU predictions are in line with the other state-of-the-art models.

The high skill demonstrated by the Saudi-KAU CGCM forecasts suggests that useful operational forecasts of the ENSO are feasible. However, there are a number of caveats which are of concern as compared to other state-of-the-art ENSO forecasting models, namely:

  1. (1)

    Low horizontal and vertical resolution of the Saudi-KAU AGCM.

  2. (2)

    Obsolete Oceanic component (MOM2.2).

  3. (3)

    Low prediction skill during boreal summer at longer leads.

The current Saudi-KAU CGCM can be upgraded to a version with higher horizontal and vertical resolutions in addition to the new physical schemes for convection, microphysics, and land surface processes that have been introduced during recent years (e.g., Almazroui et al. 2017). Future plans also include an upgrade to a more up-to-date oceanic component, because the current ocean model MOM2.2 is quite old and does not contain recent improvements in the representation of oceanic physical processes. Finally, the cold bias in the central-eastern equatorial Pacific, and warm bias over some localized areas, is another area of focus for future improvements of the Saudi-KAU CGCM. Due to the disproportionate impacts of ENSO on the climate in different parts of the world, progress in these topics will probably result in better ENSO forecasts at longer lead times that may lead to better forecasts of regional seasonal rainfall and temperature anomalies (e.g., Ehsan et al. 2021; Acharya et al. 2021) which may have direct positive societal impacts.