1 Introduction

The Sahel is a semiarid region located in the westernmost part of the tropical African continent, between the south of the Sahara desert and the humid savanna (e.g., Nicholson 2013). Most of the population in this region resides in rural areas, and its sustenance is mainly based on agriculture and pastoralism, sectors that are vulnerable to rainfall variability (Mortimore et al. 2001; Kanji et al. 2006). Therefore, understanding changes in rainfall and having good predictions thereof are crucial for this region.

Rainfall over the Sahel presents a strong meridional gradient, with annual rainfall mean values of roughly 550 mm over its southern part and those on the order of 150 mm over its northern part (Nicholson 2013). Throughout the year, rainfall mainly occurs during the summer months (July–August–September, JAS) with a maximum in August, and it is associated with the West African Monsoon (WAM). Occasional rainfall might also be observed during winter time, though related to extratropical systems (Nicholson 2013).

Boreal summer seasonal precipitation over the Sahel presents large variability from interannual to interdecadal timescales (Kitoh et al. 2020 and references therein). Previous studies have shown that sea surface temperature anomalies (SSTa) over different basins can impact the interannual variability of rainfall over the Sahel (Rodriguez-Fonseca et al. 2011, 2015). Analyses of observations and model experiments have shown that warm SST anomalies in the eastern equatorial Atlantic reduce the land–atmosphere temperature and surface pressure gradients and tend to be associated with an anomalous rainfall dipole, with positive values over the Gulf of Guinea and negative ones over the Sahel (e.g. Vizy and Cook 2002; Losada et al. 2010; Polo et al. 2008). Warm anomalies in the equatorial Pacific tend to produce a stabilization of the air column and subsidence over West Africa, weakening the monsoon and reducing seasonal precipitation amounts and the occurrence of heavy precipitation events (Rowell 2001; Janicot et al. 2001; Mohino et al. 2011; Parhi et al. 2016; Joly and Voldoire 2009; Diakhaté et al. 2019). Cold SSTa over the equatorial Pacific tend to promote the opposite effect. Outside the tropics, warm Mediterranean SSTa enhance local evaporation, leading, through southerly moisture advection, to an increase of low-level moisture convergence and destabilization over the Sahel. This strengthens the monsoon and increases seasonal rainfall averages there. The contrary occurs for cold SSTa over the Mediterranean (Rowell 2003; Fontaine et al. 2010, 2011; Gaetani et al. 2010).

Note that the above rainfall anomalies associated with tropical Atlantic, Pacific and extratropical Mediterranean sea surface temperatures (SSTs) represent the direct response to each isolated SST forcing. Nevertheless, oceans are interconnected and pantropical interactions have been detected during certain decades (Cai et al. 2019; Wang 2019; Kitoh et al. 2020). In particular, it is known that, since the 1970s, the Atlantic and Pacific Niños have appeared in opposite phases in summer (Rodriguez-Fonseca et al. 2009). As warm SSTs over both the equatorial Atlantic and the equatorial Pacific decrease rainfall, a counteracting effect over the Sahel appears under that “opposition of phase” configuration (Polo et al. 2008; Losada et al. 2012; Suárez-Moreno et al. 2018).

In addition, non-stationarities are found in the impact of SSTs on Sahelian rainfall (Rodriguez-Fonseca et al. 2011, 2015, 2016; Suárez-Moreno et al. 2018). Between the 1950s and 1980s, no impact from the Mediterranean is detected in the observations, whereas it is significant in the preceding and following decades (Suárez-Moreno et al. 2018). In turn, in recent decades the relation with the tropical Pacific is strong, while that with the Atlantic appears absent. This lack of connection may result from the above-mentioned counteracting effect with the Pacific (Losada et al. 2012). Thus, in recent decades, the Pacific and the Mediterranean seem to dominate the interannual variability of Sahelian rainfall.

The slowly varying SSTs and their influence over continental areas at seasonal timescales constitute the physical basis of seasonal predictions. As in other regions of the world, seasonal forecasts over the Sahel were initially developed from statistical methods based on empirical teleconnections between SST anomaly patterns and continental anomalies (Folland et al. 1991). In a second phase, seasonal predictions were performed using atmospheric general circulation models (AGCMs) forced by observed SSTs (Ndiaye et al. 2009). Nevertheless, the dynamical predictions presented limited skill in the Sahel, and complementary approaches combining dynamical and statistical predictions were often developed (Garric et al. 2002; Ndiaye et al. 2009). In a third phase, together with the development of coupled models, seasonal forecasts were carried out with coupled systems (an atmosphere model coupled to an ocean model). Today, seasonal predictions based on coupled systems are operational and are delivered by the main operational forecasting centers around the world. Moreover, in the late 1990s, several international initiatives proposed the development of multi-model seasonal climate prediction systems (i.e. DEMETER, ENSEMBLES, SINTEX), with the aim of joining efforts and comparing seasonal predictions performed with different coupled models. Currently, multi-model ensemble forecasting is a mainstream method for seasonal prediction. Several studies based on these multi-model ensembles have revealed that multi-model predictions present higher skill than individual systems (Palmer et al. 2004; Doblas-Reyes et al. 2009), as they also account for model uncertainty. Focusing on the Sahel, Batté and Déqué (2011) pointed out that the multi-model ENSEMBLES seasonal predictions enhance the skill for western Africa precipitation by reducing the skill-spread ratio. Rodrigues et al. (2014) conclude that the state-of-the-art EUROSIP (European Seasonal to Interannual Prediction) and NMME (North American Multi-Model Ensemble) seasonal predictions are reliable in predicting the interannual variations of the Sahel precipitation regimes. Recently, Giannini et al. (2020) analyzed the precipitation skill of five seasonal forecast models from the NMME and, based on the multi-model mean, showed that precipitation anomalies during the monsoon season can be predicted even with lead times as long as 3–4 months. They also found that such skill comes mainly from ENSO and North Atlantic sea surface temperatures. Nevertheless, the remaining NMME models were not analyzed, nor were other SSTa regions that could contribute to skill.

Here, we extend the number of NMME models and analyze the precipitation prediction skill seeking to understand where the skill or lack thereof comes from. Our interest is on the interannual rainfall variability associated with oceanic forcing. The starting hypothesis is that much of the precipitation prediction skill in models should come from the teleconnections with the SSTa in different regions. Therefore, we first analyze the main sources of predictability for the Sahel rainfall in observations, and then we evaluate whether these potential predictors obtained from observations are also sources of predictability in models. The final aim is to evaluate the skill of the seasonal forecast models to predict rainfall over the Sahel, analyzing whether the models are able to reproduce the SST over the observed potential predictor regions and the sign and amplitude of the SST-Sahel teleconnections.

2 Data and methodology

2.1 Data

2.1.1 Observational data

Observational data are used as the reference against which the model results are compared and their seasonal forecasting skill is assessed. We employ the monthly precipitation values from GPCPv2.3 with a spatial resolution of 2.5° × 2.5°. They are provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, and are available at https://psl.noaa.gov/data/gridded/data.gpcp.html (Adler et al. 2003). Additionally, we consider the monthly SST data from HadISSTv1.1 with a resolution of 1° × 1° (Rayner et al. 2003). The study focuses on July–August–September (JAS), which is the season when the monsoon takes place (Rodriguez-Fonseca et al. 2015; Nicholson 2013; Thorncroft et al. 2011). To check the sensitivity of the results to the choice of observational dataset, the analysis is also performed using SSTs from ERSSTv5 (with a resolution of 2° × 2°, Huang et al. 2017) and precipitation from CRU TS 4.03 (Harris et al. 2020). CRU TS is derived by interpolating monthly climate anomalies from extensive networks of weather station observations, whereas GPCP combines data from rain gauge stations, satellites, and sounding observations from 1979 to present. Regarding SSTs, HadISST uses reduced space optimal interpolation applied to SSTs from the Marine Data Bank (mainly ship tracks) and the International Comprehensive Ocean–Atmosphere Dataset (ICOADS) through 1981, and a blend of in-situ and adjusted satellite-derived SSTs from 1982 onwards. ERSSTv5 uses new data sets from ICOADS Release 3.0 SSTs, Argo floats above 5 m and Hadley Centre Ice-SST version 2 (HadISST2) ice concentration. Given that the conclusions are not altered, in this work we only show the results obtained by comparing model simulations with observations from GPCPv2.3 and HadISSTv1.1. Additionally, the results do not depend on the choice of the peak season, as results for August–September are consistent with those presented for JAS in the paper (not shown).

Finally, we also consider the monthly mean sea level pressure (MSLP) and horizontal winds at 850 hPa and 200 hPa from ERA5 reanalysis (Hersbach et al. 2020) in order to analyze the spatial teleconnection patterns of eMED and Niño3 with the Sahel.

2.2 NMME models

We consider the monthly hindcasts of SSTs and precipitation from a set of fifteen seasonal forecasting models belonging to the North American Multi-Model Ensemble (NMME). Table 1 summarizes the models used. Although the models have different native spatial resolutions, their outputs are monthly forecasts with a similar resolution of 1° × 1°. They are available at the web page http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/ (Kirtman et al. 2014). The common period for which the models have complete predictions is 1982–2010 and, therefore, this is our period of study. Finally, in order to analyze the prediction skill, six forecast start times (FST) were considered for all the models, from 1st July back to 1st February.

Table 1 NMME models considered in this study

The analysis of the skill for individual models is performed by considering the ensemble mean, constructed by averaging 4 different simulations of each model. The conclusions of this study do not differ from those obtained when considering all the members available for each model (not shown). We also analyze the skill of the multi-model mean of all the NMME models, constructed by pooling the first n members of each ensemble (Hemri et al. 2020). The parameter n is set to 4, since this is the maximum number of ensemble members in one of the models (NASA-GEOSS2S). The results are in qualitative agreement with those from a different definition of the multi-model mean, based on an average of the individual model means in which all the available members (not just 4) are taken into account (not shown).
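The pooling step described above can be illustrated with a minimal sketch. It assumes that each model's hindcast of an index is stored as a NumPy array of shape (members, years); the function name `pooled_multimodel_mean`, the model names and the synthetic data are hypothetical and are not part of the NMME archive or of the original analysis code.

```python
import numpy as np

def pooled_multimodel_mean(hindcasts, n_members=4):
    """Pool the first `n_members` members of every model and average them.

    hindcasts: dict mapping a model name to an array of shape (members, years).
    Returns an array of shape (years,).
    """
    pooled = []
    for name, members in hindcasts.items():
        members = np.asarray(members, dtype=float)
        if members.shape[0] < n_members:
            raise ValueError(f"{name} has fewer than {n_members} members")
        pooled.append(members[:n_members])
    # Stack to (models * n_members, years) and average over the pooled members.
    return np.concatenate(pooled, axis=0).mean(axis=0)

# Synthetic example (hypothetical models with 4 and 10 members, 29 years).
rng = np.random.default_rng(0)
hindcasts = {
    "model_a": rng.normal(size=(4, 29)),
    "model_b": rng.normal(size=(10, 29)),
}
mmm = pooled_multimodel_mean(hindcasts, n_members=4)
```

Because every model contributes the same number of members, the pooled mean in this sketch coincides with the average of the 4-member ensemble means of the individual models.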

Finally, we also analyze the monthly hindcasts of mean sea level pressure and horizontal winds at 200 hPa and 850 hPa. These fields are considered in order to evaluate the atmospheric patterns of the teleconnections in models. Given that only the models CanCM4i, CMC1-CanCM3, CMC2-CanCM4 and GEM-NEMO have atmospheric data available, the teleconnection mechanisms are evaluated for these 4 models and, for the sake of brevity, only their average is shown.

2.3 Methodology

We begin by revisiting the main SST sources of predictability for Sahel precipitation in observations for the period of study. To do so, we obtain summer (JAS) seasonal anomalies of SSTs and precipitation by subtracting the seasonal mean. Seasonal forecasts aim at providing predictions of climate anomalies for the forthcoming season (Balmaseda et al. 2009), which should be taken into account when assessing their skill. In this work, we use a 29-year window for the analysis and, furthermore, remove the trends in all variables prior to any other calculation. This is particularly relevant in the case of Sahel rainfall, which shows strong climate variability at decadal timescales (Kitoh et al. 2020). We then calculate the regression map of SST anomalies worldwide onto the Sahel precipitation index (Fig. 1). The precipitation index is defined as the seasonal average of the precipitation anomalies over the Sahel region (see domain in Table 2).
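As an illustration of this preprocessing, the following sketch removes the time mean and a linear trend from JAS seasonal means and regresses a gridded anomaly field onto an index. It is a simplified example under stated assumptions: the trend is taken to be linear and fitted by least squares, the index is standardized before the regression, and the function names and synthetic arrays are hypothetical rather than taken from the original analysis.

```python
import numpy as np

def detrended_anomalies(x):
    """Remove the time mean and the least-squares linear trend along axis 0."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.shape[0], dtype=float)
    t -= t.mean()
    anom = x - x.mean(axis=0)
    # Least-squares slope at every grid point (also works for a 1-D series).
    slope = np.tensordot(t, anom, axes=(0, 0)) / np.sum(t**2)
    return anom - t.reshape((-1,) + (1,) * (x.ndim - 1)) * slope

def regression_map(field_anom, index):
    """Regress a (time, lat, lon) anomaly field onto a standardized index.

    The result is in field units per standard deviation of the index
    (standardizing the index is an assumption of this sketch).
    """
    index = np.asarray(index, dtype=float)
    idx = (index - index.mean()) / index.std()
    return np.tensordot(idx, field_anom, axes=(0, 0)) / idx.size

# Synthetic example: 29 seasons on a 3 x 4 grid with an imposed trend.
rng = np.random.default_rng(1)
years = 29
pcp_index = detrended_anomalies(rng.normal(size=years))
std_index = pcp_index / pcp_index.std()
sst = (2.0 * std_index[:, None, None] * np.ones((years, 3, 4))
       + 0.1 * np.arange(years)[:, None, None])
sst_anom = detrended_anomalies(sst)
reg = regression_map(sst_anom, pcp_index)
```

In this synthetic case the imposed trend is removed and the recovered regression coefficient equals the prescribed value of 2 at every grid point.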

Fig. 1

Regression map between the SST anomaly field and the precipitation index over Sahel. Dotted regions are significant at 95% confidence level in two tailed t-test with an effective number of degrees of freedom. SST from HadISST v1.1 and PCP from GPCPv2.3. Regression is computed considering the average of the anomalies in JAS during the period (1982–2010)

Table 2 Spatial domain where indices were computed

Secondly, we define the indices that represent the temporal evolution of the SST anomalies over the regions associated with the main observed predictors. As will be shown in Sect. 3.1, the potential sources of interannual predictability for rainfall over the Sahel in JAS during the period of study are the eastern Mediterranean (eMED) and the eastern equatorial Pacific (Niño3) (in agreement with Suárez-Moreno et al. 2018). These indices are computed as the average of JAS seasonal SST anomalies over the corresponding regions (Table 2). Note that the indices so defined do not show a linear trend, as this was removed in the definition of the SST anomalies to avoid introducing long-term variability in our analysis. Additionally, in order to analyze the spatial atmospheric teleconnection patterns, we also compute (1) the correlation maps of the eMED index with the mean sea level pressure anomalies and the wind anomalies at 850 hPa, and (2) the correlation maps of the Niño3 index with the anomalous velocity potential difference between the 200 and 850 hPa levels \(\left({Vpot}_{200}-{Vpot}_{850}\right)\), as a way of illustrating the baroclinic atmospheric response associated with ENSO forcing. Velocity potential (\(Vpot\)) is obtained by solving Poisson's equation, \(\nabla \cdot V=\Delta Vpot\), where V represents the horizontal winds.

Thirdly, to evaluate the contribution of the eMED and Niño3 signals to precipitation variability over Sahel, the precipitation index is fitted with a multilinear regression model as follows:

$$ \mathrm{PCP}_{\mathrm{reg}} = \alpha \cdot \mathrm{eMED}_{\mathrm{index}} + \beta \cdot \mathrm{Ni\tilde{n}o3}_{\mathrm{index}} + \varepsilon $$
(1)

where \(\mathrm{PCP}_{\mathrm{reg}}\) is the precipitation index obtained from the multiple regression analysis, \(\alpha\) and \(\beta\) are the multilinear regression coefficients for eMED and Niño3, respectively, and \(\mathrm{eMED}_{\mathrm{index}}\) and \(\mathrm{Ni\tilde{n}o3}_{\mathrm{index}}\) are the standardized indices associated with the eastern Mediterranean and the equatorial Pacific El Niño, respectively. Finally, \(\varepsilon\) represents the residual of the fit. The statistical significance of the multiple linear regression is assessed with an F-test at the 95% confidence level.

Using this fit, the total variance of precipitation can be decomposed into the following components (Eq. 2):

$$ \mathrm{Var}\left( \mathrm{PCP}_{\mathrm{reg}} \right) = \alpha^{2} + \beta^{2} + 2\cdot\alpha\cdot\beta\cdot\mathrm{cov}\left( \mathrm{eMED}, \mathrm{Ni\tilde{n}o3} \right) + \mathrm{var}\left( \varepsilon \right) $$
(2)

where \({\alpha }^{2}\) and \({\beta }^{2}\) represent the parts of the total precipitation variance explained by eMED and Niño3, respectively, and the term \(2\cdot \alpha \cdot \beta \cdot \mathrm{cov}(\mathrm{eMED}, \mathrm{Ni\tilde{n}o3})\) accounts for the covariance between eMED and Niño3. Because the indices are standardized, their variances equal one and do not appear explicitly in Eq. (2).
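Equations (1) and (2) can be illustrated with a short numerical sketch. It assumes an ordinary least-squares fit without intercept on standardized predictors (the precipitation index being an anomaly series with zero mean) and population (1/n) variances; the helper names and the synthetic series are hypothetical and only serve to show that the four terms of Eq. (2) add up to the total variance.

```python
import numpy as np

def standardize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fit_two_predictors(pcp, emed, nino3):
    """Least-squares fit of Eq. (1) on standardized predictors."""
    e, n = standardize(emed), standardize(nino3)
    X = np.column_stack([e, n])
    coef, *_ = np.linalg.lstsq(X, pcp, rcond=None)
    alpha, beta = coef
    eps = pcp - X @ coef  # residual of the fit
    return alpha, beta, eps, e, n

def variance_terms(alpha, beta, eps, e, n):
    """Terms on the right-hand side of Eq. (2)."""
    cov_en = np.mean(e * n)  # covariance of two standardized indices
    return {
        "eMED": alpha**2,
        "Nino3": beta**2,
        "covariance": 2 * alpha * beta * cov_en,
        "residual": np.var(eps),
    }

# Synthetic 29-year example (coefficients chosen arbitrarily for illustration).
rng = np.random.default_rng(2)
emed = rng.normal(size=29)
nino3 = rng.normal(size=29)
pcp = 0.6 * standardize(emed) - 0.4 * standardize(nino3) + 0.3 * rng.normal(size=29)
pcp -= pcp.mean()

alpha, beta, eps, e, n = fit_two_predictors(pcp, emed, nino3)
terms = variance_terms(alpha, beta, eps, e, n)
explained_fraction = {k: v / np.var(pcp) for k, v in terms.items()}
```

The decomposition closes exactly in this setting because the least-squares residual is uncorrelated with both predictors, so no cross terms with the residual appear.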

Fourthly, we analyze the precipitation prediction skill by means of the anomaly correlation coefficient (ACC), the root mean squared error (RMSE) and the mean squared error skill score (MSESS). ACC is computed as the correlation between the observed and the modeled indices. We consider that a model presents skill in terms of ACC when the ACC is positive and statistically significant at the 95% confidence level in a two-tailed t-test with an effective number of degrees of freedom (Mitchell et al. 1966; Bretherton et al. 1999). ACC values are computed considering the ensemble mean of each model.
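A possible implementation of the ACC and its significance test is sketched below. The text does not give the exact formula used for the effective number of degrees of freedom; this sketch assumes the common lag-1 autocorrelation estimate \(N_{eff}=N(1-r_1 r_2)/(1+r_1 r_2)\), and it caps \(N_{eff}\) between 3 and N, which is a choice of this example rather than a documented step of the study.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[:-1] * x[1:]) / np.sum(x * x)

def acc_with_effective_dof(obs, fcst):
    """Return (ACC, effective sample size, t statistic).

    Assumption: N_eff = N * (1 - r1*r2) / (1 + r1*r2), with r1 and r2 the
    lag-1 autocorrelations of the two series, clipped to [3, N].
    """
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    acc = np.corrcoef(obs, fcst)[0, 1]
    r1, r2 = lag1_autocorr(obs), lag1_autocorr(fcst)
    n_eff = obs.size * (1 - r1 * r2) / (1 + r1 * r2)
    n_eff = float(np.clip(n_eff, 3.0, obs.size))
    # Student's t statistic with n_eff - 2 degrees of freedom (undefined if |acc| = 1).
    t_stat = acc * np.sqrt((n_eff - 2) / (1 - acc**2))
    return acc, n_eff, t_stat

# Synthetic 29-year example.
rng = np.random.default_rng(3)
obs = rng.normal(size=29)
fcst = 0.8 * obs + 0.6 * rng.normal(size=29)
acc, n_eff, t_stat = acc_with_effective_dof(obs, fcst)
```

The t statistic would then be compared with the two-tailed critical value of Student's t distribution for `n_eff - 2` degrees of freedom, for instance with `scipy.stats.t.ppf(0.975, n_eff - 2)`.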

As mentioned above, our starting hypothesis is that much of the skill for predicting Sahel precipitation should come from its teleconnections with the SST anomalies in different regions worldwide. Therefore, to investigate where the precipitation ACC skill scores come from, we also evaluate the models’ skill in predicting the SST anomalies over the main sources of predictability and their ability to reproduce the teleconnections with precipitation. To evaluate the teleconnection skill, we compare the regression and correlation coefficients between each predictor and Sahel precipitation in the models with those obtained from observations. We also compare the physical mechanisms for the teleconnections in models and observations by analyzing the atmospheric patterns that each of the potential predictors induces. Given the availability of atmospheric data in models, this can only be done for 4 of the 15 models.

To evaluate the contribution of the different SSTa signals to the precipitation ACC skill score in models, we use the same multilinear regression analysis as in observations. With this analysis, the contribution of each predictor to the precipitation ACC skill score in models can be estimated in terms of the multilinear regression coefficients and the correlations between the observed precipitation index (PCPobs) and the simulated oceanic indices (as in Mohino et al. 2016):

$$ACC=\rho \left({PCP}_{obs},{PCP}_{nmme}\right)=\frac{\alpha }{\sqrt{var\left({PCP}_{nmme}\right)}}\rho \left({PCP}_{obs},{eMED}_{nmme}\right)+\frac{\beta }{\sqrt{var\left({PCP}_{nmme}\right)}}\rho \left({PCP}_{obs},{Ni\mathrm{\tilde{n} }o3}_{nmme}\right)+\frac{\sqrt{var({\varepsilon }_{nmme})}}{\sqrt{var\left({PCP}_{nmme}\right)}}\rho \left({PCP}_{obs},{\varepsilon }_{nmme}\right) $$
(3)

where \(\rho \) represents the correlation coefficient, \({PCP}_{obs}\) the precipitation index from observations, \({PCP}_{nmme}\) the precipitation index from NMME, \({eMED}_{nmme}\) and \({Ni\tilde{n}o3}_{nmme}\) the eMED and Niño3 indices from the NMME models, and \({\varepsilon }_{nmme}\) the residual of the multiple regression fit in the NMME models. With this decomposition, the first and second terms on the right-hand side of the equation can be understood as the parts of the precipitation ACC skill score explained by the eMED and Niño3 indices, respectively. The third term corresponds to the unexplained ACC skill score, which could be related to unaccounted sources.
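The decomposition in Eq. (3) can be checked numerically with the sketch below. It assumes that the model precipitation index is fitted by least squares on the model's own standardized eMED and Niño3 indices, as in Eq. (1); the function names and the synthetic series are hypothetical and do not reproduce any NMME result.

```python
import numpy as np

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def acc_decomposition(pcp_obs, pcp_mod, emed_mod, nino3_mod):
    """Split the ACC of Eq. (3) into eMED, Nino3 and residual contributions."""
    pcp_obs = np.asarray(pcp_obs, dtype=float)
    emed_mod = np.asarray(emed_mod, dtype=float)
    nino3_mod = np.asarray(nino3_mod, dtype=float)
    p = np.asarray(pcp_mod, dtype=float)
    p = p - p.mean()
    e = (emed_mod - emed_mod.mean()) / emed_mod.std()
    n = (nino3_mod - nino3_mod.mean()) / nino3_mod.std()
    X = np.column_stack([e, n])
    coef, *_ = np.linalg.lstsq(X, p, rcond=None)
    alpha, beta = coef
    eps = p - X @ coef
    sd_p = p.std()
    return {
        "eMED": alpha / sd_p * corr(pcp_obs, e),
        "Nino3": beta / sd_p * corr(pcp_obs, n),
        "residual": eps.std() / sd_p * corr(pcp_obs, eps),
    }

# Synthetic 29-year example.
rng = np.random.default_rng(4)
pcp_obs = rng.normal(size=29)
emed_mod = rng.normal(size=29)
nino3_mod = rng.normal(size=29)
pcp_mod = 0.5 * emed_mod - 0.3 * nino3_mod + 0.3 * pcp_obs + 0.5 * rng.normal(size=29)

terms = acc_decomposition(pcp_obs, pcp_mod, emed_mod, nino3_mod)
acc_total = corr(pcp_obs, pcp_mod)
```

The three contributions sum to the ACC by construction, so the sketch is a consistency check of the algebra rather than an independent estimate of skill.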

Finally, as mentioned previously, forecast skill is also assessed considering the RMSE and MSESS. RMSE is defined as:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i,nmme}-{y}_{i,obs}\right)}^{2}}$$
(4)

where \({y}_{i,obs}\) is the observed value (\({PCP}_{i,obs}\), \({eMED}_{i,obs}\) or \({Ni\tilde{n}o3}_{i,obs}\)) and \({y}_{i,nmme}\) is the forecast one (\({PCP}_{i,nmme}\), \({eMED}_{i,nmme}\) or \({Ni\tilde{n}o3}_{i,nmme}\)) (Déqué 2011). Note that RMSE values are computed considering the full fields (and not the anomalies). On the other hand, MSESS is defined as:

$$MSESS=1-\frac{MSE}{{MSE}_{clim}} $$
(5)

where \(MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i,nmme}-{y}_{i,obs}\right)}^{2}\) and \({MSE}_{clim}=\frac{1}{n}\sum_{i=1}^{n}{\left(\overline{{y }_{obs}}-{y}_{i,obs}\right)}^{2}=var\left({Y}_{obs}\right)\), with \({Y}_{obs}={y}_{1,obs},\ldots,{y}_{n,obs}\) the time series of the observed values and \(\overline{{y }_{obs}}\) its mean. The maximum value of MSESS is 1 and occurs when MSE = 0, that is, when the model gives a perfect forecast. MSESS = 0 occurs when MSE = MSEclim, which implies that the model forecast skill in terms of MSE is equal to that of a climatological forecast. Finally, negative MSESS values imply that the model forecast skill in terms of MSE is worse than that of a climatological forecast (Murphy 1998; Déqué 2011).
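Equations (4) and (5) translate directly into a few lines of code. The sketch below uses a small made-up series to illustrate the three reference cases discussed above (perfect forecast, climatological forecast, and a forecast worse than climatology); the function names and the numbers are illustrative only.

```python
import numpy as np

def rmse(fcst, obs):
    """Root mean squared error, Eq. (4)."""
    fcst, obs = np.asarray(fcst, dtype=float), np.asarray(obs, dtype=float)
    return np.sqrt(np.mean((fcst - obs) ** 2))

def msess(fcst, obs):
    """Mean squared error skill score against climatology, Eq. (5)."""
    fcst, obs = np.asarray(fcst, dtype=float), np.asarray(obs, dtype=float)
    mse = np.mean((fcst - obs) ** 2)
    mse_clim = np.mean((obs.mean() - obs) ** 2)  # equals var(obs)
    return 1.0 - mse / mse_clim

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
perfect = obs.copy()                      # MSE = 0
climatology = np.full_like(obs, obs.mean())  # MSE = MSE_clim
biased = obs + 2.0                        # right variability, constant bias of 2

print(msess(perfect, obs))      # 1.0
print(msess(climatology, obs))  # 0.0
print(rmse(biased, obs))        # 2.0
print(msess(biased, obs))       # -1.0
```

The last case shows that, when the scores are computed on full fields, a constant bias alone is enough to make MSESS negative even if the forecast anomalies match the observed ones exactly.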

3 Results

3.1 Observational analysis

In this section we briefly analyze the teleconnection between Sahel precipitation and SST during the period of study and estimate the percentage of the total precipitation variance explained by each predictor in observations. Figure 1 shows the regression map between the precipitation index over the Sahel and worldwide SST anomalies. In the period under analysis, Sahel precipitation shows statistically significant positive regression coefficients with SSTa over the eastern Mediterranean (eMED) and negative ones with SSTa over the equatorial Pacific (Niño3), in accordance with previous studies (e.g. Janicot et al. 2001; Rowell 2001, 2003; Fontaine et al. 2010, 2011; Gaetani et al. 2010; Diakhaté et al. 2019). It also shows positive connections with North Atlantic and Pacific SSTs. However, these signals are centered in the northern hemisphere subtropical gyres, which suggests they could be due to Ekman-induced transport in response to changes in the subtropical high pressure systems (e.g., Barrier et al. 2014), as a consequence of atmospheric teleconnections from El Niño and the Mediterranean. In particular, the area of significant positive values in the North Atlantic shows a statistically significant correlation of 0.6 with the eMED index. Conversely, the Niño3 and eMED indices show no statistically significant correlation with each other (the correlation value is −0.15) and are, therefore, selected as the two independent predictors for rainfall over the Sahel in the rest of the study.

The teleconnection mechanisms through which eMED and Niño3 influence rainfall variability over Sahel are shown on Fig. 2a, b, respectively. Focusing on the eMED teleconnection (Fig. 2a), an anomalous warming over the eastern Mediterranean induces an anomalous low level pressure over the eastern Sahara (in agreement with Gómara et al. 2018). Additionally, Fig. 2a shows an intensification of the southwesterly monsoonal flow due to the strengthening of the meridional MSLP gradient between the Gulf of Guinea and the Sahara. As a consequence, a warming over the eMED is related to an increase of precipitation over Sahel. These results are consistent with previous observational and modeling studies demonstrating the existence of this mechanism (Diakhaté et al. 2019; Gómara et al. 2018; Fontaine et al. 2010, 2011; Gaetani et al. 2010; Jung et al. 2006). In particular, Rowell (2003) and Fontaine et al. (2010) performed sensitivity experiments with Atmospheric General Circulation Models and found that positive (negative) SST anomalies in the Mediterranean Sea produced an enhancement (reduction) of Sahel rainfall.

Fig. 2

a Correlation map between the eMED index and the anomalies of wind at 850 hPa and mean sea level pressure. b Correlation map between the velocity potential anomaly field and the Niño3 index. The velocity potential field used here is the difference between the velocity potential at 200 hPa and 850 hPa (VPOT200/850). Shaded regions are significant at the 95% confidence level in a two-tailed t-test with an effective number of degrees of freedom. Mean sea level pressure and horizontal winds are from the ERA5 reanalysis. Correlations are computed considering the average of the anomalies in JAS during the period 1982–2010

Regarding the influence of Niño3 over the Sahel (Fig. 2b), the anomalous warming over the equatorial Pacific generates an anomalous increase in the difference between the velocity potential at 200 hPa and 850 hPa (VPOT200/850), suggesting a weakening of the monsoon and a reduction of convection and precipitation over the Sahel (in agreement with Diakhaté et al. 2019; Gómara et al. 2018; Suárez-Moreno et al. 2018; Mohino et al. 2011; Joly and Voldoire 2009). This mechanism is similar to the one relating El Niño to decreased Atlantic tropical cyclone (TC) and hurricane activity, so that rainfall over the Sahel is reduced as a result of increased wind shear and atmospheric static stability (e.g. Goldenberg and Shapiro 1996). The opposite takes place for La Niña events.

The multiple regression analysis performed for Sahel rainfall considering only the eMED and Niño3 indices suggests that these two predictors together explain 58% of the total precipitation variance at interannual timescales and that the eMED is the dominant influence (see Table 3). If the analysis is repeated with the eMED and Niño3 indices leading the JAS Sahel rainfall by 0 to 6 months, the percentage of total precipitation variance explained by these two predictors decreases with the lag (see Sect. 1 in additional material). On the other hand, the percentage associated with the residual (42%) represents the part of the total precipitation variability not explained by changes in the eMED and Niño3 indices. We speculate that this could be partly driven by land–atmosphere and aerosol-radiative processes (e.g. Nicholson et al. 2013; Rodriguez-Fonseca et al. 2015).

Table 3 Results from multiple regression analysis considering eMED and Niño3 as predictors

3.2 Prediction skill for precipitation

Figure 3 shows the precipitation prediction ACC skill scores in JAS for each FST and model. Results show that models in general lack skill in predicting rainfall over the Sahel, although some of them present statistically significant ACC values for specific FSTs. For example, GFDL-CM2p5-FLOR-A06 and GFDL-CM2p5-FLOR-B01 present skill for FST = 1st July and 1st June, GFDL-CM2p1 for FSTs from 1st May to 1st March, GFDL-CM2p1-aer04 for FST = 1st June, 1st April and 1st February, NCEP-CFSv2 for FST = 1st June, 1st April and 1st March, and COLA-RSMAS-CCSM4 for FST = 1st Feb. Note that even in these cases, ACC skill scores are low. Comparing the multi-model mean with the individual models, the former presents larger precipitation ACC values than most of the models for all the FSTs, suggesting that pooling models leads to a forecast skill greater than that of the majority of single-model systems. Nonetheless, the multi-model mean only presents statistically significant ACC values for FST = 1st June and 1st Feb and is outperformed by some single models: GFDL-CM2p5-FLOR-A06 and GFDL-CM2p5-FLOR-B01 for FST = 1st June; and GFDL-CM2p1-aer04 and COLA-RSMAS-CCSM4, the latter of which only shows statistically significant ACC scores for FST = 1st Feb. As will be shown later, the emergence of skill at long lead times for this last model is not related to an enhancement of skill in predicting the SST indices.

Fig. 3

Precipitation prediction ACC skill scores for the different models and FST. Colors represent the ACC values and the boxes marked with “x” present the ACC values statistically significant at 95% confidence level from two tailed t-test with an effective number of degrees of freedom. Note that FST = 1st July corresponds to lead time 0. ACC values are computed considering precipitation index from GPCPv2.3. Results are similar considering precipitation from CRU TS 4.03 (not shown)

We also computed the RMSE of precipitation (Fig. 4a). Results show that models present an RMSE smaller than 3 mm/day, with the best representation provided by GFDL-CM2p5-FLOR-A06 and NCEP-CFSv2 for FST = 1st July, GFDL-CM2p1 and NCEP-CFSv2 for FST = 1st June, COLA-RSMAS-CCSM3 and the multi-model mean for FST = 1st May, COLA-RSMAS-CCSM3 for FST 1st April to 1st May, and GFDL-CM2p1 for FST = 1st Feb. On the other hand, the model with the highest RMSE is NASA-GEOSS2S for all the FSTs. These RMSE values do not show any statistically significant correlation with the standard deviation of the ensemble mean precipitation index in models (not shown). Regarding the MSESS, Fig. 4b shows that most models present negative MSESS, which suggests that the models' forecast skill in terms of MSE is, in general, worse than that of the climatological forecast. The model with the worst MSESS values is NASA-GEOSS2S, in agreement with the RMSE results. Nonetheless, some models present positive MSESS for specific FSTs, such as GFDL-CM2p5-FLOR-A06 for FST = 1st July, CMC2-CanCM4 for FST = 1st Feb, NCAR-CESM1 for FST = 1st Feb and NCEP-CFSv2 for FST = 1st June, suggesting that the precipitation forecast given by these models at those FSTs is better, in terms of MSE, than the climatological forecast. Finally, the multi-model mean of the precipitation index is among the best performers in terms of RMSE and MSESS in comparison with the rest of the models (see Fig. 4a, b), although its MSESS remains negative.

Fig. 4

a Root mean squared error (RMSE) of precipitation in models (units: mm/day). b Mean squared error skill score for precipitation in models. RMSE and MSESS were computed for the JAS season

On the basis of these results, we pose two questions: Does the lack of precipitation skill in NMME arise from a wrong prediction of the SST over the eMED and the equatorial Pacific? Or does it arise from an incorrect prediction of the SST-Sahel rainfall teleconnection? In the next section we try to answer both questions and to quantify the relative importance of both factors for the precipitation prediction skill over the Sahel.

3.3 Prediction skill for SSTs

3.3.1 Eastern Mediterranean (eMED)

Figure 5 shows the ACC values for eMED. Most models present statistically significant ACC values for FST = 1st July. However, only GFDL-CM2p5-FLOR-A06 and NCAR-CESM1 have statistically significant ACC values for earlier FSTs (FST = 1st June). Results are similar for the multi-model mean, which loses its skill for FSTs before 1st July. This result is consistent with the absence, to our knowledge, of a priori indications in the scientific literature suggesting predictability of eastern Mediterranean SSTs.

Fig. 5

eMED prediction skill in terms of ACC for the different models and FST. Colors represent the ACC values and the boxes marked with “x” present the ACC values statistically significant at the 95% confidence level from a two-tailed t-test with an effective number of degrees of freedom. Note that FST = 1st July corresponds to lead time 0. ACC values are computed considering SST from HadISSTv1.1. Results are similar considering SST from ERSSTv5 (not shown)

Figure 6a shows that most of the models present RMSE smaller than 12 °C, with the best representation provided by GFDL-CM2p1-aer04. The model with the highest RMSE is IRI-ECHAM4p5-DirectCoupled, followed by IRI-ECHAM4p5-AnomalyCoupled, COLA-RSMAS-CCSM3 and COLA-RSMAS-CCSM4. The large RMSE values found for both IRI-ECHAM models are related to the large systematic cold bias of 9–12 °C that these models present over the eMED region. A similar problem is also found for the COLA-RSMAS models, with a cold bias of around 5–6 °C (not shown). Regarding MSESS, Fig. 6b shows that all models present negative MSESS, suggesting that the eMED model forecast in terms of MSE is worse than the climatological forecast. In this case, the multi-model mean does not present one of the best MSESS values. This is related to the existence of some models (IRI-ECHAM4p5-DirectCoupled, IRI-ECHAM4p5-AnomalyCoupled, COLA-RSMAS-CCSM3 and COLA-RSMAS-CCSM4) with very high RMSE values in comparison with the rest of the models (see Fig. 6b).

Fig. 6

a Root mean squared error (RMSE) of eMED in models (units: ºC). b Mean squared error skill score (MSESS) for eMED in models. RMSE and MSESS were computed for the JAS season
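The two scores shown in Fig. 6 can be illustrated with a minimal sketch. It assumes the standard definition MSESS = 1 − MSE_forecast / MSE_climatology, with the observed mean as the climatological reference forecast; the exact formulation used in this study is given in the methods and may differ in detail. The function names and the SST values below are hypothetical and serve only to show how a large systematic cold bias drives RMSE up and MSESS strongly negative even when the year-to-year variability is well captured.

```python
import math


def rmse(forecast, observed):
    """Root mean squared error between two equal-length series."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)


def msess(forecast, observed):
    """Mean squared error skill score: 1 - MSE_forecast / MSE_climatology.

    Assumption: the reference forecast is the observed climatological mean.
    Negative values mean the forecast is worse than climatology.
    """
    n = len(observed)
    clim = sum(observed) / n
    mse_fcst = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n
    mse_clim = sum((clim - o) ** 2 for o in observed) / n
    return 1.0 - mse_fcst / mse_clim


# Hypothetical JAS-mean SST values (ºC), for illustration only.
obs = [26.1, 26.4, 25.9, 26.8, 26.3, 26.0]
good = [26.0, 26.5, 26.0, 26.7, 26.2, 26.1]   # small random-like errors
biased = [value - 9.0 for value in good]       # same variability, 9 ºC cold bias

print("unbiased: RMSE =", rmse(good, obs), "MSESS =", msess(good, obs))
print("biased:   RMSE =", rmse(biased, obs), "MSESS =", msess(biased, obs))
```

In this toy case both forecasts have the same variability, so a mean-insensitive score such as the ACC would rank them similarly, whereas RMSE and MSESS penalize the biased one heavily.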

The observed positive connection between eMED and Sahel precipitation is reproduced by most models, although with an underestimation of the magnitude in both the correlation value and the multiple linear regression coefficient (Fig. 7, first and second columns respectively). The underestimation of regression values (α from Eq. (1)) suggests a weaker sensitivity of Sahel rainfall to eMED anomalies in models. In addition to this lower sensitivity, models tend to show too low a variance of the SST anomalies over the eMED compared to observations (see Sect. 2 in additional material), which could allow other sources of precipitation variability to become more dominant in models and further explain the underestimation in correlation values. The underestimation of the eMED–precipitation teleconnection is also evident in the case of the multi-model mean (Fig. 7b). Note that the multi-model mean is the average of the 15 models, which show very poor ACC skill scores to predict eMED (Fig. 5). The lack of phasing of the eMED among the models reduces the eMED signal in the multi-model mean and therefore weakens the eMED–precipitation teleconnection signal in this case. All these results suggest that the generalized lack of skill to predict the precipitation anomalies in models could be related not only to the lack of skill (in terms of ACC) to reproduce the SST anomalies in eMED (Fig. 5) but also to the underestimation of the eMED–precipitation teleconnection amplitude (Fig. 7b).

Fig. 7

Histograms of the correlation (first column) and regression (second column) coefficients between the precipitation index over Sahel and eMED. The regression coefficient plotted on the second column is α from Eq. (1). Each row refers to a lead time, from 0 in the first row to 5 in the sixth. 1st July corresponds to lead time 0 and 1st Feb to lead time 5. The vertical black dotted lines, the blue lines and the red lines represent the significance threshold levels, the observed correlation value and the correlation for the multi-model mean, respectively. In the case of the multi-model mean, the value is computed as the correlation between the precipitation index of the multi-model mean and the eMED index of the multi-model mean. The threshold level (vertical black dotted line) was established considering values exceeding the 95% confidence level from a one-tailed t-test. Correlations were computed for each one of the 4 simulations of the 15 NMME models. Observed precipitation is from GPCPv2.3 and observed SST is from HadISSTv1.1.

In order to assess how an improvement in eMED SST skill impacts the skill of its teleconnection with the Sahel, Fig. 8a, b show the scatter plots of precipitation prediction ACC skill scores vs. eMED prediction ACC skill scores considering all the models for the first two FST separately (1st July and 1st June, respectively). Information regarding the rest of the FST can be found in Table 4 (or Sect. 3 in additional material). The correlation between these two variables is statistically significant at the 95% confidence level in a two-tailed t-test only for FST = 1st July and 1st May (see Table 4), suggesting that for these FST an increase in the eMED ACC skill scores is related to an increase in the PCP prediction ACC skill scores in models (Fig. 8a, b and Table 4). Additionally, Table 4 shows that the largest correlation between PCP ACC skill score and eMED ACC skill score is found at FST = 1st July, when a large part of the models present skill (in terms of ACC) to predict eMED (see Fig. 5).

Figure 9a shows the correlation maps between the eMED index and the MSLP and horizontal winds at 850 hPa for FST = 1st July (see Sect. 4 in additional material for the rest of the FST). This map is obtained by averaging the correlation maps of the 4 NMME models for which atmospheric data were available. Figure 9a shows that, in agreement with observations (compare Figs. 9a and 2a), an anomalous warming over the eMED generates an anomalous low-level low pressure over the east of the Sahara desert, which favors the southward advection of moisture toward the Sahel (in agreement with Gómara et al. 2018). Additionally, there is a strengthening of the meridional MSLP gradient across the Sahel that intensifies the southeasterly monsoonal flow. This favors moisture supply and positive precipitation anomalies (in agreement with Rowell 2003; Fontaine et al. 2010; Gómara et al. 2018).

Fig. 8

Scatter plots of PCP skill in terms of ACC vs eMED skill in terms of ACC for FST = 1st July (a) and FST = 1st June (b). Scatter plots of the eMED contribution to PCP skill in terms of ACC vs eMED skill in terms of ACC for FST = 1st July (c) and FST = 1st June (d). The black line represents the linear regression between variables and the correlation values are shown in the title. Precipitation from GPCPv2.3 and SST from HadISSTv1.1. The threshold correlation value is 0.50 considering a two-tailed t-test with a 95% confidence level. Correlations for the rest of the FST appear in Table 4

Table 4 Correlation between PCP skill and eMED skill (second column), eMED skill and the eMED contribution to PCP skill (third column), PCP skill and Niño3-PCP teleconnection skill (fourth column), and between the Niño3 contribution to PCP skill and Niño3-PCP teleconnection skill (fifth column)

Fig. 9

a Correlation map between the wind anomalies at 850 hPa and mean sea level pressure anomalies and the eMED index. b Correlation map between the velocity potential field anomaly (VPOT200/850) and the Niño3 index. The velocity potential field used here is the difference between the velocity potential at 200 hPa and 850 hPa. Each map represents the average of the correlation maps of the 4 NMME models considered: CanCM4i, CMC1-CanCM3, CMC2-CanCM4, GEM-NEMO. The correlation map of each model is computed after concatenating the 4 members of the ensemble. Shaded regions are significant according to the following Monte Carlo test: we generate 4 pairs of surrogate white-noise time series following a Gaussian distribution and with the same length as in models (that is, 116 time values). The series are correlated in pairs and the average correlation is calculated. The process is repeated N = 100,000 times and the threshold for statistical significance is established at the 95th percentile of the probability distribution function obtained. Correlations are computed considering the average of the anomalies in JAS during the period 1982–2010. Figures represent results for FST = 1st July. The rest of the FST can be found in Sects. 4 and 5 of the additional material
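The Monte Carlo test described in the caption of Fig. 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the threshold is taken as the 95th percentile of the averaged correlations (one possible reading of the caption), and the demonstration call uses far fewer iterations than the N = 100,000 quoted in the text so that it runs quickly in pure Python.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def monte_carlo_threshold(n_pairs=4, length=116, n_iter=100_000,
                          percentile=95.0, seed=0):
    """Threshold for the mean of `n_pairs` correlations between white-noise series."""
    rng = random.Random(seed)
    mean_correlations = []
    for _ in range(n_iter):
        total = 0.0
        for _ in range(n_pairs):
            # One surrogate pair of Gaussian white-noise series.
            x = [rng.gauss(0.0, 1.0) for _ in range(length)]
            y = [rng.gauss(0.0, 1.0) for _ in range(length)]
            total += pearson(x, y)
        mean_correlations.append(total / n_pairs)
    mean_correlations.sort()
    # Index of the requested percentile in the sorted empirical distribution.
    idx = math.ceil(percentile / 100.0 * n_iter) - 1
    return mean_correlations[min(max(idx, 0), n_iter - 1)]


# Reduced iteration count for a quick demonstration (assumption, not the paper's N).
threshold = monte_carlo_threshold(n_iter=1000)
print("approximate 95th-percentile threshold:", threshold)
```

Because four independent correlation maps are averaged, the resulting threshold is noticeably lower than the one that would apply to a single correlation of 116 values.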

3.3.2 Equatorial Pacific (Niño3)

In the case of the tropical Pacific, all models present good ability for reproducing the variability of the SST anomalies over the equatorial Pacific at all considered FST, although their ability is gradually reduced as the FST is moved backwards (Fig. 10). The multimodel-mean presents larger ACC scores than most individual models for all the FST (Fig. 10), suggesting that the pooling of models can improve the ability to predict Niño3 variability for most of them. Nonetheless, the multi-model mean is outperformed by some particular models (e.g. NASA-GEOSS2S and GEM-NEMO for FST = 1st July, and COLA-RSMAS-CCSM3 for FST = 1st Feb).

Fig. 10

Niño3 prediction skill in terms of ACC for the different models and FST. Colors represent the ACC values and the boxes marked with “x” indicate ACC values statistically significant at the 95% confidence level from a two-tailed t-test with an effective number of degrees of freedom. Note that FST = 1st July corresponds to lead time 0. ACC values are computed considering SST from HadISSTv1.1. Results are similar considering SST from ERSSTv5 (not shown)

Figure 11a shows that most models present an RMSE smaller than 2 ºC, with the best representation provided by the multimodel-mean for most of the FST. The model with the highest RMSE is GFDL-CM2p1. Regarding MSESS, Fig. 11b shows that models CMC2-CanCM4, CanCM4i, GEM-NEMO and COLA-RSMAS-CCSM4 present positive MSESS for FST from 1st July to 1st May, and models GFDL-CM2p5-FLOR-A06, GFDL-CM2p5-FLOR-B01 and NASA-GEOSS2S do so for FST from 1st July to 1st June, suggesting that for these FST the Niño3 prediction provided by these models is better, in terms of the MSE, than the climatological forecast. On the opposite side, models GFDL-CM2p1, GFDL-CM2p5-FLOR-A06, NCEP-CFSv2 and CMC1-CanCM3 show negative MSESS values for all the FST. In general terms, the earlier the FST, the lower (more negative) the MSESS, and most models show negative MSESS for FST before 1st May, indicating that the Niño3 index forecasted by models is worse than the climatological forecast for those FSTs. The worst Niño3 predictions are given by GFDL-CM2p1 for most of the FST (Fig. 11b), consistent with the RMSE results (see Fig. 11a). Comparing results from the multimodel-mean and the individual models (Fig. 11b), the former presents larger MSESS values than the rest of the models for all the FST, suggesting that the pooling of models can improve the ability to predict Niño3.

Fig. 11

a Root mean squared error (RMSE) of Niño3 in models (units: ºC). b Mean squared error skill score for Niño3 in models. RMSE and MSESS were computed for the JAS season

Figure 12 (first column) shows the histogram of the Niño3—precipitation correlation in NMME models. Most models correctly reproduce the negative sign of the observed Niño3—precipitation teleconnection, although with a generally reduced regression value (Fig. 12, second column), suggesting Sahel precipitation in models tends to be less sensitive to Niño3 anomalies than observed. Correlation values are also underestimated, although less than in the eMED-Sahel precipitation case, which could be explained by a proper representation of the Niño3 index variance (see Sect. 2 in additional material). Conversely, the correlation value is overestimated in the case of the multimodel mean. Note that in this case we are comparing the correlation/regression of individual simulations with the correlation of the mean of all simulations. A reasonable explanation for such overestimation is that the averaging of all the simulations and models filters out the internal atmospheric variability and those signals that the model cannot predict, making Niño3 the main signal explaining Sahel precipitation variability, unlike in observations.
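The averaging argument above can be illustrated with a toy experiment. It is a sketch under simplifying assumptions that are not taken from the study: each hypothetical member is a weak negative response to a common Niño3 index plus independent Gaussian noise standing in for internal atmospheric variability, and the response amplitude (−0.5) and noise level are arbitrary. Only the ensemble size (15 models × 4 members) and the 29-year length follow the text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n_years, n_members = 29, 60          # 1982-2010, 15 models x 4 members

# Hypothetical standardized Niño3 index shared by all members.
nino3 = [rng.gauss(0.0, 1.0) for _ in range(n_years)]

# Each member: forced response (-0.5 * Niño3, arbitrary) plus independent noise.
members = [[-0.5 * s + rng.gauss(0.0, 1.0) for s in nino3]
           for _ in range(n_members)]

# Average of the individual member correlations with Niño3.
member_r = [pearson(m, nino3) for m in members]
mean_member_r = sum(member_r) / n_members

# Correlation of the ensemble-mean precipitation index with Niño3.
ens_mean = [sum(m[t] for m in members) / n_members for t in range(n_years)]
ens_mean_r = pearson(ens_mean, nino3)

print("mean of member correlations:", mean_member_r)
print("correlation of ensemble mean:", ens_mean_r)
```

The ensemble mean retains the common forced signal while the independent noise largely cancels, so its correlation with Niño3 is much stronger than that of a typical member, in line with the explanation given for the multimodel mean.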

Fig. 12

Histograms of the correlation (first column) and regression (second column) coefficients between the precipitation index over Sahel and Niño3. The regression coefficient plotted on the second column is β from Eq. (1). Each row refers to a lead time, from 0 in the first row to 5 in the sixth. 1st July corresponds to lead time 0 and 1st Feb to lead time 5. The vertical black dotted lines, the blue lines and the red lines represent the significance threshold levels, the observed correlation value and the correlation for the multimodel mean, respectively. The threshold level was established considering values exceeding the 95% confidence level from a one-tailed t-test. Correlations were computed for each ensemble member of each NMME model. Precipitation from GPCPv2.3 and SST from HadISSTv1.1

For the case of the Niño3-Sahel teleconnection, it is found that the better the skill in reproducing the Niño3-Sahel precipitation teleconnection, the better the ACC skill score in predicting Sahel precipitation (Fig. 13a, b, Table 4) and the larger the Niño3 contribution to the precipitation prediction ACC skill score in models (Fig. 13c, d, Table 4). This suggests that enhanced precipitation prediction skill can be obtained by improving the simulation of the precipitation–Niño3 teleconnection, in accordance with Giannini et al. (2020). Figure 9b shows the correlation map between the Niño3 index and the anomalous velocity potential difference between the levels 200 hPa and 850 hPa for FST = 1st July (see Sect. 5 in the additional material for the rest of the FST). This map is obtained by averaging the correlation patterns of the 4 NMME models for which atmospheric data were available. In order to get a more robust signal of the teleconnection patterns in models, the correlation map of each model is computed after first concatenating the 4 members of its ensemble. Results show that, in agreement with results from reanalysis (compare Figs. 9b and 2b), an anomalous warming over the equatorial Pacific reduces the vertical ascent motions, leading to a weakening of the monsoon and to negative rainfall anomalies over the Sahel (in agreement with Gómara et al. 2018; Joly and Voldoire 2009). Although this spatial pattern is obtained considering the average of the correlation maps from 4 different models, the atmospheric teleconnection patterns in the individual models are largely similar to the mean (not shown).

Fig. 13

Scatter plots of PCP ACC vs Niño3–PCP teleconnection skill for FST = 1st July (a) and FST = 1st June (b). Scatter plots of the Niño3 contribution to PCP ACC vs Niño3–PCP teleconnection skill for FST = 1st July (c) and FST = 1st June (d). The black line represents the linear regression between variables and the correlation values are shown in the title. Precipitation from GPCPv2.3 and SST from HadISSTv1.1. The threshold correlation value is 0.50 considering a two-tailed t-test with a 95% confidence level. Correlations for the rest of the FST appear in Table 4

In summary, we are now able to answer questions 1 and 2. There is skill for Pacific but not for Mediterranean SSTs. Additionally, the better the skill in simulating the Pacific SST–rainfall teleconnection, the better the skill in predicting Sahelian rainfall. In the case of the eMED, models with better ACC skill scores in simulating Mediterranean SSTs tend to better reproduce the observed Sahel rainfall when there is skill for predicting eMED (FST = 1st July).

3.4 Explained model variance and ACC scores

Variance of the Sahel precipitation index in models is partitioned following the same multiple regression analysis already applied to the observed index (see Eqs. (1) and (2)). Figure 14 shows the variance of the precipitation index partitioned into the components explained by the eMED, Niño3, the cov(eMED,Niño3) and the residue. The latter (residual term) represents the part of the Sahelian precipitation variance which is not explained by the multiple linear regression or, in other words, the part of the variance not explained by the considered oceanic signals (eMED and Niño3). Given that the variance of precipitation in models is much lower than in observations (see Sect. 2 in additional material), precipitation indices in Fig. 14 are standardized for an easy comparison with observations. The standardization of the precipitation index in each model is done by dividing the index by the standard deviation of the index in the model. Figure 14 only shows results for the first two FST (1st July and 1st June), whereas the rest of the FST can be found in the additional material (see Sect. 6). Additionally, Table 5 shows the percentage of total variance of precipitation that is explained by the eMED and Niño3 indices for each model and FST.
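The partition described above can be sketched in a few lines. Eqs. (1) and (2) are not reproduced in this section, so the sketch assumes the usual form PCP = α·eMED + β·Niño3 + ε, which gives var(PCP) = α²·var(eMED) + β²·var(Niño3) + 2αβ·cov(eMED, Niño3) + var(ε). The function names and the synthetic indices are hypothetical; the coefficients 0.6 and −0.4 are arbitrary illustration values, not results from the study.

```python
import math
import random


def cov(x, y):
    """Population covariance of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def standardize(x):
    """Remove the mean and divide by the standard deviation."""
    m, sd = sum(x) / len(x), math.sqrt(cov(x, x))
    return [(v - m) / sd for v in x]


def decompose_variance(pcp, emed, nino3):
    """Fit PCP = alpha*eMED + beta*Nino3 + residual and split var(PCP)."""
    see, snn, sen = cov(emed, emed), cov(nino3, nino3), cov(emed, nino3)
    sep, snp = cov(emed, pcp), cov(nino3, pcp)
    det = see * snn - sen ** 2
    alpha = (sep * snn - sen * snp) / det   # least-squares coefficients
    beta = (see * snp - sen * sep) / det
    mp, me, mn = sum(pcp) / len(pcp), sum(emed) / len(emed), sum(nino3) / len(nino3)
    resid = [(p - mp) - alpha * (e - me) - beta * (n - mn)
             for p, e, n in zip(pcp, emed, nino3)]
    parts = {
        "eMED": alpha ** 2 * see,
        "Nino3": beta ** 2 * snn,
        "cov(eMED,Nino3)": 2.0 * alpha * beta * sen,
        "residual": cov(resid, resid),
    }
    return alpha, beta, parts


# Synthetic indices, for illustration only (29 years x 4 members = 116 values).
rng = random.Random(0)
emed = [rng.gauss(0.0, 1.0) for _ in range(116)]
nino3 = [rng.gauss(0.0, 1.0) for _ in range(116)]
pcp = [0.6 * e - 0.4 * n + rng.gauss(0.0, 0.5) for e, n in zip(emed, nino3)]

alpha, beta, parts = decompose_variance(standardize(pcp), emed, nino3)
for name, value in parts.items():
    print(f"{name}: {100.0 * value:.1f}% of total variance")
```

Because the precipitation index is standardized first, the four terms sum to one and can be read directly as fractions of the total variance, which is the quantity compared between models and observations in Fig. 14.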

Fig. 14

Total precipitation variance decomposition. Given that the variance of precipitation index in observations is larger than in models, precipitation indices are standardized in order to better compare results from observations and models. OBS means observations

Table 5 Percentage of total PCP variance explained by oceans in models

In observations, the precipitation variance explained by the eMED and Niño3 SSTs is approximately 58%. As shown in Table 5 and Fig. 14, this value is lower in models, suggesting that models underestimate the influence of these basins (eMED and Niño3) on rainfall over the Sahel. The larger precipitation variance explained by the residue in models could be related to the underestimation of the sensitivity of Sahel precipitation to the eMED and Niño3 signals (Figs. 7 and 12, second columns), and to the smaller variance of the eMED SST anomalies in models (see Sect. 2 in additional material). Additionally, the residual shows no statistically significant ACC scores when compared with the observed index for all models and FSTs (except for the NCEP-CFSv2 model at FST = 1st May), suggesting that no further sources of predictability aside from the eMED and Niño3 are present in the models (not shown). Note that other factors, such as those related to interbasin interactions, which could introduce counteracting effects, are not considered in this analysis (Polo et al. 2008; Losada et al. 2012; Suarez-Moreno et al. 2018).

Focusing on FST = 1st July, Fig. 14a shows different types of model behavior. Whereas in observations the precipitation variance explained by eMED is much larger than the one explained by Niño3, in models there are cases in which (1) the eMED explains a larger percentage than Niño3 (e.g., CMC2-CanCM4, GEM-NEMO, COLA-RSMAS-CCSM3, NCEP-CFSv2), (2) others in which Niño3 explains a larger percentage of precipitation variability than eMED (e.g., CMC1-CanCM3, CanCM4i, COLA-RSMAS-CCSM4, the 4 GFDL models and NASA-GEOSS2S), and (3) others in which the SSTs (eMED and Niño3) explain a very small percentage (both IRI-ECHAM models).

Finally, in Fig. 15 we decompose the rainfall ACC skill score provided by each model into the contributions coming from each predictor and the residue. To do so, we follow Eq. (5). The eMED/Niño3/residue contribution to the precipitation prediction ACC skill score must be understood as the part of the precipitation ACC skill score which is explained by eMED/Niño3/residue, respectively. In Fig. 15 we only show results for the first two FST (1st July and 1st June). The rest of the FST can be found in the supplementary material (see Sect. 7). Even though the eMED and Niño3 indices do not account, in general, for a great part of the modeled rainfall variability, these two SST indices alone explain a large fraction of the ACC skill scores obtained for Sahel precipitation in Fig. 3 for most models and FST. Of the two, Niño3 is the main contributor to these precipitation ACC skill scores (see Fig. 15a, b and Sect. 7 in the additional material).
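One way to see how such a decomposition can work is sketched below. Eq. (5) is not reproduced in this section, so this is an assumption rather than the authors' formula: if the modeled index is written as the sum of an eMED term, a Niño3 term and a residual, the linearity of the covariance gives corr(PCP_model, PCP_obs) = Σ_k cov(component_k, PCP_obs) / (σ_model · σ_obs), so each component has an additive contribution to the ACC. All names and synthetic series are hypothetical.

```python
import math
import random


def cov(x, y):
    """Population covariance of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def acc(x, y):
    """Anomaly correlation coefficient (Pearson correlation)."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


def acc_contributions(components, obs):
    """Additive contribution of each component of the model index to its ACC.

    `components` maps a name to a series; the model index is their sum.
    """
    n = len(obs)
    model = [sum(series[i] for series in components.values()) for i in range(n)]
    scale = math.sqrt(cov(model, model) * cov(obs, obs))
    contributions = {name: cov(series, obs) / scale
                     for name, series in components.items()}
    return model, contributions


# Synthetic example, for illustration only (coefficients are arbitrary).
rng = random.Random(2)
n_years = 29
emed_obs = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
nino_obs = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
pcp_obs = [0.6 * e - 0.4 * n + rng.gauss(0.0, 0.5)
           for e, n in zip(emed_obs, nino_obs)]

components = {
    # The hypothetical model predicts Niño3 well and eMED poorly.
    "eMED": [0.2 * rng.gauss(0.0, 1.0) for _ in range(n_years)],
    "Nino3": [-0.3 * (n + rng.gauss(0.0, 0.3)) for n in nino_obs],
    "residual": [rng.gauss(0.0, 0.3) for _ in range(n_years)],
}

pcp_model, contributions = acc_contributions(components, pcp_obs)
print("total ACC:", acc(pcp_model, pcp_obs))
for name, value in contributions.items():
    print(f"  {name} contribution: {value:.3f}")
```

By construction the contributions add up to the total ACC, which is what allows the stacked bars of Fig. 15 to be compared with the overall precipitation skill.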

Fig. 15

Contribution of eMED, Niño3 and residue to the JAS precipitation ACC skill score. Boxes marked with dashed black lines represent the value of the correlation between PCPnmme and PCPobs (PCP skill). Boxes in colors represent the part of the PCP ACC skill score which is due to each one of the predictors considered in the multiple linear regression model. Results consider oceanic indices from HadISSTv1.1 and the precipitation index from GPCPv2.3

The contribution of eMED to precipitation skill is weak and its role is mainly restricted to FST from 1st July to 1st May (see Fig. 8a, b and Sect. 3 in supplementary material). For these FST, the greater the eMED ACC skill score in models, the larger the contribution of eMED to precipitation skill (Fig. 8c, d, Table 4). Focusing on FST = 1st July, when most models present skill to predict eMED, the larger the eMED skill, the larger the precipitation prediction ACC skill scores (see Fig. 8a). These results suggest that increased skill in Sahel precipitation can be expected with improved predictions of eMED SSTs and improved simulation of the Niño3-Sahel precipitation teleconnection.

4 Discussion

Results of this study suggest that, although most of the NMME models show poor skill for predicting precipitation over the Sahel, the better they represent SST signals and SST-rainfall teleconnections, the better the skill in predicting rainfall. These results highlight the importance of ocean variability for the predictability of Sahelian rainfall and that the simulation of teleconnections is a key element to consider for a correct forecast.

As previously shown, models in general lack skill to predict rainfall over the Sahel, although some of them present statistically significant ACC skill scores for specific FSTs (see Sect. 3.2). Results are similar for the multi-model mean, which presents statistically significant skill scores only for FST = 1st June and 1st Feb (see Fig. 3). These low skill scores are in striking contrast to the statistically significant ACC skill scores obtained by Giannini et al. (2020), especially for the multi-model ensemble, which presented skill scores for JAS Sahel precipitation well above the significance threshold for FSTs up to 5 months before. Several factors could contribute to these contrasting results, such as the number of models considered (5 vs 15), the zonal domain used to define the Sahel (20ºW–40ºE vs 15ºW–15ºE), the period of study (1982–2016 vs 1982–2010) and the type of observational dataset employed to assess skill scores (CHIRPS (Funk et al. 2015) vs GPCP and CRU). However, we find that the main reason for such differences is related to the methodology used to preprocess the forecasts. While Giannini et al. (2020) did not remove the long-term trend in the data, we remove it because we focus on the interannual precipitation variability over the Sahel. In Fig. 16a, b we show the skill of 5 NMME models when the trend is removed (Fig. 16a) and when the trend is not removed (Fig. 16b). In this figure, as in Giannini et al. (2020), we only consider one model per modelling group. From the comparison of these two panels we can see that when the trend is not removed (as in Giannini et al. 2020) precipitation skill increases for most models and FST. Results in Fig. 16b are more consistent with the ones obtained by Giannini et al. (2020) and suggest that a part of the significant skill scores obtained in that work is related to the long-term trend. Nevertheless, regardless of the actual values of the ACC skill scores, both studies agree on the key role of the ocean and, in particular, ENSO, in the seasonal predictability of Sahel rainfall.
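The effect of the trend on the ACC can be illustrated with a minimal synthetic sketch. It assumes a least-squares linear detrending, which may differ from the preprocessing actually applied in either study, and uses hypothetical forecast and observed series that share a common trend but have fully independent interannual variability; the slope and noise level are arbitrary.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (used here as the ACC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def detrend(series):
    """Remove the least-squares linear trend (and the mean) from a series."""
    n = len(series)
    times = list(range(n))
    mt, ms = sum(times) / n, sum(series) / n
    slope = (sum((t - mt) * (s - ms) for t, s in zip(times, series))
             / sum((t - mt) ** 2 for t in times))
    return [s - (ms + slope * (t - mt)) for t, s in zip(times, series)]


rng = random.Random(0)
n_years = 29                                   # e.g. 1982-2010
trend = [0.3 * t for t in range(n_years)]      # arbitrary common trend

# Forecast and observations share the trend but not the interannual anomalies.
obs = [tr + rng.gauss(0.0, 1.0) for tr in trend]
fcst = [tr + rng.gauss(0.0, 1.0) for tr in trend]

acc_raw = pearson(fcst, obs)
acc_detrended = pearson(detrend(fcst), detrend(obs))
print("ACC with trend retained:", acc_raw)
print("ACC after detrending:   ", acc_detrended)
```

In this toy case the high ACC of the raw series comes entirely from the shared trend, which is the kind of contribution that is excluded when the focus is on interannual variability.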

Fig. 16

a Precipitation prediction skill in NMME models after removing the trend in the data and b without removing the trend. Here, we consider 5 models, one per modeling group. The horizontal black dotted line represents the threshold level considering a two-tailed t-test with a 95% confidence level

There is another factor which merits further discussion: the role of the equatorial Atlantic. As stated in the introduction, studies show that an anomalous warming over the equatorial Atlantic (Atl3) reduces precipitation over the Sahel (see references in the introduction), and vice versa for a cooling. The sign of the correlation between the Atl3 and rainfall coincides with the one from the Niño3–precipitation teleconnection. Rodríguez-Fonseca et al. (2009) showed that, after the 1970s, the Atl3 and Niño3 started to be connected in such a way that anomalous warming over the equatorial Pacific would be concomitant with anomalous cooling over the eastern equatorial Atlantic. Taking into account that our period of study is after the 1970s, warmer Niño3 events could be coexisting with cooler Atl3 events, with their effects on Sahel rainfall counteracting each other (Losada et al. 2012). This could be a reason for the nonsignificant correlation found in Fig. 1 over the equatorial Atlantic.

5 Conclusions

The objective of this study is to analyze the PCP prediction skill over the Sahel in 15 seasonal forecasting models from NMME, understanding where the skill or lack thereof comes from and what models need to improve to get better precipitation predictions.

The forecast skill is analyzed considering the ACC skill score, RMSE and MSESS. Results show that the precipitation ACC skill scores over the Sahel are low and that most models present negative MSESS, indicating that the models' forecast skill in terms of MSE is, in general, worse than that of the climatological forecast. The multi-model mean shows one of the best results in terms of MSESS and ACC, suggesting that, although its precipitation prediction is still worse than the climatological forecast (because of the negative MSESS values observed in Fig. 4b), the pooling of models leads to a forecast skill greater than that of the majority of single-model systems (see Figs. 3 and 4b). On the other hand, most models present an RMSE lower than 3 mm/day. The model which provides the best representation of the precipitation (lowest RMSE) depends on the FST, and NASA-GEOSS2S shows the highest RMSE values for all the FST.

In general, results of this paper highlight the importance of El Niño and the Mediterranean Sea surface temperature in explaining rainfall predictability. Although the precipitation ACC skill scores over the Sahel are low, the better the SST variability and SST–rainfall teleconnections are represented by models, the higher the precipitation ACC skill scores. The starting hypothesis of this study is that in observations roughly half of the total variance of Sahel precipitation at interannual timescales can be explained by its teleconnections with SSTs in the tropical Pacific and Mediterranean regions. Given that SSTs can provide long-term memory to the climate system, these areas could also be sources of predictability for Sahel rainfall. In this work we show that for the 1982–2010 period, the main sources of interannual variability for Sahel rainfall are the SST anomalies over the eMED and Niño3 (see Fig. 1), explaining together up to 58% of the total precipitation variance. However, this percentage is reduced in models (see Fig. 14 and Table 5). Two reasons could be behind this result. The most important is the lack of skill of the NMME models to correctly reproduce the SST anomalies over eMED (see Sect. 8 in additional material), which is the main source of precipitation variability in observations (Fig. 1 or the bar from observations on Fig. 12). In general, models present low eMED ACC skill scores and negative MSESS values, indicating that models do not reproduce the observed variability of the SST over eMED and that the eMED SST forecast is, in terms of MSE, worse than the climatological forecast. Additionally, when most of the models present statistically significant ACC skill scores to predict eMED (FST = 1st July), it is also found that the larger the eMED ACC skill score in NMME models, the larger the precipitation prediction ACC skill score (Fig. 8a). However, although models correctly reproduce the sign of the eMED–precipitation teleconnection (see Fig. 7, first column), they strongly underestimate its amplitude (see Fig. 7, second column). This could be another reason for the lack of precipitation skill and the reduced percentage of total variance of precipitation explained by eMED and Niño3 indices in models. The underestimation of the amplitude of this teleconnection could be related to the lower variance of the SST anomalies over this region in models (see Sect. 2 in additional material), making the teleconnection weaker.

On the other hand, all the models present good skill for predicting the variability of the SST anomalies over Niño3 (Fig. 10) and also for reproducing the sign of the Niño3–PCP teleconnection (Fig. 12, first column), although most of them underestimate the amplitude (Fig. 12, second column). As a result, Niño3 is, unlike in observations, the main contributor to the predictability of precipitation (and precipitation skill) in models (see Figs. 14 and 15), a result that agrees with the recent study by Giannini et al. (2020). Although models present good skill for predicting the variability of the SSTa in the equatorial Pacific (high Niño3 ACC skill scores), most of them only show positive MSESS values for the first two FST, suggesting that the Niño3 SST forecast given by them is, in terms of MSE, better than the climatological forecast only for these first two FST.

The variance of the SSTa over the Niño3 region in models is similar to the one in observations (see Sect. 2 in additional material). Thus, the underestimation of the amplitude of the Niño3–precipitation teleconnection could be mainly related to difficulties in models to correctly reproduce the intensity of the teleconnection mechanism. On the other hand, it is also found that the larger the Niño3–precipitation correlation in models, the larger the Niño3 contribution to the precipitation ACC skill score (see Fig. 13c, d, Table 4) and the larger the precipitation ACC skill score itself (see Fig. 13a, b, Table 4). Additionally, results from Fig. 15 suggest that the choice of predictors is appropriate, given that most of the models’ skill is explained considering eMED and Niño3 as predictors.

Therefore, results from this study suggest that models need to meet two requirements to achieve better precipitation prediction skill: an improved ability to reproduce the variability of the SST anomalies over the eMED region, and a better simulation of the amplitude of the Niño3–precipitation and eMED–precipitation teleconnections.

Finally, it is important to mention that other sources of predictability should be considered in the future, as this study is only valid for the assessment period used. SST–rainfall teleconnections are not stationary in time, and the Mediterranean, Atlantic, Indian and Pacific oceans could exert a different impact on Sahel rainfall depending on the time period considered. Also, interbasin teleconnections could be counteracting or adding their effects on Sahel rainfall depending on the period considered, a feature that should be checked with observations when analyzing the seasonal prediction system under study.