Introduction

The African continent exhibits a greater diversity of weather and climatic conditions, and of morphological and cultural features, than any other continent on Earth. This diversity of weather, climate and ecosystems underlines the need for reliable information on all temporal and spatial scales. According to the sixth Intergovernmental Panel on Climate Change (IPCC) report, hot extremes are increasing in Africa and are projected to continue to do so throughout the 21st century, while the frequency and intensity of heavy precipitation events are also projected to increase everywhere in the continent (IPCC 2021). Food production is strongly affected by these conditions, since many African regions depend, and will continue to depend, on rain-fed agriculture (Hertel et al. 2010; Bacci et al. 2020). Moreover, approximately 31% of the total food-insecure population of the world resides in Africa (CDKN 2019).

To tackle the vulnerability to climate change and the low adaptation capacity in Africa, multi-source input from multiple levels, integrated and translated into tailor-made information for decision-making and governance, is more urgent than ever. The EU-funded Horizon 2020 project AfriCultuReS (AfrCRS from now on), entitled “Enhancing Food Security in African Agricultural Systems with the Support of Remote Sensing”, uses Earth-Observational (EO)-based products, meteorological and climate data to develop an integrated agricultural monitoring and early warning system for Africa (pilot countries: Tunisia, Niger, Ghana, Ethiopia, Kenya, Rwanda, Mozambique, South Africa) in order to support decision-making in the field of food security (Alexandridis et al. 2021). It combines EO-based products, meteorological forecasts and climate data and projections to deliver a service portfolio with seven service categories (Alexandridis et al. 2019; Cherif et al. 2021a, b; Karypidou et al. 2022a, b; Kganyago 2021; Kganyago et al. 2021, 2020). Under this framework and with respect to this study, the “Weather Forecast Service” provides deterministic weather forecasts at continental scale up to the 180th forecast hour (7.5 days) by utilizing a global weather forecasting system.

Over the last decades, the evolution of numerical weather prediction (NWP) models in conjunction with the advent of more powerful computer hardware paved the way for more reliable and accurate medium-range global weather forecasts with increased spatial resolution. Computer technology and more scalable codes now permit computations at petaflop-per-second rates, which allows a finer spatial discretization of the global NWP models. Leading NWP centers such as the European Centre for Medium-Range Weather Forecasts (ECMWF) and the NOAA’s National Centers for Environmental Prediction (NCEP) currently produce medium-range global weather forecasts at native grids of 9 and 13 km, respectively. In addition, the weather forecast skill has been increasing over the past 40 years at a rate of one day per decade (Bauer et al. 2015). For example, today’s 6-day forecast is as accurate as the 5-day forecast a decade ago. This is due to (a) increased vertical and horizontal resolution, (b) improved physics and model dynamics, and (c) continuous improvement of the models’ initial conditions by employing innovative data assimilation techniques (Kalnay 2002; Navon 2009; Courtier et al. 1994), better use of observations (Bauer et al. 2006; Janisková and Lopez 2013; Ma et al. 2015, 2017; Yin et al. 2019) and treatment of their associated errors (Buehner 2005; Caron et al. 2019).

The continuous verification of global NWP systems is important for quantifying progress in reliability and accuracy. It can be implemented both for raw model guidance (Haiden et al. 2021) and post-processed human-generated forecasts (Novak et al. 2014). Since the availability of robust ground-based measurements is crucial for any verification procedure, the majority of verification studies are performed over regions with a vast number of quality-controlled observations (Caron and Steenburgh 2020; Haiden et al. 2021; Gowan et al. 2018; Charles and Colle 2009; Zheng et al. 2012; Durai et al. 2021; Durai and Roy Bhowmik 2014; Sridevi et al. 2018; Yang et al. 2006; Morcrette 2002; Kerns and Chen 2014; Prakash et al. 2016; Dias et al. 2018).

In data-limited Africa, the sparse network of available surface observations introduces additional uncertainty to the verification of model performance, which compounds the observational uncertainty. This may explain why verification studies that focus on regions in Africa are rather limited and address only single events or short periods of time. For example, Kipkogei et al. (2016) explored the forecast skill of precipitation over the Greater Horn of Africa by ingesting a number of global NWP models into a super-ensemble prediction scheme. Moses and Ramotonto (2018) investigated the performance of the NCEP/GFS and ECMWF/IFS NWP models during Tropical Cyclone (TC) DINEO, which affected southern Mozambique and Botswana on 12–17 February 2017. They found that both models underestimated the intensity of the cyclone but were able to predict the locations of the observed maximum rainfall. Fu et al. (2013) assessed the forecast skill of the Madden-Julian Oscillation (MJO) observed during the DYNAMO (Dynamics of the MJO)/CINDY (Cooperative Indian Ocean Experiment on Intraseasonal Variability in 2011) field campaign in three models including the NCEP/GFS suite. They stressed the importance of air-sea coupling in the models, as it greatly affected the predictability of the MJO. Kniffka et al. (2020) verified three global models and one regional model over southern West Africa (SWA) against a large number of observations retrieved during the Dynamics-Aerosol-Chemistry-Cloud Interactions in West Africa (DACCIWA) project (Knippertz et al. 2015). Although the NCEP/GFS was not among the verified models, their results indicated a dry bias in daily rainfall for all the models under examination and a poor representation of the local features that modulate precipitation and cloud conditions in this region. Milton et al. (2017) showed that between June and September 2012, all the 1200 UTC forecasts produced by different NWP operational centers were able to capture the seasonal mean precipitation over the West African Monsoon region but presented a range of behavior regarding the diurnal cycle of convection.

The aim of this study is twofold. The first goal is to assess the overall performance of the NCEP/GFS over African regions with different weather and climatic conditions by utilizing available surface and EO data within the framework of the AfrCRS Weather Forecast Service. Since the deployment of this service, a number of high impact precipitation events have affected some of the areas of interest. Thus, the second goal of this study is to investigate the forecast skill of precipitation with respect to these events.

The remainder of this paper is organized as follows: In section “Data and Methodology”, a brief description of the numerical model (subsection “The Global Forecast System”) and the utilized observational and reanalysis data (subsection “Observational/Reanalysis data”) is given, while subsection “Verification methodologies and metrics” presents the verification methodology. Section “Results and Discussion” presents and discusses the results of the verification procedure. Section “High impact precipitation events” presents the selected high impact precipitation events and evaluates the performance of the NCEP/GFS with respect to these events. Section “Conclusion” summarizes and concludes the findings of this study.

Data and methodology

This study utilized a global weather numerical model, available surface observations, Earth Observational (EO) and reanalysis data. A short description of each component is given next.

The global forecast system

The National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) is a global numerical weather prediction model initially developed in the late 1970s (Sela 1980). From then until mid-2019, the model ran in spectral mode and underwent several major improvements regarding its spatial resolution (e.g., from 375 to 13 km), model initialization cycles, forecast window, dynamics, physics and data assimilation suites (Caplan et al. 1989; Kalnay et al. 1998; Han and Pan 2011; Wang et al. 2013, 2018). In June 2019, the Finite Volume Cubed Sphere (FV3) dynamical core (Lin 2004; Putman and Lin 2007) was deployed in GFS v15.0 (13 km/64 vertical layers) in the NCEP Production Suite (NPS), replacing the Global Spectral Model in GFS. In March 2021, the GFS was upgraded to version 16.0, introducing 127 vertical layers and an extended model top up to the mesopause (approx. 80 km height) among other major upgrades. More details can be found at NCEP/GFS (2022).

Observational/reanalysis data

Surface observations

Available surface observations over the entire African continent (20\(^\circ\)W to 55\(^\circ\)E and 40\(^\circ\)S to 40\(^\circ\)N) were retrieved and post-processed in order to be utilized in the verification procedure. SYNOP reports were obtained from the European Centre for Medium Range Weather Forecasts (ECMWF) at 3-h intervals. Since these reports include a number of variables, only the 2 m air temperature and dew point temperature, the mean sea level pressure and the 10 m wind speed were extracted. METARs were downloaded from the University of Wyoming with up to 30 min temporal resolution. They were utilized after extensive quality control and only in cases where no SYNOP records were available. In total, 202 stations (Fig. 1) were taken into consideration in the verification of the model.

Fig. 1

Locations of the utilized surface observations for the verification of the NCEP/GFS. The colored dots represent the number of the available time records in each surface station between 01 June 2018 and 31 May 2020

Earth-observational data

For the verification of predicted precipitation, the Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) (IMERG; Huffman et al. 2019) was used. The implementation of the IMERG algorithm (V06) provides a global high-resolution satellite-based product, which estimates the precipitation from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation, at 0.1\(^\circ \times\) 0.1\(^\circ\) lon-lat spatial resolution. Here, the GPM Level 3 IMERG “Final” Half Hourly (GPM_3IMERGHH) and Daily (GPM_3IMERGDF) satellite-gauge products were utilized, which combine both forward and backward morphing and include monthly gauge analyses for the estimation of the precipitation over the regions of interest. The IMERGV06 product has been evaluated positively over complex topography (Derin et al. 2019), while an earlier version (IMERGV04) exhibited a better agreement with gauge data in East Africa and humid West Africa than in the southern Sahel (Dezfuli et al. 2017).

ERA5 reanalysis

The ERA5 reanalysis product embodies a detailed record of the global atmosphere, land surface and ocean waves from 1950 onwards (Hersbach et al. 2020). It is produced by ECMWF, and it is based on the Integrated Forecasting System (IFS) Cy41r2 with horizontal resolution of 31 km, 137 vertical layers and hourly output. In addition, it utilizes an uncertainty estimate from an ensemble in order to assess the evolution of the ingested observing systems. Overall, it shows an improved performance against its predecessor, the ERA-Interim reanalysis (Dee et al. 2011). However, there are some known issues such as the large cold bias in the lower stratosphere, the large warm bias near the stratosphere and the overly strong tropical westerly jet. The ERA5 is available from the Copernicus Climate Change Service (C3S; Thépaut et al. 2018) at a horizontal grid-spacing of 0.25\(^\circ \times\) 0.25\(^\circ\) (latitude-longitude).

Verification methodologies and metrics

The predicted atmospheric variables under examination were 2 m air temperature (TEMP), 2 m relative humidity (RH), mean sea level pressure (MSLP), wind speed (WIND) at 10 m above ground level (AGL) and 24-h accumulated precipitation (PREC). Verification was performed over the AfrCRS pilot countries, namely Tunisia (TUN), Niger (NIG), Ghana (GHA), Ethiopia (ETH), Kenya (KEN), Rwanda (RWA), Mozambique (MOZ) and South Africa (RSA), by utilizing a number of continuous and categorical statistical metrics.

From each 1200 UTC NCEP/GFS prognostic model cycle in the period from 01 June 2018 to 31 May 2020, the 12th to 180th forecast hours in 3-h intervals (1st to 7th forecast day) were extracted at the model grid point nearest to each observation for all forecasted variables except PREC. For the predicted precipitation (in 24-h accumulations), a neighborhood-based verification was performed (Clark et al. 2010; Pytharoulis et al. 2016). The IMERGV06 satellite data (GPM_3IMERGDF) were re-gridded to the GFS grid (0.25\(^\circ \times\) 0.25\(^\circ\) lon-lat), masked over each pilot country and compared with the 24-h accumulated precipitation of the model for different thresholds (1.0, 5.0, 10.0 and 50.0 mm) and distances (1, 3, 5, 7, 9 grid-points or 25, 75, 125, 225, 275 km), on each forecast lead day (T+36 h–T+180 h).

The aforementioned distances represent the side length of the square that is centered at each observation point and encompasses each neighborhood region. Thus, the actual size of the specified region was distance \(\times\) distance. In each neighborhood region, at least 50% of the predicted grid-point values had to be equal to or greater than the precipitation threshold for the precipitation event to be considered as forecast. This means that, for each precipitation threshold, a hit was obtained when the observed precipitation and at least 50% of the forecast values surrounding the observation (in the above-mentioned neighborhood region) were equal to or greater than the corresponding precipitation threshold. Although precipitation had been discretized into a multi-categorical variable, it was treated only as a dichotomous variable at each threshold and distance in the verification process.
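The neighborhood rule described above can be illustrated with a minimal Python sketch. The function name `neighborhood_outcome`, its arguments and the clipping of the window at the domain edges are assumptions made for illustration only; they are not taken from the verification software used in this study.

```python
def neighborhood_outcome(forecast, observed, i, j, threshold, width, min_fraction=0.5):
    """Classify grid point (i, j) for one precipitation threshold.

    forecast, observed: 2-D lists of 24-h accumulated precipitation (mm) on the same grid.
    width: side of the square neighborhood in grid points (1, 3, 5, ...).
    min_fraction: share of forecast values that must reach the threshold (50% in the text).
    """
    half = width // 2
    rows, cols = len(forecast), len(forecast[0])
    # Forecast values inside the width x width square centred on (i, j).
    # Assumption: the square is simply clipped at the domain edges.
    window = [
        forecast[r][c]
        for r in range(max(0, i - half), min(rows, i + half + 1))
        for c in range(max(0, j - half), min(cols, j + half + 1))
    ]
    exceeding = sum(1 for value in window if value >= threshold)
    forecast_event = exceeding / len(window) >= min_fraction
    observed_event = observed[i][j] >= threshold

    if forecast_event and observed_event:
        return "hit"
    if forecast_event:
        return "false_alarm"
    if observed_event:
        return "miss"
    return "correct_rejection"


# Hypothetical 3 x 3 example: 5 of the 9 forecast values reach 1 mm.
forecast = [[2.0, 0.0, 3.0], [1.5, 0.2, 0.0], [4.0, 1.0, 0.0]]
observed = [[0.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 0.0]]
print(neighborhood_outcome(forecast, observed, 1, 1, threshold=1.0, width=3))
```

With a width of one grid point the same call reduces to a point-to-point comparison, which is why the smallest distance behaves like a conventional contingency-table verification.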

In addition, as the available SYNOP reports did not contain any relative humidity records, the latter was estimated according to:

$$\begin{aligned} \textrm{RH} = \frac{e}{e_s} \times 100, \end{aligned}$$
(1)

where e is the vapor pressure (hPa), given by the Clausius-Clapeyron equation (Stull 2017),

$$\begin{aligned} e = 6.11 \text {exp} \left[ \frac{L}{R_w} \left( \frac{1}{273.15} - \frac{1}{T_d}\right) \right] , \end{aligned}$$
(2)

where \({T_d}\) is the dewpoint (K) at 2 m above ground level (AGL), L is the latent heat of vaporization (2.453 \(\times\) 10\(^6\) J kg\(^{-1}\)) and \(R_w\) is the gas constant for water vapor (461 J kg\(^{-1}\)K\(^{-1}\)). The saturation vapor pressure \(e_s\) (hPa) is calculated according to Eq. 2, if \({T_d}\) is replaced with the air temperature (K) at 2 m AGL.
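As a worked illustration of Eqs. 1 and 2, the following Python sketch computes the relative humidity from the 2 m air temperature and dewpoint. The constants are those quoted in the text; the function names are hypothetical and chosen only for this example.

```python
import math

L_V = 2.453e6  # latent heat of vaporization (J kg^-1), value quoted in the text
R_W = 461.0    # gas constant R_w (J kg^-1 K^-1), value quoted in the text


def vapor_pressure(temperature_k):
    """Eq. 2: vapor pressure (hPa) for a temperature given in kelvin.

    Passing the dewpoint returns e; passing the air temperature returns e_s.
    """
    return 6.11 * math.exp((L_V / R_W) * (1.0 / 273.15 - 1.0 / temperature_k))


def relative_humidity(air_temperature_k, dewpoint_k):
    """Eq. 1: relative humidity (%) from 2 m air temperature and dewpoint (K)."""
    return 100.0 * vapor_pressure(dewpoint_k) / vapor_pressure(air_temperature_k)


# Hypothetical station report: 30 degC air temperature, 20 degC dewpoint.
print(round(relative_humidity(303.15, 293.15), 1))
```

When the dewpoint equals the air temperature the two exponentials cancel and the function returns 100%, which is a quick sanity check on the implementation.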

All statistical metrics (Tables 1 and 2) were computed with the Model Evaluation Tools (MET, v8.1.2; Newman et al. 2018). Instead of the Mean Error (ME, Table 1), the Multiplicative Bias (MBIAS, Table 1) was calculated for the variables that have zero as a lower bound, namely RH and WIND. The Fractions Skill Score (FSS; Roberts 2008; Roberts and Lean 2008) was employed for the verification of precipitation in section “High impact precipitation events”, as it determines the spatial scales at which the forecasts can be considered skillful. In general, a forecast with perfect skill has an FSS value equal to 1, while an FSS equal to zero means no skill. However, there is a minimum FSS value (hereafter UFSS; Roberts and Lean 2008) above which the model forecast is considered useful (target skill). For spatial scales where FSS < UFSS, the model output might be regarded as stochastic noise.
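To make the FSS and UFSS concrete, a minimal pure-Python sketch is given here. It follows the commonly used formulation of Roberts and Lean (2008), in which the FSS compares neighborhood exceedance fractions and UFSS = 0.5 + f_o/2, with f_o the observed fractional coverage over the domain. The function names and the clipping of neighborhoods at the domain edges are assumptions for illustration; this is not the MET implementation.

```python
def neighborhood_fractions(field, threshold, width):
    """Fraction of grid points >= threshold in a width x width square around each point."""
    rows, cols = len(field), len(field[0])
    half = width // 2
    fractions = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Assumption: neighborhoods are clipped at the domain edges.
            window = [
                field[r][c] >= threshold
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            row.append(sum(window) / len(window))
        fractions.append(row)
    return fractions


def fractions_skill_score(forecast, observed, threshold, width):
    """FSS = 1 - sum((Pf - Po)^2) / sum(Pf^2 + Po^2) over all grid points."""
    pf = neighborhood_fractions(forecast, threshold, width)
    po = neighborhood_fractions(observed, threshold, width)
    pairs = [(f, o) for row_f, row_o in zip(pf, po) for f, o in zip(row_f, row_o)]
    mse = sum((f - o) ** 2 for f, o in pairs)
    reference = sum(f * f + o * o for f, o in pairs)
    # The score is undefined when neither field contains an event.
    return 1.0 - mse / reference if reference > 0 else float("nan")


def useful_fss(observed, threshold):
    """Target skill UFSS = 0.5 + f_o / 2, with f_o the observed event frequency."""
    values = [v for row in observed for v in row]
    base_rate = sum(1 for v in values if v >= threshold) / len(values)
    return 0.5 + base_rate / 2.0
```

Evaluating `fractions_skill_score` for increasing `width` and comparing each value with `useful_fss` gives the smallest spatial scale at which a forecast reaches the target skill.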

Table 1 Definition of statistical metrics used in the verification process
Table 2 Contingency table with the definitions of hit (H), false-alarm (F), misses (M) and correct rejections (CR) at each precipitation threshold
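Since the contents of Tables 1 and 2 are not reproduced here, the sketch below shows the standard textbook definitions of the metrics named in this study: ME, MBIAS and RMSE for continuous variables, and POD, FAR and ETS computed from the hits (H), false alarms (F), misses (M) and correct rejections (CR) of the contingency table. These definitions are assumed to match Table 1; the function names are hypothetical.

```python
import math


def mean_error(forecasts, observations):
    """ME: average of forecast minus observation."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(observations)


def multiplicative_bias(forecasts, observations):
    """MBIAS: mean forecast divided by mean observation (equal sample sizes assumed)."""
    return sum(forecasts) / sum(observations)


def root_mean_square_error(forecasts, observations):
    """RMSE: square root of the mean squared forecast error."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


def categorical_scores(hits, false_alarms, misses, correct_rejections):
    """Return (POD, FAR, ETS) from the four contingency-table counts."""
    total = hits + false_alarms + misses + correct_rejections
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    # Number of hits expected by chance, used to make the threat score equitable.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return pod, far, ets


# Hypothetical counts for one threshold and one forecast lead day.
pod, far, ets = categorical_scores(hits=50, false_alarms=30, misses=20, correct_rejections=100)
print(round(pod, 2), round(far, 2), round(ets, 2))
```

An MBIAS above (below) 1 indicates overestimation (underestimation), which is how the RH and WIND values are interpreted in the results.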

Results and discussion

Model verification

Continuous variables

The verification metrics were aggregated over the eight AfrCRS pilot countries and the entire verification period (01 June 2018 to 31 May 2020). The bootstrapping statistical method was utilized in order to define the 95% confidence intervals of each verification metric. It is also acknowledged that the use of METARs introduces some errors due to the truncation and rounding of the recorded measurements in these reports.
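The idea behind bootstrap confidence intervals can be sketched as follows. The example uses the simple percentile method, which is an assumption made for illustration; the exact bootstrap variant and number of resamples used in this study are not stated above, and the function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_ci(values, statistic, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval of `statistic` over `values`."""
    rng = random.Random(seed)  # fixed seed so that the example is reproducible
    n = len(values)
    # Recompute the statistic on resamples drawn with replacement from the data.
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper


# Hypothetical forecast-minus-observation temperature errors (K).
errors = [-1.0, 0.5, 0.2, -0.3, 0.1, 0.4, -0.6, 0.0]
low, high = bootstrap_ci(errors, mean)
print(low <= mean(errors) <= high)
```

The same function can be applied to any metric by passing a different `statistic`, for instance a function that returns the RMSE of the resampled errors.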

On average, the model slightly underestimated TEMP (ME = \(-0.09 \pm 0.11\) K; Table 3) over the entire forecast window (T+12 h–T+180 h). However, TEMP presented strong diurnal behavior, as it was overestimated the most at 0600 UTC, while its maximum underestimation occurred at 1800 UTC (Fig. 2a). As the forecasts under verification covered regions in different time zones, the aforementioned times in UTC correspond to different local times, ranging from 0 to +3 h with respect to UTC. The average RMSE was found equal to \(2.62 \pm 0.12\) K (Table 3). Moreover, GFS underestimated RH (Fig. 2b) and MSLP (Fig. 2c) throughout the entire forecast window and their RMSEs ranged from 13.58 to 16.94% and 2.46 to 5.46 hPa, respectively. WIND was overestimated at nearly all forecast lead times (MBIAS = \(1.19 \pm 0.5\)) and its RMSE exhibited a diurnal variation lying between 3 and 4.68 ms\(^{-1}\) (Fig. 2d).

Fig. 2

Mean Error (ME, blue solid line) and Root Mean Square Error (RMSE, red dashed line) of 2 m air temperature (a) and Mean Sea Level Pressure (c), Multiplicative bias (MBIAS, left y axis, blue solid line) and RMSE (right y axis, red dashed line) of 2 m rel. humidity (b) and 10 m wind speed (d) as a function of forecast lead time, aggregated over the AfrCRS pilot countries. The bootstrap confidence intervals at the 95% significance are also plotted. The 1200 UTC forecast cycle is used

In addition, the performance of the GFS in each pilot country was examined for each variable. The model predicted higher temperatures (TEMP) than the observed ones in TUN (\(0.51 \pm 0.07\) K), NIG (\(1.23 \pm 0.08\) K) and RSA (\(0.19 \pm 0.12\) K), while it underestimated them in ETH (\(-0.33 \pm 0.11\) K), GHA (\(-0.46 \pm 0.06\) K), MOZ (\(-0.24 \pm 0.07\) K), KEN (\(-0.93 \pm 0.09\) K), and RWA (\(-0.75 \pm 0.26\) K). The diurnal variation of the ME with forecast lead time is evident, most notably in the shift from underestimation to overestimation, or vice versa, between the cold and warm hours of the day (Fig. 3a). The regional to local climate conditions also affected the performance of the model. For example, Zheng et al. (2012) found that GFS presented a large cold bias in land surface temperatures over arid areas during daytime, which also affected the temperature at 2 m AGL. Here, the model presented a cold bias during the day in TUN (arid regions in the southern parts), ETH (arid regions mostly in the east), MOZ (semi-arid to dry sub-humid conditions) and RSA (arid regions in the west) and a warm bias during daytime in RWA and GHA. In KEN, the modeled air temperature at 2 m AGL was exclusively cold biased. Although hyper-arid or arid conditions prevail in NIG, the model presented a warm bias in 2 m air temperature over the verification period. This can be attributed to the distribution of the available observations in NIG, which are located mostly in the southern parts of the country (Fig. 1). Moreover, the RMSE values ranged from 1.6 K (KEN, T+18 h) to 3.97 K (RSA, T+168 h) (Fig. 3b).

Fig. 3

Mean Error of TEMP, MSLP/Multiplicative Bias of RH, WIND (left column) and Root Mean Squared Error (right column) as a function of forecast lead time (3-h intervals) for 2 m air temperature (TEMP; (a) and (b)), 2 m relative humidity (RH; (c) and (d)), Mean Sea Level Pressure (MSLP; (e) and (f)) and 10 m wind speed (WIND; (g) and (h)) against available SYNOP and METAR reports, for the period 01 June 2018–31 May 2020. The 1200 UTC forecast cycle is used

Regarding the verification of RH, on average, GFS underestimated it in all AfrCRS pilot countries (TUN, \(0.9 \pm 0.01\); NIG, \(0.69 \pm 0.01\); GHA, \(0.94 \pm 0.0\); ETH, \(0.88 \pm 0.01\); KEN, \(0.97 \pm 0.01\); RWA, \(0.93 \pm 0.03\); MOZ, \(0.95 \pm 0.0\); RSA, \(0.95 \pm 0.01\)). However, RH was overestimated between afternoon and late evening hours in RWA throughout the forecast lead time (Fig. 3c), while in KEN this overestimation appeared only at 1500 UTC over the entire forecast window. Moreover, the RMSE values ranged from 7.58% (KEN, T+12 h) to 27.56% (RWA, T+141 h) (Fig. 3d). Since the latent heat term (L) in Eq. 2 varies slightly with temperature, we expect these differences to introduce very small changes to the calculated RH values (often smaller than the uncertainties in measurements of temperature or humidity). However, we additionally investigated the errors of the dew point temperature at 2 m. The verification of this parameter suggested that the above-mentioned errors do not contribute significantly to the RH errors. The statistical verification of the dew point temperature at 2 m can be found in the supplementary material (Tables S2, S3; Figs. S1, S2).

Overall, the model underestimated MSLP in all countries under examination except TUN (\(0.11 \pm 0.6\) hPa). MSLP was underestimated the most in ETH (\(-6.14 \pm 0.46\) hPa) and RWA (\(-6.11 \pm 0.11\) hPa) and the least in MOZ (\(-0.19 \pm 0.09\) hPa). This suggests that the model tends to predict more intense low pressure systems and weaker high pressure systems over ETH and RWA. These large deviations are probably linked with the complex terrain over these regions, where the relatively coarse resolution of the model (0.25 degree horizontal resolution) cannot capture the local variations of the topography. Last but not least, approximately 50% of the RMSE values of MSLP (Fig. 3f) lay below 1.5 hPa, while 95% of them were found below 7 hPa. Over the entire forecast window, the lowest RMSE was equal to 1.02 hPa (RSA, T+21 h) and the highest to 12 hPa (ETH, T+168 h) (Fig. 3f).

Regarding the WIND variable, GFS predicted lower wind speeds than the observed ones in RWA (MBIAS: \(0.48 \pm 0.04\)) and RSA (MBIAS: \(0.56 \pm 0.03\)) and higher ones in the rest of the countries. In RWA and RSA, the modeled WIND values were almost half of the observed ones, while in GHA and NIG the model simulated speeds approximately twice as high as the observations (MBIAS equal to \(1.76 \pm 0.05\) and \(2.17 \pm 0.06\), respectively). Noteworthy is the RMSE of WIND in RSA, which presented values greater than 6.5 ms\(^{-1}\) through the entire forecast window (Fig. 3h), with an overall maximum of 13.08 ms\(^{-1}\) (T+165 h). These large deviations can be attributed to either poor representation of atmospheric boundary layer processes in the region by the model (Nagarajan et al. 2015) or poor quality of the wind observations.

Table 3 The Mean Error (ME), Multiplicative Bias (MBIAS) and Root Mean Square Error (RMSE) of the GFS simulated 2 m temperature (TEMP), 2 m relative humidity (RH), mean sea level pressure (MSLP) and 10 m wind speed (WIND) temporally averaged over the entire forecast window (T+12 h–T+180 h)

Precipitation

In total, 719 days of available IMERG precipitation observations (in 24-h accumulations, 00:00:00 UTC to 23:59:59 UTC) were retrieved and re-gridded to the GFS grid (mass-conserving budget interpolation). According to the IMERG data (Table S1), GHA and RWA received on average 3.4 mm day\(^{-1}\) over the verification period, whereas the mean daily precipitation exhibited the highest variability in ETH (stddev 1.3 mm and IOD = 0.7, Table S1). On the other hand, the maximum 24-h accumulated precipitation was estimated at 258.1 mm in MOZ (26th of April 2019, during Tropical Cyclone KENNETH), while during TC IDAI (04–20 March 2019, subsection “Intense Tropical Cyclone (TC) IDAI”) the daily maxima (over land) were above 100 mm between 05–09 March 2019 and 15–18 March 2019.

A Taylor diagram (Taylor 2001), such as in Fig. 4, provides a summary of the degree of correspondence between model and observed values by utilizing the correlation coefficient (azimuthal position, intervals between solid lines), the root-mean-square (RMS) error (centered distances from REF, dotted semi-circles) and the standard deviation (radial distances from REF quarter-circle) as performance metrics. Here the RMS error and standard deviation metrics are normalized in order to avoid widely varying numerical values. Moreover, the relative bias metric is added (triangles) in order to present the direction of the errors. According to Fig. 4, different variability (measured as the radial distance from the REF dashed quarter circle) between simulated and observed precipitation existed in each forecast lead day (numbers) and country (colors) under examination. Specifically, the model presented lower variability in mean daily precipitation than the observations at all forecast lead days in TUN and higher variability in the rest of the pilot countries (except for NIG at forecast lead day 1). The highest model variability was obtained for GHA (black coloring), in which the highest RMS errors are also presented. This region in West Africa is prone to low predictive skill since most of the rainfall is produced by organized mesoscale convective systems (MCSs) and the West African monsoon system (Fink et al. 2006; Maranan et al. 2018; Mathon et al. 2002). It is also worth mentioning that the RMS error in GHA decreased with increasing forecast lead time (Fig. 4, black triangles and circles), since the lowest (highest) RMS errors appeared for lead day 7 (1). Similar to GHA, the model presented high RMS errors and almost double the standard deviation of the observations in RWA (light green colors). In addition, the model exhibited consistent RMS errors throughout the forecast lead time in TUN and MOZ. However, in TUN the model was drier (negative relative bias) than the observations, while in MOZ wetter conditions were predicted. Last but not least, in TUN, NIG, ETH, KEN and RSA, the mean daily predicted precipitation was highly correlated (\(> 0.9\)) with observations in each forecast lead day. GHA and RWA present moderate and high correlations (\(>0.6\) and \(>0.7\), respectively), while in MOZ the mean daily simulated precipitation was strongly (0.8–0.9) correlated with the observed values.
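The quantities summarized in a normalized Taylor diagram can be computed as in the following Python sketch. The relative bias follows the definition given in the caption of Fig. 4, while the normalization by the standard deviation of the observations and the function name `taylor_statistics` are assumptions made for illustration.

```python
import math


def taylor_statistics(forecast, observed):
    """Return normalized standard deviation, correlation, normalized centered
    RMS error and relative bias (%) for two equally long series."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    covariance = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed)) / n
    correlation = covariance / (std_f * std_o)
    # Centered RMS error: the means are removed before differencing.
    centered_rms = math.sqrt(
        sum(((f - mean_f) - (o - mean_o)) ** 2 for f, o in zip(forecast, observed)) / n
    )
    return {
        "normalized_std": std_f / std_o,
        "correlation": correlation,
        "normalized_centered_rms": centered_rms / std_o,
        "relative_bias_percent": 100.0 * (mean_f - mean_o) / mean_o,
    }


# Hypothetical mean daily precipitation series (mm): the forecast doubles every observation.
stats = taylor_statistics([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(stats)
```

The three plotted quantities are linked by the law of cosines, E'^2 = s_f^2 + s_o^2 - 2 s_f s_o R, which is what allows them to be shown on a single two-dimensional diagram.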

Fig. 4

Normalized Taylor diagram of performance statistics comparing the mean daily forecasted precipitation and observed (IMERGV06), for each forecast lead day in each AfrCRS pilot country, for the period 01 June 2018–31 May 2020. The lower absolute bias among AfrCRS pilot countries for each forecast day is represented by a filled background. The bias metric here is defined as 100*[(mean_forecast \(-\) mean_observed)/mean_observed]. x-axis refers to the standardized deviations of the observations

The results of the neighborhood-based verification for the 24-h accumulated precipitation at thresholds of 1, 5, 10 and 50 mm and for distances of 25, 75 and 125 km (1, 3 and 5 grid-points) are presented in Fig. 5. The model correctly predicted approximately 69% of the observed precipitation events that were equal to or greater than 1 mm d\(^{-1}\) at forecast lead day 1 and at 25 km distance (POD values, black filled bars, Fig. 5a), while at 75 km (125 km) distance the POD was equal to 0.72 (0.74) (Fig. 5b,c). In addition, at 25 km distance the POD ranged from 0.69 to 0.59 through the forecast lead days (Fig. 5a). The maximum POD was found equal to 0.76 at 275 km distance at forecast lead day 1 (not shown). Regarding the False Alarm Ratio (FAR), at the 1 mm threshold and at 25 km distance (Fig. 5a, white filled bars), roughly 42% of the forecasted events at lead day 1 were not observed by the IMERG product, whereas approximately 52% of the forecasted events did not occur at day 7. Considering that the FAR values decreased slightly at greater distances in each corresponding forecast lead day (Fig. 5b,c) (e.g., the FAR dropped to 0.36 at 275 km distance; not shown), these results indicate that the model was able to adequately capture precipitation events (equal to or greater than 1 mm d\(^{-1}\)) at the daily temporal scale but missed their spatial distribution (location).

At a distance of 25 km and a threshold of 1 mm d\(^{-1}\), the ETS ranged from 0.32 to 0.22 through the forecast lead days (dark green bars, Fig. 5a), while at the threshold of 10 mm d\(^{-1}\) the corresponding values lay between 0.18 and 0.1 (Fig. 5g). At the 50 mm d\(^{-1}\) threshold, the ETS values dropped below 0.1 at all distances under examination (Fig. 5j,k,l). Moreover, at forecast lead day 1, at a distance of 125 km and a threshold of 1 mm d\(^{-1}\) (5 mm d\(^{-1}\)), the ETS was found equal to 0.37 (0.27) (Fig. 5c,f). The maximum values of ETS for each forecast lead day were obtained at 275 km distance and the overall maximum ETS was equal to 0.4 at the threshold of 1 mm d\(^{-1}\) and at forecast day 1 (not shown).

Fig. 5

Probability of Detection (POD, black filled bars), False Alarm Ratio (FAR; white filled bars) and Equitable Threat Score (ETS, dark green bars) of the GFS 24-h accumulated precipitation, at 1 mm (first row), 5 mm (second row), 10 mm (third row) and 50 mm (fourth row) thresholds, for neighborhood distances of 25 km (left column), 75 km (center column) and 125 km (right column), across forecast lead day 1 to 7. The confidence intervals at the 95% significance are also plotted for POD and FAR bars. The 1200 UTC forecast cycle is used

High impact precipitation events

In this section we briefly present the prevailing synoptic conditions (Section “Synoptic Overview”) during three high impact precipitation events that affected Mozambique (MOZ), Tunisia (TUN) and Ghana (GHA), respectively. As these events occurred during the verification period of the model, we examine its performance with respect to its predicted precipitation (Section “Model performance on selected case studies”) in a number of forecast cycles prior to each event.

Synoptic overview

Intense Tropical Cyclone (TC) IDAI

IDAI was a lethal Tropical Cyclone (TC) which greatly affected Mozambique, Zimbabwe and Malawi from 04 to 20 March 2019 (Fig. 6). It originated as a tropical depression that formed near the coast of Mozambique (over the sea) and produced heavy precipitation and flooding across Mozambique and Malawi. From 07 March 2019 onwards, the system started to propagate eastwards into the Mozambique Channel due to the flow in the upper troposphere. Han et al. (2019) showed that the Sea Surface Temperatures (SSTs) inside the Mozambique Channel are warmer than those east of Madagascar and that the highest SSTs in the channel appear in March. This contributed to the further deepening of the cyclone upon its arrival in the channel on 09 March 2019. On that day at 1800 UTC, the tropical depression intensified into a tropical storm, since the low wind shear between 200 and 850 hPa (Fig. 7a) in conjunction with the very warm SSTs in the Mozambique Channel (\(>29\) \(^{\circ }\)C, Fig. 7b) provided favorable conditions for the storm. This rapid intensification can be seen in Fig. 7c, where the upper-air divergence and low-level convergence reach their maximum values on 09 March 2019. On 10th of March, it was classified as a TC (with wind intensity equivalent to a category 1 hurricane on the Saffir-Simpson scale) between Madagascar and Mozambique, according to the Météo-France’s La Réunion tropical cyclone center (MFR, RSMC La Réunion). On 14th of March at 0000 UTC, TC IDAI’s center was located about 315 km east of Beira City in Sofala Province (central Mozambique), with maximum sustained winds of 194 km h\(^{-1}\) (53.9 m s\(^{-1}\)), gusts up to 280 km h\(^{-1}\) (77.8 m s\(^{-1}\)) and a minimum sea level pressure of approximately 940 hPa (equivalent to a Category 3 hurricane, Global Disaster Alert and Coordination System; www.gdacs.org, last accessed: 30/09/2022). Therefore, it reached the intensity of an Intense Tropical Cyclone. TC IDAI made landfall close to Beira City on 14 March at 2330 UTC (Fig. 6), where heavy rainfall (up to 150 mm, Fig. 6), strong winds and storm surge (maximum model estimation at 4.4 m; JRC 2019) affected the coast of the Sofala region. According to the Emergency Response Coordination Centre (ERCC), TC IDAI affected approximately 3 million people and caused over 600 fatalities.

Fig. 6

Accumulated precipitation based on the IMERGV06 product (mm, shaded contours) from 0000 UTC 04 March 2019 to 0000 UTC 17 March 2019. The “best” track of the tropical depression (yellow solid line), the tropical storm (orange solid line) and the tropical cyclone (including the intense phase, red solid line) is illustrated from 0000 UTC 04 March to 1800 UTC 16 March 2019 for TC IDAI, according to the Météo-France’s La Réunion tropical cyclone center. Black dots refer to the location of the lowest mean sea-level pressure at 0000 UTC, while the white ones at 0600 UTC, 1200 UTC and 1800 UTC, respectively

Fig. 7

a Average wind shear 200–850 hPa (m s\(^{-1}\), filled contours) from 07 March 2019 at 0000 UTC to 11 March 2019 at 0000 UTC (the embedded line plot shows the spatial average wind shear over the area between 18.5\(^{\circ }\)S\(-\)13.5\(^{\circ }\)S and 37\(^{\circ }\)E\(-\)44\(^{\circ }\)E (green box)), b Sea Surface Temperature (\(^{\circ }\)C, filled contours) and mean sea level pressure (hPa, blue dashed contours) on 09 March 2019 at 1800 UTC and c Hovmöller diagram of the divergence (\(\times\) 10\(^{-5}\) s\(^{-1}\), filled contours) and omega (hPa h\(^{-1}\), contours) averaged over the area between 18.5\(^{\circ }\)S\(-\)13.5\(^{\circ }\)S and 37\(^{\circ }\)E\(-\)44\(^{\circ }\)E (green box in (a) and (b) panels) according to the ERA5 reanalyses

Flash floods in Tunisia

Between 17 and 18 October 2018, torrential rain caused flash flooding in northern Tunisia. The governorates of Tunis, Sidi Bouzid, Kef, Kasserine, Siliana, Nabeul and Zaghouan were greatly affected: the road network was cut off, schools were closed, and the storm caused 5 fatalities.

The prevailing atmospheric conditions favored the development of severe thunderstorms over the wider area of Algeria and Tunisia: according to the ERA5 reanalyses, a positively tilted, closed long-wave trough (Karacostas 2003; Karacostas et al. 2018), originally located north of the Atlas Mountains (on 16 October 2018 at 1200 UTC), moved south-eastwards. On 18 October 2018 at 0000 UTC (Fig. 8a), the upper-air trough intensified further, while the associated surface low-pressure system was located west of Tunisia. This tilted vertical structure, along with the now negatively tilted cut-off low (Karacostas 2003; Karacostas et al. 2018), suggested that the system was reaching maturity, whereas kinetic energy was transforming into dynamic energy, slowing down its velocity. High values of relative vorticity (Fig. 8a) in conjunction with positive vorticity advection (not shown) over the area of interest increased the thermodynamic instability (convective available potential energy (CAPE) \(> 2000\) J kg\(^{-1}\), convective inhibition energy (CIN) \(> 100\) J kg\(^{-1}\), K-Index \(> 30\) based on ERA5 reanalyses) in the following hours, and the greatest amounts of precipitation occurred between 1800 UTC 17 October and 0600 UTC 18 October (according to SYNOP reports).

The highest precipitation totals over the accumulation period were recorded at Tunis (140.4 mm, WMO ID 60715) and Kelibia (128.3 mm, WMO ID 60720) (Fig. 8b), while between 1800 UTC 17 October and 0600 UTC 18 October 2018 these two stations recorded 108 mm and 82 mm, respectively.

Fig. 8

a 500 hPa geopotential height (dam, black contours) and relative vorticity (\(\times\) 10\(^{-5}\) s\(^{-1}\), filled contours), and mean sea level pressure (hPa, blue dashed contours) on 18 October 2018 at 0000 UTC (X symbol denotes the location of the minimum pressure at surface) according to the ERA5 reanalyses, b accumulated precipitation in the wider area of Tunisia (black dotted box in (a) panel) according to IMERGV06 product (shadings) and SYNOP reports (red dots and black values) from 0600 UTC 16 October to 1200 UTC 18 October 2018 (World Meteorological Organization identification numbers with asterisks indicate missing data over the accumulation period)

Exceptionally heavy rain in Ghana

Favorable conditions for heavy precipitation existed in Ghana between 17 and 21 October 2019. Although Central Ghana received higher amounts of rainfall in this period (Fig. 9a), flash floods in the north-east regions of Ghana caused a number of fatalities, with 29 people reported dead by the local media. Specifically, the regions that were greatly affected were Bolgatanga, Kassena Nankana West, Bawku Municipal, Builsa North, Builsa South, Binduri, Talensi, Garu-Tempane and Bongo (administrative boundaries shown as red solid polylines, Fig. 9a), while a spillage from the Tono Irrigation Dam (yellow asterisk, Fig. 9a) flooded hundreds of hectares of farmland in Kassena-Nankana Municipal (south of Tono Dam).

The time evolution of the area-averaged values of the divergence (\(\times\) 10\(^{-5}\) s\(^{-1}\)) and omega (hPa h\(^{-1}\)) in the area of interest (6\(^\circ\)N–10\(^\circ\)N and 2.75\(^\circ\)W–0.75\(^\circ\)E, red dashed box in Fig. 9a) according to the ERA5 reanalyses is shown in Fig. 9b. Pairs of low-level convergence and upper-level divergence were evident on 18 October 2019 (on this day flash floods occurred in the upper east regions) and between 19 and 20 October 2019, allowing the lifting of the air masses over the region and contributing to the development of strong vertical motions. Relative vorticity values up to 3 \(\times\) 10\(^{-5}\) s\(^{-1}\) were obtained in the early hours of 20 October 2019 (not shown), when, according to the surface observations, the highest accumulated precipitation amounts were recorded.
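The area averaging behind such a time series can be sketched as follows. This is a minimal illustration and not the processing chain used in the study: the function name `box_mean`, the cosine-of-latitude weighting and the toy grid are assumptions introduced here.

```python
import math


def box_mean(field, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean of a 2-D field (rows = lats, cols = lons) inside a box.

    Assumption: grid cells are weighted by cos(latitude), a common choice on
    regular latitude-longitude grids.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for j, lat in enumerate(lats):
        if not (lat_min <= lat <= lat_max):
            continue
        weight = math.cos(math.radians(lat))
        for i, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                weighted_sum += weight * field[j][i]
                weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no grid points fall inside the requested box")
    return weighted_sum / weight_total


# Toy grid (hypothetical values): 1.0 inside the Ghana box, 100.0 outside it.
lats = [6.0, 8.0, 10.0, 12.0]
lons = [-3.0, -2.0, 0.0, 1.0]
field = [
    [100.0, 1.0, 1.0, 100.0],
    [100.0, 1.0, 1.0, 100.0],
    [100.0, 1.0, 1.0, 100.0],
    [100.0, 100.0, 100.0, 100.0],
]
ghana_box_mean = box_mean(field, lats, lons, 6.0, 10.0, -2.75, 0.75)
```

Applying such a function to each pressure level and time step would give the kind of level-versus-time array that a Hovmöller-type diagram displays.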

Fig. 9

a Accumulated precipitation in the wider area of Ghana according to IMERGV06 product (shadings) from 0600 UTC 17 October to 1800 UTC 21 October 2019 and b Hovmöller diagram of the divergence (\(\times\) 10\(^{-5}\) s\(^{-1}\), filled contours) and omega (hPa h\(^{-1}\), contours) averaged over the area between 6\(^\circ\)N\(-\)10\(^\circ\)N and 2.75\(^\circ\)W\(-\)0.75\(^{\circ }\)E (red dashed box in (a) panel) according to the ERA5 reanalyses

Model performance on selected case studies

In this section we evaluated the NCEP/GFS model for each high-impact precipitation event presented in Section “Synoptic overview”, by calculating the Fractions Skill Score (FSS; Roberts 2008; Roberts and Lean 2008) for the 6-h simulated precipitation at the thresholds of 5 mm and 10 mm and for different distances (Section “Verification methodologies and metrics”). In this neighborhood verification approach, we considered only the previous eight initialization cycles of the model (at 1200 UTC) prior to the occurrence of the highest precipitation intensities in the Tunisia and Ghana events, and the previous seven cycles in the TC-IDAI case. To estimate whether the calculated FSS at each forecast lead time represented a skillful forecast, the uniform FSS (UFSS; Newman et al. 2018; Roberts and Lean 2008) was utilized and compared with the calculated FSS values. If FSS \(\ge\) UFSS, the forecast was considered skillful at the corresponding spatial scale. For TC-IDAI we also evaluated how well the individual prognostic cycles predicted both the minimum mean sea level pressure (MSLP) and the track of the tropical cyclone prior to its landfall.
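The skill test described above can be illustrated with a small, self-contained sketch. It follows the standard FSS definition of Roberts and Lean (2008) and takes UFSS as \(0.5 + f_0/2\), where \(f_0\) is the observed fraction of grid points exceeding the threshold. The function names, the square window expressed as a half-width in grid points (the study uses distances in km), the zero padding at the domain edges and the toy fields are assumptions made for illustration only.

```python
def neighborhood_fractions(field, threshold, half_width):
    """Fraction of points >= threshold in a square window around each grid point.

    Assumption: points outside the domain count as zeros (zero padding).
    """
    ny, nx = len(field), len(field[0])
    binary = [[1.0 if value >= threshold else 0.0 for value in row] for row in field]
    window_size = (2 * half_width + 1) ** 2
    fractions = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            total = 0.0
            for jj in range(max(0, j - half_width), min(ny, j + half_width + 1)):
                for ii in range(max(0, i - half_width), min(nx, i + half_width + 1)):
                    total += binary[jj][ii]
            fractions[j][i] = total / window_size
    return fractions


def fss(forecast, observed, threshold, half_width):
    """Fractions Skill Score: 1 - sum((F - O)^2) / (sum(F^2) + sum(O^2))."""
    f = neighborhood_fractions(forecast, threshold, half_width)
    o = neighborhood_fractions(observed, threshold, half_width)
    squared_diff = sum((a - b) ** 2 for fr, ob in zip(f, o) for a, b in zip(fr, ob))
    reference = sum(a * a for row in f for a in row) + sum(b * b for row in o for b in row)
    return 1.0 - squared_diff / reference if reference > 0 else float("nan")


def uniform_fss(observed, threshold):
    """UFSS = 0.5 + f0 / 2, with f0 the observed exceedance fraction over the domain."""
    n_points = sum(len(row) for row in observed)
    f0 = sum(1 for row in observed for value in row if value >= threshold) / n_points
    return 0.5 + f0 / 2.0


# Toy 5 x 5 fields (hypothetical): one 12 mm cell, displaced by one grid point.
observed = [[0.0] * 5 for _ in range(5)]
forecast = [[0.0] * 5 for _ in range(5)]
observed[2][2] = 12.0
forecast[2][3] = 12.0

fss_grid_scale = fss(forecast, observed, 10.0, 0)   # no neighborhood
fss_3x3 = fss(forecast, observed, 10.0, 1)          # 3 x 3 neighborhood
ufss = uniform_fss(observed, 10.0)
is_skillful_3x3 = fss_3x3 >= ufss
```

In this toy case the displaced cell scores zero at the grid scale but passes the UFSS test once a 3 x 3 neighborhood is used, which is the behavior the neighborhood approach is designed to capture.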

TC IDAI (1200 UTC 08 March–0000 UTC 16 March 2019)

Figure 10a illustrates the tracks of the forecasted tropical depression, the tropical storm and the tropical cyclone together with the “best” track of the observed TC-IDAI from 1200 UTC 08 March to 0000 UTC 16 March 2019. The simulated tracks were derived by finding the location of the minimum MSLP at each available forecast time within the region of interest. The tracks generally follow the path of the actual cyclone in all prognostic cycles under examination, except in the 1200 UTC 08/03/2019 forecast cycle. The 1200 UTC cycles on 12/03, 13/03 and 14/03 predicted most accurately the location of the actual cyclone’s landfall (at 2330 UTC on 14/03). However, the 13/03 forecast delayed the time of its landfall by approximately 3 h (purple dashed line).
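The track extraction described above (locating the minimum MSLP at each forecast time) can be sketched as follows. The function names, the toy 3 x 3 grids and the pressure values are hypothetical; the region-of-interest subsetting is assumed to have been done beforehand.

```python
def locate_minimum_mslp(mslp, lats, lons):
    """Return (lat, lon, value) of the lowest MSLP in a 2-D grid (rows = lats)."""
    best = None
    for j, row in enumerate(mslp):
        for i, value in enumerate(row):
            if best is None or value < best[2]:
                best = (lats[j], lons[i], value)
    return best


def build_track(fields_by_lead_hour, lats, lons):
    """Build a track as a list of (lead_hour, lat, lon, mslp), ordered by lead time."""
    return [
        (lead, *locate_minimum_mslp(field, lats, lons))
        for lead, field in sorted(fields_by_lead_hour.items())
    ]


# Toy MSLP fields in hPa (hypothetical values) on a small grid.
lats = [-20.0, -19.5, -19.0]
lons = [37.5, 38.0, 38.5]
mslp_t0 = [
    [1002.0, 1001.0, 1003.0],
    [1000.0, 985.0, 999.0],
    [1004.0, 1002.0, 1001.0],
]
mslp_t6 = [
    [1001.0, 1000.0, 1002.0],
    [983.0, 990.0, 998.0],
    [1003.0, 1001.0, 1000.0],
]
track = build_track({6: mslp_t6, 0: mslp_t0}, lats, lons)
```

A plain grid minimum can jump between unrelated lows in a large domain, which is why restricting the search to a region of interest, as the text notes, matters in practice.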

According to Fig. 10b, none of the GFS forecasts was able to predict the intensity of TC-IDAI during its tropical cyclone stage (1800 UTC 10 March–0000 UTC 15 March 2019) with respect to its actual minimum MSLP. All the forecasts simulated a shallower cyclone without the observed rapid deepening between 0600 UTC 13 March and 0000 UTC 14 March 2019 (black line), albeit some of them correctly predicted the timing of the minimum MSLP (940 hPa) at 0000 UTC on 14 March 2019 (10/03: 986.3 hPa at T+84 h, 11/03: 984.4 hPa at T+60 h, 13/03: 992.8 hPa at T+12 h). The weakest development of the cyclone was exhibited by the initialization cycle of 1200 UTC 08 March 2019.

Such deviations can be attributed to the initial conditions, the parameterized boundary/surface layer, the cumulus scheme, or the formulation of the surface fluxes by the model. Davis (2018) concluded that numerical weather prediction models with a relatively coarse grid interval underestimate the intensity of category 4 and 5 North Atlantic hurricanes. Finally, the role of the simulated Sea Surface Temperatures (SSTs) in the development of the cyclone was investigated and no significant deviations were found.

The minimum, mean and maximum distance (km) of the 6-h locations of the simulated tracks from the “best” track in each model cycle is displayed in Fig. 10c. These metrics were calculated from the beginning of each cycle (T+0 h) up to 0000 UTC 16 March 2019. The smallest mean track error was found for the 1200 UTC 13/03 initialization cycle (62.4 km) and the largest for the 1200 UTC 08/03 cycle (164.9 km). The absolute minimum track error was encountered in the 1200 UTC 11/03 cycle (9.9 km at T+6 h), while the absolute maximum occurred at T+180 h of the 1200 UTC 08/03 cycle (628.6 km).
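A minimal sketch of how such track-error statistics can be computed is given below. The use of the haversine great-circle distance, the Earth radius of 6371 km, the function names and the toy positions are assumptions for illustration; the text does not state which distance formula was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_error_stats(forecast_track, best_track):
    """Return (min, mean, max) distance in km over the valid times both tracks share.

    Both arguments are dicts mapping a valid-time key to a (lat, lon) tuple.
    """
    common_times = sorted(set(forecast_track) & set(best_track))
    if not common_times:
        raise ValueError("tracks share no valid times")
    distances = [
        great_circle_km(*forecast_track[t], *best_track[t]) for t in common_times
    ]
    return min(distances), sum(distances) / len(distances), max(distances)


# Toy tracks (hypothetical positions): the forecast drifts 1 degree east at the equator.
best_track = {"T+0": (0.0, 0.0), "T+6": (0.0, 0.0)}
forecast_track = {"T+0": (0.0, 0.0), "T+6": (0.0, 1.0)}
min_err, mean_err, max_err = track_error_stats(forecast_track, best_track)
```

Matching positions by valid time, rather than by index, keeps the comparison correct when a forecast cycle starts later than the best track or ends earlier.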

Fig. 10

a The track of the Tropical Cyclone IDAI according to the Météo-France’s La Réunion tropical cyclone center (“best track”, solid black line) and the NCEP/GFS forecasts at different 1200 UTC initialization cycles from 1200 UTC 08 March 2019 to 0000 UTC 16 March 2019 (squares depict the center of the cyclone in GFS forecasts at the beginning of each cycle), b time series of the minimum MSLP (hPa) of the TC-IDAI according to RSMC La Réunion (black dotted line) and in NCEP/GFS forecasts, c the minimum, mean and maximum distance (km) of the 6-h locations of the TC-IDAI in the GFS forecasts from the “best track”, averaged from each 1200 UTC initialization time to 0000 UTC 16 March 2019

Regarding the 6-h predicted accumulated precipitation at the threshold of 10 mm, GFS provided skillful forecasts up to T+96 h at distances larger than or equal to 75 km (Fig. 11a). Specifically, the FSS values averaged from T+6 h to T+96 h at the distances of 75, 125, 175, 225 and 275 km were equal to 0.66, 0.74, 0.79, 0.82 and 0.84, respectively. Beyond T+96 h, the FSS values decrease below the UFSS values at an average rate of \(-\)0.014 per 6 h. Figure 11b displays the FSS values calculated from 1800 UTC 14 March to 1200 UTC 15 March 2019, six and twelve hours before and after the cyclone’s landfall. According to Fig. 11b, higher FSS values are encountered at all distances, and the model provided skillful forecasts at distances of 125 km up to T+156 h.

Fig. 11

Fractions Skill Score of the previous 6-h GFS predicted accumulated precipitation as a function of forecast lead time (hrs) for different Distances (km) against the IMERGV06 data, at the threshold of 10 mm a from 1200 UTC 08 March to 0000 UTC 16 March 2019 and b from 1800 UTC 14 March to 1200 UTC 15 March 2019, in TC-IDAI case. The solid lines in a and b represent the reference FSS (UFSS, see text) above which the forecast is considered as skillful

Tunisia case (0600 UTC 16 October–1200 UTC 18 October 2018)

For this event, the verification was performed from 0600 UTC 16 October to 1200 UTC 18 October 2018 (54 h). At the threshold of 5 mm, the model presented the lowest skill at a distance of 25 km (Fig. 12a, dark gray dots) for the 6-h simulated accumulated precipitation, since the FSS was consistently lower than the UFSS throughout the forecast lead time. As expected, enlarging the neighborhood led to higher FSS values, albeit the model showed no skill at any distance from the 150th forecast hour onwards. However, skillful forecasts were encountered up to T+78 h at scales greater than 125 km (Fig. 12a, blue asterisks).

Fig. 12

Fractions Skill Score of the previous 6-h GFS predicted accumulated precipitation as a function of forecast lead time (hrs) for different Distances (km), against the IMERGV06 data, at the thresholds of a 5 mm from 0600 UTC 16 October to 1200 UTC 18 October 2018 and b 10 mm from 1800 UTC 17 October to 0600 UTC 18 October 2018, in TUN case. The solid lines in a and b represent the reference FSS (UFSS, see text) above which the forecast is considered as skillful, while in b some of the corresponding initialization cycles were added for clarity

The highest precipitation intensities were recorded from 1800 UTC 17 October to 0600 UTC 18 October 2018 (12 h). In this period, precipitation amounts of 108 mm, 82 mm and 68 mm were recorded at Tunis (WMO ID 60715), Kelibia (WMO ID 60720) and Bizerte (WMO ID 60714), respectively. According to Fig. 12b, at the threshold of 10 mm, the model exceeded the UFSS at distances of 75 km (orange squares) at T+12 h and 125 km (blue asterisks) at T+18 h (with respect to the 1200 UTC 17/10/2018 initialization cycle). The 1200 UTC 16/10/2018 cycle provided skillful forecasts at distances greater than 125 km (175 km) at T+36 h (T+42 h), while the FSS values fell below the skill target (UFSS) at all distances for the 1200 UTC 15/10/2018 initialization cycle (FSS \(< 0.51\)). Worth noting is the change in slope of the FSS lines from T+84 h–T+90 h onwards (T+108 h–T+114 h, T+132 h–T+138 h, T+156 h–T+162 h). This is attributed to the fact that the model performed better in forecasting the 6-h accumulated precipitation (in terms of intensity) between 1800 UTC 17/10 and 0000 UTC 18/10/2018 (T+12 h, T+36 h, T+60 h) than between 0000 UTC 18/10 and 0600 UTC 18/10/2018 (T+18 h, T+42 h, T+66 h), albeit it was not skillful at all distances under examination. The opposite behavior was encountered from the 1200 UTC 14/10/2018 cycle (T+84 h–T+90 h) backwards. However, none of the forecasts was able to predict the precipitation area (\(> 10\) mm) in the correct place in either 6-hourly accumulation period (1800 UTC 17/10 to 0000 UTC 18/10/2018, Fig. 13; 0000 UTC 18/10 to 0600 UTC 18/10/2018, Fig. S3).
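The forecast lead times quoted in this paragraph follow from the difference between each 1200 UTC initialization time and the end of the 6-h accumulation window. The short sketch below reproduces that arithmetic; the helper name `lead_hours` and the variable names are introduced here for illustration.

```python
from datetime import datetime, timedelta


def lead_hours(init_time, valid_time):
    """Forecast lead time in whole hours between initialization and valid time."""
    return int((valid_time - init_time).total_seconds() // 3600)


# 1200 UTC cycles from 17/10/2018 back to 14/10/2018.
cycles = [datetime(2018, 10, 17, 12) - timedelta(days=d) for d in range(4)]

# Ends of the two 6-h accumulation windows discussed in the text.
first_window_end = datetime(2018, 10, 18, 0)   # 1800 UTC 17/10 to 0000 UTC 18/10
second_window_end = datetime(2018, 10, 18, 6)  # 0000 UTC 18/10 to 0600 UTC 18/10

first_window_leads = [lead_hours(c, first_window_end) for c in cycles]    # [12, 36, 60, 84]
second_window_leads = [lead_hours(c, second_window_end) for c in cycles]  # [18, 42, 66, 90]
```

Each earlier 1200 UTC cycle adds 24 h of lead time, which is why the paired lead times recur at 24-h intervals (T+84 h–T+90 h, T+108 h–T+114 h, and so on).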

Fig. 13

Six-h accumulated precipitation (mm) at the threshold of 10 mm according to the a IMERGV06 observations (regridded to the GFS grid, 1st order conservative interpolation) and NCEP/GFS model in different 1200 UTC initialization cycles on b 17/10/2018, c 16/10/2018, d 15/10/2018, e 14/10/2018, f 13/10/2018, g 12/10/2018 and h 11/10/2018, from 1800 UTC 17 October to 0000 UTC 18 October 2018

Ghana case (0600 UTC 17 October–1800 UTC 21 October 2019)

In the tropical region of Ghana, where convective precipitation prevails, the model presented low performance in forecasting the 6-hourly precipitation accumulations (\(\ge 5\) mm) during the examined period at all distances (Fig. 14a). Skillful forecasts were provided at distances greater than or equal to 75 km (orange squares) only at T+6 h, while at T+12 h the model was able to predict the observed rainfall accurately only at much greater distances (225 km and 275 km). From T+18 h onwards, the FSS values at all distances lie below the UFSS (black solid line), although this is not entirely true for the distance of 275 km. In addition, the model presented a diurnal variation in FSS, which is probably linked to the West African monsoon circulation (Parker et al. 2005).

Fig. 14

Fractions Skill Score of the previous 6-h GFS predicted accumulated precipitation as a function of forecast lead time (hrs) for different Distances (km), against the IMERGV06 data, at the thresholds of a 5 mm from 0600 UTC 17 October to 1800 UTC 21 October 2019 and b 10 mm from 1800 UTC 19 October to 0600 UTC 20 October 2019, in GHA case. The solid lines in (a) and (b) represent the reference FSS (UFSS, see text) above which the forecast is considered as skillful, while in b some of the corresponding initialization cycles were added for clarity

The highest 6-h precipitation amounts were recorded from 1800 UTC 19 October to 0600 UTC 20 October 2019. According to Fig. 14b, at the threshold of 10 mm the model was unable to provide skillful forecasts during this period at all distances and initialization cycles under examination, except for the 1200 UTC 15/10/2019 cycle (T+108 h–T+114 h). In this cycle the NCEP/GFS was able to predict the observed rainfall (between 0000 UTC and 0600 UTC on 20/10/2019) but missed its location (\(\textrm{FSS}_{225\,\textrm{km}} \text { and } \textrm{FSS}_{275\,\textrm{km}} > \textrm{UFSS}\), T+114 h). This is clearly illustrated in Fig. 15 (panels (a) and (f)). The poor performance can be attributed to the nature of precipitation in this region, since most of the rainfall was produced by an organized mesoscale convective system. The predictability of such systems depends on errors in the initial conditions (Melhauser and Zhang 2012), on their dependence on horizontal scales (Weyn and Durran 2019) and on multistage error growth dynamics (Zhang et al. 2007).

Fig. 15

Six-hour accumulated precipitation (mm) at the threshold of 10 mm according to the a IMERGV06 observations (regridded to the GFS grid, 1st order conservative interpolation) and NCEP/GFS model in different 1200 UTC initialization cycles on b 19/10/2019, c 18/10/2019, d 17/10/2019, e 16/10/2019, f 15/10/2019, g 14/10/2019 and h 13/10/2019, from 0000 UTC 20 October to 0600 UTC 20 October 2019

Conclusion

The present study assessed the performance of the NCEP/GFS model in the framework of the AfrCRS Weather Forecast Service (AfrCRS-S7-P01). Since the service’s products are used as input to various AfrCRS services, such as crop condition products (Alexandridis et al. 2021), their verification adds credibility to the service itself and quantifies its limitations. The variables under examination were 2 m air temperature (TEMP) and relative humidity (RH), mean sea level pressure (MSLP), wind speed (WIND) at 10 m above ground level, and accumulated precipitation (PREC) over 6-h and 24-h periods. The model was verified against SYNOP and METAR reports for the TEMP, RH, MSLP and WIND variables, while the IMERGV06 satellite-based product (at both half-hourly and daily temporal resolutions) was utilized in the evaluation of the simulated PREC. The verification period extended from 01 June 2018 to 31 May 2020. In addition, the model performance was examined through three high-impact precipitation events that occurred during this period. These events developed under different atmospheric conditions and provided insight into the behavior of the model in terms of precipitation over different regions in Africa (Tunisia, Ghana and Mozambique).

According to the results, the modeled TEMP presented a strong diurnal variation between positive and negative biases, and its averaged RMSE was equal to \(2.62 \pm 0.12\) K over the entire forecast window (T+12 h–T+180 h). GFS consistently underestimated RH (MBIAS: \(0.9 \pm 0.01\)) and MSLP (ME: \(-2.0 \pm 0.15\) hPa), and their RMSE values were equal to \(15 \pm 0.65\) % and \(3.9 \pm 0.3\) hPa, respectively. WIND was overestimated at nearly all forecast lead times (MBIAS = \(1.19 \pm 0.0\)) and its RMSE was \(3.9 \pm 0.3\) m s\(^{-1}\). However, the magnitude of the errors was affected by the hour of the day, the location of each region and its morphological characteristics (e.g., complex terrain, land cover). These errors were in accordance with other studies that addressed the performance of the model in other parts of the world. The neighborhood-based statistical verification of the 24-h accumulated precipitation against the IMERGV06 satellite product showed that the model forecasted the precipitation events more accurately as the verification distance increased, but its performance deteriorated at higher precipitation thresholds (10 mm and 50 mm). Moreover, the variability, root-mean-squared errors and correlation between simulated and observed precipitation differed across forecast lead days and countries under examination, albeit the model maintained the ability to correctly detect precipitation occurrence across forecast lead days.
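For readers unfamiliar with the scores quoted above, the sketch below shows one common way to compute them from matched forecast-observation pairs. The definitions used here (ME as the mean of forecast minus observation, MBIAS as the ratio of the forecast mean to the observed mean, and RMSE as the root of the mean squared difference) are assumptions based on standard verification practice; the paper's own definitions are given in its methodology section, which is not reproduced here. The sample values are hypothetical.

```python
import math


def mean_error(forecasts, observations):
    """ME: average of (forecast - observation); negative means underestimation."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)


def multiplicative_bias(forecasts, observations):
    """MBIAS: mean forecast divided by mean observation; below 1 means underestimation."""
    return (sum(forecasts) / len(forecasts)) / (sum(observations) / len(observations))


def rmse(forecasts, observations):
    """RMSE: square root of the mean squared forecast-observation difference."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 10 m wind speeds in m/s at three stations.
forecast_wind = [11.0, 9.0, 13.0]
observed_wind = [10.0, 10.0, 10.0]

me = mean_error(forecast_wind, observed_wind)              # 1.0
mbias = multiplicative_bias(forecast_wind, observed_wind)  # 1.1
error_rmse = rmse(forecast_wind, observed_wind)            # sqrt(11 / 3)
```

Under these definitions an MBIAS of 0.9 for RH and 1.19 for WIND correspond to under- and overestimation, respectively, which matches the interpretation in the text.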

TC-IDAI (March 2019) affected more than 3 million people and caused approximately 1000 fatalities. It was classified as equivalent to a Category 3 hurricane, with maximum sustained winds of 194 km h\(^{-1}\) (53.9 m s\(^{-1}\)) and a minimum mean sea level pressure of 940 hPa (0000 UTC 14 March 2019). The verification of the model initialization cycles (those at 1200 UTC) prior to its landfall (approx. 0000 UTC 15 March 2019) showed that GFS provided skillful precipitation forecasts up to four days ahead, missed the rapid intensification of the cyclone in all cycles, and indicated the area of its landfall with high accuracy up to 3 days ahead. In the Tunisia case (16–18 October 2018), the neighborhood-based verification revealed that the model was able to forecast the occurrence of the event and its intensity up to three days ahead but missed both its location and its spatial extent. The lowest performance was found in the Ghana case study (17–21 October 2019), where the model missed both the occurrence and the intensity of the precipitation events at all verification distances under examination and at all 1200 UTC initialization cycles but one (1200 UTC 15 October 2019). This region is known for its low predictive skill, since most of the rainfall is modulated by organized mesoscale convective systems and the West African Monsoon (Fink et al. 2006; Janicot et al. 2011; Maranan et al. 2018; Mathon et al. 2002). In addition, the sparse observation of atmospheric and land surface conditions in this area is also likely to lead to inadequate model initialization and forecast errors. Maurer et al. (2015) have shown through a numerical study that in West Africa the representation of soil parameters is crucial for the prediction of convective precipitation.

Future work includes the investigation of post-processing methods in order to reduce the RMSE of the forecasted variables. Both systematic and random errors can be alleviated to some extent by applying analog-based methods (Delle Monache et al. 2011; Nagarajan et al. 2015) or a Kalman filter predictor-corrector algorithm (Delle Monache et al. 2006; Bozic 1994).
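To give a flavor of the Kalman filter approach, the sketch below implements a scalar predictor-corrector that tracks the forecast bias from past forecast-minus-observation errors. It is a simplified illustration and not the algorithm of the cited studies: the function names, the random-walk bias model and the fixed variances `process_var` and `obs_var` are assumptions (operational schemes typically estimate these variances adaptively).

```python
def kalman_bias_filter(past_errors, process_var=0.01, obs_var=1.0,
                       initial_bias=0.0, initial_var=1.0):
    """Estimate the current forecast bias from a sequence of past errors.

    Each element of past_errors is (forecast - observation) at one past time.
    Assumption: the true bias follows a random walk with variance process_var,
    and each error observes it with noise variance obs_var.
    """
    bias, variance = initial_bias, initial_var
    for error in past_errors:
        # Predictor step: the bias persists, its uncertainty grows.
        variance += process_var
        # Corrector step: blend the prediction with the newly observed error.
        gain = variance / (variance + obs_var)
        bias += gain * (error - bias)
        variance *= (1.0 - gain)
    return bias


def correct_forecast(raw_forecast, estimated_bias):
    """Remove the estimated systematic error from a raw forecast value."""
    return raw_forecast - estimated_bias


# Hypothetical example: the model has run 2.0 K too warm for 200 past forecasts.
estimated_bias = kalman_bias_filter([2.0] * 200)
corrected = correct_forecast(300.0, estimated_bias)
```

The ratio of `process_var` to `obs_var` controls how quickly the estimate reacts: a larger ratio follows recent errors more closely, while a smaller one smooths them more strongly.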