Prediction skill of Sahelian heatwaves out to subseasonal lead times and importance of atmospheric tropical modes of variability

Global warming has increased the frequency of extreme weather events, including heatwaves, over recent decades. Heat early warning systems are being set up in many regions as a tool to mitigate their effects. Such systems are not yet implemented in the West African Sahel, partly because of insufficient knowledge on the skill of models to predict them. The present study addresses this gap by examining the skill of the ECMWF ENS extended-range forecasting system (ENS-ext) to predict Sahelian heatwaves out to subseasonal lead-times. It also assesses the importance of tropical modes of variability, which were previously identified as important large-scale drivers of heatwave occurrence in the Sahel. The results show that ENS-ext is able to predict Sahelian heatwaves with significant skill out to lead-week 2–3. With increasing lead-time, heatwaves are more predictable at nighttime than at daytime. Likewise, the pre-monsoon season heatwaves have a longer predictability than those occurring in late winter. The model is also able to relatively well simulate the observed relationship between heatwave occurrence and tropical mode activity. Furthermore, the prediction skill is better during the active phases of the modes, suggesting that they are good sources of heatwave predictability. Therefore, improving the representation of tropical modes in models will positively impact heatwave prediction at the subseasonal scale in the Sahel, and gain more time and precision for anticipatory actions.


Introduction
The recent developments in climate change are marked by an increased occurrence of extreme weather and climate events, including heatwaves (Stott 2016). There has indeed been an upward trend of heatwaves both at the global and regional levels (Perkins-Kirkpatrick and Lewis 2020), with future projections warning of even more severe thermal discomfort (Xu et al. 2020;Raymond et al. 2020) for the human community.
The West African Sahel, a climatologically hot region (e.g. Nicholson 2013), suffers from extreme heat events all year round (with peaks in boreal spring). The literature indicates that Sahelian heatwaves are relatively short-lived as compared to other regions, but are extremely severe in magnitude (e.g. Oueslati et al. 2017;Guigma et al. 2020a). Moreover, over the recent decades and in agreement with the global trend, they have been more frequent, more intense (especially at night) and longer lasting (Fontaine et al. 2013;Ringard et al. 2016;Moron et al. 2016;Oueslati et al. 2017;Barbier et al. 2018). Climate projections also anticipate an increase of the magnitude, spatial extent and frequency of extreme heat events (Russo et al. 2016;Dosio 2017;Sylla et al. 2018) that could only aggravate the thermal risk in the region.
The impacts of extreme heat in the region, as elsewhere in Africa, are largely unreported or underreported (Harrington and Otto 2020). A few studies have nevertheless addressed the topic, giving an insight into the adverse effects of heat across a range of sectors. Diboulo et al. (2012) and Azongo et al. (2012) showed strong associations between higher temperature and daily mortality in western Burkina Faso and northern Ghana respectively. The increase in death rates is especially marked in the short term (a few days after the heatwave events), with children under five being the most affected. In the energy sector, Aissatou et al. (2017) evidenced a relatively strong correlation between extreme heat events and peaks of electricity consumption in two major Sahelian cities (Dakar and Niamey). Furthermore, the International Labour Office stresses in a recent report (ILO 2019) that, in Africa, seven of the 10 countries most severely affected by labour productivity loss due to heat stress are located in the Sahel. In this report, the working hours lost to heat stress in 1995 across West Africa were estimated to be the equivalent of more than two million full-time jobs, which represents, in economic terms, 3.3% of the GDP of the region. With the projected increase of heat in the region, these losses are expected to reach more than eight million full-time jobs, or equivalently 4.77% of the GDP, by as early as 2030. The agriculture and construction sectors, which employ an important portion of the work force, are the most severely affected.
Faced with this issue, it is urgent to undertake actions to alleviate the adverse effects of these extremes. In that regard, numerical weather prediction (NWP) models could provide information to help governments and humanitarian organisations in the region to trigger preventive actions. Such heat early warning systems (HEWSs), jointly recommended by the World Meteorological Organization and World Health Organization (2015) (WMO; WMO N°1142), are already implemented in several countries across North America (e.g. McElroy et al. 2020; Henderson et al. 2020), Europe (e.g. Morabito et al. 2019; Casanueva et al. 2019), Australia (e.g. Nicholls et al. 2008; Nitschke et al. 2016) and South Asia (e.g. Knowlton et al. 2014). A non-exhaustive global map of heat-action plans has been prepared by the Global Heat Health Information Network (GHHIN) and is accessible from http://ghhin.org/map/.
One prerequisite for HEWSs is skilful prediction from the NWP models at a reasonable lead-time for action. However, the skill of Sahelian heatwave forecasting has received only minor attention. The main work on this topic so far is an evaluation of two CNRM-CM forecasting systems in use at Météo-France by Batté et al. (2018). They found that, at the subseasonal scale, the skill of their forecasting systems is essentially restricted to the deterministic horizons (first 10 days). Coughlan de Perez et al. (2018) investigated the short-term (out to 10 days) predictability of temperature extremes at the global level, and found that while the NOAA model has limited skill, the ECMWF model instead presents a potential for the implementation of rapid preventive actions for heatwave impact mitigation. They also made the recommendation that further research be conducted to identify the drivers of heatwave predictability in regions including Africa. Likewise, Batté et al. (2018) mentioned that extended predictability may be provided by planetary waves and teleconnections.
These recommendations are in tune with previous work by Guigma et al. (2020b), who identified tropical modes of variability as important large-scale drivers of Sahelian heatwaves. Precisely, the activity of the Madden Julian Oscillation (MJO), the equatorial Rossby (ER) and Kelvin (EK) waves in the Equatorial West Africa sector (0-10°N), where convection peaks in spring, significantly modulates the frequency and spatial distribution of heatwaves in the Sahel. Given the spatio-temporal properties of these modes, Guigma et al. (2020b) suggested that they could provide heatwave predictability at subseasonal timescales. Subseasonal predictability has received increasing attention over recent years, given the range of new opportunities for risk management in several sectors (health, disaster preparedness, water management, energy and agriculture) that it brings (White et al. 2017).
This research seeks to address the gap in understanding of heatwave predictability in the Sahel and has two objectives: (1) to evaluate the skill of Sahelian heatwave prediction at the synoptic and subseasonal scales (i.e. up to ~ 45 days) and (2) to assess the importance of tropical modes as a source of predictability. This is achieved through a statistical evaluation of a long record of hindcast (or re-forecast) data and a detailed examination of a case study heatwave event.
By building understanding of climate and predictability, this research seeks to pave the way for the development of HEWSs and the scaling of anticipatory forecast-based Actions/Financing (FbA/FbF) for such events (e.g. Coughlan de Perez et al. 2015). This is an especially relevant approach in developing countries, including the Sahel, where climate investments are currently principally directed to post-disaster recovery (Mirza 2003).
The remainder of this manuscript is structured as follows. Section 2 introduces the forecast and reference datasets used in this study as well as the different methods for tropical mode detection and skill evaluation. In Sect. 3, the results of both the statistical and the case studies are presented and discussed. Finally Sect. 4 summarises the findings, and elaborates on the next steps for future research on heatwaves in the Sahel.

Data and method
The present research analyses heatwave prediction skill for forecasts initialised in two seasons, as in Guigma et al. (2020a): the February to March season (FM hereafter) and the pre-monsoon April to June season (AMJ hereafter), which marks the peak of heat in the region.

Description of the ECMWF ENS extended-range forecasting system
In this study, the ECMWF ENS extended-range forecasting system (ENS-ext hereafter) has been chosen to evaluate the prediction skill of Sahelian heatwaves at synoptic to subseasonal lead-times. The main reason for this choice is that in most inter-model comparative studies, ECMWF has proved to be the most skilful (e.g. Janiga et al. 2018; de Andrade et al. 2019; Bengtsson et al. 2019). In addition, national meteorological services in the Sahel can freely access some of the ECMWF high-resolution real-time forecast data (including 2 m temperature), thanks to a partnership between the African Centre for Meteorological Applications for Development (ACMAD) and the European Centre. ENS-ext generates a hindcast twice a week (Monday and Thursday) by running an 11-member ensemble (one control and 10 perturbed members) for the last 20 years, starting on the same weekday and month as the real-time forecast. The present study uses all the hindcast data generated in 2018 (thus covering the 1998-2017 period), consisting of 105 different calendar days (initialisation dates). Note that 2018 covers two different versions of the model (CY43R3 and CY45R1), as an upgrade was implemented in June 2018. The hindcast, like the real-time forecast, has a time horizon of 46 days (output data are generated every six hours), with a native horizontal resolution of O640 (about 18 km) up to day 15, degrading to O320 (about 36 km) between day 16 and day 46.
Two main sets of variables are extracted.
(1) Thermal variables consisting of temperature (T), maximum and minimum temperatures (Tmax and Tmin) and dewpoint temperature (Td), all at the screen level (2 m height), from which are derived thermal indices (see Sect. 2.2).
(2) Outgoing longwave radiation (OLR) data which are used to assess the activity of tropical modes (see Sect. 2.4).
The thermal variables are extracted as 06-hourly forecasts at a resolution of 0.5° × 0.5° over the Sahel domain (20°W-30°E; 10°N-20°N), while the OLR data are downloaded as forecast 24-h totals at a resolution of 2.5° × 2.5° over the global tropics (20°S-20°N). Both sets of variables extend up to the full 46-day forecast horizon.
The hindcasts are verified against the fifth generation of the European Reanalyses (ERA5, Hersbach et al. 2020), also produced by ECMWF. ERA5 has a native horizontal resolution of approximately 31 km. The variables retrieved are those extracted from the hindcast, and the resolution chosen accordingly. In terms of the quality of near-surface temperatures in ERA5, Oueslati et al. (2017) and Barbier et al. (2018) assessed ERA-Interim, which ERA5 is an improvement of, against the Global Summary Of the Day (GSOD) observational dataset, and concluded that it was suitable for heatwave study in the Sahel. Furthermore, Gleixner et al. (2020) showed that in ERA5, near-surface temperatures are less climatologically biased, and their interannual variability better represented than in ERA-Interim across Africa, including in the Sahel band. Similarly, Wang et al. (2017), Tall et al. (2019), Wright et al. (2020) and Hersbach et al. (2020) proved that ERA5 represents relatively well the observed OLR over the tropical domain, confirming its suitability as reference dataset for the analysis of tropical modes. The Berkeley Earth Surface Temperatures (BEST; Muller et al. 2014;Rohde et al. 2016) dataset is used as a second reference dataset to provide an independent evaluation of thermal indices (given that ERA5 is created using the same model as ENS-ext). BEST data consist of daily Tmax and Tmin (no moisture data is available) at a native resolution of 1° × 1°, regridded to 0.5° × 0.5° to match the hindcast grid. Guigma et al. (2020a) showed that in the Sahel, heatwaves defined using different thermal indices over the same diurnal period, or the daytime versus nighttime heatwaves of a same index are not synchronous, and often result from different underlying thermodynamic processes. Their predictability could therefore also differ, and to account for this eventuality, two distinct measures of heat are used in this paper: Temperature (T) and the heat index (HI). 
Considering their daytime and nighttime components separately gives a total of four thermal indices.

Thermal index derivation
For temperature the nighttime (daytime) component is taken as the daily minimum (maximum) value of the 06-hourly forecasts of minimum (maximum) temperature and is hereafter referred to as T-night (T-day).
The formula for HI derivation (Steadman 1979) is as follows: where T is temperature, and RH relative humidity computed from temperature and dewpoint temperature.
The nighttime (daytime) component of HI, hereafter referred to as HI-night (HI-day), is computed by replacing T in (1) by T-night (T-day) and RH by the average of the 06-hourly forecasts of relative humidity valid at 00 and 06 UTC (12 and 18 UTC).
Similarly, T-night, T-day, HI-night and HI-day are derived from the ERA5 dataset using the corresponding timesteps. T-night and T-day are directly available in BEST.
Heatwaves are also defined in the hindcast dataset at each grid-cell and for each thermal index, in several steps. As a reminder, for each thermal index, a given grid-cell has a total of 1,062,600 data records, broken down into 105 initialisation dates each year, 46-day integrations (or forecast horizons), 20 years of forecasts (covering 1998-2017) and 11 ensemble members. Pooling all 11 members together, the 75th percentile of the total distribution and the 90th percentile of each calendar day are calculated. Both thresholds are derived as a function of lead-time, giving a total of 46 values for the 75th percentile and 4830 values for the 90th percentile (46 lead-times × 105 calendar days). The latter is smoothed by averaging over a window of 10 initialisation dates (the date of interest, the four initialisation dates before and the five initialisation dates after). For example, to calculate the 90th percentile of forecasts initialised on 15 January 2018, the forecasts initialised on 01, 04, 08, 11, 15, 18, 22, 25 and 29 January and 01 February are used.
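As an illustration of the threshold derivation described above, the following is a minimal sketch rather than the code used in the study. The function name `heat_thresholds`, the array layout (initialisation date, year, member, lead-time) and the truncation of the smoothing window at the first and last initialisation dates are assumptions made for this example; the text does not state how the edges of the record are handled.

```python
import numpy as np

def heat_thresholds(x, n_before=4, n_after=5):
    """Sketch of the two magnitude thresholds for one grid-cell and one index.

    x : array of shape (init_date, year, member, lead), e.g. (105, 20, 11, 46).
    Returns
      p75 : 75th percentile of the whole distribution, one value per lead-time
      p90 : smoothed 90th percentile, one value per (init_date, lead)
    """
    n_init = x.shape[0]
    # 75th percentile pooling all dates, years and members, per lead-time
    p75 = np.percentile(x, 75, axis=(0, 1, 2))
    # 90th percentile per initialisation date and lead-time (years and members pooled)
    raw = np.percentile(x, 90, axis=(1, 2))
    # Average over a window of 10 initialisation dates: 4 before, the date itself,
    # 5 after. ASSUMPTION: the window is simply truncated at the edges.
    p90 = np.empty_like(raw)
    for i in range(n_init):
        lo = max(0, i - n_before)
        hi = min(n_init, i + n_after + 1)
        p90[i] = raw[lo:hi].mean(axis=0)
    return p75, p90
```

With the full hindcast, `p75` would hold 46 values and `p90` 4830 values (105 × 46), matching the counts given in the text.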
Once the magnitude thresholds are defined, heatwaves are detected in each member as spells of three or more consecutive days where the thermal index exceeds both thresholds defined above. In order to account for events which start before the first day of the forecast but run into the forecast period, each 46-day-long forecast integration is padded at its beginning with the two days of reference heatwave occurrence binary data immediately before the forecast. These two extra days are removed after the detection step. The forecast data are thus converted to binary values indicating the occurrence or non-occurrence of heatwaves.
Then, on a given day and at a given grid-cell, the forecast probability of heatwave occurrence is given by the ratio of the sum of the ensemble members' binary heatwave values to the ensemble size of 11 (ranging from 0 to 1 in 1/11 increments).
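The detection and probability steps can be sketched as follows for a single grid-cell. This is a hedged illustration, not the authors' implementation: the names `detect_heatwaves` and `heatwave_probability` are hypothetical, and the padding is implemented here by treating the two preceding reference heatwave flags as "hot" days, which is one plausible reading of the description above.

```python
import numpy as np

def detect_heatwaves(index, p75, p90, prior=(0, 0), min_len=3):
    """Binary heatwave flags for one member at one grid-cell.

    index    : daily thermal index over the forecast horizon
    p75, p90 : thresholds (scalars or arrays matching `index`)
    prior    : reference heatwave flags for the 2 days before the forecast
               (ASSUMPTION: these count as hot days for spell detection)
    """
    values = np.asarray(index, dtype=float)
    hot = (values > p75) & (values > p90)
    padded = np.concatenate([np.asarray(prior, dtype=bool), hot])
    flags = np.zeros(padded.size, dtype=int)
    start = None
    # A trailing False acts as a sentinel that closes a spell ending on the last day
    for day, is_hot in enumerate(np.append(padded, False)):
        if is_hot and start is None:
            start = day
        elif not is_hot and start is not None:
            if day - start >= min_len:
                flags[start:day] = 1
            start = None
    # Drop the padded days after detection
    return flags[len(prior):]

def heatwave_probability(member_flags):
    """Fraction of members forecasting a heatwave, per day.

    member_flags : array of shape (member, day) with 0/1 values.
    """
    return np.asarray(member_flags, dtype=float).mean(axis=0)
```

With 11 members, `heatwave_probability` returns values in steps of 1/11, as stated in the text.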

Predicted tropical mode activity and link with heatwaves
In order to assess whether tropical modes can be a source of skill for heatwave prediction, their activity in each of the individual ENS-ext members, as well as in the ensemble mean (EM) (mainly for the purposes of the case study), is filtered using the same method as in Janiga et al. (2018), which consists of several steps. Each 46-day forecast is first embedded in a padded segment of 1506 days, which is passed through a wavenumber-frequency filter (Wheeler and Kiladis 1999) to retain the harmonics of the MJO, ER and EK waves. The exact characteristic wavenumbers, periodicities and equivalent depths used to detect each of these three modes of tropical variability are the same as those used in Guigma et al. (2020b), and are shown in Table 1. The outcome of the filtering for each mode and for each segment is a 1506-day long timeseries of filtered OLR data at each grid-cell of the global tropics. For verification purposes, the 46 days of forecast in each segment are replaced by the corresponding analysed ERA5 data, and the same filtering is applied. This gives each forecast mode-filtered data segment an equivalent observed mode-filtered data segment, against which it can be verified.
Then, another set of methods is used to assess the activity of tropical modes locally over the Equatorial West Africa sector (this set of methods is not applied to the EM data). The forecast and observed 1506-day long mode-filtered data segments are each averaged at the characteristic 5°E longitude between the equator and 10°N (this band of latitudes corresponds to the region of maximum convection over West Africa in spring; Guigma et al. 2020b). For each segment, the resulting unidimensional timeseries and its first-order time derivative are standardised (using the standard deviation from ERA5 for both the forecast and observed segments) and, through trigonometric operations, combined to identify the wave angle and amplitude for each day. The angles are further binned into eight 45° wide phases labelled 1-8. A mode is considered active on a given day only if its amplitude reaches or exceeds one. If not, the corresponding phase takes the value 0. The composite anomalies of observed OLR against these phases are shown in Fig. SM1 for each mode of variability. The reader is referred to Guigma et al. (2020b) for a thorough description of the method.
At the end of this process, the days corresponding to padded data (a total of 1460 days for each segment) are removed from the data segments, such that only the mode phases of the effective 46 days of the forecast and the corresponding observation are retained. The final outcome for each mode of variability is then 12 arrays (11 forecast and one observation) of filtered OLR data, each of dimensionality 46 (forecast horizons) × 105 (initialisation dates) × 20 (years).
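The local phase and amplitude calculation lends itself to a short sketch. The version below is an illustration under stated assumptions, not the procedure of Guigma et al. (2020b): the function name `local_phase` is hypothetical, the time derivative is approximated with centred finite differences, and the mapping of angles to phase labels (phase 1 starting at an angle of 0°) is an arbitrary convention chosen for the example.

```python
import numpy as np

def local_phase(filtered_olr, sd_series, sd_tendency):
    """Daily phase (0-8) and amplitude of one mode from its filtered OLR series.

    filtered_olr : mode-filtered OLR averaged over the equatorial band (1-D, daily)
    sd_series    : reference standard deviation of the series (ERA5 in the paper)
    sd_tendency  : reference standard deviation of its time derivative
    """
    x = np.asarray(filtered_olr, dtype=float)
    dxdt = np.gradient(x)                 # finite-difference time derivative
    xs = x / sd_series                    # standardised series
    ys = dxdt / sd_tendency               # standardised tendency
    amplitude = np.hypot(xs, ys)
    angle = np.degrees(np.arctan2(ys, xs)) % 360.0
    # Eight 45-degree bins labelled 1-8 (ASSUMED convention: phase 1 starts at 0 degrees);
    # the modulo guards against a rounded angle of exactly 360
    phase = (angle // 45).astype(int) % 8 + 1
    # A mode is active only if its amplitude reaches or exceeds one
    phase[amplitude < 1.0] = 0
    return phase, amplitude
```

Applying the same function to the forecast and to the ERA5 segment, with the same two standard deviations, yields comparable phase series for verification.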

General evaluation
To evaluate the skill of ENS-ext, a set of evaluation metrics has been used. The complete description of each metric is presented in Jolliffe and Stephenson (2012).

Anomaly correlation coefficients
The strength of the association between the observed versus predicted values of thermal indices is evaluated using the anomaly correlation coefficients (ACCs), i.e. correlation coefficients between the anomalies of observed versus the anomalies of predicted values of the indices. The ACCs for the four thermal indices are discussed in Sect. 3.1.
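A minimal sketch of the ACC computation is given below. It assumes that anomalies are obtained by subtracting a climatology supplied by the caller (the text does not specify how the forecast and observed climatologies are built) and uses the Pearson correlation of the two anomaly series; the function name `anomaly_correlation` is hypothetical.

```python
import numpy as np

def anomaly_correlation(forecast, observed, forecast_clim, observed_clim):
    """Correlation between forecast anomalies and observed anomalies.

    forecast, observed           : 1-D samples of a thermal index
    forecast_clim, observed_clim : climatologies (scalars or arrays of the same
                                   length), ASSUMED to be provided by the caller
    """
    f_anom = np.asarray(forecast, dtype=float) - forecast_clim
    o_anom = np.asarray(observed, dtype=float) - observed_clim
    # Pearson correlation coefficient between the two anomaly series
    return float(np.corrcoef(f_anom, o_anom)[0, 1])
```

In practice the function would be applied separately for each lead day, pooling initialisation dates and years, to obtain a curve such as the one discussed in Sect. 3.1.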

Symmetric Extremal Dependence Index
Heatwaves are relatively rare events. Many common measures of forecast quality struggle to give real indications of model skill for extreme events, as they degenerate to trivial values with increasingly rare events (Ferro and Stephenson 2011). For this reason, non-degenerate metrics have been specifically designed to assess the skill associated with rare events. This study uses the Symmetric Extremal Dependence Index (SEDI), suggested by Hogan and Mason (2012) to be the best choice, and successfully used in similar heatwave studies (e.g. Marshall et al. 2014; Mandal et al. 2019). SEDI itself is based on two simple scores: the hit rate (H) and false alarm rate (F) which are derived from a two-by-two contingency table (Table 2) between a deterministic forecast and observation of heatwave occurrence:
$$H = \frac{\text{hits}}{\text{hits} + \text{misses}} \quad (2)$$
$$F = \frac{\text{false alarms}}{\text{false alarms} + \text{correct negatives}} \quad (3)$$
In (2) and (3), hits are instances where heatwaves were forecast and did indeed occur, misses are instances where heatwaves were not forecast but occurred, false alarms are instances where heatwaves were forecast but did not occur, and correct negatives are instances where heatwaves were not forecast and did not occur (see Table 2).
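From H and F, SEDI is obtained by applying this logarithmic formula:
$$\mathrm{SEDI} = \frac{\ln F - \ln H - \ln(1 - F) + \ln(1 - H)}{\ln F + \ln H + \ln(1 - F) + \ln(1 - H)} \quad (4)$$
The possible values for SEDI range from −1 to 1, with 1 being the perfect score and positive values indicating that the model is better than random.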
In the present research, the SEDI calculation proceeds similarly to Marshall et al. (2014) as follows: contingency tables are first built separately for each of the 11 individual members before being pooled into a single table from which H, F and subsequently SEDI are calculated. To assess the statistical significance of the scores, the standard error of SEDI ($\mathrm{SE}_{\mathrm{SEDI}}$) is estimated from H, the hit rate, F, the false alarm rate, n, the sample size, and p, the base rate (relative frequency of heatwave occurrence).
At a given grid-cell, the SEDI score is considered significant if the confidence interval (i.e. [SEDI − 2 $\mathrm{SE}_{\mathrm{SEDI}}$; SEDI + 2 $\mathrm{SE}_{\mathrm{SEDI}}$]) does not include zero.
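The pooled SEDI calculation can be sketched as below. The code uses the standard SEDI expression of Ferro and Stephenson (2011), with the hit rate H = hits / (hits + misses) and the false alarm rate F = false alarms / (false alarms + correct negatives). The function names and the tuple layout of the contingency tables are assumptions made for this example, and the standard error is not computed here.

```python
import math

def sedi(hits, misses, false_alarms, correct_negatives):
    """SEDI from the four cells of a two-by-two contingency table.

    Undefined (raises an error) when H or F equals exactly 0 or 1.
    """
    H = hits / (hits + misses)
    F = false_alarms / (false_alarms + correct_negatives)
    numerator = math.log(F) - math.log(H) - math.log(1 - F) + math.log(1 - H)
    denominator = math.log(F) + math.log(H) + math.log(1 - F) + math.log(1 - H)
    return numerator / denominator

def pooled_sedi(member_tables):
    """Pool per-member tables into a single table, then compute SEDI.

    member_tables : iterable of (hits, misses, false_alarms, correct_negatives),
                    one tuple per ensemble member (ASSUMED layout).
    """
    totals = [sum(cells) for cells in zip(*member_tables)]
    return sedi(*totals)
```

A forecast with H equal to F gives a score of zero (no better than random), and the score approaches 1 as H tends to 1 and F tends to 0.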

Evaluation of heatwave prediction skill taking into account the modulation by tropical modes
To assess the skill of the ENS-ext in simulating the activity of tropical modes, the forecast local phases are verified against those detected from ERA5 (local phases are defined in Sect. 2.4), using hit rates. As with the SEDI scores, the contingency tables are first built separately before pooling them to calculate the hit rates. They are discussed in Sect. 3.3.1.
To assess how well the model represents the relationship between tropical modes and heatwaves, the frequency of heatwave occurrence conditioned on the phase of tropical modes (also termed modulation of heatwave occurrence by the modes) is evaluated in both the model and the reference datasets, using the same formula as in Guigma et al. (2020b): where $P_x$ is the conditional frequency of heatwaves over an active phase x of a given mode, and $P_a$ the frequency derived from all days, irrespective of the activity of the mode.
The results for this modulation are presented in Sect. 3.3.2.
Finally, a given tropical mode is considered to be a source of heatwave predictability if the SEDI scores are higher under its forecast active phases than its inactive phase. This assessment considers (1) all the eight active phases altogether (i.e. the comparison is made between instances where the mode amplitude is greater than one versus instances where it is equal to or less than one) as in Hudson et al. (2011) and (2) each phase separately in order to determine precisely which phases contribute the most to the skill. At each grid-cell, statistical significance at a 95% level is tested using a nonparametric bootstrap resampling, with 1000 repetitions as in Guigma et al. (2020b).

Additional methods for the case study
To understand the causes of the heatwave case-study event analysed in Sect. 3.4, the patterns of net radiation (shortwave and longwave) and turbulent fluxes (sensible heat flux SHF and latent heat flux LHF) at the surface are analysed using the ERA5 data. For each of these terms, the anomalies are derived by subtracting the calendar day mean and are subsequently averaged over the heatwave period. The fluxes are, by convention, counted positively when directed from the atmosphere towards the surface. The activity of the tropical modes during this period is visualised through a time-longitude diagram of the mode-filtered OLR averaged between the Equator and 20°N, a commonly used technique in tropical meteorology (e.g. Schreck et al. 2011; Guigma et al. 2020b).

Skill of thermal index prediction by ENS-ext
ENS-ext has a relatively good skill in predicting the four thermal indices under investigation. Figure 1 shows the ACCs averaged over the Sahel across the 46 lead days for the FM and AMJ seasons (see Sect. 2.5.1 for method description). For the first week of the forecast for example, the ACCs of all the four thermal indices exceed 0.6. There is then a fast decrease of the forecast skill out to week 3-4 of the forecast bringing the ACC values down to about 0.2. The fast decrease of ACCs beyond the first week is also noticed by Batté et al. (2018) using the Météo-France S2S system, but the drop is much sharper there. A diurnal dependence in thermal index prediction skill is noticeable for both seasons. For the shortest lead-times (out to about day 7), daytime indices slightly outperform their nighttime counterparts and conversely for longer lead-times (exception for HI-day in AMJ). The prediction skill also presents a relatively marked seasonality. Thus in the FM season (Fig. 1a) the ACCs are generally better than during AMJ (Fig. 1b) but only for the shortest lead-times. There is indeed a reversal at longer lead-times such that the more humid season of AMJ presents higher skill than FM (even though ACC values are low). Figure 1 also shows that ENS-ext clearly outperforms persistence forecast (black dashed lines in Fig. 1), even at the shortest lead-times.
The examination of the spatial distribution of the ACCs reveals differences across the Sahel (Fig. SM2 using T-day for illustration). For the shortest horizons, the skill is higher in the north than in the south of the Sahel (irrespectively of the season), whereas at longer forecast lead-times, there is increasingly higher skill in the south than in the north (where the correlation becomes insignificant).

Heatwave prediction skill and potential for early action
The skill of ENS-ext in predicting Sahelian heatwaves is assessed using the SEDI score (described in Sect. 2.5.1).
Similarly to the ACCs of the indices, the FM season offers larger SEDI scores of heatwave prediction than the AMJ season at short lead-times. Thus, with ERA5 as reference, for the first and second weeks of forecast, the scores are respectively above 0.8 and 0.5 (0.6 and 0.3) in the FM (AMJ) season across much of the region as shown in Fig. 2 (Fig. 3). The skill vanishes more quickly in the subsequent lead-weeks in FM than in AMJ such that, after week 3, there is almost no skill (SEDI scores below zero mean that a random forecast is better than the model) in forecasts initialised in FM, whereas some scarce areas still have positive (though very weak) SEDI scores in AMJ at lead-week 6. As is also observed with the ACCs, SEDI scores are initially higher in the northern half of the domain, but a reversal is observed at longer lead-times (this is less evident in FM as heatwaves are not detected in northern Sahel in that season). The seasonality and the evolution with lead-time of the skill are similar across all four heatwave indices. It should be noted, however, that in AMJ, the decrease of skill of nighttime heatwave indices (T-night and HI-night) is slower than that of their daytime counterparts, especially in the southern Sahel, consistent with previous findings by Batté et al. (2018). HI-day has the fastest rate of skill decrease, with only limited areas showing positive SEDI scores after week 2. This contrasts with HI-night, which is the best forecast heatwave index at the longest lead-times. The lower skill observed in HI-day may be related to the differing diurnal cycles of Tmax and relative humidity (the two variables from which it is derived) in the Sahel. Whilst Tmax peaks in the early hours of the afternoon and increases with clear skies, moisture reaches its minimum at the same time, with cloudier skies tending to increase it (Guichard et al. 2009; Bourgeois et al. 2018). The verification using BEST as reference is shown in Fig. SM3 and shows broadly similar patterns to those obtained with ERA5, albeit with slightly lower SEDI scores.
[Figure caption fragment: ... the week of the forecast, with the first week at the top. White areas are not significant at the 95% probability level.]
Compared with other regions across the globe, the Sahel appears to enjoy at least the same degree of heatwave predictability at the subseasonal scale. European heatwaves are found by Lavaysse et al. (2019) to be predictable mostly up to two weeks in advance using ENS-ext. In Australia, the Bureau of Meteorology's POAMA-2 ensemble model is able to predict heatwaves well two to three weeks ahead, with SEDI scores reaching 0.5 at these lead-times under some weather regimes (Hudson et al. 2011; Marshall et al. 2014). In India, a region whose climate system is closer to that of the Sahel, the skill of heatwave prediction by the Indian Institute of Tropical Meteorology's ensemble prediction system is found to still be significant at lead-week 3, with SEDI scores comparable to those of the Sahel during the pre-monsoon season (Mandal et al. 2019). As such, the Sahel can also benefit from HEWSs as currently implemented in these regions (e.g. Lowe et al. 2011; Nitschke et al. 2016; Hess et al. 2018; Casanueva et al. 2019).
One potential explanation for the spatiality/seasonality of the ACC and heatwave prediction skill can be found in the large-scale circulation controlling the Sahelian atmosphere. The FM season experiences a large influence from extratropical weather systems coming from the northern edge of the domain (Knippertz and Martin 2005), which are known for their large synoptic-scale predictability (e.g. Knippertz and Fink 2009;Wheeler et al. 2017). On the other hand, AMJ is characterised by an increasing activity of the MJO and equatorial waves, which are by then more active in the equatorial sector of Africa (e.g. Berhane et al. 2015;Guigma et al. 2020b). These modes of variability, since they are less inclined to forecast error growth with lead-time than extratropical disturbances, confer higher subseasonal predictability to the tropics (Judt 2020). While the verification is so far based on strict comparison of forecast and observed heatwaves at the exact grid-cell and day, it may also be relevant, for operational purposes, to include a window of flexibility in which the forecast still has some potential for action (e.g. Coughlan de Perez et al. 2016). Such a "tolerant" evaluation is assessed here from the temporal point of view through considering that a positive forecast of heatwave (i.e. heatwave forecast to occur) is considered to be a hit if it occurs within a time window of three days centred on the forecast validity date, i.e. between a day earlier and a day after. Given the three-day minimum duration constraint used in this paper, the tolerance only affects the onset and cessation of heatwave events. The comparison between the Sahel-wide average SEDI scores of strict and tolerant evaluation is shown in Fig. 4. It is apparent that the gain in skill obtained through the tolerant evaluation is more important in AMJ than in FM (Fig. 4a-d). 
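The tolerant matching described above can be sketched as follows. This is an illustrative reading of the text, not the study's code: the function name `tolerant_counts` is hypothetical, and only forecast-positive days are classified (as hits or false alarms), since the treatment of the other contingency-table cells under the tolerant evaluation is not detailed.

```python
import numpy as np

def tolerant_counts(forecast, observed, half_window=1):
    """Count hits and false alarms with a temporal tolerance.

    A forecast heatwave day is a hit if a heatwave is observed within
    +/- half_window days of it (half_window=1 gives the 3-day window,
    half_window=0 the strict evaluation).
    """
    f = np.asarray(forecast, dtype=bool)
    o = np.asarray(observed, dtype=bool)
    near = np.zeros_like(o)
    # Spread each observed heatwave day over the tolerance window
    for shift in range(-half_window, half_window + 1):
        lo = max(0, shift)
        hi = min(o.size, o.size + shift)
        near[lo:hi] |= o[lo - shift:hi - shift]
    hits = int(np.sum(f & near))
    false_alarms = int(np.sum(f & ~near))
    return hits, false_alarms
```

For instance, a heatwave forecast one day before an observed heatwave day counts as a false alarm under the strict evaluation but as a hit with the three-day window.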
Moreover, the gain is the largest at the longest lead-times, with difference of SEDI scores from the strict evaluation reaching a value close to 0.2 in AMJ (Fig.4c, d). As a matter of comparison, the tolerant evaluation shows a heatwave prediction skill at lead-week 6 similar or better than that of the strict evaluation at lead-week 3 (or at lead-week 2 if a 5-day window of tolerance is used instead, not shown). Providing forecasts with such a tolerance for the longest lead-times could prove relevant for heat-health early actions in the region. With long lead-times, the preparedness actions likely do not need daily accuracy in the forecast. An operational scheme could adopt the 'Ready-Set-Go!' approach of the Red Cross in which various inexpensive actions are implemented at long leadtimes, and different more specific or costly actions are then invoked based on more accurate shorter-lead forecasts (Bazo et al. 2019). In this sense, the tolerant verification statistics show that the skill at long lead-times is meaningful to risk managers. Guigma et al. (2020b) showed that at the subseasonal scale, heatwaves in the Sahel are modulated by tropical modes of variability, namely the MJO, the ER and EK waves. Furthermore, in Sect. 3.1, the higher skill at subseasonal scale in the AMJ season than in the FM season could be related to the greater activity of tropical modes in the former season. The present section aims at assessing whether, in addition to being important drivers of heatwave occurrence, tropical modes also constitute a significant source of predictability. Marshall et al. (2014) mentioned two conditions that any model should a priori meet to be able to predict a hazard in association with its climatic driver: (1) well predict the climatic driver, and (2) well simulate the relationship between the climatic driver and the hazard. 
These two conditions will first be assessed, before considering whether tropical modes indeed provide skill for heatwave prediction in the Sahel. The analysis is restricted to the first three weeks of the forecast, beyond which the SEDI scores become relatively low (Fig. 3), and covers only the AMJ season.
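As an illustration of the tolerant verification described above, the following minimal sketch counts hits for a single grid-cell under the strict comparison and under a three-day window centred on the forecast validity date. The function name `count_hits`, the `half_window` parameter and the daily 0/1 series are hypothetical and only serve the example; the treatment of misses and false alarms under tolerance is not specified here and is therefore left out.

```python
def count_hits(forecast, observed, half_window=0):
    """Count forecast heatwave days that verify within +/- half_window days.

    forecast, observed: sequences of 0/1 flags for one grid-cell, one per day.
    half_window=0 is the strict day-to-day comparison; half_window=1 gives a
    three-day window centred on the forecast validity date.
    """
    n = len(forecast)
    hits = 0
    for day, flag in enumerate(forecast):
        if not flag:
            continue  # only positive forecasts can be hits
        lo = max(0, day - half_window)
        hi = min(n, day + half_window + 1)
        if any(observed[lo:hi]):
            hits += 1
    return hits


# Hypothetical series: the forecast event starts (and ends) one day too early.
forecast = [0, 1, 1, 1, 0, 0, 0]
observed = [0, 0, 1, 1, 1, 0, 0]

strict_hits = count_hits(forecast, observed, half_window=0)
tolerant_hits = count_hits(forecast, observed, half_window=1)
```

In this example only the first forecast day changes status between the two evaluations, which is consistent with the tolerance affecting only the onset and cessation of events.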

How well does ENS-ext predict tropical modes?
At the global level, Janiga et al. (2018) discussed the predictability of the mode-filtered OLR across the tropics and found ECMWF to be the model with the lowest bias for forecasts of the mean state and activity of tropical modes. Furthermore, investigations by Dias et al. (2018) revealed that ECMWF is relatively skilful at propagating tropical modes for longer lead-times. Here the focus is on the Equatorial West Africa Sector (the region just south of the Sahel, i.e. 20°W-30°E; 0°-10°N) where convection is shown to modulate heatwave occurrence in the Sahel (Guigma et al. 2020b). To assess the skill of the model in capturing the local activity of tropical modes, the forecast phases are compared against observation using hit rates (defined in Sect. 2.5.2). Among the three investigated modes, the MJO (blue histograms in Fig. 5) clearly stands out as the most skilfully predicted. At week 1 for example, the hit rate is above 0.4 in most active phases. This value decreases to 0.3 at week 2 and to slightly above 0.2 at week 3. The ER wave has hit rates which are on average 0.1 point lower than those of the MJO, being about 0.3, 0.2 and above 0.1 at weeks 1, 2 and 3 respectively. The EK wave shows the lowest hit rates: they always remain below 0.2, even at week 1, and fall below 0.1 at weeks 2 and 3. Note that the lower skill associated with the EK wave has already been highlighted by previous work (e.g. Li and Stechmann 2020). For each mode, the hit rates are generally comparable across the eight phases, although with slightly higher values in the central phases (phases 3 through 6). The differences observed between the modes are in agreement with their spectral properties summarised in Table 1. The MJO and ER wave indeed have a longer periodicity than the EK wave. This provides them with a longer "memory" and leads to slower error growth.
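The phase-wise comparison can be sketched as follows. The definition used here is an assumption made for illustration (the exact definition is given in Sect. 2.5.2): the hit rate of a phase is taken as the fraction of days observed in that phase for which the forecast is in the same phase, with phase 0 denoting an inactive mode. The function name and the daily phase series are hypothetical.

```python
def phase_hit_rates(forecast_phase, observed_phase, n_phases=8):
    """Hit rate per active phase (1..n_phases); phase 0 means 'inactive'.

    Assumed definition: days with forecast phase == observed phase == p,
    divided by the number of days observed in phase p.
    Returns None for phases that were never observed.
    """
    rates = {}
    for p in range(1, n_phases + 1):
        observed_days = [i for i, o in enumerate(observed_phase) if o == p]
        if not observed_days:
            rates[p] = None
            continue
        hits = sum(1 for i in observed_days if forecast_phase[i] == p)
        rates[p] = hits / len(observed_days)
    return rates


# Hypothetical daily phase series for one mode at one lead-week.
observed_phase = [1, 1, 2, 2, 3, 3, 0, 0, 4, 4]
forecast_phase = [1, 2, 2, 2, 3, 0, 0, 1, 4, 5]

rates = phase_hit_rates(forecast_phase, observed_phase)
```

Under this assumed definition, a forecast that assigns phases at random among eight active phases and an inactive state would score close to 0.1, which gives a rough reference for reading the values quoted above.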

How well does the model simulate the link between tropical modes and heatwaves?
Guigma et al. (2020b) already elaborated on the modulation of heatwave occurrence by tropical modes from an observational perspective, with a discussion of the underlying physical mechanisms. This modulation, as described in Sect. 2.5.2, compares heatwave occurrence under active phases of the modes to the climatological occurrence. The quality of the replication of this modulation by ENS-ext depends on the mode under consideration, and is discussed here using T-day heatwaves for illustration. As shown in the left panels of Fig. 6, observed phases 1-3 of tropical modes (which roughly correspond to a suppression of convection, Fig. SM1) are overall favourable to heatwaves, whereas phases 5-7 (enhancement of convection) obstruct heatwave occurrence. It is apparent that in ENS-ext, the influence of the MJO and ER wave on heatwaves is well simulated. Both the zonal propagation (eastward for the MJO and westward for the ER wave) and the magnitude of the modulation (with absolute M values reaching 1.5) are well captured by the model (Fig. 6b, d). On the other hand, for the EK wave (Fig. 6f), whilst the propagation of the modulation across phases is acceptably simulated, ENS-ext struggles to get the magnitude correct. There is indeed an underestimation of the forcing that EK waves exert on heatwave occurrence. This is however not a surprise, given that the model also has difficulty predicting the activity of this mode (Sect. 3.3.1). For the three other thermal indices (T-night, HI-night and HI-day), similar conclusions are drawn, i.e. a skilful representation of the impact of the MJO and the ER wave on heatwaves versus a limited skill for the EK wave (not shown).
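To make the idea of a modulation concrete, the sketch below compares the heatwave frequency on days when a mode is in a given phase with the climatological frequency. The formula used, a relative anomaly with respect to climatology, is an assumption for illustration only; the M statistic of Sect. 2.5.2 may be defined differently. The function name and the input series are hypothetical.

```python
def modulation(heatwave, phase, target_phase):
    """Relative change in heatwave frequency when the mode is in target_phase.

    Assumed form (not necessarily the paper's M):
        (P(heatwave | phase) - P(heatwave)) / P(heatwave)
    heatwave: daily 0/1 flags; phase: daily phase index (0 = inactive).
    Returns None if the phase never occurs or heatwaves never occur.
    """
    climatology = sum(heatwave) / len(heatwave)
    in_phase = [h for h, p in zip(heatwave, phase) if p == target_phase]
    if not in_phase or climatology == 0:
        return None
    conditional = sum(in_phase) / len(in_phase)
    return (conditional - climatology) / climatology


# Hypothetical series: heatwaves cluster in phase 2 and are absent in phase 6.
heatwave = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
phase = [2, 2, 2, 2, 6, 6, 6, 6, 0, 0]

m_suppressed = modulation(heatwave, phase, target_phase=2)
m_enhanced = modulation(heatwave, phase, target_phase=6)
```

With this assumed form, positive values indicate phases favourable to heatwaves and negative values indicate phases that obstruct them. Applying the same function to forecast heatwaves and forecast phases would give the model counterpart to be compared with the observed one.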

Heatwave prediction skill in active versus inactive phases of the modes
The previous two sections have shown that ENS-ext meets the two necessary conditions (according to Marshall et al. 2014) to be able to draw heatwave predictability from tropical modes, especially from the MJO and ER wave (much less so from the EK wave). This section addresses whether there is indeed an enhancement of prediction skill associated with the activity of tropical modes during the AMJ season. This is done by stratifying the forecast (not the observation) into active versus inactive phases, as described in Sect. 2.5.2, and assessing the SEDI differences between the two subsets. Out of the three modes of variability, the MJO is the largest source of prediction skill. For T-night, HI-night and HI-day, the MJO-related skill reaches values of 0.4, mainly over the central Sahel (Mali, Burkina Faso and western Niger) and extends out to week 2-3 of the forecast (Fig. 7). For T-day, the skill is mainly observed over the eastern Sahel. The main phases responsible for the positive SEDI differences are phases 3 and 4 (Fig. SM4). For the ER wave, the improvement of skill, limited to 0.3, is mostly found over the eastern (western) Sahel for T-night, HI-night and HI-day (T-day) at week 1 (Fig. 8) and comes essentially from phases 7 and 8 (Fig. SM5). At longer lead-times, the ER-related skill is relatively marginal, apart from T-day and HI-day which show some skill over the central Sahel (Burkina Faso and western Niger) at week 2-3 (Fig. 8). As for the EK wave, the skill, analysed only for week 1 of the forecast (beyond which the model cannot predict it well, Sect. 3.3.1), originates mostly from phase 3 and is generally not much in excess of 0.1 (Fig. SM6). These results therefore show that the MJO, the ER wave and, to a lesser extent, the EK wave provide predictability to Sahelian heatwaves. This implies that heatwave predictions are more reliable when an intense activity of tropical modes is also (skilfully) forecast.
Such a conclusion is especially interesting for operational forecasters in the region. They can indeed rely on the local activity of tropical modes to estimate the confidence levels of their heatwave warnings.
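The stratification can be sketched as below. The SEDI expression is the standard one in terms of the hit rate H and the false alarm rate F; the helper names, the way cases are encoded and the counts in the example are hypothetical and chosen only to illustrate a positive SEDI difference between active and inactive forecast phases.

```python
import math


def sedi(hits, misses, false_alarms, correct_negatives):
    """Symmetric Extremal Dependence Index from a 2x2 contingency table.

    Undefined when the hit rate or the false alarm rate equals 0 or 1.
    """
    h = hits / (hits + misses)                             # hit rate H
    f = false_alarms / (false_alarms + correct_negatives)  # false alarm rate F
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return num / den


def contingency(pairs):
    """(hits, misses, false alarms, correct negatives) from
    (forecast_heatwave, observed_heatwave) pairs of 0/1 flags."""
    hits = sum(1 for f, o in pairs if f and o)
    misses = sum(1 for f, o in pairs if not f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    correct_negatives = sum(1 for f, o in pairs if not f and not o)
    return hits, misses, false_alarms, correct_negatives


def sedi_difference(cases):
    """SEDI(active) - SEDI(inactive).

    cases: (forecast_heatwave, observed_heatwave, mode_active) tuples, where
    mode_active is taken from the forecast, not from the observation.
    """
    active = [(f, o) for f, o, m in cases if m]
    inactive = [(f, o) for f, o, m in cases if not m]
    return sedi(*contingency(active)) - sedi(*contingency(inactive))


def make_cases(hits, misses, false_alarms, correct_negatives, mode_active):
    """Expand hypothetical contingency counts into individual cases."""
    return ([(1, 1, mode_active)] * hits
            + [(0, 1, mode_active)] * misses
            + [(1, 0, mode_active)] * false_alarms
            + [(0, 0, mode_active)] * correct_negatives)


# Hypothetical counts for one grid-cell and one lead-week.
cases = (make_cases(30, 20, 10, 140, True)
         + make_cases(20, 30, 15, 135, False))
gain = sedi_difference(cases)
```

A positive `gain` corresponds to a higher heatwave prediction skill when the mode is forecast to be active. Stratifying on the forecast rather than the observed phase keeps the diagnostic usable in real time, since only forecast information is available when a warning is issued.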

Case study of a tropical mode-driven heatwave over Burkina Faso
In this section, the detailed analysis of the prediction of a heatwave event over Burkina Faso, in the central Sahel, by ENS-ext is undertaken with the objective of assessing, in a real case, how the activity of tropical modes can impact the skill of the model. The choice of this event is justified mainly by the fact that it was physically favoured by tropical modes, and also because of its relatively large spatial extent.

Description of the heatwave and thermodynamic conditions
The heatwave event under scrutiny took place mainly in Burkina Faso between 27 May and 02 June 2015. Figure 9a, b show the spatial distribution and the length of the event across the country. Both daytime and nighttime were affected (which is unusual in the Sahel; Guigma et al. 2020a) over the whole country. It should be noted that the event was less marked in HI-day and HI-night than in T-day and T-night (not shown).

The analysis of some thermodynamic variables over the heatwave period reveals that the daytime event was chiefly shaped by a strong sensible heat flux from the ground towards the atmosphere (a magnitude above 40 W m−2 in some areas; Fig. 9c) which was anomalously dry (not shown), an increased incoming solar radiation in the south of the country (Fig. 9e) and heat advection in the north (Fig. 9g). At night, the heat resulted mainly from longwave radiation emitted by the ground (Fig. 9f), which had been overheated during the day (relatively cool air was however advected, reducing the heat load, Fig. 9h).

Fig. 9
a, b Show the number of heatwave days sampled by T-day and T-night respectively. c-f Show the average anomalies of sensible heat flux (net radiation) at the surface in W m−2 for daytime and nighttime respectively. They are conventionally counted positively when oriented from the atmosphere towards the surface. g, h Display the average anomalies of heat advection at the 925 hPa pressure level superimposed with wind anomalies at the same level respectively for daytime and nighttime

Evolution of tropical modes during the event
The increase of incoming solar radiation during the day and longwave loss during the night were favoured by large-scale conditions that suppressed convection over the region. To find out the origins of this convective inhibition, the time-longitude diagram of the EM mode-filtered OLR is shown in Fig. 10. It is apparent that an ER wave originating from the Indian Ocean was the main mode suppressing convection over the domain surrounding Burkina Faso (green lines in Fig. 10) during the heatwave period. In addition, the initial and last days of the heatwave were affected by EK waves in convectively suppressed phases, which also promoted the heating. A possible contribution from the MJO is ruled out since it was instead in a convectively enhanced phase (not shown).

Skill of the model over the heatwave period
The average anomalies of T-day and T-night over the heatwave period in the ENS-ext EM forecasts and in the ERA5 analysis, as well as the average anomalies in ERA5 over the week preceding the heatwave (persistence) are shown in Fig. 11. The first thing to note is the relatively good spatial coherence between the forecasts at different lead-times and the observation, valid for both T-day and T-night. The model was therefore able to predict the anomalously hot conditions that prevailed over Burkina Faso between 27 May and 02 June 2015, even at the longest lead-times. Moreover, on the last two initialisations before the event, the model beat persistence, although for T-day there is a slight overestimation of the magnitude of the anomalies. Two forecasts, namely those initialised at lead-times 31 and 10 days to the onset, are however less accurate than the rest, especially in comparison with forecasts initialised at longer lead-times. Figure 12 shows the heatwave forecast probabilities at different lead-times for T-day and T-night. The signal of the heatwave was already perceptible at lead-time 24 days to the onset (i.e. longer than three weeks in advance) with at least one individual member predicting the event over the vast majority of the country, consistently in both T-day and T-night (note that the climatological forecast probability is below 0.1 over the heatwave period; not shown). The forecast probabilities increased on the following initialisation dates to eventually reach 50% three days prior to the onset. However, as with the index anomalies, some initialisation days "lost" the heatwave signal in the run-up. Thus, forecast probabilities at lead-times 17 and 10 days to the onset are lower than those at the respective longer lead-times.
To understand the weakening of the forecast probabilities at these dates, the EM forecast of tropical mode activity is examined, knowing that the heatwave was associated with a convectively suppressed ER wave (Sect. 3.4.2). Figure 13 thus shows observed and EM predicted ER wave-filtered OLR, starting from lead-time 24 days to the onset where the heatwave was first significantly predicted. It is apparent that at lead-times 17 and 10 days to the onset, the forecast of the ER wave activity over Burkina Faso was less accurate than at other lead-times. While the entire country was under the influence of a convectively suppressed phase of the ER wave during the heatwave period, at lead-times 17 and 10 days, the model was predicting a convectively enhanced phase across at least half of the country. An inaccurate forecast of the physical driver therefore went along with a less accurate forecast of the heatwave itself, and vice versa.
Previous studies have already highlighted similar cases where the misrepresentation of subseasonal variability by models also caused misses in heatwave forecasts (e.g. Qi and Yang 2019; Hsu et al. 2020). As a result, improving the skill of prediction of tropical modes in models could also be beneficial for heatwave prediction in the Sahel as well as in other regions.
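The forecast probabilities discussed in this case study can be illustrated with the short sketch below, which assumes that the probability at a grid-cell is simply the percentage of ensemble members forecasting a heatwave. The function name, the ensemble size of ten and the member flags are hypothetical.

```python
def forecast_probability(member_flags):
    """Percentage of ensemble members forecasting a heatwave (0/1 flags)."""
    return 100.0 * sum(member_flags) / len(member_flags)


# Hypothetical ten-member ensembles at two start dates for one grid-cell.
long_lead = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # a single member shows the event
short_lead = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # half of the members show it

p_long = forecast_probability(long_lead)
p_short = forecast_probability(short_lead)
```

Under this assumption, a single member predicting the event already lifts the probability to the level of the climatological forecast probability or above, which is why an early signal carried by a few members can still be informative.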

Conclusion
The ECMWF ENS extended-range forecasting system shows significant skill for heatwave prediction across most parts of the Sahel in the first two to three weeks of the forecast. The AMJ season has a longer lead-time predictability than the FM season. Likewise, nighttime heatwaves are better predicted at longer lead-times than their daytime counterparts. This study has also demonstrated that atmospheric tropical modes of variability, mostly the MJO and ER waves, are effective sources of skill for heatwave prediction in the Sahel. The forecast skill is indeed higher when they are active in the region than when they are weak. The case study of the prediction of a heatwave event driven by tropical modes in 2015 over Burkina Faso further illustrated this, by showing that the forecasts of heatwaves are more skilful when those of the tropical modes are accurate. Information on the predicted activity of tropical modes can thus be useful to forecasters in their heatwave warnings.
In addition, as already highlighted by Guigma et al. (2020b), a more accurate simulation of tropical modes will have a positive repercussion on heatwave prediction in the region. This will likely improve the current skill and extend it to longer lead-times, thus winning more time and precision for preparedness actions. In this context and given the connection between convection and tropical modes, convection-permitting models can play an important role as they reduce model errors, and likely offer a better representation of tropical modes (Judt 2020). It has indeed been shown that the parameterisation of moist convective processes and their links to the large-scale flow is an important source of errors in the tropics (Dias et al. 2018).

Fig. 12
T-day and T-night average heatwave forecast probabilities (in %) over the 27 May-02 June 2015 period at different start dates. The numbers between brackets indicate the lead-times in days from the forecast start dates to the onset and cessation of the heatwave

But even with the current level of predictability, there is a potential for HEWSs. With a predictability of two to three weeks, there is indeed a range of actions that can be triggered in advance (e.g. Matthies and Menne 2009; Lowe et al. 2016; Nissan et al. 2017). As evidenced in other regions of the globe, many socio-economic sectors (especially public health) can benefit from such systems (e.g. Knowlton et al. 2014). The scaling up of HEWSs actually emerges as a pressing necessity given the future projections of global warming (Xu et al. 2020; Raymond et al. 2020) and could therefore serve as an efficient tool to mitigate its adverse effects. Furthermore, since the predictability can be extended when the verification criteria are relaxed, low-cost preparedness actions can be taken at even longer lead-times, following the "Ready-Set-Go!" approach of the Red Cross.
However, to get the most out of such systems, it is important to have a clear understanding of how the heat hazard affects populations (e.g. WMO N°1142; Casanueva et al. 2019). This includes identifying the most affected social groups, the most lethal heat thresholds, the most relevant thermal indices, the most recurrent heat-related illnesses in the region etc. Such a research area is still in its infancy in the Sahel and should therefore receive more attention now that the potential for anticipatory action is evidenced. Furthermore, the investigations can extend to other sectors like energy and water management which are heat-sensitive in this semi-arid region. This will allow a holistic approach to the heat issue and contribute to saving many lives and protecting livelihoods in the Sahel.
Acknowledgements Two anonymous reviewers helped to improve the manuscript through insightful comments. KHG was supported by the Peter Carpenter Scholarship for African Climate Science at the University of Sussex, UK. Further support was provided through the (1) UK NERC/ESRC/DfID Science for Humanitarian Emergencies and Resilience (SHEAR) consortium project 'Towards Forecast-based Preparedness Action' (ForPAc, http://www.forpac.org), grant number NE/P000673/1 and (2) Future Climate for Africa (FCFA) regional consortium project 'AMMA-2050', grant number NE/M02024X/1.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.