1 Introduction

Africa, the second-largest continent on Earth and the one with the fastest population growth, is among the regions most vulnerable to weather and climate variability. Future climate change and low adaptive capacity are likely to lead to even more severe impacts on many vital sectors (Niang et al. 2014). Africa was selected as the first target region for the Coordinated Regional-climate Downscaling Experiment (CORDEX; Giorgi and Gutowski 2015) of the World Climate Research Programme, which aims to generate high-resolution climate projections relevant to applications at the regional scale. To this end, regional climate models (RCMs) are used to dynamically downscale the global climate models (GCMs) participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5, Taylor et al. 2012).

RCMs cannot improve on the skill with which the driving GCMs simulate large-scale fields. However, despite some critiques of dynamical downscaling (e.g. Pielke and Wilby 2012), RCMs are particularly useful where GCMs are unable to resolve small-scale features and local drivers of the regional climate (e.g. convection), in regions with highly heterogeneous land cover or topography (e.g. Giorgi 2019) and near inland lakes (Williams et al. 2015). Several works investigated CORDEX-Africa reanalysis-driven simulations (e.g. Nikulin et al. 2012; Panitz et al. 2014; Kim et al. 2014; Akinsanola et al. 2015; Shongwe et al. 2015; Sarr et al. 2015; Klutse et al. 2016; Favre et al. 2016; Kisembe et al. 2019; Careto et al. 2018; Warnatzsch and Reay 2019; Tamoffo et al. 2021), as well as GCM-driven ones (e.g. Laprise et al. 2013; Teichmann et al. 2013; Endris et al. 2016, 2019; Dosio 2017; Pinto et al. 2018; Dosio et al. 2019, 2020, 2021b; Tamoffo et al. 2019; Ayugi et al. 2020a, b; Mengistu et al. 2021). In particular, the ability of RCMs to add value (i.e. non-negligible fine-scale information that is absent in the lower resolution simulations) to the driving GCM in simulating precipitation characteristics (especially higher order statistics) has been shown in several studies (e.g. Dosio et al. 2015; Pinto et al. 2016; Nikiema et al. 2017; Fotso-Nguemo et al. 2017; Gibba et al. 2019; Tamoffo et al. 2020).

The errors inherited from the driving GCM, together with those introduced by the RCM itself through model errors and parameterizations, may leave large errors in the downscaled climate (e.g. Dosio et al. 2015). Therefore, before being used as input for process-based impact models, RCM outputs are usually post-processed to reduce their systematic biases (i.e. the errors compared to observations over a reference period) in a process known as bias-adjustment (Maraun et al. 2010, 2015, 2017; Piani and Haerter 2012; Chen et al. 2013a, 2013b; Guo et al. 2019).

Impact models are significantly sensitive to the frequency of extreme events, i.e. the tails of the probability distribution function (PDF) of the climatic variable (e.g. temperature or precipitation). Therefore, it is important to understand the impact of bias-adjustment not only on the mean climate signal but also on indices of extreme events relevant to impact assessment (Dosio 2016).

Compared to other regions, such as Europe, where bias-adjustment has been applied to impact assessment in several sectors for more than a decade (Dosio and Paruolo 2011; Dosio et al. 2012; Rojas et al. 2011, 2012; Lung et al. 2013; Migliavacca et al. 2013a,b; Gabaldón-Leal et al. 2015; Ruiz-Ramos et al. 2016; Barredo et al. 2016; Sakalli et al. 2017; Macias et al. 2018; Spinoni et al. 2018a,b; Galmarini et al. 2019), the application of bias-adjustment methods to climate projections over Africa is more recent (partly because CORDEX-Africa simulations became available only relatively recently) and somewhat less exhaustive.

Oyerinde et al. (2017) applied a quantile mapping bias-adjustment to a single CORDEX RCM and used the outputs for hydrological impact studies over the Niger basin, West Africa. Famien et al. (2018) presented a dataset of bias-adjusted CMIP5 GCM data over West Africa and applied it to crop yield simulations, assessing the effect of different reference datasets and reporting that the effect can be large, especially on surface downwelling shortwave radiation. Mbaye et al. (2018) assessed the potential impact of bias-adjustment on extreme precipitation and temperature of the REMO RCM over the Senegal River Basin; bias-adjustment mainly affected the magnitude of the climate change signal, which was lower in the bias-adjusted data than in the uncorrected data. Ayugi et al. (2020a, b) applied a quantile mapping bias-adjustment method to five CORDEX simulations over Kenya. They showed that most of the models (but not all) exhibit reasonable improvement after correction at seasonal and annual timescales.

Chapman et al. (2020) compared the CMIP5 bias-adjusted results of Famien et al. (2018) to those of bias-adjusted simulations from CORDEX and a convection-resolving model and applied the results to crop yield simulations over sub-Saharan Africa. Adeyeri et al. (2020) investigated the performance of univariate empirical and parametric quantile mapping, as well as multivariate bias-adjustment techniques, when applied to CORDEX temperature projections over West Africa. Worku et al. (2020) evaluated bias-adjustment methods applied to the CCLM and REMO RCMs over part of Ethiopia and found that the distribution mapping method was better at capturing the 90th percentile of observed rainfall and temperature and the wet-day probability of observed rainfall. Finally, Manzanas et al. (2020) compared bias-adjustment to statistical downscaling techniques applied to GCMs over Malawi. They showed that statistical downscaling results are highly sensitive to factors such as the required spatial resolution, the choice of the predictors, and uncertainty in the reference dataset, which can lead to unrealistic projections. They claim that in these conditions, bias-adjustment may lead to more plausible (i.e. more consistent with the climate model) projections.

However, the large majority of these studies use only a small number of models, focus on a relatively small region or lack a direct comparison between bias-adjusted and original results.

Here, we present a dataset of daily, bias-adjusted temperature and precipitation projections for continental Africa based on the large CORDEX ensemble, which can be useful for studies of the impact of climate change on several sectors (e.g. Warnatzsch and Reay 2019; Rohat et al. 2019; Sawadogo et al. 2020; Oguntunde et al. 2020; Laux et al. 2021; see also https://www.csag.uct.ac.za/cordex-africa/cordex-africa-analysis-phase-2/cordex-via-wshop1-report/). We provide guidance on the benefits and caveats of using the dataset by investigating the effect of bias-adjustment on impact-relevant indices (both their future absolute value and their change compared to present climate). Although bias-adjusted projections can be helpful for planning adaptation measures against, e.g. heat-related hazards, they should not be used uncritically, i.e. without acknowledging the caveats of the underlying adjustment technique and climate models. Finally, we provide a methodology to select a small subset of simulations that preserves the overall uncertainty in the future projections of the large model ensemble. This subset can be useful in practical applications when process-based impact models are too expensive to be run with the full ensemble of model simulations.

The paper is structured as follows: Section 2 describes the data and method used in the analysis. Section 3 discusses the results for both the recent past and future climate, focusing on the impact of bias-adjustment on several characteristics of temperature and precipitation, including extremes. Summary and concluding remarks are presented in Section 4.

2 Data and method

2.1 Model simulations

Daily mean, minimum and maximum temperature, and precipitation data for the period 1981–2100 were obtained from a large ensemble of models listed in Supplementary Information Table 1. Seven different RCMs were used to downscale the results of ten CMIP5 GCMs for a total of 24 simulations. All RCMs were run over the same numerical domain covering continental Africa at a resolution of 0.44° following the CORDEX protocol (http://www.cordex.org/wp-content/uploads/2017/10/cordex_general_instructions.pdf). Historical simulations, forced by observed natural and anthropogenic atmospheric composition, cover the period until 2005, whereas projections (2006–2100) are forced by the Representative Concentration Pathway 8.5 (RCP8.5, Van Vuuren et al. 2011).

Here, we focus only on projections at the end of the century (2071–2100) for the high emission scenario RCP8.5, because at shorter time scales or lower emission scenarios, internal variability can delay the emergence of the anthropogenic signal in precipitation changes, especially at regional scale (Doblas-Reyes et al. 2021).

2.2 Observational dataset

As discussed by many studies (Ayugi et al. 2019; Hua et al. 2019; Dinku et al. 2018; Gebrechorkos et al. 2019; Maidment et al. 2015; Nicholson et al. 2018; Harrison et al. 2019; Dosio et al. 2021a), estimates of the observed recent climatology are very uncertain over Africa especially for regions where gauge networks are sparse (e.g. central Africa). Several discrepancies exist across datasets based on gauge station, reanalysis and satellite products, resulting in large uncertainties in the annual precipitation cycle in particular over the Ethiopian Highlands, the eastern Sahel, the coasts of the Gulf of Guinea, central Africa and the Horn of Africa (Fig. 1). As a consequence, selecting a dataset to use as reference for the bias-adjustment is not straightforward.

Fig. 1

Annual cycle of daily precipitation (mm/day) as simulated by the original CORDEX-Africa models. Simulations are grouped by colour according to the RCM. For comparison, the WFDE5 data are shown (black thick line) as well as the range of a large ensemble of observational datasets including gauge, reanalysis and satellite-based products (Dosio et al. 2021a). The figure also shows the sub-regions used for the analysis. They are referred to as Atlas Region (ATL), West Sahel (SAH_W), East Sahel (SAH_E), Ethiopian Highlands (ETP_H), coast of the Gulf of Guinea (GN_C), Northern Central Africa (CAF_N), Southern Central Africa (CAF_S), Horn of Africa (HRN), East Africa (EAF), Western South Africa (SAF_W) and Eastern South Africa (SAF_E)

In this study, we employ the WFDE5 dataset (Cucchi et al. 2020), which is based on ERA5 reanalysis bias-adjusted using CRU TS4.03 (Harris et al. 2020) and GPCCv2018 (Ziese et al. 2018) as described in Weedon et al. (2011). Data is available on a 0.5° resolution grid at hourly time step for the period 1979–2018.

The WFDE5 dataset has been evaluated against FLUXNET2015 sites and has been used as input to an uncalibrated hydrological model for several river basins, resulting in improvements compared to unaltered ERA5 forcing (Cucchi et al. 2020). When compared to other observational datasets over Africa, WFDE5 normally lies well within the observational range (Fig. 1). These factors, in addition to a spatial resolution comparable to that of the RCM runs (which is recommended when performing bias-adjustment, see, e.g. Maraun et al. 2017), make WFDE5 a suitable choice for the application of the bias-adjustment to CORDEX-Africa runs.

2.3 Bias-adjustment method

The bias-adjustment method employed in this work, developed by Lange (2019), is a trend-preserving, parametric quantile mapping designed to robustly adjust biases in all percentiles. Briefly, the method first generates pseudo future observations by transferring, for each quantile, the simulated climate change signal to the historical observations; it then uses these pseudo future observations as the reference for correcting future model simulations with parametric quantile mapping. Any trend in daily mean temperature is removed before and restored after these two steps. The method adjusts the wet-day frequency and does not apply any specific correction for extremes, while preserving the trends in all percentiles. The method has been compared to several other bias-adjustment techniques in Casanueva et al. (2020), showing overall satisfactory performance for indices of mean and extreme temperature and precipitation.
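The two steps described above can be illustrated with a deliberately simplified sketch. It is not the implementation of Lange (2019): it uses empirical rather than parametric quantiles, an additive change signal, and omits the detrending and wet-day frequency adjustment. The function names and the synthetic temperature samples are assumptions made only for illustration.

```python
import numpy as np

def empirical_ranks(x):
    """Non-exceedance probability of each value within its own sample."""
    return (np.argsort(np.argsort(x)) + 0.5) / x.size

def pseudo_future_obs(obs_hist, sim_hist, sim_fut):
    """Step 1: add the simulated change of each quantile to the observations."""
    r = empirical_ranks(obs_hist)
    delta = np.quantile(sim_fut, r) - np.quantile(sim_hist, r)
    return obs_hist + delta

def quantile_map(sim_fut, reference):
    """Step 2: replace each simulated value by the reference value of equal rank."""
    return np.quantile(reference, empirical_ranks(sim_fut))

# Synthetic daily TX samples (invented): 30 seasons of 92 days each.
rng = np.random.default_rng(0)
obs_hist = rng.normal(27.0, 1.5, 2760)
sim_hist = rng.normal(25.5, 2.5, 2760)  # model is too cold and too variable
sim_fut = rng.normal(29.5, 2.5, 2760)   # roughly 4 degC of simulated warming

ref_fut = pseudo_future_obs(obs_hist, sim_hist, sim_fut)
adj_fut = quantile_map(sim_fut, ref_fut)

print("simulated mean change:", sim_fut.mean() - sim_hist.mean())
print("adjusted mean change: ", adj_fut.mean() - obs_hist.mean())
```

In this toy setting the adjusted series keeps the simulated warming while taking its spread from the observations, which is the sense in which such a mapping is trend preserving.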

Here, we use 1981–2010 as a reference period for the bias-adjustment.

2.4 Indices of temperature and precipitation

SI Table 2 lists the indices of precipitation and temperature analysed. Some of these indices are standard ETCCDI indices (Zhang et al. 2011), such as the number of rainy days (RR1) or the number of consecutive dry days (CDD). As bias-adjustment does not affect percentile-based indices (such as the number of days where maximum temperature is higher than the 90th percentile), here, we focus only on threshold-based indices, which can be highly affected by bias-adjustment (Dosio 2016).

In particular, we analyse the number of days with maximum temperature (TX) higher than 25 °C and those with minimum temperature (TN) higher than 20 °C (i.e. the standard ETCCDI indices SU, “summer days” and TR, “tropical nights”, here renamed SU25 and TR20, respectively), but also those with TX > 35 °C (named SU35), and with TN > 30 °C (TR30), which can be relevant in some African regions, especially in view of projected warming. We also analyse indices related to the length of warm and hot spells, both during daytime (i.e. the number of consecutive days with TX > 25 °C and 35 °C, namely CSU25 and CSU35) and night time (i.e. the number of consecutive days with TN > 20 °C and 30 °C, namely CTR20 and CTR30), which may be relevant for health related impacts.
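As a minimal sketch of how such threshold-based indices can be computed from a daily series, the snippet below counts days above a threshold and finds the longest spell. It assumes that the spell indices (CSU, CTR) are the longest run of consecutive days within the season; SI Table 2 gives the exact definitions. The function names and the temperature values are invented for illustration.

```python
import numpy as np

def days_above(x, threshold):
    """Number of days strictly above the threshold (e.g. SU25 for TX, 25 degC)."""
    return int(np.sum(np.asarray(x) > threshold))

def longest_spell(x, threshold):
    """Longest run of consecutive days strictly above the threshold."""
    longest = current = 0
    for hot in np.asarray(x) > threshold:
        current = current + 1 if hot else 0
        longest = max(longest, current)
    return longest

# Invented daily maximum temperatures (degC) for eight days.
tx = np.array([24.0, 26.5, 27.0, 36.2, 35.5, 24.9, 25.0, 30.1])

su25, su35 = days_above(tx, 25.0), days_above(tx, 35.0)
csu25, csu35 = longest_spell(tx, 25.0), longest_spell(tx, 35.0)
print(su25, su35, csu25, csu35)  # 5 2 4 2
```

The same two functions applied to TN with thresholds of 20 °C and 30 °C would give TR20, TR30, CTR20 and CTR30.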

2.5 Definition of subregions

Although a new set of reference regions has been developed for the Intergovernmental Panel on Climate Change (IPCC) 6th Assessment Report (AR6) (Iturbide et al. 2020), here, we use a different set of regions (Fig. 1). In certain areas, the IPCC macro-regions are too large for analysing precipitation characteristics, because precipitation exhibits substantial spatial and temporal heterogeneity (compare, e.g. the different precipitation annual cycle between north and south central Africa, or between the coast of the Gulf of Guinea and the Western Sahel, Fig. 1) or because of specific geographical characteristics (e.g. the Ethiopian Highlands). Additionally, even between contiguous subregions, substantial differences in observed (Dosio et al. 2021a) and projected (Dosio et al. 2021b) precipitation indices exist.

3 Results

As stated previously, CORDEX-Africa RCMs have been extensively evaluated in the past. Therefore, here, we only briefly discuss the performances of the CORDEX models on the reference period, and we mainly focus on the comparison between original and bias-adjusted indices and how the bias-adjustment alters their future changes.

3.1 Performances of temperature indices

Figure 2 shows the comparison between observed and modelled June–July–August (JJA) seasonal mean (SM) temperature for the reference period (1981–2010). The geographical distribution of the multi-model ensemble mean agrees satisfactorily with WFDE5. The original (i.e. non-bias-adjusted) CORDEX simulations show, on average, remarkably low biases, especially over central Africa, the Horn of Africa and western southern Africa, whereas over the other regions, the ensemble mean biases remain consistently smaller than 1.5 °C. However, large discrepancies exist amongst individual simulations, with the model spread ranging between 3.5 °C over GN_C and 7 °C over SAH_E (see also Dosio 2017). Interannual variability, calculated as the standard deviation of the detrended temperature time series, is also usually consistent with the observed one, although its median is generally slightly overestimated.
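The interannual variability metric used above can be sketched as follows. The text does not state how the series is detrended, so a linear least-squares trend is assumed here; the function name and the synthetic series are hypothetical.

```python
import numpy as np

def interannual_variability(years, seasonal_mean):
    """Standard deviation of the seasonal-mean series after removing a linear trend."""
    t = np.asarray(years, dtype=float)
    t = t - t.mean()  # centre the time axis for a well-conditioned fit
    y = np.asarray(seasonal_mean, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    residual = y - (slope * t + intercept)
    return residual.std(ddof=1)

# Synthetic JJA means (invented): a warming trend plus alternating anomalies.
years = np.arange(1981, 2011)
anomalies = 0.5 * (-1.0) ** np.arange(years.size)
series = 26.0 + 0.03 * (years - 1981) + anomalies

iav = interannual_variability(years, series)
print("raw std:", series.std(ddof=1), "detrended std:", iav)
```

Without detrending, the warming trend itself would inflate the standard deviation, which is why the raw value printed above is larger than the detrended one.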

Fig. 2

Time series of observed (WFDE5) and modelled seasonal (JJA) mean temperature (°C) over the subregions. The geographical distribution of the long term mean (1981–2010) from WFDE5 and the multi model mean of the original RCMs is also shown. For each subregion, the temperature interannual variability (calculated as standard deviation of the detrended time series) is also shown; the black X denotes the WFDE5 value, whereas the box and whiskers plots represent the median, interquartile and full range of the original (blue) and bias-adjusted (red) CORDEX ensemble

By construction, bias-adjusted results show values that are very close to the reference dataset. Intermodel spread is largely reduced, and interannual variability is closer to the observed one.

The bias of the original CORDEX ensemble for a range of temperature indices in JJA is shown in SI Fig. 1. As discussed previously, mean temperature is usually underestimated, in median, by the CORDEX ensemble (apart from CAF_S), whereas for other indices, the sign and magnitude of the bias depend on the region (and season), although in general, most indices are underestimated. There are cases, however, where the sign of the bias changes drastically depending on the index; for instance, over CAF_N (Fig. 3a), TXx, TNx and SU35 are overestimated, but TXn, TNn and SU25 are underestimated. This can be explained by analysing the empirical probability distribution function (PDF) and cumulative distribution function (CDF) of maximum daily temperature (Fig. 3b and c), where the results of a single model (REMO2009 driven by MPI-ESM-LR) are shown as an illustrative example (the same reasoning holds for all models with similarly shaped PDFs). Although mean TX is generally well simulated by the model, the modelled PDF is wider than the observed one, which results in the underestimation of the lower end (TXn) and overestimation of the upper end (TXx) of the PDF. In addition, although the number of days with TX = 25 °C is overestimated (Fig. 3b), SU25 (i.e. the number of days with TX > 25 °C) is underestimated (Fig. 3c), and the opposite is true for SU35. The analysis of the CDF can also help to understand the effect of bias-adjustment on the change in temperature indices (Fig. 4). Although the changes in mean, minimum and maximum values are generally well conserved by the bias-adjustment, the effect on threshold-based indices can be substantial. For instance, original models show, in median, an increase of around 10 days in SU25 for CAF_N in JJA, although individual models can show an increase of up to 50 days. However, this index is largely underestimated in the reference period, where observations show that more than 95% of days in JJA (i.e. around 87 days) already have a maximum temperature higher than 25 °C (Fig. 3b); this means that the maximum increase in SU25 in JJA is bounded to about 5 days, making the original, non-adjusted projections unrealistic (see also the discussion in Dosio 2016).
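The bound on the SU25 increase can be reproduced with a small idealised calculation. The sketch below assumes normally distributed TX and uses invented parameters (a mean of 28 °C, standard deviations of 1.8 °C for the observation-like case and 3.5 °C for a model with too wide a PDF, and 3 °C of warming); none of these values comes from the paper, and the function name is hypothetical.

```python
from math import erf, sqrt

N_DAYS_JJA = 92  # days in June, July and August

def expected_su(mu, sigma, threshold, n_days=N_DAYS_JJA):
    """Expected number of days above threshold for normally distributed TX."""
    frac_above = 0.5 * (1.0 - erf((threshold - mu) / (sigma * sqrt(2.0))))
    return n_days * frac_above

warming = 3.0  # invented uniform shift of the distribution, degC

# Observation-like (bias-adjusted) case: narrow PDF, about 95 % of days above 25 degC.
su25_adj_hist = expected_su(28.0, 1.8, 25.0)
change_adj = expected_su(28.0 + warming, 1.8, 25.0) - su25_adj_hist

# Original model: similar mean but too wide a PDF, so SU25 is underestimated.
su25_orig_hist = expected_su(28.0, 3.5, 25.0)
change_orig = expected_su(28.0 + warming, 3.5, 25.0) - su25_orig_hist

print(f"reference SU25: adjusted {su25_adj_hist:.1f}, original {su25_orig_hist:.1f}")
print(f"change in SU25: adjusted {change_adj:.1f}, original {change_orig:.1f}")
print(f"upper bound on adjusted change: {N_DAYS_JJA - su25_adj_hist:.1f}")
```

With the same warming, the too-wide distribution yields a much larger increase in SU25 than is physically possible once the reference value is already close to the 92 days of the season.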

Fig. 3

a Bias over 1981–2010 with respect to WFDE5 of the original CORDEX simulations for the temperature ETCCDI indices over CAF_N in JJA. For each index, the box and whiskers plots show the median, interquartile and full range of the CORDEX ensemble. Red boxes indicate that the bias is, in median, positive; blue, negative. The vertical line separates the indices according to the scale (left and right vertical axes); units depend on the index. b Empirical probability distribution function (PDF) and c cumulative distribution function (CDF) of TX over CAF_N in JJA for REMO2009-MPI-ESM-LR. The black continuous line denotes the observed (WFDE5) values over 1981–2010. Blue and red lines represent the original and bias-adjusted results, respectively, for the reference (continuous line) and future (2071–2100, dashed-dotted line) periods. In the CDF panel, the arrows denote the values of SU25 for the reference period and its change (future minus present). Note that over the reference period, the bias-adjusted value overlaps the WFDE5 one

Fig. 4

Projected change at the end of the century (2071–2100 – 1981–2010) of temperature indices over the subregions in JJA. For each index, the box and whiskers plots show the median, interquartile and full range of the original (red) and bias adjusted (orange) RCMs ensembles. The vertical line separates the indices according to the scale (left and right vertical axes); units depend on the index (SI Table 2)

SU25 and TR30 are the indices showing the largest differences between the changes of the original and bias-adjusted indices (Fig. 4), due to the general underestimation of temperature by the original simulations discussed previously. For the same reason, bias-adjusted indices based on higher thresholds, like SU35, can show changes larger than those of the original simulations, such as over GN_C, CAF_N, CAF_S and HRN.

This results in large differences between original and bias-adjusted absolute values of these indices at the end of the century (Fig. 5), especially in December–January–February over the sub-equatorial regions for SU25 and CSU25, and over the area ranging from the West Sahel to the Horn of Africa (but also Namibia, Botswana and part of South Africa) for SU35 and CSU35. These results indicate that when biases over the present climate are accounted for, projected risks of hazards related to extreme temperature, such as heat waves, are higher than previously found, with possible consequences for the planning of adaptation measures (Rohat et al. 2019; Mbokodo et al. 2020; Harrington and Otto 2020).

Fig. 5

Projected maximum temperature indices at the end of the century (2071–2100) for the original and bias-adjusted CORDEX models. Results show the ensemble mean value in December–January–February (first and second column) and June–July–August (third and fourth column)

3.2 Precipitation indices

Figure 6 shows the bias over the reference period of precipitation indices over the African subregions during the wet season (i.e. season of the maximum daily precipitation amount, see Fig. 1).

Fig. 6

Bias over 1981–2010 of the CORDEX ensemble with respect to the WFDE5 dataset for a number of daily precipitation indices over the subregions. For each index, the box and whiskers plots show the median, interquartile and full range of the bias for the original (red/blue) and bias adjusted (light blue/orange) ensembles. Red/orange boxes indicate that the bias is, in median, positive; blue/light blue, negative. Results are shown for each region depending on the season of peak precipitation (see Fig. 1)

Model performance in simulating daily precipitation varies greatly; over SAF_E and SAF_W, the majority of the models tend to overestimate most indices (SDII, RR1, RX1day, CWD, with consequent underestimation of CDD), whereas over HRN nearly all models underestimate them (and overestimate CDD). Over SAH_W, SAH_E, GN_C, CAF_N, CAF_S and ETP_H, models tend to simulate too many rainy days (RR1) but to underestimate the mean daily rainfall intensity (SDII). Notably, RX1day shows the largest discrepancy between modelled and observed results, and the largest uncertainty amongst CORDEX simulations. This is particularly true for the western Sahel, where Dosio et al. (2021a) showed that half of the CORDEX results are outside the range of uncertainty of a large ensemble of observational products.

The performance of the CORDEX runs depends strongly on both the driving GCM and the downscaling RCM (e.g. Dosio et al. 2015 for CCLM; Tamoffo et al. 2019, 2020 for RCA); however, Dosio et al. (2019) showed that over most of Africa the intrinsic precipitation bias of the RCMs is scarcely affected by the lateral boundary conditions. This is particularly true for regions where the local and regional forcings (e.g. convection or topography) are the main drivers of precipitation.

The change in precipitation indices as projected by the original and bias-adjusted models is shown in Fig. 7. The difference between original and bias-adjusted results is shown in SI Fig. 2. The geographical distribution of seasonal mean daily precipitation is usually preserved by bias-adjustment; both original and bias-adjusted multi-model means show increasing precipitation over Angola, Tanzania, Kenya and Madagascar in DJF. In JJA, projections show a marked drying over Senegal, Sudan and the Democratic Republic of Congo, whereas increasing precipitation is projected over parts of Guinea, Liberia, Sierra Leone and Cote d’Ivoire, the eastern Sahel and parts of Cameroon and Gabon.

Fig. 7

Change (2071–2100 – 1981–2010) in precipitation indices for the ensemble means of the original (first and third column) and bias-adjusted CORDEX runs in DJF and JJA. Hatching denotes areas where seasonal mean precipitation at the end of the century (2071–2100) is smaller than 1 mm/day

The change in other indices is also usually preserved by bias-adjustment, although the change in RX1day over the Eastern Sahel in JJA and that in CDD over southern Africa in JJA are somewhat reduced by bias-adjustment (SI Fig. 2). It must be remembered that RX1day is the index for which the original simulations show the largest biases in present-day climate (Fig. 6). For CDD, the reduction by bias-adjustment can be explained with a reasoning similar to that for threshold-based temperature indices (e.g. SU25); original simulations tend to underestimate CDD over the reference period in JJA, which is a dry season for southern Africa (see Fig. 1). As the maximum value of CDD is bounded by the number of days in the season, this means that when the bias is adjusted, the future CDD may reach its maximum limit; as a result, the increase in CDD in the bias-adjusted simulations can be smaller than that of the original ones.
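The saturation of CDD described above can be illustrated with a toy example. It assumes the usual ETCCDI convention that a dry day has less than 1 mm of precipitation; the four daily series are entirely invented and only mimic an original model that underestimates CDD and a bias-adjusted one that is close to the seasonal limit.

```python
import numpy as np

def cdd(pr, wet_threshold=1.0):
    """Longest run of consecutive dry days (daily precipitation below 1 mm)."""
    longest = current = 0
    for dry in np.asarray(pr) < wet_threshold:
        current = current + 1 if dry else 0
        longest = max(longest, current)
    return longest

n_days = 92  # length of the JJA season

# Original model (invented): too many wet days in the reference period.
orig_hist = np.zeros(n_days)
orig_hist[29::30] = 2.0  # a 2 mm wet day on days 29, 59 and 89
orig_fut = np.zeros(n_days)
orig_fut[45] = 2.0       # a single wet day in the future period

# Bias-adjusted model (invented): close to an almost rain-free season.
adj_hist = np.zeros(n_days)
adj_hist[80] = 2.0
adj_fut = np.zeros(n_days)  # completely dry future season

change_orig = cdd(orig_fut) - cdd(orig_hist)  # 46 - 29 = 17
change_adj = cdd(adj_fut) - cdd(adj_hist)     # 92 - 80 = 12
print(change_orig, change_adj)
```

Although the adjusted future season is drier than the original one, its CDD cannot exceed 92 days, so the projected increase is smaller.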

Figure 8 shows the change in precipitation indices projected by the original and bias-adjusted simulations. Precipitation projections over Africa are often uncertain, i.e. models often do not agree on the sign of the mean precipitation change, especially over the Sahel and the Ethiopian Highlands in JJA and Central Africa in MAM. However, in some cases such as ATL in DJF (GN_C in SON and HRN in MAM), CORDEX models do agree in projecting a drier (wetter) future. CORDEX results are also generally consistent with those of global models, although RCMs usually project a drier future (see, e.g. Dosio et al. 2021b; Gutierrez et al. 2021).

Fig. 8

Projected change at the end of the century (2071–2100 – 1981–2010) of precipitation indices over the subregions during the season of maximum precipitation. For each index, the box and whiskers plots show the median, interquartile and full range of the original (red/blue) and bias-adjusted (orange/light blue) RCM ensembles. Red/orange boxes indicate that the change is, in median, positive; blue/light blue, negative. The vertical line separates the indices according to the scale (left and right vertical axes); units depend on the index (SI Table 2)

Results for other precipitation characteristics are more consistent. In general, the majority of models project an increase in mean and maximum precipitation intensity (SDII and Rx1day) during the wet season over all regions and a decrease in precipitation frequency (RR1). Depending on the season, the length of dry spells is projected to increase by most (if not all) models over southern Africa, the Ethiopian highlands and the Atlas region.

Bias-adjusted results are usually consistent with the original ones, with the median change preserved for most of the regions and indices. In the few cases where median results differ (e.g. SM over SAH_W, CA_F and SAF_E) projections are usually uncertain, with median values close to 0 and large values of the intermodel range.

The interquartile and full range of the original model projections is usually well preserved by bias-adjustment, with the exception of Rx1day, whose range is usually greatly reduced by the bias-adjustment. As explained, this is due to the usually poor simulation and extremely large model spread for this index over the reference period; when the bias is reduced, most models converge in projecting a similar change.

3.3 Selection of simulations for impact assessment studies

Planning effective adaptation options requires taking into consideration the full range of plausible future outcomes, especially in regions where projections are similarly distributed between, e.g. dry and wet futures (Dosio et al. 2020). However, impact models are usually too computationally demanding to be used with a large ensemble of climate projections such as the full CORDEX-Africa ensemble. Many methodologies exist to subsample an ensemble of models based on different criteria and techniques. Some authors propose a selection based on, e.g. model dependence and/or performance in the present climate (e.g. Lutz et al. 2016; Abramowitz and Bishop 2015; Brunner et al. 2019). However, Dosio et al. (2019) showed that in the CORDEX-Africa ensemble, future projections are not directly linked to present-day performance (e.g. a model that is dry in the present does not necessarily show a drying trend in the future). In addition, the CORDEX-Africa RCM–GCM matrix is still incomplete and very heterogeneous, which makes selection methodologies that rely on model dependence difficult to apply.

Other authors employed clustering approaches to select an objective subset of climate change scenarios (Herger et al. 2018; Casajus et al. 2016; Cannon 2015).

These methods are useful to select a subsample of simulations able to preserve the main characteristics and, most importantly, a large fraction of the full range of the change in future temperature and precipitation characteristics.

Our goal, however, is not to determine the optimal size of the subset able to preserve the ensemble properties (which can be more than 50% of the simulations, see Cannon 2015). Rather, our goal is to pragmatically select a very small number of simulations that can be used by biophysical models to investigate the impact of contrasting future scenarios (e.g. dry and hot, or cold and wet). Even for regions such as Europe, computational resources can limit the number of simulations realistically feasible for multi-hazard assessment to four or five (https://ec.europa.eu/jrc/sites/default/files/task_01_method_final_v1.pdf).

Here, the selection is performed by means of a principal component analysis (PCA), a statistical technique that converts a set of possibly correlated variables into a smaller set of linearly uncorrelated variables called principal components (PCs). This transformation is defined in such a way that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible). The coefficients of the transformation, called loadings, describe the weight by which each original variable should be multiplied to obtain the transformed one (Mendlik and Gobiet 2016; Dalelane et al. 2018).

In this study, the PCA was performed by using the climate change signal of temperature and precipitation, for both seasonal mean (SM) and some extreme indices (TXx, TNn, SU25, TR20, SDII, R10mm, RX1day) over the subregions in all seasons.

As shown in Fig. 9, the first three PCs alone explain more than 75% of the total variance. By plotting the loadings for each index and region, it can be noted that, for instance, the first PC (PC0) is mostly related to the variance of temperature in both DJF and JJA, and, to a lesser extent, extreme precipitation over the north equatorial regions in JJA, whereas PC1 is mostly related to precipitation indices.
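A PCA of this kind can be sketched with a singular value decomposition. The snippet below is only an illustration under stated assumptions: the input matrix is synthetic (24 simulations by 40 index/region/season features driven by two invented latent factors), and each feature is standardised before the decomposition because the indices have different units, a choice the text does not specify. The function name `pca` is hypothetical.

```python
import numpy as np

def pca(signals):
    """PCA of a (n_simulations, n_features) matrix of climate change signals."""
    # Standardise each feature (assumed here, since indices have different units).
    z = (signals - signals.mean(axis=0)) / signals.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance explained by each PC
    scores = u * s                   # coordinates of each simulation in PC space
    loadings = vt                    # row k holds the feature weights of PC k
    return scores, loadings, explained

# Synthetic change signals (invented): two latent factors plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(24, 2))
mixing = rng.normal(size=(2, 40))
signals = latent @ mixing + 0.2 * rng.normal(size=(24, 40))

scores, loadings, explained = pca(signals)
print("variance explained by PC0-PC2:", explained[:3].sum())
```

The rows of `loadings` correspond to what is plotted per index and region, and the first two columns of `scores` give the position of each simulation in the PC0–PC1 plane.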

Fig. 9

Main results of the PCA. First and second rows show the values of the loadings for the first 3 PCs (PC0–PC2) for each index and region in DJF and JJA. The bottom-left panel shows the total variance explained by the first ten PCs. The reconstructed signal of each model simulation in the PC0–PC1 space is also shown; individual simulations are marked according to the RCM (symbols) and GCM (colours)

By plotting the PCA results (i.e. the reconstructed signal) in the PC space, it is possible to visually select the simulations that maximize the inter-model variability. As an example, from the analysis of the first two components, PC0 and PC1, we note that four simulations (CLMcom-CCLM4-8-17_ICHEC-EC-EARTH; SMHI-RCA4_CSIRO-QCCCE-CSIRO-Mk3-6-0; ICTP-RegCM4-3_MOHC-HadGEM2-ES; SMHI-RCA4_MIROC-MIROC5) are sufficient to capture most of the spread in the PC0–PC1 components, whereas ICTP-RegCM4-3_MPI-M-MPI-ESM-LR is mostly neutral. This selection preserves the range of the full CORDEX ensemble relatively well for all indices and regions (Fig. 10) and captures a representative spread of temperature and precipitation changes in terms of magnitude and geographical distribution (SI Fig. 3).
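The selection in this study is made visually. As a rough automated stand-in, the sketch below picks the simulations at the extremes of PC0 and PC1 and the one closest to the origin as the "neutral" member. The function name, the selection rule and the example scores are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def select_contrasting(scores, names):
    """Pick the simulations at the extremes of PC0 and PC1, plus a neutral one.

    scores: (n_simulations, n_pcs) array of PC scores.
    names:  list of simulation labels, in the same order as the rows of scores.
    """
    pc01 = np.asarray(scores, dtype=float)[:, :2]
    picks = []
    for pc in range(2):
        for idx in (np.argmin(pc01[:, pc]), np.argmax(pc01[:, pc])):
            # A simulation can be extreme on both PCs; keep it only once.
            if names[idx] not in picks:
                picks.append(names[idx])
    # The "neutral" simulation is the one closest to the origin of the plane.
    neutral = names[int(np.argmin(np.linalg.norm(pc01, axis=1)))]
    return picks, neutral

# Hypothetical scores for six hypothetical simulations.
example_scores = np.array([
    [3.0, 0.2],
    [-2.5, 0.1],
    [0.3, 2.8],
    [0.1, -3.1],
    [0.1, 0.0],
    [1.0, 1.0],
])
example_names = ["sim_A", "sim_B", "sim_C", "sim_D", "sim_E", "sim_F"]

picks, neutral = select_contrasting(example_scores, example_names)
print("Extreme simulations:", picks)
print("Neutral simulation:", neutral)
```

A rule of this kind only considers the first two components; different choices of components, indices or regions would generally lead to a different subset.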

Fig. 10

Projected change at the end of the century (2071–2100 minus 1981–2010) of precipitation indices over the subregions during the season of maximum precipitation. For each index, the box-and-whisker plots show the median, interquartile range and full range of the bias-adjusted RCM ensemble (as in Fig. 9), and the coloured symbols represent the five simulations selected from the PCA

4 Summary and concluding remarks

We present a dataset of bias-adjusted climate change projections for Africa based on the CORDEX regional climate model simulations, and we investigate the effect of bias-adjustment on present and future indices of temperature and precipitation.

The trend-preserving bias-adjustment method is generally able to preserve not only the mean precipitation and temperature change but also other daily characteristics that are relevant for impact assessment, such as minimum and maximum temperature and number of rainy days.

Other indices (such as the number of days with temperature higher than a threshold) are strongly affected by bias-adjustment, as a direct consequence of the model bias over the reference period, which makes the projected climate change unreliable.

This effect is particularly visible for SU25 and CSU25 in DJF over the sub-equatorial regions and for SU35 and CSU35 over the area ranging from the West Sahel to the Horn of Africa (but also Namibia, Botswana and part of South Africa). However, when the biases over the present climate are accounted for, projected risks of hazards related to extreme temperature are higher than previously found, with possible consequences for the planning of adaptation measures.
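The sensitivity of threshold-based indices to the present-day bias can be illustrated with a small worked example. It assumes normally distributed daily maximum temperature, a 90-day season, and a simple mean shift as a stand-in for bias adjustment (not the trend-preserving method used in this study). All numbers are hypothetical.

```python
from math import erf, sqrt

SEASON_DAYS = 90  # assumed season length

def expected_su(mean, sd, threshold=25.0, n_days=SEASON_DAYS):
    """Expected number of days with Tmax above the threshold,
    assuming normally distributed daily Tmax."""
    p_exceed = 0.5 * (1.0 - erf((threshold - mean) / (sd * sqrt(2.0))))
    return n_days * p_exceed

SD = 2.0          # hypothetical day-to-day standard deviation (degC)
WARMING = 3.0     # hypothetical projected mean warming (degC)
RAW_MEAN = 23.0   # hypothetical raw model mean Tmax (cold bias)
OBS_MEAN = 27.0   # hypothetical observed mean Tmax

# The same warming is applied to both cases, so the mean change is identical.
raw_change = expected_su(RAW_MEAN + WARMING, SD) - expected_su(RAW_MEAN, SD)
adj_change = expected_su(OBS_MEAN + WARMING, SD) - expected_su(OBS_MEAN, SD)

print(f"Change in SU25, raw model:         {raw_change:.1f} days")
print(f"Change in SU25, mean-shifted model: {adj_change:.1f} days")
```

Although the mean warming is the same in both cases, the change in the number of days above the threshold differs substantially, because it depends on where the present-day distribution sits relative to the threshold and is bounded by the number of days in the season.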

As impact models are usually too computationally demanding to be used with a large ensemble of climate projections, we propose a methodology based on principal component analysis to select a small subsample of simulations able to preserve the main characteristics and, most importantly, most of the range of the change in future temperature and precipitation characteristics.

Finally, as pointed out by Doblas-Reyes et al. (2021), users of bias-adjusted datasets need to be fully aware of the suitability of the selected bias-adjustment approach (including underlying assumptions) for their application, and the potential limitations of their results. A collaboration between bias-adjustment users and providers, including experts in the specific regional climate, is advisable.

Here, we list some important limitations that need to be taken into account by potential users of the dataset.

  1. Bias adjustment will not correct for models’ deficiencies in representing fundamental physical processes and circulation errors, and the projections of these models will remain unreliable, even after bias adjustment. This implies that a process-based evaluation of model performance is essential, prior to any bias-adjustment, to assess their ability to reproduce the physical processes and drivers of the regional climate.

  2. Several bias-adjustment techniques exist, based on different assumptions and methodologies (e.g. Maraun 2016; Casanueva et al. 2020; Tabari et al. 2021). In this work, we applied a trend-preserving bias-adjustment. However, other bias-adjustments exist that modify the projected trends even of mean quantities (e.g. Hagemann et al. 2011; Haerter et al. 2011; Dosio et al. 2012). Although some studies (Boberg and Christensen 2012; Gobiet et al. 2015) claim the advantage of this modification, the validity of these methods is still debated (e.g. Pierce et al. 2015; Maraun et al. 2017; Doblas-Reyes et al. 2021). As a consequence, it is important to note that the choice of the bias-adjustment can add another layer of uncertainty to the full range of projections, an uncertainty that may be of the same order of magnitude as that of the model ensemble.

  3. Previous studies (e.g. Gobiet et al. 2015) showed that using a bias-adjustment that modifies the (mean) climate change signal of individual model simulations also drastically affects the intermodel spread (namely, by reducing it). The bias-adjustment used in our study preserves the uncertainty range of future projections not only for mean quantities but also for the tails of the PDFs (such as TXx and TXn, with the notable exception of RX1day). The intermodel spread may not be preserved, however, for threshold-based indices that are unrealistically simulated in the present climate, and whose increase is bounded by the maximum number of days in a season (e.g. SU25).

  4. The reliability of any projection based on bias-adjusted results depends on the observational dataset used as reference; in regions, such as Africa, where datasets of observed precipitation show large discrepancies, it would be interesting to investigate the effect of using different reference datasets on the projected bias-adjusted results. This is left for future research.

  5. As discussed, many methodologies and criteria exist to subsample an ensemble of models; even when using the same methodology (e.g. PCA) different user-relevant choices in, e.g. the variable/index, specific region or season would lead to a different subset of simulations. Here, we showed that PCA can be successful in pragmatically selecting a very limited number of simulations that preserve the range of the full CORDEX ensemble for most indices and regions and capture a variety of changes in both temperature and precipitation that can be used for investigating the impact of contrasting but plausible future scenarios.

Finally, it must be noted that the construction of user-relevant regional information for, e.g. impact assessment on different sectors, strongly benefits from co-production and bottom-up approaches that involve users and related stakeholders, and that take into account social and environmental pressures in addition to the climatic perspective (Füssel 2009; Pielke et al. 2012; Jack et al. 2020; Doblas-Reyes et al. 2021).

The bias-adjusted CORDEX-Africa simulations are freely available at https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/SUCCAST/VER2021