1 Introduction

Anthropogenic activities have been considered the main driver of climate change over many parts of the world [1]. The Working Group I (WG I) contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6), which assessed past, present and future climate change, reported that the frequency and severity of extreme events have increased in recent years in many parts of the world [2]. These changes have been exacerbated by changes in the physical drivers of the climate system [3,4,5]. Climate models are among the tools used to simulate past climate conditions [6] and present-day variability [7, 8], and to project future extreme events [9] and changes in total rainfall under different scenarios [10, 11]. Results from these models vary by region, country and meteorological variable, and even at a specific location, owing to differences in model initial and boundary conditions [12]. Hence, it is critical to validate model skill before the models are used in impact studies or in applications for a specific sector or region [13].

To evaluate past and future climate signals, researchers across the world have evaluated model simulations over global land [14], Africa [15], Asia [16], Europe [17], and America and the Caribbean [18]. Model simulations have also been widely used over the sub-regions of West Africa [19], North Africa [20], Southern Africa [21] and East Africa, among others [22,23,24]. These studies showed variation in the skill of different Coupled Model Intercomparison Project Phase 6 (CMIP6) models. The skill of individual models has also evolved over successive phases, from CMIP3 and CMIP5 to the latest CMIP6 global climate model (GCM) simulations used in IPCC AR6 [25]. Even regionally downscaled products, such as the Coordinated Regional Downscaling Experiment (CORDEX) simulations over East Africa [26], show variation in model performance at regional, national and sub-national levels. All these studies underscore the importance of validating the historical simulations of GCMs and Regional Climate Models (RCMs) before they are applied to assess present and projected climate change signals.

Recent studies using CMIP6 GCM simulations have pointed out variations in GCM skill over many parts of Africa. For example, correlation coefficients exceed 0.8 over West Africa [27], whereas models generally overestimate annual precipitation variability over most parts of Southern Africa [21]. Over East Africa, most models capture the spatial pattern of climatological rainfall better than its amplitude [7], particularly over sub-regions with bimodal rather than unimodal rainfall regimes [10], and show improved performance in the mean annual cycle [22]. The CMIP6 multi-model ensemble (MME) represents the spatial pattern of linear trends over Africa and the Arabian Peninsula better than individual models [28]. IPCC AR6 noted a significant improvement in CMIP6 outputs compared with previous CMIP phases, including higher spatial resolution and improved parameterization schemes, biogeochemical cycles and physical processes [29].

Despite progress in modelling, including the representation of climate system processes, physical parameterizations, higher spatial resolution and dynamical downscaling, CMIP and CORDEX simulations still show biases [22] and vary in their ability to reproduce observed climate variability and change [30]. Moreover, a model that shows good skill over a specific region or location in CMIP3 or CMIP5 does not necessarily show similar skill in reproducing the observed patterns in CMIP6 [31]. Assessments of future risks and stresses depend on the quality of the present-day information simulated by models. Likewise, the effectiveness of adaptation and mitigation actions and of sustainable planning relies on the ability of GCM historical outputs to simulate current rainfall patterns and extreme events. Seasonal analyses, for example of March, April and May (MAM) and June, July, August and September (JJAS), the main rainy seasons over IGAD member states, have rarely been carried out using CMIP simulations.

To date, several publications have evaluated the performance of CMIP6 models in projecting regional climate and extreme events over East Africa [7, 10, 15, 23, 32,33,34]. All these studies used continuous statistical indices (metrics). The overall goal of this paper is therefore to evaluate the skill of the historical precipitation outputs (referred to as rainfall in this study) of 23 CMIP6 GCMs in simulating total rainfall patterns, and then to select the 10 best-performing models to compute the multi-model ensemble (MME) used to project wet/dry day and wet/dry spell characteristics over the IGAD region of East Africa. For the first time in East Africa, we employ a standard set of 8 continuous statistical measures, 4 categorical indices and 8 volumetric indices. The rest of the paper is organized as follows: the data and methods are described in Sect. 2, the results are presented in Sect. 3, and the summary and discussion are given in Sect. 4.

2 Data and methodology

2.1 Study area descriptions

The study focuses on the Intergovernmental Authority on Development (IGAD) member states of Sudan, Eritrea, Djibouti, South Sudan, Ethiopia, Kenya, Somalia and Uganda (Fig. 1). The region extends over longitudes 21.4–51.2°E and latitudes 5°S–23.2°N and is characterized by complex topography. Elevation ranges from below sea level, as at Lake Assal in Djibouti, to 5199 m at the summit of Mount Kenya, the second highest mountain in Africa after Mount Kilimanjaro (5895 m) in Tanzania, and the Rift Valley extends from Kenya to Ethiopia. The climate of the IGAD region is strongly affected by these high-elevation features and by the seasonal north–south migration of the Intertropical Convergence Zone (ITCZ), one of the factors that determine the four rainfall seasons: December, January and February (DJF); March, April and May (MAM); June, July, August and September (JJAS); and October, November and December (OND). Many studies also show that the climate of the region is influenced by the El Niño/Southern Oscillation (ENSO) [35,36,37,38] and by sea-surface temperature (SST) variability over the Indian Ocean [39]. The different ENSO phases (El Niño, La Niña and neutral) have different impacts over different parts of the region [38, 40]. The variation in climatic zones, from warm deserts to humid highland climates, is mainly driven by orography, geography, and micro- and synoptic-scale systems [41]. These local effects offer an opportunity to test, and pose a challenge for, CMIP6 simulations in reproducing observed climate conditions over the region.

Fig. 1

Elevation map of the IGAD region of Eastern Africa. The sky-blue areas indicate the potential agricultural areas used to validate the models, and the purple points indicate the local administrative areas used to compute and visualize the temporal and spatial patterns of the validation metrics.

2.2 Data

This study uses daily historical simulations from 23 models that participated in the sixth phase of the Coupled Model Intercomparison Project (CMIP6) and contributed to the IPCC Sixth Assessment Report (AR6) [42]. Daily precipitation from the historical simulations (1981–2014) was obtained from the CMIP6 website (https://esgf-node.llnl.gov/search/cmip6/). The list of 53 published CMIP6 models, with information on model IDs, institutions, agencies, countries and nominal resolutions, was obtained from the CMIP6 institution values (wcrp-cmip.github.io). These 53 models were screened for the availability of datasets in the CMIP6 database and for resolution, and the 23 selected simulations, with their institutions, model names and resolutions, are listed in Table 1. Although CMIP6 involves a large number of models, we used only the first ensemble member (r1i1p1f1) of each model to ensure consistency and allow inter-comparison. To overcome the scarcity of in-situ data suitable for gridding for weather and climate studies over the IGAD region and eastern Africa in general [43,44,45], a high-resolution satellite rainfall estimate (SRE) product was selected for this study: the daily Climate Hazards Group (CHG) InfraRed Precipitation with Station data (CHIRPS) from the University of California, Santa Barbara (UCSB). CHIRPS is used as the reference dataset because it performs better over the IGAD region than other SRE datasets [45,46,47,48,49]. The product is available at 0.05° spatial resolution at daily, pentadal, dekadal and monthly time steps from 1981 to near present [50], and it has been used in many previous studies in the region [46, 49, 51, 52].
The 23 model datasets and CHIRPS were regridded from their native resolutions to a common 0.1° (about 10 km) grid using bilinear interpolation, following Song and Yan [53], to overcome the differences in resolution between the CMIP6 models and the CHIRPS satellite rainfall estimates over the IGAD region of Eastern Africa. The multi-model ensemble (MME), referred to herein as the ensemble mean (EnsMean), was computed to reduce systematic errors and biases. To compare the model simulations with CHIRPS, data for 1981–2014 were averaged over all pixels in each of five potential agricultural areas, namely Al Qadarif state in Sudan, Arsi zone in Ethiopia, Upper Nile state in South Sudan, Trans Nzoia County in Kenya and Arua district in Uganda, and over the entire IGAD region. These agricultural subregions were selected on the basis of geographical location, agricultural capacity, food security, and the magnitude of cash and food crop production (see Fig. 1). They are important agricultural areas that supply other parts of the IGAD region with food, both for local consumption and for export.
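The regridding, ensemble-averaging and area-averaging steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the study's processing chain: fields are assumed to lie on regular grids with strictly increasing latitude and longitude axes, the function names `bilinear_regrid` and `area_mean` are hypothetical, the area mean is unweighted (no cosine-latitude weighting), and the input fields and region mask are synthetic.

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto a new regular grid.

    Assumes strictly increasing 1-D source coordinates and target points
    that lie inside the source domain (no extrapolation is intended).
    """
    # Index of the lower-left source node of the cell containing each target point.
    i = np.clip(np.searchsorted(src_lat, dst_lat) - 1, 0, len(src_lat) - 2)
    j = np.clip(np.searchsorted(src_lon, dst_lon) - 1, 0, len(src_lon) - 2)
    # Fractional position (0..1) of each target point inside its cell.
    wy = ((dst_lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i]))[:, None]
    wx = ((dst_lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j]))[None, :]
    ii, jj = i[:, None], j[None, :]
    # Weighted sum of the four surrounding source values.
    return ((1 - wy) * (1 - wx) * field[ii, jj]
            + (1 - wy) * wx * field[ii, jj + 1]
            + wy * (1 - wx) * field[ii + 1, jj]
            + wy * wx * field[ii + 1, jj + 1])


def area_mean(field, mask):
    """Unweighted mean over the grid cells selected by a boolean (lat, lon) mask."""
    return field[..., mask].mean(axis=-1)


# Synthetic demo (not CMIP6 data): two fake "models" on a 1-degree grid.
src_lat = np.arange(-5.0, 24.0, 1.0)
src_lon = np.arange(21.0, 52.0, 1.0)
lat2d, lon2d = np.meshgrid(src_lat, src_lon, indexing="ij")
models = [2.0 * lat2d + lon2d, 2.0 * lat2d + lon2d + 4.0]

# Target 0.1-degree grid (about 10 km), kept inside the source domain.
dst_lat = np.arange(-4.0, 23.0, 0.1)
dst_lon = np.arange(22.0, 51.0, 0.1)
regridded = [bilinear_regrid(m, src_lat, src_lon, dst_lat, dst_lon) for m in models]

ens_mean = np.mean(np.stack(regridded), axis=0)                # EnsMean: plain model average
region = (dst_lat[:, None] > 0.0) & (dst_lon[None, :] < 30.0)  # hypothetical region mask
regional_value = area_mean(ens_mean, region)
```

Because bilinear interpolation reproduces any field that varies linearly in latitude and longitude exactly, the synthetic linear fields make the result easy to check; `area_mean` also accepts a (time, lat, lon) array and then returns one area-averaged value per time step.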

Table 1 List of the 23 CMIP6 models used in this study, with their institutions, model names and spatial resolutions

2.3 Methodology

2.3.1 Statistical deterministic metrics

The climatological mean is used to assess how well CMIP6 simulations capture the annual cycle and interannual variability. The performance of the CMIP6 simulations was analyzed using 20 continuous, categorical and volumetric indices computed over all local administrative areas in the IGAD region. The continuous indices are the correlation coefficient (CC), percent bias (PBIAS), mean error (ME), mean absolute error (MAE), root mean squared error (RMSE), Nash–Sutcliffe efficiency (NSE) and index of agreement (IOA). The categorical indices are the probability of detection (POD), probability of false detection (POFD), false alarm ratio (FAR), critical success index (CSI) and Heidke skill score (HSS). The volumetric indices are the mean quantile bias (MQB), mean quantile error (MQE), volumetric hit index (VHI), quantile probability of detection (QPOD), volumetric false alarm ratio (VFAR), quantile false alarm ratio (QFAR), volumetric miss index (VMI) and volumetric critical success index (VCSI). Of the continuous indices, CC, PBIAS and RMSE were selected to map the spatial patterns of model performance relative to the CHIRPS v2.0 reference dataset. CC measures the strength of the linear relationship between each of the 23 models and CHIRPS v2.0; a perfect correlation corresponds to CC = 1.0. PBIAS measures the average tendency of the simulated values to be larger or smaller than the observations; its perfect score is 0.0, so values of low magnitude indicate accurate simulation, positive values indicate overestimation and negative values indicate underestimation. RMSE measures the magnitude of the differences (errors) between the model simulations and CHIRPS; its optimal value is 0.0. Colour-coded portrait diagrams, scatter plots and cumulative distribution functions (CDFs) were used to compare the accuracy and consistency of the 23 CMIP6 models with respect to CHIRPS v2.0.
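The continuous indices can be illustrated with their common textbook definitions. The sketch below assumes PBIAS = 100 × Σ(S − O) / ΣO, so that positive values indicate overestimation as stated above, and Willmott's form of the index of agreement; the exact expressions used in the study are those of Table 2 and may differ in detail. The function name `continuous_scores` and the input values are hypothetical.

```python
import numpy as np


def continuous_scores(obs, sim):
    """Continuous verification scores of simulated (sim) vs observed (obs) rainfall."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs                       # positive where the model overestimates
    obs_mean = obs.mean()
    sse = np.sum(err ** 2)                # sum of squared errors
    return {
        "CC": float(np.corrcoef(obs, sim)[0, 1]),                 # perfect: 1
        "PBIAS": float(100.0 * err.sum() / obs.sum()),             # perfect: 0 (%)
        "ME": float(err.mean()),                                   # perfect: 0
        "MAE": float(np.abs(err).mean()),                          # perfect: 0
        "RMSE": float(np.sqrt(np.mean(err ** 2))),                 # perfect: 0
        "NSE": float(1.0 - sse / np.sum((obs - obs_mean) ** 2)),   # perfect: 1
        # Willmott's index of agreement, perfect: 1
        "IOA": float(1.0 - sse / np.sum((np.abs(sim - obs_mean)
                                         + np.abs(obs - obs_mean)) ** 2)),
    }


# Hypothetical seasonal totals (mm) for four years; not values from the study.
scores = continuous_scores(obs=[10, 20, 30, 40], sim=[12, 18, 33, 41])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the model is 4 mm too wet in total over 100 mm observed, so PBIAS is +4 %, while NSE and IOA stay close to 1 because the errors are small relative to the observed variability.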
Values of the 20 continuous, categorical and volumetric metrics were computed, and the 23 CMIP6 models ranked, over Al Qadarif state in Sudan, Arsi zone in Ethiopia, Arua district in Uganda, Trans Nzoia County in Kenya and the area-averaged IGAD region during the MAM and JJAS seasons. To rate, rank and select the 10 best-performing CMIP6 models for the second part of this analysis (projected future wet-day and dry-spell patterns), CC, PR2, PBIAS, ME, MAE, RMSE and NSE were computed over the area-averaged IGAD region; the models were then sorted from highest to lowest performance and colour-coded, and the 10 models that consistently ranked best in both MAM and JJAS were selected. In the formulas, O denotes the CHIRPS rainfall observations, \(\bar{O}\) the mean of the observations, S the CMIP6 simulations and n the number of data samples. With A, B, C and D denoting hits, false alarms, misses and correct negatives, respectively, the corresponding skill scores are computed using Eqs. (5, 6, 7, 8). The continuous statistical measures (CC, PBIAS, ME, MAE, RMSE, NSE, IOA), categorical indices (POD, POFD, FAR, CSI and HSS) and volumetric indices (MQB, MQE, VHI, QPOD, VFAR, QFAR, VMI, VCSI) were computed using Eqs. (1–20), respectively, as described in Table 2. Climate Data Operators (CDO) command-line tools on Linux, the R-based Climate Data Tool (CDT) package and ArcGIS 10.4 were used for data processing, plotting and mapping.
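With hits (A), false alarms (B), misses (C) and correct negatives (D) as defined above, the categorical indices follow from a 2 × 2 contingency table. The sketch below uses the standard definitions of POD, POFD, FAR, CSI and HSS; the event threshold of 1 mm is an assumption for illustration, since the threshold is not stated in this section, and the function names are hypothetical. The `_ratio` guard returns NaN when a denominator is zero, for example when every period in a season is wet in both series so that B = D = 0.

```python
import numpy as np


def contingency_counts(obs, sim, threshold=1.0):
    """Hits (A), false alarms (B), misses (C) and correct negatives (D).

    An event is rainfall >= threshold; 1 mm is an assumed wet-day threshold.
    """
    o = np.asarray(obs, dtype=float) >= threshold
    s = np.asarray(sim, dtype=float) >= threshold
    return (int(np.sum(s & o)), int(np.sum(s & ~o)),
            int(np.sum(~s & o)), int(np.sum(~s & ~o)))


def _ratio(num, den):
    # A score is undefined (NaN) when its denominator is zero.
    return num / den if den else float("nan")


def categorical_scores(a, b, c, d):
    """Standard 2x2 contingency-table scores."""
    return {
        "POD": _ratio(a, a + c),            # perfect: 1
        "POFD": _ratio(b, b + d),           # perfect: 0
        "FAR": _ratio(b, a + b),            # perfect: 0
        "CSI": _ratio(a, a + b + c),        # perfect: 1
        "HSS": _ratio(2 * (a * d - b * c),  # perfect: 1, no skill: 0
                      (a + c) * (c + d) + (a + b) * (b + d)),
    }


# Hypothetical daily rainfall (mm) for six days.
a, b, c, d = contingency_counts(obs=[0, 5, 3, 0, 8, 0], sim=[2, 4, 0, 0, 6, 1.5])
print((a, b, c, d))  # (2, 2, 1, 1)
print(categorical_scores(a, b, c, d))
```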

Table 2 Descriptions of 20 continuous statistical, categorical and volumetric indices used in validation of CMIP6 simulations
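The volumetric indices listed in Table 2 extend the contingency table by summing rainfall amounts instead of counting events. The sketch below assumes the widely used extended-contingency-table definitions of VHI, VFAR, VMI and VCSI, which may differ in detail from those in Table 2; MQB, MQE, QPOD and QFAR are not reproduced. The function name, thresholds and sample amounts are illustrative only.

```python
import numpy as np


def volumetric_scores(obs, sim, threshold=0.0):
    """Volumetric indices built from rainfall amounts above `threshold`."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    hit = (sim > threshold) & (obs > threshold)
    false_alarm = (sim > threshold) & (obs <= threshold)
    miss = (sim <= threshold) & (obs > threshold)
    v_hit = sim[hit].sum()            # simulated volume where both detect rain
    v_fa = sim[false_alarm].sum()     # simulated volume with no observed rain
    v_miss = obs[miss].sum()          # observed volume the model missed
    return {
        "VHI": v_hit / (v_hit + v_miss),           # perfect: 1
        "VFAR": v_fa / (v_hit + v_fa),             # perfect: 0
        "VMI": v_miss / (v_hit + v_miss),          # perfect: 0
        "VCSI": v_hit / (v_hit + v_miss + v_fa),   # perfect: 1
    }


obs = np.array([0, 5, 3, 0, 8, 0], dtype=float)   # hypothetical amounts (mm)
sim = np.array([2, 4, 0, 0, 6, 1.5])
print(volumetric_scores(obs, sim))
# Evaluating above an upper quantile of the observations focuses on heavier rain
# (the 75th percentile is an illustrative choice, not the study's setting).
print(volumetric_scores(obs, sim, threshold=np.quantile(obs, 0.75)))
```

By construction VHI + VMI = 1, so the two indices carry the same information viewed from the hit and miss sides.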

3 Results and discussion

3.1 Seasonal rainfall climatology

Figures 2 and 3 present the 1981–2014 climatology of total rainfall from the 23 CMIP6 historical simulations, their ensemble mean (EnsMean) and observations (CHIRPS v2.0). Most models successfully reproduce the spatial patterns of total rainfall during MAM (Fig. 2) and JJAS (Fig. 3) over the highlands of western Ethiopia and western South Sudan, the dry conditions over the extreme north of Sudan, and the arid and semi-arid climate over southern and northeastern Kenya, southeastern Ethiopia and most parts of Somalia. Most models reproduce the highest total rainfall amounts, 200–600 mm during MAM and 800–1200 mm during JJAS. These findings are consistent with Ayugi et al. [22], who found considerable differences in the performance of CMIP5 and CMIP6 GCMs in reproducing annual and seasonal precipitation climatology. Total rainfall increases from MAM to JJAS, in association with the northward movement of the ITCZ, the main driver of simulated and projected rainfall over the region [54]. Although MAM is the key season for areas close to the Equator and JJAS for the northern sector of the IGAD region, in both seasons the models capture the highest rainfall amounts over the highlands of western Ethiopia and the lowest amounts over northern Sudan, northeastern Kenya, southeastern Ethiopia, and central and northern Somalia. These results agree with the CORDEX regional climate model results of Endris et al. [55] over East Africa. Additionally, most models underestimate MAM and JJAS rainfall over the highlands of western Kenya and reproduce the patterns well over the coastal areas. The results also reveal that CNRM-CM6-1-HR, GISS-E2-2-G, KIOST-ESM and CAMS-CSM1-0 tend to underestimate rainfall, whereas ACCESS-ESM1-5 overestimates JJAS rainfall over South Sudan, central Ethiopia, the highlands of western Ethiopia and most parts of Uganda.
The EnsMean represents MAM and JJAS rainfall better than the individual models; however, it overestimates total rainfall over the arid and semi-arid lands (ASALs) of Kenya, Ethiopia and Somalia. These findings show that rainfall is overestimated more over the ASAL lowlands than over the highlands, which points to the need to improve the representation of topography in CMIP6 models, particularly those with a resolution coarser than 1.5°.
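The seasonal climatologies in Figs. 2 and 3 are 1981–2014 averages of seasonal rainfall totals. A minimal sketch of that aggregation is shown below, assuming daily data with a standard calendar and a `datetime64` time axis (models with 360-day or no-leap calendars would need their own date handling). The function name `seasonal_climatology` and the constant synthetic rainfall are hypothetical; DJF is left out because it spans two calendar years.

```python
import numpy as np

# Months of the seasons handled here; DJF is omitted because it spans two
# calendar years and would need year-shifting of December.
SEASON_MONTHS = {"MAM": (3, 4, 5), "JJAS": (6, 7, 8, 9), "OND": (10, 11, 12)}


def seasonal_climatology(dates, rain, season):
    """Mean seasonal rainfall total over all years present in `dates`.

    dates: datetime64[D] array (time,); rain: daily rainfall (time, ...) in mm.
    """
    month = dates.astype("datetime64[M]").astype(int) % 12 + 1
    year = dates.astype("datetime64[Y]").astype(int) + 1970
    in_season = np.isin(month, SEASON_MONTHS[season])
    # One seasonal total per year (per grid cell if rain has spatial axes).
    totals = [rain[in_season & (year == y)].sum(axis=0) for y in np.unique(year)]
    return np.mean(np.stack(totals), axis=0)


dates = np.arange("1981-01-01", "1983-01-01", dtype="datetime64[D]")
rain = np.ones((dates.size, 2, 2))        # synthetic: 1 mm every day on a 2x2 grid
print(seasonal_climatology(dates, rain, "MAM")[0, 0])   # 92.0 (31 + 30 + 31 days)
print(seasonal_climatology(dates, rain, "JJAS")[0, 0])  # 122.0
```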

Fig. 2

MAM total rainfall climatology over eastern Africa for the 1981–2014 reference period from 23 CMIP6 historical runs, the ensemble mean (EnsMean) and observations (CHIRPS v2.0)

Fig. 3

JJAS total rainfall climatology over eastern Africa for the 1981–2014 reference period from 23 CMIP6 historical runs, the ensemble mean (EnsMean) and observations (CHIRPS v2.0)

Figure 4 presents the annual cycle of total rainfall from the 23 CMIP6 GCMs, the ensemble mean and CHIRPS v2.0, extracted over five potential food and cash crop cultivation areas: Al Qadarif state in Sudan, Upper Nile state in South Sudan, Arsi zone in Ethiopia, Arua district in Uganda and Trans Nzoia County in Kenya. The models reproduce the annual rainfall cycle well: a unimodal regime over Al Qadarif state (Fig. 4a) and Upper Nile state (Fig. 4b), peaks in April–May and July–October over Arsi zone (Fig. 4c), and a bimodal regime over Arua district (Fig. 4d) and Trans Nzoia County (Fig. 4e). The annual cycle highlights the importance of the MAM and OND seasons for equatorial eastern African countries and of JJAS for the northern sector of the Greater Horn of Africa (GHA). Most models underestimate the rainfall peak in August over Al Qadarif, and rainfall in June–August over Arua, February–May over Arsi and February–September over Trans Nzoia, while they overestimate April–September rainfall over Upper Nile and October–December rainfall over Arsi, Arua and Trans Nzoia.
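The annual cycles in Fig. 4 are mean monthly totals of area-averaged daily rainfall, and the over- and underestimated months follow from comparing the simulated and observed cycles month by month. A hypothetical sketch is shown below, assuming a daily, standard-calendar, area-averaged series; the function names and the synthetic "model" are illustrative only.

```python
import numpy as np

MONTH_NAMES = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def monthly_climatology(dates, rain):
    """Mean monthly rainfall total (mm) for Jan..Dec from a daily 1-D series."""
    months = dates.astype("datetime64[M]")
    unique_months, inverse = np.unique(months, return_inverse=True)
    totals = np.bincount(inverse.ravel(), weights=rain)   # one total per year-month
    calendar_month = unique_months.astype(int) % 12       # 0 = January
    return np.array([totals[calendar_month == m].mean() for m in range(12)])


def biased_months(obs_clim, sim_clim):
    """Months in which the simulated climatology is above / below the observed one."""
    over = [n for n, o, s in zip(MONTH_NAMES, obs_clim, sim_clim) if s > o]
    under = [n for n, o, s in zip(MONTH_NAMES, obs_clim, sim_clim) if s < o]
    return over, under


# Synthetic example: a "model" that is too wet in Oct-Dec and too dry in Jun-Aug.
dates = np.arange("1981-01-01", "1983-01-01", dtype="datetime64[D]")
month_no = dates.astype("datetime64[M]").astype(int) % 12 + 1
obs = np.ones(dates.size)
sim = np.where(month_no >= 10, 2.0, np.where((month_no >= 6) & (month_no <= 8), 0.5, 1.0))
over, under = biased_months(monthly_climatology(dates, obs), monthly_climatology(dates, sim))
print(over)   # ['Oct', 'Nov', 'Dec']
print(under)  # ['Jun', 'Jul', 'Aug']
```

A strict comparison is used here for clarity; in practice a tolerance would usually be applied so that small differences are not counted as biases.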

Fig. 4

Annual rainfall cycle for 1981–2014 from observations (CHIRPS) and 23 CMIP6 historical simulations, averaged over the food and cash crop cultivation areas of a Al Qadarif state in Sudan, b Upper Nile state in South Sudan, c Arsi zone in Ethiopia, d Arua district in Uganda and e Trans Nzoia County in Kenya

On a seasonal timescale, the models underestimate MAM and JJAS rainfall over the highlands of western Kenya and JJAS rainfall over central Ethiopia and northeastern Sudan, while they overestimate MAM and OND rainfall over northern Uganda, OND rainfall over central Ethiopia and JJAS rainfall over northeastern South Sudan. The underestimation of MAM rainfall over Kenya and of JJAS rainfall over Sudan, and the overestimation of OND rainfall over Kenya and Uganda, are consistent with the CMIP5 results of Ongoma et al. [56] and with most CMIP3 simulations [38]. The ensemble mean reproduces the annual cycle better than most individual models. ACCESS-ESM1-5 is the only model that overestimates rainfall throughout the annual cycle (January to December) over Al Qadarif state, while CNRM-CM6-1-HR fails to simulate the annual cycle over Al Qadarif state, Upper Nile state, Arsi zone and Arua district. This contrasts with Babaousmail et al. [20], who found that CNRM-CM6-1 and CNRM-ESM2-1 consistently overestimate rainfall over North Africa, particularly between September and February. The simulated patterns also reflect the influence of the north–south passage of the ITCZ, as reported by Clark et al. [57], and a possible influence of ENSO, among other climate drivers, on OND rainfall [58]. Interactions between East Africa and the Indian Ocean, the Asian monsoon and atmosphere–ocean–monsoon interactions appear to be the main drivers of bias in the MAM and OND rains, as reported by Yang et al. [59] in their study of annual cycle biases over East Africa in CMIP5. According to Anyah and Qiu [38], CMIP3 coupled models overestimate the OND short rains and underestimate the MAM long rains. Similar biases have been reported for CMIP5 coupled models, which misrepresent the annual cycle of rainfall in East Africa owing to biases in convective instability (CI) and near-surface moist static energy (MSE).
Overall, CMIP6 models show lower skill in reproducing MAM rainfall than in reproducing the OND peak. This confirms that some weaknesses found in previous CMIP phases (CMIP3 and CMIP5) persist.

3.2 Spatial patterns of statistical deterministic metrics

Figures 5, 6 and 7 present the spatial patterns of the CC, PBIAS and RMSE deterministic metrics for the MAM and JJAS seasons over each local administrative unit in the IGAD region, computed for the 23 CMIP6 models and the EnsMean relative to the CHIRPS v2.0 reference dataset for 1981–2014. The results reveal positive correlations over the entire IGAD region for all individual models (Fig. 5a–w) and the EnsMean (Fig. 5x) during both MAM and JJAS. Most parts of the region have weak CC values of 0–0.2, and the highest values do not exceed 0.8. Most zones in western Ethiopia record the highest CC, followed by the counties of South Sudan and southern parts of Sudan. ACCESS-ESM1-5, CanESM5, CMCC-CM2-HR4, IPSL-CM6A-LR, MRI-ESM2-0 and NorESM2-MM are the best-performing CMIP6 models over South Sudan, with CC values exceeding 0.4. Most models show higher correlation over southern and central Somalia than over the north of the country. Most sub-counties and parishes in northern Uganda record higher CC than those in central and southern Uganda, and GFDL-ESM4 performs better than the other models over most parts of Uganda (Fig. 5j). Most constituencies in Kenya and zones in central and northern Ethiopia record the lowest correlations. All 23 models have CC values below 0.1 over all districts in Djibouti, while districts in northwestern Eritrea show better CC than other parts of that country. All models record low CC over northern Sudan. Compared with the individual models, the EnsMean shows improved CC values over all parts of the region, with western Ethiopia, southern Sudan and the counties of South Sudan showing correlations of 0.5–0.8 (Fig. 5x). This is consistent with the findings of Mbigi et al. [7], who reported that the ensemble mean of all 20 models shows a very strong relationship with the bimodal rainfall pattern throughout East Africa.
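The CC maps are obtained by correlating, for each administrative unit, the 34 seasonal totals (1981–2014) of a model with those of CHIRPS. A vectorized sketch with synthetic data illustrates the computation and one plausible reason for the higher ensemble-mean CC: averaging members whose errors are independent reduces the error variance, so the ensemble mean tends, though is not guaranteed, to correlate better with the observations. The array shapes, noise levels and function name are assumptions.

```python
import numpy as np


def correlation_map(obs, sim):
    """Pearson CC along time (axis 0) for every unit, e.g. (years, units) arrays."""
    obs_anom = obs - obs.mean(axis=0)
    sim_anom = sim - sim.mean(axis=0)
    numerator = (obs_anom * sim_anom).sum(axis=0)
    denominator = np.sqrt((obs_anom ** 2).sum(axis=0) * (sim_anom ** 2).sum(axis=0))
    return numerator / denominator


rng = np.random.default_rng(42)
n_years, n_units = 34, 6                       # 1981-2014 seasons, 6 hypothetical units
obs = rng.gamma(shape=4.0, scale=50.0, size=(n_years, n_units))  # synthetic totals (mm)
# Three synthetic "models": the observed signal plus independent random errors.
models = [obs + rng.normal(0.0, 100.0, size=obs.shape) for _ in range(3)]
cc_models = np.array([correlation_map(obs, m) for m in models])
cc_ens = correlation_map(obs, np.mean(models, axis=0))
print(np.round(cc_models.mean(axis=0), 2))
print(np.round(cc_ens, 2))
```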

Fig. 5

Spatial patterns of the correlation coefficient (CC) of MAM rainfall of the 23 models and EnsMean with respect to CHIRPS v2.0 over each local administrative unit in the IGAD region for 1981–2014

Fig. 6

Spatial distribution of the percent bias (PBIAS) in JJAS precipitation climatology (mm season⁻¹) of the 23 models and EnsMean with respect to CHIRPS v2.0 over each local administrative unit in the IGAD region for 1981–2014

Fig. 7

Spatial distribution of the RMSE of MAM precipitation climatology (mm season⁻¹) of the 23 models and EnsMean with respect to CHIRPS v2.0 over each local administrative unit in the IGAD region for the 1981–2014 reference period

The JJAS PBIAS patterns show that all 23 models overestimate rainfall over all wards in southern and northeastern Kenya, southeastern Ethiopia, and most districts in Somalia and Djibouti. With the exception of ACCESS-ESM1-5 and MIROC6, all models underestimate rainfall over all zones in the highlands of western Ethiopia and all districts in Sudan (Fig. 6). Over most parts of the region, ACCESS-ESM1-5 and MIROC6 overestimate rainfall the most (Fig. 6a, r), while CNRM-CM6-1-HR underestimates it the most (Fig. 6g). The overestimation (underestimation) exceeds 80 mm per season over these areas in both MAM and JJAS. The patterns of underestimation and overestimation are closely related to the season and to whether an area is climatologically wet or dry; for instance, seasons that are not the main rainy season, such as MAM in Sudan and JJAS over most parts of Kenya, show significant overestimation of rainfall. The EnsMean simulates lower PBIAS than the individual models, with the lowest values over South Sudan. Similarly, Kevin et al. [24], who examined CMIP6 biases in the representation of the long and short rains over East Africa, showed that model biases arise from a complex combination of oceanic and atmospheric processes, including the representation of the Walker Circulation. In the EnsMean, PBIAS is reduced over Sudan, Ethiopia and South Sudan, while rainfall over northeastern Kenya, southeastern Ethiopia and most parts of Somalia is overestimated by more than 70 mm per dekad (Fig. 6x). Previous studies have linked aridity and the rainfall belt over parts of Eastern Africa with large-scale orography [60,61,62], and the Turkana low-level jet (TJ) is a major mechanism of moisture transport from the western Indian Ocean to the interior of northwestern East Africa [62].
Realistic modelling of the characteristics of this low-level jet could therefore play a significant role in improving skill and reducing biases in CMIP6 models.

Figure 7 presents the RMSE of simulated MAM rainfall over the IGAD region of Eastern Africa for the 23 individual models and their EnsMean, with reference to the observations for 1981–2014. All 23 individual models have RMSEs of 1–20 mm per season over all areas north of 15°N in Sudan and over all districts in Eritrea and Djibouti (Fig. 7a–w). RMSEs of 20–60 mm per season occur over the ASAL zones of southeastern Ethiopia, Kenya and southern Somalia and over all districts in Uganda. Most models show higher RMSE over the counties of the Western and Nyanza regions of Kenya, southeastern districts of Uganda and southern Ethiopia. MIROC6 has the highest RMSE, 70–100 mm per season, over zones in the highlands of Ethiopia and wards in the Western and Nyanza regions of Kenya (Fig. 7r), which suggests that using information from this model for adaptation planning in these areas may be misleading. Areas with less rainfall during MAM have lower RMSE than areas with the highest rainfall. Overall, the EnsMean shows lower RMSE over most of the study domain than the individual models, with the lowest values over all areas north of 15°N in Sudan, the ASALs of northern Ethiopia, and central and northern Somalia (Fig. 7x). Similarly, Makula and Zhou [10] found that most models had low RMSE, strong spatial correlation and a ratio of simulated to observed standard deviation close to one, indicating good performance.

3.3 Regional and sub-national statistical metrics

Figure 8 shows colour-coded portrait diagrams of the 23 CMIP6 GCM historical simulations with respect to CHIRPS v2.0 over Trans Nzoia County in Kenya during MAM and Arsi zone in Ethiopia during JJAS, assessed using 20 continuous, categorical and volumetric indices. The results show that it is premature to identify the best-performing models from a limited set of indices. In other words, selecting the best models based only on continuous indices such as correlation, bias and RMSE, only on categorical indices (FAR, CSI, HSS), only on volumetric indices such as VHI, VFAR, VMI and VCSI, or only on performance at a specific location, is not comprehensive. The skill of individual models varies across the 20 indices: some models perform very well on continuous indices and poorly on categorical and volumetric indices, and vice versa. For example, over Trans Nzoia, MIROC6 performs very well on volumetric indices and poorly on continuous indices, whereas EC-Earth3 and CNRM-CM6-1-HR perform poorly on continuous indices and better on both categorical and volumetric indices. The EnsMean scores best on continuous indices but poorly on volumetric indices. INM-CM5-0, HadGEM3-GC31-MM, CMCC-CM2-HR4, AWI-CM-1-1-MR, TaiESM1, NorESM2-MM, IITM-ESM, IPSL-CM6A-LR, GFDL-ESM4 and EC-Earth3 are the 10 best-performing models over Trans Nzoia during MAM. In the ranking of individual models, INM-CM5-0 scores best on the continuous indices (PBIAS, ME, MAE, RMSE, NSE and IOA) and on the POD and CSI categorical indices, whereas CMCC-CM2-HR4 scores best on the volumetric indices (MQB, VHI, VFAR, VMI and VCSI). For Arsi zone during JJAS, the categorical indices fail to discriminate among the models because they take the same values for all 23 models. Model performance also varies within a single categorical or continuous index.
For instance, EC-Earth3 performs best on correlation but poorly on PBIAS, ME, MAE and RMSE. ACCESS-ESM1-5, EC-Earth3, IPSL-CM6A-LR, MIROC6 and MRI-ESM2-0 perform well on categorical and volumetric indices but poorly on continuous indices. BCC-CSM2-MR, INM-CM5-0, CAMS-CSM1-0, HadGEM3-GC31-MM, GFDL-ESM4, KACE-1-0-G, UKESM1-0-LL, NorESM2-MM, AWI-CM-1-1-MR and MPI-ESM1-2-HR are the 10 best-performing models over Arsi during JJAS. Again, the EnsMean scores best on continuous indices and poorly on volumetric indices over Arsi during JJAS. Notably, validation using continuous indices is more robust and representative than validation using categorical and volumetric indices.
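The selection of the 10 best-performing models can be illustrated with a simple mean-rank scheme: each model is ranked on every metric and the ranks are averaged. This is a hypothetical sketch, not necessarily the scoring used for Fig. 8. In particular, the direction of "better" for each metric (larger for CC, NSE, IOA and similar skill scores, smaller magnitude for errors and biases) is an assumed convention, and ratio-type indices whose perfect value is 1 would need |value − 1| instead. The model names and scores are placeholders.

```python
import numpy as np

# Metrics where larger is better; all other metrics are treated as errors or
# biases where a smaller absolute value is better (an assumed convention).
HIGHER_IS_BETTER = {"CC", "NSE", "IOA", "POD", "CSI", "HSS", "VHI", "VCSI"}


def rank_models(scores, top_k=10):
    """Return the top_k models by mean rank across metrics, plus all mean ranks.

    scores: {metric: {model: value}}; every metric must cover the same models.
    """
    models = sorted(next(iter(scores.values())))
    rank_sum = np.zeros(len(models))
    for metric, values in scores.items():
        v = np.array([values[m] for m in models], dtype=float)
        key = -v if metric in HIGHER_IS_BETTER else np.abs(v)
        # argsort of argsort turns sort order into ranks; +1 makes 1 the best rank.
        rank_sum += np.argsort(np.argsort(key, kind="stable"), kind="stable") + 1
    mean_rank = rank_sum / len(scores)
    order = np.argsort(mean_rank, kind="stable")
    return [models[i] for i in order[:top_k]], dict(zip(models, mean_rank))


# Hypothetical scores for three placeholder models (not results from the study).
scores = {
    "CC": {"model_a": 0.6, "model_b": 0.4, "model_c": 0.5},
    "PBIAS": {"model_a": -5.0, "model_b": 2.0, "model_c": 20.0},
    "RMSE": {"model_a": 20.0, "model_b": 25.0, "model_c": 40.0},
}
best, mean_rank = rank_models(scores, top_k=2)
print(best)  # ['model_a', 'model_b']
```

Applying the same function separately to MAM and JJAS scores and keeping the models that appear in both top-10 lists mirrors the consistency criterion described in Sect. 2.3.1.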

Fig. 8

Summary of the values of the 20 continuous, categorical and volumetric metrics for the 23 CMIP6 models with respect to CHIRPS v2.0 over Trans Nzoia County in Kenya during MAM and Arsi zone in Ethiopia during JJAS

The scatter plots in Fig. 9 compare the 23 CMIP6 simulations and the EnsMean with CHIRPS at the dekadal time scale, for Arua district in Uganda during MAM and Al Qadarif state in Sudan during JJAS. The scatter is wider for all 23 CMIP6 models and the EnsMean over Arua than over Al Qadarif. Points lying toward the model axis (toward the CHIRPS axis) indicate overestimation (underestimation) of rainfall. Most models overestimate rainfall, especially for values above 50 mm. CNRM-CM6-1-HR and EC-Earth3 underestimate rainfall over both Arua and Al Qadarif. BCC-CSM2-MR, ACCESS-ESM1-5, CMCC-CM2-HR4, MIROC6, TaiESM1, MRI-ESM2-0 and GFDL-ESM4 show the widest scatter, with overestimated rainfall, over Arua during MAM. Most models perform much better over Al Qadarif state than over Arua district; ACCESS-ESM1-5, AWI-CM-1-1-MR and MPI-ESM1-2-HR show the widest scatter over Al Qadarif, whereas HadGEM3-GC31-MM shows the least scatter and the best performance, with simulated values in close agreement with the observations. In contrast, the EnsMean scatter differs substantially from that of the individual models and shows a systematic underestimation of rainfall amounts. This is due to the high variability among models and to the number of models that overestimate or underestimate rainfall, especially for values exceeding 50 mm. The scatter may also be attributed to differences in model calendars and native resolutions and to uncertainties in the CHIRPS satellite rainfall estimates. In addition, the overestimation (underestimation) of rainfall could be driven by misrepresentation of the large-scale patterns of sea surface temperatures (SSTs) in the Pacific, as reported by Hoell and Funk [63], and of teleconnections with the Indian Ocean, as concluded by Bahaga et al. [64].
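The scatter plots compare dekadal totals, and the over- or underestimation of heavy rainfall can be summarized as the share of dekads, above a given observed amount, in which the model exceeds CHIRPS. The sketch below assumes the usual dekad convention (days 1–10, 11–20 and 21 to the end of the month) and a standard calendar; the 50 mm cut-off echoes the text, but the summary statistic, function names and sample values are hypothetical.

```python
import numpy as np


def dekadal_totals(dates, rain):
    """Sum daily rainfall into dekads: days 1-10, 11-20 and 21-end of each month."""
    month_start = dates.astype("datetime64[M]").astype("datetime64[D]")
    day_of_month = (dates - month_start).astype(int) + 1
    dekad = np.minimum((day_of_month - 1) // 10, 2)        # 0, 1 or 2
    key = dates.astype("datetime64[M]").astype(int) * 3 + dekad
    _, inverse = np.unique(key, return_inverse=True)
    return np.bincount(inverse.ravel(), weights=rain)


def share_overestimated(obs_dek, sim_dek, min_obs=50.0):
    """Fraction of dekads with observed rain above min_obs that the model exceeds."""
    heavy = obs_dek > min_obs
    return float(np.mean(sim_dek[heavy] > obs_dek[heavy]))


dates = np.arange("1981-01-01", "1981-03-01", dtype="datetime64[D]")
print(dekadal_totals(dates, np.ones(dates.size)))   # Jan 1981 and Feb 1981 dekads
print(share_overestimated(np.array([60.0, 40.0, 80.0, 55.0]),
                          np.array([70.0, 50.0, 60.0, 56.0])))
```

With 1 mm every day, January gives dekads of 10, 10 and 11 mm and February 1981 gives 10, 10 and 8 mm, which shows why the third dekad varies in length.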

Fig. 9

Scatter plots comparing each of the 23 CMIP6 simulations against CHIRPS v2.0, area-averaged over all pixels in Arua district, Uganda, during MAM and Al Qadaref state, Sudan, during JJAS at the dekadal time scale for the 1981–2014 reference period

Fig. 10 presents the cumulative distribution functions (CDFs) of each of the 23 CMIP6 simulations against CHIRPS during the MAM and JJAS seasons. The MAM season is represented by Trans Nzoia in Kenya and the JJAS season by Upper Nile state in South Sudan. The CDFs show that performance varies between models and between rainfall amount categories (0–50 and 100–150 mm). Over Trans Nzoia, most models agree more closely with observations at low rainfall amounts than at high amounts. CMCC-CM2-HR4, BCC-CSM2-MR, IPSL-CM6A-LR, IITM-ESM, NorESM2-MM and MRI-ESM2-0 perform best, with CDFs that closely follow the observed one, whereas AWI-CM-1-1-MR, CNRM-CM6-1-HR, EC-Earth3, MIROC6, MPI-ESM1-2-HR, UKESM1-0-LL and GISS-E2-2-G show a loose fit. AWI-CM-1-1-MR, CAMS-CSM1-0, CNRM-CM6-1-HR, EC-Earth3, MPI-ESM1-2-HR and UKESM1-0-LL underestimated rainfall. This is consistent with the findings of Hamida et al. [65] over Uganda, where CNRM-CM6-1 and CNRM-ESM2 underestimated rainfall in both the annual cycle and the mean climatology. All models performed better during JJAS over Upper Nile than during MAM over Trans Nzoia; ACCESS-ESM1-5 and BCC-CSM2-MR show the loosest fit across all rainfall amounts. Most models overestimated rainfall over Upper Nile despite their better overall skill there. CAMS-CSM1-0, KACE-1-0-G, NorESM2-MM, HadGEM3-GC31-MM and CMCC-CM2-HR4 are the best-performing models over Upper Nile state in South Sudan.
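The visual "tight" versus "loose" comparison of CDFs can be summarised by the largest vertical gap between the model and CHIRPS empirical CDFs within a rainfall category such as 0–50 or 100–150 mm. This is a Kolmogorov–Smirnov-style distance restricted to a range, swapped in here purely for illustration; the text above compares the curves visually. The helper names `ecdf` and `max_cdf_gap` are hypothetical, and the inputs are assumed to be area-averaged dekadal totals in mm.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_cdf_gap(chirps: Sequence[float], model: Sequence[float],
                lo: float, hi: float) -> float:
    """Largest |F_model(x) - F_chirps(x)| for x in [lo, hi] (mm).

    Both ECDFs are right-continuous step functions that jump only at sample
    values, so checking `lo` plus every sample value inside [lo, hi] covers
    every flat segment of their difference.
    """
    f_obs, f_sim = ecdf(chirps), ecdf(model)
    points = [lo] + [v for v in [*chirps, *model] if lo <= v <= hi]
    return max(abs(f_sim(x) - f_obs(x)) for x in points)
```

Calling it with `lo=0, hi=50` and again with `lo=100, hi=150` mirrors the two rainfall categories discussed above; a smaller gap corresponds to what the text calls a tight fit, a larger one to a loose fit.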

Fig. 10

Cumulative distribution functions comparing each of the 23 CMIP6 simulations against CHIRPS v2.0, area-averaged over all pixels in Trans Nzoia, Kenya, and Upper Nile state, South Sudan, at the dekadal time scale for the 1981–2014 reference period

Because model performance varies across the 20 continuous, categorical and volumetric indices, the scatter plots and the CDFs, the 23 models were ranked from 1 to 10 over all pixels in the IGAD region according to how close their CC, BR2, PBIAS, ME, NSE, MAE and RMSE values were to the perfect values, regardless of the position of the EnsMean in the ranking. These 7 continuous metrics were selected out of the 20 indices because they performed most consistently, as shown in Fig. 8. Of the 23 ranked models, 13 appeared consistently in the rankings, and the 10 best-performing models for the MAM and JJAS seasons were selected from these 13. Based on the CC, BR2, PBIAS, ME, NSE, MAE and RMSE scores, the 10 best-ranked models during MAM are INM-CM5-0, CMCC-CM2-HR4, HadGEM3-GC31-MM, IPSL-CM6A-LR, TaiESM1, IITM-ESM, KIOST-ESM, GFDL-ESM4, CanESM5 and BCC-CSM2-MR (Fig. 11a). During JJAS, the 10 best-ranked models are EC-Earth3, INM-CM5-0, HadGEM3-GC31-MM, CMCC-CM2-HR4, KACE-1-0-G, IPSL-CM6A-LR, NorESM2-MM, GFDL-ESM4, MRI-ESM2-0 and BCC-CSM2-MR (Fig. 11b). Makula and Zhou [10] similarly found that IPSL-CM6A-LR, BCC-CSM2-MR, NorESM2-LM and CMCC-CM2-SR5 outperformed the other individual models during OND and MAM. Considering the spatial correlation, percent bias and RMSE patterns, the colour-coded portrait, scatter and CDF plots, and the spatial patterns of the other 17 indices computed in this study relative to CHIRPS v2.0, we conclude that INM-CM5-0, HadGEM3-GC31-MM, CMCC-CM2-HR4, IPSL-CM6A-LR, KACE-1-0-G, EC-Earth3, NorESM2-MM, GFDL-ESM4, TaiESM1 and KIOST-ESM are the 10 top-ranked models for the MAM and JJAS seasons over the IGAD region. These models were therefore selected to compute the EnsMean that will be used to assess future changes and variability in extreme events linked to food security over the IGAD region of Eastern Africa.
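The ranking step can be sketched as follows. The metric formulas are the standard ones: PBIAS is 100 × Σ(sim − obs)/Σobs, so positive values mean overestimation, and NSE is computed relative to the observed mean. BR2 is taken to be the slope-weighted coefficient of determination bR², with b assumed here to be the ordinary least-squares slope of simulated on observed rainfall. The paper does not state how the per-metric ranks are combined, so summing each model's rank across the 7 metrics is an assumption, as are all function names. The last helper forms an equal-weight EnsMean of the selected models.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence

# Value each metric takes for a perfect simulation.
PERFECT = {"CC": 1.0, "BR2": 1.0, "PBIAS": 0.0, "ME": 0.0,
           "NSE": 1.0, "MAE": 0.0, "RMSE": 0.0}


def continuous_metrics(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Seven continuous scores of one model's series against CHIRPS."""
    n = len(obs)
    mo, ms = mean(obs), mean(sim)
    err = [s - o for o, s in zip(obs, sim)]
    sxy = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((s - ms) ** 2 for s in sim)
    cc = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    b = sxy / sxx  # assumed: OLS slope of sim regressed on obs
    br2 = (abs(b) if abs(b) <= 1 else 1 / abs(b)) * cc ** 2
    sse = sum(e ** 2 for e in err)
    return {
        "CC": cc,
        "BR2": br2,
        "PBIAS": 100 * sum(err) / sum(obs),
        "ME": sum(err) / n,
        "NSE": 1 - sse / sxx,
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(sse / n),
    }


def rank_models(scores: Dict[str, Dict[str, float]], top: int = 10) -> List[str]:
    """Order models by summed per-metric rank of |score - perfect value| (assumed rule)."""
    names = list(scores)
    total = {name: 0 for name in names}
    for metric, target in PERFECT.items():
        ordered = sorted(names, key=lambda m: abs(scores[m][metric] - target))
        for position, name in enumerate(ordered, start=1):
            total[name] += position
    return sorted(names, key=lambda m: total[m])[:top]


def ensemble_mean(series: Dict[str, Sequence[float]], members: Sequence[str]) -> List[float]:
    """Equal-weight, time-step-wise mean of the selected models' series."""
    return [mean(values) for values in zip(*(series[m] for m in members))]
```

Applied to area-averaged or per-pixel series, `rank_models` returns model names ordered from closest to farthest from the perfect scores; ties keep their input order because Python's sort is stable. A weighted or score-based aggregation would be an equally plausible design choice.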
The next stages of this study will be to apply these findings in controlled experiments with models across a resolution hierarchy, especially models coarser than 1.5° in horizontal resolution, and to investigate the mechanisms behind the low correlation over the ASALs and the high bias and RMSE in most CMIP6 models.

Fig. 11

Ranking of the 10 best-performing of the 23 CMIP6 models with respect to CHIRPS v2.0, based on 7 continuous indices, over the entire IGAD region during (a) MAM and (b) JJAS

4 Conclusions

This paper documents the performance of the historical rainfall simulations of 23 CMIP6 GCMs over the IGAD region of Eastern Africa using 20 continuous, categorical and volumetric statistical metrics, together with scatter and CDF plots, with CHIRPS v2.0 as the reference dataset. The results showed that most individual CMIP6 GCMs and the EnsMean successfully reproduced the bimodal rainfall regime and the spatial climatological pattern of MAM and JJAS total rainfall. Most models considered in this study overestimated rainfall amounts, and consistent biases are evident across the CMIP6 models. The EnsMean generally provides a better representation of observed rainfall patterns than the individual models when all metrics and seasons are considered. Most models showed the highest skill over the highlands of western Ethiopia, most counties in South Sudan, and southern parts of Sudan, while the lowest skill is observed over the ASALs of Ethiopia, Kenya and Somalia. CNRM-CM6-1-HR underestimated total rainfall the most, with the largest negative bias and the highest RMSE values. The 20 continuous, categorical and volumetric metrics, the colour-coded portrait plots, and the scatter and CDF plots of the 23 CMIP6 GCM historical simulations show variation even within a single category of metrics. This suggests that a limited set of indices, or any individual index, is not sufficient to identify the best-performing models. Based on the results from the 20 continuous, categorical and volumetric indices, the colour-coded portrait plots, and the scatter and CDF plots, INM-CM5-0, HadGEM3-GC31-MM, CMCC-CM2-HR4, IPSL-CM6A-LR, KACE-1-0-G, EC-Earth3, NorESM2-MM, GFDL-ESM4, TaiESM1 and KIOST-ESM are the 10 best-performing models over the IGAD region of Eastern Africa. The main contribution of this work is the combined use of multiple continuous metrics, categorical and volumetric indices, colour-coded portrait plots, and scatter and CDF plots, rather than individual statistical measures, to identify the top-performing CMIP6 GCMs.
Future work might include projecting wet and dry spell patterns under Shared Socioeconomic Pathway (SSP) scenarios using a multi-model ensemble (MME) of the 10 best models selected in this study.