Selecting regional climate models based on their skill could give more credible precipitation projections over the complex Southeast Asia region

This study focuses on future seasonal changes in daily precipitation using Regional Climate Models (RCMs) from the Coordinated Regional Climate Downscaling Experiments-Southeast Asia ensemble (CORDEX-SEA). Projections using this RCM ensemble generally show a larger inter-model spread in winter than in summer, with higher significance and model agreement in summer over most land areas. We evaluate how well the RCMs simulate climatological precipitation using two skill metrics. To extract reliable projections, two sub-ensembles of ‘better’ and ‘worse’ performing models are selected and their respective projections compared. We find projected intensification of summer precipitation over northern SEA, which is robust across RCMs. On the contrary, in the southern part of SEA, the ‘worse’ ensemble projects a significant and widespread decrease in summer rainfall intensity whereas a slight intensification is projected by the ‘better’ ensemble. Further exploration of inter-model differences in future changes reveals that these are mainly explained by changes in moisture supply from large-scale sources (i.e., moisture convergence) with enhanced effects from local sources (i.e., evapotranspiration). The ‘worse’ models project greater changes in atmospheric circulation compared with the ‘better’ models, which can explain part of the uncertainty in projections for daily precipitation over the CORDEX-SEA domain. Hence, our findings might help assess more reliable projections over the SEA region by selecting models based on a two-step model evaluation: the ability of models to simulate historical daily precipitation and their performance in reproducing key physical processes of the regional climate.


Introduction
The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC 2021) highlighted the continuous intensification of the water cycle under global warming, with monsoon precipitation projected to increase over many regions, including Southeast Asia. Despite slightly lower future warming over Southeast Asia compared with the global average, the region has been identified as one of the most vulnerable to climate change impacts due to its high exposure and low resilience [SREX Report (IPCC 2012)]. Therefore, providing robust future climate information is crucial for evaluating the likely impacts and possible adaptation pathways over the region.
Precipitation over Southeast Asia is strongly influenced by the complex interaction between precipitation systems [e.g., the Asian-Australian monsoon systems (Chang et al. 2005; Robertson et al. 2011)] and the local topographic conditions of numerous islands of different sizes and orography. Simulating precipitation and its changes over the region is therefore challenging, since even slight changes in atmospheric conditions such as wind can cause significant changes in the characteristics of local precipitation (Juneng et al. 2016).
Global Climate Models (GCMs) have been used to simulate the climatological distribution and projected changes in daily mean and extreme precipitation over the region (Ge et al. 2019, 2021; Giorgi et al. 2019; Kang et al. 2019). However, with their relatively coarse horizontal resolution (from 100 to 240 km), GCMs poorly describe the complex coastlines and terrain of Southeast Asia, and thus hardly reflect the detailed characteristics of rainfall over this region (Love et al. 2011). Ongoing initiatives address this through the dynamical downscaling of GCMs using Regional Climate Models (RCMs), such as within the framework of the Coordinated Regional Climate Downscaling Experiment (CORDEX) over the Southeast Asia domain (CORDEX-SEA). With a 25 km horizontal atmospheric resolution, CORDEX-SEA RCMs have been widely used to provide regional climate information and climate change scenarios at finer scales than those provided by GCMs (Tangang et al. 2018, 2019, 2020; Villafuerte II et al. 2020). In this study, we use the CORDEX-SEA RCM simulations to further assess the future changes in daily precipitation over the land areas of SEA.
Evaluating climate models, either global or regional, is the first important step before assessing their projected future changes. The performance of both GCMs and RCMs in Southeast Asia has been commonly evaluated and compared based on statistical measures (e.g., root mean square errors, correlation, biases, or standard deviation) (Nguyen et al. 2022; Tangang et al. 2020). By using the multi-model mean (MMM) approach, Tangang et al. (2020) indicate that RCMs show limited advantages in representing climatological precipitation compared with their forcing GCMs over high orographic regions. On the other hand, Nguyen et al. (2022) consider the performance of each RCM-GCM pair and indicate more intense precipitation in RCMs compared with their forcing GCMs. In addition, the precipitation climatology in the CORDEX-SEA RCM MMM is likely a result of wet and dry biases from individual models canceling each other out. Therefore, it is crucial to assess whether these RCM biases can affect the projected change in daily precipitation over the complex Southeast Asia region. However, a model that performs well in the present is not guaranteed to represent the future climate accurately (Jun et al. 2008; Knutti et al. 2010; Schaller et al. 2011). Some studies agree that there are advantages to selecting GCMs to reduce the uncertainties in projected precipitation over some regions [e.g., Perkins and Pitman (2009) and Smith and Chandler (2010) over different sub-regions of Australia]. To date, few studies over SEA have assessed whether future projections are insensitive to model selection and/or whether some models give more plausible projections than others. Here, we conduct a thorough model evaluation and suggest a selection of the 'better' and 'worse' models for seasonal daily precipitation. We then evaluate the differences in projections between these models and assess how they compare with the projections from "ALL" available simulations, or, in other words, with the CORDEX-SEA MMM.
Our final objective is to understand the differences that may arise among the different models by investigating the physical mechanisms responsible for the changes in each model.
Projections given by the MMM of both GCMs and RCMs highlight a widespread precipitation intensification across most regions of Southeast Asia except the Maritime Continent (Hamed et al. 2022; Supharatid et al. 2021; Tangang et al. 2020). Mean or total precipitation over most of the land area of the Maritime Continent is projected to decrease in the majority of the models, but the drying signals are usually not statistically significant (Villafuerte II et al. 2020). Chapter 8 (Douville et al. 2021) of AR6 (IPCC 2021) also mentions a drying trend over the Maritime Continent with medium confidence. However, the examination of individual CORDEX-SEA RCM projections for daily mean precipitation also highlights prominent inter-model differences, revealing a strong degree of uncertainty in the local and regional response to global warming (Tangang et al. 2020). Therefore, we further investigate the projected changes in daily mean rainfall and the underlying physical mechanisms associated with their uncertainties over different sub-regions of SEA, and over the Maritime Continent in particular.
The paper is structured as follows: Sect. 2 introduces the data and methodology used in this paper. Results are then presented in two subsections: Sect. 3a focuses on changes in seasonal daily precipitation over the SEA region and subregions of interest, and Sect. 3b assesses the mechanisms responsible for the projected changes. Finally, we end with a discussion of our results (Sect. 4) and a summary of the main concluding remarks (Sect. 5).

Regional climate outputs: CORDEX-SEA simulations
In this study, we use the daily model outputs (precipitation, evaporation, and wind velocity and specific humidity at 850 hPa) from 8 simulations of the CORDEX-SEA project (Im et al. 2021; Tangang et al. 2020). The CORDEX-SEA simulations were forced by the first ensemble realization (r1i1p1) of GCMs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) and were run on grids of different sizes (from 182 grid points in latitude × 250 grid points in longitude to 189 grid points in latitude × 335 grid points in longitude; Table 1). We use the common grid (182 grid points in latitude × 250 grid points in longitude) that covers the Southeast Asian domain (90°E-145°E, 15°S-25°N). Note that we do not consider the RegCM4-3 simulations because they have a "much wetter bias" compared with observations and with other simulations (Nguyen et al. 2022). Instead, we utilize the new generation (i.e., version 4.7) of RegCM4. Quantile-quantile (Q-Q) plots of daily regionally-averaged precipitation compare the two generations of RegCM (Figs. s1 and s2 for summer and winter, respectively) and show how they compare to three observational products (APHRODITE, REGEN_ALL, and CHIRPSv2). Results indicate generally better performance (i.e., closer to observational references and other RCM simulations) for RegCM4-7 compared to RegCM4-3, giving further confidence in using RegCM4-7 here. The RCA4 and REMO2015 simulations belong to the CORDEX Phase 1 experiments (Giorgi and Jr., 2015), which were the first to downscale a number of CMIP5 GCMs for the model evaluation and climate projection streams over 14 regional domains, including the Southeast Asia region [Southeast Asia Regional Climate Downscaling-SEACLID (Tangang et al. 2020)]. The RegCM4-7 simulations over Southeast Asia (Im et al. 2021) are part of the second phase of the CORDEX-CORE experiments. This CORDEX-CORE exercise is conducted over different areas of the world and provides homogeneous downscaled ensembles at 0.22-degree resolution to assess the consistency of climate change responses regionally.

Observations and reanalysis
To evaluate the performance of the CORDEX-SEA models in simulating climatological precipitation, we use different observational datasets to estimate the uncertainties associated with each type of product (Nguyen et al. 2020b). The selected datasets consist of the regional Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of water resources (APHRODITE) at 0.5° × 0.5° resolution [version v1101 (Yatagai et al. 2012)], Rainfall Estimates on a Gridded Network at 1° × 1° resolution [REGEN version Allstns V1 2019], and the Climate Hazards Group InfraRed Precipitation with Station dataset at 0.25° × 0.25° resolution [CHIRPS version 2.0 (Funk et al. 2015)]. These observational products are interpolated to a common 1-degree grid with the conservative area-weighted method using the Climate Data Operators (CDO). These datasets have been chosen because they cover a climatological period of at least 24 years (1982-2005) and show relatively high consistency in representing daily precipitation and extremes over the domain of interest (90°E-145°E, 15°S-25°N) (Nguyen et al. 2020b).
To study the atmospheric circulation characteristics that prevail over Southeast Asia, the ERA5 reanalysis (Hersbach et al. 2020) is used for total daily precipitation and evaporation, and also for atmospheric variables such as horizontal wind and specific humidity at the 850 hPa level. To make a fair comparison between RCMs and observations, all observations and RCM simulations are interpolated onto the moderate-resolution grid (1° × 1°) of the observational products. Meanwhile, all analyses of projections are conducted at the original resolution of the RCMs (e.g., 0.22° × 0.22°) to retain the finest detail in future projections. Note that almost all analyses in this study (except those on atmospheric circulation) focus on land only, since the observational datasets used do not cover the ocean.

Changes in precipitation
We consider seasonal relative changes in daily precipitation between the late twenty-first century (2070-2099) and historical (1976-2005) periods. The statistical significance of future changes in precipitation is tested using the Mann-Whitney U test, which does not assume that precipitation is normally distributed (alpha = 0.1). The ensemble mean changes for each grid cell are displayed following the classification of Tebaldi et al. (2011), which highlights three types of changes in the model ensemble:
a. Significant agreeing changes are marked with stippling where at least half of the RCMs present a significant change and at least 75% of the RCMs' significant projections agree on the direction of future changes.
b. Non-significant changes are shown in color only where less than 50% of the models have significant changes.
c. Significant disagreeing changes are shown in white where at least half of the models have significant changes and less than 75% of the RCMs' significant projections agree on the sign of future changes.
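The three-way classification above can be sketched for a single grid cell as follows. This is a minimal illustration, not the authors' code: the function name `classify_cell`, the category labels, and the input layout (one projected change and one p-value per RCM) are assumptions, and the p-values are assumed to come from a Mann-Whitney U test performed beforehand.

```python
def classify_cell(changes, p_values, alpha=0.1,
                  min_sig_frac=0.5, min_agree_frac=0.75):
    """Classify one grid cell from per-model changes and p-values.

    changes  : projected change per RCM (sign matters, units do not)
    p_values : p-value of the significance test per RCM (same order)
    Returns one of three hypothetical labels.
    """
    if len(changes) != len(p_values) or not changes:
        raise ValueError("changes and p_values must be non-empty and aligned")

    # Keep only the models whose change is significant at level alpha.
    significant = [c for c, p in zip(changes, p_values) if p < alpha]

    # Case b: fewer than half of the models show a significant change.
    if len(significant) < min_sig_frac * len(changes):
        return "non_significant"

    # Among significant models, count how many agree on each sign.
    n_pos = sum(1 for c in significant if c > 0)
    n_neg = sum(1 for c in significant if c < 0)

    # Case a: at least 75% of significant models share the same sign.
    if max(n_pos, n_neg) >= min_agree_frac * len(significant):
        return "significant_agreeing"

    # Case c: enough significant models, but they disagree on the sign.
    return "significant_disagreeing"
```

Applying such a function to every land grid cell would give the mask used for stippling, color-only, and white areas; the thresholds are exposed as parameters only to make the rule explicit.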

Simplified moisture budget analysis and associated physical mechanisms
One major objective of this study is to understand inter-model differences in precipitation changes by investigating the physical mechanisms responsible for simulated changes. The atmospheric moisture budget is analyzed following Seager et al. (2010) and Endo and Kitoh (2014). Over a long time period (over 1 year), the primary balance is between atmospheric moisture divergence, precipitation, and evaporation (Goergen and Kollet 2021). Therefore, in this study, we consider a simplified atmospheric water budget for a control volume of the RCM model domain (i.e., all land grid points), which can be expressed as:
$P - E = -\mathrm{DivQ}$
Rainfall changes can then be separated into two terms based on the linearized moisture budget equation:
$\Delta P = \Delta E - \Delta \mathrm{DivQ}$
that is, a term related to moisture flux convergence ($-\mathrm{DivQ}$), which includes changes in atmospheric moisture content and in the atmospheric mean circulation, and a term related to surface evaporation changes ($E$).
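As a rough illustration of this simplified budget, the sketch below derives the moisture convergence term as the residual P − E and splits a precipitation change into its evaporation and convergence parts. The function name `decompose_change`, the dictionary keys, and the input values in the usage example are hypothetical; inputs are assumed to be regionally-averaged, long-term means in mm day⁻¹.

```python
def decompose_change(p_hist, e_hist, p_fut, e_fut):
    """Split a precipitation change into local and large-scale parts.

    Moisture convergence (-DivQ) is not computed from winds here; it is
    taken as the residual P - E of the simplified long-term budget.
    """
    conv_hist = p_hist - e_hist      # -DivQ in the historical period
    conv_fut = p_fut - e_fut         # -DivQ in the future period

    d_p = p_fut - p_hist             # total precipitation change
    d_e = e_fut - e_hist             # local source: evaporation change
    d_conv = conv_fut - conv_hist    # large-scale source: change in -DivQ

    # By construction d_p == d_e + d_conv (up to rounding).
    return {"dP": d_p, "dE": d_e, "dConv": d_conv}


# Hypothetical regional means (mm/day), for illustration only.
terms = decompose_change(p_hist=8.0, e_hist=3.0, p_fut=9.5, e_fut=3.2)
print(terms)
```

Because the convergence term is a residual, any imbalance in a model's water budget ends up in `dConv`, which is one reason to treat this as a diagnostic sketch rather than a full budget calculation.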

Model evaluation of climatological daily precipitation
Before examining projected future changes, we evaluate the CORDEX-SEA ensemble's ability to simulate climatological daily precipitation (over the 1982-2005 period) based on a comparison with three observational products. We first focus on the spatial distribution of seasonal daily mean precipitation [Figs. 1 and 2, for boreal summer (June-July-August-September; JJAS) and winter (December-January-February; DJF), respectively]. In general, RCMs simulate well the spatial contrasts (i.e., the north-to-south, dry-to-wet gradient) and the seasonal shift (i.e., high and low precipitation over the mainland during JJAS and DJF, respectively) of observed rainfall over the region. In both seasons, most simulations are wetter than the observational references in terms of regionally-averaged precipitation (e.g., during summer all models overestimate the regional mean relative to APHRODITE, and 6 out of 8 models do so relative to REGEN_ALL and CHIRPSv2; Fig. 1). The spread is larger among models (i.e., summer regionally-averaged precipitation ranging from 5.75 to 11.01 mm day⁻¹, Fig. 1) than among observations (from 5.25 to 7.14 mm day⁻¹). The inter-model spread is also greater during boreal summer than boreal winter. Focusing on the different quantiles of the daily precipitation distribution, we find considerable differences among observations and between observations and RCMs. To illustrate this point, Quantile-Quantile (Q-Q) plots display quantiles of the regionally-averaged daily precipitation distribution in summer and winter (Fig. 3) in all CORDEX-SEA RCM simulations (Table 1) and in the observational products REGEN_ALL and CHIRPSv2. The quantiles of both these products and the models are compared to those of APHRODITE, which is taken as our reference dataset (Nguyen et al. 2022). Figure 3 reveals some interesting features.
First, APHRODITE consistently shows the lowest quantile values among the observational products, while REGEN_ALL and CHIRPSv2 are quite similar to each other. Second, from the 50th percentile upward, all models consistently show higher estimates of precipitation compared with APHRODITE. Models generally have higher precipitation amounts compared to observations, except for the RCA4 simulations forced by CNRM-CM5 and HadGEM2-ES, which have lower estimates than REGEN_ALL and CHIRPSv2. Third, models show more diversity in their quantile estimates than observations, in particular for the highest quantiles (greater than the 99th percentile). The Area Score Metric (ASM, i.e., a measure of the distance between a distribution and the observed reference distribution, here APHRODITE; see inserted numbers in Fig. 3) indicates that RCA4_CNRM-CM5 and RegCM4-7_NorESM1-M have the lowest ASM values, meaning that their distributions are closer to that of APHRODITE than those of the other models. Meanwhile, REMO2015 forced by HadGEM2-ES and MPI-ESM-LR, and RegCM4-7_HadGEM2-ES, have the highest ASM values, indicating that their distributions are far from that of APHRODITE. The above findings are consistent with Nguyen et al. (2022).

Model performance and grouping of models
Given the substantial inter-model differences in simulating the mean and different intervals of the precipitation distribution, we further examine model skill (Fig. 4). Model performance is evaluated using the Root Mean Square Error (RMSE) and the ASM, and the eight individual simulations are then ranked from the 'better' (1) to the 'worse' (8). This ranking is conducted three times, once for each observational dataset, and is displayed in Fig. 4 for summer and winter separately. The RMSE is calculated based on 1982-2005 climatological precipitation, which allows for assessing the similarity in terms of the distribution of mean rainfall intensity, while the ASM integrates the differences between simulated and observed data over the whole distribution. Overall, model performance is sensitive to the considered metric and season (Fig. 4). For example, REMO2015_HadGEM lies in the middle of the model range when the whole precipitation distribution is considered (ranked 2, 3, or 4 based on ASM and compared to the three observational products in summer) but shows the 'worse' skill in terms of mean precipitation (ranked 8 based on RMSE for all observations). Overall, no individual model can be identified as the 'best' based on both considered metrics or both seasons. Instead, we can extract groups of models that are generally better or worse considering both metrics, for a particular season. For example, RCA4 simulations usually have better scores while the REMO2015 simulation forced by MPI-ESM-LR shows lower skill in summer.
[Figure caption fragment: "… Table 1; (d)-(k)] during the climatological period of 1982-2005. The inserted number indicates the regionally-averaged seasonal mean of daily precipitation. All datasets are considered on a 1° × 1° grid."]
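The two skill metrics can be sketched as below. The RMSE is standard; the ASM shown here is only one plausible reading of the description in the text (absolute differences between model and reference quantiles, integrated over the distribution with the trapezium rule) and is not necessarily the exact formulation used by the authors. The function names, the number of quantile levels, and the linear-interpolation quantile are assumptions.

```python
import math


def rmse(model, obs):
    """Root mean square error between two aligned sequences."""
    if len(model) != len(obs) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(model))


def quantile(sorted_vals, q):
    """Quantile at probability q (0..1) by linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def area_score(model, ref, n_points=101):
    """ASM-like score: area between two quantile curves (trapezium rule).

    A value of 0 means identical distributions; larger values mean the
    model distribution lies further from the reference.
    """
    m_sorted, r_sorted = sorted(model), sorted(ref)
    probs = [i / (n_points - 1) for i in range(n_points)]
    gaps = [abs(quantile(m_sorted, p) - quantile(r_sorted, p)) for p in probs]
    return sum(0.5 * (gaps[i] + gaps[i + 1]) * (probs[i + 1] - probs[i])
               for i in range(n_points - 1))
```

In this sketch `rmse` would be applied to the gridded climatological means and `area_score` to the regionally-averaged daily series of a model and a reference product.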
One major objective of this study is to investigate how model biases can affect projected future changes in seasonal daily precipitation. Therefore, two sub-ensembles of the 'better' and 'worse' performing models are selected according to their RMSE and ASM values, for both seasons separately. For each season, a model is classed in the 'better' category if it ranks from 1 to 5 in both of the considered metrics, irrespective of the reference observations. This range of ranking was chosen to balance the aim of including only models with demonstrated skill against the need for a reasonable sample size in each group. The members of each group are presented in Fig. 4; the 4 'better' and the 4 'worse' simulations differ between summer and winter. For summer, all RCA4 simulations and the RCMs forced by NorESM1-M belong to the 'better' ensemble since they show higher skill in both metrics, while RegCM4-7 and REMO2015 forced by MPI-ESM-MR and HadGEM2-ES are classified in the 'worse' ensemble due to their lower skill (ranked from 6 to 8 as shown in Fig. 4a). Meanwhile, for winter, RCA4_CNRM-CM5 and all RegCM4-7 simulations are categorized in the 'better' group, while RCA4_HadGEM2-ES and all REMO2015 simulations are in the 'worse' group due to their lower skill.
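The ranking and grouping rule described above can be sketched as follows. This is an illustrative reading of the rule only: the helper names `rank_models` and `split_ensembles`, the model labels, and the scores in the test are hypothetical, and each score table is assumed to hold one error value per model for a given metric and observational reference (lower is better).

```python
def rank_models(scores):
    """Map each model to its rank (1 = lowest error) for one score table."""
    ordered = sorted(scores, key=scores.get)
    return {model: i + 1 for i, model in enumerate(ordered)}


def split_ensembles(score_tables, max_rank=5):
    """Split models into 'better' and 'worse' groups.

    score_tables : list of dicts {model: error}, one per combination of
                   metric (RMSE, ASM) and observational reference.
    A model is 'better' only if its rank is <= max_rank in every table.
    """
    models = list(score_tables[0])
    ranks = [rank_models(table) for table in score_tables]
    better = [m for m in models if all(r[m] <= max_rank for r in ranks)]
    worse = [m for m in models if m not in better]
    return better, worse
```

With two metrics and three observational products, `score_tables` would contain six tables per season, and the split would be run once for summer and once for winter.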

Future changes in seasonal daily mean precipitation
We now investigate spatial patterns of late twenty-first-century projections of seasonal daily mean precipitation relative to the 1976-2005 historical period under a high emission scenario (RCP8.5) (Fig. 5, for summer and winter). We compare the projected change in the three ensembles: ALL, 'better', and 'worse' (Fig. 5). We first focus on the summer season, in which we observe a larger inter-model spread in simulating historical rainfall (Figs. 5a-c). We find a robust and significant intensification in precipitation over Indochina in all three ensembles. This intensification in summer daily precipitation over the northern part of Southeast Asia (e.g., Indochina and Northern Philippines; see subdomain R1 in Fig. 5) is consistently highlighted in GCM projections (Villafuerte II et al. 2020) and RCM projections (Tangang et al. 2019, 2020). Future changes in summer daily mean precipitation over Indochina tend to be larger in the 'better' ensemble mean than in the 'worse' ensemble mean (23.9% and 19.4%, respectively; Fig. 5b, c). This is also associated with weaker model agreement (i.e., white areas) in the 'worse' ensemble compared to the 'better' ensemble, especially over southern parts of the mainland (i.e., Cambodia, Northern Thailand). On the contrary, models from the 'worse' ensemble seem to provide the main contribution to the robust and significant intensification found over the few grid cells in the Northern Philippines in the ALL ensemble (Fig. 5a-c).
We then focus on the southern parts of Southeast Asia (e.g., Maritime Continent, Southern Philippines, and Papua; see subdomain R2 in Fig. 5) and find that the projections of summer rainfall vary substantially (both in magnitude and direction of change) between the 'worse' and 'better' ensembles. While the 'worse' ensemble projections indicate a robust and significant drying trend over these regions by the end of the twenty-first century, the 'better' ensemble projects an increase in precipitation that is not significant (Fig. 5b). The projections in the ALL ensemble result in a mixed signal from these two different patterns of the 'better' and 'worse' ensembles, with the stronger drying of the 'worse' ensemble imprinting on the ALL projections. This clearly illustrates how the CORDEX-SEA ensemble mean might not be the most relevant basis for assessing future changes in cases where the individual RCM projections differ so much from model to model. In this case, based on the performance of RCMs in simulating historical daily precipitation, we can conclude that the RCM projections from the 'better' ensemble are likely more relevant in this particular subregion in summer.
Winter projections in the three ensembles show somewhat different results from those of summer (Fig. 5d-f). First, future winter changes in rainfall show weaker model agreement (larger white areas in Fig. 5d-f; see also Sect. 2.3) than those of summer. This highlights a larger inter-model spread during winter, irrespective of model ensemble and sub-region. Second, some changes are projected for most of the mainland, although they are not significant (mostly colored but without hatching). Third, some land grid cells in both the northern and southern parts of SEA are associated with robust and significant changes. Interestingly, a comparison between the 'better' and 'worse' ensembles gives the same result as for summer, with generally similar projections in all three ensembles over the northern part but contrasting results between 'better' and 'worse' over the southern part. As a result, the projection from the ALL ensemble might not be applicable over the southern part of SEA.

Future changes in sub-regional precipitation
Our results demonstrate the substantial differences in the late twenty-first-century projections of seasonal daily precipitation among three considered ensembles and how model agreement varies from region to region. Therefore, we further diagnose how changes in seasonal daily precipitation are simulated by the CORDEX-SEA RCMs over two subregions of interest: northern and southern Southeast Asia.
Since the robustness of summer daily precipitation projections is more sensitive to the region considered than that of winter, we focus primarily on boreal summer (JJAS) for brevity and provide equivalent results for winter (DJF) in the Supplementary material (Figs. s3-s6).

Northern parts of Southeast Asia
As previously noted, a significant and robust increase in summer mean daily precipitation is found over northern Southeast Asia in all model ensembles irrespective of the model performance in simulating historical precipitation (Fig. 6a-c). Most models project a widespread future intensification across the whole sub-region, with the exception of mountainous areas in central Vietnam, where little change and/or a decrease in rainfall intensity is projected by some models (REMO2015 forced by NorESM1-M and HadGEM2-ES, RegCM4-7 forced by HadGEM2-ES and MPI-ESM-MR, Fig. 6h-k). Although there is general agreement between RCMs on the sign of the change, the intensity of the change varies substantially across models (from 13.9% to 45.9% on average over the domain; Fig. 6). This inter-model spread in intensity changes is of similar amplitude in the 'better' and 'worse' ensembles (ranging from 14.9% to 45.9% and from 13.9% to 30.6%, respectively).
[Figure 4 caption fragment: ranking of the simulations (Table 1) by the RMSE and ASM metrics for (a) summer (June-September, JJAS) and (b) winter (December-February, DJF), with 1 indicating the best and 8 the worst model performance. The RMSE is calculated based on 1982-2005 climatological precipitation; the ASM integrates the differences between simulated and observed data over the whole distribution following the trapezium rule.]
[Figure caption fragment (maps of changes): non-significant areas are shown in color, denoting that less than half of the models show significant changes. In significant agreeing areas (stippled), at least half of the RCMs show significant changes and at least 75% of the significant models agree on the sign. Significant disagreeing areas are shown in white. The inserted number indicates the regionally-averaged precipitation change over the land points of the sub-region.]
Projected changes in boreal winter indicate contrasting projections within the domain (wetting in the north, drying in the south; Fig. s3a-c). In addition, CORDEX-SEA simulations show a wide range of changes across models, from negative to positive (ranging from −20.4% to 56.5%, Fig. s3d-k). This can be explained by the seasonal contrast (e.g., wet summer, dry winter; Figs. 1 and 2) of climatological precipitation over the region. The northern part of SEA receives less precipitation during DJF. Therefore, even a small change in the actual precipitation amount during winter can lead to a large relative change.
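The sensitivity of relative changes to a dry baseline can be illustrated with a short calculation. The helper `relative_change` and the precipitation values below are hypothetical and chosen only to show the effect; they are not taken from the simulations.

```python
def relative_change(hist, fut):
    """Relative change in percent of a future value w.r.t. a baseline."""
    if hist <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (fut - hist) / hist


# Same absolute change (+0.5 mm/day) on a wet and on a dry baseline.
wet_season = relative_change(hist=10.0, fut=10.5)   # wet-season baseline
dry_season = relative_change(hist=1.0, fut=1.5)     # dry-season baseline
print(f"wet season: {wet_season:.1f}%, dry season: {dry_season:.1f}%")
```

The same absolute change therefore maps to a relative change ten times larger when the baseline is ten times drier, which is consistent with the wide winter range noted above.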
We further investigate the inter-model differences in future change by comparing the contribution of evaporation and moisture convergence to regionally-averaged total precipitation over land, and how this contribution changes between the historical (1976-2005) and far-future (2070-2099) simulated climates. To that end, we follow the simplified moisture budget analyses of Goergen and Kollet (2021), where precipitation originates from a local source of moisture (evapotranspiration, E) or from a large-scale source of moisture (moisture convergence, P − E). This framework enables a direct comparison of these two components over a considered region to help understand the differences in simulated precipitation amounts. This analysis is conducted within each model as well as in the ERA5 reanalysis over the 1979-2005 period, which is the longest temporal coverage available for this dataset. Figure 7 highlights again a significant (red asterisk) intensification in summer rainfall averaged over the land points of northern SEA across all models, irrespective of model performance in simulating historical precipitation. Interestingly, models tend to show that most of the regionally-averaged summer precipitation comes from large-scale sources of moisture, although the ratio of this contribution varies from model to model. The subsetting into 'better' and 'worse' models does not explain any inter-model differences in this contribution, whereas an obvious grouping by RCM family is found. RegCM4-7 simulations stand out with the lowest estimated proportion of evaporation to total precipitation, while RCA4 and REMO2015 simulations have a ratio similar to one another and to that of ERA5. A potential reason is the difference in the land surface schemes applied among RCMs (Table 1). This suggests the important role of RCM setup in the quality of the simulated evaporation over the CORDEX-SEA domain.
[Figure 7 caption fragment: "… (Table 1), and during the historical period (1976-2005, orange) and the late twenty-first century (2070-2099; blue) over northern parts of Southeast Asia. The red, purple, and green asterisks indicate the significant differences between the historical and future precipitation, evaporation, and moisture convergence respectively at a 10% level of significance according to Mann-Whitney U-test. The vertical dashed black lines mark reanalysis, 'better' and 'worse' ensembles."]
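The evaporation-to-precipitation ratio discussed above can be expressed as the share of regional precipitation supplied by the local source, with the remainder attributed to moisture convergence. The sketch below assumes regionally-averaged long-term means; the function name `source_shares`, the dictionary keys, and the two sets of input values are hypothetical and do not correspond to any specific RCM.

```python
def source_shares(p, e):
    """Fractions of precipitation from local and large-scale sources.

    p : regional mean precipitation
    e : regional mean evaporation (same units as p)
    The large-scale share is the residual (P - E) / P.
    """
    if p <= 0:
        raise ValueError("precipitation must be positive")
    local = e / p
    return {"local": local, "large_scale": 1.0 - local}


# Two hypothetical simulations with the same precipitation
# but different evaporation.
print(source_shares(p=8.0, e=2.0))   # low evaporation share
print(source_shares(p=8.0, e=3.2))   # higher evaporation share
```

Comparing these shares across simulations and against a reanalysis is one simple way to make a grouping by RCM family visible.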
Focusing on how this ratio might change in the future across models, we find that the increase in moisture convergence from remote moisture sources is significant (green asterisk) and most likely the dominant contribution to the increase in summer rainfall over northern SEA in all models. A slight increase in summer evaporation, small compared with the increase in large-scale precipitation, is also found in the RCA4 and RegCM4-7 simulations but not in those of REMO2015, and can partly enhance the intensification of rainfall over the region.
The relative contribution of the two sources to the regional moisture budget over northern SEA during winter is similar to that in summer for the RegCM4-7 simulations but not for the RCA4 and REMO2015 simulations (Fig. s4). Focusing on future changes, the figure highlights again the large inter-model differences in projected winter regionally-averaged precipitation over the land points of the sub-region, with the majority of models (5 out of 8) projecting non-significant changes in precipitation. Although a significant increase (purple asterisk) in evaporation is found in 6 out of 8 models as a response to future global warming, these changes have only a slight impact on changes in future precipitation.

Southern parts of Southeast Asia
We focus on the southern parts of Southeast Asia to better understand the future changes in precipitation over the many islands of complex topography in the region. As previously noted, significant differences in the sign and the robustness of the projections are found between the 'better' and 'worse' ensembles. On average over the region, changes in the 'better' ensemble are small and close to zero (ranging from −3.6% to 13.1%, Fig. 8e-h), and usually not significant. This ties in with the non-significant wetting trend (color but no hatching) or the disagreement in the sign of the trend (white) in the spatial map of the 'better' ensemble mean mentioned before (Fig. 8b). Meanwhile, the 'worse' simulations consistently indicate a significant decrease in precipitation, ranging from −21.8% to −14% (Fig. 8i-l). The wide range of changes among the 'better' simulations, from positive to negative, leads to a larger inter-model spread than in the 'worse' ensemble (Fig. 8). This also highlights the fact that the robust and significant reduction in summer rainfall over the southern part of SEA emerges strongly from the 'worse' models only.
Contrasting projections of winter daily precipitation over the southern part of SEA are found within both the 'better' and 'worse' simulations (Fig. s5). There are only weak changes in winter precipitation, with much less agreement on the sign of changes across models from the same group. As a result, only some grid points over land show robust significant changes across a particular group of models (color and hatching in Fig. s5a-c), and these are usually small in extent and magnitude compared with summer. In addition, there are no "overlap" regions between the 'better' and 'worse' simulations. We also note the similarity in the spatial distribution of winter daily precipitation projected by RegCM4-7_NorESM1-M and REMO2015_NorESM1-M despite substantial differences in the magnitude of changes. This feature can be explained by the fact that these two RCMs share the same forcing GCM (i.e., NorESM1-M) and convective scheme (i.e., Tiedtke, Table 1), even though they are categorized into different ensembles. This is in line with the conclusion of Nguyen et al. (2022), which suggested an important role of the RCM setup in the CORDEX-SEA domain.
Figure 9 compares the relative contribution of local and remote sources to total regionally-averaged precipitation over the land points of southern parts of SEA. First, the ratio of evaporation to moisture convergence is similar to that over the northern part of SEA, with the obvious grouping by RCM type mentioned in Sect. 3.4.1. Although RCA4 and REMO2015 compare relatively well with ERA5, they are categorized into different groups. Second, among the 'worse' models, which show weaker skill in simulating the historical precipitation, the RegCM4-7 simulations differ substantially from ERA5 in the ratio of contributions, while the ratio in REMO2015 is quite similar to that of ERA5. Third, in terms of the projection averaged over the subdomain, the figure again highlights the substantial differences in projected precipitation between the 'better' and 'worse' simulations, with only the 'worse' simulations projecting a consistently significant decrease in precipitation. However, because the ratio of contributions varies among the 'worse' simulations, we cannot identify a common mechanism for this drying. In addition, despite the similarity in the historical ratio of contributions, RCA4 and REMO2015 project different changes in the contribution of the large-scale source of moisture. In particular, RCA4 projects a slight but non-significant reduction in moisture convergence while there is a significant reduction of precipitation from remote sources among the REMO2015 simulations. Changes in large-scale precipitation are larger than changes in evaporation over the sub-regions. A slight reduction in local evaporation is projected, which partly contributes to the drying trend in the 'worse' simulations. During the winter, the main features remain the same as in summer. The increase in boreal winter evaporation is small but more obvious across all simulations over the sub-regions (Fig. s6).

Potential links with future changes in atmospheric circulation
Our findings suggest a dominant role of large-scale moisture convergence changes in explaining the future changes in summer mean daily precipitation in both northern and southern SEA. We further investigate these projected changes in large-scale sources of moisture and evaluate potential changes in low-level atmospheric circulation in the CORDEX-SEA RCMs. Tangang et al. (2020) evaluated the CORDEX-SEA simulations in terms of monsoon circulation, focusing on the multi-model mean (MMM) of low-level circulation and moisture flux divergence at a single level (i.e., 850 hPa) for two seasons. They found that although the RCM ensemble mean captures the general patterns of the reanalysis well, the circulation in the RCM MMM tends to be stronger than in the reanalysis. Building on this previous study, we further analyze atmospheric circulation changes within the 8 simulations and compare the changes across the 'better' and 'worse' ensembles.
We first focus on how well the 'better' and 'worse' ensemble means simulate the low-level circulation pattern in the ERA5 reanalysis (Fig. 10). The summer atmospheric circulation over Southeast Asia is largely modulated by two monsoon systems: the westerlies from the Bay of Bengal into northern parts of Southeast Asia, including the mainland and the northern Philippines (along 10°N); and the easterlies from Australia to the Maritime Continent and Papua (Fig. 10a). Overall, the 'better' ensemble mean captures the structure and intensity of the wind relatively well compared to ERA5, whereas the ALL and 'worse' ensemble means are generally stronger in magnitude. In particular, the westerly component in the 'worse' RCMs tends to be stronger over the Bay of Bengal and the mainland of SEA, and this imprints on the ALL ensemble mean. Focusing now on individual RCMs (Fig. 10e-i), we find that RCA4_HadGEM2 stands out in the 'better' ensemble with much stronger westerlies into the mainland than the other 'better' simulations. This highlights the limitation of ranking models using statistics-based metrics only. The same conclusion can be drawn for the RCMs' performance in simulating the ERA5 atmospheric circulation during winter (Fig. s7). In particular, the RCMs tend to overestimate the easterly component over the northern parts of Southeast Asia compared with ERA5.
To understand the mechanisms responsible for the summer daily precipitation changes over Southeast Asia, we now examine the changes in the low-level atmospheric circulation. The 'better' simulations have a similar magnitude of changes although there is a slight difference in the direction of the wind. Anomalous south-westerlies consistently prevail in all simulations, leading to enhanced monsoonal winds that can explain some of the robust increase in summer rainfall across Indochina and the northern Philippines. Over the southern parts of SEA, the anomalous summer winds over the equator to the Maritime Continent are reversed in direction (compared to the climatological wind; Fig. 10a) in RegCM4-7_HadGEM2-ES and RegCM4-7_MPI-ESM-MR (Fig. 11h-i), implying a weakening of the circulation in these two models, which belong to the 'worse' ensemble. Such a weakening is also found in the westerlies affecting the southern Philippines in the majority of the 'worse' simulations (Fig. 11h-j), except REMO2015 forced by MPI-ESM-MR, in which anomalous westerlies prevail.
Fig. 10 Spatial distribution of the climatological low-level wind circulation during the summer (JJAS) (vectors) in the ERA5 reanalysis (a), the ALL, 'better' and 'worse' ensembles (b-d) and for all individual RCM simulations listed in Table 1. All analyses are considered at 1-degree resolution. Shading indicates the magnitude of wind (in mm s−1)
Overall, the patterns of changes in the ALL, 'better', and 'worse' ensemble means, as well as within each individual model, are consistent with the patterns of precipitation change identified earlier, namely a significant drying signal among the 'worse' models that is not seen in the 'better' models. Changes in low-level atmospheric circulation over southern parts of SEA are stronger in the 'worse' MMM than in the 'better' MMM, in line with the stronger precipitation change signal noted previously over the southern parts of SEA.
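One simple way to express the idea that a wind anomaly opposing the climatological wind implies a weakened circulation is to project the anomaly onto the climatological wind direction. The sketch below is an illustration under that assumption and is not the diagnostic used in the paper; the function names and the wind components are hypothetical, in arbitrary units.

```python
import math


def speed(u, v):
    """Wind magnitude from zonal (u) and meridional (v) components."""
    return math.hypot(u, v)


def along_flow_change(u_clim, v_clim, du, dv):
    """Component of the wind anomaly along the climatological wind.

    Negative values mean the anomaly opposes the climatological flow
    (a weakening); positive values mean it reinforces it.
    """
    s = speed(u_clim, v_clim)
    if s == 0:
        raise ValueError("climatological wind has no direction")
    return (u_clim * du + v_clim * dv) / s


# Hypothetical climatological easterly with a westerly anomaly.
weakening = along_flow_change(-5.0, 0.0, 1.0, 0.0)

# Hypothetical south-westerly flow with an anomaly in the same direction.
strengthening = along_flow_change(3.0, 4.0, 0.6, 0.8)
```

Applied at each grid point, the sign of this projection separates regions where the monsoon flow is enhanced from regions where it is weakened.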
During winter, there is only a slight change in the prevailing atmospheric circulation over the two sub-regions, with large inconsistencies across models, irrespective of whether they belong to the 'better' or 'worse' ensembles (Fig. s8). These non-significant changes in circulation do not help to explain the changes in winter precipitation mentioned in Sect. 3.4. Therefore, the subsetting into 'better' and 'worse' ensembles is not meaningful in the winter case.

Offsets and enhancing effects from evaporation
In Sect. 3.4, we suggested that the increase in evaporation due to increased surface temperature might have different effects on seasonal daily precipitation (i.e., offsetting or enhancing effects depending on the sub-region). Therefore, we further analyze the future changes in surface evaporation in the ALL, 'better', and 'worse' ensembles and in individual models.
We first focus on the ability of the models and ensembles to simulate the spatial patterns of summer evaporation over the land of SEA as represented by the ERA5 reanalysis (Fig. 12). Southeast Asia is characterized by high evaporation across the whole region during the summer (JJAS, Fig. 12a). In terms of regionally-averaged evaporation, the three ensemble means considered underestimate the amount of evaporation from ERA5. This is due to the strikingly low evaporation simulated by RegCM4-7. In particular, the RegCM4-7 simulations stand out with a systematic bias and a clear underestimation compared to ERA5 and other simulations (Fig. 12h-j). The evaporation climatology of the RCA4 simulations is the closest to ERA5, with a regionally-averaged value close to that of ERA5, and REMO2015 simulations tend to slightly overestimate evaporation over SEA in summer.
Fig. 11 The projected changes in the low-level wind (850 hPa) for the late twenty-first century over three model ensembles: ALL, 'better' and 'worse' (a-c) and for individual simulations (d-k, Table 1) relative to the historical period for the boreal summer (JJAS). The shading indicates the magnitude of changes (in mm s−1)
During the winter, the spatial distribution of evaporation in ERA5 shows a north-south gradient, reflecting the distribution of temperature over the region. In particular, high evaporation is depicted over the southern part of SEA while the northern part of SEA exhibits low evaporation due to low temperature (Fig. s9a). The 'worse' MME simulates the spatial contrast in evaporation better than the 'better' or ALL MME. This is because the large differences across the different RCM families generally also hold for winter. In particular, RegCM4-7 simulations fail to capture the seasonal contrasts and the regional differences in climatological evaporation (Fig. s9). It is clear that the ability of a model to simulate evaporation is sensitive to the type of RCM. This reveals model difficulties in capturing land-atmosphere feedbacks, which have been illustrated in previous studies (Goergen and Kollet 2021).
Focusing now on the simulated changes in evaporation, we find that different types of RCM tend to project different spatial distributions of future changes in evaporation. In particular, RegCM4-7 simulations show a robust and significant increase in evaporation across all land grid cells of the SEA domain, while there is a slight or a significant decrease in evaporation over the southern parts of SEA in the RCA4 and REMO2015 simulations, respectively. This is linked with the different abilities of the RCMs to simulate the climatological evaporation in ERA5 mentioned before. Interestingly, projections for summer daily evaporation share some features with projections for daily mean precipitation in the three ensemble means (Figs. 13a-c and 5a-c). For example, evaporation is projected to increase mainly over northern parts of Southeast Asia across all models, although the change is robust across more land grid cells in the 'better' models than in the 'worse' models. This helps to enhance the intensification of summer daily precipitation projected over the northern part of SEA.
[Figure caption, beginning missing] ... Table 1 (b-l). The inserted number indicates the regionally-averaged seasonal mean of daily evaporation over the land points of SEA
Changes in winter evaporation show more consistency across the models (Fig. s10), with smaller areas shown in white compared with the maps of changes in precipitation shown in Fig. 5d-f. Most models project an increase in evaporation over northern parts of Southeast Asia, with the exceptions of RCA4_CNRM-CM5 and REMO2015_MPI-ESM-MR, which project a slight decrease, but these changes are non-significant. Despite the stronger robustness across different model ensembles in terms of surface evaporation, the changes in precipitation still show weaker model agreement, highlighting the important role of large-scale circulation in determining mean precipitation changes.

Discussion
Fundamental to this study is the hypothesis that a model that can simulate climatological precipitation well is more likely to produce credible projections of future precipitation. While it is reasonable to argue that model skill in the present is not likely to hold in the future, we see no evidence in previous literature that a model with weak skill in simulating climatology can be superior to others in the future. In addition, investigating the inter-model differences in future changes shows that selecting models based on their skill might help to reduce model uncertainties not only in the magnitude but also in the sign of future changes over the CORDEX-SEA domain. In particular, our results indicate that the projected summer precipitation over southern parts of SEA is very sensitive to model selection. A robust and significant drying trend is found in southern SEA on average over all CORDEX-SEA simulations, but we highlight here that this decrease in precipitation intensity mainly comes from a group of models that perform worse than the others (i.e., the 'worse' models). This points to the important role of model biases in simulating climatological precipitation over this sub-region of SEA, and implies that considering all available model simulations might not give the most relevant projection there.
During the winter, we have shown much less robust future changes in daily precipitation simulated by CORDEX-SEA. Most land areas show weak model agreement (i.e., white areas shown in Fig. 5). This is explained by a larger range of changes (from negative to positive) averaged across the whole region. Our findings are somewhat different from the conclusion of Tangang et al. (2020), who reported a tendency toward an intensification over Indochina and the eastern Philippines and a reduction of rainfall over the Maritime Continent. This can partly be explained by the differences in the set of simulations considered. Indeed, they used all available CORDEX-SEA simulations, including six RegCM4-3 simulations (compared to the three newest generation RegCM4-7 simulations that we use here) which consistently projected a significant drying trend over the Maritime Continent. These simulations are associated with wetter biases relative to observational references and compared with other simulations (Ngo-Duc et al. 2017; Nguyen et al. 2022; Tangang et al. 2020), including the RegCM4-7 simulations (Figs. s1 and s2).
Further analysis of the projection of the seasonal wettest day (Rx1day) shows similar results in terms of model agreement compared with changes in seasonal daily mean precipitation for both seasons (Figs. s11 and s12 for summer and winter respectively). This is somewhat at odds with the high consistency in the Rx1day projection mentioned in previous studies. Our study suggests a framework that could be applied for more detailed studies of model biases and future projections of heavy precipitation over the region.
To inform the risk associated with future changes in precipitation under warmer climates, this study focuses on the skill of models in simulating seasonal precipitation over SEA land regions, where most of the impact occurs. With SEA located within the tropical climatic zone and containing many islands of varying size, we acknowledge that the adjacent ocean areas have an important role in the sub-region's climate. For example, the large-scale transport of atmospheric moisture from oceans to the land contributes significantly to land surface evaporation over the small islands of SEA (Nguyen et al. 2022). Due to limitations of the observational datasets employed, we were unable to extend our analysis to oceanic precipitation. For this reason, we also use the ERA5 reanalysis as a reference to investigate the relative performance of RCMs in simulating seasonal precipitation over land only, ocean only, and both ocean and land (Tables 2 and 3 for summer and winter respectively). The results reveal the sensitivity of model ranking to reference datasets, seasons and cases. Interestingly, the grouping of "better" and "worse" simulations is similar across all considered cases during the boreal summer and remains the same when APHRODITE, CHIRPSv2 and REGEN_ALL are taken as references (Fig. 3a). Categorizing model performance during the winter is more complicated, with only RCA4_CNRM-CM5 and RegCM4-7_NorESM1-M consistently showing "better" performance compared with other simulations, regardless of cases and metrics. This indicates that model biases over the ocean can be different from those over land, which might affect the grouping of sub-ensembles, notably during the boreal winter when less precipitation is expected. Note that using reanalysis products (e.g., ERA5) as reference datasets is not recommended, as they have been demonstrated to have large inter-product differences when estimating precipitation and extremes at global (Bador et al. 2020) and regional scales (Nguyen et al. 2020a).
Therefore, further evaluation is required on what impact multi-model performance has on simulating precipitation over the ocean with respect to the reliability of future projections.
[Table 2 caption, beginning missing] The skill metrics are calculated based on: both ocean and land points, the land points only and the ocean points only over the SEA domain at 1-degree resolution. The green and yellow colors indicate the "better" and "worse" simulations respectively. See Sect. 3.2 for how models are categorized
Table 3 Same as Table 2 but for winter precipitation. The green and yellow colors indicate the "better" and "worse" simulations respectively
The proposed skill metrics in this study focus on seasonal precipitation. We chose RMSE and ASM since these metrics allow us to evaluate model agreement with observations by measuring the similarity in terms of mean precipitation intensity and across different quantiles of the whole precipitation distribution. As a result, those members that produce more precipitation during the considered seasons are classified as "worse" (e.g., REMO2015 simulations during summer). However, the skill criterion is less discriminating for winter, when less rain falls. One reason behind this might be the limited number and quality of stations over the region, resulting in large observational uncertainties. In winter, these observational uncertainties can be as large as inter-model differences, leading to a high sensitivity of model ranking to the reference (Nguyen et al. 2020a). Another possible reason might be related to the resolution of the RCM data used for model evaluation (i.e., 1° × 1°; see Sect. 2.2). Previous studies suggested that interpolation to a coarser resolution might remove the detailed features of datasets and/or smooth extreme values (Herold et al. 2016), which might affect the precipitation over the many small islands of SEA.
Therefore, we perform an additional evaluation of the models' ability to simulate seasonal daily precipitation at 0.25-degree resolution, which is close to both the original resolution of ERA5 and that of the model simulations (Table s1). We find a similar grouping of "better" and "worse" models for summer rainfall but a small difference for the winter case. In particular, only 3 simulations are categorized in the "better" group when using the higher resolution. However, the "worse" simulations are similar between the two considered resolutions. This highlights that the coarser resolution of 1 degree that we have used for model evaluation is not the main reason behind the inapplicability of the skill metrics found in winter.
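The metric-based ranking discussed in this section can be sketched with RMSE alone. This is a minimal illustration under stated assumptions: ASM is not reproduced here, the model names and values are hypothetical, and splitting the ranked list into two halves is a simplification of the categorization described in Sect. 3.2.

```python
import math


def rmse(sim, ref):
    """Root-mean-square error between simulated and reference values."""
    if len(sim) != len(ref):
        raise ValueError("sim and ref must have the same length")
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(ref))


def rank_models(sims, ref):
    """Return model names ordered from lowest to highest RMSE."""
    scores = {name: rmse(values, ref) for name, values in sims.items()}
    return sorted(scores, key=scores.get)


def split_half(ranking):
    """Assumed grouping: first half 'better', second half 'worse'."""
    half = len(ranking) // 2
    return ranking[:half], ranking[half:]


# Hypothetical seasonal-mean precipitation (mm/day) at four grid points.
ref = [5.0, 7.0, 9.0, 6.0]
sims = {
    "model_a": [5.5, 7.5, 9.5, 6.5],   # uniform +0.5 bias
    "model_b": [7.0, 9.0, 11.0, 8.0],  # uniform +2.0 bias
    "model_c": [5.0, 7.0, 9.0, 8.0],   # one grid point off by 2.0
    "model_d": [2.0, 4.0, 6.0, 3.0],   # uniform -3.0 bias
}

ranking = rank_models(sims, ref)
better, worse = split_half(ranking)
```

Repeating the same ranking with a different reference dataset or grid resolution, as done in the additional evaluations, shows directly whether the grouping is sensitive to those choices.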
Our results reveal some differences in the ability of models to reproduce the two components of the moisture budget, evaporation and moisture convergence, compared to precipitation. For example, most "worse" summer rainfall simulations (e.g., REMO2015 simulations) reproduce regionally-averaged mean evaporation and moisture convergence over land better than other simulations when the reanalysis is used as reference (Figs. 7 and 9). Meanwhile, the "better" RegCM4-7_NorESM1 produces too little evaporation compared to ERA5. In fact, the seemingly enhanced performance of REMO2015 simulations is likely a result of wet biases (e.g., northeast SEA) and dry biases (e.g., over Cambodia, southern parts of Vietnam, Java) cancelling each other out. In addition, we performed additional model evaluation using the same skill metrics for evaporation and moisture convergence, with ERA5 as reference over land only (Tables 4 and 5 for summer and winter respectively). Interestingly, although model ranking for evaporation and moisture convergence differs from that for precipitation, we still find similarities in the grouping of "better" and "worse" performing models, notably during summer (Table 4), regardless of the variables being evaluated. REMO2015 simulations actually perform "worse" (e.g., ranked from 5 to 8 depending on metrics and variables; Table 4) in simulating the mean intensity of evaporation and moisture convergence. This highlights some limitations of these evaluation metrics and that additional process-related variables should be included in any future model evaluation framework.
Our results suggest that changes in moisture supplies from both local (i.e., evapotranspiration) and large-scale (i.e., moisture convergence) sources are not similarly reproduced across models and therefore contribute to inter-model uncertainties in future changes in precipitation. However, within each RCM, the changes in atmospheric circulation can generally explain the changes in precipitation. Therefore, the inter-model differences in atmospheric circulation changes explain a great part of the inter-model differences in projected seasonal precipitation. This is in line with findings from both global (Seneviratne et al. 2021; Shepherd 2014; Trenberth et al. 2015) and regional studies [i.e., over India (Pfahl et al. 2017)] as documented in AR6 (IPCC 2021), which identified that dynamic contributions (e.g., from moisture convergence) show large differences across models and are more uncertain than thermodynamic contributions (due to warming). Since the dynamic response and feedbacks are important for the convective processes of precipitation, future studies might focus on improving our confidence in how dynamic changes affect future precipitation.
Table 4 Ranking of RCM simulations for summer precipitation (JJAS) (pr), evaporation (evspsbl) and moisture convergence (con) based on RMSE and ASM with ERA5 taken as reference. The skill metrics are calculated based on the land points of SEA only at 1-degree resolution. The green and yellow colors indicate the "better" and "worse" simulations respectively

Conclusions
In this paper, we investigate the late twenty-first century changes in daily precipitation relative to the historical period under the RCP8.5 scenario in the CORDEX-SEA eight-member ensemble. Our aim is to assess how model performance affects the projections of seasonal mean precipitation over a region characterized by many islands of complex topography that are difficult to model. To that end, RCM simulations are first carefully evaluated based on two aspects of precipitation: the seasonal mean state and the daily precipitation distribution. Given the wide range of RCM performance, two sub-ensembles, 'better' and 'worse', are created for each season individually and for the whole of the SEA region. This study is the first attempt to assess how projected seasonal rainfall is affected by model selection by inter-comparing projections between the 'better' and 'worse' ensembles.
Model agreement in future projections of daily precipitation generally varies across the sub-regions and seasons considered. The inter-model spread is generally larger in winter than in summer, with far fewer land grid cells where the projected winter changes are robust and significant across models. Focusing on summer projections, we find a significant intensification of summer daily mean precipitation over northern parts of Southeast Asia (e.g., Indochina and the northern Philippines), which is robust across RCMs, irrespective of which ensemble a model sits in. On the contrary, southern parts of Southeast Asia (e.g., Maritime Continent, Papua, and the southern Philippines) have been highlighted as regions in which projected summer rainfall is affected by model selection. The 'worse' ensemble projects a significant and widespread decrease in summer rainfall intensity over the majority of land areas whereas a slight intensification is projected by the 'better' ensemble. This indicates that selecting models based on their skill might help to reduce model uncertainties not only in the magnitude but also in the sign of future changes over the CORDEX-SEA domain. Our results suggest that considering all available model simulations over some regions and seasons of Southeast Asia may not give the most relevant projections. In other words, careful model evaluation is needed and could lead to more reliable projections at the regional and seasonal scales relevant to the complex Southeast Asia region.
We also explore the underlying reasons for the identified inter-model differences by assessing the relative contributions of the local and large-scale sources of precipitation and the associated physical mechanisms, which in turn helps to better understand the future uncertainties among models. Further analyses suggest that the future changes in precipitation can be explained by moisture supply changes from both large-scale sources (i.e., moisture convergence) and local sources (i.e., evapotranspiration). However, the inter-model uncertainties in projected daily precipitation are mainly associated with the large inter-model differences in moisture convergence projections, which are reflected in a large inter-model spread in changes in low-level atmospheric circulation. Meanwhile, although the changes in evaporation are smaller, they are significant and can offset or enhance the changes in precipitation, depending on the sub-region considered. Yet, we find very little agreement between models, which makes it hard to draw general conclusions for the CORDEX-SEA ensemble.
Our study is limited by the number of available RCM simulations from CORDEX-SEA. Indeed, each sub-ensemble contains only four simulations; the 'better' ensemble for winter, for example, draws on only two RCMs (RCA4 and RegCM4-7). There are obvious inter-model dependencies in evaporation and monsoon circulation changes over the regions mentioned in Sects. 3.5 and 3.6. Therefore, our method could benefit from being applied to other CORDEX regions with larger numbers of simulations. An additional process-based selection approach might also be helpful to fully understand the capability of models in capturing regional precipitation and its relevant drivers. We can go further by selecting models based on a two-step model evaluation: the ability of models to simulate historical daily precipitation and their performance in reproducing key physical processes of the regional climate. In that way, we can either select the 'best' (which is probably feasible in the summer case) or remove the 'worst' of the 'worse' models (e.g., in the winter case) to extract more reliable projections over the SEA region.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions. PLN and LVA are supported by the Australian Research Council grant FT210100459. PLN, LVA, and TPL are supported by the ARC grant CE170100023. MB and this project have received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 101027577.

Competing interest None.
Ethics approval and consent to participate Not applicable.

Consent for publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.