1 Introduction

General circulation models (GCMs) participating in the Coupled Model Intercomparison Project (CMIP) are state-of-the-art tools for climate research and have been used regularly by the Intergovernmental Panel on Climate Change (IPCC) in its assessment reports (IPCC 2007, 2014). CMIP models have also been employed in a large number of studies to broaden our knowledge of the Earth system, to assess the socioeconomic impacts of the projected future climate, and even to provide the scientific basis for environmental policymaking. Despite these vital roles, CMIP models, especially in their coupled ocean–atmosphere runs, still have many deficiencies, known as biases, and understanding the nature of these biases is needed to minimise them.

There are several long-standing biases in CMIP models, and Stouffer et al. (2017) outlined six important biases that they recommended be addressed urgently in the upcoming model generation: (1) the double intertropical convergence zone (double ITCZ; Bellucci et al. 2010; Oueslati and Bellon 2015; Zhang et al. 2015; Samanta et al. 2019; Tian and Dong 2020; Si et al. 2021), (2) the dry Amazon bias (e.g., Lintner et al. 2017), which results from a poorly simulated Walker circulation and affects the land carbon fluxes, (3) the biases in stratocumulus clouds over the eastern parts of the subtropical ocean basins (see Boucher et al. 2013), (4) the overly deep ocean thermocline, which affects the simulation of the El Niño–Southern Oscillation (ENSO; e.g., Li and Xie 2012; Flato et al. 2013), (5) too warm and too dry land surfaces in summer (Klein et al. 2006; Cheruy et al. 2014; Mueller and Seneviratne 2014), and (6) the location of the Southern Hemisphere atmospheric jet (e.g., Russell et al. 2006). Two of these biases (the double ITCZ and the dry Amazon) concern precipitation directly, and two others (the overly deep thermocline and the too dry land surfaces) are related to rainfall, which emphasises the significance of precipitation biases. Notwithstanding this recommendation, Fiedler et al. (2020) recently found that these precipitation biases persist in the current CMIP Phase 6 (CMIP6; Eyring et al. 2016) models, with only slight improvements over the previous CMIP5 (Taylor et al. 2012) and CMIP3 (Meehl et al. 2007) generations.

One of the solutions to this persistent problem proposed by Fiedler et al. (2020) is the use of storm-resolving models (SRMs; Matsui et al. 2016; Stevens et al. 2019). In an SRM, the grid spacing is on the order of a kilometre, so that convective processes can be partially resolved by the numerical scheme. By avoiding cumulus parameterisation, which has long been considered problematic, SRMs have several advantages over coarse-resolution CMIP-class climate models, including the explicit representation of mesoscale phenomena and a direct link between circulation and cloud microphysical processes (Satoh et al. 2019). Limited-domain simulations by Stevens et al. (2020) yielded a better representation of precipitation in terms of location, diurnal cycle, and spatial propagation. Another study found that the precipitation rate simulated by this type of model is closer to that observed by satellites (Holloway et al. 2012). These promising results suggest that running SRMs globally can lead to a more realistic simulation of precipitation.

Because of the high computational cost of global simulations, only a few global storm-resolving models (GSRMs) are currently being developed (Stevens et al. 2019). One of them is the ICOsahedral Nonhydrostatic (ICON) model developed by the Max Planck Institute for Meteorology (MPI-M), Germany. A one-year simulation with the Sapphire configuration of ICON (ICON-Sapphire; Hohenegger et al. 2023) shows a more realistic seasonality of tropical precipitation over land. However, precipitation over the tropical ocean still exhibits the double ITCZ bias. It is therefore of interest to understand why and how the long-standing biases of traditional climate models also appear in a GSRM.

Precipitation biases in GCMs are thought to be systematic and to propagate from biases in other simulated variables. Yang et al. (2018) proposed two main sources of bias in large-scale precipitation: thermodynamic and dynamic factors. The thermodynamic factor is related to biases in the simulation of hydrological processes, whereas the dynamic factor is the bias in the large-scale atmospheric circulation, especially its ascending branch. They found that the dynamic factor is the dominant contributor to precipitation biases over four regions of the tropical ocean, where models tend to overestimate the frequency of occurrence of the strong upward motion regime. Bellucci et al. (2010) and Oueslati and Bellon (2015) previously showed a similar result for the double ITCZ region and further connected it to biases in sea surface temperature (SST).

In this study, we examine the main sources of the large-scale tropical (30°S–30°N) mean precipitation biases in coupled climate model simulations by applying the simple globally resolved energy balance (GREB) diagnostic precipitation model (Stassen et al. 2019). In this GREB diagnostic model, precipitation is estimated from four environmental fields: surface specific humidity, surface relative humidity, tropospheric mean vertical motion, and the strength of the daily variability of tropospheric mean vertical motion. These four terms represent the thermodynamic and dynamic factors of tropical precipitation.

The study is organised as follows: the data and methodology are described in detail in the next section, which also includes an analysis of the performance of the GREB diagnostic model. In Sect. 3 we briefly present the tropical precipitation biases in the models. The main results follow in Sect. 4, where we present the sensitivity analysis with the GREB diagnostic model. Finally, we discuss and summarise the results in Sect. 5.

2 Methods

2.1 Data

We use output data from CMIP5 and CMIP6 historical runs that have the following required variables: precipitation (pr), surface specific humidity (huss), surface relative humidity (hurs), and vertical velocity (wap). All variables are of monthly frequency, except for wap; daily wap is needed to calculate its standard deviation within a month. Models used in this study are listed in Table 1. Only one realisation is taken for each model. For CMIP6, we use a 36-year time period of 1979–2014, whereas only 27 years (1979–2005) are used for CMIP5. All variables are regridded onto a common 2.5° × 2.5° resolution.

Table 1 List of CMIP5 and CMIP6 models used in this study

To validate the model output, we use the same variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5; Hersbach et al. 2020). As ERA5 does not provide huss and hurs, we derive them from 2-m temperature, 2-m dew point temperature, and surface pressure using the method in ECMWF (2016). For precipitation, we use the state-of-the-art observational dataset of the Global Precipitation Climatology Project (GPCP; Adler et al. 2018). Additional reanalysis data from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2; Gelaro et al. 2017) are also used. The ERA5, GPCP, and MERRA2 variables cover the same period as CMIP6 (1979–2014) and are regridded onto the same horizontal resolution.
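The derivation of huss and hurs from the ERA5 fields can be sketched as follows. This is a minimal sketch, assuming that the ECMWF (2016) method corresponds to the Tetens formula over liquid water with the coefficients listed in the IFS documentation (a1 = 611.21 Pa, a3 = 17.502, a4 = 32.19 K, T0 = 273.16 K); the function names are hypothetical, and relative humidity is returned as a fraction.

```python
import numpy as np

# Tetens-formula constants over liquid water as listed in the IFS
# documentation; assumed (not confirmed) to match the ECMWF (2016) method.
A1 = 611.21       # Pa
A3 = 17.502       # dimensionless
A4 = 32.19        # K
T0 = 273.16       # K
EPS = 0.621981    # ratio of gas constants R_dry / R_vap


def sat_vapour_pressure(t):
    """Saturation vapour pressure (Pa) over liquid water for temperature t (K)."""
    return A1 * np.exp(A3 * (t - T0) / (t - A4))


def surface_humidity(t2m, d2m, sp):
    """Hypothetical helper: (huss in kg/kg, hurs as a fraction) from 2-m
    temperature (K), 2-m dew point temperature (K) and surface pressure (Pa)."""
    e = sat_vapour_pressure(d2m)                   # actual vapour pressure
    huss = EPS * e / (sp - (1.0 - EPS) * e)        # specific humidity
    hurs = e / sat_vapour_pressure(t2m)            # relative humidity (0-1)
    return huss, hurs
```

CMIP `hurs` is stored in percent, so the fraction returned here would need to be multiplied by 100 before comparing it with model output.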

2.2 ICON-Sapphire model

We further use the global coupled storm-resolving ICON-Sapphire model (Hohenegger et al. 2023). In this configuration, ICON-Sapphire targets the representation of the atmospheric flow by explicitly resolving meso-beta-scale processes and avoiding statistical approaches to representing convection. ICON-Sapphire also explicitly resolves mesoscale eddies in the ocean, providing a framework for a better representation not only of the individual components of the Earth system (atmosphere, ocean, and land) but also of the interactions between them. We analyse the simulation referred to as G_AO_5km in Hohenegger et al. (2023), which we call ICON-Sapphire in this study. ICON-Sapphire uses a horizontal grid spacing of 5 km and discretises the atmosphere into 91 levels, the ocean into 128 levels, and the soil into five layers. The simulation covers one complete annual cycle (January 21, 2020 to January 31, 2021). The combination of integration length and grid spacing makes ICON-Sapphire the first simulation of its kind, which also means that it is at an early stage of development and susceptible to model errors and imbalances (Gassmann 2013; Hohenegger et al. 2023). More details on ICON-Sapphire can be found in Hohenegger et al. (2023).

The original ICON-Sapphire output does not include specific and relative humidity at the surface. We therefore take the available outputs at the lowest model level, about 25 m above the surface, and proceed as follows. We calculate the global means of the humidity fields in both ICON-Sapphire and ERA5. The ratio of the ERA5 to the ICON-Sapphire global mean (e.g., \({\overline{q} }_{\text{ERA}5}/{\overline{q} }_{\text{ICON}-\text{Sapphire}}\)) serves as a scaling factor by which the ICON-Sapphire humidity field is multiplied. The scaled specific and relative humidity fields are then treated as surface values. This is done only for consistency when comparing them with the ERA5 and CMIP surface humidity fields. We tested several different approaches to computing the surface values (see Supplementary Materials) and checked that the main conclusions are not sensitive to this scaling step; that is, the conclusions are the same even when the original humidity fields are used. All required variable fields are also regridded to the same horizontal resolution to allow bias calculations.
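The scaling step amounts to a single multiplicative factor per field. The sketch below assumes cos(latitude) area weighting for the global means (the weighting is not stated above) and fields already on a common regular latitude-longitude grid; `area_weighted_mean` and `scale_to_reference` are hypothetical names.

```python
import numpy as np


def area_weighted_mean(field, lat):
    """Global mean of a (lat, lon) field using cos(latitude) weights (assumed)."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    return np.sum(w * field) / np.sum(w)


def scale_to_reference(model_field, ref_field, lat):
    """Multiply model_field by the ratio of the reference to the model global mean."""
    factor = area_weighted_mean(ref_field, lat) / area_weighted_mean(model_field, lat)
    return factor * model_field, factor
```

After scaling, the model field has the same global mean as the reference while keeping its own spatial pattern, which is why the bias patterns, rather than the global offset, drive the later comparison.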

2.3 The GREB diagnostic model for precipitation

The diagnostic precipitation model from the hydrological part of the GREB model (Dommenget and Flöter 2011; Stassen et al. 2019) is used to analyse the drivers of precipitation. The same diagnostic precipitation model was applied by Stassen et al. (2020) to analyse the drivers of precipitation changes in climate change scenarios. On monthly time scales, the change in atmospheric moisture content due to precipitation (\({\Delta q}_{\text{precip}}\); kg kg−1 day−1) is modelled as:

$${\Delta q}_{\text{precip}}={r}_{\text{precip}}\cdot q\cdot \left({c}_{q}+{c}_{rq}\cdot rq+{c}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}+{c}_{\omega \text{std}}\cdot {\omega }_{\text{std}}\right)$$
(1)

where \({r}_{\text{precip}}=0.1/\text{day}\) is a constant representing the mean lifetime of water vapour in the atmosphere, \(q\) (kg kg−1) is the monthly mean surface specific humidity, \(rq\) (unitless) is the monthly mean surface relative humidity, \({\omega }_{\text{mean}}\) (Pa s−1) and \({\omega }_{\text{std}}\) (Pa s−1) are the monthly mean and standard deviation of the daily vertical velocity within each calendar month, respectively, and \({c}_{q}\) (unitless), \({c}_{rq}\) (unitless), \({c}_{\omega \text{mean}}\) (s Pa−1), and \({c}_{\omega \text{std}}\) (s Pa−1) are linear regression parameters of the model. Monthly mean precipitation (\(P\); kg m−2 day−1) and \({\Delta q}_{\text{precip}}\) are related by

$$P={\Delta q}_{\text{precip}}\cdot 2.6736\times {10}^{3} \text{ kg }{\text{m}}^{-2}.$$
(2)
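Eqs. (1) and (2) translate directly into code. The sketch below assumes the four regression parameters are supplied in a dictionary with hypothetical keys `q`, `rq`, `wmean`, and `wstd`; the parameter values in the usage lines are placeholders, not the fitted values of this study.

```python
import numpy as np

R_PRECIP = 0.1           # day^-1, constant in Eq. (1)
COLUMN_MASS = 2.6736e3   # kg m^-2, conversion factor in Eq. (2)


def diagnose_precip(q, rq, w_mean, w_std, c):
    """Diagnosed precipitation P (kg m^-2 day^-1, i.e. mm day^-1).

    q: surface specific humidity (kg/kg); rq: surface relative humidity;
    w_mean, w_std: monthly mean and standard deviation of daily omega (Pa/s);
    c: dict of regression parameters with keys 'q', 'rq', 'wmean', 'wstd'.
    Works element-wise on scalars or numpy arrays of matching shape.
    """
    dq_precip = R_PRECIP * q * (c["q"] + c["rq"] * rq
                                + c["wmean"] * w_mean + c["wstd"] * w_std)  # Eq. (1)
    return dq_precip * COLUMN_MASS                                        # Eq. (2)


# Placeholder parameters, for illustration only.
c_demo = {"q": -1.0, "rq": 2.0, "wmean": -10.0, "wstd": 5.0}
P_demo = diagnose_precip(np.array([0.015, 0.010]), np.array([0.8, 0.6]),
                         np.array([-0.03, 0.02]), np.array([0.05, 0.03]), c_demo)
```

With a negative \({c}_{\omega \text{mean}}\), as in the placeholder values, ascending air (negative \({\omega }_{\text{mean}}\)) increases the diagnosed precipitation.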

This simple model assumes that precipitation is proportional to specific humidity and that relative humidity and upward motion enhance rainfall. It takes into account not only the mean upward motion (\({\omega }_{\text{mean}}\)) but also the circulation variability (\({\omega }_{\text{std}}\)). This allows it to generate high precipitation over regions with weak \({\omega }_{\text{mean}}\) but strong sub-monthly weather fluctuations, such as the midlatitude storm tracks. To calculate \({\omega }_{\text{mean}}\) and \({\omega }_{\text{std}}\), we first take the pressure-weighted vertical average of the daily vertical velocity from 200 hPa to the surface and then compute its mean and standard deviation within each calendar month. The diagnostic model parameters in Eq. (1) are obtained by fitting the monthly precipitation climatology (left-hand side) to the monthly climatologies of the four variable fields (right-hand side) over the globe using multiple linear regression. For the observed estimate, we diagnose the GPCP precipitation using the four variable fields from ERA5 (right-hand side). For ICON-Sapphire, the one year of available data is assumed to represent its long-term monthly climatology.
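The computation of \({\omega }_{\text{mean}}\) and \({\omega }_{\text{std}}\) described above might look as follows. This is a sketch under stated assumptions: trapezoidal pressure weights, pressure levels given in Pa, no masking of levels below the ground, month labels that identify each individual month (e.g., year × 100 + month), and the sample standard deviation (`ddof=1`); the function names are hypothetical.

```python
import numpy as np


def pressure_weighted_mean(w, plev, p_top=200e2):
    """Pressure-weighted vertical mean of omega between p_top and the lowest level.

    w: array (time, lev, ...) of daily omega (Pa/s); plev: 1-D levels in Pa.
    Trapezoidal weights are used; below-ground masking is ignored (assumption).
    """
    keep = plev >= p_top
    p, wk = plev[keep], w[:, keep]
    dp = np.abs(np.diff(p))
    weights = np.zeros_like(p, dtype=float)
    weights[:-1] += 0.5 * dp          # each layer shares its thickness
    weights[1:] += 0.5 * dp           # between its two bounding levels
    return np.einsum("tk...,k->t...", wk, weights) / weights.sum()


def monthly_mean_std(daily, month_id):
    """Mean and standard deviation of daily values for each month label."""
    labels = np.unique(month_id)
    mean = np.stack([daily[month_id == m].mean(axis=0) for m in labels])
    std = np.stack([daily[month_id == m].std(axis=0, ddof=1) for m in labels])
    return labels, mean, std
```

The monthly values would then be averaged over all years for each calendar month to obtain the climatologies used in the regression.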

The diagnosed precipitation reproduces notable features of the observed precipitation, especially over the tropical oceans (Fig. 1). The Pacific and Atlantic ITCZs appear at the same locations and with about the same magnitude in the diagnosed annual mean (Fig. 1c). As with the ITCZ, the high rainfall over the Maritime Continent and the equatorial Indian Ocean is also diagnosed very well. In the seasonal cycle (Fig. 1d), many monsoon regions can be identified with about the same areal extent and strength, namely the Indian, East Asian, Australian, South American, and African monsoons. The seasonal displacement of the ITCZ is also visible, especially over the Atlantic. However, the diagnosed global mean precipitation is smaller overall.

Fig. 1

Annual mean of (a) the observed precipitation from GPCP and (c) the diagnosed precipitation calculated using Eq. (1) (see text for more details). b, d Same as (a, c), but for the seasonal cycle, defined as the difference between the June–July–August (JJA) and December–January–February (DJF) averages. Numbers in the top-right corner of (c) and (d) are the pattern correlation coefficients with respect to (a) and (b), respectively

The diagnostic model performs less well over the extratropics: it captures only about 50% of the precipitation over the Southern Ocean, and it underestimates precipitation over the western basins of the North Pacific and North Atlantic in both the annual mean and the seasonal amplitude (Fig. 1c, d). Similar inaccuracies in the diagnosed precipitation are found over continental regions. When only the tropical (30°S–30°N) region is considered, the pattern correlation coefficients increase to as much as 0.91.
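The pattern correlation coefficients quoted here can be computed with a sketch like the following, assuming a centred, cos(latitude)-weighted correlation on a regular latitude-longitude grid restricted to 30°S–30°N (the exact definition behind the figures is not stated); `pattern_correlation` is a hypothetical helper.

```python
import numpy as np


def pattern_correlation(a, b, lat, lat_bounds=(-30.0, 30.0)):
    """Centred, cos(lat)-weighted pattern correlation of two (lat, lon) fields."""
    band = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    w = np.cos(np.deg2rad(lat[band]))[:, None] * np.ones_like(a[band])
    a_anom = a[band] - np.sum(w * a[band]) / np.sum(w)   # remove weighted mean
    b_anom = b[band] - np.sum(w * b[band]) / np.sum(w)
    cov = np.sum(w * a_anom * b_anom)
    return cov / np.sqrt(np.sum(w * a_anom ** 2) * np.sum(w * b_anom ** 2))
```

Because the means are removed, this measure rewards a correct spatial pattern but is insensitive to the overall underestimation of the global mean noted above.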

Figure 2 summarises the performance of the simple model in diagnosing the annual mean and seasonal cycle of tropical precipitation in the CMIP5, CMIP6, and ICON-Sapphire models. In general, the diagnosed precipitation appears to be slightly underestimated in the annual mean but has the right amplitude in the seasonal cycle. Furthermore, the diagnostic skill is higher for the climate models than for the observations, as indicated by most coloured dots having higher correlations than the black diamond in Fig. 2a. The CMIP multimodel mean even reaches a correlation of 0.98 in the annual mean, while the skill for ICON-Sapphire is more moderate (0.96) but still better than for the observations (0.91) and the reanalysis (0.95, not shown).

Fig. 2

Taylor diagrams of a the diagnosed tropical precipitation and c the diagnosed tropical precipitation bias in the annual mean of the individual CMIP5 (red dots) and CMIP6 (blue dots) models, the CMIP5 (red diamond) and CMIP6 (blue diamond) multimodel means, the ICON-Sapphire model (green diamond), and the GPCP-ERA5 (black diamond), with their respective actual precipitation or actual precipitation bias (black dot) as reference. b, d Same as (a, c), but for the seasonal cycle
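The three quantities displayed in a Taylor diagram (correlation, standard deviation ratio, and centred root-mean-square difference) can be sketched as below. This assumes area weights supplied by the caller and normalisation by the reference standard deviation; `taylor_statistics` is a hypothetical helper. The check in the test is the law-of-cosines relation on which the diagram is built.

```python
import numpy as np


def taylor_statistics(test, ref, weights):
    """Return correlation R, normalised standard deviation, and normalised
    centred RMS difference of `test` against `ref` (same-shape arrays)."""
    w = weights / weights.sum()
    t = test - np.sum(w * test)              # centred test field
    r = ref - np.sum(w * ref)                # centred reference field
    sig_t = np.sqrt(np.sum(w * t ** 2))
    sig_r = np.sqrt(np.sum(w * r ** 2))
    corr = np.sum(w * t * r) / (sig_t * sig_r)
    crmsd = np.sqrt(np.sum(w * (t - r) ** 2))
    return corr, sig_t / sig_r, crmsd / sig_r


rng = np.random.default_rng(0)
ref = rng.normal(size=500)
test = 0.8 * ref + rng.normal(scale=0.3, size=500)
R, s, e = taylor_statistics(test, ref, np.ones(500))
```

In Fig. 2, `test` would be the diagnosed field and `ref` the actual precipitation (or precipitation bias) of the same dataset.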

The GREB diagnostic model has four individual terms, each of which contributes to the total precipitation in a different way. To understand the relative contribution of each term, Fig. 3 shows each term separately. We should first note that the model is non-linear with respect to specific humidity, \(q\), following the fundamental idea that rainfall is proportional to the available moisture (\(P\propto q\)) while also being controlled by the other three variables (Eq. 1). Therefore, the \(q\cdot {c}_{q}\) term (Fig. 3a) cannot be considered the stand-alone physical contribution of \(q\) to the diagnosed rainfall. Rather, Fig. 3a and b should be viewed together when analysing the roles of specific and relative humidity in Eq. (1). For the same amount of \(q\) in the air, more rainfall is likely if the region has high \(rq\). For that reason, the \(q\cdot {c}_{q}\) term (Fig. 3a) acts more as a limiting factor on the \(q\cdot {c}_{rq}\cdot rq\) term (Fig. 3b). Figure 3c further shows that the sum of Fig. 3a and b is small and only moderately correlated with the actual precipitation (~ 0.5). Together, these two terms confine high precipitation to the tropical and midlatitude oceans and suppress rainfall over subtropical regions.
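The split shown in Fig. 3 can be reproduced by evaluating each bracketed term of Eq. (1) separately and converting it with Eq. (2). This is a minimal sketch with a hypothetical parameter dictionary and placeholder values; because every term shares the factor \({r}_{\text{precip}}\cdot q\), the four contributions add up exactly to the diagnosed precipitation.

```python
R_PRECIP = 0.1           # day^-1
COLUMN_MASS = 2.6736e3   # kg m^-2


def precip_terms(q, rq, w_mean, w_std, c):
    """Return the four terms of Eq. (1), each in mm day^-1 via Eq. (2)."""
    scale = R_PRECIP * q * COLUMN_MASS         # common factor r_precip * q
    return {
        "q": scale * c["q"],                   # analogue of panel a
        "rq": scale * c["rq"] * rq,            # analogue of panel b
        "wmean": scale * c["wmean"] * w_mean,  # analogue of panel d
        "wstd": scale * c["wstd"] * w_std,     # analogue of panel e
    }


c_demo = {"q": -1.0, "rq": 2.0, "wmean": -10.0, "wstd": 5.0}  # placeholders
terms = precip_terms(0.015, 0.8, -0.03, 0.05, c_demo)
humidity_part = terms["q"] + terms["rq"]            # analogue of panel c
circulation_part = terms["wmean"] + terms["wstd"]   # analogue of panel f
```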

Fig. 3

Annual mean values of the individual terms in Eq. (1): a \(\text{q}\cdot {\text{c}}_{\text{q}}\), b \(\text{q}\cdot \left({\text{c}}_{\text{rq}}\cdot \text{rq}\right)\), d \(\text{q}\cdot \left({\text{c}}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}\right)\), and e \(\text{q}\cdot \left({\text{c}}_{\omega \text{std}}\cdot {\omega }_{\text{std}}\right)\) for GPCP-ERA5. All terms are scaled using Eq. (2) so that their units are mm day−1. Panel (c) is the sum of (a) and (b), and (f) is the sum of (d) and (e). The sum of (c) and (f) is the diagnosed precipitation shown in Fig. 1c. The actual precipitation, as shown in Fig. 1a, is superimposed as black contour lines at intervals of 2 mm day−1. Numbers in the top-right corner are the pattern correlation with respect to the actual precipitation and the global mean value as a percentage of the global mean actual precipitation

The roles of the \({\omega }_{\text{mean}}\) and \({\omega }_{\text{std}}\) terms are easier to understand (Fig. 3d, e). Both terms have high pattern correlations (> 0.7) with the actual precipitation, indicating that the mean circulation and its variability form the building blocks of the geographical pattern of precipitation. Figure 3f shows that the sum of these two terms has a very high pattern correlation (0.85) and explains more than 60% of the actual global mean precipitation, emphasising the importance of the circulation terms in the diagnosis. The roles of these terms in diagnosing the CMIP5 and CMIP6 multimodel mean precipitation are quite similar (Figs. S1–S2), whereas for ICON-Sapphire some quantitative differences are evident (Fig. S3).

The GREB diagnostic model is used to conduct sensitivity experiments in order to deduce the main sources of precipitation biases. We use the observed estimate of the model as a reference and estimate the sensitivity to each element of the right-hand side by replacing that element (variable field or parameter) with the CMIP multimodel mean or ICON-Sapphire value, while all other elements keep their observed values. The difference between this diagnosed precipitation and the observed estimate approximates the sensitivity of precipitation to the considered element. For example, we use the CMIP model values for \({c}_{\omega \text{mean}}\) and \({\omega }_{\text{mean}}\) and keep the observed values for the other terms on the right-hand side of Eq. (1). The difference in the diagnosed precipitation (left-hand side) between these two estimates is then defined as the response to the biases of the coupled climate models.
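The replacement experiment can be sketched as follows. The dictionary layout (fields `q`, `rq`, `wmean`, `wstd` and parameters `c_q`, `c_rq`, `c_wmean`, `c_wstd`), the function names, and all numbers are hypothetical; in practice the inputs would be the observed (GPCP-ERA5) and model climatologies on a common grid.

```python
import numpy as np

R_PRECIP = 0.1           # day^-1
COLUMN_MASS = 2.6736e3   # kg m^-2


def diagnose(x):
    """Eqs. (1)-(2) for a dict holding the four fields and four parameters."""
    bracket = (x["c_q"] + x["c_rq"] * x["rq"]
               + x["c_wmean"] * x["wmean"] + x["c_wstd"] * x["wstd"])
    return R_PRECIP * x["q"] * bracket * COLUMN_MASS


def swap_response(obs, model, elements):
    """Change in diagnosed precipitation when the listed elements are taken
    from the model and all other elements keep their observed values."""
    mixed = dict(obs)
    for name in elements:
        mixed[name] = model[name]
    return diagnose(mixed) - diagnose(obs)


# Placeholder inputs for two grid points.
obs = {"q": np.array([0.015, 0.012]), "rq": np.array([0.80, 0.70]),
       "wmean": np.array([-0.03, 0.01]), "wstd": np.array([0.05, 0.04]),
       "c_q": -1.0, "c_rq": 2.0, "c_wmean": -10.0, "c_wstd": 5.0}
model = dict(obs, c_wmean=-20.0, wmean=np.array([-0.04, 0.01]))
response = swap_response(obs, model, ["c_wmean", "wmean"])
```

Swapping one element at a time isolates its effect, while swapping a parameter together with its field gives the combined response of that term.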

3 Model precipitation biases

Figure 4 shows the annual mean precipitation biases of the CMIP6 ensemble mean and ICON-Sapphire. The CMIP6 precipitation bias (Fig. 4a) shows large-scale features similar to those reported in previous studies (e.g., Fiedler et al. 2020). Prominent features are the tripolar pattern of wet–dry–wet biases in the Pacific, depicting the double ITCZ and the equatorial cold tongue biases (Tian and Dong 2020), an equatorial dipole in the Indian Ocean (Long et al. 2020), a meridional dipole across the equatorial Atlantic (Richter and Tokinaga 2020), and a dry bias over the Amazon (Yin et al. 2013). Overall, the models simulate more rainfall than observed (see Fig. S4 and Fig. 2 in Fiedler et al. 2020). These main large-scale bias features are very similar in pattern and amplitude in the CMIP5 ensemble mean (see Fig. S5), indicating that there is little improvement from one model generation to the next.

Fig. 4

Actual precipitation bias in the annual mean of a CMIP6 multimodel mean and b ICON-Sapphire, calculated against the observed GPCP precipitation. c, d Same as (a, b), but for the diagnosed precipitation bias calculated against the diagnosed GPCP-ERA5 precipitation. The actual annual mean precipitation of the CMIP6 multimodel mean and ICON-Sapphire is superimposed as black contour lines in a and b, respectively, at intervals of 2 mm day−1. Numbers in the top-right corner of (a, b) are the root-mean-squared bias (in mm day−1) over the tropics (30°S–30°N). Numbers in the top-right corner of (c, d) are the pattern correlation coefficients with respect to (a, b) and the root-mean-squared bias (in mm day−1) over the tropics

The biases in the individual CMIP model simulations are generally similar to the ensemble mean bias shown here, indicating that the ensemble mean bias is indeed the main bias in the models (see Fig. S6). The pattern correlations with the ensemble mean bias range from 0.5 to 0.9 for the models in the CMIP5 and CMIP6 ensembles, and the bias magnitudes of the individual models are about 10% to 50% larger than that of the ensemble mean.
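The tropical root-mean-squared bias used to compare bias magnitudes (as quoted in the Fig. 4 panels) can be sketched as below, assuming cos(latitude) area weights over 30°S–30°N on a regular grid; `tropical_rms` is a hypothetical helper that would be applied to each model's bias field in turn.

```python
import numpy as np


def tropical_rms(bias, lat, lat_bounds=(-30.0, 30.0)):
    """Area-weighted root-mean-square of a (lat, lon) bias field in a latitude band."""
    band = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    w = np.cos(np.deg2rad(lat[band]))[:, None] * np.ones_like(bias[band])
    return np.sqrt(np.sum(w * bias[band] ** 2) / np.sum(w))
```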

The ICON-Sapphire precipitation bias (Fig. 4b), on the other hand, is dominated by strong dry and wet biases (almost twice as strong as in CMIP6) over most of the tropical ocean. This large amplitude may partly result from the fact that we use only one year of ICON-Sapphire output but compare it against a 36-year GPCP climatology. Like CMIP6, ICON-Sapphire still suffers from wet biases in, among other regions, the South Pacific Convergence Zone (SPCZ), the North Pacific ITCZ, and the western Indian Ocean. The most interesting pattern is the dry bias over the Maritime Continent, which contrasts with the rather wet bias in CMIP6. This may be caused by a westward displacement of the equatorial cold tongue bias as far as the eastern Indian Ocean, which creates a dipole-like bias there. This is supported by Segura et al. (2022), who pointed out that this issue originates from the SST bias.

The GREB diagnostic model reproduces most of the biases in the CMIP6 ensemble mean and the ICON-Sapphire simulation with similar patterns and magnitudes (see Fig. 4c, d). The dry biases over the Amazon and India in the CMIP6 ensemble are exceptions that cannot be properly diagnosed, which keeps the pattern correlation coefficient moderate (0.64). The ICON-Sapphire biases are better captured by the GREB diagnostic model, with a comparable magnitude and a high correlation (0.85).

4 Drivers of precipitation biases

We now analyse the GREB diagnostic model to understand what drives the biases in simulated precipitation. Assuming that Eq. (1) perfectly describes the relationship between rainfall and the input fields of humidity and circulation, precipitation biases in climate models can arise either from biases in the simulated fields of \(q\), \(rq\), \({\omega }_{\text{mean}}\), and \({\omega }_{\text{std}}\) that propagate to the precipitation, or from errors in the precipitation sensitivity to those fields, which is represented by the regression parameters (\({c}_{q}\), \({c}_{rq}\), \({c}_{\omega \text{mean}}\), and \({c}_{\omega \text{std}}\)). We first focus on the sensitivities in the GREB diagnostic model to the individual forcing fields, then analyse the biases in the four individual forcing terms of Eq. (1), and finally compare the relative importance of biases in the sensitivities and biases in the forcing fields.

4.1 Biases in the sensitivity to forcing fields

In the GREB diagnostic model (Eq. 1), the sensitivity of precipitation to each of the four forcing fields is estimated by the four multiple regression parameters (\({c}_{q}\), \({c}_{rq}\), \({c}_{\omega \text{mean}}\), and \({c}_{\omega \text{std}}\)). Deviations of the simulated parameters from the observed ones indicate potential biases in the simulated precipitation; however, interpreting the parameters is not always straightforward, because in a multiple regression all parameters depend on each other.

We first evaluate the uncertainty in the observed values by estimating the regression parameters from different combinations of the observational datasets GPCP, ERA5, and MERRA2 (Fig. 5a–d). For all four parameters, there is some spread among the dataset combinations. These variations tend to be larger than the sampling uncertainties of the regression parameters, which depend on the number of samples in the regression (error bars in Fig. 5a–d). The variations are mostly within ±10%, and the estimates using GPCP combined with the reanalysis data are not systematically different from those based solely on reanalysis data, which gives some confidence that the GPCP data are consistent with the reanalysis data.
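The fit of the four parameters and their 95% confidence intervals can be sketched with ordinary least squares. This sketch assumes an unweighted fit with no separate intercept (the constant \({c}_{q}\) already enters through the \({r}_{\text{precip}}\cdot q\) regressor), all grid points and months pooled into flat arrays, and a normal approximation (±1.96 standard errors); the fitting and weighting choices of the study may differ, and `fit_greb_parameters` is a hypothetical name.

```python
import numpy as np

R_PRECIP = 0.1           # day^-1
COLUMN_MASS = 2.6736e3   # kg m^-2


def fit_greb_parameters(P, q, rq, w_mean, w_std):
    """Least-squares fit of (c_q, c_rq, c_wmean, c_wstd) in Eq. (1).

    Inputs are flattened arrays of monthly-climatology grid-point values.
    Returns the coefficients and the half-widths of approximate 95% CIs.
    """
    y = P / COLUMN_MASS                       # Delta q_precip via Eq. (2)
    base = R_PRECIP * q
    X = np.column_stack([base, base * rq, base * w_mean, base * w_std])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, 1.96 * se
```

Spatial autocorrelation means the effective number of independent samples is smaller than the number of grid points, so intervals of this kind are likely too narrow, which is consistent with the dataset spread exceeding them.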

Fig. 5

Estimated values of the fitted a \({\text{c}}_{\text{q}}\) (unitless), b \({\text{c}}_{\text{rq}}\) (unitless), c \({\text{c}}_{\omega \text{mean}}\) (s Pa−1), and d \({\text{c}}_{\omega \text{std}}\) (s Pa−1) for ERA5 fitted with GPCP precipitation (left black), ERA5 fitted with ERA5 precipitation (right black), MERRA2 fitted with GPCP precipitation (left grey), and MERRA2 fitted with MERRA2 precipitation (right grey). e–h Same as (a–d), but for CMIP5 (red) and CMIP6 (blue) multimodel means, ICON-Sapphire (green), and GPCP-ERA5 (GPCP precipitation fitted with ERA5 variable fields; black). Orange vertical bars denote the 95% confidence interval of the fitted parameters, black vertical bars denote the intermodel spread (i.e., \(\pm\sigma\)), and cyan vertical bars denote the range of parameter values across the different combinations of observations and reanalyses in (a–d). Black dots are the minimum and maximum parameter values in the CMIP multimodel ensembles

Figure 5e–h compares the fitted parameters of the CMIP and ICON-Sapphire models with those of the observations. For the CMIP model ensembles, the fitted parameters are mostly consistent with the observed values within the ensemble spread, but there are also some substantial differences. The CMIP5 and CMIP6 ensembles are similar for all parameters, but the CMIP6 ensemble mean is closer to the observed value for three of the four parameters.

The sensitivity to specific humidity (\(q\)) is slightly underestimated in magnitude in both CMIP ensemble means (Fig. 5e). Since this parameter is negative, this suggests that the models are too sensitive to \(q\), precipitating at larger rates for a given \(q\) than observed, assuming all other aspects of the model are unbiased (e.g., no biases in the mean \(q\) or other mean fields). This bias may be related to the light precipitation problem (e.g., Dai 2006; Sun et al. 2017; Na et al. 2020), also known as the “drizzling” bias, which is commonly found in CMIP models (Chen et al. 2021), because a less negative \({c}_{q}\) produces precipitation irrespective of relative humidity or vertical circulation.

The sensitivity to the relative humidity (\(rq\)) is also underestimated in magnitude in both CMIP ensemble means (Fig. 5f), but given the positive value of the parameter, this does lead to less precipitation per \(rq\) in the CMIP model simulations relative to what is observed. Thus, the CMIP models are slightly less sensitive to relative humidity than what is observed.

The largest deviation of the CMIP models from the observations is in the sensitivity to the mean vertical motion (\({\omega }_{\text{mean}}\); see Fig. 5g). The models are almost twice as sensitive to \({\omega }_{\text{mean}}\) as observed, a difference that is unlikely to be explained by observational uncertainty and thus indicates a significant model bias. It suggests that the models produce about twice as much precipitation for a given mean ascending motion and suppress precipitation about twice as strongly for a given mean descending motion.

The sensitivity of the CMIP models to the within-month standard deviation of daily vertical motion (\({\omega }_{\text{std}}\)) is also overestimated (Fig. 5h), but not as strongly as that to \({\omega }_{\text{mean}}\). Given that \({\omega }_{\text{std}}\) is non-negative, this bias generally leads to more precipitation, especially in regions with large \({\omega }_{\text{std}}\).

The ICON-Sapphire model deviates more strongly from the observed values and also differs from the CMIP model ensembles. For the sensitivities to \(q\), \(rq\), and \({\omega }_{\text{mean}}\), the ICON-Sapphire biases relative to the observations have the same sign as those of the CMIP models but are substantially stronger for \(q\) and \(rq\). The ICON-Sapphire bias in the sensitivity to \({\omega }_{\text{std}}\) has the opposite sign to that of the CMIP models, suggesting that ICON-Sapphire is less sensitive to \({\omega }_{\text{std}}\) than observed. This bias would therefore generally lead to reduced precipitation, particularly in regions with large \({\omega }_{\text{std}}\).

The above analysis found a significantly enhanced sensitivity to \({\omega }_{\text{mean}}\) in the CMIP and ICON-Sapphire models compared with the observations, which we explore further here. Figure 6a depicts the relationship between precipitation and the mean circulation (\({\omega }_{\text{mean}}\)) in the observations (GPCP precipitation and ERA5 \({\omega }_{\text{mean}}\)). As expected, upward motion (i.e., negative \({\omega }_{\text{mean}}\)) leads to high precipitation, and this relationship is stronger over the oceans (blue points) than over land (brown points). Figures 6b–d show similar relationships in the CMIP5 and CMIP6 multimodel means and in the ICON-Sapphire simulation. However, the model simulations have steeper gradients than the observations (compare solid and dashed lines); that is, these models tend to generate more rainfall than observed for the same \({\omega }_{\text{mean}}\). This overestimation is worse in the strong upward motion regime (e.g., \({\omega }_{\text{mean}}<-0.05 \text{ Pa }{\text{s}}^{-1}\)). ICON-Sapphire also has grid points with very strong upward motion beyond the observed \({\omega }_{\text{mean}}\) range, which in turn produce very high rainfall and a strong positive rainfall bias. Even so, we acknowledge that this may partly arise from the use of only one year of ICON-Sapphire data, which cannot fully represent its climatology.
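The best-fit gradients compared in Fig. 6 can be obtained with a first-degree least-squares fit applied separately to ocean and land points. A minimal sketch, assuming flattened arrays of tropical grid-point values and a boolean ocean mask; `best_fit_slopes` is a hypothetical name.

```python
import numpy as np


def best_fit_slopes(precip, w_mean, is_ocean):
    """Least-squares gradient of precipitation against w_mean for ocean and land."""
    slopes = {}
    for name, mask in (("ocean", is_ocean), ("land", ~is_ocean)):
        slope, _intercept = np.polyfit(w_mean[mask], precip[mask], deg=1)
        slopes[name] = slope
    return slopes
```

A model-to-observation mismatch can then be expressed as the ratio of the model slope to the observed slope minus one, which is how a gradient being "30–50% stronger" can be quantified.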

Fig. 6

Scatterplots of (a–d) annual mean precipitation (mm day−1) vs. annual mean \({\omega }_{\text{mean}}\) (Pa s−1) and (e–h) annual mean “effective” precipitation (unitless) vs. annual mean \({\omega }_{\text{mean}}\) (Pa s−1) for tropical oceans (blue) and tropical land (brown) in (a, e) GPCP-ERA5, (b, f) CMIP5 and (c, g) CMIP6 multimodel means, and (d, h) ICON-Sapphire. “Effective” precipitation is defined by excluding the influences of \(\text{q}\), \(\text{rq}\), and \({\omega }_{\text{std}}\) on precipitation using Eq. (1) (see Eq. (3) for more details). Solid blue (brown) lines in (b–d) and (f–h) are the best-fit lines estimated from all tropical ocean (land) grid points. The best-fit lines for GPCP-ERA5 in (a) and (e) are plotted as dashed blue (brown) lines for ocean (land) points and duplicated as references in (b–d) and (f–h), respectively. Numbers in the upper-right corner are the gradients of the best-fit lines

The gradients of the model simulations are about 30–50% stronger than those found in the observations. This mismatch is smaller than the one found in Fig. 5g for the multiple regression fit of \({c}_{\omega \text{mean}}\), which was about twice as strong in the models as in the observations. This suggests that the covariance with the other forcing fields (\(q\), \(rq\) and \({\omega }_{\text{std}}\)) enhances the mismatch between the models and the observations in the relation between precipitation and \({\omega }_{\text{mean}}\).
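The gradients compared above are slopes of least-squares lines fitted separately to ocean and land grid points. The following is a minimal NumPy sketch of that calculation, not the authors' code; it assumes unweighted ordinary least squares over the masked grid points, and the function names, mask and toy values are hypothetical.

```python
import numpy as np


def best_fit_gradient(omega_mean, precip, mask):
    """Slope a of the least-squares line precip = a * omega_mean + b,
    fitted only over grid points where mask is True (e.g. tropical ocean)."""
    x = np.asarray(omega_mean, dtype=float)[mask]
    y = np.asarray(precip, dtype=float)[mask]
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


def relative_gradient_excess(model_slope, obs_slope):
    """Fractional excess of the model gradient over the observed one
    (0.3 means the model line is 30% steeper)."""
    return model_slope / obs_slope - 1.0


# Toy example with synthetic values, not data from this study.
omega = np.linspace(-0.08, 0.04, 13)      # Pa s-1; negative = upward motion
ocean = np.ones_like(omega, dtype=bool)   # hypothetical mask: every point is ocean
p_obs = 3.0 - 40.0 * omega                # mm day-1
p_mod = 3.0 - 52.0 * omega                # model responds more strongly to omega_mean
excess = relative_gradient_excess(
    best_fit_gradient(omega, p_mod, ocean),
    best_fit_gradient(omega, p_obs, ocean),
)
```

Here `excess` is about 0.3, i.e. a 30% steeper model line; passing a land mask instead gives the corresponding land gradient.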

To illustrate this more clearly, we use Eq. (1) once more to remove the influences of \(q\), \(rq\) and \({\omega }_{\text{std}}\) on precipitation, isolating the part of the precipitation related to \({\omega }_{\text{mean}}\), which we define as \({P}_{{\omega }_{\text{mean}}}\):

$${P}_{{\omega }_{\text{mean}}}={\Delta q}_{\text{precip}}/\left({r}_{\text{precip}}\cdot q\right)-\left({c}_{q}+{c}_{rq}\cdot rq+{c}_{\omega \text{std}}\cdot {\omega }_{\text{std}}\right).$$
(3)
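Eq. (3) can be checked numerically. This minimal sketch assumes, as Eq. (3) implies, that Eq. (1) has the form \({\Delta q}_{\text{precip}}={r}_{\text{precip}}\cdot q\cdot ({c}_{q}+{c}_{rq}\cdot rq+{c}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}+{c}_{\omega \text{std}}\cdot {\omega }_{\text{std}})\); under that assumption \({P}_{{\omega }_{\text{mean}}}\) reduces to \({c}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}\). The function names and toy values are hypothetical.

```python
import numpy as np


def greb_dq_precip(r_precip, q, rq, w_mean, w_std, c_q, c_rq, c_wmean, c_wstd):
    """Assumed form of Eq. (1): r_precip * q * (c_q + c_rq*rq + c_wmean*w_mean + c_wstd*w_std)."""
    return r_precip * q * (c_q + c_rq * rq + c_wmean * w_mean + c_wstd * w_std)


def effective_precip(dq_precip, r_precip, q, rq, w_std, c_q, c_rq, c_wstd):
    """Eq. (3): divide out r_precip * q and subtract the q, rq and omega_std terms,
    leaving the (unitless) part of precipitation attributed to omega_mean."""
    return dq_precip / (r_precip * q) - (c_q + c_rq * rq + c_wstd * w_std)


# Toy grid-point values (hypothetical, for illustration only).
q = np.array([0.015, 0.018, 0.012])
rq = np.array([0.70, 0.80, 0.60])
w_mean = np.array([-0.06, -0.02, 0.03])
w_std = np.array([0.10, 0.08, 0.05])
coef = dict(c_q=-1.0, c_rq=2.0, c_wstd=1.5)
c_wmean = -20.0

dq = greb_dq_precip(0.1, q, rq, w_mean, w_std, c_wmean=c_wmean, **coef)
p_eff = effective_precip(dq, 0.1, q, rq, w_std, **coef)
```

Because the toy `dq` is built from the assumed Eq. (1) form, `p_eff` equals `c_wmean * w_mean` up to rounding.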

Figures 6e–h show \({P}_{{\omega }_{\text{mean}}}\) as a function of \({\omega }_{\text{mean}}\). The differences in the best-fit gradients between the models and the observations are now more discernible and reflect the overestimated \({c}_{\omega \text{mean}}\) shown in Fig. 5g. In addition, a salient difference between ICON-Sapphire and the CMIP models now emerges. Although both show the same overestimation, in ICON-Sapphire (Fig. 6h) the relationship over land is essentially the same as that over oceans, whereas in the CMIP5 (Fig. 6f) and CMIP6 (Fig. 6g) multimodel means it is not.

4.2 Biases in the forcing terms

Next, we focus on the biases in each of the four forcing terms in Eq. (1). Each term can be biased either in its sensitivity, as discussed in the previous section, or in its variable field. The biases of each of the four variable fields for the CMIP6 ensemble mean are shown in the left column of Fig. 7. We can quantify the sensitivity of the precipitation to the model bias in a specific term by replacing the observed values of that term with the model values (e.g., replacing the observed \({c}_{q}\cdot q\) with the model value of \({c}_{q}\cdot q\)); see the methods section for more details. The resulting changes in precipitation relative to the control (i.e., Eq. (1) using observed values) are shown in the right column of Fig. 7.
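The term-replacement experiments can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Eq. (1) form implied by Eq. (3), \({\Delta q}_{\text{precip}}={r}_{\text{precip}}\cdot q\cdot ({c}_{q}+{c}_{rq}\cdot rq+{c}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}+{c}_{\omega \text{std}}\cdot {\omega }_{\text{std}})\), and that each experiment simply substitutes the named model coefficient and field while every other input keeps its observed value (so in the specific humidity experiment \(q\) also changes in the prefactor). The exact procedure is described in the methods section; the dictionary keys and toy numbers are hypothetical.

```python
def greb_precip(v):
    """Assumed form of Eq. (1) for a dict v of coefficients and fields."""
    bracket = (v["c_q"] + v["c_rq"] * v["rq"]
               + v["c_wmean"] * v["w_mean"] + v["c_wstd"] * v["w_std"])
    return v["r_precip"] * v["q"] * bracket


# Coefficient/field pairs replaced in each single-term experiment (cf. Fig. 7e-h).
TERMS = {
    "q": ("c_q", "q"),
    "rq": ("c_rq", "rq"),
    "w_mean": ("c_wmean", "w_mean"),
    "w_std": ("c_wstd", "w_std"),
}


def swap_experiment(obs, model, keys):
    """Precipitation change relative to the observed control when the inputs
    named in keys take model values and all others stay observed."""
    perturbed = dict(obs)
    for k in keys:
        perturbed[k] = model[k]
    return greb_precip(perturbed) - greb_precip(obs)


# Toy single-grid-point example (hypothetical numbers).
obs = dict(r_precip=1.0, q=1.0, rq=0.5, w_mean=-0.1, w_std=0.1,
           c_q=1.0, c_rq=2.0, c_wmean=-10.0, c_wstd=5.0)
model = dict(obs, c_wmean=-20.0)  # model twice as sensitive to omega_mean
responses = {name: swap_experiment(obs, model, keys) for name, keys in TERMS.items()}
```

In this toy case only `responses["w_mean"]` is non-zero (about 1.0), since the model differs from the observations only in the mean circulation sensitivity. The same inputs also work element-wise on NumPy arrays of grid points.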

Fig. 7

Annual mean bias of the simulated (a) \(\text{q}\), (b) \(\text{rq}\), (c) \({\omega }_{\text{mean}}\), and (d) \({\omega }_{\text{std}}\) in the CMIP6 multimodel mean, calculated against ERA5 reanalysis. Resulting annual mean precipitation bias from the experiments replacing the (e) specific humidity (\({\text{c}}_{\text{q}}\) and \(\text{q}\)), (f) relative humidity (\({\text{c}}_{\text{rq}}\) and \(\text{rq}\)), (g) mean circulation (\({\text{c}}_{\omega \text{mean}}\) and \({\omega }_{\text{mean}}\)), and (h) circulation variability (\({\text{c}}_{\omega \text{std}}\) and \({\omega }_{\text{std}}\)) terms with the CMIP6 multimodel mean values (see text for more details). The actual annual mean precipitation of the CMIP6 multimodel mean is superimposed as black contour lines in (e–h), plotted at intervals of 2 mm day−1. Numbers in the top-right corner of (e–h) are the pattern correlation coefficients with respect to (a–d). Note that the range and interval used for shading in (e–h) differ from those in Fig. 9e–h for the sake of clarity.

First, we discuss the specific humidity term (\({c}_{q}\cdot q\); see Fig. 7a, e). The biases in the humidity field primarily show an ocean–land contrast, with a mostly wet bias over oceans and a dry bias over land (Fig. 7a). The changes in the GREB model diagnostic precipitation resulting from the biases in the specific humidity term show a rather different pattern (correlation of 0.2), with mostly enhanced precipitation: a tropics-wide wet bias of about 0.5 mm day−1 and up to 1 mm day−1 over the high-rainfall oceanic regions (Fig. 7e). This indicates that the models' slightly less negative \({c}_{q}\) compared with the observations dominates over the bias in \(q\), producing a slightly positive bias.
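The pattern correlations quoted in this section (e.g., 0.2 for the specific humidity term) compare two maps. A minimal sketch of a centred pattern correlation follows; the cos(latitude) area weighting and the NaN masking are assumptions, since the exact definition used for Figs. 7 and 9 is not restated here, and the function name is hypothetical.

```python
import numpy as np


def pattern_correlation(a, b, lat=None):
    """Centred pattern correlation of two (lat, lon) maps.

    If lat (1-D, degrees) is given, grid points are weighted by cos(latitude)
    to approximate area weighting; NaN points in either map are ignored.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.ones_like(a)
    if lat is not None:
        w = w * np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, None]
    ok = np.isfinite(a) & np.isfinite(b)
    a, b, w = a[ok], b[ok], w[ok]
    # Remove the weighted spatial means so only the patterns are compared.
    a_anom = a - np.average(a, weights=w)
    b_anom = b - np.average(b, weights=w)
    cov = np.sum(w * a_anom * b_anom)
    return cov / np.sqrt(np.sum(w * a_anom**2) * np.sum(w * b_anom**2))
```

For example, a precipitation response that is an exact positive multiple of the field bias (plus any constant offset) gives 1, and its negative gives −1.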

The bias in the relative humidity (\(rq\)) field is similar to the bias in specific humidity, with a clear dry bias over land (Fig. 7b). The precipitation response to biases in the relative humidity term (\({c}_{rq}\cdot rq\)) is generally negative but largely follows the bias pattern in \(rq\) (correlation of 0.75; Fig. 7f). This suggests that the precipitation biases resulting from the relative humidity term arise from both elements (\({c}_{rq}\) and \(rq\)).

The biases in the \({\omega }_{\text{mean}}\) field show a complex shift in the large-scale upward and downward motions (Fig. 7c). Combined with the bias towards increased sensitivity (\({c}_{\omega \text{mean}}\)), the \({\omega }_{\text{mean}}\) term leads to strong precipitation biases that resemble, in many regions, the total bias of the CMIP6 ensemble mean (Fig. 4a). These include the double ITCZ and the equatorial dry bias in the Pacific, the Indian Ocean wet bias, and the southward displacement of the Atlantic ITCZ. This is mostly due to the increased sensitivity (\({c}_{\omega \text{mean}}\)), which enhances precipitation where \({\omega }_{\text{mean}}\) is upward and reduces it where \({\omega }_{\text{mean}}\) is downward. However, it is also partly due to the biases in the \({\omega }_{\text{mean}}\) field, as indicated by the moderate pattern correlation of −0.4; the correlation is expected to be negative because upward motion, that is, negative \({\omega }_{\text{mean}}\), enhances precipitation.

The last term is the circulation variability (\({\omega }_{\text{std}}\)), which shows some significant biases in the tropical Pacific and Atlantic (Fig. 7d) that result in some important precipitation biases (Fig. 7h). Biases from the circulation variability term seem to supplement those from the mean circulation term. For example, the wet bias over the southeastern tropical Pacific (Fig. 7h) would further extend the anomalous southern rainfall band of the double ITCZ (Fig. 7g). The resulting precipitation bias pattern is also correlated (0.64) with the \({\omega }_{\text{std}}\) bias field, indicating that the biases in \({\omega }_{\text{std}}\) are more important than the biases in the sensitivity (\({c}_{\omega \text{std}}\)).

We can combine the first two terms (\(c_{q} \cdot q\) and \({c}_{rq}\cdot rq\)) to focus on the thermodynamic contribution and the last two terms (\({c}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}\) and \({c}_{\omega \text{std}}\cdot {\omega }_{\text{std}}\)) to focus on the dynamic contribution (see Fig. 8). The thermodynamic terms show a weak wet bias over most of the tropical oceans and some dry biases over land (Fig. 8a). In contrast, the combined result of the dynamic terms is larger in magnitude and shows almost all of the main features of the CMIP6 rainfall bias (compare Fig. 4c with Fig. 8b). This comparison demonstrates that biases from the dynamic factor are more important for the large-scale precipitation biases than those from the thermodynamic factor. However, this does not mean that biases from humidity are entirely negligible, since the linear superposition of all terms (Fig. 8c) is closer to the actual bias than the dynamic factor alone.
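The grouping into thermodynamic and dynamic contributions, and their linear superposition, can be sketched by summing single-term replacement experiments. This minimal sketch is not the authors' code: it assumes Eq. (1) has the form \({\Delta q}_{\text{precip}}={r}_{\text{precip}}\cdot q\cdot ({c}_{q}+{c}_{rq}\cdot rq+{c}_{\omega \text{mean}}\cdot {\omega }_{\text{mean}}+{c}_{\omega \text{std}}\cdot {\omega }_{\text{std}})\), as implied by Eq. (3), and that each experiment substitutes a coefficient and its field together; names and toy numbers are hypothetical. It also computes the diagnosed bias (all inputs taken from the model) to show why the superposition need not match it exactly.

```python
def greb_precip(v):
    """Assumed form of Eq. (1) for a dict v of coefficients and fields."""
    bracket = (v["c_q"] + v["c_rq"] * v["rq"]
               + v["c_wmean"] * v["w_mean"] + v["c_wstd"] * v["w_std"])
    return v["r_precip"] * v["q"] * bracket


def swap(obs, model, keys):
    """Change in diagnosed precipitation when only the inputs in keys take model values."""
    v = dict(obs)
    v.update({k: model[k] for k in keys})
    return greb_precip(v) - greb_precip(obs)


def decompose(obs, model):
    """Thermodynamic and dynamic responses, their linear superposition,
    and the diagnosed bias with every input taken from the model."""
    thermo = swap(obs, model, ["c_q", "q"]) + swap(obs, model, ["c_rq", "rq"])
    dynamic = swap(obs, model, ["c_wmean", "w_mean"]) + swap(obs, model, ["c_wstd", "w_std"])
    diagnosed = greb_precip(model) - greb_precip(obs)
    return thermo, dynamic, thermo + dynamic, diagnosed


# Toy single-grid-point values (hypothetical): the model is moister and
# twice as sensitive to omega_mean as the observations.
obs = dict(r_precip=1.0, q=1.0, rq=0.5, w_mean=-0.1, w_std=0.1,
           c_q=1.0, c_rq=2.0, c_wmean=-10.0, c_wstd=5.0)
model = dict(obs, q=1.2, c_wmean=-20.0)
thermo, dynamic, superposed, diagnosed = decompose(obs, model)
```

With these toy values the superposition (about 1.7) falls short of the diagnosed bias (about 1.9) by about 0.2, which is the product of the \(q\) bias and the bias in the bracketed sum; the residual appears because \(q\) multiplies every term in the assumed form.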

Fig. 8

Resulting annual mean precipitation bias from the experiments with (a) the humidity terms (sum of Fig. 7e–f) and (b) the circulation terms (sum of Fig. 7g–h) for the CMIP6 multimodel mean. (c) Linear superposition of the experiment results in (a) and (b). The actual annual mean precipitation of the CMIP6 multimodel mean is superimposed as black contour lines, plotted at intervals of 2 mm day−1. Note that the range and interval used for shading differ from those in Fig. 10 for the sake of clarity.

Fig. 9

Annual mean bias of the simulated (a) \(\text{q}\), (b) \(\text{rq}\), (c) \({\omega }_{\text{mean}}\), and (d) \({\omega }_{\text{std}}\) in ICON-Sapphire, calculated against ERA5 reanalysis. Resulting annual mean precipitation bias from the experiments replacing the (e) specific humidity (\({\text{c}}_{\text{q}}\) and \(\text{q}\)), (f) relative humidity (\({\text{c}}_{\text{rq}}\) and \(\text{rq}\)), (g) mean circulation (\({\text{c}}_{\omega \text{mean}}\) and \({\omega }_{\text{mean}}\)), and (h) circulation variability (\({\text{c}}_{\omega \text{std}}\) and \({\omega }_{\text{std}}\)) terms with the ICON-Sapphire values (see text for more details). The actual annual mean precipitation of ICON-Sapphire is superimposed as black contour lines in (e–h), plotted at intervals of 2 mm day−1. Numbers in the top-right corner of (e–h) are the pattern correlation coefficients with respect to (a–d). Note that the range and interval used for shading in (e–h) differ from those in Fig. 7e–h for the sake of clarity.

The analysis of the CMIP5 ensemble mean shows results very similar to those for CMIP6 (Figs. S7 and S8). Biases from the specific and relative humidity terms (Figs. S7e–f) lead to a wet bias over most of the tropical oceans (Fig. S8a), while those from the dynamic terms, which mainly come from the mean circulation term (Fig. S7g), explain many important large-scale patterns of the biases (Fig. S8b).

Figure 9 shows the biases in the four forcing fields and the biases in the precipitation resulting from the four forcing terms for the ICON-Sapphire simulation. In comparison to the CMIP6 biases (Fig. 7), we can see some similarities, some differences, and an overall much stronger impact on the precipitation from all four terms.

The precipitation biases resulting from the specific humidity term (Fig. 9e) are similar in pattern to those of the CMIP models but much stronger in amplitude. As in CMIP6, this arises mostly from the bias in \({c}_{q}\), since the result does not resemble the ICON-Sapphire \(q\) field bias (Fig. 9a). The relative humidity term produces the opposite response, with a strong dry bias almost everywhere in the tropics (Fig. 9f). This result also suggests that the bias in \({c}_{rq}\) is dominant, although some signatures of the \(rq\) field bias (Fig. 9b) are still visible, such as the dry biases over the Amazon and equatorial Africa, which keep the pattern correlation relatively high (0.65).

The biases in the mean circulation term (Fig. 9g) have a similar impact on the precipitation biases as in CMIP6. This term reproduces most of the important large-scale rainfall biases in ICON-Sapphire, such as the zonally elongated SPCZ and the too wet northeastern tropical Pacific. Unlike in CMIP6, this result is more strongly correlated (up to −0.8) with the bias in the ICON-Sapphire \({\omega }_{\text{mean}}\) field (Fig. 9c), even though both have about the same errors in \({c}_{\omega \text{mean}}\) (see Fig. 5g). The circulation variability term (Fig. 9h) is weaker overall. Although the \({\omega }_{\text{std}}\) field bias (Fig. 9d) has patterns very similar to the \({\omega }_{\text{mean}}\) field bias, the resulting precipitation bias shows mostly a dry bias over the Maritime Continent and no prominent wet biases. This is due to the substantially weaker \({c}_{\omega \text{std}}\) in ICON-Sapphire than observed, which leads to a dry bias.

Fig. 10

Resulting annual mean precipitation bias from the experiments with (a) the humidity terms (sum of Fig. 9e–f) and (b) the circulation terms (sum of Fig. 9g–h) for ICON-Sapphire. (c) Linear superposition of the experiment results in (a) and (b). The actual annual mean precipitation of ICON-Sapphire is superimposed as black contour lines, plotted at intervals of 2 mm day−1. Note that the range and interval used for shading differ from those in Fig. 8 for the sake of clarity.

We again combine the terms into thermodynamic and dynamic contributions (see Fig. 10). The result is almost identical to that for the CMIP6 models. The large precipitation biases from the specific and relative humidity terms almost cancel each other, leading to a weak wet bias across the tropics with some patches of dry bias over land (Fig. 10a). Conversely, the combined biases from the dynamic terms reproduce the main large-scale patterns of the actual precipitation bias (Fig. 10b).

We summarise the contribution of the four terms to the precipitation biases in each of the model simulations with a Taylor diagram in Fig. 11. First, in all model simulations, the linear superposition of the four terms relates to the actual model bias about as well as the total GREB model diagnosed bias does (compare stars and diamonds in Fig. 11a). Further, the \({\omega }_{\text{mean}}\) term has the largest contribution, followed by the \({\omega }_{\text{std}}\) term, with minor contributions from the thermodynamic terms (\(q\) and \(rq\)).
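Each point on a Taylor diagram combines three statistics of a test map (here, a single-term or superposed precipitation response) against a reference map (the actual precipitation bias): the correlation, the ratio of standard deviations and the centred root-mean-square difference. A minimal unweighted sketch follows, assuming NaNs mark grid points outside the region of interest (so a land or ocean mask gives the split used in panels b and c); the function name is hypothetical and area weighting is omitted for brevity.

```python
import numpy as np


def taylor_stats(test, ref):
    """Return (correlation, normalised std, normalised centred RMS difference)
    of a test map against a reference map, ignoring NaN grid points."""
    test = np.asarray(test, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    ok = np.isfinite(test) & np.isfinite(ref)
    t = test[ok] - test[ok].mean()   # centred test anomalies
    r = ref[ok] - ref[ok].mean()     # centred reference anomalies
    sd_t = np.sqrt(np.mean(t**2))
    sd_r = np.sqrt(np.mean(r**2))
    corr = np.mean(t * r) / (sd_t * sd_r)
    crmsd = np.sqrt(np.mean((t - r) ** 2))
    return corr, sd_t / sd_r, crmsd / sd_r
```

In normalised units the three values satisfy the law of cosines \(E'^2 = \sigma^2 + 1 - 2\sigma R\), which is what allows them to be drawn as a single point on the diagram.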

Fig. 11

Taylor diagrams of the resulting precipitation bias in the sensitivity experiments in Figs. 7 and 9 against the actual precipitation bias over (a) all tropical regions, (b) tropical oceans only, and (c) tropical lands only. Blue (red) symbols are for the CMIP6 (CMIP5) multimodel mean and green symbols are for the ICON-Sapphire model. The linear superposition of the four experiment results and the diagnosed bias are also shown as star and diamond symbols, respectively. Some symbols are off the scale; they are indicated by an arrow, with their correlation coefficients given in brackets.

A larger proportion of the precipitation biases in ICON-Sapphire than in CMIP5 or CMIP6 is explained by the circulation terms, especially the mean circulation term. This also leads to a higher overall correlation than in CMIP (red or blue star), even though the contribution from the relative humidity term (green square) in ICON-Sapphire is negative.

Another difference is revealed by separating the analysis into land and ocean regions. Over oceans (Fig. 11b), both CMIP and ICON-Sapphire have most of their biases coming from the circulation terms, with the mean circulation term alone having correlations of up to 0.7 in CMIP and almost 0.9 in ICON-Sapphire. Over land (Fig. 11c), however, they differ clearly. In CMIP, the biases from the specific humidity terms have the highest correlation (smallest root-mean-square error), whereas in ICON-Sapphire, the mean circulation terms still have the highest correlation.

4.3 Biases in the sensitivities vs. biases in forcing fields

The above analysis illustrated that the precipitation biases can arise from biases in the sensitivity parameters (\({c}_{q}\), \({c}_{rq}\), \({c}_{\omega \text{mean}}\), and \({c}_{\omega \text{std}}\)) and in the forcing fields (\(q\), \(rq\), \({\omega }_{\text{mean}}\), and \({\omega }_{\text{std}}\)). We now focus on the relative importance of these two aspects. We therefore analyse the sensitivity of the GREB model precipitation to the sensitivity parameters and to the forcing fields separately, for the CMIP6 and the ICON-Sapphire simulations (see Fig. 12).

Fig. 12

Resulting annual mean precipitation bias from the experiments replacing (a) the parameters (\({\text{c}}_{\text{q}}\), \({\text{c}}_{\text{rq}}\), \({\text{c}}_{\omega \text{mean}}\), and \({\text{c}}_{\omega \text{std}}\)) and (b) the variable fields (\(\text{q}\), \(\text{rq}\), \({\omega }_{\text{mean}}\), and \({\omega }_{\text{std}}\)) with the CMIP6 multimodel mean values (see text for more details). (c) Linear superposition of the experiment results in (a) and (b). (d–f) Same as (a–c), but for the ICON-Sapphire model. The actual annual mean precipitation of the CMIP6 multimodel mean and ICON-Sapphire is superimposed as black contour lines in (a–c) and (d–f), respectively, plotted at intervals of 2 mm day−1.

The results are similar for CMIP6 and ICON-Sapphire. The biases in the sensitivity parameters lead to wet biases over the regions of high rainfall (Fig. 12a,d). This suggests that in the CMIP models, the biases in \({c}_{q}\), \({c}_{\omega \text{mean}}\) and \({c}_{\omega \text{std}}\) (Fig. 5e,g,h), which lead to a positive bias, dominate over the biases in \({c}_{rq}\). For the ICON-Sapphire simulation, the biases in \({c}_{q}\) and \({c}_{\omega \text{mean}}\) dominate over the biases in \({c}_{rq}\) and \({c}_{\omega \text{std}}\).

On the other hand, replacing all the variable fields (\(q\), \(rq\), \({\omega }_{\text{mean}}\), and \({\omega }_{\text{std}}\)) with the models' simulated fields reproduces the important large-scale patterns of the precipitation biases (Fig. 12b and e). Comparing the patterns in Fig. 12b (Fig. 12e) with those in Fig. 7a–d (Fig. 9a–d) shows that most of them come from the biases in the \({\omega }_{\text{mean}}\) and \({\omega }_{\text{std}}\) fields, again indicating the dominance of the circulation factor. Unlike in the previous sensitivity experiments, the linear superposition of the two results here does not quite match the diagnosed bias. This is not unexpected, because the parameters and the variable fields enter Eq. (1) as products rather than as separate linear terms.
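The mismatch can be illustrated with a single product term \(c\cdot x\), ignoring for simplicity any other factors that multiply it. Writing the model values as observed values plus biases, \(c_m = c_o + \Delta c\) and \(x_m = x_o + \Delta x\), gives the exact identity

$$c_m x_m - c_o x_o = \underbrace{\Delta c\, x_o}_{\text{parameters only}} + \underbrace{c_o\, \Delta x}_{\text{fields only}} + \underbrace{\Delta c\, \Delta x}_{\text{cross term}}.$$

The parameter-only experiment captures only the first term and the field-only experiment only the second, so their linear superposition omits the cross term \(\Delta c\, \Delta x\), which is largest where both the sensitivity and the field are strongly biased.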

Figure 13 quantifies the relative contributions of the different elements to the precipitation biases. The results for CMIP and ICON-Sapphire are qualitatively similar, even after separating the analysis into ocean (Fig. 13b) and land (Fig. 13c) regions. The only noticeable difference is that all the sensitivity results for ICON-Sapphire (green symbols) have higher correlations and smaller RMSE values than those for CMIP (red or blue symbols). Overall, these analyses show that biases in the variable fields control most of the large-scale precipitation bias patterns, while biases in the sensitivities have similar magnitudes but contribute less to the overall bias pattern.

Fig. 13

Taylor diagrams of the resulting precipitation bias in the sensitivity experiments in Fig. 12 against the actual precipitation bias over (a) all tropical regions, (b) tropical oceans only, and (c) tropical lands only. Blue (red) symbols are for the CMIP6 (CMIP5) multimodel mean and green symbols are for the ICON-Sapphire model. The linear superposition of the two experiment results and the diagnosed bias are also shown as star and diamond symbols, respectively. Some symbols are off the scale; they are indicated by an arrow, with their correlation coefficients given in brackets.

5 Summary and discussions

In this study, we investigated the main sources of the tropical precipitation biases in the state-of-the-art GCMs participating in CMIP and in ICON-Sapphire, one of the few global convection-permitting climate models developed so far. A novel aspect of our approach is the use of the simple GREB diagnostic model, which relates precipitation to atmospheric moisture content (specific and relative humidity; \(q\), \(rq\)) and large-scale circulation (mean upward motion and its sub-monthly variability; \({\omega }_{\text{mean}}\), \({\omega }_{\text{std}}\)). It has been shown that the GREB diagnostic model precipitation has a remarkably high pattern correlation with the actual precipitation, both in observations and in model simulations. The model also has high skill in diagnosing the biases in the precipitation climatology, with the correct magnitudes and patterns.

The analysis of the GREB diagnostic model suggests that biases in tropical precipitation result both from biases in the forcing fields (\(q\), \(rq\), \({\omega }_{\text{mean}}\), \({\omega }_{\text{std}}\)) and from biases in the sensitivity to these forcing fields (\({c}_{q}\), \({c}_{rq}\), \({c}_{\omega \text{mean}}\), and \({c}_{\omega \text{std}}\)). The most remarkable bias is in the sensitivity of the models to the mean large-scale circulation (\({c}_{\omega \text{mean}}\)), which is about twice as large as observed. This suggests that precipitation in the models is too closely tied to \({\omega }_{\text{mean}}\). This finding is consistent with previous studies. Oueslati and Bellon (2015) found that coupled climate models simulate too much precipitation over the Pacific double ITCZ region, compared to the observations, for the same upward motion regime. Yang et al. (2018) found similar relations between \(P\) and \(\omega\) for other ocean regions. Our results confirm that such a relation is typical of the tropics in general.

Although both share a very similar overestimation problem, CMIP and ICON-Sapphire can be distinguished by separating the analysis into ocean and land regions. In ICON-Sapphire, the relationship between precipitation and mean circulation is about the same for ocean and land points, something we see neither in the CMIP models nor in the observations. This might be linked to the resolved convective processes, which are the main distinction between the two. Furthermore, the fact that the overestimated \({c}_{\omega \text{mean}}\) persists in ICON-Sapphire indicates that this problem has less to do with the convective parameterisation and more to do with other processes that translate atmospheric circulation into precipitation, such as the microphysics scheme.

Our analysis further showed that biases from \({\omega }_{\text{mean}}\) are the dominant source of most of the prominent rainfall biases in CMIP models, such as the double ITCZ and the dry equatorial Pacific. This is consistent with previous studies that also attribute the double ITCZ problem in the Pacific to atmospheric circulation biases (e.g., Bellucci et al. 2010; Oueslati and Bellon 2015; Yang et al. 2018). In contrast, biases from the humidity terms play only a small role, in the form of tropics-wide wet biases that might be linked to the “drizzling” problem (e.g., Chen et al. 2021) and some dry biases over land areas, such as the Amazon (e.g., Lintner et al. 2017). Note, however, that this dry bias from the humidity terms might explain only part of the problem, as the GREB model cannot really diagnose the dry Amazon bias (see Fig. 4c), while the association of the wet biases with the “drizzling” bias should be explored further using more detailed methods (e.g., probability density functions). Nevertheless, the contribution of the humidity terms is not necessarily negligible, as the linear sum of the biases from both factors is closer to the actual bias than that from circulation alone.

For ICON-Sapphire, the results are as follows. Even though it can partially resolve convective processes, the large-scale precipitation biases in ICON-Sapphire stem from the same main source as in CMIP models: the circulation terms. Biases from the mean circulation terms in ICON-Sapphire are mainly responsible for the prominent precipitation biases over the Pacific, such as the too strong and too zonal SPCZ. In contrast, although large in magnitude, biases from the specific and relative humidity terms almost cancel each other and result only in a weak positive bias with some patches of dry bias over land. However, unlike for CMIP, we cannot easily associate the positive bias with the “drizzling” problem, because a previous study (Na et al. 2021) suggested that GSRMs do not suffer from it. Moreover, the dry bias apparent over South America is unlikely to be the dry Amazon bias, as it is mostly cancelled by the opposite bias from the circulation terms and does not appear in the linear superposition (Fig. 10) or in the diagnosed bias (Fig. 4d).

Despite the similarities in the results, there are important differences in how the biases in the humidity and circulation terms relate to the precipitation biases in CMIP and in ICON-Sapphire. A larger proportion of the overall precipitation biases in ICON-Sapphire than in CMIP can be related to the biases from circulation. This difference is explained by another finding that emerges when we separate the analysis into land and ocean regions. Over land, biases from the relative humidity terms account for most of the rainfall biases in CMIP, whereas in ICON-Sapphire it is the biases from the mean circulation terms. Therefore, we conclude that the sources of the large-scale precipitation biases in ICON-Sapphire are more concentrated in the biases from circulation, while in CMIP they are more distributed among the biases from humidity, circulation, and other unrepresented processes.

A previous study of ICON-Sapphire by Segura et al. (2022) found that some of its tropical precipitation biases can be explained by the strong biases in its simulated SSTs. The large-scale biases over the Pacific, such as the equatorial dry bias, were largely reduced when a meridional SST gradient similar to the observed one was simulated (see their Sect. 6). Our study follows this up by suggesting that the SST biases affect the precipitation biases more through the dynamic factor (i.e., atmospheric circulation) than through the thermodynamic factor (i.e., atmospheric moisture). However, these biases cannot be blamed entirely on errors propagated through the ocean–atmosphere coupling. Kodama et al. (2015) found similar biases in the atmosphere-only run of another global storm-resolving model, NICAM (Satoh et al. 2014). This indicates that there are also inherent errors in the atmospheric component of these models.

Our simple analyses are not without drawbacks. We apply a crucial assumption in the sensitivity experiments: that the biases in the humidity and circulation terms are independent of each other and of the precipitation biases. In reality, they are intertwined through many complex processes, and their cross-influences are hard to discern. For instance, if one investigated the main cause of the mean circulation biases using a similarly simple method, one would probably find that it lies in the biases in precipitation or in other variables related to precipitation. In fact, this is what Fan and Dommenget (2023) found in their recent study on tropical circulation biases. They used a simple moist static energy budget model (MSEB; Fan and Dommenget 2021) to analyse the mean state bias in the tropical mean circulation and found a significant bias contribution from the surface latent heat flux, which is closely related to evaporation and precipitation. In addition, we acknowledge that our use of only a one-year period of the ICON-Sapphire simulation may not be representative of its long-term climatology. Moreover, the results for ICON-Sapphire might not be common to other GSRMs, and further investigations are needed to reach more comprehensive conclusions about rainfall biases in this type of model. Nevertheless, we believe that this study adds new knowledge and provides valuable insights for the development of climate models in general.

The fact that ICON-Sapphire still has the common problems in simulating precipitation does not mean that resolving the meso-beta-scale convective processes adds no value. Instead, it tells us that we can now focus our attention on the other unresolved processes. For instance, one possible interpretation of the results presented here is that the microphysics scheme for precipitation in CGCMs creates too much precipitation for a given amount of humidity convergence (mass flux). As a hypothesis, one could test whether reducing the amount of precipitation per mass flux would improve the large-scale precipitation patterns and also the large-scale tropical circulation. Such sensitivity studies in CGCMs could guide improvements in the simulation of tropical circulation and precipitation.