1 Introduction

The tropical mean atmospheric circulation provides the background state for tropical variability and global energy transport, and modulates the spatial distribution of precipitation (Held and Soden 2006; Ma et al. 2018). Recent studies have shown little progress in the CMIP6 model ensemble in simulating the tropical mean atmospheric circulation and precipitation (Fiedler et al. 2020; Tian and Dong 2020). The purpose of this paper is to use the moist static energy for first baroclinic mode (MSEB) model (Fan and Dommenget 2021; hereafter referred to as FD21) as a diagnostic tool to examine the sources of circulation mean state biases from an energetic perspective.

In previous research, precipitation has been used extensively to define the mean bias regions of historical climate simulations (Bollasina and Ming 2013; Li and Xie 2014; Oueslati and Bellon 2015; Fiedler et al. 2020; Tian and Dong 2020). In the tropics, the spatial disagreements include a double-ITCZ pattern over the Pacific north and south of the equator (Lin 2007; Li and Xie 2014; Oueslati and Bellon 2015), a cold tongue bias over the west and central equatorial Pacific (Hirota and Takayabu 2013; Li and Xie 2014; Tian and Dong 2020), a southward displaced precipitation pattern over the Atlantic south of the equator (Biasutti et al. 2006; Siongco et al. 2015; Richter and Tokinaga 2020), and an east–west precipitation dipole anomaly over the equatorial Indian Ocean (IO) (Bollasina and Ming 2013; Fiedler et al. 2020; Long et al. 2020).

Building on the regions described above, we introduce each mean bias pattern and the related hypotheses in turn. The double-ITCZ bias is characterized by a positive precipitation bias on both sides of the equator, particularly the south side (Lin 2007; Li and Xie 2014; Oueslati and Bellon 2015; Xiang et al. 2017). Lin (2007) found that AMIP runs produce the same bias as CMIP runs. This result suggests that the excessive tropical precipitation is an intrinsic error of the atmospheric models (Zhang and Wang 2006; Bellucci et al. 2010; Oueslati and Bellon 2015; Zhou et al. 2022). Although atmospheric general circulation models (AGCMs) typically exhibit the double-ITCZ problem (Li and Xie 2014; Zhang et al. 2019), air-sea interaction and simulated sea surface temperature (SST) biases also play significant roles in the formation of double-ITCZ biases. As shown in Xiang et al. (2017) and Zhou and Xie (2017), the SST bias in the coupled model can be explained by the radiative bias that already exists in the AMIP models.

The cold tongue bias leads to negative precipitation biases where cool SSTs penetrate too far westward in the Pacific (Lin 2007; Li and Xie 2014). The cold tongue bias is also accompanied by overly strong easterly winds (AchutaRao and Sperber 2006; Lin 2007; Li and Xie 2014). The CMIP-AMIP comparison indicates that the bias in the equatorial Pacific cold tongue is not due to errors in atmospheric models but rather due to the interaction with the ocean (Lin 2007; Zheng et al. 2012; Li and Xie 2014).

Over the tropical Atlantic, the southward displaced ITCZ (Richter et al. 2014; Siongco et al. 2015; Fiedler et al. 2020) in most coupled general circulation model (CGCM) simulations is associated with a westerly bias in equatorial surface winds and a failure to reproduce the eastern equatorial cold tongue SST in boreal summer (Richter and Xie 2008; Richter and Tokinaga 2020). Chang et al. (2007) and Richter and Xie (2008) reported that westerly wind biases are most pronounced in March–May (MAM), when SST biases along the equator are small. This suggests that AGCMs forced with observed SSTs also have westerly wind biases (Wahl et al. 2011; Voldoire et al. 2019). While the oceanic components of GCMs also likely contribute to this mean bias, Cvijanovic and Chiang (2013) demonstrated that the ITCZ shifts in the Atlantic are impossible without changes in tropical SST and suggested that ITCZ shifts are primarily influenced by local SST forcing.

Over the IO, an east–west precipitation anomaly is a long-standing bias in CGCM simulations of the IO climate (Meehl et al. 2006; Li et al. 2015; Long et al. 2020). These biases are referred to as IOD-like biases because their spatial structure closely resembles the anomaly pattern of a positive IO dipole (IOD) event (Saji et al. 1999). In the tropical IO, CGCMs exhibit common annual-mean easterly wind biases (Lee et al. 2013), as well as excessive precipitation and warm SST biases in the western IO and deficient precipitation and cold SST biases in the eastern IO (Cai and Cowan 2013). Moreover, the IOD-like biases appear not only in the CGCMs but also in the AGCM simulations (Meehl et al. 2006; Bollasina and Ming 2013; Wang et al. 2018).

Even though several theories have been proposed to address the mean state bias issue in CMIP simulations, progress on that issue has been very limited over the past ten years. Understanding the formation of biases remains a challenge as it is potentially related to deficiencies in individual model components. Moreover, the CMIP6 model outputs of AMIP runs and CMIP runs are relatively new, and mean state bias research based on these new datasets still requires additional investigation.

The paper aims to use the MSEB diagnostic model, which considers dynamic and thermodynamic forcing, to identify the sources of the tropical circulation biases in CGCMs. Recently, the same model has been applied to diagnose the tropical circulation changes under global warming (Fan and Dommenget 2023). This analysis will provide a new perspective for understanding mean bias in CGCMs and can be linked to the hypotheses mentioned above. Moreover, the findings will provide some directions for the next generation of climate model development and bias reduction strategies.

The remainder of this study proceeds as follows. Section 2 describes the model outputs, namely the CMIP and AMIP runs of the CMIP6 simulations under the historical scenario, and the analysis methods used in this study. It also gives a short introduction to the MSEB model. In Sect. 3, we present the tropical circulation mean state biases based on the ensemble of CMIP and AMIP simulations, followed in Sect. 4 by sensitivity analyses of the MSEB model to investigate the causes of the tropical mean state bias. The four typical mean bias patterns are then discussed separately in Sect. 5. Lastly, the study concludes with a summary and discussion in Sect. 6.

2 Data and methods

2.1 Reanalysis data

The reanalysis dataset is used as the observational reference for evaluating the model simulation biases in the tropical historical climatology. We obtained monthly ERA5 (Hersbach et al. 2020) reanalysis data, provided by ECMWF, at a spatial resolution of 3.75° × 3.75°. The variables are extracted as monthly averaged data on single levels and pressure levels (14 pressure levels, 1000–30 hPa) from 1979 to 2008, which we take as the historical period. The surface variables include surface latent heat flux, surface sensible heat flux, surface net solar radiation, surface net thermal radiation, top net solar radiation, and top net thermal radiation. The multi-layer variables include vertical velocity, U-component of wind, V-component of wind, specific humidity, temperature, and geopotential.

2.2 Model data

Our analysis is based on the most recently released CMIP6 dataset of ocean–atmosphere coupled climate models for CMIP and AMIP runs in the historical simulation (Taylor et al. 2012; Eyring et al. 2016). Using both CMIP and AMIP runs allows us to distinguish the contribution of atmospheric physics in the AGCM from the contribution due to atmosphere–ocean interaction and ocean physics (hereafter coupling effects). The AMIP simulations employ the same atmospheric components as the CMIP models and are driven by monthly mean observed SST and sea ice concentration data from 1979 to the present, thereby eliminating the SST biases that occur in the fully coupled CMIP simulations. We considered all models for which the data necessary for the MSEB model calculations were available (see the list of CMIP6 models in Fig. 2). Here, we analyze only one member of each CMIP/AMIP run.

All data from the CMIP6 outputs is extracted as monthly mean values on surface and pressure levels (14 pressure levels, 1000–50 hPa). The data period from the CMIP and AMIP historical simulation is from 1979 to 2008. The surface variables include surface latent heat flux, surface sensible heat flux, and surface/top upwelling/downwelling solar/thermal radiation. The multi-layer variables include vertical velocity, U-component of wind, V-component of wind, specific humidity, temperature, and geopotential.

All calculations for the MSEB model are done on the original data/model grids and then interpolated onto a common 3.75° × 3.75° grid for analysis. Throughout the analysis, we define the tropical region as latitudes within 30° of the equator. For analyses of ocean-only regions, we use a land-sea mask based on Rand’s Global Elevation and Depth Data (RAND Corporation 1980).

2.3 Moist static energy for first baroclinic mode (MSEB) model

In this paper, we apply the MSEB model (FD21) as a diagnostic tool to estimate tropical circulation, as approximated by the 500 hPa vertical pressure velocity, \(\omega \). The MSEB model approximates \(\omega \) by

$$\omega \approx \frac{-g}{{\text{GMS}}_{\text{B}}}(\text{Adv}+{F}_{net})$$
(1)

with \(g\) as the acceleration due to gravity, assumed globally uniform. \({F}_{net}\) is the net energy flux into the air column from top of atmosphere (\({F}_{top}\)) and surface (\({F}_{sur}\)):

$${F}_{net}={F}_{top}-{F}_{sur}$$
(2)

The net energy heat flux into the top of atmosphere is

$${F}_{top}={\text{SW}}_{\text{top}}+{\text{LW}}_{\text{top}}$$
(3)

and the net energy heat flux into the surface is

$${F}_{sur}={\text{SW}}_{\text{sur}}+{\text{LW}}_{\text{sur}}-\text{SH}-\text{LH}$$
(4)

\({F}_{net}\) comprises the surface latent heat (\(\text{LH}\)) and sensible heat (\(\text{SH}\)) fluxes, the net shortwave (\({\text{SW}}_{\text{top}}\)) and net longwave (\({\text{LW}}_{\text{top}}\)) radiation at the top of the atmosphere, and the net shortwave (\({\text{SW}}_{\text{sur}}\)) and net longwave (\({\text{LW}}_{\text{sur}}\)) radiation at the surface. The radiative heat fluxes are defined as positive downwards and the surface heat fluxes as positive upwards. Adv is the vertically integrated advection of latent energy (\({\text{Adv}}_{\text{q}}\); hereafter advection of moisture) and advection of enthalpy (\({\text{Adv}}_{\text{T}}\); hereafter advection of heat) into the air column:

$$\text{Adv}={\text{Adv}}_{\text{q}}+{\text{Adv}}_{\text{T}}=-{<{\varvec{v}}\bullet \nabla \left({L}_{v}q\right)>}_{{P}_{T}}-{<{\varvec{v}}\bullet \nabla \left({C}_{p}T\right)>}_{{P}_{T}}=-{<{\varvec{v}}\bullet \nabla \left({L}_{v}q+{C}_{p}T\right)>}_{{P}_{T}}$$
(5)

The vertical integral \({< >}_{{P}_{T}}\) denotes a mass-weighted vertical integration from the surface to the tropical mean tropopause height, \({P}_{T}\). \({P}_{T}\) is defined as the height at which the lapse rate is approximately \(-2\) °C/km. The lapse rate is calculated from the tropical mean temperature profile using a spline fit interpolation with a vertical resolution of 25 m. Note that \({F}_{net}\) here is simply taken as the net energy heat flux at the tropopause for the MSEB model, since the most influential parameters are from the troposphere. \(q\) and \(T\) are the specific humidity and air temperature, respectively. The constants \({L}_{v}\) and \({C}_{p}\) are the latent heat of vaporization and the specific heat at constant pressure, respectively. \({\varvec{v}}\) is the horizontal wind field and \(\nabla \) denotes the horizontal gradient operator.

The \({\text{GMS}}_{\text{B}}\) is the gross moist stability (GMS) weighted by the first baroclinic mode \(\Omega \) at 500 hPa, \({\Omega }_{500hPa}\):

$${\text{GMS}}_{\text{B}}=\frac{M}{{\Omega }_{500hPa}}$$
(6)

with \(M\) defined as the gross moist stability (GMS):

$$M=-g{<\Omega {\partial }_{p}h>}_{{P}_{T}}$$
(7)

where \({\partial }_{p}\) is vertical gradient operator in pressure coordinate and \(h\) represents MSE defined as

$$h={C}_{p}T+{L}_{v}q+gz$$
(8)

which combines the air parcel’s enthalpy (\({C}_{p}T\)), latent energy (\({L}_{v}q\)), and potential energy (\(gz\)), with \(z\) as the geopotential height. To first order, \(\Omega \) represents the vertical structure of vertical motion in the tropics, and its values vary with pressure level. It is calculated from the tropical mean temperature profile and tropopause height under a convective quasi-equilibrium assumption (Wills et al. 2017). Note that \({\Omega }_{500hPa}\) here is a scalar, chosen to correspond with the diagnosis of \(\omega \) at 500 hPa in Eq. (1).
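As a rough illustration of Eqs. (6) to (8), the sketch below evaluates \(M\) and \({\text{GMS}}_{\text{B}}\) for a single column. It rests on several assumptions that are not stated in the text: the mass-weighted integral is taken as \((1/g)\int X\,dp\) between \({P}_{T}\) and the surface, the levels are ordered from the tropopause down to the surface, and \({\partial }_{p}h\) is approximated by finite differences between adjacent levels. The function names and the numerical values of the constants are hypothetical and are not taken from FD21.

```python
G = 9.81      # gravitational acceleration (m s^-2)
CP = 1004.0   # specific heat at constant pressure (J kg^-1 K^-1), assumed value
LV = 2.5e6    # latent heat of vaporization (J kg^-1), assumed value


def moist_static_energy(temp_k, q_kgkg, z_m):
    """Eq. (8): h = Cp*T + Lv*q + g*z, in J kg^-1."""
    return CP * temp_k + LV * q_kgkg + G * z_m


def gross_moist_stability(omega_mode, h):
    """Eq. (7): M = -g <Omega d_p h>, assuming <X> = (1/g) * integral of X dp.

    Both lists run from the tropopause level down to the surface.
    Per layer, Omega * (dh/dp) * dp reduces to Omega_mid * (h[k+1] - h[k]),
    so the factor g and the pressure values cancel out.
    """
    total = 0.0
    for k in range(len(h) - 1):
        omega_mid = 0.5 * (omega_mode[k] + omega_mode[k + 1])
        total += omega_mid * (h[k + 1] - h[k])
    return -total


def gms_b(omega_mode, h, index_500):
    """Eq. (6): GMS_B = M / Omega at the 500 hPa level (index_500)."""
    return gross_moist_stability(omega_mode, h) / omega_mode[index_500]
```

Under these assumptions \({\text{GMS}}_{\text{B}}\) carries units of J kg^-1, which makes the right-hand side of Eq. (1) come out in Pa s^-1 when the fluxes are given in W m^-2.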

In summary, \(\omega \) in the MSEB model estimate is driven by the combined \({F}_{net}\) and \(\text{Adv}\), where a net gain in moist static energy due to these two terms leads to upward motion. Conversely, a net loss leads to downward motion. The sensitivity of \(\omega \) to these two drivers is determined by \({\text{GMS}}_{\text{B}}\), which is positive (stable) for all the cases examined in this study. Smaller values of \({\text{GMS}}_{\text{B}}\) lead to a larger sensitivity of \(\omega \) to \({F}_{net}\) and \(\text{Adv}\), while larger values lead to a smaller sensitivity. In turn, for given \({F}_{net}\) and \(\text{Adv}\) fields, smaller values of \({\text{GMS}}_{\text{B}}\) lead to an intensification of the \(\omega \) field, while larger values lead to a weakening of the \(\omega \) field. In this model, \(\omega \), \({\text{GMS}}_{\text{B}}\), \({F}_{net}\), \(\text{Adv}\), \(h\) and \(M\) are horizontal fields of tropospheric integrals and \(\Omega \) is a mean vertical profile. In the following analysis we will only discuss the 500 hPa level of \(\omega \).
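The relationships in Eqs. (1) to (4) can be sketched for a single grid point as follows. This is only a minimal illustration: the function names and all numerical inputs are hypothetical, and \({\text{GMS}}_{\text{B}}\) is assumed to be supplied in J kg^-1 so that \(\omega \) comes out in Pa s^-1.

```python
G = 9.81  # gravitational acceleration (m s^-2), assumed globally uniform


def net_column_flux(sw_top, lw_top, sw_sur, lw_sur, sh, lh):
    """Eqs. (2)-(4): net energy flux into the air column (W m^-2).

    Radiative fluxes are positive downwards; SH and LH are positive upwards.
    """
    f_top = sw_top + lw_top              # Eq. (3)
    f_sur = sw_sur + lw_sur - sh - lh    # Eq. (4)
    return f_top - f_sur                 # Eq. (2)


def mseb_omega(adv, f_net, gms_b, g=G):
    """Eq. (1): omega at 500 hPa (Pa s^-1); negative values mean upward motion."""
    return -g * (adv + f_net) / gms_b


# Hypothetical values for one grid point (not taken from ERA5 or CMIP6).
f_net = net_column_flux(sw_top=320.0, lw_top=-240.0,
                        sw_sur=200.0, lw_sur=-50.0, sh=10.0, lh=120.0)
omega = mseb_omega(adv=-20.0, f_net=f_net, gms_b=3000.0)
print(f_net, omega)
```

With a positive \({\text{GMS}}_{\text{B}}\), a net energy gain (Adv + F_net > 0) gives a negative \(\omega \), that is, upward motion, and halving \({\text{GMS}}_{\text{B}}\) doubles the response.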

2.4 Sensitivities of the MSEB model

In the MSEB model, we can estimate the sensitivity of \(\omega \) to the forcing terms on the right-hand side (RHS) of Eq. (1). For instance, to evaluate how sensitive the change in \(\omega \) from the observations to a model simulation is to the \({F}_{net}\) forcing term, we compute \(\omega \) from Eq. (1) with all elements of the RHS taken from the observations except \({F}_{net}\), which we take from the CMIP or AMIP model simulation. The difference from the MSEB model estimate with all elements of the RHS taken from the observations provides an estimate of the sensitivity of the change in \(\omega \) to the change in \({F}_{net}\). In the following sections, we discuss a series of such sensitivity analyses, in which we change one element of the RHS of Eq. (1) at a time.
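The one-term-at-a-time procedure described above might be sketched as follows for a single grid point. The dictionary layout, the function names, and the numbers are hypothetical and serve only to illustrate the swap of one RHS element.

```python
G = 9.81  # gravitational acceleration (m s^-2)


def mseb_omega(terms):
    """Eq. (1) for one grid point; terms has keys 'adv', 'f_net' and 'gms_b'."""
    return -G * (terms["adv"] + terms["f_net"]) / terms["gms_b"]


def sensitivity(obs, model, name):
    """Change in omega when only `name` comes from the model, all else from observations."""
    swapped = dict(obs)           # start from the observed RHS
    swapped[name] = model[name]   # replace a single element with the simulated one
    return mseb_omega(swapped) - mseb_omega(obs)


# Hypothetical values (W m^-2 for adv and f_net, J kg^-1 for gms_b).
obs = {"adv": -20.0, "f_net": 60.0, "gms_b": 3000.0}
model = {"adv": -25.0, "f_net": 75.0, "gms_b": 3300.0}

for name in ("f_net", "adv", "gms_b"):
    print(name, sensitivity(obs, model, name))
```

Because Eq. (1) is nonlinear in \({\text{GMS}}_{\text{B}}\), the three single-term changes need not sum exactly to the full model-minus-observation difference.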

2.5 Regional \({\varvec{\omega}}\) mean bias indices

We define \(\omega \) mean bias indices over different oceans to quantify the MSEB model’s ability to capture four well-known mean bias patterns. The indices are the Pacific Tripole Index (TPI), the Equatorial Pacific Index (EPI), the South Atlantic Index (SAI), and the Indian Ocean Dipole Index (DPI).

The TPI quantifies the double-ITCZ bias. It computes the average of the north equatorial Pacific (9.375°N-31.875°N, 161.25°E-243.75°E) and south equatorial Pacific (5.625°S-28.125°S, 191.25°E-273.75°E) mean biases and subtracts the average equatorial Pacific (5.625°S-5.625°N, 138.75°E-247.5°E) mean biases. The EPI quantifies the cold tongue bias. It computes the average mean biases over the equatorial Pacific (same box as TPI). The SAI quantifies the southward displaced ITCZ bias in the Atlantic. It computes the average mean biases over the south equatorial Atlantic (1.875°N-16.875°S, 326.25°E-367.5°E). The DPI quantifies IOD-like bias in the IO. It computes the difference from the average mean biases of the eastern IO (5.625°N-9.375°S, 78.75°E-97.5°E) minus the average mean biases of the western IO (5.625°N-9.375°S, 48.75°E-67.5°E).

The regions are selected based on prominent mean bias signals in Fig. 1d, and all the above regions are marked as boxes on all the spatial plots in this study. The calculation methods of TPI and DPI follow the concept of SST indices for Interdecadal Pacific oscillation (Henley et al. 2015) and Indian Ocean Dipole (Saji and Yamagata 2003), but the selected regions are not exactly the same as in previous studies. For TPI, the calculation is the upward motion bias (negative value) region minus the downward motion bias (positive value) region. For DPI, the calculation is the downward motion bias region minus the upward motion bias region. Therefore, in this study, the more negative the TPI, the larger the double-ITCZ bias is. The more positive the DPI, the larger the IOD-like bias is.
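The TPI and DPI definitions above can be sketched on a regular latitude-longitude grid as shown below. Several choices here are assumptions rather than details given in the text: box edges are treated as inclusive grid-cell centres, box means are weighted by the cosine of latitude, and the TPI uses the simple average of the northern and southern box means. The function names are hypothetical.

```python
import math

# Box edges as (lat_south, lat_north, lon_west, lon_east), copied from the text.
BOXES = {
    "north_pacific": (9.375, 31.875, 161.25, 243.75),
    "south_pacific": (-28.125, -5.625, 191.25, 273.75),
    "equatorial_pacific": (-5.625, 5.625, 138.75, 247.5),
    "south_atlantic": (-16.875, 1.875, 326.25, 367.5),
    "east_io": (-9.375, 5.625, 78.75, 97.5),
    "west_io": (-9.375, 5.625, 48.75, 67.5),
}


def box_mean(field, lats, lons, box):
    """cos(latitude)-weighted mean of field[i][j] over cell centres inside the box."""
    south, north, west, east = box
    num = den = 0.0
    for i, lat in enumerate(lats):
        if not south <= lat <= north:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            # Modulo arithmetic lets a box extend past 360E, as the Atlantic box does.
            if (lon - west) % 360.0 <= east - west:
                num += weight * field[i][j]
                den += weight
    return num / den


def tpi(bias, lats, lons):
    """Average of the north and south Pacific boxes minus the equatorial Pacific box."""
    north = box_mean(bias, lats, lons, BOXES["north_pacific"])
    south = box_mean(bias, lats, lons, BOXES["south_pacific"])
    equator = box_mean(bias, lats, lons, BOXES["equatorial_pacific"])
    return 0.5 * (north + south) - equator


def dpi(bias, lats, lons):
    """Eastern IO box mean minus western IO box mean."""
    east = box_mean(bias, lats, lons, BOXES["east_io"])
    west = box_mean(bias, lats, lons, BOXES["west_io"])
    return east - west
```

The EPI and SAI follow directly from `box_mean` applied to the `equatorial_pacific` and `south_atlantic` boxes.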

Fig. 1

Mean \(\omega \) at 500 hPa in historical period of (a) ERA5, (b) CMIP ensemble, (c) AMIP ensemble. Biases of mean state \(\omega \) from (d) CMIP ensemble minus ERA5, (e) AMIP ensemble minus ERA5. Positive (negative) values represent downward (upward) motion. Note (d) and (e) are shown without the tropical mean bias. The value on the top-right of each panel is the tropical mean value and the value on the top-left is the root-mean-square without considering the tropical mean value. The boxes are defined as regions for discussing different mean bias patterns in this study. The selection of the boxes is based on (d)

3 Mean state biases of the tropical circulation

We start the results sections with an analysis of the CMIP and AMIP simulations of the tropical circulation, focusing on their mean state biases relative to the observations (see Fig. 1). The AMIP bias specifically reflects the error attributed solely to the atmospheric model. The disparity between CMIP and AMIP simulations, on the other hand, highlights the error stemming from SST biases. Because SST biases themselves result from ocean–atmosphere coupling processes, we term this the coupling bias for simplicity. We focus on oceanic regions, avoiding the more complex land regions, for which the MSEB model has only limited skill (see FD21). We further subtract the tropical mean vertical motion to highlight the regional patterns in all further analysis in this study. The value of the tropical mean is displayed in the top-right corner of each panel in Fig. 1.

The \(\omega \) bias pattern from CMIP runs reveals upward motion biases over the Pacific north and south of the equator, as well as south of the equator in the Atlantic and over the western equatorial IO (Fig. 1d). In contrast, there are downward motion biases over the equatorial Pacific and the eastern equatorial IO. This \(\omega \) bias pattern is consistent with the mean state bias in tropical precipitation described in earlier studies (Fiedler et al. 2020; Tian and Dong 2020). For instance, these regions are associated with the double-ITCZ problem, the cold tongue bias in the Pacific, the southward displaced ITCZ in the Atlantic, and the IOD-like bias in the IO. The excessive and deficient precipitation in these regions closely corresponds with upward and downward motion biases, respectively. This supports the idea that tropical mean bias problems are intricately linked with biases in tropical circulation.

The \(\omega \) mean bias of AMIP simulations is generally weaker than that of CMIP simulations, but it maintains the general structure of the previously mentioned typical mean biases, with the exception of the cold tongue bias and the southward displaced ITCZ in the Atlantic, see Fig. 1c and e. The pattern correlation of \(\omega \) mean bias between AMIP and CMIP for entire tropical oceans is 0.62. It suggests that these \(\omega \) mean biases are partially intrinsic errors of AGCMs and that they may be amplified or altered by air-sea interaction in CMIP simulations.
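A pattern correlation between two bias maps, such as the one quoted above, could be computed along the following lines. The text does not specify the weighting or centring, so this sketch assumes a centred correlation with cosine-of-latitude weights and an optional ocean mask; the function name is hypothetical.

```python
import math


def pattern_correlation(a, b, lats, mask=None):
    """Centred, cos(latitude)-weighted correlation between two lat-lon fields."""
    samples = []
    for i, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))
        for j in range(len(a[i])):
            if mask is None or mask[i][j]:   # keep ocean points only, if a mask is given
                samples.append((weight, a[i][j], b[i][j]))
    wsum = sum(w for w, _, _ in samples)
    mean_a = sum(w * x for w, x, _ in samples) / wsum
    mean_b = sum(w * y for w, _, y in samples) / wsum
    cov = sum(w * (x - mean_a) * (y - mean_b) for w, x, y in samples)
    var_a = sum(w * (x - mean_a) ** 2 for w, x, _ in samples)
    var_b = sum(w * (y - mean_b) ** 2 for w, _, y in samples)
    return cov / math.sqrt(var_a * var_b)
```

Here `a` and `b` would be the CMIP and AMIP ensemble mean bias fields on the common grid, and `mask` a boolean ocean mask of the same shape.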

These findings are consistent with previous studies. Numerous studies have indicated that the double-ITCZ problem is an intrinsic error in AGCMs (Zhang and Wang 2006; Lin 2007; Bellucci et al. 2010; Oueslati and Bellon 2015; Xiang et al. 2017). The cold tongue bias is an error resulting from the atmosphere–ocean coupled feedbacks (Lin 2007; Zheng et al. 2012; Li and Xie 2014). IOD-like bias is an intrinsic error in AGCMs (Meehl et al. 2006; Bollasina and Ming 2013; Wang et al. 2018) but can be amplified due to atmosphere–ocean coupling effects (Cai and Cowan 2013; Li et al. 2015; Long et al. 2020).

We employ a Taylor diagram to further evaluate the consistency of the ensemble mean bias patterns for each individual model (Fig. 2). The diagram illustrates that the ensemble mean bias patterns in CMIP simulations are consistent among the 23 models, with an average correlation value of 0.73 (blue star in Fig. 2). However, for most models, the bias pattern is about 30% stronger than in the ensemble mean (black dashed line in Fig. 2). In the AMIP simulations, the diagram shows slightly less consistency, with an average correlation of 0.63 and even larger pattern amplitudes than in the CMIP models. These findings indicate that the general mean bias patterns persist in most models and suggest that uncertainties in mean bias patterns among contemporary CGCMs primarily arise from the atmospheric models.

Fig. 2

Taylor diagram of individual model vs. the ensemble mean \(\omega \) bias as reference for CMIP/AMIP simulations (see legend). Symbols are: individual CMIP/AMIP models (see list of numbers), and CMIP/AMIP ensemble mean (stars)

4 MSEB sensitivity analysis of the drivers of the mean biases

We now focus on the climate drivers that contribute to the mean state biases, using the MSEB model as a diagnostic tool. First, we demonstrate the capability and limitations of the MSEB model. We then show which elements of the MSEB model lead to biases in \(\omega \), which are subsequently discussed in more detail for the surface heat fluxes and the advection terms.

4.1 Assessment of the MSEB model’s performance

Figure 3 presents the tropical mean \(\omega \) as diagnosed by the MSEB model. The MSEB model captures the annual mean \(\omega \) distribution for the observed and simulated data fairly well, but it does reveal an upward motion bias in the subtropics and a downward motion anomaly in the deep tropics. A potential explanation for this globally upward motion offset issue could be the absence of the transient eddy flux of MSE in the MSEB model, as discussed in FD21.

Fig. 3

Mean \(\omega \) estimated by the MSEB model at 500 hPa in the historical period of (a) ERA5, (b) CMIP ensemble, (c) AMIP ensemble. Biases of mean state \(\omega \) estimated by the MSEB model from (d) CMIP ensemble minus ERA5, (e) AMIP ensemble minus ERA5. Positive (negative) values represent downward (upward) motion. Note (d) and (e) are shown without the tropical mean bias. The value on the top-right of each panel is the tropical mean value and the value on the top-left is the root-mean-square without considering the tropical mean value. The boxes are defined as regions for discussing different mean bias patterns in this study

The MSEB model also captures the mean bias patterns (Fig. 3d and e) with a moderate pattern correlation of 0.6 (with Fig. 1d and e, respectively). This suggests that the model does have some skill in representing these biases, but it also has some significant limitations (see the following discussion of Fig. 4). It captures several prominent CMIP bias patterns, including the double-ITCZ, the cold tongue bias over the Pacific, a southward shift of the ITCZ over the Atlantic, and a partial IOD-like bias over the IO (Fig. 3d). Concerning the estimation of the AMIP bias, the MSEB model captures the weakening of the double-ITCZ and the disappearance of the cold tongue bias over the Pacific, the absence of the southward displaced ITCZ bias over the Atlantic, and the weakening of the bias over the western IO (Fig. 3e). Nevertheless, there are some discrepancies between the CMIP/AMIP bias (Fig. 1d and e) and the MSEB-estimated CMIP/AMIP bias (Fig. 3d and e). Examples include the CMIP and AMIP bias over the west equatorial Pacific, the AMIP bias over the Atlantic, and the CMIP bias over the western and eastern IO.

Fig. 4

Taylor diagram of MSEB model estimates vs. the corresponding reference \(\omega \) from observations (green), CMIP (blue) and AMIP (red) simulations: (a) long-term annual mean, (b) long-term annual mean biases. Symbols are: ERA5 reanalysis (triangle), individual CMIP/AMIP runs (numbers; see Table 1), and CMIP/AMIP ensemble mean (stars)

Table 1 Comparison of the bias difference between CMIP (blue) and AMIP (red) and the bias difference between actual pattern and MSEB estimate

A Taylor diagram is utilized to quantify the MSEB model’s performance (Fig. 4). The MSEB model is able to capture the mean \(\omega \) based on CMIP and AMIP simulations with a correlation of 0.65 and 0.7 respectively, as well as reanalysis data with a correlation of 0.65. The pattern standard deviation ratio is slightly underestimated for the observations, whereas the estimates from the MSEB model are overestimated for both CMIP and AMIP simulations.
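For readers less familiar with Taylor diagrams, the sketch below computes the three statistics such a diagram displays for one estimate against one reference: the correlation, the standard deviation ratio, and the centred root-mean-square difference. It is a simplified, unweighted version on flattened samples, and the function name is hypothetical; the paper's own diagrams may use area weighting.

```python
import math


def taylor_statistics(estimate, reference):
    """Return (correlation, std ratio, centred RMS difference) for two equal-length samples."""
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    dev_e = [x - mean_e for x in estimate]
    dev_r = [y - mean_r for y in reference]
    std_e = math.sqrt(sum(d * d for d in dev_e) / n)
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    corr = sum(de * dr for de, dr in zip(dev_e, dev_r)) / (n * std_e * std_r)
    crmsd = math.sqrt(sum((de - dr) ** 2 for de, dr in zip(dev_e, dev_r)) / n)
    return corr, std_e / std_r, crmsd
```

A standard deviation ratio above 1 indicates that the estimate overstates the pattern amplitude of the reference, and a ratio below 1 indicates that it understates it. The three statistics are tied together by the relation crmsd² = std_e² + std_r² − 2·std_e·std_r·corr, which is what allows them to be drawn on a single diagram.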

The mean bias pattern in the CMIP or AMIP case is approximated by the MSEB model with spatial correlations around 0.45, and the pattern standard deviation ratio is overestimated by about 30%. This overestimation is rooted in the underestimation of \(\omega \) in the observational case and the overestimation of \(\omega \) in the CMIP/AMIP cases (Fig. 4a). In combination, the moderate correlation and amplitude overestimation suggest that the MSEB model has some capability in representing the overall biases in the model simulations, but it also has some significant limitations that will limit the outcomes of this study.

While the MSEB model is only moderately skillful at reproducing the tropical multi-model mean bias, it has better skill in representing the inter-model variation of the regional indices (see Fig. 5). For the CMIP cases, the MSEB model captures the mean biases of the four regional indices (TPI, EPI, SAI, and DPI) with correlations of approximately 0.66, 0.73, 0.76, and 0.85, respectively. For the AMIP cases, the corresponding correlations are about 0.56, 0.73, 0.43, and 0.9, respectively.

Fig. 5

Mean bias indices of \(\omega \) (x-axis) for CMIP (left column) and AMIP (right column) models vs. MSEB model estimates (see y-axis). All the indices are calculated based on the regions defined in our study (see boxes in Fig. 1). From top row to bottom row, respectively, Tripole index (TPI) for the average of north Pacific region and south Pacific region minus the average of equatorial Pacific region, equatorial Pacific index (EPI) for the average of equatorial Pacific region, South Atlantic index (SAI) for the Atlantic region, and dipole index (DPI) for east Indian Ocean region minus west Indian Ocean region. Each red circle represents a model value and the MSEB estimate. The red line represents the linear regression onto the 23 models. The triangle represents the average of data points. The legend shows numbers representing correlation between x- and y-axis, slope of regression line, and intercept of regression line, respectively

We now quantify the overall strength of the \(\omega \) mean (time-mean) bias in the tropical ocean regions in terms of an area-weighted horizontal root-mean-square (RMS) ratio, see Fig. 6a. The RMS ratio is computed by dividing the tropical ocean’s RMS of the \(\omega \) mean bias by the RMS value of the \(\omega \) mean. We use RMS instead of standard deviation for the denominator in order to indicate the bias strength relative to the mean background \(\omega \), which cannot be effectively highlighted using standard deviation. This measurement considers both upward and downward motion biases. We further assess the contribution of each contributory factor of the MSEB model to the \(\omega \) mean bias by computing the RMS value of the MSEB model including only the bias from the respective component in the model. All relative RMS values in Fig. 6a are the RMS values of the MSEB model only considering the bias of one forcing term (numerator), relative to the RMS value of the CMIP/AMIP models (denominator). For instance, the MSEB model estimate that only considers \({F}_{net}\) biases results in an RMS value of \(\omega \) that is about 40% of the RMS value of \(\omega \) in the CMIP models (red bar at \({F}_{net}\) in Fig. 6a).
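The RMS ratio described above can be sketched as follows. The cosine-of-latitude weights stand in for the area weighting, the optional mask restricts the calculation to ocean points, and whether the tropical mean is removed from the bias beforehand is left to the caller; the function names are hypothetical.

```python
import math


def weighted_rms(field, lats, mask=None):
    """cos(latitude)-weighted root-mean-square over unmasked grid points."""
    num = den = 0.0
    for i, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))
        for j, value in enumerate(field[i]):
            if mask is None or mask[i][j]:
                num += weight * value * value
                den += weight
    return math.sqrt(num / den)


def rms_ratio(bias, mean_state, lats, mask=None):
    """RMS of the omega mean bias divided by the RMS of the omega mean state."""
    return weighted_rms(bias, lats, mask) / weighted_rms(mean_state, lats, mask)
```

Using the RMS of the mean state in the denominator, rather than its standard deviation, keeps the contribution of the mean background \(\omega \) in the normalization, in line with the motivation given in the text.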

Fig. 6

Estimates of \(\omega \) bias for the 23 models for CMIP (colored bars) and AMIP (dashed lines) simulations. a Ensemble mean bias strength for the entire tropical oceans. b–e Bias correlation and slope of the regression line onto the 23 model biases for four regional indices. The first bars in each panel mark the CMIP and AMIP biases, the second bars the MSEB model estimates, and the following bars the MSEB model estimates for the sensitivities to individual contributory factors of the MSEB model (see x-axis labels). The colored frames indicate how the sub elements are combined in the MSEB model. The RMS ratio in a is positive definite; see text for more details. All correlation and regression results in b are based on data between the MSEB model estimates (the second bar) and individual sensitivity test results (see Fig. 5 and Figs. S1-S5 for the correlation analysis). The solid and empty triangles represent the slopes of the CMIP and AMIP cases, respectively (see Figs. S1-S5 for the regression analysis). Note that the slopes are only shown for sensitivity test cases

First, we note that the biases in the CMIP and AMIP simulations are of similar magnitude; with values of ~40%, they represent a significant change in the mean tropical circulation. Further, we note that the MSEB model overestimates the RMS, but its estimate is also similar for the CMIP and AMIP simulations. This will be discussed in more detail in the following sections.

4.2 Drivers of mean circulation biases

Figure 7 shows the CMIP and AMIP mean biases in the three major components of the MSEB (\({F}_{net}\), \(\text{Adv}\), and \({\text{GMS}}_{\text{B}}\)). CMIP models tend to simulate excessive \({F}_{net}\) on both sides of the equator in the Pacific, the south Atlantic, and the equatorial IO. Conversely, models display a tendency to simulate deficient \({F}_{net}\) over the equatorial Pacific and subtropical regions. The \({F}_{net}\) biases in the AMIP simulations are of similar magnitudes with substantially different patterns across the entire tropical oceans, and they are smaller for the averages within the boxes.

Fig. 7

Biases from the CMIP ensemble minus ERA5 (left column) and from the AMIP ensemble minus ERA5 (right column). From top row to bottom row, respectively, \({F}_{net}\), \(Adv\), and \({GMS}_{B}\). Positive (negative) values for \({F}_{net}\) and \(Adv\) indicate that the atmosphere gains (loses) energy. Positive (negative) values for \({GMS}_{B}\) indicate that the atmosphere gains (loses) stability

The \(\text{Adv}\) bias pattern of the CMIP models shows excessive \(\text{Adv}\) over the equatorial Pacific, the northern subtropical Pacific, and the SPCZ (Fig. 7c). In all other tropical ocean regions, models predict insufficient \(\text{Adv}\). Comparison with the pattern of \(\text{Adv}\) bias in AMIP simulations (Fig. 7d) shows that the \(\text{Adv}\) biases over the SPCZ and much of the subtropics are similar to those of the CMIP simulations, indicating that the biases in these regions are likely attributable to intrinsic errors in AGCMs.

Turning to the \({\text{GMS}}_{\text{B}}\) in CMIP and AMIP simulations, we find that the bias patterns in \({\text{GMS}}_{\text{B}}\) are quite different between the two kinds of simulations, with stronger magnitudes in CMIP than in AMIP simulations. This indicates that the majority of these biases in \({\text{GMS}}_{\text{B}}\) are due to errors in atmosphere–ocean interactions or ocean physics in CGCMs.

We assess the contribution of the biases in each of the three major components of the MSEB model to the \(\omega \) mean bias by only including the bias from the respective component in the MSEB model to approximate the \(\omega \) mean (see method Sect. 2d). The results are shown in Fig. 8. First, it becomes evident that \({F}_{net}\) and \(\text{Adv}\) biases are of similar strength and both exert more impact on \(\omega \) biases compared to \({\text{GMS}}_{\text{B}}\) biases. This suggests that biases in the \({\text{GMS}}_{\text{B}}\) are of minor importance.
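The one-term-at-a-time procedure described above can be sketched as follows. This is a minimal illustration and not the implementation of Sect. 2d: the diagnostic form \(\omega = -({F}_{net} + \text{Adv})/{\text{GMS}}_{\text{B}}\), the function names, and all numbers are assumptions chosen only to show the mechanics of swapping a single term from the reference (e.g. ERA5) to the model value.

```python
import numpy as np

def omega_mseb(f_net, adv, gms):
    """Assumed diagnostic form (illustrative only): positive omega = downward motion."""
    return -(f_net + adv) / gms

def single_term_sensitivity(ref, model):
    """Omega bias obtained when only one term is taken from the model."""
    omega_ref = omega_mseb(ref["f_net"], ref["adv"], ref["gms"])
    out = {}
    for name in ("f_net", "adv", "gms"):
        forcing = dict(ref)          # start from the reference terms
        forcing[name] = model[name]  # swap in the model value of one term only
        out[name] = omega_mseb(forcing["f_net"], forcing["adv"], forcing["gms"]) - omega_ref
    return out

# Synthetic two-point example (hypothetical values, not taken from the paper).
ref = {"f_net": np.array([40.0, -20.0]), "adv": np.array([-10.0, 5.0]),
       "gms": np.array([2.0, 2.0])}
model = {"f_net": np.array([50.0, -20.0]), "adv": np.array([-10.0, 1.0]),
         "gms": np.array([2.0, 2.5])}
contrib = single_term_sensitivity(ref, model)
```

In this sketch each entry of `contrib` isolates the effect of one biased term, so the three fields can be mapped separately in the manner of Fig. 8.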

Fig. 8

\(\omega \) resulting from MSEB model sensitivity experiments to different independent forcing terms in CMIP simulations (left column) and AMIP simulations (right column). From top row to bottom row, respectively, \({F}_{net}\), \(Adv\), and \({GMS}_{B}\). Positive (negative) values represent downward (upward) motion. The value on the top-right (top-left) of the figure is the bias tropical mean (root-mean-square without bias tropical mean). The detailed sensitivity method is described in Sect. 2d

The contributions of the \({F}_{net}\) bias are again fairly different in pattern between the CMIP and AMIP simulations (pattern correlation of 0.34). However, the contribution of the \(\text{Adv}\) bias to the \(\omega \) mean bias is similar in pattern between CMIP and AMIP simulations (pattern correlation of 0.7). Despite the different boundary conditions of the two sets of simulations, the high correlation for the \(\text{Adv}\) bias implies a persistent intrinsic error within atmospheric models.
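A pattern correlation of this kind can be computed as in the sketch below. The text does not state whether the correlation is centered or area-weighted, so the cos(latitude) weighting and the removal of the weighted mean are assumptions, and the grid and test field are synthetic.

```python
import numpy as np

def pattern_correlation(a, b, lat):
    """Centered, cos(latitude)-weighted correlation of two (lat, lon) fields."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(a)
    w = w / w.sum()
    a_anom = a - (w * a).sum()   # remove the weighted domain mean
    b_anom = b - (w * b).sum()
    cov = (w * a_anom * b_anom).sum()
    return cov / np.sqrt((w * a_anom**2).sum() * (w * b_anom**2).sum())

# Synthetic tropical grid and field (hypothetical, for illustration only).
lat = np.linspace(-30, 30, 13)
lon = np.linspace(0, 357.5, 144)
lon2d, lat2d = np.meshgrid(lon, lat)
field = np.sin(np.deg2rad(lon2d)) * np.cos(np.deg2rad(3 * lat2d))

r_same = pattern_correlation(field, 2.0 * field + 1.0, lat)  # same pattern, rescaled
r_flip = pattern_correlation(field, -field, lat)             # sign-reversed pattern
```

Because the measure is centered and normalized, it is insensitive to a uniform offset or amplitude scaling and reflects only the similarity of the spatial patterns.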

Figure 6a quantifies the overall strength of the \(\omega \) mean bias resulting from each contributing factor of the MSEB model in terms of relative RMS values. It again highlights that the \({F}_{net}\) and \(\text{Adv}\) terms are the two primary sources of the \(\omega \) mean bias in CMIP and AMIP simulations, whereas the \({\text{GMS}}_{\text{B}}\) bias has a weaker impact. Given the small impact of \({\text{GMS}}_{\text{B}}\) in this study, the following discussion focuses only on the \({F}_{net}\) and \(\text{Adv}\) terms.
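As a rough illustration of such summary statistics, the sketch below computes the tropical mean of a bias field and its RMS after removing that mean, as reported in the figure panels. The cos(latitude) weighting and the normalization of the relative RMS by the RMS of a total field are assumptions, since the exact definition of the relative RMS is not given in this section; the numbers are synthetic.

```python
import numpy as np

def mean_and_rms(field, lat):
    """Weighted domain mean and RMS of the field with that mean removed."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    w = w / w.sum()
    mean = (w * field).sum()
    rms = np.sqrt((w * (field - mean) ** 2).sum())
    return mean, rms

# Synthetic (lat, lon) bias field, hypothetical values only.
lat = np.array([-10.0, 0.0, 10.0])
bias = np.array([[1.0, 3.0], [1.0, 3.0], [1.0, 3.0]])
mean, rms = mean_and_rms(bias, lat)

# Assumed definition of a relative RMS: component RMS over total RMS.
total = 2.0 * bias
relative = rms / mean_and_rms(total, lat)[1]
```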

It is noteworthy that the MSEB model’s sensitivities yield similar outcomes for the CMIP and AMIP simulations in almost all sub-components of the model. This implies that the \(\omega \) mean bias in CGCMs is mainly rooted in intrinsic errors within AGCMs, but the coupling to the ocean does change the regional patterns, as we explore further in the following sections.

4.3 Heat flux biases

In the MSEB model, \({F}_{net}\) comprises \({F}_{top}\) and \({F}_{sur}\), each of which can be further broken down into additional components. As illustrated in Fig. 6a, the contribution of the \({F}_{net}\) bias to the \(\omega \) mean bias within the tropics is first subdivided into the contributions of the \({F}_{top}\) and \({F}_{sur}\) biases. This reveals that the \({F}_{sur}\) bias is the primary component driving the \(\omega \) mean bias. The \({F}_{sur}\) bias can be decomposed into contributions from \({\text{SW}}_{\text{sur}}\), \({\text{LW}}_{\text{sur}}\), \(\text{SH}\), and \(\text{LH}\) (as shown in Figs. S6 and S7), revealing that the \(\text{LH}\) and \({\text{SW}}_{\text{sur}}\) biases are the main contributors to the \(\omega \) mean bias in both CMIP and AMIP simulations. \(\text{LH}\) errors could be due to errors in surface winds, and \({\text{SW}}_{\text{sur}}\) errors are linked to errors in cloud cover. This again suggests that these biases are primarily intrinsic errors within AGCMs.
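The hierarchy of heat flux terms used in this section can be summarized as below. The sketch assumes that every term is signed so that positive values mean the atmosphere gains energy, following the convention stated for \({F}_{net}\) in the Fig. 7 caption; the sign conventions of the individual fluxes are set in the method section and are not restated here.

```latex
\begin{aligned}
F_{net} &= F_{top} + F_{sur},\\
F_{top} &= \text{SW}_{top} + \text{LW}_{top},\\
F_{sur} &= \text{SW}_{sur} + \text{LW}_{sur} + \text{SH} + \text{LH}.
\end{aligned}
```

Under this assumption the bias in \({F}_{net}\) is simply the sum of the biases in the six individual fluxes, which is what allows each flux to be tested separately in the sensitivity experiments.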

Figure 9a and c show the sensitivity of the CMIP \(\omega \) mean bias in the MSEB model to the biases in \(\text{LH}\) and \({\text{SW}}_{\text{sur}}\). The \({F}_{net}\) contribution to the \(\omega \) mean bias in CMIP simulations is related to both elements with similar strength in the deep tropics, but with somewhat different regional patterns. In the AMIP simulations, the \(\text{LH}\) term has a stronger impact than the \({\text{SW}}_{\text{sur}}\) term.

Fig. 9

\(\omega \) resulting from MSEB model sensitivity experiments to different independent forcing terms in CMIP simulations (left column) and AMIP simulations (right column). From top row to bottom row, respectively, \(LH\) and \({SW}_{sur}\). Positive (negative) values represent downward (upward) motion. The value on the top-right (top-left) of the figure is the bias tropical mean (root-mean-square without bias tropical mean). The detailed sensitivity method is described in Sect. 2d

According to Fig. 6a, both components of \({F}_{top}\), \({\text{SW}}_{\text{top}}\) and \({\text{LW}}_{\text{top}}\), contribute to the \(\omega \) mean bias. The similarity of these contributions between the CMIP and AMIP cases suggests that \({F}_{top}\) biases are present in both sets of simulations. The sensitivity of the CMIP and AMIP \(\omega \) mean bias in the MSEB model to \({\text{SW}}_{\text{top}}\) and \({\text{LW}}_{\text{top}}\) shows that both terms contribute to the \(\omega \) mean bias across most tropical ocean regions, except for the northern subtropical regions (Fig. 10a and c). Furthermore, \({\text{LW}}_{\text{top}}\) biases tend to enhance the tropical \(\omega \) mean bias patterns, whereas \({\text{SW}}_{\text{top}}\) biases tend to weaken them. This compensation could be due to biases in cloud cover that affect the two fluxes in opposite senses, and it helps explain why \({F}_{sur}\) biases exert a more substantial influence than \({F}_{top}\) biases.

Fig. 10

\(\omega \) resulting from MSEB model sensitivity experiments to different independent forcing terms in CMIP simulations (left column) and AMIP simulations (right column). From top row to bottom row, respectively, \({LW}_{top}\) and \({SW}_{top}\). The control forcing terms are based on ERA5. Positive (negative) values represent downward (upward) motion. The value on the top-right (top-left) of the figure is the bias tropical mean (root-mean-square without bias tropical mean). The detailed sensitivity method is described in Sect. 2d

4.4 MSE advection biases

The contributions of \(\text{Adv}\) bias to the \(\omega \) mean bias can be further broken down into horizontal wind, temperature gradient, and moisture gradient components. To begin, we deconstruct \(\text{Adv}\) into \({\text{Adv}}_{\text{T}}\) (\(-{<{\varvec{v}}\bullet \nabla {C}_{p}T>}_{{P}_{T}}\)) and \({\text{Adv}}_{\text{q}}\) (\(-{<{\varvec{v}}\bullet \nabla {L}_{v}q>}_{{P}_{T}}\)), see Fig. 6a. In general, both \({\text{Adv}}_{\text{q}}\) and \({\text{Adv}}_{\text{T}}\) biases contribute to the mean bias in CMIP and AMIP simulations, with \({\text{Adv}}_{\text{q}}\) bias playing a somewhat stronger role than \({\text{Adv}}_{\text{T}}\) bias.

Because the advection terms comprise horizontal winds as well as temperature and moisture gradients, we further decompose the sources of contribution to the \(\omega \) mean biases into terms related to biases in the wind field (denoted by \({\varvec{v}}\boldsymbol{^{\prime}}\)) and terms related to biases in the gradients (denoted by \(\nabla {L}_{v}q{\prime}\) and \(\nabla {C}_{p}T{\prime}\)). This results in four bias terms: \({\text{Adv}}_{\text{qq}}\) (\(-{<{\varvec{v}}\bullet \nabla {L}_{v}q{\prime}>}_{{P}_{T}}\)), \({\text{Adv}}_{\text{qv}}\) (\(-{<{\varvec{v}}\boldsymbol{^{\prime}}\bullet \nabla {L}_{v}q>}_{{P}_{T}}\)), \({\text{Adv}}_{\text{TT}}\) (\(-{<{\varvec{v}}\bullet \nabla {C}_{p}T{\prime}>}_{{P}_{T}}\)), and \({\text{Adv}}_{\text{Tv}}\) (\(-{<{\varvec{v}}\boldsymbol{^{\prime}}\bullet \nabla {C}_{p}T>}_{{P}_{T}}\)). For the purpose of this study, we omit the nonlinear term (the product of the wind bias and the gradient bias) because of its negligible contribution to the \(\omega \) mean biases. The individual bias terms for CMIP and AMIP are shown in Figs. S8 and S9.
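The algebra behind this split can be checked numerically, as in the sketch below for the moisture term. It assumes that the model fields equal the reference fields plus a bias, and it evaluates the advection at each grid point without the vertical integral; the variable names and random fields are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 5)  # hypothetical (lat, lon) grid

# Reference wind components and reference gradient of L_v * q (x and y parts).
u_ref, v_ref = rng.normal(size=shape), rng.normal(size=shape)
gx_ref, gy_ref = rng.normal(size=shape), rng.normal(size=shape)

# Biases (primed quantities), so that model = reference + bias.
u_b, v_b = 0.1 * rng.normal(size=shape), 0.1 * rng.normal(size=shape)
gx_b, gy_b = 0.1 * rng.normal(size=shape), 0.1 * rng.normal(size=shape)

def advection(u, v, gx, gy):
    """Horizontal advection -(v . grad) at each grid point (no vertical integral)."""
    return -(u * gx + v * gy)

total_bias = (advection(u_ref + u_b, v_ref + v_b, gx_ref + gx_b, gy_ref + gy_b)
              - advection(u_ref, v_ref, gx_ref, gy_ref))
adv_qq = advection(u_ref, v_ref, gx_b, gy_b)     # reference wind, biased gradient
adv_qv = advection(u_b, v_b, gx_ref, gy_ref)     # biased wind, reference gradient
nonlinear = advection(u_b, v_b, gx_b, gy_b)      # product of the two biases
```

The total advection bias equals the sum of the two linear terms and the nonlinear term exactly; the same identity holds for the temperature term with \({C}_{p}T\) in place of \({L}_{v}q\).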

The sensitivity experiments for the four CMIP advection bias terms indicate that upward motion biases in the north Pacific are driven by \({\text{Adv}}_{\text{qv}}\) biases, while biases in the south and equatorial Pacific region are due to the \({\text{Adv}}_{\text{qq}}\) biases (Figs. 11a, c, 12a, c). The majority of the downward motion biases stem from the \({\text{Adv}}_{\text{Tv}}\) biases, followed by \({\text{Adv}}_{\text{TT}}\) biases. The AMIP bias terms (Figs. 11b, d, 12b, d) are fairly similar to those of CMIP in most regional aspects, indicating that the bias contributions from the different advection terms are likely intrinsic errors of the atmospheric model.

Fig. 11

\(\omega \) resulting from MSEB model sensitivity experiments to different independent forcing terms in CMIP simulations (left column) and AMIP simulations (right column). From top row to bottom row, respectively, \({Adv}_{qv}\) and \({Adv}_{qq}\). Positive (negative) values represent downward (upward) motion. The value on the top-right (top-left) of the figure is the bias tropical mean (root-mean-square without bias tropical mean). The detailed sensitivity method is described in Sect. 2d

Fig. 12

\(\omega \) resulting from MSEB model sensitivity experiments to different independent forcing terms in CMIP simulations (left column) and AMIP simulations (right column). From top row to bottom row, respectively, \({Adv}_{Tv}\) and \({Adv}_{TT}\). The control forcing terms are based on ERA5. Positive (negative) values represent downward (upward) motion. The value on the top-right (top-left) of the figure is the bias tropical mean (root-mean-square without bias tropical mean). The detailed sensitivity method is described in Sect. 2d

5 MSEB sensitivity analysis of significant regional biases

We now focus on the four well-known mean bias patterns: the double-ITCZ, the cold tongue, the southward displaced ITCZ in the Atlantic, and the IOD-like bias region. Here we concentrate on the inter-model spread, because the MSEB model reproduces it with better skill, whereas it reproduces the multi-model mean with only moderate skill (Fig. 5).

For each index region, Fig. 6b–e presents two sets of statistics for the inter-model spread of the mean bias: the correlation between the CMIP/AMIP simulations and the MSEB estimates, and the correlation and linear regression slope between the MSEB estimate that includes the bias of a single contributing factor and the MSEB estimate that includes all bias factors. The correlation for the first bar in each panel indicates the MSEB model’s ability to capture inter-model regional biases in the CMIP and AMIP cases, with detailed results provided in Fig. 5. For all remaining bars except the second, the correlation indicates the influence of each individual factor on the mean bias variation among GCMs, while the slope represents the magnitude of its contribution. Detailed results of the regression analysis can be found in Figs. S1–S5. If both the correlation and the slope equal 1, the individual factor fully accounts for the inter-model mean bias spread of the MSEB estimate. The second bar is retained as a reference: it is the correlation of the MSEB estimate including all bias factors with itself and is therefore always 1. In the subsequent analysis, we focus on the factors with relatively high influence (high correlation) and contribution (large slope) to the mean bias.
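The two statistics can be illustrated with the sketch below. It assumes that the single-factor estimate is regressed on the all-factor estimate (so that a slope of 1 means the factor accounts for the full spread); this regression direction, the function name, and the per-model values are assumptions made for illustration.

```python
import numpy as np

def spread_stats(x, y):
    """Correlation and least-squares slope of y regressed on x across models."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.polyfit(x, y, 1)[0]
    return r, slope

# Synthetic per-model index values (hypothetical, not from the paper).
full_estimate = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # MSEB estimate, all bias factors
one_factor = 0.5 * full_estimate + 0.3                 # MSEB estimate, single factor

r, slope = spread_stats(full_estimate, one_factor)
r_self, slope_self = spread_stats(full_estimate, full_estimate)  # reference bar
```

In this synthetic case the single factor follows the full estimate perfectly (correlation of 1) but explains only half of its amplitude (slope of 0.5), which is the distinction between influence and contribution drawn in the text.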

A first observation is that CMIP and AMIP simulations have biases of different signs for several of the well-known mean bias patterns, as shown in Fig. 5. Further, the different terms of the MSEB model can lead to biases of opposing signs, as shown in Fig. 6b–e. In the following, we discuss each of the four mean bias patterns in turn.

5.1 Double-ITCZ bias region in the Pacific

The CMIP double-ITCZ bias in the Pacific (TPI index) is characterized by negative values (Fig. 5a), indicating enhanced upward motion in the off-equatorial regions and reduced upward motion in the equatorial regions, a key signature of the double-ITCZ bias (Fig. 1d). The AMIP simulations exhibit only a weak mean double-ITCZ bias, with individual models fluctuating around zero (Fig. 5b). This implies that the general double-ITCZ bias is largely attributed to strong atmosphere–ocean coupling errors. This conclusion differs from many previous studies (Lin 2007; Bellucci et al. 2010; Oueslati and Bellon 2015), which suggested that the double-ITCZ problem is an intrinsic error related to precipitation bias. The primary reason for this difference is that we consider the bias index over the entire equatorial Pacific (including the cold tongue bias region) rather than focusing solely on the bias on both flanks of the equatorial Pacific, as mentioned in previous studies.

The MSEB model can capture the CMIP and AMIP bias variations with correlation values of 0.66 and 0.56, respectively (Fig. 5a and b). However, it tends to underestimate the strength of the CMIP and AMIP biases by 69% and 34%, respectively (i.e., with slopes of 0.31 and 0.66 in the same figure). Additionally, for the AMIP simulations, the MSEB model estimates some biases with the wrong sign. This suggests that there is some uncertainty in the MSEB model estimates of the double-ITCZ bias in the Pacific. We will discuss some of these limitations in the AMIP analysis, but first we focus on the CMIP bias analysis.
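The percentages quoted above follow directly from the regression slopes, as the short sketch below shows; the helper name is hypothetical, and a negative result denotes an underestimate.

```python
def misestimate_percent(slope):
    """Signed percentage by which the estimate over- (+) or under- (-) states the bias."""
    return round((slope - 1.0) * 100.0)

# Slopes quoted in the text for the CMIP and AMIP double-ITCZ index.
values = {s: misestimate_percent(s) for s in (0.31, 0.66)}
# values == {0.31: -69, 0.66: -34}
```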

The analysis of the MSEB model sensitivities shows that a number of components contribute to the CMIP double-ITCZ bias. Starting with the heat flux forcing, we can see that the \({F}_{net}\) bias projects well onto the double-ITCZ bias (Figs. 6b and 8a). Here, the \({F}_{sur}\) bias contributes to the bias (Fig. 6b), mostly through the biases in \({\text{SW}}_{\text{sur}}\) (Figs. 6b, 9a and c). \(\text{LH}\) biases also contribute regionally in the index regions (Fig. 9a), but lead to compensating effects for the overall double-ITCZ bias. On the other hand, although the \({F}_{top}\) bias contributes little, it contains two opposing biases, with \({\text{LW}}_{\text{top}}\) enhancing the double-ITCZ bias and \({\text{SW}}_{\text{top}}\) acting against it (Figs. 6b, 10a and c). Previous studies have indicated that the \({\text{SW}}_{\text{sur}}\) bias (Lin 2007) and the \({\text{SW}}_{\text{top}}\) bias (Xiang et al. 2017) are much more influential for the double-ITCZ problem from the perspective of precipitation. This shortwave radiation bias has been linked to cloud effects (Lin 2007) and convection parameterization (Zhang and Wang 2006; Zhang et al. 2019). Given the compensating effects of \({\text{SW}}_{\text{sur}}\) and \({\text{SW}}_{\text{top}}\), which must both be linked to cloud cover biases, we regard the \({\text{LW}}_{\text{top}}\) biases as the main drivers of the CMIP double-ITCZ bias, with some potential contribution from \(\text{LH}\) biases if regional uncertainties in the MSEB model are considered.

The \(\text{Adv}\) biases are generally less influential for the CMIP double-ITCZ bias and contribute less to it (Fig. 6b). Locally, however, they contribute to large biases in the regions of the CMIP double-ITCZ bias (Figs. 11 and 12), with counteracting effects at different locations. For instance, the biases in \({\text{Adv}}_{\text{qv}}\), \({\text{Adv}}_{\text{qq}}\), \({\text{Adv}}_{\text{Tv}}\) and \({\text{Adv}}_{\text{TT}}\) all contribute significantly to the mean \(\omega \) bias in the CMIP simulations (Figs. 11a, c, 12a and c), but they either have opposing effects within the regions or overlap only partly with the index regions of the double-ITCZ bias.

The MSEB estimates for AMIP suggest a double-ITCZ bias of opposite sign, which is not present in the AMIP simulations. The main components contributing to the MSEB estimate of the AMIP bias are the \(\text{LH}\) and \({\text{Adv}}_{\text{qq}}\) terms (Figs. 6b, 9b and 11d). Both biases have patterns that project only partially onto the double-ITCZ bias and otherwise differ from it. This suggests that the MSEB model may be slightly off in the regional emphasis of these biases, leading to a somewhat misleading projection onto the double-ITCZ bias.

5.2 Cold tongue bias region in the Pacific

The CMIP cold tongue bias in the equatorial Pacific is manifested as a downward motion bias (Figs. 1d and 5c). In AMIP simulations, this bias is weak and nearly disappears in the mean (Fig. 5d). This suggests that the cold tongue bias results from atmosphere–ocean coupling errors. This conclusion aligns with previous studies that have focused on the perspective of precipitation bias (Lin 2007; Zheng et al. 2012; Li and Xie 2014).

The MSEB model can capture the CMIP and AMIP bias variations well, with correlation values of 0.73 (Fig. 5c and d). However, it tends to underestimate the CMIP downward motion bias by 36%, while overestimating the strength of the AMIP bias by 52% (i.e., with slopes of 0.64 and 1.52 in the same figure). The causes of misestimation in the MSEB model estimates still require further investigation, as potential uncertain components may have contributions that offset each other. Nevertheless, the following analysis based on the MSEB model sensitivities can still offer valuable qualitative insights.

As indicated by the sensitivity analysis, the \({F}_{net}\) bias in CMIP simulations is attributed mostly to \({F}_{sur}\) biases (Fig. 6c). Further decomposition of the \({F}_{sur}\) and \({F}_{top}\) biases suggests that the downward motion bias is mainly influenced by both \({\text{LW}}_{\text{top}}\) and \({\text{SW}}_{\text{sur}}\) (Figs. 6c, 9c and 10c). In contrast, \({\text{SW}}_{\text{top}}\) weakens the downward motion bias (Figs. 6c and 10a). Furthermore, the \(\text{LH}\) bias contributes to the downward motion bias in most of the region but leads to a strong upward motion bias in a small region near the warm pool (Fig. 9a). Given the cancellation between \({\text{SW}}_{\text{sur}}\) and \({\text{SW}}_{\text{top}}\) due to cloud effects, the analysis here qualitatively concludes that \({\text{LW}}_{\text{top}}\) biases are the main drivers of the CMIP cold tongue bias. Most previous discussion has focused on the contribution of the net shortwave radiation bias. We emphasize the importance of further investigating the longwave radiation bias rather than the shortwave radiation bias, because the shortwave radiation biases can cancel out, as noted for the double-ITCZ problem.

Besides the \({F}_{net}\) bias, the \(\text{Adv}\) bias is found to be another influential contribution to the cold tongue bias with a positive correlation (Fig. 6c). Although the multi-model mean \(\text{Adv}\) bias appears to be anticorrelated with the cold tongue bias, we find that this is due to an offset caused by a few CMIP model simulations. According to further sensitivity analysis, the \(\text{Adv}\) bias contribution seems to be mainly dominated by \({\text{Adv}}_{\text{T}}\), and secondarily by \({\text{Adv}}_{\text{q}}\). However, the subcomponents of both terms are not consistent with each other. Therefore, we suggest that the nonlinear effect between wind and temperature/moisture bias plays an important role in the cold tongue bias region.

5.3 Southward displaced ITCZ region in the Atlantic

The CMIP southward displaced ITCZ bias is observed as upward motion biases occurring in the south equatorial Atlantic (Figs. 1d and 5e). Conversely, in the AMIP simulations, the bias becomes quite weak (Fig. 5f). This implies that the displaced ITCZ bias results from strong atmosphere–ocean coupling errors in CGCMs. This finding differs from previous studies that have primarily focused on the perspective of precipitation bias, which suggested that the southward displaced ITCZ originates from intrinsic AGCM errors (Chang et al. 2007; Richter and Xie 2008; Wahl et al. 2011). This disagreement may result from differences in the selection of index regions and the specific variables under consideration.

A similar bias in CMIP simulations is also captured by the MSEB model with a correlation value of 0.76 (Fig. 5e). The \({F}_{net}\) bias is primarily attributed to the \({F}_{top}\) (Fig. 6d). The \({F}_{top}\) in CMIP simulations contributes to the central and eastern southward displaced ITCZ bias (Fig. 8a). Decomposition of the \({F}_{top}\) suggests that the upward motion biases over the Atlantic region are influenced mainly by \({\text{LW}}_{\text{top}}\) (Figs. 6d and 10c).

On the west side of the CMIP southward displaced ITCZ bias, we find that the bias results from the \(\text{Adv}\) (Fig. 8c), and its influence is similar to the \({F}_{net}\) in terms of the entire south equatorial Atlantic region (Fig. 6d). The source of the \(\text{Adv}\) on the west side is primarily due to \({\text{Adv}}_{\text{qv}}\) and \({\text{Adv}}_{\text{qq}}\) (Fig. 11a and c). Conversely, the decomposition of the \({\text{Adv}}_{\text{T}}\) shows an opposite contribution (Fig. 12a and c).

The AMIP bias is captured by the MSEB model with a lower correlation than the CMIP bias (correlation coefficient of 0.43). The MSEB model captures the general downward motion bias in the south equatorial Atlantic region but tends to exacerbate the bias pattern (compare Fig. 3e with Fig. 1e). The main component contributing to the MSEB estimate of the AMIP bias is \(\text{Adv}\), which includes contributions from \({\text{Adv}}_{\text{qv}}\) and \({\text{Adv}}_{\text{qq}}\) (Fig. 6d). The spatial contribution from \(\text{Adv}\) is not clear because signals cancel in the multi-model mean (Fig. 8d).

5.4 IOD-like bias region in the Indian Ocean

The CMIP IOD-like bias is characterized by upward motion biases occurring in the west equatorial Indian Ocean (IO) and downward motion bias in the east equatorial IO (Figs. 1d and 5g). This bias weakens slightly in AMIP simulations (Fig. 5h). This implies that the IOD-like bias is a significant intrinsic error in AGCMs. This finding aligns with previous studies that have primarily focused on the perspective of precipitation bias (Meehl et al. 2006; Bollasina and Ming 2013; Wang et al. 2018). Over this bias region, the MSEB model can successfully capture both CMIP and AMIP bias, with correlations of 0.85 and 0.90, respectively.

The IOD-like bias in the CMIP simulations results from \({F}_{net}\) (Figs. 6e and 8a). The \({F}_{net}\) biases are primarily attributed to \({\text{LW}}_{\text{top}}\) and \({\text{SW}}_{\text{sur}}\) (Figs. 9c and 10c). Conversely, \({\text{SW}}_{\text{top}}\) mitigates the upward motion biases (Fig. 10a). Given the opposing effects of \({\text{SW}}_{\text{sur}}\) and \({\text{SW}}_{\text{top}}\), we conclude that \({\text{LW}}_{\text{top}}\) is the primary heat flux driver of the bias.

In AMIP simulations, the \({\text{LW}}_{\text{top}}\) continues to have a clear impact on the IOD-like bias pattern but to a lesser extent than in the CMIP case (Fig. 10d). This result further supports the idea that the IOD-like bias is an intrinsic error within AGCMs.

6 Summary and discussion

This study applied the MSEB model to identify the sources of tropical mean circulation biases in coupled general circulation models. The tropical mean circulations were examined through combined moist energy forcing terms, including net energy heat flux and energy advection, with sensitivity determined by air column stability. The analysis of CMIP and AMIP simulations identified errors intrinsic to the atmospheric models or arising from atmosphere–ocean coupling. Our analysis focused on the tropical ocean region in general and four specific large-scale CMIP bias patterns.

Before we discuss the results for the tropical circulation biases, the MSEB model’s diagnostic limitations for such biases should be acknowledged. Its moderate skill, which is lower than for diagnosing the mean tropical circulation (Fan and Dommenget 2021) or tropical circulation changes under global warming (Fan and Dommenget 2023), may result from the complexity of tropical circulation biases and from limitations in the MSEB model itself. For instance, the source of a bias may be linked to land regions (Zhou and Xie 2017), which are not considered by the MSEB model in this research. Further studies are required to refine the MSEB model for accurate diagnosis of tropical circulation biases. Despite these limitations, the MSEB model can capture most of the large-scale biases in both CMIP and AMIP models, providing guidance for understanding tropical circulation biases.

From the perspective of the multi-model mean across the entire tropics, the primary drivers of tropical circulation biases in the MSEB model framework are the net energy heat flux and energy advection biases, whereas biases in the stability of the air column play only a minor role. The heat flux biases arise mainly from net longwave radiation at the top of the atmosphere and surface latent heat, and show similar amplitudes in CMIP and AMIP simulations but different patterns. This indicates that the biases are intrinsic to the atmospheric general circulation models but are substantially affected by the coupling with the ocean. There are also biases in net surface shortwave radiation and shortwave radiation at the top of the atmosphere, indicating that biases in cloud cover play an important role. However, these heat flux terms have compensating effects in the MSEB model, resulting in no significant net effect on the tropical circulation.

Energy advection biases, influenced by wind, moisture, and temperature gradients, also contribute to mean circulation biases with similar amplitudes and patterns in CMIP and AMIP simulations. This indicates that the biases are intrinsic to the atmospheric general circulation models.

Biases in net longwave radiation at the top of the atmosphere and in surface latent heat appear to be the most important drivers of tropical circulation biases. The net longwave radiation at the top of the atmosphere is usually affected by the cloud top effect, and the latent heat flux depends on SST and wind. At the same time, both terms are related to the hydrological cycle, which suggests a link between biases in precipitation or evaporation and biases in the large-scale circulation. The results would be consistent with the hypothesis that biases in the simulation of precipitation or evaporation lead to biases in the large-scale tropical circulation. However, since the large-scale tropical circulation and precipitation are inherently interconnected, causality cannot be established from this study. The results should nevertheless motivate further exploration of this hypothesis.

In this research, we consider not only the perspective of multi-model mean but also the perspective of inter-model variation for analyzing the four significant large-scale biases (double-ITCZ problem and cold tongue bias over the Pacific, southward shift of ITCZ over the Atlantic, and dipole bias over the Indian Ocean). Our analysis indicates that regional bias patterns are mainly attributed to coupling errors, except for the bias in the Indian Ocean, which is intrinsic but amplified due to coupling. The drivers include biases in net longwave radiation at the top of the atmosphere among the four mean bias patterns, as well as biases in wind and moisture gradients affecting moisture advection over the Atlantic.

Our findings are consistent with several previous studies but provide some new perspectives. In the Pacific, the surface heat flux bias has been identified as a contributing factor for the double-ITCZ precipitation bias (Xiang et al. 2017). We have found that this can be linked to the contribution from the surface shortwave radiation bias. However, further investigation suggests that the surface shortwave heat flux bias can be offset by the top shortwave heat flux bias, and that the net longwave radiation at the top of the atmosphere is instead critical to the bias. Regarding the cold tongue bias, the bias in net longwave radiation at the top of the atmosphere can be associated with the cold sea surface temperature (SST) bias through the suppression of clouds over the cold tongue bias region (Zheng et al. 2012; Li and Xie 2014). In the Atlantic region, the biases can be linked to the presence of tropical warm SST anomalies over the central and eastern Atlantic (Cvijanovic and Chiang 2013; Richter and Tokinaga 2020) and surface westerly wind anomalies (Chang et al. 2007; Richter and Xie 2008). However, for the bias in the Indian Ocean, our analysis does not show wind bias as an important contributor, in contrast to earlier work (Nagura and McPhaden 2014; Long et al. 2020). This discrepancy could be due to differences in the time scales on which the analyses are based. Overall, we have identified that net longwave radiation at the top of the atmosphere plays a significant role in all four mean bias patterns, which has not been previously highlighted in other studies.

This research demonstrated that the MSEB model is a simple tool to help narrow down the scope of understanding circulation mean state bias. The analysis results have provided some directions for improvements to the next generation general circulation model and enhanced the confidence in some theories proposed by previous studies. Overall, this research has offered a new perspective based on MSEB for large-scale tropical circulation mean bias studies.

It is worth noting that coupling errors may stem from SST climatology issues in coupled general circulation models rather than issues with atmosphere–ocean coupling implementation. To distinguish between these possibilities, further research can compare results with an atmospheric general circulation model experiment using coupled general circulation model SST as boundary conditions.