Compensating Errors in Cloud Radiative and Physical Properties over the Southern Ocean in the CMIP6 Climate Models

The Southern Ocean is covered by a large amount of clouds with high cloud albedo. However, as reported by previous climate model intercomparison projects, underestimated cloudiness and overestimated absorption of solar radiation (ASR) over the Southern Ocean lead to substantial biases in climate sensitivity. The present study revisits this long-standing issue and explores the uncertainty sources in the latest CMIP6 models. We employ 10-year satellite observations to evaluate cloud radiative effect (CRE) and cloud physical properties in five CMIP6 models that provide comprehensive output of cloud, radiation, and aerosol. The simulated longwave, shortwave, and net CRE at the top of atmosphere in CMIP6 are comparable with the CERES satellite observations. Total cloud fraction (CF) is also reasonably simulated in CMIP6, but the comparison of liquid cloud fraction (LCF) reveals marked biases in spatial pattern and seasonal variations. The discrepancies between the CMIP6 models and the MODIS satellite observations become even larger in other cloud macro- and micro-physical properties, including liquid water path (LWP), cloud optical depth (COD), and cloud effective radius, as well as aerosol optical depth (AOD). However, the large underestimation of both LWP and cloud effective radius (regional means ∼20% and 11%, respectively) results in relatively smaller bias in COD, and the impacts of the biases in COD and LCF also cancel out with each other, leaving CRE and ASR reasonably predicted in CMIP6. An error estimation framework is employed, and the different signs of the sensitivity errors and biases from CF and LWP corroborate the notions that there are compensating errors in the modeled shortwave CRE. Further correlation analyses of the geospatial patterns reveal that CF is the most relevant factor in determining CRE in observations, while the modeled CRE is too sensitive to LWP and COD. The relationships between cloud effective radius, LWP, and COD are also analyzed to explore the possible uncertainty sources in different models. Our study calls for more rigorous calibration of detailed cloud physical properties for future climate model development and climate projection.


Introduction
Clouds play a pivotal role in the global energy budget by reflecting solar radiation back to space, which is known as a cloud cooling effect (Schneider, 1972;Ramanathan et al., 1989), absorbing longwave radiation as a warming effect (Zhao and Garrett, 2015), as well as releasing latent heat to the atmosphere (Wang et al., 2014;Pan et al., 2020). Nearly 80% of the Southern Ocean (SO) region (defined as 40°-60°S) is covered by cloud (Dolinar et al., 2015) but has been underexplored due to lack of observations. Meanwhile, this area is far from any anthropogenic pollution sources but has abundant biogenic marine gases and aerosols (McCoy et al., 2020).
Cloud radiative effect (CRE) is largely determined by cloud physical properties. The previous model assessments showed that the poorly simulated regional radiation budget can lead to the biases in sea surface temperature and climate sensitivity (IPCC, 2014a, b;Bony et al., 2015) and also affect the simulations of atmospheric circulation and precipitation (Ceppi et al., 2012;Hwang and Frierson, 2013). The cloud radiative cooling effect at the top of atmosphere induced by a 15%-20% increase in low cloud coverage can neutralize the effect of doubling CO 2 concentrations (Slingo, 1990). Cloud fraction (CF) was underestimated by many global climate models, which results in a large bias in absorbed shortwave radiation (Trenberth and Fasullo, 2010;Dolinar et al., 2015). Moreover, cloud microphysical processes also closely linked with the radiation budget and the hydrological cycle, which efficiently determines regional and global climate feedbacks (Tan et al., 2015;Bjordal et al., 2020). Slingo (1990) found that the cooling effects of a 15% -20% decrease in cloud effective radius (r e ) or a 20%-35% increase in liquid water path (LWP) are equivalent to a 15% -20% increase in low cloud coverage. The cloud phase is also critical to determine CRE (Teng et al., 2020), especially for the mixed-phase clouds over the SO region (Bjordal et al., 2020). The low and optically thick clouds over the SO (Haynes et al., 2011) have a remarkable effect on the planetary albedo. The concurrent increases in LWP and cloud optical depth (COD) can cause negative shortwave CRE (Zelinka et al., 2012).
Cloud physical parameterizations are key knobs in tuning climate model simulations (Schiro, 2019;Wang et al., 2020). By implementing new boundary layer and convection schemes, researchers managed to improve CRE on the global scale in Post-CMIP5 simulations (Stanfield et al., 2015), most obviously in total CF and SW CRE over the SO. Therefore, a comprehensive examination of cloud physical properties is essential to understanding regional and global CREs, simulating climate mean states and variations, and quantifying climate feedbacks and sensitivities. It is a wellknown fact that the net radiative flux biases in the Coupled Model Intercomparison Project Phase 5 (CMIP5) are mainly caused by the SO clouds in the models (Bodas-Salcedo et al., 2014). Here, we expand this effort to the CMIP Phase 6 (CMIP6) models (Eyring et al., 2016). The performance of the latest models has been improved in terms of precipitation extremes (Luo et al., 2022), equilibrium climate sensitivity (Jiang et al., 2021), and climate extremes (Zhu et al., 2020). A recent study attributed the high effective climate sensitivity in the CMIP6 models to stronger positive cloud feedbacks from decreasing extratropical low cloud coverage and albedo mainly in the SO (Zelinka et al., 2020). Therefore, it is critical to quantify and understand the cloud and radiation biases in the CMIP6 models over the SO. To avoid degradation of model performance due to the complexity in air-sea interactions, this study focuses on the Atmospheric Mode Intercomparison Project (AMIP) experiment with prescribed but time-varying sea surface temperature, sea ice content, and other external forcings (Gates et al., 1999). It covers the time period from January 1979 through December 2014.
In this study, we evaluate CMIP6 models using 10-year satellite observations and focusing on atmospheric radiation fluxes, cloud radiative and physical properties such as CRE, CF, r e , LWP, and COD, as well as aerosol optical depth (AOD) over the SO. Since we aim to explore all relevant cloud and aerosol output of a CMIP6 model, only five CMIP6 models meet such a requirement and are analyzed in this study. We investigate their relationships and the underlying reasons for their biases. Detailed information of models and satellite products, including the participating CMIP6 models and satellite retrievals, are described in section 2. Evaluation results are shown in section 3. Synergistic results and further analyses on potential uncertainty sources in predicting CRE are discussed in section 4. Conclusions and discussions on future research are provided in section 5.

Observations
Observational datasets for the period from January 2003 to December 2012 are utilized in our study to evaluate the performance of CMIP6 models. Monthly-mean radiative fluxes from the Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) dataset are used for assessing top-of-atmosphere (TOA) cloud radiative properties, including the longwave cloud radiative effect (LW CRE), shortwave cloud radiative effect (SW CRE), and net cloud radiative effect (CRE). Furthermore, the Surface Data Ocean Fraction Coverage from CERES SYN1deg Products are also used as a filter in our study to avoid the inconsistency between land and ocean surface retrievals. Hence, the pixels with ocean fraction of more than 95% are chosen in this study. We also use retrieved Level-3 cloud physical properties from Moderate Resolution Imaging Spectroradiometer (MODIS) instruments onboard the Aqua satellite (MYD08_M3), including the CF, liquid cloud fraction (LCF), r e , LWP, and COD. The details of the MODIS dataset are shown in Table 1. Note that satellitederived LWP is an in-cloud property. To match up with the climate models' definition, which is averaged over the grid box, we multiply the satellite LWP by the satellite CF to get the grid-scale LWP for all the analyses below. To explore the relationship between aerosols and clouds over the SO, the Level-3 AOD product from MODIS is also analyzed in this study.
The uncertainty of satellite-derived variables needs to be emphasized, specifically for any model evaluation study. Based on prior research, the uncertainty of global monthly mean TOA fluxes can be separated into two parts. The difference of all-sky radiative fluxes can be 2.5-3 W m -2 , while the discrepancies of clear-sky fluxes can reach 5-6 W m -2

√
(1 + ρ)/(1 − ρ) and 4.5-5 W m -2 for SW and LW fluxes, respectively (Loeb et al., 2018a,b). Compared with in situ aircraft observations, the MODIS cloud retrieval product overestimates r e near cloud top by 26%-31% over the SO (Zhao et al., 2020). Kang et al. (2021) further suggested that r e is overestimated by satellite for non-or light precipitation and underestimated for heavy precipitation. Considering the possible existing uncertainty in satellite-derived variables, corresponding statistic strategies (Weatherhead et al., 1998;Shea et al., 2017) are used in this study. σ var is the standard deviation of the variable's regional, annual, or monthly mean time series. κ var is the autocorrelation time of natural variability. It is calculated as: κ var = , where ρ is the lag-1 autocorrelation. The uncertainty of MODIS-derived cloud physical properties used in this study is presented in Table 2.

CMIP6 models
Considering the availability of datasets within the CMIP6 project, five CMIP6 climate models (CESM2, GISS-E2-1-G, MPI-ESM-1-2-HAM, NorESM2-LM, and UKESM1-0-LL) participating in the Atmospheric Model Intercomparison Project (AMIP) are chosen. Some detailed information on these five models is listed in Table 3. The CMIP6 dataset is from "historical*" experiments. The radiative properties in five CMIP6 models are incoming shortwave radiation (rsdt), outgoing longwave radiation (rlut), outgoing shortwave radiation (rsut), upwelling clear-sky longwave radiation (rlutcs), and outgoing clear-sky shortwave radiation (rsutcs). The total CF for the whole atmospheric column is called clt, and the LCF (the mass of cloud liquid water) seen by MODIS, including the large-scale and convective clouds, is called clwmodis. The cloud physical properties in the models include LWP, r e , and COD. The parameters of LWP are calculated by clwvi minus clivi, which corresponds to column integrated condensed water minus that of ice. The variables of r e are named as reffclwtop, representing the effective radius of liquid droplets seen from satellite. Note that r e in GISS-E2-1-G is calculated from the averaged reffclws (the *r e is multiplied by a factor of 10, as the original r e is on the order of 1 μm. **r e values that are too large (greater than 25 μm) are excluded. effective radius of stratiform cloud liquid water) in multiple layers. The cod of CMIP6 models is the integral of scattering, absorption, and attenuation coefficients along the radiative path. The od550aer of CMIP6 models refers to the AOD from ambient aerosols. All available "r1i1p1f1" ensemble runs from three CMIP6 models (CESM2, MPI-ESM-1-2-HAM, and NorESM2-LM) are used in our study. Owing to the unavailability in the AMIP experiment, the "r1i1p3f1 " ensemble run from GISS-E2-1-G and the "r1i1p1f4" ensemble run from UKESM1-0-LL are used here. The time period of CMIP6 datasets used in this study is from January 2003 to December 2012, consistent with our satellite product time period.

Evaluation methods
In this study, the evaluations are developed using the following statistical methods. The regional averaged variables over the SO are calculated by Eq. (1): where X i is the ith value of the multiyear averaged corresponding variable and k is the total number of samples.
(2) X The regional root-mean-square Error (RMSE) is calculated from Eq. (2), where m is the sum of grid samples covering the SO and X i is the corresponding value of grid i. is the averaged value of the grid over the SO. (3) The correlation coefficient is used for evaluations in this study. In Eq. (3), X and Y represent the corresponding variables of observation and the CMIP6 models, respectively, is the covariance, and and denote the variance of X and Y separately. . (4) The normalization method used in this study is calculated by using the ratio of the ith variable and the maximum of values to show the temporal change of corresponding variables. Note that all the grid data used in this study is latitudeweighted.

Absorbed solar radiation
Absorbed solar radiation (ASR) is calculated as the downwelling minus upwelling shortwave radiation of all-sky conditions at the top of atmosphere. A previous study by Trenberth and Fasullo (2010) found that too large ASR over the SO was simulated by the CMIP3 models, showing a substantial bias of more than 32 W m -2 . The serious overestimation of ASR and underestimation of cloudiness over the SO led to poor model performance in simulating the energy budget in the Southern Hemisphere in climate models (Marchand et al., 2014). Here, we first compare CMIP6 multimodel mean ASR against the CERES satellite observations. In Fig.1a, the spatial distribution of the ASR biases can be characterized as an overall underestimation over a large fraction of the oceans and an overestimation in the high-latitude region (50°W-150°E, 53°-60°S). More importantly, the magnitudes of those biases are within the range of ±10 W m -2 and ±10% relative changes (Fig. 1b), showing an evident improvement of ASR in CMIP6 compared to CMIP3 and CMIP5. In Fig. 1c, all the correlation coefficients of multiyear mean ASR between CERES and individual CMIP6 models are larger than 0.95, indicating well-aligned spatial patterns between satellite and CMIP6 models. The RMSE of multiyear mean ASR for the CMIP6 models compared to CERES ranges from 4 W m -2 to 8 W m -2 , and such biases between simulations and observations are considerably smaller than previous results from CMIP3 (Trenberth and Fasullo, 2010).

Cloud radiative effect
Cloud radiative effect (CRE) is defined in this study as the difference of net incoming radiation between all-sky and clear-sky conditions at the top of atmosphere (TOA). The spatial distributions of LW CRE, SW CRE, and net CRE from the CMIP6 models and CERES are shown in Fig. 2. The spatial patterns of radiative properties in the CMIP6 models are generally consistent with CERES. However, the magnitudes are stronger over the deep ocean and relatively weaker on the near-shore regions. Considering the abundant low-level clouds over the SO, there is an obvious cooling effect over the southern Indian and Atlantic oceans. The relatively weaker cooling occurred in the high latitudes to the south of 55ºS (Figs. 2b and e). This can be attributed to the high surface reflectance by sea ice appearing in austral winter with relatively weak solar radiation. According to Figs. 2g and h, the warming effect of LW CRE is relatively underestimated by the CMIP6 models over a large fraction of the SO, but the spatial biases of SW CRE between CMIP6 models and CERES suggest that the cooling effect of SW CRE is apparently overestimated over the areas south of 50ºS and underestimated over the lower latitudes. The larger bias of SW CRE can also explain the similar pattern of net CRE discrepancy due to the stronger magnitude of the effect of SW CRE than theof LW counterpart (Fig. 2i). Generally speaking, the biases in net CRE are contributed by both SW and LW CRE in the lower latitudes, while SW CRE biases dominate over the high latitudes. The standard deviations between the five CMIP6 climate models are used to represent the inter-model spreads. The standard deviations in SW CRE and net CRE, particularly over the area to the south of the African continent (0°-25°E, 40°-47°S), reach more than 15 W m -2 , making the model biases less statistically significant in those regions.
To examine the fidelity of each CMIP6 model in a more quantitative manner, the Taylor diagrams (Taylor, 2001) for the CRE spatial distributions are shown in Fig. 3. LW CRE is found to have a larger bias than the SW counterpart. As we learn from Fig. 2g, the CMIP6 multimodel mean cannot capture the magnitude of LW CRE very well, with maximal underestimation of about 5 W m -2 . In Fig. 3a, the Taylor diagram shows that the correlation coefficient of the LW CRE spatial distribution is less than 0.6, with RMSE larger than 2 W m -2 . On the contrary, the simulated SW CRE and net CRE are better correlated with observations ( Figs. 3b and c). In particular, UKESM1-0-LL exhibits the best performance with the largest correlation coefficient and the smallest RMSE. CESM2 also shows good performance in SW and net CRE. The net CRE evaluation resembles that of SW CRE according to the three criteria in the Taylor dia- Fig. 2. Multimodel means and standard deviations of LW, SW, and net CRE from five CMIP6 models and from CERES. grams, implying that the net CRE is mainly regulated by the SW part.

Cloud fraction
Cloud macro-physical properties such as CF can play a dominant role in determining the radiative characteristics of cloud. Basically, increasing cloud coverage can reduce outgoing longwave radiation at the TOA and shortwave radiation reaching the surface. The CF spatial pattern over the SO in the CMIP6 models generally agrees with MODIS (Figs. 4a and b), with more cloud cover over the deep ocean regions. The differences between MODIS and the CMIP6 models ( Fig. 4c) suggest that there is less CF simulated in CMIP6 models, especially over the region near 40°S. The CF difference between MODIS and CMIP6 models can be caused by many factors, such as cloud overlap algorithms and the threshold assumptions for cloud formation (Jian et al., 2021). The spatial distribution of the spread of the five CMIP6 models can be seen in Fig. 4d. The model disagreement is larger over the southern Indian Ocean (0°-25°E, 40°-47°S), where a similar bias in net CRE simulation can be found (Fig. 2l). Specifically, according to the Taylor diagram (Fig. 4e), UKESM1-0-LL and CESM2 exhibit better performance in  simulating CF with higher spatial correlation and smaller RMSE against satellite observations. The underestimated CF in the models can also explain the lower modeled LW CRE compared to CERES observations. LWP and COD do not impact LW radiation significantly because cloud emissivity is close to 1 when LWP and COD are larger over the region, like over the SO.
LCF in the CMIP6 models and in MODIS is shown in Fig. 5. Because of unavailable LCF output from GISS-E2-1-G and MPI-ESM-1-2-HAM, only CESM2, NorESM2-LM, and UKESM1-0-LL are compared with MODIS. The observation shows that LCF (Fig. 5a) has a pattern similar to total CF (Fig. 4a), i.e., greater cloud fraction at higher latitudes. This pattern is also consistent with the distribution of net CRE shown in Fig. 2c, which has a stronger cooling effect over the high-latitude deep sea. Note that the net CRE over the region of 100°-180°E is weaker than other areas which is caused by the smaller liquid cloud fraction of this region. Compared to observations, the CMIP6 LCF bias is high in the area near 40°-45°S and low near 50°-60°S. Moreover, the inter-model spread is rather large over the area of 50°-100°E, 45°-55°S, which makes the model-observation comparison meaningless in this region.

Cloud liquid water path and cloud optical depth
The spatial patterns of cloud LWP in the CMIP6 multimodel mean and in MODIS are shown in Fig. 6. The MODIS observation reveals that the high-latitude LWP is distinctly larger than the LWP at lower latitudes (Fig. 6a), which is consistent with the patterns of observed CF ( Fig. 4a) and LCF (Fig. 5a). Compared to MODIS, the multimodel mean LWP in the CMIP6 models is underestimated by 19.9% (Figs. 6b and c). The spatial distribution of modeled LWP is characterized by high values in the middle of the SO between 40°-50°S, which is not comparable with observations. The standard deviations among the five CMIP6 models are generally large over the high latitudes (50°-60°S).
For the individual CMIP6 models, their spatial correlations with observations are rather diverse. There are even negative correlation coefficients for MPI-ESM-1-2-HAM and UKESM1-0-LL (Fig. 6e). In contrast, the LWP spatial patterns in CESM2, GISS-E2-1-G, and NorESM2-LM are reasonable. The RMSE magnitude also differs vastly among different models, ranging from a minimum error of 20 g m -2 for CESM2 to a maximum error of 40 g m -2 for UKESM1-0-LL. Note that a negative spatial correlation coefficient never occurs for other cloud macrophysics parameters like total CF and LCF in any model discussed here, implying a much larger challenge for those models to predict LWP.
The SO mean COD is comparable between the CMIP6 mean (15.2) and MODIS (14.9). The slight overestimation of COD in CMIP6 can compensate for the underestimation of CF in the radiation calculation, leaving the modeled mean CRE close to the satellite observation. However, regional biases in COD are much larger. Spatially, the biases appear as a general overestimation at lower latitudes and underestimation at higher latitudes. Meanwhile, the spatial patterns of satellite-observed COD, the multimodel mean COD, and the differences between those two (Figs. 7a, b, and c) are quite similar to the patterns of LWP (Fig. 6). It implies COD is largely controlled by LWP in this region. Of the individual CMIP6 models involved in this study (Fig. 7e), CESM2 is the only model showing reasonable COD performance, with a spatial correlation of 0.75 and a RMSE of 2.2. Both the spatial distributions and magnitudes of COD in the other four models exhibit marked biases compared to MODIS, with correlation coefficients smaller than 0.12 and RMSE larger than 3. In particular, the magnitude of COD in MPI-ESM-1-2-HAM is extremely low (generally  less than 0.4), which means its predicted COD has no physical meaning.

Cloud effective radius
Compared to MODIS r e , the CMIP6 models show poor performance in simulating r e over the SO. As seen in Fig. 8a, r e over the deep oceans is much larger in satellite observations than it is near the continent of South America. In contrast, the spatial pattern of the CMIP6 multimodel mean r e shows the opposite, i.e., larger r e is predicted over the near-shore regions (Fig. 8b). Moreover, the differences of r e climatology between MODIS and the CMIP6 models (Fig. 8c) show that the magnitude of r e over the SO is underestimated by about 11.8% by the CMIP6 models. The large biases in the deep oceanic regions are consistent among the models, as a high standard deviation of r e occurs only in the coastal regions near South America in the five CMIP6 models. The relationships between the five individual CMIP6 models and MODIS are shown in Fig. 8e. The RMSE of simulated r e in the five CMIP6 models compared to MODIS is between 0.8-2.6 μm, indicating stark differences in r e between the CMIP6 models and satellite observations.

Seasonal cycles of variables
The above evaluations focus on the climatology and spatial distributions of cloud properties. Here, we further evaluate the seasonal variations of CRE, the related cloud properties, and AOD in the five CMIP6 models. The simulated tenyear averaged seasonal cycles in SW and LW CRE in the five individual CMIP6 models exhibit good agreement with the ones from CERES (Figs. 9a and b). Both models and satel-lite observations show that SW CRE peaks in the austral summer, as the solar radiation reaches the maximum in the summer. Meanwhile, the CMIP6 models and CERES also agree that LW CRE peaks in the austral winter. However, there are significant differences between the CMIP6 models and MODIS in the temporal evolution of cloud physical properties and AOD throughout a year. The temporal changes in CF (Fig. 9c) between CMIP6 models and observations are relatively consistent. All models capture the peak in the austral winter, but two of them (GISS-E2-1-G and MPI-ESM-1-2-HAM) erroneously predict another peak in the austral summer that does not exist in the satellite observations. Different with CF seasonality, LCF reaches the maximum over the SO in the austral summer. Only UKESM1-0-LL can simulate such a pattern, while CESM2 and NorESM2-LM predict two LCF peaks in spring and fall.
Much larger disparities in the seasonal cycles can be identified for COD, r e , LWP, and AOD in comparing the CMIP6 models with MODIS. As shown in Figs. 9e, f, and g, the temporal changes in COD, r e , and LWP from MODIS share a similar pattern, i.e., an increase in austral winter and a decrease in austral summer. However, the modeled LWP seasonality is opposite to that pattern. Therefore, even though the modeled r e generally agrees with the observed seasonal pattern, the seasonality of COD is not comparable between models and MODIS. To be more specific, MPI-ESM-1-2-HAM, NorESM1-LM, and CESM2 simulate the oppositive trends, and UKESM1-0-LL cannot actually represent the obvious change of corresponding variables. Fig. 8. Same as Fig. 4, except for liquid cloud effective radius (r e ).

Exploration of uncertainty sources 4.1. CRE error estimation
To explore the possible influence of cloud physical properties on the net CRE, the correlation coefficients of spatial distributions between individual cloud physical properties and net CRE are calculated from three sources: the satellite observation, the CMIP6 multimodel mean, and the differences between models and observations (Fig. 10). Generally, all cloud physical properties examined here are positively correlated with SW CRE. For CF, the observations show a high correlation coefficient of about 0.6, while the modeled climatological means and biases do not show strong relationships between CF and SW CRE. In contrast, the models all agree well with observations on the strong positive relationship between LCF and SW CRE. For COD and LWP, the observations only show a moderate positive correlation, with coefficients no larger than 0.3, while much stronger correlations between COD/LWP and SW CRE are found in the CMIP6 models. It indicates that the radiation simulations of those CMIP6 models over the SO are too sensitive to LWP and COD. The positive correlation between CRE and r e is unexpected to some extent, and it implies that some other co- Fig. 9. Annual cycles of normalized cloud radiative effect and cloud physical properties in five CMIP6 models and satellite observations. Note that the normalization is calculated as the monthly-mean value divided by the maximum value throughout a year.
varying factors dictate such a relationship.
The relative importance of each factor in contributing to SW CRE can be partly revealed by comparing the correlation coefficients across all parameters in their relationship with SW CRE. In observations, CF and LCF exhibit the largest correlation, indicating highest relevance to SW CRE. However, for the CMIP6 models, COD, LWP, and r e are the more relevant variables compared with CF and LCF, corroborating the notion that the SW CRE controlling factors are quite different between the models and observations. For the biases in the modeled SW CRE, the potential largest contributors include LCF, LWP, and COD, as their biases resemble those of SW CRE well in space with high correlation coefficients. Furthermore, AOD-SW CRE relationships are found to be positive in both observations and models. However, the biases in the modeled SW CRE cannot be explained by those in the aerosol field, as there is a small correlation coefficient between aerosol and SW CRE biases.
The two key parameters in controlling CRE are CF and LWP. To quantitatively estimate their joint and relative contri- butions to CRE errors, we adopt a multivariate linear regression model to link CRE with CF and LWP. The regression slopes of CREs versus CF ( ) and LWP ( ) can be derived from Eq. (5), where CRE represents the LW, SW, or net CRE. r is the residual of the corresponding regression. Following Dolinar et al. (2015), the CRE sensitivity to CF or LWP ( ), the CRE errors from CF or LWP bias ( ), the co-variations ( ), and the total errors ( ) are computed as follows: Fig. 10. Correlation coefficients of spatial distributions between SW CRE and cloud physical properties or AOD in satellite products, multimodel mean, and the model biases (model minus observation). ε total =ε sen, CF + ε sen, LWP + ε bias,CF + ε bias, LWP + ε cov, CF + ε cov, LWP , where X is CF or LWP, the subscript obs means the observation, and m represents the simulated value.
ε sen ε bias ε cov ε total Four errors ( , , , ) for SW, LW, and net CRE are shown in Table 4. The SW CRE sensitivity to CF and LWP can reach 187.6 W m -2 and -104.9 W m -2 , respectively. The different signs of the sensitivity errors and biases from CF and LWP once again corroborate the notions that there are compensating errors in the modeled SW CRE calculation, and the cancellation of those errors result in smaller error in CRE. Similar with SW CRE, the net CRE is sensitive to CF and LWP with sensitivity magnitudes of 209.2 W m -2 and -106.4 W m -2 , respectively. LW CRE sensitivities to CF or LWP, the biases in CF and LWP, and the co-variations are much smaller than the ones in SW and net CRE, indicating that simulated SW sensitivity contributes most to the total errors. With larger magnitudes in all four errors, CF generally exhibits a larger influence on CRE in comparison with LWP.

Relationship between COD, LWP, and r e
As is discussed in section 3, the modeled spatial distributions of LWP, COD, and r e have different characteristics compared to the MODIS observations. Even though LWP and COD in the CMIP6 models share similar spatial patterns (Figs. 6 and 7), that of r e is largely different from them (Fig. 8). This motivates us to explore how COD is calculated in each model. Employing the canonical formula of the COD calculation in Eq. (10), we compare COD provided by the CMIP6 models with those calculated as a function of LWP and r e . Figure 11 uses scatter density maps to show the relationship between model-simulated COD and calculated COD based on model output and MODIS retrieved and calculated COD. In Fig. 11a, the correlation between MODIS retrieved and calculated COD reaches 0.9, with RMSE of 1.7. As is shown in Figs. 11b and c, the simulated COD values in CESM2 and UKESM1-0-LL have better consistency with the predicted values. In particular, the correlation coefficient R in CESM2 can reach 0.99. In the comparison of CRE (Figs. 2 and 3), the better inner relationship among the physical properties in CESM2 and UKESM1-0-LL can determine the simulation appearance of CRE. However, the simulated COD and calculated COD in GISS-E2-1-G, MPI_ESM_1-2_HAM, and NorESM2-LM shows weaker correlations, with a correlation coefficient R of 0. 53,0.63,and 0.45,. Specifically, MPI_ESM_1-2_HAM tends to simulate smaller COD than calculated COD. Such a discrepancy can be explained by the "reduction factor" introduced in the radiation scheme of the MPI model (Mauritsen et al., 2019). Its purpose is to account for the cloud hetero- Fig. 11. Scatter density plots of COD calculated offline by LWP and r e versus COD taken directly from model output or satellite product. genicity within a grid cell, but it is poorly constrained. The disagreement between calculated and online simulated COD may stem from the r e choice in the calculation based on the model output, as mentioned in section 2.

Plausible influence from aerosols
As a possible source of bias contributing to the modeled cloud radiative effect and physical properties in the CMIP6 models over the SO, aerosol fields in the CMIP6 models are evaluated with satellite products. Here, we focus on AOD, which is output by all models. Among the five CMIP6 models, the GISS model is an outlier, as its AOD values are about six times the MODIS observations. The other four models only differ from MODIS by 0.04 (Figs. 12a and b). However, the spatial correlations of AOD between the other four CMIP6 models and MODIS are rather poor, with coefficients smaller than 0.2. It implies the models have difficulty in predicting the sources of aerosols of the SO and related transport processes. Interestingly, even with the largely biased AOD magnitude, the GISS model shows good agreement with MODIS on the spatial pattern of AOD (Fig. 12c), showing much larger AOD in the Atlantic and Indian oceans of the SO than in the Pacific Ocean. The high AOD over the southern Pacific near 150°W is also captured by the GISS model. Note that the satellite-retrieved AOD is also subject to large uncertainty over the SO, as the highly frequent clouds make it difficult for the instruments to distinguish aerosol from cloud. Overall, a poor ability of aerosol simulation in the CMIP6 models is identified, and the biases can be propagated to simulating cloud physical properties over the SO.
The relationships between r e and AOD from MODIS and the five individual CMIP6 models are shown in Fig. 13. From Fig. 13a, the satellite-derived r e is dominated by cloud droplet radii between 13-15 μm, with AOD ranging from 0.04 to 0.19. The correlation between MODIS AOD and MODIS r e is 0.05, indicating that the signal of aerosol-cloud interaction is not strong in this long-time averaged spatial pattern of r e over the SO, and there should be some other more relevant factors. Three models predict a positive correlation between AOD and r e , while NorESM2-LM and GISS-E2-1-G simulate a negative correlation (Figs. 13d and e). The weak relationship in the observation and diverse relationships in the model simulations reveal that it is not a feasible way to identify aerosol-cloud interactions by simply correlating AOD with cloud properties through their spatial patterns.

Conclusion and discussions
Global climate models (GCMs) have been widely reported to exhibit too much absorbed solar radiation (ASR) over the Southern Hemisphere, especially over the Southern Ocean (SO), with a mean bias more than 30 W m -2 . Such a bias further leads to substantial uncertainty in simulating atmospheric circulations and storm activities in GCMs. Using CERES satellite observations as the benchmark, we show that the bias of ASR over the SO simulated in the CMIP6 models is reduced (within the range of ±10 W m -2 ) compared to the previous CMIP models. Over a large fraction of the SO, even an underestimation of ASR occurs. To understand the improvement of ASR simulations from CMIP3 to CMIP6, this study evaluates the performances of cloud characteristics in the five climate models (CESM2, GISS-E2-1-G, MPI-ESM-1-2-HAM, NorESM2-LM, and UKESM1-0-LL) participating in CMIP6. We use 10-year MODIS and CERES satellite observations to evaluate the cloud radiative effect and cloud physical properties in those CMIP6 models over the SO. The key findings are summarized as follows: Fig. 12. Spatial distribution of multiyear mean aerosol optical depth (AOD) over the SO from CMIP6 models and MODIS (a-c). Taylor Diagram of multiyear AOD spatial average of CMIP6 to MODIS (d). The axis is the standard deviation of CMIP6 simulated AOD and MODIS AOD. The RMSE of difference between CMIP6 and MODIS is represented as green dotted line. The arc refers the correlation coefficient between the four CMIP6 models and observation.
a) The ASR simulations in the CMIP6 models have been largely improved since CMIP3, and the simulations of cloud radiative effect in the five CMIP6 models match well with satellite observations. Especially for SW CRE, the correlations between the CMIP6 models and CERES are close to 0.9. Due to the significant cloud cooling effect over the SO, the net CRE is dominated by SW CRE. Overall, among the five CMIP6 models, CESM2 and UKESM1-0-LL exhibit better performances in simulating the regional radiation balance and components. b) Compared to MODIS, the CMIP6 models have similar spatial patterns of CF but with underestimated magnitudes over the SO. Similar with the CRE comparisons, CESM2 and UKESM1-0-LL also have better correlations with MODIS CF than the other three CMIP6 models. However, there are noticeable biases of LCF between the CMIP6 models and MODIS observation, with an overestimation at lower latitudes and underestimation at higher-latitude areas. Moreover, the discrepancies of cloud physical properties (LWP, COD, and r e ) between the CMIP6 models and MODIS appear even larger. The CMIP6 models fail to capture the spatial characters of cloud LWP and COD. Also, the magnitudes of LWP and COD simulated in certain CMIP6 models depart far away from the MODIS observations. Those identified biases are all larger than the estimated uncertainty in the satellite products. c) When comparing the spatial relationships between SW CRE and six cloud properties, we find that the simulated SW CRE in the CMIP6 models over the SO is too sensitive to the LWP and COD, while the observations show the SW CRE is mainly controlled by CF and LCF. An error estimation for CRE reveals that there are compensating errors in the modeled CF and LWP, and the cancellation of those errors result in smaller net error in CRE.
d) The simulated COD in CESM2 and UKESM1-0-LL are well correlated with the calculated COD based on LWP and r e . It is important to note that these two models also outperform the rest in the comparison of cloud radiative effect. It implies that the inner relationship among the physical properties in each CMIP6 model can impact the simulation of cloud radiative properties. e) Detailed analyses to explain the biases of cloud physical properties between satellite observation and the CMIP6 models are performed in this study. Compared to MODIS, the AOD is overestimated in the CMIP6 models, with lower correlation. The inconsistency of AOD simulated in the CMIP6 models can be responsible for the weaker performance of simulating cloud physical properties over the SO. Especially for GISS-E2-1-G, the magnitude of AOD is way too large, which could be another uncertainty source for cloud simulations over the SO. However, in both satellite observation and the five CMIP6 models, there are no strong relationships between the r e and AOD over the SO.
Data availability. All the CMIP6 model outputs used for this research can be downloaded from website at https://esgf-node. llnl.gov/search/cmip6/. The CERES observations used in this study were obtained from the NASA Langley Research Center