1 Introduction

The Intergovernmental Panel on Climate Change (IPCC) continues to recognize that simulations of clouds and their feedbacks are one of the largest uncertainties in today’s climate models (IPCC AR5 2013, http://www.ipcc.ch/report/ar5/wg1/). Due to their complex interactions with incoming solar (shortwave, SW) radiation and emitted terrestrial (longwave, LW) radiation, clouds induce both warming and cooling effects on the Earth system (atmosphere and surface). For simplicity, these effects are commonly estimated at the top of the atmosphere (TOA) and the surface and depend greatly upon the fraction of sky covered by clouds, cloud height, and cloud microphysical properties (Wielicki et al. 1996). Zhang et al. (2005) compared global climate model (GCM) simulated clouds with data from the NASA Clouds and the Earth’s Radiant Energy System (CERES) experiment (Wielicki et al. 1996, 1998) and the International Satellite Cloud Climatology Project (ISCCP) and found that most GCMs underestimated mid-latitude total cloud fractions (CF) but overestimated their optical depth. The underestimated CF and overestimated cloud optical depth in the models will tend to offset each other in calculating TOA radiation budgets.

For the last three decades clouds have been acknowledged to be one of the largest modifiers of the global climate system (Cess et al. 1990; Senior and Mitchell 1993; Zhang et al. 2005; Bony et al. 2006; Jiang et al. 2012; IPCC 2001; Yao and Del Genio 2002; Su et al. 2013b; Stanfield et al. 2014). The Coupled Model Intercomparison Project Phase 5 (CMIP5) was implemented in response to these realities as a follow-up to earlier phases (e.g., CMIP3) to make available contemporary climate simulations from several participating modeling centers. CMIP5 inherently takes on the challenge of understanding the discrepancies among similarly forced models and their simulations of clouds (Taylor et al. 2012). Recent studies have investigated the progress of simulated clouds and their corresponding radiative forcings between CMIP5 and its predecessor, CMIP3. Although many improvements have been made in CMIP5 (Lauer and Hamilton 2012; Jiang et al. 2012; Wang and Su 2013; Li et al. 2013; Klein et al. 2013; Chen et al. 2013), clouds and their feedbacks are still problematic in climate models, as concluded in Chapter 9 of the IPCC AR5 (2013).

Cloud vertical distribution and overlap are among the major uncertainties in determining the heating/cooling profile by radiative and precipitable/evaporative processes (e.g., Stephens and Webster 1984; Morcrette and Jakob 2000; Stephens et al. 2002). The spread of climate warming predictions by multimodel ensembles is, arguably, a result of an oversimplification of cloud vertical distributions and their overlap assumptions (Stephens et al. 2002). Simulated cloud vertical distributions under the random overlap assumption show better agreement with observations than those under other assumptions (maximum, minimum, and maximum-random). Nevertheless, the maximum-random overlap assumption has been used by most GCMs (e.g., Hogan and Illingworth 2000; Collins et al. 2001).

Lauer and Hamilton (2012) have revealed that the model simulated cloud radiative forcings (CRFs) tend to outperform CF results, suggesting that models are not accurately depicting fundamental cloud processes; rather, the models are being tuned to provide simulations that converge to observations. However, model developers cannot tune all parameters to match observations. Jiang et al. (2012) developed a grading scale in an attempt to rate each model based upon spatial mean, standard deviation, and correlation of combined clouds and water vapor fields. Furthermore, they highlighted that there exists large model spread and a high degree of discrepancy from observations, particularly in the upper troposphere. In many instances, when evaluating a multimodel ensemble, absolute model error is rather small. However, when evaluating models independently, the spread of the model results is fairly large because most GCMs used different cloud parameterization and radiation schemes, offering more complexity to these types of validation studies.

A recent study by Su et al. (2013b) suggests that most errors in model-simulated clouds are due to cloud parameterization errors, rather than the large-scale dynamics. Jiang et al. (2012) and Li et al. (2012) both reported that improvements were made in the representation of ice clouds from CMIP3 to CMIP5, a result of improvements in cloud parameterization such as the use of double-moment cloud microphysical schemes and two separate prognostic equations for ice and liquid clouds in some of the models. Through their analysis, a 50 % reduction in error was apparent in the multimodel mean bias and root mean squared error (RMSE). Wang and Su (2013) investigated the relationships between CRFs and vertical velocities of the atmosphere in 12 uncoupled CMIP5 simulations and concluded that the overestimated net CRF (strong cooling) was primarily a result of overestimated SW CRF cooling and underestimated LW CRF warming. They further analyzed these models’ results based upon vertically driven dynamic regimes in an attempt to quantify biases within upwelling and downwelling (at 500 hPa) regions of the tropical oceans. In this study, through an integrative analysis of multimodel ensemble means and NASA satellite observations, we investigate modeled and observed clouds and CRFs, as well as model performance in atmospheric upwelling and downwelling regions over the oceans (45°S–45°N).

By applying a thorough error analysis technique, Zhu et al. (2007) quantified errors from cloud amount and cloud condensate within the response to doubled CO2 and 2 K sea surface temperature (SST) perturbations. They adopted the multiple linear regression (MLR) method, citing its statistical significance, as an appropriate technique for identifying cloud feedbacks in climate sensitivity experiments. By applying this approach, they also quantified the errors in radiation feedbacks due to CF and cloud condensate, and characterized their corresponding implications for climate simulations. We employ a similar technique to quantify the effects of errors in simulated CF and cloud water path (CWP) on CRF simulations.

The general objective of this study is to determine the overall representation of clouds and their corresponding radiative forcings in uncoupled CMIP5 models, while inherently providing the practical foresight and motivation for climate model advancement. We separate our analysis into atmospheric upwelling and downwelling regimes based upon 500 hPa vertical velocities, considering that different cloud types, such as deep convective clouds associated with upwelling regions and stratiform clouds in downwelling regions, have different cloud radiative effects (Su et al. 2008a, b; Bony et al. 2004). Also, climate models have shown significantly different behaviors and a relatively large degree of multimodel spread in these two regimes (Su et al. 2013a). Cloud parameterization schemes oftentimes use in-cloud vertical velocities to distinguish convective and stratiform-type cloud predictions. By identifying systematic details between clouds and TOA radiation in both the observations and GCMs in these regimes, more suitable constraints may be applied to parameterizations. Considering the sensitivity between clouds and their radiative forcings in different dynamic regimes will provide insight as to whether errors from clouds or their microphysical properties (i.e., CWP) are contributing more to the overall CRF bias. Again, a quantitative assessment of these complex variables will support the improvement of climate models regarding the physical representation of clouds and their radiative feedbacks. This study evaluates the output from 28 uncoupled CMIP5 models together with satellite observations from CERES–MODIS, ISCCP, and CloudSat/CALIPSO, as well as contemporary reanalysis data. Through an integrative analysis of the output of 28 CMIP5 GCMs and multiple satellite observations, we quantitatively assess the strengths and weaknesses in current climate simulations and provide information useful for model improvements.

Following the introduction, Sect. 2 highlights the methodology detailing the CMIP5 models and observational datasets used in this study. Section 3 presents the modeled and observed CF, CWP, TOA radiation budgets and CRFs. Section 4 investigates the sensitivity characteristics between CRFs and CF/CWP in the upwelling and downwelling regimes, while Sect. 5 summarizes the overall ability of the models to simulate clouds and TOA CRFs in these regimes. Section 6 provides a detailed analysis into the quantification of CRF biases, and finally, Sect. 7 provides general conclusions and suggestions for further analyses.

2 Data and methodology

2.1 Global climate models (GCMs)

In this study, we analyze the output from 28 models submitted to the CMIP5 as part of the Atmospheric Model Intercomparison Project (AMIP), which are available from the Earth System Grid Federation (ESGF) through the Program for Climate Model Diagnosis and Intercomparison (PCMDI). These models are available at http://pcmdi9.llnl.gov/esgf-web-fe/, and their associated center name, model name, and horizontal and vertical grid spacing are summarized in Table 1. These models were chosen based upon data availability at the onset of the study. The AMIP output is designed for historical climate simulations (1979–2008), while the NASA observations used in this study begin in March 2000. Therefore, the period between March 2000 and February 2008 is selected. The AMIP models are uncoupled models with climatologically prescribed SST and sea ice observations. For cloud simulations, the AMIP models are comparable with their coupled counterparts, which are linked to fully dynamic ocean models, although both versions have their own biases (Lauer and Hamilton 2012). Therefore it is suggested that the issues related to clouds and CRF uncertainties are not originating from discrepancies in the representation of SST fields, but rather the cloud simulations themselves, such as in convective and boundary layer cloud parameterizations. A study by Li et al. (2012) has found that the uncoupled models (prescribed SSTs) can produce a more accurate depiction of the field if there are no extreme events. Therefore, we focus on the AMIP runs and evaluate their simulated clouds and radiation budgets for this study.

Table 1 CMIP5 AMIP-type models evaluated in this study

Although the physics schemes are different for each model, facilitating an intercomparison encourages the application of well-performing parameterization schemes. Li et al. (2012) provided cloud physics schemes for the models in their study, of which 11 are used in this study. All modeled and observed results are averaged and interpolated to a standard 2.0° × 2.5° (latitude × longitude) grid for a side-by-side comparison and evaluation. Furthermore, we restrict the global evaluations to 65°S–65°N to avoid the large apparent multimodel spread in the polar regions (on average ~42 % with a maximum of ~60 %, not shown). We will henceforth refer to global values as those from 65°S to 65°N. We must also recognize the intrinsic disadvantage of the monthly mean gridded data used in this study. Lin et al. (2010) found that different cloud types are likely mixed together at longer time scales and at relatively coarse, non-cloud-resolving spatial grids. This creates additional uncertainties in model-observation comparisons.
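The interpolation step can be illustrated with a minimal sketch. The text does not state which interpolation method was used, so the separable bilinear scheme below is an assumption, as are the function name `regrid_bilinear`, the cell-center coordinates of the 2.0° × 2.5° target grid, and the 1° × 1° source grid. Longitude wrap-around is ignored for brevity.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto a new regular grid.

    Assumes src_lat and src_lon are strictly increasing and that all target
    points lie inside the source range (np.interp clamps values outside it).
    """
    field = np.asarray(field, dtype=float)
    # Step 1: interpolate every source latitude row onto the target longitudes.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate every target-longitude column onto the target latitudes.
    return np.array(
        [np.interp(dst_lat, src_lat, col) for col in along_lon.T]
    ).T

# Hypothetical target grid: 2.0 deg latitude x 2.5 deg longitude cell centers.
dst_lat = np.arange(-89.0, 90.0, 2.0)
dst_lon = np.arange(1.25, 360.0, 2.5)

# Hypothetical 1 deg x 1 deg source grid carrying a smooth synthetic field.
src_lat = np.arange(-89.5, 90.0, 1.0)
src_lon = np.arange(0.5, 360.0, 1.0)
src_field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]

regridded = regrid_bilinear(src_field, src_lat, src_lon, dst_lat, dst_lon)
print(regridded.shape)
```

Because the synthetic field is linear in latitude and longitude, bilinear interpolation reproduces it on the target grid, which makes the sketch easy to check. Real model output on a coarser native grid would more likely call for area averaging than point interpolation.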

2.2 Satellite observations

Satellites can provide a global view of cloud and radiation fields, through either direct observations or retrievals based on physical and empirical methods, and are commonly used for evaluating climate model simulations. However, the uncertainties and potential biases of these products must be understood before they are used for evaluating model simulations. Table 2 summarizes the observational products used in this study and their associated uncertainties.

Table 2 Level-3 Global monthly mean gridded data products

2.2.1 CERES–MODIS (CM) SYN1 deg Ed2.6: daytime only

This study applies 8 years (March 2000–February 2008) of monthly mean combined Terra/Aqua MODIS retrievals for the evaluation of total column CF and CWP (both liquid and ice water). Due to the large uncertainties in the nighttime CWP retrievals (>50 gm−2), only daytime retrievals are used for the evaluation of CF and CWP (Stanfield et al. 2014). The CERES–MODIS (CM) cloud properties have been extensively validated with other space-borne satellites (Minnis et al. 1999, 2002, 2011) and ground-based measurements (Dong et al. 2008a; Xi et al. 2010, 2014a, b). For example, Dong et al. (2008a) documented uncertainties in the CM retrieved cloud liquid water path (LWP) and found mean LWP differences of 0.6 ± 49.9 gm−2 compared to DOE Atmospheric Radiation Measurement (ARM) ground-based microwave radiometer retrieved LWPs at the Southern Great Plains Central Facility. Minnis et al. (2011) found that the CM LWP over the ocean was, on average, 0.2 ± 53.6 gm−2 less than the LWP from matched overcast AMSR-E footprints. CM IWP retrievals show an average negative bias of 3.3 ± 16.2 gm−2 when compared to ground-based radar measurements (Mace et al. 2005). For a more detailed survey of CM cloud microphysical property descriptions and uncertainties please refer to Minnis et al. (2011).

2.2.2 ISCCP-D2

The ISCCP provides another CF product for comparison. The ISCCP CF has an estimated uncertainty of ~10 % (Han et al. 1994; Rossow et al. 1993). Although no comprehensive analysis is performed with ISCCP CF in this study, the excellent agreement of CF between ISCCP and CM (global average difference ~0.1 %, R2 = 0.84, not shown) may provide more confidence for modelers to use the long-term ISCCP results to evaluate their simulations in the future.

2.2.3 Combined CloudSat/CALIPSO–CERES MODIS (CCCM) data product

In addition to CM and ISCCP cloud products, the integrated CloudSat, CALIPSO, CERES, and MODIS (CCCM) Rel1B merged lidar/radar product provides instantaneous retrieved cloud properties and vertical profiles along the CloudSat/CALIPSO (CC) ground track (Kato et al. 2010). In this study, only the CC CFs from this product have been compared with the other observations and GCMs because no statistical difference between the MODIS-in-CC swath and SSF products has been found (Xi et al. 2014a, b). The CC CF data are from the CCCM dataset. The uncertainty of CC combined cloud fraction profiles has been estimated to be 5 % (Mace et al. 2009; Su et al. 2013b). Although the CCCM data product represents a time period different from that of the other observations and model simulations, the record represents the recent annual CF climatology, which has been used to establish representative statistics for model-observation comparisons (e.g., Jiang et al. 2012; Li et al. 2012; Su et al. 2013b). Since CALIPSO (an active remote sensor; lidar) is more sensitive than CM (passive remote sensor) to optically thin clouds (τ < 0.3) (Chiriaco et al. 2007; Minnis et al. 2008), we use the CC derived CF as an upper bound for this study. The CC dataset offers vertically distributed and total column CFs; however, only total column CF is used in this study.

2.2.4 CERES EBAF Ed2.7

The CERES Energy Balanced and Filled at the TOA (EBAF-TOA) Ed2.7 dataset (Loeb et al. 2012; Doelling et al. 2013) is used for radiation budget and CRF comparisons in this study. The CERES EBAF-TOA is an expansion on the CERES SYN1 deg product, designed for climate modelers that need a net imbalance constrained to the ocean heat storage term (Hansen et al. 2005). Edition 2.7 was released in the summer of 2013 with the following notable updates: the improvement of TOA clear-sky SW and LW fluxes in the regions with snow and sea-ice cover by increasing the sampling frequency and corrections to footprints containing two time zones, which now includes both for regional averages and has no influence on large-scale averages.

3 Results and discussion

The current status of simulated cloud properties, CF and CWP, and their corresponding effects on TOA radiation budgets are evaluated and shown in this section. The purpose is to identify biases and deterministic relationships between the observations and model simulations, and to aid the advancement of model development. Multimodel ensemble means are used frequently in this analysis. The reader should be aware that this multimodel ensemble mean is simply the average of all 28 modeled values at each grid box to form a single solution for comparison. Additionally, near-global means are area-weighted averages using the cosine of latitude as weighting.
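The two averaging steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the synthetic 28-member data, and the grid coordinates are hypothetical, and grid boxes with missing values are simply excluded from the weighted mean.

```python
import numpy as np

def ensemble_mean(model_fields):
    """Average a (model, lat, lon) array over the model axis, grid box by grid box."""
    return np.mean(np.asarray(model_fields, dtype=float), axis=0)

def area_weighted_mean(field, lat_deg):
    """Cosine-of-latitude weighted mean of a (lat, lon) field, skipping NaNs."""
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(field)
    valid = np.isfinite(field)
    weighted_sum = np.sum(np.where(valid, field * weights, 0.0))
    return float(weighted_sum / np.sum(np.where(valid, weights, 0.0)))

# Synthetic example: 28 hypothetical models on a 65S-65N, 2.0 x 2.5 degree grid.
rng = np.random.default_rng(0)
lat = np.arange(-64.0, 65.0, 2.0)
lon = np.arange(1.25, 360.0, 2.5)
models = 60.0 + rng.normal(0.0, 5.0, size=(28, lat.size, lon.size))

ens = ensemble_mean(models)               # one (lat, lon) field
near_global = area_weighted_mean(ens, lat)
print(ens.shape, round(near_global, 1))
```

The cosine weighting accounts for the shrinking area of latitude-longitude grid boxes toward the poles, so a box at 60° latitude counts half as much as one at the equator.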

3.1 Global (65°N to 65°S) and zonal characteristics

Satellite derived and model simulated CFs are shown in Fig. 1a. The modeled CFs are, on average, underestimated by 7.6 and 7.9 % when compared to CM and ISCCP results, respectively, with an even larger negative bias (17.1 %) when compared to CC. Given the sensitivity of CC to optically thin cirrus clouds, the CC CF can be used as the upper bound of satellite observations and model simulations. Similar to Stanfield et al. (2014), CC derived CF would be close to both CM and modeled CF results if its CF results were averaged only with cloud optical depths >0.3 (not shown). Furthermore, the standard deviation (σ = 5.7 %) of modeled CFs does not fall within the range of observations. Only four models (CSIRO-Mk3.6.0 and the three models from GFDL: C180, C360, and CM3) have simulated CFs close to, or slightly larger than, CM and ISCCP CFs.

Fig. 1

Globally (65°S–65°N) averaged (a) cloud fraction (CF), (b) cloud water path (both liquid and ice water paths) (CWP), TOA (c) reflected shortwave (SW) radiation and (d) outgoing longwave radiation (OLR) during all-sky conditions from 28 AMIP models and satellite observations: CERES MODIS/EBAF (red), ISCCP (green), Cloudsat/CALIPSO (black). Multimodel ensemble values are in blue. Model names are listed for reference

The CWP comparison (Fig. 1b) is similar to the CF results; the multimodel ensemble shows a negative bias of 16.1 gm−2 compared to CM. The standard deviation (σ = 37.4 gm−2, ~35 %) of the multimodel CWP is rather large due to a broad range of modeled results, from 42.2 gm−2 (INM-CM4) to 197.7 gm−2 (GISS-E2-R). Ten models (BCC-CSM1.1, BCC-CSM1.1(m), BNU-ESM, CanAM4, CCSM4, GFDL: C180, C360, CM3, GISS-E2-R, and NorESM1-M) simulated CWPs greater than CM, and five of them (BCC-CSM1.1, BCC-CSM1.1(m), BNU-ESM, GISS-E2-R, and NorESM1-M) significantly overestimated CWP.

Figure 1c, d show the model simulated and CERES EBAF observed TOA reflected SW radiation and OLR, with averaged differences (model minus data, hereafter for all differences and biases) of 1.8 Wm−2 (Fig. 1c) and −0.9 Wm−2 (Fig. 1d), respectively. Although the differences between ensemble means and observations are small, a few model results are not physically consistent. The TOA radiative fluxes, in particular the reflected SW flux, depend primarily on CF and CWP. Good agreement in reflected SW flux should be accompanied by good agreement in CF and CWP (e.g., CanAM4), or by compensating biases in CF and CWP, such as lower (higher) CF and larger (smaller) CWP (e.g., BCC-CSM1.1, BCC-CSM1.1(m), BNU-ESM, and GISS-E2-R). However, it does not make sense, physically, if good agreement in reflected SW flux coincides with biases of the same sign in both CF and CWP, as illustrated in the simulations of models ACCESS1.0, CESM1(CAM5), HadGEM2-A, MRI-AGCM3-2H, and MRI-AGCM3-2R. The simulated reflected SW flux has a negative correlation with OLR, which is self-consistent. In Sect. 4, we will further investigate the impacts of CF and CWP on the radiation budgets and CRFs.

To study their latitudinal variations, the zonally averaged CF, CWP, reflected SW radiation, and OLR from GCM simulations and satellite observations are shown in Fig. 2. Most simulated CFs and the multimodel mean agree well with both CM and ISCCP CFs in the tropical region (5°S–15°N), but then begin to diverge poleward with a large discrepancy in the Southern mid-latitudes, consistent with Stanfield et al. (2014). Again, the CC derived CF is an upper bound of modeled CFs, and only a few models exceed the CC result near 60°N/S. Over the mid-latitudes, most modeled CWPs and the multimodel mean CWP are lower than CM. Nevertheless, GISS-E2-R and GFDL-CM3 greatly overestimated CWP in both the tropics and mid-latitudes (Fig. 2b).

Fig. 2

Zonally averaged distributions of (a) CF, (b) CWP, TOA (c) reflected SW and (d) outgoing LW radiation. Colored lines representing observations are consistent with Fig. 1. The grey shaded area represents the 2σ range of the modeled results while the dotted lines are the maximum and minimum simulated values. Values in parenthesis correspond to weighted near-global means

The radiation flux comparisons are much better than their CF and CWP counterparts. Both the reflected SW radiation and OLR multimodel ensemble means converge to the observations through all latitudes, with an exception near the tropics (~10°S to 25°N) for the reflected SW radiation, where CWP is similarly biased. The zonal variation of both modeled and observed reflected SW fluxes generally follows the same zonal pattern as CF and CWP, with relatively large disparity over the Southern mid-latitudes. Simulated reflected SW and OLR fluxes and their corresponding ensemble means fluctuate around the observations. As expected, the OLR comparison agrees much better than the SW comparison because the reflected SW flux is strongly dependent on CF and CWP. The peak of reflected SW flux and corresponding dip in OLR, as well as relatively large variations in the model simulations near 5°N–10°N, are expected due to the frequent occurrence of deep convective clouds in that region.

The global mean comparisons in Fig. 3 and the zonally averaged comparisons in Fig. 4 are the same as those in Figs. 1 and 2, respectively, but for TOA SW, LW, and net CRFs. The SW (LW) CRF at the TOA is defined in Ramanathan et al. (1989) as the SW (LW) flux difference between all-sky and clear-sky conditions, where a negative (positive) SW (LW) CRF denotes a cooling (warming) effect at the TOA. Although the clear-sky fluxes are not explicitly shown in this document, they have been investigated. The global averages of TOA reflected SW fluxes during clear-sky conditions are 48.6 and 49.2 Wm−2, respectively, for CERES EBAF and the multimodel ensemble. For clear-sky OLR, they are 272.9 and 270.5 Wm−2, respectively. As shown in Fig. 3, the global averaged SW, LW, and net (SW + LW) CRFs from CERES EBAF are −50.1, 27.6, and −22.5 Wm−2, respectively, indicating a net cooling effect of clouds on the TOA radiation budget. The differences in SW and LW CRFs between the ensemble means and observations are only −1.3 and −1.6 Wm−2, respectively, resulting in a larger simulated net cooling effect, with a net CRF difference of −2.9 Wm−2. Note that these SW and LW CRF differences include the clear-sky SW and LW flux differences (0.6 and −2.4 Wm−2, respectively). These comparisons are consistent with the results in Wang and Su (2013). However, the CRF biases in some models can amount to 10 Wm−2. Such biases are apparent in the simulated SW CRFs in BCC-CSM1.1 and NorESM1-M, the LW CRF in FGOALS-S1, and the net CRFs in CSIRO-Mk3.6.0 and FGOALS-S2. Therefore, a further investigation of these individual models is warranted.
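As a quick consistency check, the global-mean numbers quoted in this paragraph can be combined directly. The short sketch below uses only values stated in the text; the variable names are illustrative, and results are rounded to one decimal place to match the reported precision.

```python
# Values quoted in the text, all in W m-2.
obs_sw_crf, obs_lw_crf = -50.1, 27.6     # CERES EBAF global-mean SW and LW CRF
sw_bias, lw_bias = -1.3, -1.6            # multimodel ensemble minus CERES EBAF

# Net CRF is the sum of the SW and LW components, so the net bias is the
# sum of the SW and LW biases.
obs_net_crf = obs_sw_crf + obs_lw_crf
net_bias = sw_bias + lw_bias

# Clear-sky fluxes quoted in the text: (CERES EBAF, multimodel ensemble).
clear_sw = (48.6, 49.2)
clear_olr = (272.9, 270.5)
clear_sw_diff = clear_sw[1] - clear_sw[0]     # model minus observation
clear_olr_diff = clear_olr[1] - clear_olr[0]  # model minus observation

print(round(obs_net_crf, 1), round(net_bias, 1))        # -22.5 -2.9
print(round(clear_sw_diff, 1), round(clear_olr_diff, 1))  # 0.6 -2.4
```

The check reproduces the net CRF, the net bias, and the clear-sky differences given in the paragraph, confirming that the reported values are mutually consistent under the model-minus-observation convention.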

Fig. 3

Same as Fig. 1 but showing the SW, LW, and net CRF near global means

Fig. 4

Same as Fig. 2 but showing the SW, LW, and net CRF zonal distributions

The zonal variations of SW and LW CRFs in Fig. 4a, b mimic their corresponding all-sky SW and LW flux variations in Fig. 2c, d, presumably due to the TOA CRF calculations (SW↑clr − SW↑all and LW↑clr − LW↑all). The strong SW cooling and LW warming effects between 5°N and 10°N are primarily contributed by deep convective clouds, where the cloud albedo is ~0.7 and cloud-top height and temperature are ~10 km and ~220 K, respectively (Dong et al. 2008b). Over the southern mid-latitude ocean, marine boundary layer (MBL) clouds are dominant (Stanfield et al. 2014), which contribute to strong SW cooling and moderate LW warming effects. With the magnitude of SW cooling dominating over LW warming, the zonal variation of the net CRF more closely follows that of the SW CRF, with an overall cooling effect of lesser magnitude than the SW CRF. The strong SW cooling and LW warming effects over these two regions have motivated a further investigation into whether clouds and CRFs associate well with atmospheric vertical motions. Note that deep convective clouds are associated with strong upward motion, while MBL clouds are commonly related to strong sinking motion.

3.2 Biases in dynamically driven vertical velocity regimes over the oceans between 45°N and 45°S

Vertical velocities within the atmosphere are difficult to measure directly; thus we must use reanalysis data. Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalyzed vertical velocities (omega, ω, at 500 hPa) are used to investigate the relationships between cloud properties and CRFs where large-scale areas of convection and subsidence are predominant within the atmosphere (i.e., the Hadley cell and Walker circulation). We note that the errors in the MERRA reanalyzed vertical velocities may be large, as found by Kennedy et al. (2011) over the ARM SGP region, which may affect our selected upwelling and downwelling regimes. We adopt a threshold of ±25 hPa/day for defining predominant upwelling and downwelling in the atmosphere. Tropical SW and LW CRFs are independent of ω at 500 hPa when it is greater than ~25 hPa/day. However, when ω at 500 hPa is less than −25 hPa/day, SW and LW CRFs depend linearly on ω (Wang and Su 2013).

The regions with strong upwelling (ω < −25 hPa/day, blue) and downwelling (ω > 25 hPa/day, red) over the oceans are determined by MERRA reanalysis and are presented in Fig. 5a. Upwelling regions are typically representative of deep convective clouds and their accompanying anvil or cirrus clouds, while downwelling regions in the atmosphere are normally associated with high pressure systems where MBL clouds are predominant. The CF and CWP biases between 45°S and 45°N are illustrated in Fig. 5b, c, respectively, together with their average values over the identified upwelling and downwelling regions. The good agreement over the upwelling regime is expected because parameterized convective clouds are strongly associated with upward vertical velocity. Kennedy et al. (2010) compared the NASA GISS single column model (SCM) simulated CF with ARM SGP radar-lidar observations during the period 1999–2001. They found that the SCM simulated most of the high clouds over the upwelling regions because of their strong upward vertical velocities and positive relative humidity (RH) bias, while the SCM missed some low clouds over the downwelling regions due to a negative RH bias associated with subsidence. The slightly positive bias of CWP over the upwelling regions indicates that the models, in general, overestimated CWP even though they correctly simulated CF compared to observations.
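The regime selection can be sketched as a simple mask on the 500 hPa omega field. The function names, the integer labels, and the ocean-mask input below are hypothetical, and the regime average is an unweighted mean of model-minus-observation differences; whether the averages reported in Fig. 5 are area-weighted is not stated in the text.

```python
import numpy as np

OMEGA_THRESHOLD = 25.0  # hPa/day, the threshold quoted in the text

def classify_regime(omega500, ocean_mask, lat_deg, max_lat=45.0):
    """Label each (lat, lon) grid box: 1 = upwelling, -1 = downwelling, 0 = neither."""
    omega500 = np.asarray(omega500, dtype=float)
    lat_ok = np.abs(np.asarray(lat_deg, dtype=float))[:, None] <= max_lat
    in_domain = np.asarray(ocean_mask, dtype=bool) & lat_ok
    regime = np.zeros(omega500.shape, dtype=int)
    regime[in_domain & (omega500 < -OMEGA_THRESHOLD)] = 1    # strong ascent
    regime[in_domain & (omega500 > OMEGA_THRESHOLD)] = -1    # strong subsidence
    return regime

def regime_mean_bias(model_field, obs_field, regime, label):
    """Unweighted mean of (model minus observation) over one regime's grid boxes."""
    diff = np.asarray(model_field, dtype=float) - np.asarray(obs_field, dtype=float)
    return float(np.mean(diff[regime == label]))

# Tiny synthetic example: 3 latitudes x 2 longitudes.
lat = np.array([-60.0, 0.0, 30.0])
omega = np.array([[-40.0, 40.0], [-40.0, 10.0], [30.0, -30.0]])
ocean = np.array([[True, True], [True, True], [True, False]])
regime = classify_regime(omega, ocean, lat)
print(regime.tolist())  # [[0, 0], [1, 0], [-1, 0]]
```

In the example, the 60°S row is excluded by the latitude limit, the weak-omega box is left unclassified, and the land box is excluded even though its omega exceeds the threshold.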

Fig. 5

(a) Relatively strong upwelling (ω < −25 hPa/day, blue) and downwelling (ω > 25 hPa/day, red) regimes over oceans are identified by the omega field at 500 hPa from MERRA reanalysis between 45°N/S. (b–f) Biases (multimodel ensemble means minus CERES observations) in CF, CWP, and SW/LW/Net CRFs. Black contours (±25 hPa day−1) help to visualize the relatively strong vertical velocity regimes. Biases in both upwelling and downwelling regions are averaged and presented to the right of each map

Biases in CRFs (Fig. 5d–f) are representative of their relative magnitudes of warming or cooling. For example, a negative bias in SW/Net CRF corresponds to an overestimate of cooling due to clouds. Conversely, a positive bias in LW CRF relates to an overestimate in warming due to clouds. The CF and CWP biases are large, up to −18 % and −25 gm−2 over the downwelling regime, whereas the CRF biases are relatively small, less than ~5 Wm−2. The cloud and radiation biases over these two regimes are consistent with the results in Figs. 1, 2, 3, 4. However, due to the large variations in CF and CWP in these two regimes, we will investigate the sensitivities of SW, LW, and net CRFs to CF and CWP in Sect. 4 to further understand why such relationships are concurrent.

4 Sensitivities of CRFs to CF and CWP

To quantitatively estimate the impacts of CF and CWP on TOA radiation budgets, the observed and the multimodel ensemble mean SW, LW, and net CRFs versus CF and CWP over the atmospheric upwelling (blue) and downwelling regimes (red) are presented in Figs. 6 and 7, respectively. These results are selected from the strong upwelling and downwelling regimes (black contours) over oceanic regions between 45°N/S in Fig. 5. A best-fit linear regression is employed and used to determine the sensitivity between the two variables in terms of slope (e.g. ∆CRF/∆CF). To add more certainty to this analysis, the 99.5 % confidence of the slope has been determined. The margin of error in the slope is identified to be

$$m_{E} = C_{V} \cdot S_{E} ,$$

where \(C_{V}\) is the critical value and \(S_{E}\) is the standard error. The sampling distribution is assumed to be normal and the sample large enough for the critical value to be expressed by a z-score. When the z-score has a cumulative probability of 0.995, the corresponding critical value \(C_{V}\) is 2.58. The standard error is calculated as

$$S_{E} = \sqrt{\frac{\frac{1}{n - 2}\sum \varepsilon^{2}}{\sum \left( x - \bar{x} \right)^{2}}} ,$$

where ε is the linear regression residual (\(\varepsilon = y - mx - b\)). In all cases but one, the margin of error is smaller than the magnitude of the slope itself, which supports this method for determining the sensitivity between two variables. In the case of the multimodel simulated net CRF sensitivity to CF, the margin of error (0.11 Wm−2 %−1) is greater than the slope (0.06 Wm−2 %−1). We therefore treat this relationship with caution and urge further analysis of its uncertainty.
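The slope and its margin of error can be computed in a few lines. The sketch below follows the two formulas above with a critical value of 2.58; the function name `slope_with_margin` and the synthetic CF and CRF samples are assumptions made for illustration, not data from this study.

```python
import numpy as np

def slope_with_margin(x, y, critical_value=2.58):
    """Least-squares slope, intercept, and margin of error of the slope.

    The margin of error is critical_value * S_E, where S_E is the standard
    error of the slope computed from the regression residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)        # y ~ slope * x + intercept
    residuals = y - (slope * x + intercept)       # epsilon = y - m x - b
    std_err = np.sqrt((np.sum(residuals**2) / (n - 2)) / np.sum((x - x.mean())**2))
    return float(slope), float(intercept), float(critical_value * std_err)

# Synthetic example: CF (%) and SW CRF (W m-2) with a hypothetical true slope.
rng = np.random.default_rng(0)
cf = rng.uniform(20.0, 90.0, size=500)
sw_crf = -1.2 * cf + rng.normal(0.0, 8.0, size=500)

m, b, margin = slope_with_margin(cf, sw_crf)
print(f"slope = {m:.2f} +/- {margin:.2f} W m-2 per % CF")
```

A slope whose magnitude is smaller than its margin of error, as in the multimodel net CRF case noted above, cannot be distinguished from zero at this confidence level.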

Fig. 6

Sensitivities of TOA (a, d) SW, (b, e) LW, and (c, f) net CRFs to CF in the upwelling (blue) and downwelling (red) regimes. The data are sampled from over the oceans only between 45°N/S. Left column represents the observed sensitivities from CERES MODIS, while right column is for the multimodel ensemble. Regression lines are shown for both regimes with the uncertainty of the slope (99.5 % confidence) in parenthesis

Fig. 7

Same as Fig. 6 but for the sensitivities of CRFs to CWP

As illustrated in Fig. 6a, the sensitivities of observed SW CRF to CF are similar in both upwelling and downwelling regimes; the magnitude of SW CRF cooling increases significantly with increasing CF, with a sensitivity of −1.2 Wm−2 %−1 (in units of watts per square meter per percent cloudiness). The similar sensitivities (−1.2 and −1.31 Wm−2 %−1) over these two regimes are understandable from the definition of SW CRF (SW↑clr − SW↑all) because the albedos of both deep convective clouds and MBL clouds are much higher than the background ocean albedo (~6 %).

Conversely, LW CRF warming increases with increasing CF but is characterized by different sensitivities over the two regimes (Fig. 6b). The sensitivities are 0.81 and 0.22 Wm−2 %−1 over upwelling and downwelling regimes, respectively. The sensitivity difference of LW CRF over these two regimes primarily results from the cloud-top temperature difference between deep convective clouds (upwelling) and MBL clouds (downwelling). As discussed above, the cloud-top temperature of deep convective clouds is very cold (~220 K, Dong et al. 2008b), while MBL cloud-top temperature (~280 K, Dong et al. 2014; Xi et al. 2014a, b) is close to the underlying SST. Based upon the definition of LW CRF (OLRclear − OLRall), it is rather straightforward to explain the higher sensitivity of LW CRF to CF in the upwelling region than in the downwelling region. The net impact of CF on the TOA radiation budget is the sum of SW and LW effects, ΔCRFNET/ΔCF = ΔCRFSW/ΔCF + ΔCRFLW/ΔCF. The weak sensitivity of the net CRF to CF over upwelling regions is due to the compensating effects of SW and LW CRFs.
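The additivity of the sensitivities holds for least-squares slopes fitted to the same CF samples, because the slope is linear in the dependent variable. A minimal numerical check is sketched below; the synthetic data, noise levels, and the helper `ols_slope` are assumptions for illustration only.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return float(np.sum(dx * (y - y.mean())) / np.sum(dx**2))

# Synthetic upwelling-like sample with hypothetical SW and LW sensitivities.
rng = np.random.default_rng(0)
cf = rng.uniform(20.0, 90.0, size=500)                   # cloud fraction (%)
sw_crf = -1.2 * cf + rng.normal(0.0, 5.0, size=500)      # SW CRF (W m-2)
lw_crf = 0.8 * cf + rng.normal(0.0, 5.0, size=500)       # LW CRF (W m-2)
net_crf = sw_crf + lw_crf

slope_sw = ols_slope(cf, sw_crf)
slope_lw = ols_slope(cf, lw_crf)
slope_net = ols_slope(cf, net_crf)

# The net slope equals the sum of the SW and LW slopes up to rounding error.
print(np.isclose(slope_net, slope_sw + slope_lw))
```

Because the SW and LW slopes have opposite signs, the net slope is a small residual of two larger terms, which is also why its relative uncertainty tends to be larger.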

The sensitivities of simulated SW, LW, and net CRFs to CF (right column, Fig. 6) resemble their observed counterparts, with slight differences in slope. For example, the sensitivities of SW CRF to CF over the upwelling and downwelling regimes (−1.14 and −1.05 Wm−2 %−1) are close to the observed ones. The downwelling LW CRF sensitivity is nearly the same as the observed one; in the upwelling regime, however, the simulated sensitivity is 0.39 Wm−2 %−1 stronger due to the inclusion of excess water (liquid, ice, or both) within the cloud column. The nearly neutral slope of net CRF to CF over the upwelling regime confirms that the model simulations effectively show a cancelation of the SW cooling and LW warming effects. The conclusions in Fig. 6 (i.e., the magnitude of SW/LW CRF (cooling/warming) increases with increasing CF) affirm those of Dong et al. (2006), who used the DOE ARM SGP ground-based observations as a reference. The distinction between upwelling and downwelling regimes suggests that large-scale dynamics greatly influence cloud-radiation interactions and their predictability. In a later section we will investigate the effect of the biases in these sensitivities on the actual simulation of CRFs.

The sensitivities of observed SW CRFs to CWP in these two regimes, shown in Fig. 7a, are nearly identical, with a slope of −0.28 Wm−2/(gm−2). The magnitude of SW cooling increases with increasing CWP (and CF), although the sensitivity differs between these two variables. Changes in the SW albedo are strongly dependent on both CF and CWP. The simulated sensitivities of SW CRF to CWP are nearly the same over the two regimes, and are similar to their observed counterparts. Again, the comparison of SW CRF sensitivities in Figs. 7a and 7d resembles that in Figs. 6a, d. The LW sensitivities (Fig. 7b, e) are not strongly regime dependent, but show significant differences between observations and model simulations. The slope of modeled LW CRF to CWP (Fig. 7e) in the upwelling regime is almost twice what is observed (0.21 Wm−2/(gm−2) versus 0.11 Wm−2/(gm−2)), similar to the result in Fig. 6.

5 General score of simulated clouds and TOA CRFs

Taylor diagrams have been generated using the 1-sigma spatial standard deviations and correlations to compare the 28 AMIP model simulations with the observational datasets in this study. The results selected from the strong upwelling and downwelling regimes are illustrated in Fig. 8 for CF and CWP, and Fig. 9 for CRFs. Taylor diagrams display many simulated fields together and summarize how well each compares to observations through correlations, spatial standard deviations (normalized by the observed value), and RMSEs (Taylor 2001). If the model simulations agree well with observations, the simulated results fall within the correlation range of 0.9–1.0 and near the reference point (REF, σ = 1.0). The simulated CFs in the upwelling regime have a correlation range of 0–0.6 (MIROC5 even has a small negative correlation with observations), and their standard deviations fall into a range of 1.0–2.0 (Fig. 8a). For CWP in the upwelling regime, the correlations are similar to or slightly better than those for CF; however, the standard deviations range from 0.5 to 2.0, with significantly larger standard deviations in the GISS-E2-R, FGOALS-S2, and GFDL-CM3 CWP simulations (Fig. 8c). Over the downwelling regime, the correlations are slightly higher, and the standard deviations cluster more closely around the reference line than their upwelling counterparts.
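The three quantities summarized by a Taylor diagram can be computed as in the minimal sketch below. It is not the code used in this study: the function name `taylor_statistics`, the sample values, and the use of unweighted (rather than area-weighted) spatial statistics are all assumptions made for illustration. The standard deviation and centered RMSE are normalized by the observed standard deviation, so a perfect simulation sits at the reference point (1.0, correlation 1.0).

```python
import math


def taylor_statistics(model, obs):
    """Return normalized std, correlation, and normalized centered RMSE."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    # Anomalies about each field's own mean (the diagram ignores mean bias).
    dm = [m - mean_m for m in model]
    do = [o - mean_o for o in obs]
    std_m = math.sqrt(sum(d * d for d in dm) / n)
    std_o = math.sqrt(sum(d * d for d in do) / n)
    corr = sum(a * b for a, b in zip(dm, do)) / (n * std_m * std_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dm, do)) / n)
    return {
        "norm_std": std_m / std_o,
        "corr": corr,
        "norm_crmse": crmse / std_o,
    }


# Hypothetical grid-box cloud fractions (%), for illustration only.
obs_cf = [60.0, 70.0, 80.0, 90.0, 75.0, 65.0]
model_cf = [55.0, 72.0, 70.0, 95.0, 60.0, 68.0]

stats = taylor_statistics(model_cf, obs_cf)
print(stats)
```

Because the statistics are computed on anomalies, a model can match the observed regime mean well and still plot far from the reference point. The three quantities are linked by the relation from Taylor (2001), E′² = σ̂² + 1 − 2σ̂R in normalized form, which is why a single point on the diagram conveys all of them.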

Fig. 8

Taylor diagrams for 28 AMIP model simulations of CF (a, b) and CWP (c, d) are shown for the upwelling (left) and downwelling (right) regimes over the oceans between 45°N/S. Correlations and standard deviations are normalized by CERES MODIS observations

Fig. 9

Same as Fig. 8 but for TOA CRFs; SW (a, b), LW (c, d), and net (e, f)

The results from Fig. 8 translate fairly well to the overall score of simulated CRFs (Fig. 9). In general, the correlations in CRFs are higher, and their standard deviations are lower, than those in CF and CWP. Again, the correlations and standard deviations in the downwelling regime are better than those over the upwelling regime. For example, most of the CRF correlations fall within 0.6–0.9, some exceed 0.9, and their standard deviations are near the reference point in the downwelling regime. However, two models, CNRM-CM5 and MIROC5, show a small negative correlation with observations in the downwelling regime for TOA SW CRF (Fig. 9b).

Note that the results in Figs. 8 and 9 differ from those in Fig. 5. The good agreement in the mean values, together with the low correlations and high standard deviations over the upwelling regime, can be explained as follows. MBL clouds are persistent in the downwelling regime, while the upwelling regime contains a variety of cloud types, such as cumulus, anvil, cirrus, and mixed-phase clouds. In order to adequately resolve clouds in GCMs, different types of cloud parameterizations are implemented based upon ambient atmospheric conditions. When a diverse cloud field exists, cloud parameterizations have difficulty generating accurate simulations of the different cloud types. On the other hand, if a uniform cloud type is present (as in the downwelling regime), cloud parameterizations should replicate the atmospheric conditions more effectively and accurately.

6 Error analysis: where are these errors coming from?

This section quantifies the error types that arise from biases in the CRF sensitivities to CF/CWP and from the simulated CF and CWP biases themselves. Such diagnoses are useful for identifying the dominant sources of model errors and the areas most in need of model improvement. Again, we focus this analysis on oceanic regions between 45°N/S.

We treat TOA CRF at each grid point as a function of CF and CWP with the corresponding cloud radiative kernels,

$${\text{CRF}}_{\text{Y,m}} = \left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{CF}}}}} \right)} \right|_{\text{m}} {\text{CF}}_{\text{m}} + \left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{CWP}}}}} \right)} \right|_{\text{m}} {\text{CWP}}_{\text{m}} ,$$

where Y = SW, LW or net and the subscript m represents the model simulated value. In this study, we categorize clouds by their residence in the upwelling or downwelling regime (Fig. 5), although it is recognized that the cloud radiative kernels may vary significantly for clouds of different heights, phases, and particle sizes within each upwelling or downwelling regime. We use the regression slopes of CRFs versus CF or CWP for each regime, i.e., the sensitivity of CRF to CF or CWP shown in Figs. 6 and 7, in place of cloud radiative kernels, as a first-order approximation.

Hence, the total regime-averaged simulated CRF error relative to the observed CRF can be decomposed into the error associated with the discrepancy in CRF sensitivity to CF or CWP (\(\varepsilon_{\text{sen,CF}}\) and \(\varepsilon_{\text{sen,CWP}}\)), the error resulting from the averaged CF or CWP bias for each regime (\(\varepsilon_{\text{CF}}\) and \(\varepsilon_{\text{CWP}}\)), and the co-variations (\(\varepsilon_{\text{cov,CF}}\) and \(\varepsilon_{\text{cov,CWP}}\)),

$$\varepsilon_{\text{total}} = \varepsilon_{\text{sen,CF}} + \varepsilon_{\text{sen,CWP}} + \varepsilon_{\text{CF}} + \varepsilon_{\text{CWP}} + \varepsilon_{\text{cov,CF}} + \varepsilon_{\text{cov,CWP}} .$$

The three corresponding errors are computed as follows:

$$\varepsilon_{\text{sen,X}} = \left[ {\left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{X}}}}} \right)} \right|_{\text{m}} - \left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{X}}}}} \right)} \right|_{\text{o}} } \right] {\text{X}}_{\text{o}} ,$$
$$\varepsilon_{\text{X}} = \left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{X}}}}} \right)} \right|_{\text{o}} \left[ {{\text{X}}_{\text{m}} - {\text{X}}_{\text{o}} } \right] ,\; {\text{and}}$$
$$\varepsilon_{\text{cov,X}} = \left[ {\left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{X}}}}} \right)} \right|_{\text{m}} - \left| {\left( {\frac{{\partial {\text{CRF}}_{\text{Y}} }}{{\partial {\text{X}}}}} \right)} \right|_{\text{o}} } \right] \left[ {{\text{X}}_{\text{m}} - {\text{X}}_{\text{o}} } \right] ,$$

where X = CF or CWP and the subscript o represents the observed value.
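The decomposition can be sketched in a few lines. This is an illustrative example, not the analysis code of this study: the function name `decompose_crf_error` is hypothetical, and the regime-mean CF values (55 % simulated, 60 % observed) are placeholders rather than values from Table 3. The slopes use the observed upwelling LW sensitivity to CF (0.81 Wm−2 %−1) and a simulated value 0.39 Wm−2 %−1 larger, as quoted in Sect. 4. The example also checks that the three terms sum exactly to the difference between the simulated and observed linear contributions, slope_m·X_m − slope_o·X_o.

```python
def decompose_crf_error(slope_m, slope_o, x_m, x_o):
    """Split the CRF error from one predictor X (CF or CWP) into three terms.

    slope_m, slope_o: simulated and observed sensitivities dCRF/dX
    x_m, x_o:         simulated and observed regime-mean X
    """
    d_slope = slope_m - slope_o
    d_x = x_m - x_o
    eps_sen = d_slope * x_o   # sensitivity error acting on the observed X
    eps_x = slope_o * d_x     # bias in X acting through the observed slope
    eps_cov = d_slope * d_x   # co-variation of the two discrepancies
    return eps_sen, eps_x, eps_cov


# Slopes from the text (LW CRF vs. CF, upwelling); CF means are hypothetical.
slope_obs = 0.81          # W m-2 %-1
slope_mod = 0.81 + 0.39   # W m-2 %-1
cf_obs = 60.0             # %, placeholder
cf_mod = 55.0             # %, placeholder

eps_sen, eps_x, eps_cov = decompose_crf_error(slope_mod, slope_obs, cf_mod, cf_obs)
eps_sum = eps_sen + eps_x + eps_cov
direct = slope_mod * cf_mod - slope_obs * cf_obs

print(f"eps_sen = {eps_sen:+.2f} W m-2")
print(f"eps_X   = {eps_x:+.2f} W m-2")
print(f"eps_cov = {eps_cov:+.2f} W m-2")
print(f"sum     = {eps_sum:+.2f} W m-2 (direct difference {direct:+.2f})")
```

Separating the terms this way shows how a sensitivity error and a bias in CF or CWP of opposite sign can largely cancel in the total while each remains large individually.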

Table 3 shows the three sources of errors for SW, LW and net CRF in the upwelling and downwelling regimes separately. The multimodel means and standard deviations across the models are listed. In the upwelling regime, the simulated LW CRF sensitivity to CF contributes predominantly to the LW CRF total errors in terms of both mean and model spread (22.4 ± 20.2 Wm−2), as shown in Fig. 6. Other sources of errors such as the SW CRF sensitivities to CF or CWP, LW sensitivity to CWP, the biases in CF and CWP, and the co-variations contribute similarly (around 5 Wm−2 or less) to the total CRF errors in the upwelling regime.

Table 3 Summary of the different error sources in simulated SW/LW/Net CRFs with contributions from CRF sensitivities to CF or CWP, CF or CWP biases, and co-variations in the upwelling and downwelling regimes, separately

In the downwelling regime, the dominant source of the error in CRF is associated with the underestimate of CF amount, as evidenced in Figs. 1 and 2. The regime-averaged SW CRF error from the CF bias amounts to −23.2 ± 9.6 Wm−2. In addition, the errors associated with the SW CRF sensitivities to CF and CWP account for errors of −17.2 ± 11.7 and 14.9 ± 19.4 Wm−2, respectively. The bias in CWP also contributes sizably, about −6.9 ± 3.9 Wm−2, to the SW CRF error on the regime average. The compensating effects of SW CRF sensitivity to CF and CWP result in rather small errors in the total SW and net CRF in the downwelling regime, indicating a common model deficiency. Only through detailed error analyses such as the decomposition conducted here will we be able to better understand the processes accountable for model problems.

Total CF errors, including the errors associated with the CRF sensitivity to CF and those associated with the CF biases and the co-variations in both vertical velocity regimes, are listed for each model in Table 4. For biases in SW CRF within the descending branch of the large-scale circulation, the errors associated with CF are negative for every model, indicating a rather universal model bias. This is mainly due to the underestimate of both CF and the sensitivity of SW CRF to CF. MRI-AGCM3-2S produces the largest SW CRF error by CF, −60.6 Wm−2, and CSIRO-Mk3.6.0 the smallest, −11.1 Wm−2. Conversely, in the ascending branch, the SW CRF errors by CF are positive for the majority of the 28 models, with 10 models having negative errors. CSIRO-Mk3.6.0 produces the largest error in SW CRF in the upwelling regime, −39.2 Wm−2, while CMCC-CM has the smallest error of −2.6 Wm−2.

Table 4 Summary of total errors by model in SW/LW/Net CRFs contributed by simulated CF biases in both upwelling and downwelling regimes

The errors associated with CF in LW CRF are predominantly negative (i.e., underestimate of LW cloud warming) in the descending branch of the large-scale circulation for all models in the ensemble, except in GFDL-HIRAM-C360. The largest error is from CCSM4 (−12.3 Wm−2), while GFDL-HIRAM-C180 yields no error in the downwelling regime. The upwelling regime experiences mostly positive errors associated with CF, primarily from the overestimate of LW CRF sensitivity to CF in the models. GFDL-HIRAM-CM3 produces the largest error of 77.8 Wm−2 in the overestimate of LW CRF, while MPI-ESM-MR has the smallest error, −0.3 Wm−2. On average, the total net CRF error from CF is larger for the downwelling regime than for the upwelling regime.

Table 5 is equivalent to Table 4 but summarizes the errors associated with CWP for each model. In general, the maximum and minimum errors due to the CWP errors are smaller than those due to CF, and the inter-model spreads are within one order of magnitude.

Table 5 Same as Table 4 but for total CRF errors contributed by simulated CWP biases

7 Conclusions

Globally simulated CF, CWP, TOA radiation budgets and CRFs from 28 CMIP5 AMIP models are evaluated (between 65°N/S) and compared with multiple satellite observations (CERES/MODIS/ISCCP/CloudSat/CALIPSO) during the 2000–2008 time period (CCCM data are from 07/2006 to 06/2010). The model biases are identified and quantified to facilitate model improvement in the future. From the eight-year comparisons between model simulations and observations, we draw the following conclusions.

  1.

    The modeled CFs are, on average, underestimated by ~8 % when compared to CERES-MODIS (CM) and ISCCP results with an even larger negative bias (17.1 %) compared to CCCM. Most of the modeled CFs and the ensemble mean agree well with both CM and ISCCP CFs in the tropical region (5°S–15°N), but then diverge poleward, with a large discrepancy in the Southern Ocean. The CWP comparison is similar to the results in CF with a negative bias of 16.1 gm−2 compared to CM. The model simulated and CERES EBAF observed TOA reflected SW and OLR fluxes, on average, differ by 1.8 Wm−2 and −0.9 Wm−2, respectively.

  2.

    The nearly global averaged SW, LW, and net CRFs from CERES EBAF are −50.1, 27.6, and −22.5 Wm−2, respectively, indicating a net cooling effect of clouds on the TOA radiation budget. The differences in SW and LW CRFs between observations and the multimodel ensemble are −1.3 and −1.6 Wm−2, respectively, resulting in a larger net cooling effect of 2.9 Wm−2 in the model simulations. The strong SW cooling and maximum LW warming effects from 5 to 10°N are primarily contributed by deep convective clouds while the moderate LW warming in the 40–60°S latitude band is due to persistent MBL clouds.

  3.

    Further investigation of cloud properties and CRFs over the identified upwelling and downwelling regimes reveals that the model biases in the upwelling regime are much smaller than those over the downwelling regime. Sensitivity studies have shown that the magnitude of SW CRF cooling increases significantly with increasing CF with nearly the same sensitivity over both the upwelling and downwelling regimes (−1.20 and −1.31 Wm−2 %−1, respectively). The model simulations show similar characteristics, though with some discrepancy. The 28-model ensemble underestimates the sensitivity of SW CRF to CF by 0.06 and 0.26 Wm−2 %−1 in the strong convective and subsidence regions, respectively. Conversely, the observed LW CRF increases with increasing CF but with different sensitivities (a strong warming of 0.81 Wm−2 %−1 and a moderate warming of 0.22 Wm−2 %−1 over the upwelling and downwelling regimes, respectively). The difference in sensitivity is due to the distinct cloud-top temperature characteristics of deep convective clouds (upwelling) and stratiform marine boundary layer clouds (downwelling). The multimodel ensemble does a fair job in simulating the observed LW sensitivity in the downwelling regime (−0.04 Wm−2 %−1 bias); however, it provides an overly sensitive LW CRF (0.39 Wm−2 %−1 bias) in tropical and mid-latitude upwelling regions.

  4.

    Several dominant sources of CRF errors are identified. The error sources that contribute most to the regime-averaged CRF errors are: the modeled errors in the LW CRF sensitivity to CF in the upwelling regime, the errors in the simulated CF and CWP amounts, and the sensitivities of SW CRF to CF and CWP in the downwelling regime.

Although several studies have evaluated the CMIP5 GCM simulated cloud and radiation fields using observations, this study provides a more comprehensive assessment of the CMIP5 simulations using multiple satellite observations. More importantly, we have investigated the impact of CF and CWP on the TOA radiation fluxes and CRFs, quantitatively estimated the sensitivities of SW/LW CRFs to CF and CWP in strong upwelling and downwelling regimes, and performed a detailed error analysis. These results will provide a better means of representing the true physical interactions between clouds and TOA radiation budgets and help modelers to better predict future climate scenarios. It is our hope that these comparisons and the statistical results from this study may aid in the advancement of the GCM simulations of clouds and TOA radiation budgets in future versions of CMIP.