1 Introduction

Societies in the Maritime Continent depend for their water supply on monsoon rainfall, generated as part of the large-scale movement of the Intertropical Convergence Zone (ITCZ) in its passage from the Southern to the Northern Hemisphere and back. It is therefore crucial that General Circulation Models (GCMs) are able to correctly simulate the mean climate and its variability in the Maritime Continent. In particular, the Coupled Model Intercomparison Project phase 5 (CMIP5) simulations are used in the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report for future climate projection (Flato et al. 2013). However, the ability of GCMs to simulate the mean climate and climate variability over the Maritime Continent remains a modeling challenge (Jourdain et al. 2013). Because the Maritime Continent is one of the main diabatic heat sources for regional and global circulation, biases in its simulated mean state also affect tropical and extratropical variability and teleconnections (Neale and Slingo 2003; Wang et al. 2014). In order to have confidence in the future projections made by models, the models should be able to correctly simulate characteristics of the current climate on regional (Maritime Continent) and global scales.

Previous studies mentioning the Maritime Continent in CMIP5 are largely focussed on the topic of nearby monsoon regions. Several studies have evaluated the fidelity of CMIP5 models in simulating the Australian monsoon (Jourdain et al. 2013; Ackerley et al. 2014), the western Pacific monsoon (Brown et al. 2013) and the Asian summer monsoon (Sperber et al. 2013). These studies found that different models show varying ability at each aspect of monsoon simulation. However, no single model in the CMIP5 ensemble best represents all aspects of the monsoon, either in an individual subregion or when considering all characteristics of the monsoon as a whole. Comparison studies between different phases of the CMIP multi-model ensemble found that generally there are improvements in the performance of CMIP5 over CMIP3 (Sperber et al. 2013; Jourdain et al. 2013) possibly due to increased horizontal and vertical resolution in the atmosphere and ocean, parameterization development and the improved representation of Earth-system processes in CMIP5 models. However, considerable systematic errors exist, suggesting that the models are still lacking good representations of the necessary physical mechanisms involved.

Atmospheric general circulation models (AGCMs) have prescribed sea surface temperature (SST); they therefore lack SST biases and could possibly have smaller errors in the large-scale circulation when compared to coupled GCMs. Atmospheric Model Intercomparison Project (AMIP) integrations using standardized lower boundary conditions enable the identification of atmospheric model deficiencies and common features. However, the lack of air-sea coupling in AGCM experiments may also introduce new systematic biases in some regions, with further feedbacks on circulation patterns. Wang et al. (2005) show that ocean-atmosphere coupling is important to the simulation of Asian-Pacific summer monsoon rainfall variability. The comparison between coupled and atmosphere-only simulations suggests that AMIP models simulate the wind better in the western Pacific monsoon (Brown et al. 2013). Li and Xie (2014) suggest that the equatorial Pacific cold tongue bias in coupled models arises from wind biases resulting from interaction with the ocean via Bjerknes feedback. However, both coupled and uncoupled model simulations fail in reproducing observed precipitation over the tropics, suggesting that the representation of convection is likely to be a key source of error (Brown et al. 2013; Li and Xie 2014; Ackerley et al. 2014).

In this paper, we evaluate the CMIP5 model performance in reproducing the observed seasonal climate over the Maritime Continent, focusing mainly on the AMIP experiment. We quantify the model performance using two metrics that measure the magnitude of simulation errors and the degree of similarity between the observed and simulated field. To determine what aspects of the models are most important for correctly representing the Maritime Continent precipitation, our study investigates three potential sources of model systematic errors: the role of horizontal resolution, the relationship to errors in the mean meridional circulation and global monsoon, and the impact of air-sea coupling. This reveals a possible connection between global biases and local Maritime Continent biases. Next, we performed clustering analysis on the annual cycle of precipitation in the AMIP experiment of CMIP5 to group together models with common systematic errors and to determine if they are connected to particular features at the large scale.

The paper is organized as follows. We first describe the CMIP5 models and observational (reanalysis) data used in this study in Sect. 2. In Sect. 3, we assess the atmosphere-only model simulations of seasonal precipitation and low-level wind in the Maritime Continent. Section 4 investigates the possible sources of model bias. In Sect. 5, we present clustering analysis on the annual cycle climatology of precipitation. Discussion is given in Sect. 6, followed by conclusions.

2 Data and methods

2.1 Models

The 30-year period (1979–2008) of AMIP from 28 CMIP5 models is analyzed in this study. These AMIP experiments are forced by the same prescribed SSTs and sea ice. The original horizontal resolution of the prescribed SST boundary conditions created by Program For Climate Model Diagnosis and Intercomparison (PCMDI) is 1\(^{\circ }\)  longitude \(\times\) 1\(^{\circ }\)  latitude. These underlying SST data are interpolated by individual modelling groups to a model’s own resolution for performing the AMIP experiments. A general description of the AMIP boundary condition is presented in Taylor et al. (2012). Only models that submitted precipitation, zonal (u), meridional (v) and vertical (omega) components of wind to the database were selected for this study. The Maritime Continent domain in this study is defined covering ranges of latitude 20\(^{\circ }\)S–20\(^{\circ }\)N and longitude 80\(^{\circ }\)E–160\(^{\circ }\)E.

The AMIP simulations are often used to identify deficiencies of the atmospheric model that are unaffected by the systematic SST biases present in coupled models. However, the lack of two-way ocean-atmosphere coupling and SST response to the atmospheric forcing in these atmosphere-only models may also introduce new biases in some regions (Wu and Kirtman 2005; Wang et al. 2005). To determine the impact of SST biases and air-sea coupling on model performance in the Maritime Continent, we make a comparison between 22 coupled CMIP5 simulations and their corresponding AMIP simulations in Sect. 4.3. We also examine the fidelity of all 46 coupled GCMs in CMIP5 in simulating the Maritime Continent mean climate from 1979–2005 (the part of the coupled experiment that overlaps AMIP). These historical runs (coupled ocean-atmosphere) are forced by observed atmospheric composition changes in both anthropogenic and natural sources, and also include land use change (Taylor et al. 2012). For brevity, the coupled simulations in CMIP5 will be denoted as “CMIP5” while the AMIP simulations with prescribed SST will be denoted as “AMIP5” hereafter.

Table 1 lists the model name, modeling center, experiment type and horizontal resolution of these models. Detailed documentation of the CMIP5 models and experiments can be found at http://cmip-pcmdi.llnl.gov/cmip5. The monthly model data were bi-linearly interpolated to a common 3.75\(^{\circ }\)  longitude \(\times\) 3\(^{\circ }\)  latitude grid for comparison with each other and with respect to observations and to enable computation of error statistics.

Table 1 CMIP5 model name, modeling center, atmosphere horizontal resolution, experiment type and key references

The multi-model mean (MMM) is obtained by taking a simple arithmetic average of climate variables among the 28 AMIP5 models. We have also calculated the 46 CMIP5 model MMM as well as the MMM of the 22 overlapping AMIP5 and CMIP5 models respectively for comparison in Sect. 4.3. The overall performance of AMIP5 and CMIP5 models is determined based on the MMM skill scores (see Sect. 2.3).
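The averaging step described above can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: every model field is taken to be already interpolated to the common grid, and the function name `multi_model_mean`, the model labels and the toy values are hypothetical rather than part of the original analysis.

```python
import numpy as np

def multi_model_mean(fields):
    """Simple arithmetic mean across models.

    fields: dict mapping a model name to a 2-D array (lat, lon) on the
    common grid. Every model receives equal weight.
    """
    stack = np.stack([np.asarray(f, dtype=float) for f in fields.values()])
    return stack.mean(axis=0)

# Toy example: three hypothetical models on a 2 x 2 grid (mm/day).
toy = {
    "model_a": np.array([[4.0, 6.0], [8.0, 2.0]]),
    "model_b": np.array([[6.0, 6.0], [4.0, 4.0]]),
    "model_c": np.array([[5.0, 9.0], [6.0, 0.0]]),
}
mmm = multi_model_mean(toy)
```

The skill scores of the MMM would then be computed from `mmm` in the same way as for any individual model.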

2.2 Observations

Precipitation data from the Global Precipitation Climatology Project (GPCP) of the 30-year period from 1979 to 2008 are used in this study to validate the models. This dataset consists of a combination of rain gauges, satellites and sounding observations that have been merged to estimate monthly rainfall on a 2.5-degree global grid (Adler et al. 2003).

The zonal, meridional and vertical components of wind data on a 0.7-degree grid for the same period were obtained from the ERA-Interim reanalysis data produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Dee et al. 2011) for validation on various pressure levels.

These monthly precipitation and wind observations (reanalysis) data were bi-linearly interpolated to the common 3.75\(^{\circ }\)  longitude \(\times\) 3\(^{\circ }\)  latitude grid for comparison with the model simulations.
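For readers who wish to reproduce the regridding step, the following is a minimal sketch, not the code used in this study. It relies on the fact that, on a rectilinear latitude-longitude grid, bilinear interpolation is equivalent to two successive one-dimensional linear interpolations. The function name `regrid_bilinear` and the toy grids are hypothetical. The sketch assumes ascending coordinates, target points inside the source range (`numpy.interp` holds edge values otherwise) and no wrap-around in longitude.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear interpolation of a (lat, lon) field to a new grid."""
    # Step 1: interpolate every source latitude row along longitude.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate every resulting column along latitude.
    return np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T

# Toy source grid (2.5 degrees) and target grid (3.75 x 3 degrees).
src_lat = np.arange(-20.0, 20.1, 2.5)
src_lon = np.arange(80.0, 160.1, 2.5)
dst_lat = np.arange(-18.0, 18.1, 3.0)
dst_lon = np.arange(80.0, 160.1, 3.75)

# A field that varies linearly in both directions is reproduced exactly.
field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
regridded = regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon)
```

A global application would additionally need to handle the periodic boundary in longitude, which is omitted here for brevity.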

2.3 Skill scores and correlation analyses

We have used two metrics to evaluate the model performance in simulating the Maritime Continent seasonal climate. The pattern correlation coefficient (PCC) is calculated to measure the degree of similarity in the spatial patterns between the observed and simulated fields. The root mean square error (RMSE) is used to measure the magnitude of simulation errors.
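A minimal sketch of how the two metrics could be computed for a single field is given below. The exact formulation is not spelled out in the text, so two choices here are assumptions of the sketch: the pattern correlation is centred (the domain mean is removed from both fields), and grid points may optionally be weighted by the cosine of latitude. The function name `pcc_rmse` and the toy values are hypothetical.

```python
import numpy as np

def pcc_rmse(sim, obs, lat=None):
    """Centred pattern correlation and RMSE between two (lat, lon) fields.

    If `lat` (degrees) is given, grid points are weighted by cos(latitude);
    otherwise all grid points count equally (assumption of this sketch).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if lat is None:
        w = np.ones_like(obs)
    else:
        coslat = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))
        w = coslat[:, None] * np.ones_like(obs)
    w = w / w.sum()  # normalise so the weights sum to one

    sim_anom = sim - np.sum(w * sim)
    obs_anom = obs - np.sum(w * obs)
    cov = np.sum(w * sim_anom * obs_anom)
    pcc = cov / np.sqrt(np.sum(w * sim_anom**2) * np.sum(w * obs_anom**2))
    rmse = np.sqrt(np.sum(w * (sim - obs) ** 2))
    return float(pcc), float(rmse)

# Toy example: the right spatial pattern with a uniform 1 mm/day wet bias.
lat = np.array([-10.0, 0.0, 10.0])
obs = np.array([[2.0, 4.0], [6.0, 8.0], [3.0, 5.0]])
sim = obs + 1.0
pcc, rmse = pcc_rmse(sim, obs, lat)
```

The toy case illustrates why both metrics are needed: a uniform offset leaves the PCC at 1 while the RMSE equals the size of the offset.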

Correlation analyses are used to assess the relationship between different biases, for example between local (Maritime Continent) precipitation biases and biases in the global monsoon and circulation. We computed the Pearson correlation coefficient (r) to measure the strength of the association and the direction of a linear relationship between the two biases in the set of models. However, as the Pearson correlation is sensitive to outliers, we have also calculated the Spearman’s rank correlation coefficient which is more robust. The Spearman’s rank is the non-parametric version of the Pearson correlation calculated using the ranks of data. The correlation coefficients from the two methods show some differences. However, only results from Spearman’s rank correlation are shown except in Figs. 3 and 13 where both coefficients are shown.
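The sensitivity of the Pearson coefficient to outliers can be illustrated with a short sketch. The skill scores below are invented for illustration only and do not come from Table 2; the helper names are hypothetical. Spearman's rank correlation is computed here as the Pearson correlation of the ranks, with tied values receiving their average rank. The library functions `scipy.stats.pearsonr` and `scipy.stats.spearmanr` provide the same coefficients together with p-values.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size)
    ranks[order] = np.arange(1, x.size + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # Spearman's rank correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

# Invented RMSE scores for ten models; the last model is a strong outlier
# in precipitation RMSE.
precip_rmse = np.array([1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 2.7, 2.9, 3.0, 9.5])
wind_rmse = np.array([1.5, 1.6, 1.9, 1.8, 2.1, 2.0, 2.3, 2.2, 2.5, 2.6])

r = pearson(precip_rmse, wind_rmse)
sr = spearman(precip_rmse, wind_rmse)
```

In this toy case the single outlier lowers `r` well below `sr`, because the ranks are unaffected by how far the outlier lies from the other models.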

2.4 Global monsoon metric

In Sect. 4.2.2, we will examine two annual cycle modes of the climatological monthly mean precipitation using an approach adapted from Wang and Ding (2008). In their study, Wang and Ding (2008) identified two leading empirical orthogonal function (EOF) modes that can represent the annual cycle of tropical precipitation and global monsoons. The first EOF is in phase with the annual cycle and shows the boreal summer and winter monsoon rainfall regimes (off-equatorial ITCZ positions) while the second EOF represents the location of the spring and autumn ITCZ, closer to the equator. Wang and Ding (2008) showed that the first EOF (solsticial mode) is equivalent to the difference between solsticial seasons (JJAS minus DJFM) and the second EOF (equinoctial mode) can be depicted by the difference between equinoctial seasons (AM minus ON). These seasonal differences can be used as simple metrics of the seasonal cycle of global monsoon precipitation. In this study, for consistency with the seasons used in other aspects of our analyses, the solsticial mode of the annual cycle is calculated by taking the JJA mean minus the DJF mean precipitation and the equinoctial mode is calculated by taking the MAM mean minus the SON mean precipitation over the domain 45\(^{\circ }\)S–45\(^{\circ }\)N and 0\(^{\circ }\)–360\(^{\circ }\)E.
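The two seasonal-difference metrics can be sketched as follows. This is an illustration only: it assumes a 12-month climatology ordered January to December with shape (month, lat, lon), and it averages the three months of each season with equal weight, which ignores differences in month length. The names `SEASONS`, `seasonal_mean` and `monsoon_modes` are hypothetical.

```python
import numpy as np

# Zero-based month indices (0 = January) for each season.
SEASONS = {
    "DJF": [11, 0, 1],
    "MAM": [2, 3, 4],
    "JJA": [5, 6, 7],
    "SON": [8, 9, 10],
}

def seasonal_mean(clim, season):
    """Equal-weight mean of the three months in a season."""
    return clim[SEASONS[season]].mean(axis=0)

def monsoon_modes(clim):
    """Return (solsticial, equinoctial) fields from a monthly climatology."""
    solsticial = seasonal_mean(clim, "JJA") - seasonal_mean(clim, "DJF")
    equinoctial = seasonal_mean(clim, "MAM") - seasonal_mean(clim, "SON")
    return solsticial, equinoctial

# Toy climatology: the value in each month equals its zero-based index.
clim = np.arange(12.0)[:, None, None] * np.ones((12, 2, 3))
solsticial, equinoctial = monsoon_modes(clim)
```

Each resulting field can then be compared with its observed counterpart over 45\(^{\circ }\)S–45\(^{\circ }\)N using the PCC and RMSE.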

2.5 Cluster analysis

Hierarchical clustering analysis of Maritime Continent annual cycle precipitation was performed to characterize the model systematic biases in AMIP5 by grouping together models that are similar, in Sect. 5. From the clustering analysis, we investigate the biases in groups of models to see if they pertain to common features in the large-scale atmosphere. We used the Euclidean distance metric (the square root of the sum of squared differences) to measure the similarity between each pair of models using the time-latitude mean precipitation, zonally averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E (refer to Fig. 7). The Euclidean distance between two models A and B is defined as:

$$\begin{aligned} d(A, B)= \sqrt{\sum \limits _{i} \sum \limits _{j} (A_{i,j} - B_{i,j})^{2}} \end{aligned}$$
(1)

where i and j are the latitude and month respectively (Wilks 2011).

The two models with closest similarity in Maritime Continent annual cycle precipitation averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E are merged to form a new cluster based on a defined criterion and linkage method. The process is repeated until all models are merged into one cluster. The optimum number of clusters is chosen based on a cut-off point (threshold value) when there is a sudden increase in the distance value which reflects that the clusters that were joined were relatively far apart.

To ensure the robustness of the results, we have tested six linkage methods (not shown) to cluster the similarity between the 28 models. The six methods are single, complete, average, centroid, Ward’s and weighted linkage. We chose complete linkage for our analysis in this study based on the agreement of its results with other methods, and it also produced a better cluster around the central value with smaller variance. In complete linkage, the distance between two clusters is defined as the maximum distance between any two models when one model is chosen from each cluster and all possible pairs are compared (Wilks 2011).
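The agglomerative procedure with the Euclidean distance of Eq. (1) and complete linkage can be sketched in a few lines. This is a didactic version written for clarity, not the implementation used in this study; the function names and the four toy "models" are hypothetical. In practice, `scipy.cluster.hierarchy.linkage` with `method="complete"` provides an equivalent and more efficient routine.

```python
import numpy as np

def euclidean(a, b):
    """Eq. (1): square root of the summed squared differences."""
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

def complete_linkage(fields):
    """Agglomerative clustering with complete linkage.

    fields: dict mapping a model name to a 2-D array (latitude, month).
    Returns a list of merges (cluster_1, cluster_2, distance), in the order
    in which they happen.
    """
    names = list(fields)
    pair = {(p, q): euclidean(fields[p], fields[q]) for p in names for q in names}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the largest distance over all pairs with
                # one model taken from each cluster.
                dist = max(pair[p, q] for p in clusters[i] for q in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Toy example: two pairs of similar "models" on a 2 x 2 (latitude, month) grid.
toy = {
    "A": np.full((2, 2), 0.0),
    "B": np.full((2, 2), 0.1),
    "C": np.full((2, 2), 5.0),
    "D": np.full((2, 2), 5.2),
}
merges = complete_linkage(toy)
```

In the toy case the merge distance jumps from 0.4 to 10.4 at the final step, which is the kind of sudden increase used to choose the cut-off, here giving two clusters.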

3 AMIP5 model evaluation

In this section, we examine the AMIP5 model performance in reproducing the seasonal climate, particularly focusing on the winter (DJF) and summer monsoons (JJA).

Maritime Continent domain seasonal mean GPCP precipitation and ERA-Interim mean 850 hPa wind are shown in Figs. 1a and 2a. The Maritime Continent receives an abundance of rainfall throughout the year. However, there are pronounced seasonal variations in precipitation and wind patterns. The Intertropical Convergence Zone (ITCZ), where the trade winds from the Northern Hemisphere converge with those from the Southern Hemisphere throughout the year, moves northward in boreal summer and shifts to the south in boreal winter. The annual cycle of monsoons is generated as part of the large-scale movement of the ITCZ in its passage from the Southern to Northern Hemisphere and back. The GPCP observed seasonal precipitation shows that the central and southern part of the Maritime Continent receives substantial precipitation during the (boreal) winter monsoon and less rainfall during the summer monsoon. In contrast, the Maritime Continent region north of 10\(^{\circ }\)N (Myanmar, Thailand, Laos, Vietnam, Cambodia and the Philippines) experiences a dry season during the winter monsoon and a wet season during the summer monsoon. The seasons between the summer monsoon and winter monsoon are known as intermonsoon seasons (not shown).

Fig. 1 DJF precipitation (mm/day) and 850 hPa wind (m s\(^{-1}\)) for a GPCP and ERA-Interim, b MMM biases and c–ad AMIP5 biases for 1979–2008 over the Maritime Continent region (20\(^{\circ }\)S–20\(^{\circ }\)N, 80\(^{\circ }\)E–160\(^{\circ }\)E). Third panel shows the Maritime Continent domain and land-sea mask

In DJF, most models exhibit wet biases over the West Pacific Ocean except for four models: BCC-CSM1-1 (Fig. 1e), BCC-CSM1-1m (Fig. 1f), IPSL-CM5B-LR (Fig. 1w) and MRI-CGCM3 (Fig. 1ac). About two-thirds of the models underestimate the precipitation over the land. This can be seen from the MMM (Fig. 1b) whereby most of the land has dry biases except for the islands of Sulawesi, New Guinea and the Philippines. The dry biases over land are associated with easterly wind biases over the region. This easterly wind bias and its associated dry bias is a common error in the atmosphere-only models. Models are able to capture the reversal of Australian monsoonal circulation from low-level westerly winds in DJF (Fig. 1) to easterlies in JJA (Fig. 2) over northern Australia. However, most of the models simulate weaker westerlies over northern Australia in DJF, while a few models such as CCSM4 (Fig. 1i), CSIRO-Mk3-6-0 (Fig. 1l), GFDL-HIRAM-360 (Fig. 1q), MIROC5 (Fig. 1x) and NorESM1-M (Fig. 1ad), simulate stronger westerlies and wet biases over northern Australia.

Fig. 2 JJA precipitation (mm/day) and 850 hPa wind (m s\(^{-1}\)) for a GPCP and ERA-Interim, b MMM biases and c–ad AMIP5 biases for 1979–2008 over the Maritime Continent region (20\(^{\circ }\)S–20\(^{\circ }\)N, 80\(^{\circ }\)E–160\(^{\circ }\)E). Third panel shows the Maritime Continent domain and land-sea mask

Biases in JJA are not very consistent across models, i.e. the MMM bias is small compared to individual model biases. Models are more consistent in DJF, especially over the southern Maritime Continent. A notable difference between the DJF and JJA seasons is the more common presence of large precipitation biases over the Maritime Continent north of 10\(^{\circ }\)N in JJA (Fig. 2) as compared to DJF, indicating that the models simulate the monsoonal precipitation poorly. In JJA, the BNU-ESM (Fig. 2g), CSIRO-Mk3-6-0 (Fig. 2l), IPSL-CM5A-LR (Fig. 2u), IPSL-CM5A-MR (Fig. 2v) and NorESM1-M (Fig. 2ad) models simulate weaker westerlies over the Maritime Continent north of 10\(^{\circ }\)N whereas other models simulate overly strong westerlies that extend too far east into the West Pacific Ocean, as shown in Fig. 2. These biases are consistent with precipitation biases, with weak westerlies associated with underestimation of the precipitation while strong westerlies increase the moisture supply and lead to overestimation of precipitation over the Maritime Continent north of 10\(^{\circ }\)N and the West Pacific. This implies that stronger westerlies to the north in JJA are also a response to a stronger monsoon.

Our result agrees with Ackerley et al. (2014), who found that the summer precipitation biases over northern Australia in AMIP5 simulations are linked to the low-level winds. Models that overestimate the northern Australian precipitation have mean northerly flow between 120\(^{\circ }\) and 150\(^{\circ }\)E, which transports moisture from the ocean, whereas other models that underestimate the precipitation have mean southerly flow across the same range of longitudes. We next use correlation analyses to determine the relationship between the PCC of precipitation and 850 hPa wind across the suite of models, as well as the RMSE.

Table 2 PCC and RMSE skill scores for annual cycle climatology of precipitation, DJF and JJA seasonal mean precipitation and 850 hPa wind

The PCC and RMSE are calculated with respect to GPCP precipitation and ERA-Interim 850 hPa winds over the Maritime Continent domain of latitude 20\(^{\circ }\)S–20\(^{\circ }\)N and longitude 80\(^{\circ }\)E–160\(^{\circ }\)E in winter and summer seasons and are listed in Table 2. The text in bold highlights the best performing models showing either highest PCC or lowest RMSE. Three models (MRI-AGCM3-2S, MRI-AGCM3-2H and MRI-CGCM3) from the same centre capture the spatial pattern of the precipitation in DJF with PCC higher than 0.8 as shown in Table 2. In JJA, five models (CCSM4, IPSL-CM5B-LR, MRI-AGCM3-2S, MRI-AGCM3-2H and MRI-CGCM3) capture the spatial pattern of the precipitation (PCC > 0.8). A few models such as CMCC-CM and FGOALS-g2 have a substantial RMSE of more than 5 mm/day with particularly large precipitation errors over the region north of 10\(^{\circ }\)N in JJA. For the 850 hPa wind, most models can adequately simulate the spatial pattern of low-level winds, with nearly half of the models having PCC scores higher than 0.9 in DJF and only two models (FGOALS-g2 and MRI-CGCM3) having PCC scores less than 0.9 in JJA. In terms of magnitude of simulation errors, only the MMM and MRI-AGCM3-2H have RMSE less than 2 m s\(^{-1}\) in DJF. Most models have higher RMSE in JJA compared to DJF.

The MMMs for precipitation have higher PCC and lower RMSE scores than almost all individual models for all seasons. The MMMs also have PCC above 0.8 for both precipitation and low-level wind in all seasons. The better performance of the MMM in reproducing the observed mean precipitation is in agreement with other CMIP5 studies (Colman et al. 2011; Jourdain et al. 2013; Sperber et al. 2013; Feng et al. 2014) which found that the MMM outperforms individual models at reproducing the observed monsoon climate.

To condense the information from the spatial performance skill scores of all the models and compare them for different fields, we plot scatter diagrams of precipitation PCC and 850 hPa wind PCC in Fig. 3a and also the RMSE skill scores for the same fields in Fig. 3b (in other words, we generate scatter plots using pairs of columns from Table 2). The Pearson correlation coefficients (r) and Spearman’s rank correlation coefficients (sr) in the scatter plot of precipitation and 850 hPa wind in Fig. 3 suggest that the modeled precipitation biases are somewhat linked to the meridional circulation at 850 hPa, consistent with Sperber et al. (2013) for the Asian monsoon region. The two correlation coefficient types are comparable except for JJA and SON RMSE. This is because JJA and SON precipitation RMSE scores feature a number of model outliers with substantial biases of more than 4 mm/day and the Pearson correlation is sensitive to outliers. We will therefore use Spearman’s rank correlation in the remaining sections. The PCC values correlate better (>0.45) than the RMSE values except for the MAM season, in which the correlation is low for both. The positive linear relationship reflects the intrinsic moisture transport link between precipitation and winds in the tropics.

Fig. 3 Scatter plot of the AMIP5 seasonal mean a PCC and b RMSE of Maritime Continent (20\(^{\circ }\)S–20\(^{\circ }\)N, 80\(^{\circ }\)E–160\(^{\circ }\)E) precipitation versus 850 hPa winds for each season. The Pearson correlation coefficient (r) and Spearman’s rank correlation coefficient (sr) for each season are shown in the yellow box on the top left corner. Both correlation coefficients for PCC and r for RMSE are statistically significant with a p-value less than 0.05 for most seasons except for MAM. For RMSE Spearman’s rank correlation, only JJA sr is statistically significant with a p-value less than 0.05

The PCC in Fig. 3a also shows that the 850 hPa wind is better simulated than the precipitation in all four seasons, as found by Brown et al. (2013) over the Western Pacific monsoon region and Sperber et al. (2013) over the Asian monsoon region. This makes sense, since one might expect the large-scale flow, which is resolved, to be represented better than rainfall, which is parameterized.

Both the RMSE and PCC in Fig. 3 and Table 2 also show that the magnitude and spatial distribution of the biases vary in each model according to season, i.e., models that perform poorly in one season do not necessarily represent other seasons poorly. For example, FGOALS-g2 captures the DJF and MAM precipitation but poorly simulates the JJA and SON precipitation. GISS-E2-R performs poorly in simulating DJF (PCC 0.540) precipitation over the Maritime Continent but simulates other seasons well (PCC > 0.7). MRI-AGCM3-2S and MRI-AGCM3-2H capture both the precipitation and low-level winds in all seasons.

4 Investigating potential sources of model biases

In this section, we will analyse how the performance of models over the Maritime Continent depends on model characteristics, such as resolution, or on the representation of the global monsoon. This may give us clues as to what aspects of the models are most important in order to correctly represent the climate of the Maritime Continent. We focus on three possible sources of the Maritime Continent precipitation biases: the role of horizontal resolution, the relationship to biases in the local Hadley circulation and global monsoon, and the presence or lack of air-sea coupling.

4.1 Sensitivity of simulated mean climate to AMIP5 model resolution

Current GCMs still exhibit large precipitation biases over the Maritime Continent region. Qian (2008) suggests that insufficient representation of land-sea breezes associated with the under-representation of the islands and orography in the Maritime Continent in coarse-resolution global circulation models leads to the underestimation of precipitation over the Maritime Continent.

Neale and Slingo (2003) found that a decrease in grid spacing from about 350 to 110 km does not reduce the precipitation biases in the Maritime Continent whereas other more recent studies (Schiemann et al. 2014; Johnson et al. 2015) suggest increasing resolution improves the precipitation simulations. Schiemann et al. (2014) attributed the improvement of Maritime Continent precipitation to the better resolved boundary conditions (land-sea mask, soil and vegetation parameters) when the grid spacing decreased from approximately 350 to 110 km. Johnson et al. (2015) showed that better representation of the orography over the Maritime Continent at high resolution (approximately 40 km) improves precipitation over the islands compared to coarse resolution (approximately 200 km). However, Neale and Slingo (2003) used the older version of the Met Office model (HadAM3), whereas Schiemann et al. (2014) and Johnson et al. (2015) used the newer version of the Met Office Unified Model (MetUM). Despite different conclusions, these studies highlight the Maritime Continent as a region where the simulated mean climate has some sensitivity to resolution.

In this section, we will now assess the sensitivity of Maritime Continent precipitation to climate model resolution. Among the 28 AMIP5 models in this study, the highest horizontal resolution is \(0.2^{\circ } \times 0.2^{\circ }\), while the lowest resolutions are as coarse as \(3.7^{\circ } \times 1.9^{\circ }\) and \(2.8^{\circ } \times 2.8^{\circ }\) (refer to Table 1). The monthly Maritime Continent precipitation values were bi-linearly interpolated to a common \(3.75^{\circ } \times 3^{\circ }\) grid for the calculation of model skill scores.

More than half of the highlighted top 5 models with the highest PCC and lowest RMSE scores in Table 2 are from the higher resolution models, which are models with ranking number 1–6 in horizontal resolution, sorted from highest to lowest in Table 2. This suggests that the highest resolution models have lower biases on average, although these models may also have other advantages independent of resolution.

Fig. 4 Maritime Continent monthly precipitation a PCC for all 28 models divided into high (blue lines), medium (grey lines) and low resolution (red lines) groups and b PCC for eight models, where each pair of models belongs to the same institution and solid lines represent the higher resolution models while dashed lines represent the lower resolution models

We assess the sensitivity of Maritime Continent precipitation to climate model resolution by dividing the models into 3 categories, which are the 6 models with the highest resolutions (blue lines), the 6 models with the lowest resolutions (red lines) and the remaining 16 models at intermediate resolutions (grey lines), as shown in Fig. 4a. The individual model monthly precipitation PCCs in Fig. 4a suggest that not all high resolution models produce better precipitation simulations and vice-versa. The same calculations performed over the land-only precipitation and sea-only precipitation model skill scores (both PCC and RMSE) also show no clear relationship between resolution and model performance (not shown). A comparison between model pairs from the same institution with different resolutions in Fig. 4b shows that only the BCC-CSM1-1m model with higher resolution performs better than its corresponding lower resolution model, whereas the other lower resolution models perform better than their corresponding higher resolution models in most months. This suggests that the cause of deficiencies is largely unrelated to resolution, although these model pairs may have other differences in addition to resolution.

4.2 Mean meridional circulation and the global monsoon

Wang et al. (2014) found that ascent in the tropics also influences subsidence over the subtropics in the Hadley circulation and suggest that remote biases are linked with regional biases. To investigate if errors in the local scale over the Maritime Continent are related to errors in the large-scale movement of the ITCZ, we will assess local Hadley circulation and the global monsoon biases and their relationship with Maritime Continent precipitation biases.

4.2.1 Mean meridional circulation

The ERA-Interim mean meridional circulation in Fig. 5a, d shows seasonal variability in the intensities and locations of the ascending and descending branches of the local Hadley cell. In winter, the ascending branch is located around 5\(^{\circ }\)S while during summer, a broader ascending branch is located around 15\(^{\circ }\)N. In general, most of the models produce a similar structure and location of the local Hadley circulation with respect to ERA-Interim in all four seasons. The model with lowest RMSE, MPI-ESM-LR, is able to reproduce the Hadley Circulation with PCC above 0.9 for winter (Fig. 5b) and summer (Fig. 5e) monsoons.

Fig. 5 DJF mean meridional circulation averaged over the Maritime Continent region (80\(^{\circ }\)E–160\(^{\circ }\)E) using omega (shading and arrow, Pa s\(^{-1}\)) and v (arrow, m s\(^{-1}\)) from a ERA-Interim and b the AMIP5 model with lowest RMSE and c the AMIP5 model with highest RMSE. d–f are as in a–c but for JJA season. RMSE between observed and simulated local Hadley Circulation is above each panel

The location of the ascending branch of the local Hadley circulation is consistent with the seasonal shift of the ITCZ and maximum precipitation. The reduced ascent over the Maritime Continent in CNRM-CM5 (Fig. 5c) is connected with dry biases over the Maritime Continent in DJF. The overly strong ascent around 15\(^{\circ }\)N simulated in FGOALS-g2 (Fig. 5f) in JJA is associated with strong overestimation of rainfall over the region.

Table 3 Spearman’s rank correlation between AMIP5 model skill scores at simulating the local Hadley circulation and skill scores for simulation of Maritime Continent (20\(^{\circ }\) S–20\(^{\circ }\)N, 80\(^{\circ }\)E–160\(^{\circ }\)E) precipitation for the seasons indicated

The correlation between RMSE skill scores of local Hadley circulation and precipitation in Table 3 also suggests that the seasonal mean rainfall biases over the Maritime Continent are linked to the local Hadley Circulation biases. The RMSE correlates better than the PCC except for the MAM season, which is low for both. The Spearman’s rank correlation coefficients for RMSE are above 0.35 and statistically significant (\(p<0.05\)) for all seasons except for MAM. Thus there is some connection between errors at the relatively small scale of the Maritime Continent and the larger global-scale circulation.

4.2.2 Global monsoon

Next, we investigate if the models that have a better representation of the monsoons on the global scale (solsticial and equinoctial modes using the metrics defined in Sect. 2.4) have a better representation of the seasonal mean and annual cycle of precipitation over the Maritime Continent.

Fig. 6 Comparison of the spatial pattern of the solsticial mode (JJA minus DJF) between a GPCP, b the AMIP5 model with the lowest RMSE and c the AMIP5 model with the highest RMSE. d–f are as in a–c but for equinoctial mode (MAM minus SON). The PCC and RMSE calculated between observed and simulated patterns (in the domain 45\(^{\circ }\)S–45\(^{\circ }\)N and 0\(^{\circ }\)–360\(^{\circ }\)E) are above each panel

The global solsticial mode (JJA minus DJF) in Fig. 6b and equinoctial mode (MAM minus SON) in Fig. 6e for the model with the lowest RMSE (MRI-AGCM3-2H) are in good agreement with observations (Fig. 6a, d), but the model generally overestimates the overall amplitude of the solsticial and equinoctial precipitation signal. The PCCs are comparable for both solsticial and equinoctial modes but the RMSE scores are slightly larger for the solsticial mode. The model with the highest RMSE, FGOALS-g2, simulates an overly strong precipitation amplitude for both solsticial (Fig. 6c) and equinoctial (Fig. 6f) modes, especially over the tropical Indian and western Pacific Oceans.
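The two global monsoon modes and the two skill scores can be sketched as follows. This is a minimal illustration only: it assumes a 12-month climatology on a regular grid, a centred pattern correlation and no area weighting, none of which is confirmed by the metric definitions in Sect. 2.4. The synthetic fields and all function names are hypothetical.

```python
import numpy as np

# 0-based month indices for each season.
DJF, MAM, JJA, SON = [11, 0, 1], [2, 3, 4], [5, 6, 7], [8, 9, 10]

def seasonal_mean(clim, months):
    """Mean of a (12, nlat, nlon) monthly climatology over the given months."""
    return clim[months].mean(axis=0)

def solsticial_mode(clim):
    """JJA minus DJF."""
    return seasonal_mean(clim, JJA) - seasonal_mean(clim, DJF)

def equinoctial_mode(clim):
    """MAM minus SON."""
    return seasonal_mean(clim, MAM) - seasonal_mean(clim, SON)

def pcc(a, b):
    """Centred pattern correlation (unweighted: an assumption of this sketch)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def rmse(a, b):
    """Root mean square difference between two fields (unweighted)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic "observations" and a "model" with the same pattern but an
# amplitude that is 20% too large.
rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=2.0, size=(12, 36, 144))
model = 1.2 * obs

for name, mode in [("solsticial", solsticial_mode), ("equinoctial", equinoctial_mode)]:
    print(name, "PCC:", pcc(mode(obs), mode(model)), "RMSE:", rmse(mode(obs), mode(model)))
```

In this synthetic case the pattern correlation is perfect while the RMSE is non-zero, which mirrors the situation described above: a model can capture the spatial pattern of a mode and still be penalised for overestimating its amplitude.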

Next we plot the local annual cycle of precipitation in these examples, as a time-latitude diagram averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E in Fig. 7. We can see that both MRI-AGCM3-2H and FGOALS-g2 also simulate overly strong monsoon precipitation, especially in summer and autumn seasons, which is consistent with the solsticial and equinoctial biases in Fig. 6. This suggests that Maritime Continent precipitation biases are related to global monsoon biases. This also indicates how closely the seasonal movement of the global-scale ITCZ is related to local precipitation over the Maritime Continent. The PCC and RMSE of Maritime Continent annual cycle precipitation with respect to GPCP for all AMIP5 models are listed in Table 2.
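A time-latitude diagram of this kind, together with the monthly position of maximum precipitation, can be sketched as below. The synthetic rain band, the 2.5\(^{\circ }\) grid and the function names are assumptions made for illustration and do not reproduce the data used in this study.

```python
import numpy as np

def time_latitude(clim, lon, lon_min=80.0, lon_max=160.0):
    """Average a (12, nlat, nlon) climatology over longitudes in [lon_min, lon_max]."""
    sel = (lon >= lon_min) & (lon <= lon_max)
    return clim[:, :, sel].mean(axis=2)  # shape (12, nlat)

def latitude_of_maximum(tl, lat):
    """Latitude of the maximum value in each month of a (12, nlat) diagram."""
    return lat[np.argmax(tl, axis=1)]

# Hypothetical regular grid.
lat = np.arange(-20.0, 20.1, 2.5)
lon = np.arange(0.0, 360.0, 2.5)

# Synthetic rain band whose centre moves from 10S in January to 10N in July.
centre = -10.0 * np.cos(2.0 * np.pi * np.arange(12) / 12.0)       # (12,)
band = np.exp(-((lat[None, :] - centre[:, None]) / 5.0) ** 2)     # (12, nlat)
clim = 10.0 * band[:, :, None] * np.ones(lon.size)                # (12, nlat, nlon)

tl = time_latitude(clim, lon)
track = latitude_of_maximum(tl, lat)
print(track)
```

The array `track` plays the role of the dashed line in a latitude-time plot: it traces the month-by-month migration of the precipitation maximum across the equator.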

Fig. 7 Latitude-time plot of precipitation zonally averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E for a GPCP, b MRI-AGCM3-2H and c FGOALS-g2. White dashed line shows the position of the maximum precipitation each month. Precipitation biases with respect to GPCP are shown for this same temporal-spatial averaging for d MRI-AGCM3-2H and e FGOALS-g2

Table 4 Spearman’s rank correlation between skill scores at simulating the global monsoon (solsticial mode and equinoctial mode) and both the local Hadley circulation and precipitation over the Maritime Continent respectively

To explore this further beyond the two models shown in Fig. 7, Table 4 shows the Spearman’s rank correlations of scores representing skill at simulating the solsticial mode or equinoctial mode with scores at simulating local Hadley circulation and Maritime Continent precipitation. Most of the correlation coefficient values are above 0.4 (\(p<0.05\)) suggesting that those models having a better representation of the global monsoons (solsticial and equinoctial modes) will also have a better representation of the mean meridional circulation and precipitation pattern over the Maritime Continent region. Thus our analysis demonstrates a connection all the way from the skill at simulating the global-scale circulation down to the regional scale of Maritime Continent precipitation in the AMIP5 models.

4.3 Sensitivity of simulated mean climate to ocean-atmosphere coupling

Ocean-atmosphere coupling is important for monsoon simulation. Song and Zhou (2014) compared the coupled and uncoupled simulations from CMIP5 and found that air-sea coupling improves the East Asian Summer Monsoon simulation in CMIP5 models. On the other hand, while many errors arise from cloud and convective parameterizations affecting coupled and atmosphere-only models alike, other errors arise through coupled feedbacks. Li and Xie (2014) attributed the double ITCZ problems in CMIP5 to cloud simulation errors in the atmospheric model, while the equatorial Pacific cold tongue errors were attributed to ocean-atmosphere feedbacks.

In this section, we first repeated the analysis of Sect. 3 for 46 coupled versions of CMIP5 models and summarise the results briefly here. We found some similarities between CMIP5 and AMIP5. For instance, the MMM has better skill at reproducing the observed mean climate than individual models. The 850 hPa wind is better simulated than the precipitation in all four seasons in terms of its pattern correlation. The coupled versions of CMIP5 also have a significant spread in model performance. Models that perform well in one season do not necessarily represent other seasons well.

To determine the potential impact of SST biases in simulating Maritime Continent precipitation, we made a comparison between 22 coupled CMIP5 model simulations and their corresponding AMIP simulations, which are a subset of the 28 models analysed in the previous sections. Only a subset is used because modelling groups often run several different versions of their models, and not all coupled and atmosphere-only models from a given group are directly equivalent. The mean-state climate and biases of these 22 CMIP5 models are almost identical to those of the 46 CMIP5 models (not shown). The same result holds for AMIP5, whereby the 22 models have similar mean state biases to those of the 28 AMIP5 models. The MMMs shown in the remainder of this section consist of the 22 corresponding CMIP5 and AMIP5 models respectively.

Fig. 8 Comparison between CMIP5 and AMIP PCC of a Maritime Continent annual cycle precipitation averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E, b solsticial mode, c equinoctial mode. MMMs are plotted in black color

In Fig. 8a, the comparison between the PCCs of the 22 AMIP5 and CMIP5 models in reproducing the annual cycle (time-latitude) precipitation shows that most CMIP5 models from a given modelling group perform better than their corresponding AMIP5 models. AMIP5 and CMIP5 MMMs have very similar values of PCC with observations (0.968 and 0.969) and RMSE (0.99 and 0.98 mm/day, not shown). However, the PCCs (as well as RMSE) for individual models from CMIP5 and AMIP5 vary greatly, and 14 of the 22 CMIP5 models have higher PCC scores than their corresponding AMIP5 models. This seems to suggest that air-sea coupling improves the simulation of the Maritime Continent annual cycle precipitation despite the inevitable SST biases. However, AMIP5 models generally outperform CMIP5 models in the simulation of solsticial (Fig. 8b) and equinoctial (Fig. 8c) modes. This opposite result suggests that SST biases and ocean-atmosphere feedback errors introduce larger biases in the coupled models at the large scale. Fig. 8 also indicates that there is a clear lack of correlation between a model’s performance at simulating the annual cycle of precipitation in atmosphere-only mode and in coupled mode. This is also the case for the patterns of the equinoctial and solsticial modes.
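The paired comparison behind a statement such as "14 of the 22 CMIP5 models have higher PCC scores" amounts to counting, model by model, the cases in which the coupled run scores higher than its atmosphere-only counterpart. A minimal sketch, with hypothetical model names and PCC values, is:

```python
import numpy as np

# Hypothetical PCC scores for five coupled/atmosphere-only pairs
# (illustrative values only, not results from this study).
models = ["model-A", "model-B", "model-C", "model-D", "model-E"]
amip_pcc = np.array([0.95, 0.90, 0.93, 0.88, 0.96])
cmip_pcc = np.array([0.96, 0.92, 0.91, 0.90, 0.94])

# Element-wise comparison keeps each coupled run paired with its own AMIP run.
improved = cmip_pcc > amip_pcc
n_improved = int(improved.sum())

print(f"{n_improved} of {len(models)} coupled runs have a higher PCC")
for name, better in zip(models, improved):
    print(f"  {name}: {'coupled better' if better else 'atmosphere-only better'}")
```

Keeping the comparison paired matters here because the ensemble means can be nearly identical even when individual models move substantially in either direction.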

Although most AMIP5 models simulate the seasonal mean local Hadley circulations better than CMIP5 for all seasons (figure not shown), we also found mixed results for model skill at reproducing the seasonal precipitation and 850 hPa wind patterns over the Maritime Continent. CMIP5 better simulates both JJA and SON precipitation and low-level winds, whereas AMIP5 shows better simulation of the DJF and MAM seasonal mean climate. This suggests that air-sea coupling can be important for Maritime Continent climate simulation but its impact is complex.

Table 5 Spearman’s rank correlation of skill at simulating Maritime Continent precipitation with skill at simulating each of the four listed fields for the 22 CMIP5 models

CMIP5 (coupled) models show a weaker correlation between skill scores for simulating Maritime Continent precipitation and skill scores for simulating the local Hadley circulation than AMIP5 models do. The correlation coefficients of skill scores for simulating Maritime Continent precipitation with skill scores for simulating the low-level wind are also lower in CMIP5 than in AMIP5. However, CMIP5 skill scores for simulating Maritime Continent precipitation have a stronger correlation with skill scores for simulating the global monsoon solsticial mode than those of AMIP5 (see Table 5).

Coupling adds extra complexity that is beyond the scope of this paper; it will be a focus of later work examining SST biases.

5 Clustering of the AMIP5 Maritime Continent annual cycle precipitation

Cluster analysis is used for classification of homogeneous climate patterns and weather regimes in climate studies (Unal et al. 2003; Bao and Wallace 2015). It can also be used to group ensemble members for operational forecasting purposes (Molteni et al. 1996; Legg et al. 2002) and to classify CMIP5 climate change projections (Masson and Knutti 2011; Mizuta et al. 2014).

In this study, hierarchical clustering analysis of Maritime Continent annual cycle precipitation was performed to characterize model systematic biases in the AMIP5 runs and determine if these biases are related to common factors elsewhere in the tropics. We chose to perform the clustering analysis on the annual cycle time-latitude diagram since it is a single map which contains information from a range of seasons. The annual cycle diagnostic captures the seasonal movement of the ITCZ in the Maritime Continent domain, so it is a diagnostic at the intersection of global and local scale.

Fig. 9 a Hierarchical clustering dendrogram. The models in the same colors are in the same clustering group while the distances greater than or equal to the threshold are colored black. b Box-and-whisker plot of PCC between AMIP5 simulations and GPCP observations of Maritime Continent annual cycle precipitation averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E. Green dots are models’ PCC; magenta lines indicate the median; red dots represent the mean and blue boxes indicate the interquartile range (IQR). The plus signs are the outliers, which are PCC scores smaller than the lower quartile by at least 1.5 times the IQR

In cluster analysis, the models that are similar are grouped together based on minimizing the Euclidean distance of the Maritime Continent annual cycle precipitation between each pair of models or model clusters using the complete linkage method as described in Sect. 2.5. The dendrogram in Fig. 9a shows that the clustering resulted in five clusters with 13, 8, 4, 2 and 1 models, respectively.
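A complete-linkage hierarchical clustering on Euclidean distances can be sketched with SciPy as follows. The six synthetic "models", the two underlying patterns and the choice to cut the tree into exactly two clusters (`criterion="maxclust"`) are assumptions for illustration; the analysis in this paper uses a distance threshold on the dendrogram, as described in Sect. 2.5.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
n_months, n_lat = 12, 17

# Two hypothetical time-latitude patterns; pattern_b is uniformly wetter.
pattern_a = 10.0 * rng.random((n_months, n_lat))
pattern_b = pattern_a + 5.0

# Six synthetic "models": three scattered around each pattern.
diagrams = np.stack(
    [pattern_a + 0.1 * rng.standard_normal(pattern_a.shape) for _ in range(3)]
    + [pattern_b + 0.1 * rng.standard_normal(pattern_b.shape) for _ in range(3)]
)

# Each model's time-latitude diagram becomes one feature vector.
X = diagrams.reshape(len(diagrams), -1)  # shape (6, 204)

# Complete linkage: the distance between two clusters is the largest
# Euclidean distance between any pair of their members.
Z = linkage(X, method="complete", metric="euclidean")

# Cut the tree into two clusters (an assumption of this sketch).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind shown in a dendrogram figure.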

To check that the clustering analysis is robust and that similar models are grouped together, we calculated the PCCs between AMIP5 simulations and GPCP observations and plotted these by cluster in a box-and-whisker plot (Fig. 9b). Clusters I and II are quite distinct in terms of their PCC scores: almost all of the models in Cluster I perform better than the Cluster II models in simulating the annual cycle of precipitation. Cluster III consists of only four models and has a large spread in PCC values. Clusters IV and V consist of only two models and one model respectively, and these are also outliers from the overall sample. Consequently, for the composite analysis in Sect. 5.1, we consider only Clusters I and II. Although there is one outlier each in Clusters I and II, we also examined the Euclidean distances between each AMIP5 simulation and the GPCP observations, as well as the Euclidean distances between every pair of models across all clusters (figure not shown). We found that each model lies closer to the members of its own cluster than to models in other clusters. This shows that the clustering analysis successfully groups together the models that are most similar.
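The outlier criterion used in the box-and-whisker plot (a score lying below the lower quartile by at least 1.5 times the IQR) can be written as a short function. The PCC values below are hypothetical, and the quartiles use NumPy’s default linear interpolation, which may differ slightly from the convention of the plotting software used for the figure.

```python
import numpy as np

def low_outliers(scores):
    """Scores below the lower quartile by at least 1.5 times the IQR."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    return scores[scores <= q1 - 1.5 * iqr]

# Hypothetical PCC scores for the members of one cluster.
pcc_scores = np.array([0.95, 0.94, 0.96, 0.93, 0.95, 0.97, 0.80])
print(low_outliers(pcc_scores))
```

Here only the score of 0.80 falls below the lower fence, so it would be drawn as a plus sign while the remaining six scores define the box and whiskers.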

The dendrogram in Fig. 9a also shows that models from the same institution mostly belong to the same clusters (Masson and Knutti 2011; Mizuta et al. 2014). For example, the models from the same institutions such as MRI (AGCM3.2 models), GFDL (HIRAM models), CSIRO-BOM (ACCESS models), IPSL (CM5A models) and MPI (ESM models) are in the same cluster. Models that share the same atmospheric model also tend to cluster. The ACCESS models are based on the UK Met Office HadGEM atmospheric component and are in the same cluster as the HadGEM2-A model. The NorESM1-M model uses the same atmospheric model as CCSM4 (Bentsen et al. 2013), and BNU-ESM uses a similar atmosphere based on the Community Atmosphere Model version 4 (CAM4); both models are in the same cluster as CCSM4. On the other hand, MRI-CGCM3 is an Earth System Model, with its atmosphere component interactively coupled to an aerosol model, and it is in a different cluster from the less complex MRI-AGCM3.2H and MRI-AGCM3.2S models.

5.1 Composites of mean climate simulation biases for the leading two clusters of the AMIP5 models

In this section, the composites of Clusters I and II are obtained by taking the average of all models in each cluster. Firstly, we looked at the latitude-time plot of precipitation averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E in Fig. 10 (first column), the metric on which the models are clustered, which shows the transitions of precipitation during the course of the annual cycle. Cluster I simulates a similar seasonal migration of precipitation over the Maritime Continent (Fig. 10a) to GPCP (Fig. 7a). Cluster I is also able to capture both the winter monsoon and summer monsoon shift and also the movement of the ITCZ, but it overestimates the precipitation, especially during the JJA and SON seasons (Fig. 10b). Cluster II simulates less seasonal migration and the position of maximum rainfall stays closer to the equator throughout the year (Fig. 10e, f). The PCC and RMSE for the composites of the two clusters show that Cluster I has better skill than Cluster II at simulating the annual cycle climatology of precipitation.
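The composite of a cluster is simply the average over its member models, and the composite bias is the difference between that average and the observations. A minimal sketch, using synthetic fields and hypothetical cluster labels, is:

```python
import numpy as np

def cluster_composites(fields, labels):
    """Average (n_models, 12, nlat) fields over the members of each cluster."""
    return {int(k): fields[labels == k].mean(axis=0) for k in np.unique(labels)}

# Synthetic "observed" time-latitude diagram and four synthetic "models"
# that differ from it by constant offsets (illustrative values only).
rng = np.random.default_rng(2)
obs = 10.0 * rng.random((12, 17))
fields = np.stack([obs + 1.0, obs + 3.0, obs - 1.0, obs - 2.0])
labels = np.array([1, 1, 2, 2])  # hypothetical cluster membership

composites = cluster_composites(fields, labels)
biases = {k: comp - obs for k, comp in composites.items()}

for k, bias in biases.items():
    print(f"cluster {k}: mean bias = {bias.mean():+.2f} mm/day")
```

Averaging within a cluster suppresses errors that are specific to individual models, so the composite bias emphasises the systematic error shared by the cluster members.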

Fig. 10 Cluster I a latitude-time plot of precipitation averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E; the white dashed line indicates the position of maximum precipitation for each month, roughly illustrating the ITCZ. b Precipitation biases with respect to GPCP. Cluster I zonal mean meridional circulation omega (shading and arrow, Pa s\(^{-1}\)) and v (arrow, m s\(^{-1}\)) biases with respect to ERA-Interim averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E for c DJF and d JJA. e–h are as in a–d but for Cluster II. PCC and RMSE are shown above each panel

To see if these errors in Maritime Continent ITCZ position are related to the larger-scale overturning circulation, we next investigate the relationship between precipitation biases and local Hadley circulation biases during DJF and JJA seasons in the two clusters. We can see from Fig. 10b, f that Cluster II has larger precipitation biases than Cluster I in DJF. The Cluster I dry biases correspond to local Hadley circulation subsidence biases in the southern Maritime Continent. Cluster II wet biases over the northern Maritime Continent and dry biases over the southern Maritime Continent are consistent with the ascent and descent biases in Fig. 10g.

During JJA, the clusters show different biases in precipitation. Cluster I has the correct pattern but too large a magnitude of precipitation in the observed wet regions of the Maritime Continent. There is a slight discrepancy between the wet biases and ascent biases. Note that, for regions with near-zero rainfall biases or small positive precipitation biases, there are sometimes downward motion biases (Fig. 10d, \(\simeq\)7\(^{\circ }\)N). This discrepancy may occur because GPCP observations are used for precipitation and ERA-Interim for vertical motion in this study. Cluster II has dry biases north of the equator and wet biases to the south. These biases are consistent with the biases in ascent and descent of the local Hadley circulations in Fig. 10h. Cluster II simulates the ascending branch too far south, which results in overestimation of rainfall in the southern Maritime Continent and underestimation in the northern Maritime Continent. This shows that the precipitation biases in the Maritime Continent are linked closely to the (local) Hadley circulation.

Fig. 11 Zonal mean precipitation in the global domain (averaged between 0\(^{\circ }\) and 360\(^{\circ }\)E) for a GPCP, b Cluster I and c Cluster II; the white dashed line indicates the position of maximum precipitation for each month, roughly illustrating the ITCZ. Biases with respect to GPCP for d Cluster I and e Cluster II. PCC and RMSE are shown above each panel

Examining the tropics-wide properties of the clusters, we find that the mean precipitation averaged over the whole tropics between 0\(^{\circ }\) and 360\(^{\circ }\)E in Fig. 11d, e shows similar biases to the precipitation biases averaged between 80\(^{\circ }\)E and 160\(^{\circ }\)E in Fig. 10b, f when accounting for the tropics-wide mean wet biases of both clusters. This confirms what we saw earlier for individual model analysis, that Maritime Continent precipitation biases are closely related to global monsoon biases. The errors in movement of the ITCZ over the Maritime Continent are thus related to global ITCZ errors. Separating the precipitation biases into land-only and sea-only grid points (figure not shown) shows that the wet biases over sea-only grid points dominate the tropics-wide errors.

Fig. 12 GPCP precipitation and ERA-Interim 850 hPa wind for a DJF and b JJA seasons, and biases for Cluster I in c DJF and d JJA seasons and Cluster II in e DJF and f JJA seasons. PCC and RMSE are shown above each panel

Figure 12 shows the DJF and JJA seasonal mean GPCP precipitation and 850 hPa winds over the tropics along with Cluster I and Cluster II mean biases. Clusters I and II have somewhat similar biases over the Indian Ocean in both the DJF and JJA seasons. In DJF, Cluster I (Fig. 12c) simulates an overly strong South Pacific Convergence Zone (SPCZ) whereas Cluster II (Fig. 12e) underestimates the SPCZ. In JJA, Cluster I (Fig. 12d) simulates an overly wet Western North Pacific (WNP) while Cluster II (Fig. 12f) underestimates the precipitation over the region. Bush et al. (2015) highlighted the WNP as a region with too much rainfall in the MetUM, a more recent version of the MOHC HadGEM2-A model from CMIP5, which is evident in Cluster I.

Cluster I shows dry biases over India in JJA (Fig. 12d), and it shows wet biases over the east-central equatorial Pacific and dry biases in the Maritime Continent, Australia and northern South America in both seasons (Fig. 12c, d). For Cluster II, DJF biases in Fig. 12e are associated with a mean shift in the global monsoon whereby Cluster II simulates maximum precipitation further north than GPCP.

Figure 13 shows scatter plots of AMIP5 Cluster I and Cluster II seasonal mean skill scores at simulating Maritime Continent precipitation versus skill scores at simulating the global monsoon solsticial and equinoctial modes. The MMMs in each cluster outperform the individual models in their cluster for all seasons. Although the individual model skill scores vary greatly, the Cluster I MMMs perform better than the Cluster II MMMs in simulating almost all of the local and global properties. This is consistent with our earlier results suggesting Cluster I simulates a realistic movement of the ITCZ, capturing the rainfall pattern but overestimating the rainfall, whereas in Cluster II, the ITCZ does not move as observed, giving an unrealistic pattern of rainfall which impacts both the PCC and RMSE scores. The only exception is Maritime Continent precipitation RMSE in SON, indicating that Cluster II has a slightly more realistic representation of the autumn rainfall, when the ITCZ is close to the equator anyway.

The RMSE of seasonal mean precipitation of both clusters has a strong significant correlation with the RMSE of solsticial (Fig. 13b, d) and equinoctial modes (Fig. 13f, h), except for Cluster II JJA RMSE. This indicates that the amplitude of the particular global monsoon and Maritime Continent biases singled out by these clusters are linked, just as the global monsoon biases and Maritime Continent biases are linked for all models (Sect. 4.2.2).

In Fig. 13a, c, Cluster I, which is able to capture both the winter monsoon and summer monsoon patterns, shows a significant positive correlation with the solsticial mode, whereas Cluster II, which has less similarity in the spatial patterns with GPCP and simulates the position of maximum rainfall too close to the equator, shows no correlation with the solsticial mode (off-equatorial monsoon) PCCs. However, Cluster II Maritime Continent precipitation shows a stronger correlation with the equinoctial mode (Fig. 13e, g), perhaps because the location of the ITCZ in spring and autumn seasons is closer to the equator.

Fig. 13 Scatter plots of the AMIP5 Cluster I (blue) and Cluster II (red) seasonal mean a DJF PCC, b DJF RMSE, c JJA PCC and d JJA RMSE of Maritime Continent precipitation versus skill scores at simulating the solsticial mode; and e MAM PCC, f MAM RMSE, g SON PCC and h SON RMSE of Maritime Continent precipitation versus skill scores at simulating the equinoctial mode. The Maritime Continent domain is 20\(^{\circ }\)S–20\(^{\circ }\)N and 80\(^{\circ }\)E–160\(^{\circ }\)E, whereas the solsticial and equinoctial mode domains are 45\(^{\circ }\)S–45\(^{\circ }\)N and 0\(^{\circ }\)–360\(^{\circ }\)E. The Pearson correlation coefficient (r) and Spearman’s rank correlation coefficient (sr) for each season are shown in the yellow box in the top left corner. All correlation coefficients for PCC and RMSE are statistically significant (\(p<0.05\)) except those for Cluster II DJF PCC, Cluster I MAM PCC, Cluster II JJA PCC and Cluster I SON PCC

The results of the cluster analysis are in good agreement with the analysis in the previous section, which showed that Maritime Continent precipitation biases are closely related to local Hadley circulation biases and biases in the global monsoon. These relationships are stronger for the solsticial mode in Cluster I and the equinoctial mode in Cluster II.

6 Discussion

This study evaluates the AMIP5 model performance in simulating the mean climate over the Maritime Continent and investigates the model characteristics that may be potential sources of bias.

Our results in Sect. 4.1 agree with Neale and Slingo (2003) that model performance is largely unrelated to resolution. Although the range of resolutions in AMIP5 is comparable to the size of several typical Maritime Continent islands, even the higher resolution models in AMIP5 insufficiently resolve some of the smaller islands and steeper orography in the Maritime Continent. The re-gridding of monthly Maritime Continent precipitation to a common lower \(3.75^{\circ } \times 3^{\circ }\) resolution in this analysis may also obscure some of the benefit of high resolution. Model performance may fail to improve as resolution is increased because these models all still rely on convective parameterisations.

Our analysis of the relationship between local Hadley circulation biases and precipitation biases in the Maritime Continent suggests that the seasonal mean biases in the region are linked to the local Hadley Circulation biases. The correlations between the global monsoon (solsticial and equinoctial modes) and the Maritime Continent precipitation show some connection of errors at the regional scale of the Maritime Continent with errors at the larger global circulation scale. The same results hold for both Cluster I and Cluster II in Sect. 5.

On the other hand, ocean-atmosphere coupling impacts are more complex, with mixed results found in this study when analysing model skill at reproducing the mean climate over the Maritime Continent and the global monsoon and circulation. Our results seem to suggest that air-sea coupling improves the simulation of Maritime Continent annual cycle precipitation despite the inevitable SST biases. However, SST biases and ocean-atmosphere feedback errors introduce larger biases in the coupled models at larger scales. Hendon (2003) suggests that the SST changes feed back on the surface winds and thus affect the Walker circulation and precipitation. Future work will look at SST biases over the Maritime Continent and investigate their relationship with precipitation and circulation biases.

Apart from the three potential sources of bias discussed in Sect. 4, i.e. the role of horizontal resolution, the relationship to biases in the local Hadley circulation and global monsoon, and the presence or lack of air-sea coupling, there are also other factors. One possible common source of error for global monsoon biases and local Maritime Continent biases is the parametrization of cumulus convection. Studies have shown that tropical precipitation simulation is highly sensitive to the convection scheme (Sherwood et al. 2014; Bush et al. 2015). Ackerley et al. (2014) suggest that wet biases over Australia in summer in the BNU-ESM, NorESM1-M and CCSM4 models might be related to the convection schemes that are used in these models. Here we briefly examine the relationship between the convection scheme and the model biases. Table 6 shows the convection scheme and type of convective closure for each of the AMIP5 models, arranged by cluster group. Although the members of a given cluster use different convection schemes and closures, models from the same institution that use different convection schemes do not cluster together. This includes GFDL-HIRAM and GFDL-CM3; IPSL-CM5A and IPSL-CM5B; FGOALS-s2 and FGOALS-g2. The IPSL-CM5A-LR and IPSL-CM5A-MR atmospheric models, which differ only in resolution, are in the same cluster, whereas the IPSL-CM5B models, which involved substantial changes in the atmospheric model including the convection scheme and closure (Dufresne et al. 2013), are in different clusters. On the other hand, the BCC-CSM models are the only ones which use the same convection scheme, and differ only in resolution, but are not in the same cluster.

Table 6 CMIP5 models convection scheme and closure

These results suggest that the convection scheme can be important for model simulation of the annual cycle precipitation in the Maritime Continent, but further work would be necessary to characterize the biases according to convection scheme, which is beyond the scope of this paper.

7 Conclusions

This paper examines the fidelity of CMIP5 models in simulating mean climate over the Maritime Continent, focusing mainly on the uncoupled versions of the models. The 28 CMIP5 model simulations for the 30-year period (1979–2008) in AMIP configuration with prescribed SSTs and sea ice (AMIP5) are compared with observational datasets. We quantify the model performance based on the pattern correlation coefficient (PCC) and root mean square error (RMSE) skill scores. We find that there is a considerable spread in the performance of the 28 AMIP5 models in reproducing the seasonal mean climate and seasonal cycle over the Maritime Continent region. Model performance is not necessarily consistent across seasons. A model with high skill in one season does not necessarily represent other seasons well. The multi-model mean (MMM) has better skill at reproducing the observed climate than individual models, in common with other studies of monsoon regions (e.g. Colman et al. 2011; Jourdain et al. 2013; Sperber et al. 2013; Feng et al. 2014). The PCC comparison also shows that models have higher skill at simulating the Maritime Continent 850 hPa wind than the precipitation in all four seasons.

We also investigated the possible sources of the model biases. Our assessment of the sensitivity of Maritime Continent precipitation to climate model resolution suggests that, at the resolutions typical of AMIP5, the model performance is largely unrelated to model horizontal resolution. Instead, our analyses show that the local Maritime Continent biases are somewhat related to global circulation and global monsoon biases. The models that have a better representation of the local Hadley Circulation and global monsoons have a better representation of the seasonal means of precipitation and winds over the Maritime Continent.

The analysis was repeated for 46 coupled versions of CMIP5 models, which we called “CMIP5”, and we found results similar to those for AMIP5. For instance, the MMM has better skill at reproducing the observed mean climate than the individual CMIP5 models. The 850 hPa wind is better simulated than the precipitation in all four seasons. CMIP5 models also showed significant spread in model performance. The comparison between 22 pairs of CMIP5-AMIP5 simulations shows that most CMIP5 models perform better than their AMIP5 counterparts in reproducing the annual cycle of Maritime Continent precipitation. However, AMIP5 models generally outperform CMIP5 models in simulating the global monsoon. Although most AMIP5 models simulate the seasonal mean local Hadley circulation better for all seasons compared to CMIP5, we found mixed results for model skill at reproducing the seasonal mean precipitation and 850 hPa winds over the Maritime Continent. In addition, CMIP5 models show weaker correlation between the Maritime Continent precipitation biases and both the local Hadley circulation biases and the low-level wind biases, but stronger correlation with the global monsoon solsticial mode biases, compared to AMIP5.

Hierarchical clustering analysis of Maritime Continent annual cycle precipitation was performed to characterize model systematic biases in the AMIP5 runs and determine if these biases are related to common factors elsewhere in the tropics. Our analysis resulted in two distinct clusters. Cluster I is able to reproduce the observed seasonal migration of Maritime Continent precipitation, but it overestimates the precipitation, especially during the JJA and SON seasons. On the other hand, in Cluster II the maximum rainfall position is too close to the equator throughout the year. The tropics-wide properties of these clusters also indicate a connection all the way from the skill of simulating the global properties down to skill at simulating the regional scale of Maritime Continent precipitation.

The present study therefore highlights the importance of global monsoon and circulation simulations in AMIP5, which are significantly associated with the mean climate simulation biases over the Maritime Continent. On the other hand, the impacts of ocean-atmosphere coupling are more complex; future work will examine SST biases over the Maritime Continent and investigate their relationship with precipitation and circulation biases.