1 Introduction

West Africa is one of the areas in the world that has had significant climate anomalies in the past century. The dramatic change from wet conditions in the 1950s to much drier conditions in the 1970s and 1980s over West Africa represents one of the strongest interdecadal signals on the planet in the twentieth century (Redelsperger et al. 2006). The drought in this area since the late 1970s is the most severe and longest at continental scale in the world during that century (IPCC 2007). The West African climate is dominated by the West African monsoon (WAM) system, with mean annual rainfall between 150 and 2,500 mm. Monsoon circulations are forced and maintained by land–sea thermal contrasts and by latent heat released into the atmosphere. Following the seasonal northward migration of the Inter-tropical Convergence Zone (ITCZ), the monsoon develops during the northern spring and summer, with a rapid northward WAM jump from 5°N in May–June to 10°N in July–August (Sultan and Janicot 2000). The WAM brings the associated rainfall maxima to their northernmost locations in August and then withdraws to the south in September and October. This is the West African monsoon rainy season. The seasonal characteristics of monsoon rainfall (i.e., onset, jump, length, and termination of the rainy season), seasonal rainfall amount, and intraseasonal rainfall distribution during the rainy season show high interannual variability (e.g., Fontaine and Janicot 1996; Le Barbé et al. 2002). A comprehensive investigation of these WAM features is of prime importance for understanding and predicting the seasonal, interannual, and interdecadal variability, anomalies, and drought in West Africa. Such understanding and predictive ability are crucial for the development of the fragile West African economy (Redelsperger et al. 2006).

Although numerous diagnostic studies have been conducted to investigate the WAM, there are relatively few general circulation model (GCM) studies to explore the WAM seasonal predictability and mechanisms associated with WAM variability (e.g., Rowell et al. 1995; Douville et al. 2001; Xue et al. 2004). Some key research issues remain with regard to our understanding of WAM variability and important associated features, such as the African Easterly Jet (AEJ) and the impacts of aerosol, oceanic, and land processes. No GCMs with either prescribed sea surface temperature (SST) forcing, land forcing, or aerosol forcing were able to produce even half the magnitude of the West African droughts (e.g., Xue 1997; Hoerling et al. 2006; Yoshioka et al. 2007).

Part of the difficulty is due to the inability of climate models to simulate the fundamental features of the WAM and feedbacks among the different main processes, which operate at multiple temporal and spatial scales. More research is required to systematically evaluate climate models and to exploit fully the observational data, in order to improve the WAM prediction. Thus far, there have been very few studies evaluating GCMs’ performance in simulating the WAM in multi-model experiments (Lau et al. 2006; Cook and Vizy 2006; Hoerling et al. 2006; Biasutti et al. 2009). The West African Monsoon Modeling and Evaluation project (WAMME), a Global Energy and Water Cycle Experiment (GEWEX)/Coordinated Energy and Water Cycle Observation Project (GEWEX/CEOP) initiative in collaboration with the African Monsoon Multi-disciplinary Analysis project (AMMA, Redelsperger et al. 2006), uses GCMs and regional climate models (RCMs) to evaluate the performance of current state-of-the-art climate models in simulating the WAM precipitation, onset, withdrawal, and relevant processes at diurnal, intraseasonal, interannual, and interdecadal scales, and to address issues regarding the role of land–ocean–atmosphere interaction, land-cover and land-use change, vegetation dynamics, and aerosols, particularly dust, on WAM development. It also identifies common deficiencies among models in simulating the major WAM features and provides better understanding of the fundamental physical processes involved. In particular, it intends to demonstrate the utility and synergy of CEOP and AMMA field campaign data sets in providing a pathway for the evaluation and improvement of climate models.

This paper presents the preliminary GCM results from the WAMME’s first intercomparison experiment and serves as an introductory paper for other WAMME papers of this special issue, which include the assessment of the participating RCMs (Druyan et al. 2009), evaluation of fluxes from the land surface exchange models (Boone et al. 2009b), and in-depth studies based on individual climate models (Kim et al. 2009; Moufouma-Okia and Rowell 2009; Patricola and Cook 2009). Section 2 introduces the GCMs, data for evaluation, and the design of the WAMME first experiment. Section 3 evaluates the WAMME GCM-simulated precipitation, surface temperature, and some major circulation features. Section 4 applies AMMA data to diagnose the divergence of the GCM simulations in relation to surface variables. Section 5 employs CEOFs to investigate the characteristics of WAM precipitation and surface temperature to evaluate models’ performance in these aspects, to explore the character of model simulation discrepancies in WAM simulation, and to analyze the WAM mechanism. Section 6 summarizes results.

2 WAMME GCMs, experimental design, and evaluation data

WAMME consists of 11 GCMs (Table 1) and 7 RCMs with a wide range of spatial resolutions and physical parameterizations. Among the GCMs, the JMA MRI (Japan Meteorological Agency Meteorological Research Institute, Mizuta et al. 2006) GCM has very high horizontal resolution, about 20 km; the Cornell/NCAR CAM/CLM3.0 (National Center for Atmospheric Research Community Atmosphere Model/Community Land Model, Collins et al. 2006) and the MOHC (Met Office Hadley Centre, Pope et al. 2000) HadAM3 have slightly lower resolutions. The NCEP CFS (National Centers for Environmental Prediction Climate Forecast System, Saha et al. 2006) is a coupled ocean/atmosphere model with the NCEP GFS (Global Forecast System) as its atmospheric component. The CAM/CLM3.0 and GSFC FVGCM (Goddard Space Flight Center Finite-Volume GCM, Lin and Rood 1996, 1997) include comprehensive aerosol packages and can be run with or without aerosol simulations. Most models include comprehensive biophysical models for land surface processes. The UCLA MRF (University of California, Los Angeles Medium Range Forecast, Kanamitsu et al. 2002b; Xue et al. 2004), the UCLA GCM (Mechoso et al. 2000; Xue et al. 2009), and the COLA (Center for Ocean-Land-Atmosphere Interactions, Kinter III et al. 1997) GCM have similar land surface schemes. More information on the physical components of participating models, including land surface models, can be found in Table 1.

Table 1 List of WAMME GCMs

The first WAMME experiment presented in this paper includes several years in the twenty-first century with available AMMA data. The model runs presented in this paper go from April 1, 2, 3, and 4 through October 31 for years 2003, 2004, 2005, and 2006. The initial conditions are from the NCEP/DOE (Department of Energy) Reanalysis II (Kanamitsu et al. 2002a), and the repetition of each year with four slightly different start dates enhances the sample size. Reanalysis II includes corrections of human processing errors and incorporates upgrades to the forecast model and a diagnostic package that had been developed since the time the Reanalysis I was finalized. Except for the CFS coupled ocean/atmosphere model, SST and sea ice data are from the MOHC’s HadISST1 data set (Rayner et al. 2003). They are monthly data with 1-degree resolution, interpolated by each group to their model’s grid and then processed to preserve monthly means (Taylor et al. 2000). We have received 12 sets of GCM runs from 10 climate modeling groups (CAM/CLM3.0 and GSFC FVGCM provide runs with/without aerosol for the experiment). The first WAMME experiment outputs have been posted on the CEOP database, openly available to the scientific community (http://data.eol.ucar.edu/master_list/?project=WAMME). The model intercomparison results in this paper emphasize the WAM precipitation and surface temperature and include spatial distribution, temporal evolution, and variability, as well as major circulation features.

Several observational and proxy data sets are used for the model evaluations. Comparison of these data sets should provide evidence of uncertainty in the observational data and errors in the best assimilated data sets, which should assist us in evaluating models’ performance. These data include two data sets from the Climate Prediction Center (CPC), NCEP: one is the CPC Merged Analysis of Precipitation (CMAP, Xie and Arkin 1997) and the other is daily data from the CPC Global Telecommunications System (GTS) gauge-based analysis of global daily precipitation and surface temperature, which is based on GTS daily reports from 6,000 to 7,000 stations around the globe and referred to as CPC GTS in this paper. They cover the entire global land area on 0.5 (CPC GTS) and 2.5 (CMAP) degree lat/lon grids. We mainly use the CPC GTS data for evaluating model performance since this new-generation CPC data set contains more data. The methodology of GTS data interpolation is presented in Xie et al. (1996). We also use Reanalysis II, European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis Interim (ERA-Interim, Simmons et al. 2006), and Reanalysis I (Kalnay et al. 1996) for analyses in this study. The ERA-Interim is a new global reanalysis product based on a recent release of the ECMWF Integrated Forecasting System; it contains many improvements both in the forecasting model and in analysis methodology when compared to the ERA-40. These three reanalyses represent three of the best assimilation data sets thus far with 6-hourly outputs. Outperformance of models in any aspect relative to reanalyses reflects recent model development.

Due to scale discrepancies, it is difficult to directly apply the valuable and most recent contribution of observational data sets offered by the AMMA field campaign for the evaluation of WAMME GCMs. Therefore, we use instead the gridded data set from the AMMA Land Surface Model Intercomparison Project (ALMIP, Boone et al. 2009a) for this analysis. ALMIP conducted an ensemble of offline land surface model simulations that rely on dedicated satellite-based forcing and land surface parameter products, and data from the African AMMA observational field campaigns to address the known limited ability of land surface models to simulate surface processes over West Africa (Boone et al. 2009a; De Rosnay et al. 2009). ALMIP rainfall is from TRMM 3B42 (Huffman et al. 2007), and the solar radiation is from combined numerical prediction and satellite data. One of the goals of ALMIP is to produce a multi-land off-line surface model climatology of high resolution (multi-scale) soil moisture, surface fluxes, and water and energy budget diagnostics at the surface using the forcing described above. The ALMIP-simulated flux and soil moisture have been evaluated using the AMMA field campaign data. The scale issue has been addressed when ALMIP results are compared with the AMMA field measurements. For example, the ALMIP-simulated sensible heat flux from the multi-model climatology over the AMMA Mali mesoscale domain has proven quite consistent with observations (Boone et al. 2009b). This ALMIP multi-land model climatology is used to evaluate the simulated surface components of the GCMs within WAMME. We refer to this data set as ALMIP data in this paper. The ALMIP data set used in this study covers the area from 20°W to 30°E and from 5°S to 20°N, from 2003 through 2006 with 0.5° resolution and 3-hourly output, and currently represents the best estimate of the land surface processes over West Africa from 2003 to 2007.

Most results presented in this paper are 4-year means averaged from four different initial conditions unless otherwise indicated. The observational data and model results are bilinearly interpolated to the 0.5° CPC GTS grid for comparison. We have also compared the WAMME GCM results on a 2° horizontal grid (not shown). The results are similar and conclusions are consistent.
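
The regridding step can be illustrated with a minimal sketch of bilinear interpolation on a regular latitude/longitude grid. This is not the WAMME processing code; the function name `bilinear`, the toy grid, and the field values are hypothetical, and the sketch ignores missing values and land/sea masks.

```python
def bilinear(field, lats, lons, lat, lon):
    """Interpolate field[i][j], defined on ascending lats[i] and lons[j], to (lat, lon)."""
    def bracket(axis, x):
        # Find i such that axis[i] <= x <= axis[i + 1].
        for i in range(len(axis) - 1):
            if axis[i] <= x <= axis[i + 1]:
                return i
        raise ValueError("point lies outside the source grid")

    i = bracket(lats, lat)
    j = bracket(lons, lon)
    # Fractional position of the target point inside the enclosing grid box.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on the southern and northern edges, then along latitude.
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty

# Hypothetical 2.5-degree source box with illustrative values.
lats = [10.0, 12.5]
lons = [0.0, 2.5]
field = [[2.0, 4.0], [6.0, 8.0]]
value = bilinear(field, lats, lons, 11.25, 1.25)  # target point at the box centre
```

At the centre of a grid box the result is the plain average of the four surrounding values, which is a quick sanity check for any regridding routine.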

3 Comparison of WAMME simulated WAM precipitation, temperature, and circulation

3.1 Seasonal WAM precipitation simulation

The period of June, July, August, and September (JJAS) is the major WAM season. Figure 1 shows the 2003–2006 JJAS precipitation mean in the WAMME simulations, various quasi-observed data sets, and the gauge-based analysis. The CPC GTS observed 1 mm day−1 isohyet reaches around 18°N in the north and around 5°S in Central Africa (Fig. 1p). The axis of the maximum precipitation band starts at 10°N at the West African west coast and stretches eastward to 5°N at 30°E. There are two heavy precipitation centers with more than 10 mm day−1: one along the southwestern coast of West Africa and one near the Cameroon and Nigerian coasts. Between these two centers there is a relatively low precipitation break between 0° and 5°W. These features are apparent in all the observational data (Fig. 1p–r). Reanalysis II (Fig. 1a) and I (Fig. 1s) also show similar patterns but with an apparent wet bias. In addition, their rainfall bands are too close to the coast in West Africa. ERA-Interim also presents the pattern well but the rainfall is mainly limited to the south of Lake Chad. Precipitation over the eastern Sahel is relatively high compared to observation (Fig. 1b).

Fig. 1

JJAS 2003–2006 mean precipitation (mm day−1). a NCEP/DOE Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations; o WAMME ensemble mean; p CPC GTS data; q ALMIP data; r CMAP data; and s NCEP/NCAR Reanalysis I

Every WAMME model realistically simulates the zonal monsoon rainfall band over the West African continent, and the majority of models reproduce the slight northwest–southeast tilt of the axis of the band and simulate both maximum precipitation centers and the break in between (Fig. 1c–n). In this analysis, we also produce an ensemble GCM mean for comparison (Fig. 1o). For those models with two simulations (with/without aerosol), we first average the two runs, and that average is then used for calculating the ensemble mean. The ensemble mean produces a more coherent spatial distribution and better rainfall intensity than individual models and reanalyses.
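
The two-step averaging described above can be sketched as follows. This is an illustrative sketch only: the function name `ensemble_mean`, the flat-list representation of a gridded field, and the numerical values are assumptions, not part of the WAMME processing chain.

```python
from collections import defaultdict

def ensemble_mean(runs):
    """Equal-weight multi-model mean.

    runs: list of (model_name, field) pairs, where field is a flat list of
    values on a common grid. Runs that share a model name (for example the
    with-aerosol and without-aerosol runs) are averaged first, so that each
    model contributes exactly once to the ensemble mean.
    """
    by_model = defaultdict(list)
    for name, field in runs:
        by_model[name].append(field)

    def mean_fields(fields):
        n = len(fields)
        return [sum(vals) / n for vals in zip(*fields)]

    model_means = [mean_fields(fields) for fields in by_model.values()]
    return mean_fields(model_means)

# Hypothetical precipitation values (mm/day) at three grid cells.
runs = [
    ("FVGCM", [4.0, 6.0, 8.0]),   # run with aerosol
    ("FVGCM", [6.0, 8.0, 10.0]),  # run without aerosol
    ("MRF",   [2.0, 4.0, 6.0]),
]
result = ensemble_mean(runs)
```

Averaging the paired runs first keeps a model with two runs from being counted twice; a plain mean over all three runs in this toy case would be pulled toward the model that supplied two of them.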

To quantitatively assess the models’ 6-month simulations, we use the Taylor diagram (Taylor 2001) to show statistical comparisons with observed precipitation of 12 model runs’ spatial estimates of the West African pattern (Fig. 2). The results in the figure are based on averaged monthly mean data from May to October over the same 4 years at every grid cell over the land points within 5°N to 20°N and 15°W to 20°E. Each model run’s May-to-October mean over the area is removed when calculating root-mean-square-error (RMSE). Therefore, this diagram does not directly show the model bias.

Fig. 2

Taylor diagram displaying statistical comparisons of 12 model runs’ estimates with observation of the West African mean precipitation pattern for May to October 2003–2006

The position of each symbol appearing on the plot quantifies how closely that model’s simulated precipitation and its variability match CPC GTS observations. The radial coordinate in the figure gives the magnitude of total standard deviation, normalized by the observed value (dotted arcs in the figure). If the model’s standard deviation is the same as observed, its radial distance from the origin equals 1. The values of normalized standard deviations are marked along the X axis. The angular coordinate gives the correlation with observations. The correlation values are marked along the periphery of the circle. The distance between the model point and the observation point, which is located at unit distance along the horizontal radius (red dot in Fig. 2), denotes the RMSE of the model (solid arcs in the figure), also normalized by the observed standard deviation. The closer the model’s symbol is to the observation point, the better the simulation. In this figure, we choose CPC GTS as “true” data for model comparison. Precipitation of CMAP and ALMIP data, which is from TRMM 3B42, is quite close to CPC GTS with minor discrepancies. Differences between GTS and other observational data and proxy data are considered as measurement errors/uncertainty to help assess models’ results.
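
The three quantities plotted on a Taylor diagram can be computed as in the sketch below. It assumes unweighted statistics over a flat list of grid-cell values (no area weighting, which the text does not specify); the function name `taylor_stats` and the sample values are hypothetical.

```python
import math

def taylor_stats(model, obs):
    """Return (correlation, normalized std, normalized centered RMSE).

    Both inputs are equal-length lists of grid-cell values. Means are removed
    before the RMSE is computed, so a uniform bias does not affect the result.
    """
    n = len(obs)
    m_mean = sum(model) / n
    o_mean = sum(obs) / n
    m_anom = [m - m_mean for m in model]
    o_anom = [o - o_mean for o in obs]
    m_std = math.sqrt(sum(a * a for a in m_anom) / n)
    o_std = math.sqrt(sum(a * a for a in o_anom) / n)
    corr = sum(a * b for a, b in zip(m_anom, o_anom)) / (n * m_std * o_std)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(m_anom, o_anom)) / n)
    return corr, m_std / o_std, crmse / o_std

obs = [1.0, 3.0, 6.0, 9.0, 4.0]      # hypothetical observed values (mm/day)
model = [2.5, 4.0, 9.0, 12.0, 5.0]   # hypothetical simulated values with a wet bias
corr, sigma, rmse = taylor_stats(model, obs)

# Geometry of the diagram (law of cosines): E'^2 = 1 + sigma^2 - 2 * sigma * R
identity_gap = abs(rmse ** 2 - (1.0 + sigma ** 2 - 2.0 * sigma * corr))
```

The law-of-cosines identity in the last line is what allows correlation, standard deviation, and centered RMSE to be read from a single point on the diagram.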

The correlations of CMAP, ALMIP data, ERA-Interim, and Reanalyses II and I with CPC GTS equal 0.97, 0.94, 0.93, 0.87, and 0.92, respectively. Their normalized RMSEs are 0.26, 0.34, 0.51, 0.69, and 0.60 of the observed standard deviation (2.43 mm day−1), i.e., 0.63, 0.83, 1.24, 1.68, and 1.46 mm day−1, respectively, and normalized standard deviations are 1.05, 0.95, 1.30, 1.35, and 1.37, respectively. The CMAP and ALMIP results suggest that the measurements’ RMSEs are less than 1 mm day−1 and relative measurement discrepancies in spatial correlation and normalized standard deviation are about 5%. Although the reanalyses’ spatial correlations with observation are high (~0.9), their discrepancies in standard deviation (~30%) and the RMSE of Reanalysis I and II (about 1.5 mm day−1) are quite large. Figure 2 shows large scattering among different models, indicating substantial discrepancies in model simulations. The GCM ensemble mean (red star), for which correlation, normalized RMSE, and standard deviation are 0.93, 0.42, and 1.11, respectively, is close to and slightly worse than CMAP and ALMIP and better than reanalyses or most GCMs.
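
As a worked check, the quoted statistics can be tested against the Taylor-diagram relation E'^2 = 1 + sigma^2 - 2 sigma R and converted to mm day−1. The sketch below uses only the numbers quoted in the paragraph above; the helper name `implied_rmse` and the dictionary layout are illustrative choices.

```python
import math

OBS_STD = 2.43  # observed standard deviation (mm/day), as quoted in the text

# (correlation, normalized std, normalized RMSE), as quoted in the text
reported = {
    "CMAP": (0.97, 1.05, 0.26),
    "ALMIP": (0.94, 0.95, 0.34),
    "ERA-Interim": (0.93, 1.30, 0.51),
    "Reanalysis II": (0.87, 1.35, 0.69),
    "Reanalysis I": (0.92, 1.37, 0.60),
    "GCM ensemble mean": (0.93, 1.11, 0.42),
}

def implied_rmse(corr, sigma):
    """Normalized centered RMSE implied by a correlation and a normalized std."""
    return math.sqrt(1.0 + sigma * sigma - 2.0 * sigma * corr)

checks = {}
for name, (corr, sigma, rmse) in reported.items():
    # Store the implied normalized RMSE and the quoted RMSE in mm/day.
    checks[name] = (implied_rmse(corr, sigma), rmse * OBS_STD)
    print(f"{name:18s} implied {checks[name][0]:.2f}  quoted {rmse:.2f}  "
          f"= {checks[name][1]:.2f} mm/day")
```

Because the quoted values are rounded to two decimals, agreement to within about 0.01 to 0.02 is the most that can be expected from this check.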

Spatial correlations of the WAMME-simulated precipitation with observations range from 0.70 to 0.94. Only four model runs, FVGCM, CFS, GFS, and UCLA GCM, have correlations higher than 0.90. The normalized RMSEs of WAMME models range from 0.34 to 1.35, i.e., 0.83 to 3.28 mm day−1. Most models’ normalized RMSEs are larger than that of ERA-Interim (0.51, i.e., 1.24 mm day−1). The normalized standard deviation of precipitation of WAMME models varies from 0.61 to 2.18 of the observed standard deviation. The results of MRI, GMAO/NSIPP1 (Schubert et al. 2002), and MRF are close to the observed values. The standard deviations of CFS and GFS are quite high, associated with their considerable positive biases (Fig. 1). On the other hand, CAM/CLM3.0 has a relatively low standard deviation, which may be partially due to its low resolution (T42).

3.2 Seasonal surface air temperature simulations

Figure 3p shows that the JJAS surface air temperature at 2-m height has a zonal pattern with high temperature in the Sahara Desert and a steep meridional temperature gradient from the northern boundary of West Africa to the Guinean Coast. The Central African tropical rainforest has the lowest temperature in the region. The difference of other observational data and each model’s surface air temperature relative to CPC GTS’s surface air temperature is shown in Fig. 3c–n. ALMIP surface temperature data (Fig. 3q) and Reanalysis II, ERA-interim, and Reanalysis I 2-m temperature (Fig. 3a, b, s) have lower temperature along the Guinean Coast and southern Sahel by about 1–3°C compared with CPC GTS temperature data. Since the ground observations there are based on limited stations, these differences reflect the uncertainty in measured surface temperature. Every model produces a zonal pattern but with quite different meridional gradients. Four GCMs (CAM/CLM3.0, MRI, GFS, and HadAM3) have a cold bias (about −2 to −3°C) over 15°W to 20°E and 5°N to 20°N; the most severe biases are over the 10°N to 15°N zonal band, where the two observational data sets and three reanalyses have consensus. These biases are consistent with their positive biases in precipitation. Meanwhile, over the same area, FVGCM (no aerosol) and MRF have positive biases (about 1–2°C). FVGCM (with aerosol), GMAO/NSIPP1, CFS, and UCLA GCM show less bias over 15°W to 20°E and 5°N to 20°N, where West Africa is located. Meanwhile, most models show a negative difference from CPC GTS data along the Guinean coast but are consistent with ALMIP data and reanalyses over that region. The GCM ensemble mean again shows better results with only a slight cold bias (about 0.9°C, Fig. 3o).

Fig. 3

JJAS 2003-2006 mean 2-m air temperature bias (°C) for a NCEP/DOE Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations, o WAMME ensemble mean, q ALMIP data, and s NCEP/NCAR Reanalysis I. Temperature bias color bar is shown at the bottom of the figure. The JJAS 2003–2006 mean observed 2-m air temperature (°C) for CPC GTS and corresponding color bar are shown in panel (p)

Figure 4 shows the Taylor diagram for the May-to-October 2003–2006 average surface temperature. The correlation, normalized RMSE, and normalized standard deviation for ALMIP/ERA-Interim/Reanalysis I/Reanalysis II are 0.95/0.96/0.95/0.91, 0.44/0.47/0.33/0.41, and 1.27/1.32/1.09/0.95, respectively. Observed standard deviation is 2.87°C. The degree of scattering among different models’ results is smaller and the results are closer to observations compared with Fig. 2. Most models’ correlation coefficients are higher than 0.9 and normalized RMSEs are less than 0.5, comparable to reanalyses. Only MRF has a relatively high RMSE (0.67). It is interesting to note that most models show a bias toward high standard deviations. For example, the normalized standard deviations of GMAO/NSIPP1, FVGCM, and MRF are larger than 1.3, although they are similar to ALMIP’s. By and large, the models produce decent simulations of seasonal mean surface air temperature, but with divergence in variance and gradient. CAM/CLM3.0 (dust) and HadAM3 perform better than Reanalysis II in every respect. The GCM ensemble mean again shows superior performance, much better than any individual model or reanalyses.

Fig. 4

Same as Fig. 2 except for 2-m air temperature

3.3 Circulation features

This section evaluates the GCMs’ simulations of some aspects of large-scale circulation. The mid-tropospheric AEJ is an important WAM feature and is considered to play a crucial role in the WAM system. It is located above the region of strong low-level potential temperature gradients between the Sahara and the Guinean Coast during the boreal summer (e.g., Burpee 1972; Reed et al. 1977) and is characterized by strong vertical wind shear and meridional contrasts in thermodynamic properties. The existence/maintenance of the AEJ has been considered to be related to surface temperature gradients (e.g., Burpee 1972), gradients of soil moisture and SST (Cook 1999), and cloud distributions (Druyan 1989). It has been found that hot, dry surface conditions and a deep, well-mixed boundary layer in the Sahara heat low and cool, moist surface conditions associated with deep moist convection in the intertropical convergence zone are intimately linked to the existence of the AEJ (Thorncroft and Blackburn 1999). Studies have also identified its relationship with interannual variability of the WAM (Newell and Kidson 1984; Nicholson 1989; Fontaine et al. 1995).

Figure 5 shows the north–south cross section of the JJAS zonal wind velocity longitudinally averaged between 10°W and 10°E. Reanalysis II, ERA-Interim, and Reanalysis I (Fig. 5a, b, p) indicate the AEJ with a maximum around 12 m s−1 at 600 mb and 10–15°N. The low-level monsoon westerlies between the equator and 20°N are beneath the AEJ. Meanwhile, the tropical easterly jet (TEJ) is located at 200 mb and 5–10°N. At about the same level, the subtropical westerly jet can be seen at 30–35°N. GCMs generally produce these zonal structures but their simulations have deficiencies in producing various components in zonal wind features. Every model produces the AEJ and TEJ at around 600 and 200 mb, respectively, as well as monsoon westerlies underneath the AEJ. The departure of latitudinal position of the AEJ from observation for most models is within a 2.5° range. However, most models, except CFS, GFS, and MRF, fail to produce proper AEJ intensity. These three models and NCEP reanalyses use similar atmospheric models. Furthermore, most models produce the TEJ too strongly. In addition, CAM/CLM3.0 and FVGCM simulate near-surface easterlies, the so-called Harmattan easterlies, too strongly to the north of the low-level monsoon westerlies. Because of the WAMME models’ systematic biases in TEJ and AEJ intensity simulation, the multi-model ensemble mean (Fig. 5o) shows that its AEJ is too weak and its TEJ is too strong, which indicates that as long as most models have systematic biases, the improvement of the multi-model ensemble mean will be limited.
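
The construction of such a cross section, and the location of a jet core within it, can be sketched as below. The array layout `u[level][lat][lon]`, the function names, the pressure windows used to separate the AEJ from the TEJ, and all wind values are hypothetical choices for illustration.

```python
def sector_mean(u, lons, west=-10.0, east=10.0):
    """Average u[level][lat][lon] over longitudes in [west, east].

    Returns u_bar[level][lat]. Longitudes are in degrees east, so 10W is -10.
    """
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    return [[sum(row[j] for j in cols) / len(cols) for row in level] for level in u]

def easterly_core(u_bar, levels, lats, p_min, p_max):
    """Return (wind, pressure, latitude) of the strongest easterly for p_min <= p <= p_max."""
    best = None
    for k, p in enumerate(levels):
        if not (p_min <= p <= p_max):
            continue
        for i, lat in enumerate(lats):
            # Easterlies are negative, so the jet core is the minimum of u.
            if best is None or u_bar[k][i] < best[0]:
                best = (u_bar[k][i], p, lat)
    return best

# Hypothetical JJAS-mean zonal wind (m/s) on a tiny grid.
levels = [850, 600, 200]                 # hPa
lats = [5.0, 12.5, 20.0]                 # degrees north
lons = [-20.0, -10.0, 0.0, 10.0, 20.0]   # degrees east
u = [
    [[3.0] * 5, [2.0] * 5, [-1.0] * 5],                              # 850 hPa: monsoon westerlies
    [[-6.0] * 5, [-30.0, -11.0, -12.0, -13.0, -30.0], [-5.0] * 5],   # 600 hPa: AEJ level
    [[-15.0] * 5, [-10.0] * 5, [-2.0] * 5],                          # 200 hPa: TEJ level
]
u_bar = sector_mean(u, lons)
aej = easterly_core(u_bar, levels, lats, 500, 700)
tej = easterly_core(u_bar, levels, lats, 150, 300)
```

In this toy case the values outside 10°W–10°E at 600 hPa are deliberately extreme, to show that they do not enter the sector average.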

Fig. 5

Pressure-latitude cross-section of JJAS 2003–2006 average zonal wind between longitudes 10°E and 10°W for a NCEP/DOE Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations, o WAMME ensemble mean; and p NCEP/NCAR Reanalysis I. Isotachs for −6, −8, −10, −12, −14, and −16 m s−1 are superimposed to highlight the jets’ locations

Another important feature of the WAM is low-level moisture transport. The WAM low-level wind field and moisture transport are presented in Fig. 6. Northwestward flow across the Guinean coast curves northeastward then eastward and brings moisture into West Africa during the monsoon season (Fig. 6a, b, p), which is a critical WAM feature. Most models properly produce this feature. In addition, the low-level convergence position, marked by a zero meridional wind line at 900 hPa, is fairly congruent in most models. However, the moisture transport in MRI and CFS is relatively weak, and that in FVGCM is rather strong. It is interesting to note that the former models have a positive bias in simulated precipitation and the latter does not show a wet bias. Apparently, moisture transport is only one factor that affects the WAM evolution. Moisture convergence should be more relevant to WAM precipitation development. This issue will be investigated further in the next section. In addition, Druyan et al. (2009) find that even with a realistic amount of moisture advection, a model could still produce a substantial precipitation bias because the frequency of excessive moist convection also affects the amount of precipitation. However, a detailed analysis of such aspects is beyond the scope of the current study.
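
The two diagnostics plotted in Fig. 6 can be sketched in a few lines. The sketch assumes the moisture flux at a single level is the product of specific humidity and the horizontal wind, and that the zero meridional wind line is found by linear interpolation along one meridian; the function names and sample values are hypothetical.

```python
def moisture_flux(q, u, v):
    """Horizontal moisture flux components (q*u, q*v); q in kg/kg, winds in m/s."""
    return q * u, q * v

def zero_crossing_lat(lats, v):
    """Latitude where v first changes from southerly (v > 0) to northerly or calm (v <= 0).

    lats must be ascending; the crossing is located by linear interpolation.
    Returns None if there is no such sign change.
    """
    for i in range(len(lats) - 1):
        if v[i] > 0.0 >= v[i + 1]:
            frac = v[i] / (v[i] - v[i + 1])
            return lats[i] + frac * (lats[i + 1] - lats[i])
    return None

# Hypothetical 900-hPa meridional wind (m/s) along one meridian.
lats = [5.0, 10.0, 15.0, 20.0, 25.0]
v = [4.0, 3.0, 1.0, -1.0, -3.0]
convergence_lat = zero_crossing_lat(lats, v)

# Hypothetical point values: q = 12 g/kg, u = 5 m/s, v = 3 m/s.
qu, qv = moisture_flux(0.012, 5.0, 3.0)
```

Repeating the zero-crossing search at every longitude would trace out a line comparable to the bold line in Fig. 6, under the stated assumptions.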

Fig. 6

JJAS 2003–2006 average 900-hPa moisture flux and wind flow for a NCEP/DOE Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations, o WAMME ensemble mean, and p NCEP/NCAR Reanalysis I. Bold black line indicates where the meridional component of the wind equals zero

4 Evaluation of model performance using the ALMIP data and reanalyses

One of the important WAMME goals is to explore the utility and synergy of AMMA data in providing a pathway for model physics evaluation and improvement. In this section, we apply the ALMIP data to analyze the WAMME GCM results and focus on the possible association between simulated spatial distributions of precipitation and surface variables. We consider surface variables obtained from the ALMIP ensemble mean to be the best estimate so far for West African ground hydrology. In a Global Land–Atmosphere Coupling Experiment (GLACE) study (Dirmeyer et al. 2006), a similar investigation was conducted over many basins by examining the local covariability of key atmospheric and land surface variables. In that study, it was found that most models do not encompass well the observed relationships between surface and atmospheric state variables and fluxes, suggesting that these models do not represent land–atmosphere coupling correctly. In this study, we take a similar intercomparison approach with the focus on the character of model discrepancy and possible WAM mechanisms. In this section our focus is the association between precipitation simulation and surface variable simulations rather than individual model performance.

Figure 7a–d show a comparison of spatial correlations of May–October 2003–2006 mean precipitation between observation and WAMME simulations and spatial correlations of May–October 2003–2006 mean latent heat flux, sensible heat flux, surface temperature, and precipitation minus evaporation between the ALMIP data and WAMME simulations, respectively. ALMIP data is used as reference for the spatial correlation calculation. Therefore, the correlation coefficient of ALMIP data is 1. The standard deviations of the correlations between individual off-line ALMIP land surface model simulations and the ALMIP data are relatively small. They are about 0.02 for temperature and latent heat flux and 0.12 for sensible heat flux, much smaller than the WAMME intermodel spread as shown in Fig. 7. Since precipitation of the ALMIP data set is slightly different from the GTS data as shown in Fig. 2, correlations of simulated precipitation with ALMIP data in Fig. 7 also have slight differences from those shown in Fig. 2.

Fig. 7

Comparison between May–October 2003–2006 precipitation spatial correlation coefficients and a latent heat flux, b sensible heat flux, c surface temperature, and d precipitation minus evaporation spatial correlations; with ALMIP data as the reference (i.e., ALMIP spatial correlations are equal to 1). e Similar to (a) but between surface temperature and 600-hPa zonal wind with Reanalysis II as reference; and f same as e but between latent heat flux and 600-hPa zonal wind. Bold solid lines indicate the linear fit, and Rs indicate the R-squared of the linear regression. Reanalyses are plotted in red and ALMIP in black to distinguish from the WAMME models shown in blue

Among the four variables in Fig. 7, although latent heat flux exhibits a general relationship with precipitation (i.e., correlations of latent heat flux and precipitation of WAMME models with the ALMIP data are generally consistent), the scattering in Fig. 7a is relatively large with a low R-squared (the square of the linear regression’s correlation coefficient) as listed in the figure (see Footnote 1). The correlation of GCM-simulated spatial distribution of latent heat flux is homogeneously high, more than 0.8 for most models. However, several models’ results are not consistent with the general precipitation/evaporation relationship. For instance, Reanalysis II and GMAO/NSIPP1 have very high spatial correlation with the ALMIP evaporation, 0.95 and 0.96, respectively, but their correlations with precipitation are relatively low, about 0.75. On the other hand, MRI’s correlation with evaporation (0.62) is much lower than its correlation with precipitation (0.77). Evaporation provides an important moisture source for WAM precipitation. The ratio of evaporation over precipitation in the WAM area in the ALMIP data is about 0.52. However, the results here indicate that the skill of precipitation simulation is not highly associated with the skill of latent heat simulation. In contrast to the latent heat flux, Fig. 7b shows that high spatial correlations in precipitation and sensible heat flux are closely associated with each other. The skill of simulated spatial distribution of precipitation from different models corresponds well to the skill of simulated spatial distribution of sensible heat flux. The spatial correlations of sensible heat flux in the WAMME model simulation (0.1–0.75) are much lower than the ones of precipitation, which are between 0.7 and 0.9.
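
The R-squared used to summarize each panel of Fig. 7 can be sketched as follows, with one point per model: x is a model's spatial correlation for a surface variable and y is its spatial correlation for precipitation. The function name `linear_fit` and the skill scores are hypothetical and are not values read from the figure.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and its R-squared.

    R-squared is computed as the square of the Pearson correlation between
    x and y, which for a simple linear regression equals 1 - SS_res / SS_tot.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, (sxy * sxy) / (sxx * syy)

# Hypothetical skill scores for five models (not taken from Fig. 7).
sensible_heat_corr = [0.15, 0.35, 0.50, 0.60, 0.75]
precip_corr = [0.71, 0.78, 0.80, 0.86, 0.90]
slope, intercept, r2 = linear_fit(sensible_heat_corr, precip_corr)

# Cross-check against the residual form of R-squared.
residual_ss = sum((b - (slope * a + intercept)) ** 2
                  for a, b in zip(sensible_heat_corr, precip_corr))
mean_y = sum(precip_corr) / len(precip_corr)
total_ss = sum((b - mean_y) ** 2 for b in precip_corr)
r2_from_residuals = 1.0 - residual_ss / total_ss
```

A high R-squared here says that models with better skill in one variable tend to have better skill in the other; it does not by itself establish which one drives the other.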

Monsoons are macroscale phenomena and are driven by differential heating of the land and the ocean. Studies have indicated that they are modulated by the magnitude of the associated north–south gradient of low-level moist static energy and their interaction with tropical fronts and the AEJ (Eltahir and Gong 1996; Parker et al. 2005). A study of the East Asian and African summer precipitation has also indicated that different longitudinal and latitudinal sensible heat gradients at the surface influence the low-level temperature and pressure gradients, wind flow (through geostrophic balance), moisture transport, and in turn, the summer monsoon (Xue et al. 2004). The results here reveal a close association between the surface energy partition and precipitation, confirming the importance of the spatial distribution of sensible heat flux at the land surface in the WAM. The simulated sensible heat distribution is a reflection of parameterizations of surface turbulent fluxes and simulation of surface energy balance, as well as the specifications of the vegetation characteristics and coverage, land use, and soil properties over the WAM area. Figure 8 shows the spatial distribution of JJAS sensible heat flux of the WAMME models. It is clear that the north–south gradient of sensible heat flux is a prominent characteristic of its spatial distribution, i.e., sensible heat flux is weak along the Guinean coast and gradually increases northward with a clear contrast along 15°N (Fig. 8p). The model simulations, including the reanalyses, have substantial differences from ALMIP data. Only the ensemble mean produces both adequate spatial distribution and proper magnitude of the sensible heat flux. The models with a proper north–south gradient have relatively high correlations with the ALMIP data. The models with low correlations produce either too strong a gradient (e.g., MRF), or too weak a gradient (e.g., MRI and GMAO/NSIPP1).

Since the sensible heat flux is closely related to the surface temperature, it is not surprising to see a high R-squared listed in Fig. 7c. However, it is not as high as that with sensible heat flux (Fig. 7b). Further analysis in Sect. 5.4 will show that the temperature gradient between the Sahara and the Sahel has a great impact on the monsoon simulation and suggests that differences in its simulation contribute to the model simulation discrepancies.

Fig. 8

JJAS 2003-2006 mean sensible heat flux (W m−2) for a NCEP/DOE Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations, o WAMME ensemble mean; and p ALMIP data. The contour lines indicate the standard deviation of ALMIP land models

Although the R-squared shown in Fig. 7a is not high, Fig. 7d shows that the simulated precipitation distribution is highly correlated to the simulated distribution of precipitation minus evaporation, which is a good indicator of vertically integrated moisture flux convergence (IMFC). Since a differential equation is used to calculate IMFC, this calculation is sensitive to temporal resolution, sample size, etc. and requires high horizontal resolution (Berbery and Rasmusson 1999), which is challenging for GCMs. The WAMME does not have direct model output for IMFC. The result in Fig. 7 indicates that the discrepancy in simulated spatial distribution of moisture flux convergence is very closely related to the discrepancy in simulated spatial distribution of precipitation. Moisture flux as shown in Fig. 6 is a major WAM moisture source. The relatively high R-squared in Fig. 7d compared to Fig. 7a further confirms its important role in WAM precipitation. In previous sections, it has been pointed out that a realistic amount of moisture advection alone is not sufficient to produce accurate precipitation. The results here further support this argument.
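
The link between precipitation minus evaporation and IMFC follows from the standard vertically integrated atmospheric moisture budget. The relation below is the textbook form, not a WAMME output, and the symbols are notation introduced here for illustration: W is precipitable water, Q the vertically integrated horizontal moisture flux, q specific humidity, V the horizontal wind, p_s surface pressure, and g gravitational acceleration.

$$ \frac{\partial W}{\partial t} + \nabla \cdot \mathbf{Q} = E - P, \qquad \mathbf{Q} = \frac{1}{g} \int_{0}^{p_{s}} q\,\mathbf{V}\,dp $$

Averaged over a season, the storage term is usually small compared with the other terms, which gives

$$ P - E \approx -\nabla \cdot \mathbf{Q} \equiv \mathrm{IMFC} $$

so the spatial distribution of P minus E can serve as a proxy where IMFC is not available as direct model output.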

We also check the relationship between surface variables and ERA-Interim 600-mb zonal wind between 15°W and 20°E and between 5°N and 20°N, where the AEJ is located. It is not unexpected to see a close relationship between 600-mb zonal wind and surface temperature (Fig. 7e) since the thermal wind balance associated with a surface temperature gradient is well known (e.g., Li et al. 2007). However, Fig. 7f shows an even higher R-squared between spatial correlations of 600-mb zonal wind and spatial correlations of latent heat flux between the ERA-Interim and WAMME simulations. Diabatic heating due to moist convection has been suggested as helping to maintain the AEJ (e.g., Thorncroft and Blackburn 1999). Based on Reanalysis I, Cook (1999) analyzed the surface energy budget at 7°N, 15°N, and 28°N and pointed out that the latent heat gradient encourages a positive temperature gradient and helps establish a strong AEJ. This is confirmed by a GCM experiment with uniform soil moisture, which produces a weak AEJ (Cook 1999). The analysis in Sect. 5.4 will provide further evidence to support the importance of latent heat flux in establishing the AEJ. Meanwhile, the R-squared with sensible heat flux is low (R² = 0.24) since it acts to reduce the temperature gradient. In addition to ERA-Interim, a similar relationship is also confirmed by the analysis using Reanalysis II and 2006 ECMWF-AMMA, which includes the assimilation of some of the 2006 AMMA measurements. All these show the highest R-squared between latent heat flux and the AEJ and a lower R-squared between sensible heat flux and the AEJ. The discussion in this section reveals the importance of land surface energy and water balances and provides essential information for WAMME’s next experiments to advance the understanding of the role of land model parameterization and specification, land data, and land/atmosphere coupling in the WAM simulations.

In addition to the factors identified above, there are a number of other factors which affect the WAM simulation and are associated with discrepancies in model simulation. For instance, it is interesting to note that for the two GCMs with/without aerosol runs, which are indicated by letters ‘d’ and ‘h’/‘e’ and ‘i’, respectively, the discrepancies are also apparent. After introducing aerosol in the simulation, the correlations of both models improve. This seems to be consistent with the finding of Dirmeyer and Zhao (2004) that downward fluxes from the atmosphere affect the communication between the land surface and the atmosphere. In another study, Lau et al. (2006) identified that both the coupling between Sahel rainfall and Indian Ocean SST and the coupling between Sahel rainfall and Atlantic Ocean SST contribute to the discrepancies in 19 GCM simulations in the Intergovernmental Panel on Climate Change Assessment Report 4 (Hegerl et al. 2003). They conclude that proper simulation of these couplings is essential for a good WAM precipitation simulation.

The preliminary analysis here demonstrates the utility of AMMA data in evaluating WAMME models’ performance in simulating surface water and energy balances and in identifying the association of WAMME model discrepancies in simulated precipitation and AEJ with surface variables. It also provides useful information/guidance for the WAMME’s next stages of experiment design. In the next section, we conduct further analysis to evaluate the WAMME models’ ability to simulate the WAM major climate modes and to further understand the WAM mechanisms.

5 Analyses of WAM major features and model performance using the common empirical orthogonal functions (CEOF)

5.1 Setting of CEOF analysis

The model intercomparison results have not only been used to identify the discrepancies, consensuses, and models’ common weakness; they have also been used to identify the climate modes (e.g., in Barnett 1999; Stouffer et al. 2000; Benestad 2001). Further brief information regarding CEOF is summarized in the appendix, and a comprehensive explanation about CEOF for atmospheric model intercomparisons can be found in Sengupta and Boyle (1998).

We apply this method to analyze the common variance of 4-year (2003–2006) averaged 6-month simulations from 12 GCM runs, three observational data sets (CPC GTS, CMAP, and ALMIP), and three reanalyses. This approach is similar to Boyle (1998). The CEOF is applied to investigate major features of temporal evolution and spatial characteristics of intraseasonal WAM precipitation by analyzing the observed and model-simulated WAM precipitation and temperature. The analysis in this section provides further evidence of the WAM mechanisms revealed in the previous section. Five-day means are applied for the CEOF calculation. This method concatenates the model-simulated fields, three observational data sets, and three reanalysis data sets to form a “single” dataset P′(s, t), described as

$$ P^{\prime}(s,t) = \begin{cases} P_{1}(s,t^{\prime}), & t = 1,2,\ldots,36;\; t^{\prime} = 1,2,\ldots,36 \\ P_{2}(s,t^{\prime}), & t = 37,38,\ldots,72;\; t^{\prime} = 1,2,\ldots,36 \\ \quad\vdots & \\ P_{18}(s,t^{\prime}), & t = 613,\ldots,648;\; t^{\prime} = 1,2,\ldots,36 \end{cases} $$

where P_i is the variable for the ith model, observation, or reanalysis; s is a spatial gridpoint counter for locations; t′ is the pentad index within each data set; and t is a dummy time variable that describes the order of concatenation. Following Barnett (1999), the 6-month mean for each model, observational data set, or reanalysis is subtracted from the data sets on a grid point by grid point basis. The array P′ is then subjected to a normal EOF analysis of its covariance matrix. The common EOF produces the patterns of variability that the GCMs and observations share in common. Given the errors in the observations as shown in the Taylor diagram, rainfall “observations” are also an imperfect realization of the real world. Since we have done a separate comparison of models and observations in the previous section, analyzing them together offers a somewhat different perspective, which brings out the common physical processes underlying the dominant modes in the grand ensemble (models plus observations).
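
The concatenation and EOF steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code used for the paper: the function name `common_eof` is hypothetical, all data sets are assumed to be already interpolated to one common grid and flattened to (time, space) arrays, no area weighting is applied, and the per-data-set explained variance is computed here by projecting each data set onto the common patterns, which is one plausible reading of how per-model variances such as those in Fig. 9c could be obtained.

```python
import numpy as np

def common_eof(datasets, n_modes=3):
    """Common EOF of several (n_time, n_space) arrays on a shared grid.

    Returns the common spatial patterns, one PC array per data set, and the
    fraction of each data set's own variance captured by each mode.
    """
    # Remove each data set's own time mean at every grid point.
    anomalies = [d - d.mean(axis=0, keepdims=True) for d in datasets]
    # Concatenate along time: 18 sets x 36 pentads gives 648 rows.
    stacked = np.concatenate(anomalies, axis=0)
    # Right singular vectors are the eigenvectors of the covariance matrix.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    patterns = vt[:n_modes]
    # Project each data set onto the common patterns to get its own PCs.
    pcs = [a @ patterns.T for a in anomalies]
    explained = [(pc ** 2).sum(axis=0) / (a ** 2).sum()
                 for pc, a in zip(pcs, anomalies)]
    return patterns, pcs, explained

# Synthetic example: 18 data sets sharing one dipole-like seasonal mode.
rng = np.random.default_rng(1)
n_sets, n_time, n_space = 18, 36, 50
season = np.sin(np.linspace(-np.pi / 2, np.pi / 2, n_time))
dipole = np.linspace(-1.0, 1.0, n_space)
datasets = [np.outer(season, dipole) * rng.uniform(0.5, 1.5)
            + 0.2 * rng.standard_normal((n_time, n_space))
            for _ in range(n_sets)]
patterns, pcs, explained = common_eof(datasets)
```

Because the spatial patterns are shared, differences between data sets show up only in their PCs and in how much of their own variance each mode captures, which is what allows the model-by-model comparison in the following subsections.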

5.2 Intraseasonal WAM precipitation variability

The first common EOF for the 16-member ensemble is shown in Fig. 9a, and principal components (PC) 1 for each model and observational data (except CMAP) are shown in Fig. 10. The area covers West Africa as well as the adjacent eastern Sahel and the central African continent. Explained variances of eigenvalues for their first three PCs are shown in Fig. 9c.

Fig. 9

Results from May-to-October 2003–2006 5-day average precipitation common EOF analysis. a CEOF first eigenvector; b CEOF second eigenvector; and c explained variances of CEOF first, second, and third eigenvectors

Fig. 10

Precipitation first PC (red line) and area-averaged precipitation between 10°W and 10°E along latitudes 10°N and 15°N (blue line, mm day−1) for a NCEP/DOE Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations, p CPC GTS; and q ALMIP data. The vertical dashed line in each panel indicates the approximate observed monsoon onset pentad

The leading CEOF, which explains 30% of total variance for the entire data set, is a dipole pattern between the Sahel and the coastal area/central Africa with the zero line along about 7–8°N. The temporal evolution of the leading PC1 of observed precipitation shows that this mode in fact exhibits the WAM evolution (red lines in Fig. 10p, q). To confirm this point, the time evolution of the averaged rainfall between 10°N and 15°N over 10°W to 10°E from simulations and observations, which is based on a five-day running mean, is also shown in Fig. 10 (blue lines). The two lines in the figure are very consistent, with the correlation coefficients for the three observational data sets being higher than 98%. To aid in discussion, we draw a vertical dashed line (12th pentad) in Fig. 10 indicating the CPC GTS and ALMIP monsoon onset date. Since the WAMME data set has no low-level daily wind available, we use precipitation to approximate the monsoon onset time. In this study, we follow the approach of Fontaine and Louvet (2006). Two rainfall indexes are defined: a northern index (NI) averaging 5-day mean precipitation from 7.5°N to 15°N and 10°W to 10°E and a southern index (SI) for the region extending between 7.5°N and the equator. A WAM onset index (WAMOI) is defined as the difference between the NI and SI standardized indexes, after elimination of time variability of less than 15 days. The onset date is defined as the first pentad of a 20-day period registering positive WAMOIs. This estimated time is consistent with the monsoon onset time identified by Sultan and Janicot (2000) based on observation and reanalysis.
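
The onset definition above can be sketched in a few lines. This is a simplified illustration, not the Fontaine and Louvet (2006) implementation: the function names are hypothetical, a 3-pentad centered running mean stands in for the filter that removes variability shorter than 15 days, the smoothing is applied to the index difference (the order of operations is an assumption), and the input series are synthetic.

```python
import numpy as np

def standardize(x):
    """Subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def smooth(x, window=3):
    """Centered running mean; a simple stand-in for a low-pass filter."""
    kernel = np.ones(window) / window
    padded = np.pad(x, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def onset_pentad(north, south, run_length=4, window=3):
    """First pentad (1-based) of a run of `run_length` positive WAMOI values.

    `north` and `south` are pentad-mean rainfall indexes (NI and SI);
    four pentads correspond to the 20-day period in the text.
    """
    wamoi = smooth(standardize(north) - standardize(south), window)
    positive = wamoi > 0
    for start in range(len(positive) - run_length + 1):
        if positive[start:start + run_length].all():
            return start + 1, wamoi
    return None, wamoi

# Synthetic May-October indexes: Sahel rainfall rising, coastal rainfall falling.
north = np.linspace(1.0, 8.0, 36)
south = np.linspace(6.0, 2.0, 36)
onset, wamoi = onset_pentad(north, south)
```

Requiring a sustained positive run, rather than the first positive value, keeps a short pre-onset rainfall burst in the Sahel from being mistaken for the onset.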

CPC GTS PC1 (red line in Fig. 10p) and ALMIP precipitation PC1 (red line in Fig. 10q) indicate that rainfall starts gradually increasing in the Sahel in May. During late June, a rapid rainfall increase/decrease occurs simultaneously in the Sahel/coastal area, indicating the WAM onset (12th pentad for CPC GTS and ALMIP). The onset date is consistent with the time when PC1 changes from negative to positive. The rainfall keeps increasing over the Sahel, especially in West Africa (Fig. 10a), for more than 2 months after the onset. After reaching a peak in August, a quick retreat occurs in early September. We also conduct a normal EOF analysis with the 6-month CMAP data from 1979 to 2004. Its first EOF produces a dipole pattern (not shown), very similar to the one shown in Fig. 9a. The second CEOF mode mainly emphasizes areas to the south of 10°N along the Guinean coast (Fig. 9b). Since the PC2s of CPC GTS, ALMIP, and CMAP explain less than 10% of the variance (Fig. 9c) and show only small oscillations in temporal evolution over the entire period (not shown), we will not discuss them further in this paper. In fact, this pattern is very similar to the annual mean of 1949–2000 CMAP precipitation (not shown).

We further compare the PC1s of observational data with model-produced ones to evaluate the models’ performance. In general, every model produces proper seasonal evolution in PC1, with correlation against CPC GTS PC1 higher than 85% for most models except CAM/CLM3.0 and GFS. Six models (COLA GCM, CAM/CLM3.0 (dust), GMAO/NSIPP1, FVGCM (no aerosol), CFS, and MRF) produce an onset time consistent with the observations. CAM/CLM3.0 (no dust), MRI, and UCLA GCM produce monsoon onset dates that differ from observation by more than 10 days. The second dramatic rainfall increase in June in the UCLA GCM simulation, however, is similar to the observed monsoon onset (Fig. 10m). In addition, the mean precipitation between 10° and 15°N (blue lines in Fig. 10) indicates that most models, except Reanalysis II, FVGCM, and MRF (blue lines in Fig. 10h, i, n), start with rather high precipitation over the Sahel at the beginning of May. Compared with the monsoon’s gradual development process during the July–August timeframe, the observed monsoon retreat in early September is much faster (Fig. 10p, q). Most models adequately simulate this dramatic reduction in precipitation in early September.

Although WAMME GCMs produce a generally reasonable PC1 and monsoon onset dates, the WAMME models have difficulty in producing proper variance in the PCs. The first PC of every observational data set explains about 30% of its total variance (Fig. 9c). Variance explained by ERA-Interim is slightly high (37%) and by Reanalysis II is very close to observation (27%). Several GCMs (MRI, GMAO/NSIPP1, FVGCM (no aerosol), CFS, and HadAM3) produce PC1 variance within the range of the two reanalyses. CAM/CLM3.0’s PC1 explains too little variance (less than 13%), which probably is related to its main monsoon rain occurring over the Sahara (Fig. 1d), rather than over the Sahel. On the other hand, the PC1s of Reanalysis I, COLA GCM, FVGCM (aerosol), UCLA GCM, and MRF explain high variance (40–46%). For the second PCs, observational data explain less than 10% of variance. However, only ERA-Interim, COLA GCM, MRI, CFS, GFS, UCLA GCM, and MRF produce proper variance. The other two reanalyses and other GCMs show high variance in PC2 (more than 20%). For the third CEOF, the observational data and reanalyses show very low explained variability (less than 5%, Fig. 9c). Most models except CAM/CLM3, FVGCM, and CFS properly produce the variance in PC3.

By and large, the CEOF analysis produces physically meaningful first EOF spatial patterns and a monsoon precipitation evolution process. WAMME models’ simulations of evolution are generally consistent with observations. Some models have weaknesses in simulating parts of the precipitation evolution process, such as onset. The difficulty for most models is in producing proper precipitation variability in their first three PCs.

5.3 Intraseasonal daily surface temperature variability

The thermal gradient has been considered a major driving force for West African monsoon evolution (Sultan et al. 2007). CEOF analysis is conducted for surface temperature to evaluate its intraseasonal evolution and spatial characteristics. Since the ALMIP data set only covers a domain south of 20°N and Fig. 3 shows that ALMIP and CPC GTS have very similar temperature over the area close to the Sahara desert (north of 15°N), we fill in the domain to the north of 20°N in the ALMIP data set with the corresponding CPC GTS data for CEOF temperature analysis. We also conducted another CEOF analysis without ALMIP data, and the results for CEOFs and other models’ PCs are very similar and consistent. Figure 11 shows the first two CEOFs as well as variances explained by the first three CEOFs. Figures 12 and 13 show PCs for each model and observation. The monsoon onset date estimated from Fig. 10 is also presented to help identify the relation of temperature gradient development and monsoon evolution.

Fig. 11

Same as Fig. 9 but for 2-m air temperature

Fig. 12

Two-meter air temperature first PC (red line) and average 2-m air temperature between 10°W and 10°E along latitudes 20°N and 25°N (blue line, °C) for a NCEP Reanalysis II, b ECMWF Reanalysis Interim, c–n WAMME simulations, p CPC GTS, and q ALMIP data. The vertical dashed line in each panel indicates the approximate observed monsoon onset pentad

Fig. 13

Same as Fig. 12 except for second PC (red line), and average 2-m air temperature between 10°W and 10°E along latitudes 10°N and 15°N (blue line, °C)

The first CEOF emphasizes the surface temperature anomalies over the Sahara and accounts for 49% of total variance. We refer to this mode as the Sahara mode in this paper. Similar to Fig. 10, the time evolutions of the mean surface temperature at 20°N and 25°N over 10°W to 10°E, associated with the Sahara mode, from simulations or observations are also shown in each panel in Fig. 12 (blue lines). Their evolutions are all very consistent with their PC1s. The correlation coefficients are higher than 97%. Before the monsoon onset, the negative temperature anomaly in the Sahara is dramatically reduced and eventually becomes positive anomalies (Fig. 12p, q). The monsoon onset is quite consistent with the time about 10–15 days after the PC1 positive anomaly reaches its maximum, which remains at about the same level (a plateau) during the monsoon period (about 90 days). After August, the positive anomaly reduces dramatically and becomes negative in early September (Fig. 12p, q). Sultan et al. (2007) applied 1979–2000 Reanalysis II data, identifying common EOF leading modes for both temperature and low level wind in WAM development. The first mode identified in this study is consistent with their 1st EOF, albeit their PC1 does not have a plateau (lasting for about 40 days in CEOF PCs) and has a peak 15 days after monsoon onset, similar to UCLA GCM (Fig. 12m). We have also conducted a normal EOF analysis with 1979–2006 CPC GTS data. The results show features similar to Sultan et al. (2007). The plateau apparently is a special feature for WAMME-selected years, as shown in blue lines in Fig. 12p, q.

Reanalyses and most models, except CAM/CLM3.0 and MRI, properly simulate this evolution process, with their PC1s’ correlation coefficients with observation higher than 90%. The PC1 evolutions of GMAO/NSIPP1 and HadAM3 are very similar to observation. The PC1 temporal variations of CAM/CLM3.0 and MRI are different from other models, consistent with some PC2 features (to be discussed later) and their wet biases over the Sahara. The relationship between the precipitation mode and the temperature modes will be discussed further in Sect. 5.4. Reanalyses and most models’ simulations have a latitudinal band with low sea level pressure (a thermal low) around 20°N (not shown). Figures 11a and 12 reveal that most models present a dramatic increase in temperature near heat low regions before monsoon onset as in observations, indicating a close relationship between WAM onset and thermal low development in the Sahara.

CEOF 2 explains 29% of the variance and is dominated by the zonal temperature anomaly over the Sahel with a center in West Africa (Fig. 11b). We refer to this mode as the Sahel mode in this paper. The time evolution of the mean surface temperature of 10°N to 15°N and 10°W to 10°E from simulations and observations is also shown in Fig. 13 (blue lines) and is consistent with PC2s, with correlation coefficients generally larger than 98%. The PC2s of the CPC GTS and ALMIP data are very similar. Observational data (Fig. 13p, q) show that before the monsoon onset, the positive anomaly is dramatically reduced from its maximum to about zero, consistent with the northward movement of the monsoon. After monsoon onset, the negative anomaly increases and reaches a maximum in August. After August, the negative anomaly reduces and temperature increases again (Fig. 13p, q). Reanalyses and most models correctly produce PC2’s evolution processes and magnitude, with correlation coefficients being higher than 90%. CFS, GFS, UCLA GCM, and MRF either keep the flat negative anomalies or simulate a general decreasing trend after the negative maximum in August, similar to the PC of the second EOF of Sultan et al. (2007) as well as the PC of the second EOF based on 1997–2006 GTS data analysis. It seems that these models’ PC2s are closer to the long-term climatology.

A significant difference between models and observational data is again in the explained variances. While they are only 29 and 37% for CPC GTS PC1 and ALMIP PC1, respectively, most models’ PC1s, except CAM3/CLM3.0 (26%), MRI (44%), and HadAM3 (41%), explain much higher variance, about 50–80%, closer to Reanalysis I (46%), Reanalysis II (52%), and ERA-Interim (47%) (Fig. 11c). The CPC GTS observational data have limited stations in the Sahara area, which probably causes the low variance of CPC GTS data in PC1 compared with most models. The two observational data sets exhibit high variance in the Sahel mode, 50% for CPC GTS and 44% for ALMIP data. Except for CAM/CLM3.0 (dust, 43%), MRI (39%), and HadAM3 (42%), the PC2s of reanalyses and most models explain less variance, around 30% (Fig. 11c). The uncertainty in the variances explained by the Sahara mode and the Sahel mode indicates that further diagnostic studies based on observation and model simulations are necessary to understand these relationships and to improve model simulations. Since PC3 only explains 8% of the variance, we will not discuss it in this paper.

5.4 WAM evolution and changes in temperature gradient and latent heat evolution

Two temperature modes (i.e., PC1, Sahara mode, and PC2, Sahel mode) exhibit the evolution of the temperature gradient during the monsoon development process. The progress of the monsoon precipitation northward (precipitation PC1) is associated with the weakening of the Sahel mode (temperature PC2) and the enhancing of the Sahara mode (temperature PC1), which in turn enhances the meridional temperature gradient. The timing of monsoon onset is about 10–15 days after the peak of the summer temperature anomaly in the Sahara. The temperature gradient keeps increasing during the entire monsoon season as the Sahel mode (temperature PC2) gets weak. The dramatic reduction in precipitation PC1 occurs in early September when the trend of change of the heat gradient starts to reverse: the trends of temperature change in both Sahara mode and Sahel mode move in opposite directions. Most GCMs properly simulate these evolutions (Figs. 10, 12, 13).

To further analyze the relationship between monsoon precipitation evolution and temperature gradients, we calculate the correlation between precipitation PC1 and temperature PC1 and PC2 for each model (Table 2). The correlation of precipitation PC1 with surface temperature is positive with temperature PC1 (Sahara mode) and negative with temperature PC2 (Sahel mode) in CPC GTS and the ALMIP data, consistent with our previous discussions about heat anomaly development in the Sahara and Sahel. Both GTS and ALMIP data have close absolute correlation values for their surface temperature PC1 and PC2 with their precipitation PC1, around 0.7–0.8. The correlation of MRI and CAM3/CLM3.0’s surface temperature PC1 (Sahara mode) with precipitation PC1 is negative. Meanwhile, their PC2s (Sahel mode) with large explained variances show a high negative correlation with precipitation PC1. These two characteristics indicate that the location of their surface temperature maximum is probably too far north of the center of the Sahara mode, where a positively correlated relationship between precipitation and temperature gradient evolution should exist, thus consistent with their wet bias in the southern Sahara (Fig. 1d–f). On the other hand, CFS, UCLA GCM, and MRF’s temperature PC1s’ (Sahara mode) correlation with precipitation is dominant. The correlation of their Sahel mode, which explains very low variance (Fig. 11), with precipitation PC1 is rather low, indicating less effect of Sahel temperature anomalies on their WAM evolution simulation, inconsistent with observation.

Table 2 Simultaneous and 15-day lag/lead correlation coefficients

In Sect. 4, we showed that the skill of simulated spatial distribution of precipitation is highly related to that of temperature. The analysis here confirms the close temporal correlation between temperature gradient in West Africa and WAM precipitation evolution. To further explore this mechanism and identify the character of discrepancy in WAMME models’ simulations of precipitation, we calculated the lag/lead correlations between the precipitation PC1 and temperature PC1 and between the precipitation PC1 and temperature PC2. Our analysis shows that the lead correlations between precipitation PC1 and temperature PC1 (Sahara mode) from ALMIP, observation, reanalyses, and almost every model are substantially smaller than the simultaneous (i.e., zero lag/lead) correlations. After 15 days, no statistically significant correlations exist (not shown).

However, the lag correlations (R1LG) between precipitation PC1 and temperature PC1 are statistically significant (Table 2). With no lead correlation and a higher lag correlation than the simultaneous one, the results here indicate that the Sahara mode leads precipitation PC1: if the lead/lag relationship were merely a reflection of the variables’ autocorrelation, similar lead and lag correlation patterns (i.e., a gradual reduction as the lead/lag time increases) would be expected. The lag correlation reaches a peak in 15 days, except for HadAM3 and UCLA, whose lag correlations reach peaks in 5 days. Only two GCMs, MRI and CAM3/CLM3.0, show anomalous lag correlations, consistent with their apparent wet biases as discussed earlier. Meanwhile, the lag correlations (R2LG) between the precipitation PC1 and temperature PC2 (Sahel mode) are substantially smaller than the simultaneous correlations (Table 2). Their lead correlation coefficients (R2LD) are quite high (Table 2) and persistent, indicating that the temperature anomaly in the Sahel responds to the WAM rainfall evolution. Such a relationship is also consistent with the negative correlation; i.e., more rainfall leads to lower surface temperature.
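
The lag/lead calculation can be sketched as below. This is a minimal illustration on synthetic pentad series, not the Table 2 computation: the function name `lagged_correlation` is hypothetical, and the sign convention (a positive lag means the first series leads the second) is an assumption chosen for this sketch and may not match the labeling of "lag" and "lead" in Table 2.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Correlation between x(t) and y(t + lag).

    A positive `lag` pairs earlier values of x with later values of y,
    i.e. it tests whether x leads y by `lag` time steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]

# Synthetic example: a "temperature PC" that leads a "precipitation PC"
# by 3 pentads (15 days) over a 36-pentad season.
base = np.sin(2 * np.pi * np.arange(39) / 36)
temp_pc = base[3:]        # temperature series, shifted ahead in time
precip_pc = base[:36]     # precipitation follows 3 pentads later

r_simultaneous = lagged_correlation(temp_pc, precip_pc, 0)
r_temp_leads = lagged_correlation(temp_pc, precip_pc, 3)
r_precip_leads = lagged_correlation(temp_pc, precip_pc, -3)
```

In this synthetic case the correlation is largest when temperature leads, and smaller both at zero lag and when precipitation leads. That asymmetry is the pattern the text uses to argue that the relationship is more than shared autocorrelation.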

In a similar approach, which analyzed the lag/lead correlation between Sahel precipitation and Sahara geopotential height simulated by the CMIP3 models, Biasutti et al. (2009) identified that the variability of the Sahara low is a driver of interannual and decadal variability in Sahel rainfall and that the intermodel variation in the Sahara thermal low may cause the discrepancy of CMIP3 models in simulating Sahelian interannual rainfall. The results in this study indicate that the development of the Sahara mode leads the WAM precipitation seasonal evolution and that the divergence in its simulation in the WAMME models is relevant to the discrepancy in WAM precipitation simulation, consistent with the interannual-decadal study of Biasutti et al. (2009). Meanwhile, the negative correlation between precipitation PC1 and the Sahel mode and their lag/lead correlation patterns (Table 2) show the response of temperature in the Sahel to WAM precipitation, but this response enhances the gradient between the Sahara mode and the Sahel mode and then also contributes to the WAM precipitation evolution as discussed earlier. Further investigation of the link between physical processes, such as planetary boundary layer parameterization, land surface parameterization, and radiative transfer, and the deficiency in simulating the relationship between WAM precipitation evolution and temperature gradient development in the Sahel and Sahara will be an important task in the next WAMME experiment.

To further analyze the relationship shown in Fig. 7e–f, using 4-year monthly mean data, we also conduct similar calculations of 1-month lag/lead correlations between zonal wind at 600 hPa over 10°W to 10°E and 5°N to 15°N, where the maximum AEJ is located, and the latent heat flux and surface temperature over the Sahel (10°W to 10°E and 10°N to 15°N) (Table 3; see Footnote 2). No significant 1-month lead correlations have been found (not shown). According to geostrophic dynamics, a positive latitudinal temperature gradient will generate easterly thermal wind. When the atmospheric temperatures below the mid-troposphere are higher to the north (i.e., over the Sahara) and lower to the south (i.e., over the Sahel), the mean latitudinal temperature gradient is positive over tropical West Africa and the thermal wind (and hence the jet) is easterly. The larger negative 1-month lag correlations in the ALMIP, ERA-Interim, and Reanalysis II results shown in Table 3 confirm such a relationship and that the temperature is a driving force for the discrepancy in AEJ simulation. Furthermore, the positive simultaneous and lag correlations between zonal wind at 600 hPa and latent heat flux (Table 3) also demonstrate that an increased latent heat flux gradient between the Sahel and the Sahara, where the latent heat flux is near zero, enhances the AEJ; therefore, latent heat flux is another driving force in producing AEJ simulation discrepancy. Most models fail to produce larger lag correlations (R2LG in Table 3) than the simultaneous correlation as indicated in ALMIP and reanalyses, which may be associated with the poor AEJ simulations by the WAMME models.
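
The statement that a positive latitudinal temperature gradient produces easterly thermal wind follows from the textbook thermal wind relation in pressure coordinates. The form below is standard and is given only as a reminder; the symbols are notation introduced here: u_g is the zonal geostrophic wind, R the gas constant for dry air, f the Coriolis parameter, T temperature, and y the northward coordinate.

$$ \frac{\partial u_{g}}{\partial \ln p} = \frac{R}{f} \left( \frac{\partial T}{\partial y} \right)_{p} $$

In the Northern Hemisphere (f > 0), a temperature that increases northward makes the right-hand side positive, so u_g increases with pressure and therefore decreases with height. The wind thus becomes more easterly from the surface up to the level where the gradient vanishes or reverses, which is consistent with a mid-level easterly jet such as the AEJ.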

Table 3 Simultaneous and 1-month lag correlation coefficients between zonal wind at 600 hPa and surface temperature and latent heat flux over the Sahel

6 Discussion and summary

This paper briefly presents the WAMME project and serves as an introduction for other WAMME papers in this special issue. It evaluates models’ performances in simulating magnitudes, spatial distributions, and variability of WAM precipitation, surface temperature, and major circulation features at seasonal and intraseasonal scales. Major differences/deficiencies in simulations are identified and their character with respect to mechanisms of WAM spatial distribution and evolution are explored using observational data, especially ALMIP data.

The analyses indicate that models with specified SST generally have reasonable simulations of the pattern of the spatial distribution of WAM seasonal mean precipitation and surface temperature as well as the averaged zonal wind in latitude-height cross-section and low level circulation. However, the discrepancies of simulated spatial correlation, intensity, and variance of precipitation are large compared with observations. Furthermore, the majority of models fail to produce proper intensities of the AEJ and TEJ. Although individual models show weaknesses in different aspects, WAMME multi-model ensembles produce good WAM seasonal mean precipitation and surface temperature spatial distribution, intensity, and variability, better than reanalyses in many respects. However, when the majority of the models show a systematic bias, such as in the simulated intensity of the AEJ and TEJ, and AEJ evolution (not shown), the ensemble mean fails to yield better results, which suggests that while applying the ensemble mean for prediction, caution must be taken because the multi-model ensemble mean does not necessarily always produce the best result in all aspects compared with the individual models.

ALMIP data are used to analyze the associations between simulated surface variables and WAM precipitation and the AEJ, to explore model simulation differences, and to investigate the WAM mechanism. WAMME models have shown that spatial distributions of surface sensible heat flux, surface temperature, and precipitation minus evaporation (i.e., moisture convergence) are closely associated with the divergence of simulated spatial distribution of precipitation; while surface latent heat flux is closely associated with the AEJ.

We conduct CEOF analyses to identify major common modes of seasonal WAM precipitation and surface temperature anomaly evolutions for 2003–2006 to evaluate model simulations in these modes and to investigate the relationship between WAM precipitation evolution and development of the surface temperature gradient during the monsoon season. The PC1 of precipitation and PC1 (Sahara mode) and PC2 (Sahel mode) of surface temperature characterize the WAM precipitation evolution and northward movement of temperature gradient, respectively. CEOF analysis reveals distinct features in these modes during 2003–2006 compared to long-term climatological modes, despite similarities. The analysis of simultaneous and lag/lead correlations indicates that the WAM precipitation northward movement/retreat is closely associated with an enhanced/weakened Sahara mode and a weakened/enhanced Sahel mode. Although the WAMME models generally simulated these modes, there are large discrepancies in their explained variance in each mode. Furthermore, although the observed WAM evolution is associated with developments of both the Sahara mode and the Sahel mode, some WAMME models’ temperature gradient development relies solely on variations in a single mode, either the Sahara mode or the Sahel mode, as evident in the variance explained by each mode. Meanwhile, it has also been found that some models’ deficiencies in rainfall simulation can be traced to how well they simulate the Sahara mode.

This paper provides an extensive quantitative assessment of common state-of-the-art GCMs in WAM simulations in the WAMME project with Taylor analysis, CEOF analysis, and other statistical analyses, and introduces the AMMA data for GCM applications for the WAM modeling study. Furthermore, taking advantage of the CEOF analysis with multi-model results and the AMMA data, the contribution of the Sahara mode and the Sahel mode to the WAM precipitation evolution and to simulation discrepancies, as well as the contribution of latent heat flux and surface temperature over the Sahel to the AEJ, are identified. Such comprehensive GCM intercomparisons and analyses for WAM simulations, especially applying AMMA data to explore the WAM mechanisms and the character of model simulation discrepancies, have not been done before. Based on the results from this study, the WAMME will conduct further experiments to investigate the causes of major common deficiencies identified here and design specific experiments to evaluate/identify the relative contributions of external forcings in WAM variability. The present results should provide a good starting point as benchmarks for future studies to understand the roles of external forcing and internal dynamics in WAM variability.