1 Introduction

The Indian summer monsoon (ISM) is a fundamental part of the climate of the Indian subcontinent, as India receives about 80 % of its annual rainfall during the months June to September (Basu 2007). The ISM is influenced by external factors, e.g. sea surface temperatures over the Indian Ocean, the El Niño Southern Oscillation (ENSO), northern hemisphere snow cover or teleconnections with the mid-latitudes (Hahn and Shukla 1976; Krishnan et al. 2009; Wang et al. 2003; Ashok et al. 2001), and is at the same time characterized by high internal variability (Ajayamohan 2007). The spatial rainfall distribution over India is strongly influenced by steep orography along the western coast (Western Ghats) and in the north of the country at the Himalayan foothills. This study investigates the ability of the regional climate model COSMO-CLM to simulate subseasonal rainfall characteristics of the ISM system, as well as the model's skill in representing observed wet and dry events within the monsoon season.

Several studies have investigated the Indian monsoon using global climate models (GCMs) under present, paleo and future climate conditions (Gadgil and Sajani 1998; May 2003, 2004, 2011; Wang et al. 2004; Dallmeyer et al. 2010). The main characteristics of the large-scale monsoon system, e.g. the atmospheric circulation, are in good agreement with observations. However, these models lack a realistic representation of the spatial rainfall distribution, mainly due to the coarsely resolved orography. As this study concentrates on results based on a limited area model, we refer to Wang (2002) and references therein for an extensive overview of the ISM system in GCM simulations.

Models with a higher spatial resolution, and hence a better-resolved orography, such as regional climate models (RCMs), generally show an improved representation of spatial rainfall patterns (Asharaf et al. 2012; Asharaf and Ahrens 2013; Dobler and Ahrens 2010; Dobler and Ahrens 2011; Srinivas et al. 2013; Lucas-Picher et al. 2011; Nguyen and McGregor 2009). The majority of these studies investigated seasonal and climatological rainfall distributions over India. Dobler and Ahrens (2010) used the model COSMO-CLM driven by ERA-40 reanalysis data (Uppala et al. 2005) to simulate the ISM under recent climate conditions. They found that the model is able to capture the spatial distribution of rainfall, but also noted too wet conditions over the west coast of India, which they attributed to enhanced convective activity over the warm tropical oceans surrounding India. Lucas-Picher et al. (2011) investigated the representation of the summer monsoon in four different RCMs. The spatial distribution of mean rainfall is well captured by the models, but large differences were found on the regional scale.

Differences between model results and observations are mainly caused by insufficient parametrization schemes or deficits in soil conditions: both Srinivas et al. (2013) and Dash et al. (2006) showed that rainfall amounts over India in the regional climate models WRF (Srinivas et al. 2013) and RegCM3 (Dash et al. 2006) are highly sensitive to the choice of the convection scheme. Furthermore, previous studies (Saeed et al. 2009; Lucas-Picher et al. 2011) revealed positive surface temperature biases over the northwestern part of India. Saeed et al. (2009) suggested that this overestimation arises because irrigation over Pakistan is not taken into account in these model simulations. Asharaf et al. (2012) showed that wetter initial soil conditions in the model COSMO-CLM increase rainfall over northwestern India, mainly due to soil-moisture recycling. Additionally, Saeed et al. (2009) showed that cyclones from the Bay of Bengal penetrate farther west into India if soil moisture is increased. Besides the effect of soil moisture on cyclone paths, Sabin et al. (2013) showed that monsoon depressions are better captured in higher-resolution simulations.

Despite these differences regarding soil conditions and the parametrization of convection, these studies indicate that current RCMs are able to depict seasonal characteristics of the ISM. However, only a limited number of studies have analyzed rainfall over India on time scales of days to weeks, although the ISM exhibits high intraseasonal variability. Active and break cycles lasting from several days to several weeks significantly influence the seasonal rainfall pattern (Goswami and Mohan 2001; Sperber et al. 2000). Long-lasting dry spells affect seasonal rainfall over India and have been investigated by Bhat (2006), Gadgil (2003) and references therein. For example, in 2002 the annual rainfall deficit was about 21 %, mainly due to a dry event lasting 36 days around July in which rainfall was 56 % below normal (Bhat 2006). Suhas et al. (2013) showed that this is linked to a reduced northward propagation of rainfall during July.

This study focuses on the intraseasonal variability of the ISM and is therefore intended to extend the analyses carried out by Dobler and Ahrens (2010), who used an earlier model version driven by ERA-40 reanalysis data at its lateral boundaries. Our results regarding the mean spatial rainfall distribution show only slight differences from those of Dobler and Ahrens (2010), with a general underestimation of rainfall over most parts of India (Fig. 1).

Fig. 1 Mean rainfall during JJAS (mm/day) (1979–2007) for (a) Aphrodite, (b) COSMO-CLM and (c) the difference between COSMO-CLM and Aphrodite

Here, the focus is on the representation of observed intraseasonal features of the ISM in COSMO-CLM, including (a) daily rainfall variability over the Indian subcontinent, (b) the northward propagation of monsoon intraseasonal oscillations and (c) the representation of wet and dry events and their related atmospheric circulation. To analyse changes of the ISM on the intraseasonal time scale with COSMO-CLM, a validation based on, e.g., mean rainfall distributions or seasonal winds is not sufficient, as it gives no information about the model's capability to represent subseasonal features. Thus, climatological features such as the mean rainfall distribution are only discussed briefly. Here, ERA-Interim data are used to force COSMO-CLM at its lateral boundaries. Because observations are assimilated into this reanalysis, we expect ERA-Interim itself to capture the observed features of the ISM. However, it cannot be concluded a priori that COSMO-CLM is able to simulate these features, even when driven by a well-performing model at its lateral boundaries. Thus, the main goal of this study is to analyse to what extent COSMO-CLM is able to depict the observed intraseasonal features of the ISM for the right reasons when driven by a well-performing model at its lateral boundaries, and not to show the performance of COSMO-CLM compared to ERA-Interim. As explained above, this validation is vital for analysing changes of the ISM on the intraseasonal time scale under different climate conditions using COSMO-CLM forced by global climate model simulations.

In this study, a COSMO-CLM simulation at a horizontal resolution of about 55 km is carried out. The choice of resolution is motivated by the set-up of the current CORDEX simulations, most of which have been run at a similar resolution. This configuration thus resolves more detailed structures of the ISM than GCM simulations, while keeping the computational cost of simulations of about 30 years reasonable.

The paper is structured as follows: the model used in this study is described in Sect. 2 and the data in Sect. 3. The methods are described in Sect. 4. The results are presented in Sect. 5, which is divided into three parts dedicated to the analysis of daily rainfall variability over the Indian subcontinent, the northward propagation of monsoon intraseasonal oscillations, and the representation of wet and dry events and their related atmospheric circulation.

2 Model configuration

The COnsortium for SMall scale MOdeling (COSMO) model in Climate Mode (COSMO-CLM) is the community model of the German regional climate research community. The model is based on the COSMO model, which is used by several weather services across Europe for numerical weather prediction (NWP). The main differences between the climate version and the NWP version are given in Böhm et al. (2006). In this study, we use model version 4.8, subversion 17, for the period from 1979 to 2011 over the domain shown in Fig. 2. The lateral boundary conditions are provided by the ECMWF ERA-Interim reanalysis dataset (Dee et al. 2011). The simulation is performed on a rotated pole grid with a horizontal resolution of 0.5° × 0.5° (\(\approx\)55 km) and 32 vertical levels. The parametrizations used for the model integration include a radiation scheme following Ritter and Geleyn (1992), a microphysics scheme including cloud water, rain and snow (Kessler 1969) and the Tiedtke mass flux convection scheme (Tiedtke 1989). The temporal discretization is performed with a leapfrog scheme at an integration time step of 150 s.

Fig. 2 COSMO-CLM model domain with topography (greyscale) and the All-Indian Monsoon Rainfall (AIMR) region over India (red)

Soil moisture and soil temperature profiles at the start of the simulation are taken from a previous COSMO-CLM simulation run over the same model domain, also driven by ERA-Interim reanalysis data at its lateral boundaries but covering only 1989 to 2001. We average soil moisture and temperature from this simulation for 1 January over all years and use these values to initialize the model simulation presented in this paper. This should reduce the model's spin-up, as the averaged soil conditions from the earlier simulation are closer to the model's climatology than initial values taken from reanalysis data. Similar to our approach, Jaeger et al. (2009) used climatological values from a long-term simulation to initialize the model when validating land–atmosphere interactions in COSMO-CLM.

3 Data

In this study we use the rainfall dataset developed within the Aphrodite (Asian Precipitation—Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources) project (Yatagai et al. 2012). Specifically, we use Aphrodite version V1003R1 for the region “Monsoon Asia” at a horizontal resolution of 0.5°.

We additionally use the Tropical Rainfall Measuring Mission (TRMM) (Huffman et al. 2007) and Global Precipitation Climatology Project (GPCP) (Huffman et al. 2001) satellite-based rainfall estimates to assess the uncertainty of rainfall observations. Like Aphrodite, these two datasets are available at daily resolution and are used to validate model results on sub-monthly time scales (cf. Table 1 for a detailed overview).

Table 1 Datasets used in this study as well as the temporal coverage, horizontal resolution, temporal resolution and the database used to derive this product (only given for precipitation datasets)

To compare the model with its driving data, we use ERA-Interim reanalysis data (Dee et al. 2011) over the Indian region. To calculate daily precipitation sums from ERA-Interim, we use the forecasts starting at 00 and 12 UTC: summing the precipitation accumulated over the first 12 hours of both forecasts yields the daily precipitation sum.
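The daily sum described above is simply the 00–12 UTC accumulation of the 00 UTC forecast plus the 12–24 UTC accumulation of the 12 UTC forecast. A minimal sketch, assuming both 12-hour accumulations are given in metres of water (the ECMWF convention for total precipitation) and that the function name is illustrative only:

```python
import numpy as np

# Assumption: accumulated precipitation is stored in metres of water,
# so multiplying by 1000 gives mm.
M_TO_MM = 1000.0

def daily_precip_mm(acc_00utc, acc_12utc):
    """Daily precipitation sum (mm/day) from two 12-hour forecast accumulations.

    acc_00utc: precipitation accumulated 0-12 h after the 00 UTC start (00-12 UTC)
    acc_12utc: precipitation accumulated 0-12 h after the 12 UTC start (12-24 UTC)
    Both arrays share the same shape, e.g. (day, lat, lon).
    """
    a = np.asarray(acc_00utc, dtype=float)
    b = np.asarray(acc_12utc, dtype=float)
    return (a + b) * M_TO_MM
```

Using only the first 12 hours of each forecast keeps every hour of the day covered by the shortest available lead time.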

For the investigation of the atmospheric circulation, ERA-Interim reanalysis data of zonal and meridional wind components at 200, 500 and 850 hPa are used.

4 Methodology

Daily and weekly unfiltered rainfall amounts are calculated for the All-Indian Monsoon Rainfall (AIMR) region, which covers all of India (Fig. 2). We apply the method of Turner and Slingo (2009) to evaluate the model's capability to capture the northward propagation of monsoon intraseasonal oscillations. To this end, 30–60 day bandpass-filtered precipitation for May to October is used. Turner and Slingo (2009) originally used the region from 70° to 100° east and 20° south to 20° north. Due to the limited domain of the RCM, this investigation is only carried out from the equator to 20° north. Additionally, the monsoon intraseasonal oscillation (MISO) index developed by Suhas et al. (2013) is applied to the RCM data. Daily anomalies for this analysis are derived by removing the annual cycle (mean and first three harmonics), as done by Suhas et al. (2013). Suhas et al. (2013) originally performed an Extended Empirical Orthogonal Function (EEOF) analysis of zonally averaged daily precipitation anomalies derived from the GPCP dataset for the region 60.5°–95.5° east and 12.5° south to 30° north. The first two EEOFs are projected onto daily precipitation anomalies, and the resulting first two PCs are called MISO1 (PC1) and MISO2 (PC2). Due to the limited domain of the RCM simulation, we project the EEOFs derived from GPCP data for the original domain only onto the region from 60.5°–95.5° east and 0°–30° north for the model data. The PCs calculated by projecting the EEOFs onto the limited domain are hereafter called LMISO1 and LMISO2 (for observational data) and LMISO1-C and LMISO2-C (for COSMO-CLM model data). The intensity is defined as \(\sqrt{\text{MISO1}^2+\text{MISO2}^2}\) and is hereafter referred to as MISO-Int, LMISO-Int and LMISO-C-Int, respectively.
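Removing the mean and the first three harmonics of the annual cycle can be sketched as a least-squares fit of a constant plus sine and cosine terms. This assumes a gap-free daily series and a period of 365.25 days; Suhas et al. (2013) may instead derive the harmonics from a calendar-day climatology, and the function name is ours.

```python
import numpy as np

def remove_annual_cycle(x, n_harmonics=3, period=365.25):
    """Subtract the mean and first `n_harmonics` annual harmonics (axis 0 = days).

    x: (time,) or (time, space) daily precipitation; returns anomalies of the same shape.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.shape[0], dtype=float)
    cols = [np.ones_like(t)]                        # the mean
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        cols.extend([np.cos(w), np.sin(w)])         # k-th annual harmonic
    design = np.column_stack(cols)                  # (time, 1 + 2 * n_harmonics)
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef
```

Because `np.linalg.lstsq` accepts a two-dimensional right-hand side, all grid points or latitudes can be processed in one call.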

We use weekly rainfall amounts over the AIMR region as the basis to identify wet and dry events in model, observation and reanalysis data. This approach excludes short-lived extreme events, e.g. of one day, which have a smaller societal impact in India. As modelled absolute rainfall amounts can show large biases compared to observations, which complicates the comparison between datasets, we calculate the Standardized Precipitation Index (SPI) from weekly rainfall sums to detect wet and dry events. To calculate SPI values, a gamma distribution is fitted to the weekly rainfall sums. This distribution is then transformed to a standard normal distribution (mean 0, standard deviation 1) using a quantile–quantile mapping (Lloyd-Hughes and Saunders 2002). Hence, each precipitation amount can be converted to an SPI value that respects the local climatological rainfall distribution, meaning that the SPI is a relative precipitation measure. Thus, SPI values can be compared between different regions and different times. Because a gamma distribution is fitted to the data of each week separately, SPI values of one specific dataset are not associated with a fixed rainfall amount; e.g., a wet event (\(\hbox {SPI}>1\)) during the last week of September might be drier than a wet event (\(\hbox {SPI}>1\)) during the second week of June. Furthermore, SPI values can be compared between different datasets, but their associated rainfall amounts might differ (even for the same week), as the gamma distribution is fitted for each dataset separately.
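The per-week SPI calculation can be sketched as follows. This is a minimal version under stated assumptions: a two-parameter gamma distribution (location fixed at zero) fitted by maximum likelihood with `scipy.stats`, the usual mixed treatment of zero-rainfall weeks (probability `q`), and an array layout of years × calendar weeks. The fitting procedure used in the study may differ, and the function names are illustrative.

```python
import numpy as np
from scipy import stats

def spi_one_week(sums):
    """SPI for one calendar week: `sums` holds that week's rainfall total in each year."""
    x = np.asarray(sums, dtype=float)
    wet = x[x > 0]
    q = 1.0 - wet.size / x.size                     # probability of a zero-rainfall week
    shape, _, scale = stats.gamma.fit(wet, floc=0)  # two-parameter gamma fit
    cdf = q + (1.0 - q) * stats.gamma.cdf(x, shape, loc=0, scale=scale)
    cdf = np.clip(cdf, 1e-6, 1 - 1e-6)              # keep the normal quantile finite
    return stats.norm.ppf(cdf)                      # quantile mapping to N(0, 1)

def spi_weekly(weekly_sums):
    """weekly_sums: (n_years, n_weeks); a separate gamma fit for each calendar week."""
    w = np.asarray(weekly_sums, dtype=float)
    return np.column_stack([spi_one_week(w[:, k]) for k in range(w.shape[1])])
```

Because every column receives its own fit, the same SPI value maps to different rainfall amounts in June and in September, as described above; fitting each dataset separately has the same effect across datasets.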

To identify dry and wet events in model and observation data, SPI values are classified into either three or five discrete categories (Table 2). To assess the skill of COSMO-CLM in simulating dry and wet events, we calculate the Gerrity Skill Score (GSS), which is based on the Gandin–Murphy Skill Score (Gerrity 1992). The GSS gives an objective measure of the model's skill in simulating a multi-category variable with natural ordering. It ranges between −1 and +1, with +1 indicating perfect agreement between model and reference dataset and values below zero indicating results less skilful than a random or constant (e.g. climatological) forecast, both of which score zero. The scoring weight is higher if the model correctly represents rare events (e.g. severe droughts) than frequently occurring events. Additionally, the penalty is lower if the model simulates an event similar to the observed one (e.g. a severe drought is observed, but the model shows a moderate drought) than if it simulates a completely different event (e.g. a severe drought is observed and the model predicts a flood).
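The GSS can be sketched from a contingency table using the scoring matrix of Gerrity (1992): with observed category frequencies \(p_r\) and odds \(a_i = (1-\sum_{r\le i}p_r)/\sum_{r\le i}p_r\), each cell weight combines \(a_r\) for the rarer side and \(1/a_r\) for the more common side, minus a penalty proportional to the category distance. In this sketch, rows are observed and columns are modelled categories, and the three-class binning places SPI = −1 in the dry class; both are our assumptions, as are the function names.

```python
import numpy as np

def spi_to_3class(spi):
    """0 = dry (SPI <= -1), 1 = normal (-1 < SPI <= 1), 2 = wet (SPI > 1)."""
    return np.digitize(np.asarray(spi, dtype=float), [-1.0, 1.0], right=True)

def contingency_table(obs_cat, mod_cat, k):
    """Counts n[i, j] for observed category i and modelled category j (0..k-1)."""
    table = np.zeros((k, k))
    np.add.at(table, (np.asarray(obs_cat), np.asarray(mod_cat)), 1)
    return table

def gerrity_skill_score(table):
    """Gerrity (1992) skill score of a k x k table; every observed class must occur."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                        # joint relative frequencies
    k = p.shape[0]
    p_obs = p.sum(axis=1)                  # observed climatological frequencies
    cum = np.cumsum(p_obs)[:-1]            # sum_{r<=i} p_r for i = 1..k-1
    a = (1.0 - cum) / cum                  # odds that an observation lies above class i
    s = np.empty((k, k))
    for i in range(k):
        for j in range(i, k):
            # rare classes give large a or 1/a, i.e. higher reward for a hit;
            # (j - i) penalises distant misses more than neighbouring ones
            val = np.sum(1.0 / a[:i]) - (j - i) + np.sum(a[j:])
            s[i, j] = s[j, i] = val / (k - 1)
    return float(np.sum(p * s))
```

A perfect table scores 1 and a model that always predicts the same class scores 0, which is the equitability property that makes the score comparable across datasets with different climatologies.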

Table 2 Definition of SPI classes

To investigate spatial variability on a weekly time scale, we calculate SPI time series for 34 meteorological subdivisions of India (provided by the Indian Institute of Tropical Meteorology). These time series from model, reanalysis and observation data are further used for a principal component (EOF) analysis.

5 Results

5.1 Validation of daily rainfall variability in COSMO-CLM

To evaluate the seasonal precipitation simulated by COSMO-CLM against reanalysis and observation data, we calculate mean daily rainfall amounts over the AIMR region and apply a 10-day running mean to smooth day-to-day variability (Fig. 3). Both COSMO-CLM and ERA-Interim capture the observed seasonal cycle over the AIMR region, with correlations of 0.95 for COSMO-CLM and 0.99 for ERA-Interim (Fig. 3). This well-marked seasonal cycle is characterized by lower rainfall amounts during June and September and a maximum during the peak monsoon season in July and August. The overall underestimation of JJAS rainfall in COSMO-CLM is mainly due to reduced rainfall during July, August and September compared to observations.

Fig. 3 Climatological daily precipitation (1979–2009) over India (AIMR) for Aphrodite, ERA-Interim reanalysis data and COSMO-CLM simulation data. Rainfall amounts are smoothed with a 10-day running mean filter

To analyze anomalies of daily rainfall, we calculate correlation coefficients, RMSE and standard deviations of ERA-Interim and COSMO-CLM output over the AIMR region and summarize them in a Taylor diagram (Fig. 4). The diagram is normalized by the standard deviation of the reference dataset (Aphrodite).
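The three quantities shown in a normalized Taylor diagram are linked by the law of cosines: with normalized standard deviation \(s\), correlation \(R\) and normalized centred RMS difference \(E'\), one has \(E'^2 = s^2 + 1 - 2sR\). A minimal sketch of how these statistics could be computed for one dataset against the reference; the function name and dictionary keys are illustrative.

```python
import numpy as np

def taylor_stats(model, ref):
    """Normalized std, correlation and centred RMS difference of `model` vs `ref`."""
    m = np.asarray(model, dtype=float)
    r = np.asarray(ref, dtype=float)
    ma, ra = m - m.mean(), r - r.mean()            # remove the means (centred statistics)
    sd_m, sd_r = ma.std(), ra.std()
    corr = np.mean(ma * ra) / (sd_m * sd_r)
    crmse = np.sqrt(np.mean((ma - ra) ** 2))       # centred RMS difference
    return {"norm_std": sd_m / sd_r, "corr": corr, "norm_crmse": crmse / sd_r}
```

Because the diagram uses centred statistics, a constant offset between model and reference does not appear in it; mean biases have to be assessed separately, e.g. from the seasonal cycle.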

Fig. 4 Taylor diagram for precipitation anomalies of COSMO-CLM, ERA-Interim, TRMM and GPCP. Analyses are performed for the AIMR region for 1979 to 2007 (ERA-Interim and COSMO-CLM only) and for 1998 to 2007 (all datasets), with Aphrodite as the reference dataset

The temporal correlation of daily rainfall anomalies over the AIMR region is similar for COSMO-CLM (\(\approx\)0.65) and ERA-Interim (\(\approx\)0.7). RMSE is higher for COSMO-CLM, but the variability of daily rainfall in the model agrees better with observations than that of ERA-Interim.

As gridded precipitation products can differ significantly, even when based on the same source data, we added the satellite-based rainfall products TRMM and GPCP to compare with daily rainfall anomalies of COSMO-CLM. As these datasets are only available from 1998 (1997 for GPCP), RMSE, correlation and standard deviation are computed for all four datasets over the period 1998 to 2007. Correlations of ERA-Interim and COSMO-CLM data for 1998–2007 are nearly the same as for 1979–2007, which indicates a fairly constant skill of COSMO-CLM with increasing integration time. TRMM and GPCP both show lower correlation coefficients than COSMO-CLM and ERA-Interim, which is consistent with the weak agreement between daily rain gauge data and TRMM estimates over Thailand reported by Chokngamwong and Chiu (2008).

Overall, COSMO-CLM agrees well with observations regarding both the mean seasonal cycle and the day-to-day variability of daily rainfall over the Indian subcontinent.

5.2 Monsoon intraseasonal oscillation

Here, two different methods are applied to investigate the model's capability to simulate the northward propagation of monsoon intraseasonal oscillations. First, we use the method based on lagged correlations between zonally averaged rainfall and rainfall at a reference point over India, which is referred to as BSISO [Boreal Summer Intraseasonal Oscillations, see Turner and Slingo (2009)]. Next, we apply the method developed by Suhas et al. (2013), which shows the model's capability to simulate these monsoon intraseasonal oscillations (MISO) in individual years, which is not possible with the former method. We calculate lag correlations of 30–60 day bandpass-filtered precipitation zonally averaged over 70°–100° east against a reference point at 85° east and 12.5° north (see Sect. 4) to evaluate the model's capability to simulate observed features of the BSISO (Fig. 5). Fig. 5c shows that COSMO-CLM is capable of simulating the observed northward propagation (Fig. 5a). However, lead–lag correlation coefficients are somewhat smaller than observed. ERA-Interim shows a higher ability to capture the observed northward propagation (Fig. 5b).
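The lag-correlation diagnostic can be sketched in two steps: bandpass-filter the zonally averaged rainfall to 30–60 days, then correlate every latitude with the reference-point series at a range of lags. As a stand-in for whatever filter the study used, this sketch applies a simple FFT mask; names, array layout (time × latitude) and the lag sign convention are our assumptions.

```python
import numpy as np

def bandpass_fft(x, short_period=30.0, long_period=60.0, dt=1.0):
    """Retain periods between `short_period` and `long_period` days along axis 0."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    spec = np.fft.rfft(x, axis=0)
    freq = np.fft.rfftfreq(n, d=dt)                  # cycles per day
    keep = (freq >= 1.0 / long_period) & (freq <= 1.0 / short_period)
    spec[~keep] = 0.0                                # zero all other frequencies
    return np.fft.irfft(spec, n=n, axis=0)

def lag_correlation(field, ref, max_lag):
    """Correlation of field[t + lag, j] with ref[t] for lag in [-max_lag, max_lag].

    field: (time, lat) filtered zonal-mean rainfall; ref: (time,) reference series.
    Positive lags mean the field lags the reference point.
    """
    field = np.asarray(field, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = ref.size
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.empty((lags.size, field.shape[1]))
    for k, lag in enumerate(lags):
        if lag >= 0:
            a, b = ref[:n - lag], field[lag:]
        else:
            a, b = ref[-lag:], field[:n + lag]
        az = (a - a.mean()) / a.std()
        bz = (b - b.mean(axis=0)) / b.std(axis=0)
        corr[k] = (az[:, None] * bz).mean(axis=0)
    return lags, corr
```

With this convention, northward propagation appears as correlation maxima that shift to positive lags with increasing latitude, which is the tilt seen in Fig. 5. A sharp FFT mask can ring near the ends of the record; tapered filters such as a Lanczos filter are a common alternative.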

Fig. 5 Cross-correlation of zonally averaged precipitation (70°–100° east) against a reference point near 85° east and 12.5° north for (a) GPCP, (b) ERA-Interim and (c) COSMO-CLM

Although these analyses clearly show the general ability of the model to simulate this feature of the ISM, they do not allow any conclusion about the model's ability to simulate the observed northward propagation in specific years. We therefore apply the method developed by Suhas et al. (2013) to the model data. As pointed out in Sect. 4, the originally proposed method cannot be applied to the RCM data, as the southern boundary of the model domain is located at around 5° south of the equator. Thus, before analysing model results, we tested to what extent results using the original domain (12.5° south–30° north) and the smaller domain (0°–30° north) differ when using GPCP observations. We therefore perform the EEOF analysis for the original domain and project the first two EEOFs onto the GPCP data to obtain the MISO1 and MISO2 indices. In a second step, we use the same EEOFs (from the original domain) but retain only the values from 0°–30° north, which are then projected onto the GPCP precipitation from 0°–30° north. We find high correlations between MISO1 and LMISO1 (0.94) and between MISO2 and LMISO2 (0.97). Additionally, the intensities MISO-Int and LMISO-Int of the original and the small domain yield a correlation of 0.90, which confirms the general usability of this method on the smaller domain.
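The truncated-domain projection can be sketched as follows: the EEOF patterns (modes × lags × latitudes) and the anomalies are both restricted to the retained latitudes before the lag-embedded anomalies are projected onto the patterns, and the intensity is the Euclidean norm of the first two coefficients. The forward lag embedding and the absence of any standardization of the coefficients are our assumptions and may differ from Suhas et al. (2013); function names are illustrative.

```python
import numpy as np

def project_eeofs(anom, eeofs, lat_mask):
    """Project EEOF patterns onto lag-embedded anomalies over selected latitudes.

    anom:     (time, lat) zonally averaged daily rainfall anomalies
    eeofs:    (mode, lag, lat) EEOF patterns derived for the full (original) domain
    lat_mask: boolean (lat,) selecting the retained latitudes, e.g. 0-30 N
    Returns (time - n_lag + 1, mode) projection coefficients.
    """
    e = np.asarray(eeofs, dtype=float)[:, :, lat_mask]
    x = np.asarray(anom, dtype=float)[:, lat_mask]
    n_mode, n_lag = e.shape[0], e.shape[1]
    n_t = x.shape[0] - n_lag + 1
    # row t holds x[t], x[t + 1], ..., x[t + n_lag - 1], flattened lag-major
    emb = np.stack([x[t:t + n_lag].ravel() for t in range(n_t)])
    return emb @ e.reshape(n_mode, -1).T

def miso_intensity(pcs):
    """Intensity sqrt(PC1^2 + PC2^2) from the first two projection coefficients."""
    return np.hypot(pcs[:, 0], pcs[:, 1])
```

Dropping latitudes removes their contribution to each dot product, so the full-domain and truncated-domain indices coincide wherever the anomalies south of the equator are small; the high correlations reported above quantify how close this is for GPCP.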

Finally, we project the EEOFs (derived from GPCP data) onto precipitation anomalies simulated by COSMO-CLM to derive the corresponding LMISO1-C and LMISO2-C indices. We find strong correlations between LMISO1-C and LMISO1 (0.76) and between LMISO2-C and LMISO2 (0.71), suggesting that the temporal development of the northward propagation is well captured by the model. However, the intensities (LMISO-Int and LMISO-C-Int) show a weaker linear relation, with a correlation of only 0.50. Figures 6 and 7 show the observed northward propagation (derived from GPCP data), as well as the northward propagation in COSMO-CLM and in ERA-Interim reanalysis data, in a MISO phase diagram for the years 2002 and 2005, respectively. The MISO phase diagram shows the approximate location of anomalously high precipitation. In general, phases 1–4 are characterized by lower than average rainfall over central India, whereas phases 5–8 are characterized by higher amounts over central and northern India (see Suhas et al. 2013).

Fig. 6 Summer season 2002 LMISO phase diagram for (a) GPCP, (b) ERA-Interim and (c) COSMO-CLM

Fig. 7 Summer season 2005 LMISO phase diagram for (a) GPCP, (b) ERA-Interim and (c) COSMO-CLM

The year 2002 was marked by an intense MISO event during June, followed by a minor MISO event in July. This minor event during July led to a severe drought affecting all of India. ERA-Interim (Fig. 6b) is in good agreement with observations (Fig. 6a), but it underestimates the MISO intensity during June, which is also seen for COSMO-CLM (Fig. 6c). For both reanalysis and COSMO-CLM, only a minor event during July is identified, as no activity in phases 5–8 is found. However, COSMO-CLM simulates a MISO event with too high values, especially in phases 1 and 2. During 2005, COSMO-CLM depicts the observed MISO events (Fig. 7a) quite well, including a strong event during September (Fig. 7c).

Overall, COSMO-CLM is capable of simulating the observed northward propagation of rainfall during the summer monsoon season. The temporal variability of these MISOs is in good agreement with observations, even though the model has some problems in simulating their observed strength.

5.3 Assessment of dry and wet events

Dry and wet events are identified using the SPI, derived from weekly rainfall amounts (see Sect. 4).

5.3.1 Identification of dry and wet events

The skill of COSMO-CLM and ERA-Interim in representing dry and wet conditions is measured using the Gandin–Murphy Skill Score with the extension of Gerrity (GSS) (see Sect. 4). Generally, during observed normal conditions (\(-1\le\) SPI \(\le 1\)), ERA-Interim and COSMO-CLM simulate normal conditions as well. Only for a small number of these observed normal weeks do they simulate moderate dry or wet events. COSMO-CLM is not able to simulate the observed extreme wet events (\(\hbox {SPI}>2\)) with the same magnitude at the same time. ERA-Interim, too, is only able to depict a small number of these observed events. The ability of both datasets to simulate moderate to extreme dry events (\(-2<\hbox {SPI}<-1\)) is better than for wet events.

GSS values (Table 3) of 0.52 for ERA-Interim indicate a higher ability of this dataset than of COSMO-CLM (GSS: 0.31) to detect wet and dry events at the same time and with the same intensity. Additionally, we calculate GSS based on three SPI classes: dry (\(\hbox {SPI}\le -1\)), normal (\(-1< \hbox {SPI}\le 1\)) and wet conditions (\(\hbox {SPI}>1\)). GSS values for both ERA-Interim and COSMO-CLM increase compared to five SPI categories. The higher skill of ERA-Interim is due to its higher probability of detecting observed extreme wet events, whereas COSMO-CLM is not able to simulate these events at the same time with the observed intensity. Due to the small sample size of only 29 years for all datasets, results for five bins should be interpreted carefully, as the number of cases for extreme wet/dry events is very small, which makes the results less robust. For three bins (−1, 0, 1), the results are more robust, as the number of cases in each bin is higher.

Table 3 Gerrity skill scores (GSS) for COSMO-CLM, ERA-Interim, TRMM and GPCP compared to Aphrodite observation dataset based on SPI values derived from weekly rainfall over AIMR region

To put the GSS of this COSMO-CLM simulation into context, we also calculate GSS for other precipitation datasets that are available at daily resolution. As discussed in Sect. 5.1, TRMM and GPCP satellite-based daily rainfall estimates have a lower temporal correlation with Aphrodite observations than COSMO-CLM. Chokngamwong and Chiu (2008) found that the relation between gauge measurements over Thailand and TRMM strengthens when TRMM data are averaged over five days or longer. We find a similar behaviour for both TRMM and GPCP compared to Aphrodite observations (not shown). Thus, it is reasonable to calculate wet and dry spells on a weekly time scale. Unfortunately, TRMM and GPCP are only available from 1998 and 1997, respectively. Thus, when calculating weekly precipitation sums, only 180 weekly rainfall values remain for the summer monsoon season (JJAS) from 1998 to 2007. Given the limited size of this sample, we use only three SPI categories (Table 3).

GSS is highest for ERA-Interim (0.62), followed by TRMM (0.59), GPCP (0.52) and COSMO-CLM (0.49). For the shorter period (1998–2007), the GSS of COSMO-CLM is nearly the same as for the whole period (1979–2007), indicating a stable skill over time. As discussed in Sect. 5.1, the correlation of daily rainfall anomalies with Aphrodite observations is higher for COSMO-CLM than for TRMM and GPCP. Furthermore, the temporal correlation of weekly precipitation is similar for COSMO-CLM and both satellite products. The higher GSS values for TRMM and GPCP result from their higher probability of capturing dry events, whereas normal and wet conditions are captured nearly equally well by COSMO-CLM, TRMM and GPCP. As the scoring weight of rare events (e.g. dry events) is higher than that of regular events (see Sect. 4), GSS is higher for TRMM and GPCP. These results have to be interpreted carefully, as the gamma distribution used to derive the SPI values is fitted to only 10 years, which is clearly less robust than fitting it to a 30-year dataset. Nevertheless, these results provide an estimate of the quality of the COSMO-CLM simulation.

Overall, the ability of COSMO-CLM to simulate extreme precipitation events is comparable to that of the satellite-based observational products and ERA-Interim. Thus, COSMO-CLM is useful for investigating such extreme events over India. GSS values for individual subregions are smaller than the All-Indian GSS, which reflects the model's ability to simulate the observed temporal evolution of events but not their spatial occurrence.

5.3.2 Spatial variability

Based on weekly SPI values for 34 subregions, we perform an empirical orthogonal function (EOF) analysis. As pointed out by Wu et al. (2007), SPI values on short time scales, especially in dry climates, can be misleading, as they are not normally distributed due to too many zero-rainfall events. At the beginning of June and the end of September, some subregions in northwestern India receive very little rainfall, which might lead to misleading SPI values. To address this, we compute the EOF analysis both for the whole summer monsoon season (JJAS) and for the peak monsoon season (July–August) only. As both periods yield similar results in general, we discuss EOF patterns for the complete season.
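An EOF analysis of the weekly SPI matrix (weeks × 34 subdivisions) can be sketched via a singular value decomposition of the time-centred data. This sketch applies no area weighting of the subdivisions, which is our assumption, and the function names are illustrative.

```python
import numpy as np

def eof_analysis(data, n_modes=4):
    """EOFs of a (time, region) matrix via SVD of the time-centred data."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                         # remove the time mean per region
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # fraction of total variance per mode
    loadings = vt[:n_modes]                        # spatial patterns (mode, region)
    pcs = u[:, :n_modes] * s[:n_modes]             # principal component time series
    return loadings, pcs, explained[:n_modes]

def pattern_correlation(p, q):
    """Pearson correlation of two loading patterns, each region counted once."""
    return float(np.corrcoef(p, q)[0, 1])
```

The sign of each singular vector returned by the SVD is arbitrary, so a pattern correlation close to −1 between two datasets indicates the same spatial pattern with reversed sign. The same holds for mode order when two modes explain similar variance, which is why COSMO-CLM's EOFs 3 and 4 can be compared crosswise with those of Aphrodite.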

The first EOF shows a variability pattern with a sharp contrast between northeastern/southern India and the rest of the subcontinent (Fig. 8a–c), which is more pronounced in Aphrodite than in ERA-Interim (correlation: 0.92) or COSMO-CLM (correlation: 0.86) data. As the EOF analysis is based on SPI values for 34 subregions, every subregion is counted only once when computing these pattern correlations. Since the sign of an EOF pattern is arbitrary, negative correlations indicate the same pattern with reversed sign. The second EOF (Fig. 8d–f) shows a strong north–south contrast of the loadings in observation data, which is also present in both reanalysis (correlation: 0.98) and model data (correlation: −0.89). The third EOF (Fig. 8g–i) indicates a variability pattern with positive loadings over southern/northeastern India and negative loadings over northwestern India, which is well captured by ERA-Interim (correlation: 0.81). EOFs 3 and 4 are in reversed order in COSMO-CLM compared to Aphrodite and ERA-Interim. Nevertheless, the explained variances of both patterns are similar, and EOF 4 of COSMO-CLM agrees well with EOF 3 of Aphrodite (correlation: −0.73). EOF 4 (Fig. 8j–l) in Aphrodite data shows positive values over the core monsoon region and negative values over northwestern and southern India. This pattern is well captured by both ERA-Interim (correlation: 0.80) and the third EOF pattern of COSMO-CLM (correlation: −0.71). The first EOF pattern in COSMO-CLM explains less variance (28.4 %) than in Aphrodite (32.8 %). For the following three EOFs, explained variances have similar magnitudes.

Fig. 8 SPI EOF loading patterns, based on weekly JJAS precipitation amounts from 1979 until 2009, derived for Aphrodite, ERA-Interim and COSMO-CLM. The values within the square brackets represent the explained variance of each loading pattern

As this study focuses on validating the model's spatial variability on a weekly time scale, a complete physical explanation of these EOF patterns is beyond the scope of this paper. It is worth mentioning that the first EOF pattern is similar to those in earlier studies by Krishnamurthy and Shukla (2000) and Sontakke and Singh (1996), although some differences exist, which might be caused by the relatively large regions used in our study. Overall, COSMO-CLM shows high correlations for the first four EOF patterns. As these four EOFs explain over 60 % of the variance, we conclude that spatial variability on a weekly time scale is well captured by the model.

5.3.3 Coupling between rainfall anomalies and the large-scale circulation anomalies on intraseasonal time scales

Rainfall variability on the intraseasonal time scale over India is predominantly determined by changes in the large-scale atmospheric circulation. It is therefore important for the model to capture the coupling between large-scale atmospheric anomalies and rainfall anomalies over India. We investigate to what extent differences in dry and wet events between observations and COSMO-CLM can be attributed to differences in the large-scale atmospheric circulation between the two datasets. To this end, composite analyses are carried out for three categories: (1) events that are observed and simulated by COSMO-CLM at the same time with a similar magnitude, (2) events that are observed and found in ERA-Interim but not simulated with a similar magnitude by COSMO-CLM, and (3) events that are neither observed nor found in ERA-Interim but simulated by COSMO-CLM.
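
A composite anomaly with a grid-point significance test (the figure captions refer to a t test at the 5 % level) can be sketched as below. The exact test variant is not stated in this section; this sketch assumes anomalies relative to the mean over all JJAS weeks and a two-sided one-sample t test of the event-week anomalies against zero at each grid point, using `scipy.stats.ttest_1samp`. The function `composite_anomaly` and its arguments are hypothetical.

```python
import numpy as np
from scipy import stats


def composite_anomaly(field, event_mask, alpha=0.05):
    """Composite anomaly of `field` (n_weeks, ny, nx) over weeks flagged in `event_mask`.

    Anomalies are taken relative to the mean over all weeks (assumed climatology).
    Returns the composite mean anomaly and a boolean mask of grid points where a
    two-sided one-sample t test against zero is significant at level `alpha`.
    """
    field = np.asarray(field, dtype=float)
    event_mask = np.asarray(event_mask, dtype=bool)
    anomalies = field - field.mean(axis=0)       # remove the all-week mean at each grid point
    selected = anomalies[event_mask]             # anomalies during the event weeks only
    composite = selected.mean(axis=0)
    _, pvalue = stats.ttest_1samp(selected, 0.0, axis=0)
    return composite, pvalue < alpha
```

In the analysis above, `field` would be vorticity at one pressure level and `event_mask` would flag the weeks of one event category in one dataset.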

As shown in Sect. 5.3.2, the first EOF pattern shows a strong contrast between northeastern India and the rest of the subcontinent, indicating that SPI anomalies in these regions tend to have opposite signs. As the mechanisms leading to extreme events might differ between northeastern India and the rest of the country, we only use SPI information for the region covering all of India except the northeastern area in the following analysis. To ensure a reasonably large number of extreme events, we classify weeks into normal (\(-1<\hbox {SPI}<1\)), dry (\(\hbox {SPI}<-1\)) and wet conditions (\(\hbox {SPI}>1\)). As the SPI is normally distributed with zero mean and a standard deviation of one, about 68 % of SPI values lie between \(-1\) and +1. Accordingly, the observational dataset contains 82 dry and 81 wet weeks (about 31 % of all weeks).
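
The classification and the expected share of normal weeks can be checked with a short Python sketch. The function `classify_spi` is hypothetical; it assigns SPI values of exactly \(\pm 1\) to the normal class, a choice the strict inequalities above leave open.

```python
import math

import numpy as np


def classify_spi(spi):
    """Label weekly SPI values: 'dry' if SPI < -1, 'wet' if SPI > 1, else 'normal'.

    Values exactly at -1 or +1 are counted as normal (assumption).
    """
    spi = np.asarray(spi, dtype=float)
    labels = np.full(spi.shape, "normal", dtype=object)
    labels[spi < -1.0] = "dry"
    labels[spi > 1.0] = "wet"
    return labels


# Probability that a standard normal variable lies within [-1, 1]: erf(1/sqrt(2)), about 0.683
within_one_sd = math.erf(1.0 / math.sqrt(2.0))
```

Under a standard normal distribution, about 1 − 0.683 ≈ 31.7 % of weeks are expected outside \(\pm 1\), split evenly between dry and wet, which is consistent with the roughly 31 % of observed weeks classified as dry or wet.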

5.3.3.1 Dry events:

Of the 82 observed dry events, COSMO-CLM simulates 49 (\(\approx\)60 %) at the same time and with a similar magnitude, whereas ERA-Interim captures 64 (\(\approx\)78 %), showing the higher ability of ERA-Interim to reproduce these events. Of the 33 observed dry events missed by COSMO-CLM, 21 (\(\approx\)26 % of all 82) are found in ERA-Interim (category 2). The remaining 12 (\(\approx\)15 %) observed dry events are found in neither ERA-Interim nor COSMO-CLM, suggesting that the mismatch for these events originates from the boundary conditions. Figures 9, 10 and 11 show composite anomalies of vorticity and wind at 850, 500 and 200 hPa, respectively, for the three categories of events.
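
The bookkeeping behind these numbers can be expressed with boolean event flags per week. In this hypothetical sketch, a dataset "captures" an observed event if its SPI falls in the same extreme class in the same week; this is an assumed stand-in for the "similar magnitude" criterion, whose exact definition is not repeated here. Every observed event falls into exactly one of category 1, category 2 or the group missed by both datasets, so for dry events the counts satisfy 82 = 49 + 21 + 12.

```python
import numpy as np


def event_categories(obs_event, era_event, cclm_event):
    """Count composite categories from boolean weekly event flags.

    Each argument flags, per week, whether the dataset shows an event
    (e.g. spi < -1 for dry events; hypothetical criterion).
    """
    obs = np.asarray(obs_event, dtype=bool)
    era = np.asarray(era_event, dtype=bool)
    cclm = np.asarray(cclm_event, dtype=bool)
    return {
        "category_1": int(np.sum(obs & cclm)),           # observed and simulated by COSMO-CLM
        "category_2": int(np.sum(obs & era & ~cclm)),    # observed and in ERA-Interim, missed by COSMO-CLM
        "category_3": int(np.sum(cclm & ~obs & ~era)),   # simulated by COSMO-CLM only
        "missed_by_both": int(np.sum(obs & ~era & ~cclm)),
    }
```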

Fig. 9

Anomaly composites of vorticity (shaded) and wind field (arrows) at 850 hPa for COSMO-CLM (a, c, e) and ERA-Interim (b, d, f). a, b Dry events simulated by COSMO-CLM; c, d dry events not simulated by COSMO-CLM; e, f dry events simulated by COSMO-CLM only. Shaded areas are significant at the 5 % level (t test)

Fig. 10

Anomaly composites of vorticity (shaded) and wind field (arrows) at 500 hPa for COSMO-CLM (a, c, e) and ERA-Interim (b, d, f). a, b Dry events simulated by COSMO-CLM; c, d dry events not simulated by COSMO-CLM; e, f dry events simulated by COSMO-CLM only. Shaded areas are significant at the 5 % level (t test)

Fig. 11

Anomaly composites of vorticity (shaded) and wind field (arrows) at 200 hPa for COSMO-CLM (a, c, e) and ERA-Interim (b, d, f). a, b Dry events simulated by COSMO-CLM; c, d dry events not simulated by COSMO-CLM; e, f dry events simulated by COSMO-CLM only. Shaded areas are significant at the 5 % level (t test)

Dry events that are simulated by COSMO-CLM (category 1) are associated in ERA-Interim with significant positive vorticity anomalies north of Pakistan and negative anomalies over Bangladesh at 200 hPa (Fig. 11b). At 500 hPa, the positive anomaly over the Pakistan region is again present, together with a negative vorticity anomaly over central India (Fig. 10b). At 850 hPa, the positive anomaly over Pakistan is absent, but the negative anomaly over central India is much more pronounced and extends over large parts of the Arabian Sea (Fig. 9b). These results compare well with those of Krishnan et al. (2009). All these features are captured by COSMO-CLM (Figs. 9a, 10a, 11a), indicating that the underlying mechanisms are well represented in the model. Thus, an observed intraseasonal dry event that is captured by COSMO-CLM is associated with enhanced low- to mid-tropospheric anticyclonic activity over central India, which reduces the moisture flow into the Indian subcontinent. In addition, these events are associated with a mid- to upper-tropospheric cyclonic vorticity anomaly over Pakistan, which leads to an increased inflow of dry air from the mid-latitudes into northwestern India (Krishnan et al. 2009).
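
The cyclonic/anticyclonic interpretation rests on the sign of relative vorticity: in the Northern Hemisphere, positive values indicate cyclonic and negative values anticyclonic rotation. How vorticity was derived from the model and reanalysis output is not described in this section; the sketch below assumes horizontal winds on a regular latitude-longitude grid and applies finite differences (`numpy.gradient`) to the spherical form \(\zeta = [\partial v/\partial\lambda - \partial(u\cos\varphi)/\partial\varphi]/(a\cos\varphi)\). The function name and the Earth-radius value are illustrative assumptions.

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # mean Earth radius in metres (assumed value)


def relative_vorticity(u, v, lat_deg, lon_deg):
    """Relative vorticity (s^-1) of winds u, v with shape (nlat, nlon).

    Uses zeta = (dv/dlambda - d(u cos(phi))/dphi) / (a cos(phi)) with
    finite differences. In the Northern Hemisphere, zeta > 0 is cyclonic
    and zeta < 0 anticyclonic.
    """
    phi = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lam = np.deg2rad(np.asarray(lon_deg, dtype=float))
    cos_phi = np.cos(phi)[:, np.newaxis]                    # broadcast over longitudes
    dv_dlam = np.gradient(v, lam, axis=1)                   # derivative along longitude
    ducos_dphi = np.gradient(u * cos_phi, phi, axis=0)      # derivative along latitude
    return (dv_dlam - ducos_dphi) / (EARTH_RADIUS * cos_phi)
```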

For observed dry events that are not simulated by COSMO-CLM (category 2), upper-tropospheric circulation changes are not significant over most of the region; however, signs of a cyclonic anomaly over Pakistan are again found (Fig. 11d), which are also captured by COSMO-CLM (Fig. 11c). Consistent with this, no significant cyclonic vorticity anomaly is found over Pakistan at 500 hPa in either ERA-Interim or COSMO-CLM (Fig. 10c, d). However, ERA-Interim reveals a significant anticyclonic vorticity anomaly over central India at this level, which is not found in COSMO-CLM (Fig. 10c, d). Additionally, COSMO-CLM shows no significant vorticity anomalies over most of India in the lower troposphere, whereas ERA-Interim shows a significant anticyclonic vorticity anomaly over large parts of India (Fig. 9c, d). Thus, COSMO-CLM misses these events because of differences in the lower troposphere: the model does not simulate an anticyclonic circulation anomaly over India in these cases and, consistently, does not produce an extreme rainfall deficit. Nevertheless, COSMO-CLM simulates dry conditions (\(\hbox {SPI}<0\)) for all 21 of these events, although not as dry as observed (\(\hbox {SPI}<-1\)).

For dry events that are simulated only by COSMO-CLM but are neither observed nor found in ERA-Interim (category 3), neither ERA-Interim nor COSMO-CLM shows significant cyclonic anomalies over Pakistan in the upper and middle troposphere (Figs. 10e, f, 11e, f). In contrast, COSMO-CLM shows a significant low-level anticyclonic vorticity anomaly over India, which is not found in ERA-Interim (Fig. 9e, f). Thus, the differences in SPI values for these events can be attributed to a lower-tropospheric anticyclonic vorticity anomaly in COSMO-CLM that is absent in ERA-Interim.

5.3.3.2 Wet events:

An analogous analysis is performed for wet events. Of the 81 observed wet events, 43 (\(\approx\)53 %) are also simulated by COSMO-CLM (category 1). The main feature associated with these events is a strong cyclonic vorticity anomaly over India at 850 hPa (Fig. 12b), which is well captured by the model (Fig. 12a).

Eighteen (\(\approx\)22 %) observed wet events are not simulated by COSMO-CLM but are found in the ERA-Interim reanalysis (category 2). The reanalysis composite again shows a strong positive (cyclonic) vorticity anomaly (Fig. 12d), which is not captured by COSMO-CLM (Fig. 12c).

Fig. 12

Anomaly composites of vorticity (shaded) and wind field (arrows) at 850 hPa for COSMO-CLM (a, c, e) and ERA-Interim (b, d, f). a, b Wet events simulated by COSMO-CLM; c, d wet events not simulated by COSMO-CLM; e, f wet events simulated by COSMO-CLM only. Shaded areas are significant at the 5 % level (t test)

Thirty wet events are simulated by COSMO-CLM but found in neither the reanalysis nor the observations (category 3). In these cases, COSMO-CLM simulates a strong cyclonic vorticity anomaly over India (Fig. 12e) that is stronger than that in the reanalysis (Fig. 12f).

In the upper troposphere (200 hPa), wet events show a vorticity anomaly pattern similar to that of dry events but with reversed sign (not shown). Unlike for dry events, however, these upper-tropospheric features do not extend into the mid-troposphere (500 hPa), where only a cyclonic vorticity anomaly over India is found (not shown).

Our results suggest that dry events caused by changes in both the lower-tropospheric circulation over India and the upper-tropospheric circulation over Pakistan are better captured by COSMO-CLM than dry events caused only by changes in the lower-tropospheric circulation over the Indian subcontinent. One explanation is that tropical–extratropical interactions are mainly induced by a strong upper-level cyclonic anomaly over Pakistan (Krishnan et al. 2009), which in COSMO-CLM is forced by the lateral boundary conditions supplied by the ERA-Interim reanalysis. In contrast, the lower-level anticyclonic anomaly over India might be much more influenced by changes in the circulation over the tropical oceans. In our setup, COSMO-CLM is forced by observed sea surface temperatures, but model parameterizations likely play an important role in shaping the local climate conditions over the tropical oceans, which in turn affect monsoon variability over India.

6 Conclusion

In this study, we investigate the ability of the RCM COSMO-CLM, driven at its lateral boundaries by ERA-Interim reanalysis data, to represent the intraseasonal variability of the ISM. In particular, we focus on daily rainfall variability, the northward propagation of monsoon intraseasonal oscillations and longer-lasting extreme precipitation events.

We find that the general underestimation of summer monsoon rainfall in COSMO-CLM reported by Dobler and Ahrens (2010) is mainly due to too little rainfall from July to September. Nevertheless, modelled daily all-India rainfall anomalies are highly correlated with observations.

We investigate the model's capability to simulate the observed northward propagation of rainfall during the summer monsoon season (Turner and Slingo 2009; Suhas et al. 2013). COSMO-CLM simulates the observed northward propagation; however, lag correlations between zonally averaged rainfall and rainfall at a reference point over India are smaller in the model than in the observations.
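
Lag correlations of the kind used to diagnose northward propagation can be sketched as follows. The details are assumptions not specified in this section: daily rainfall series of equal length, one zonally averaged series per latitude band, and Pearson correlation over the overlapping part of the two series at each lag. Applied band by band, positive correlations that shift northward with increasing lag indicate northward propagation. `lag_correlation` is a hypothetical helper.

```python
import numpy as np


def lag_correlation(reference, series, max_lag):
    """Correlate reference(t) with series(t + lag) for lag = -max_lag..max_lag (in days).

    Both inputs are 1-D daily series of equal length; only the overlapping
    part of the two series is used at each lag.
    """
    reference = np.asarray(reference, dtype=float)
    series = np.asarray(series, dtype=float)
    n = len(reference)
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = []
    for lag in lags:
        if lag >= 0:
            a, b = reference[: n - lag], series[lag:]       # pair reference(t) with series(t + lag)
        else:
            a, b = reference[-lag:], series[: n + lag]      # pair reference(t) with series(t - |lag|)
        corrs.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(corrs)
```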

Northward propagation in specific years has been investigated using the method developed by Suhas et al. (2013). The original method for calculating MISO indices (Suhas et al. 2013) has been modified to suit the limited area of the RCM simulation, and we show that the indices computed for the smaller domain yield results similar to those for the original domain. Applying this approach to COSMO-CLM data shows that the model is able to simulate the temporal evolution of the LMISO1 and LMISO2 indices. However, the correlation of the intensity (LMISO-Int and LMISO-C-Int) between observed and simulated events is smaller than the correlations of the individual LMISO1 and LMISO2 time series.

Furthermore, the model's ability to simulate longer-lasting extreme intraseasonal events has been investigated. To account for systematic model errors in absolute rainfall amounts, we use the standardized precipitation index (SPI) to detect extreme events on a weekly time scale. The spatial variability of the weekly SPI time series in COSMO-CLM is in good agreement with observational data. The model shows reasonable skill in representing observed extreme dry and wet weeks at the same time and with a similar magnitude. Although ERA-Interim shows higher skill, the skill of COSMO-CLM is comparable to that of satellite-based rainfall estimates from TRMM and GPCP. This comparison has to be interpreted with care, as only ten years of TRMM and GPCP data are available; the fit of the gamma distribution required to derive the SPI values is therefore likely to be less robust than for a longer dataset.
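
The SPI transformation referred to above can be sketched following the common gamma-based definition: fit a gamma distribution to the non-zero precipitation totals, account for zero totals through their empirical frequency, and map the resulting cumulative probability onto a standard normal deviate. The maximum-likelihood fit via `scipy.stats.gamma.fit` and the choice of fitting one sample of weekly totals per region and calendar week are assumptions for illustration; `spi_from_totals` is hypothetical.

```python
import numpy as np
from scipy import stats


def spi_from_totals(totals):
    """SPI for a 1-D sample of precipitation totals (e.g. one region and calendar week, all years).

    Non-zero totals are fitted with a two-parameter gamma distribution
    (location fixed at 0, maximum likelihood); zeros enter through their
    empirical probability q. The mixed cumulative probability
    H(x) = q + (1 - q) G(x) is mapped to a standard normal deviate.
    """
    totals = np.asarray(totals, dtype=float)
    wet = totals[totals > 0]
    q = 1.0 - wet.size / totals.size                  # probability of a zero total
    shape, _, scale = stats.gamma.fit(wet, floc=0)    # returns (shape, loc, scale)
    cdf = q + (1.0 - q) * stats.gamma.cdf(totals, shape, loc=0, scale=scale)
    return stats.norm.ppf(cdf)
```

If the fit is done separately for each calendar week, as assumed here, a ten-year record leaves only ten values per fit, which illustrates why the TRMM- and GPCP-based SPI values are less robust.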

To identify the driving mechanisms of dry and wet events on a weekly time scale, we analyze the atmospheric circulation at 200, 500 and 850 hPa during dry and wet events over the Indian subcontinent, excluding the northeastern part of the country. Three categories are defined: (1) events found in Aphrodite and COSMO-CLM, (2) events found in observations and ERA-Interim but not simulated by COSMO-CLM, and (3) events simulated by COSMO-CLM but neither observed in Aphrodite data nor found in ERA-Interim. COSMO-CLM performs better in simulating dry events associated with an anomalous upper-tropospheric cyclonic vortex over Pakistan and an anomalous lower-tropospheric anticyclonic vortex over India than dry events characterized only by an anomalous lower-tropospheric anticyclonic vortex over India. We hypothesize that the upper-level vortex over Pakistan is largely influenced by the boundary conditions supplied by ERA-Interim, whereas the lower-tropospheric anticyclonic vortex over India is heavily influenced by the surrounding tropical oceans. Given the high sensitivity of RCM results to physical parameterizations, e.g. of convection (Srinivas et al. 2013), we expect that improved parameterizations are necessary to enhance the skill of COSMO-CLM in capturing events that are mainly forced by the anomalous anticyclonic vortex over India. Agreement between model and observations is slightly better for dry events than for wet events. This might be because the upper-tropospheric circulation anomaly found for dry events extends into the middle troposphere, which is not the case for wet events. Thus, the circulation during dry events might be more influenced by upper-tropospheric features, which in COSMO-CLM agree better with observations than the lower-tropospheric circulation anomalies do.
Even though not all observed dry and wet events are reproduced by COSMO-CLM at the same time and with a similar magnitude, the coupling between lower-level circulation anomalies and rainfall over India on the intraseasonal time scale is well represented in the model.