Diverse skill of seasonal dynamical models in forecasting South Asian monsoon precipitation and the influence of ENSO and IOD

The South Asia Seasonal Climate Outlook Forum (SASCOF) issues seasonal tercile precipitation forecasts to provide advance warning of anomalously dry or wet monsoon seasons in South Asia. To increase objectivity of the SASCOF seasonal outlook, the World Meteorological Organisation recommends using a multi-model ensemble combining the most skilful dynamical seasonal models for the region. We assess the skill of 12 dynamical models at forecasting seasonal precipitation totals for 1993–2016 for the southwest (June–July–August–September) and northeast (October–November–December) monsoon seasons at regional and national levels for Afghanistan, Bangladesh, Nepal, and Pakistan, using identical forecast periods, hindcast initialisation months and domain used at the SASCOF. All models demonstrate positive skill when regionally-averaged, especially for the southwest monsoon season, noting considerable spatial differences. Models exhibit highest skill where correlation between observed precipitation and El Niño Southern Oscillation (ENSO) is highest, e.g., central/north India and Nepal during the southwest monsoon, and Afghanistan and north Pakistan during the northeast monsoon. Model skill is especially low in northwest India and northeast of South Asia during the southwest monsoon, e.g., Bangladesh (despite high precipitation totals) coinciding with a weak ENSO teleconnection. The Indian Ocean Dipole teleconnection is less pronounced in the southwest monsoon season, whereas the spatial pattern for the northeast monsoon closely resembles that of ENSO. Due to high variability in model skill, we recommend basing the SASCOF forecast on a multi-model ensemble of all models but discounting poorly performing models at the national level.


Introduction
South Asia is the most densely populated geographical region in the world and highly vulnerable to anomalous climatic conditions (Shaw et al. 2022). Droughts and floods can lead to widespread adverse impacts on livelihoods and the economy, especially in the agriculture (Ray et al. 2015;Aryal et al. 2020) and water (Srivastava et al. 2020) sectors. Seasonal forecasts, if skilful, can provide information on how precipitation may deviate from normal several months ahead, supporting governments, organisations, and communities in preparing for anomalous climatic conditions and mitigating humanitarian disasters (e.g., Golding et al. 2019;Bett et al. 2020;Daron et al. 2020).
The South Asian monsoon is the principal source of precipitation for most of the region. There are two distinct monsoon seasons based on the prevailing wind direction: the southwest (SW) monsoon and the northeast (NE) monsoon. The fundamental driving mechanism of the monsoon circulation is the pressure gradients (ocean to land) established by thermal contrasts between the landmass of Asia and the large extent of ocean to its south. Precipitation associated with the monsoon rarely reaches the far northwest of South Asia, for example Afghanistan, where it remains largely dry during the SW monsoon season. Precipitation here is predominantly driven by low pressure systems, known as western disturbances, which originate in the extratropical F. Colledge: Moved to WS Atkins International Ltd, Bristol, UK.

3
North Atlantic as well as the Mediterranean and then move eastwards across the northwest of South Asia from October to June (Hunt et al. 2018).
Even though the chaotic nature of the atmosphere prevents skilful day-to-day forecasts at lead times longer than 2-3 weeks (Buizza and Leutbecher 2015), it can be possible to forecast the average conditions of a season with reasonable skill. The predictability at seasonal timescales comes from variations in more slowly evolving processes, such as changes in sea-surface temperatures (SSTs), albedo and soil moisture, which interact with large scale atmospheric processes (Charney and Shukla 1981). It has been well established that seasonal prediction of South Asia precipitation is strongly linked with tropical sea surface temperature anomalies (Goddard et al. 2001;Kucharski and Abid 2017), especially in the central and eastern tropical Pacific; a phenomenon known as the El Niño Southern Oscillation (ENSO). Through changes in the Walker circulation (Bjerknes 1969), during the SW monsoon, El Niño events (the warm phase of ENSO) tend to suppress precipitation and La Niña events (the cool phase of ENSO) tend to enhance it (Pant and Parthasarathy 1981;Rasmusson and Carpenter 1983;Ju and Slingo 1995). More recently, irregular oscillations in SSTs in the Indian Ocean, where the western part becomes alternately warmer (positive phase) or colder (negative phase) than the eastern part, known as the Indian Ocean Dipole (IOD; Saji et al. 1999), have also been linked to monsoon precipitation variability (Behera et al. 1999;Kripalani and Kumar 2004). Several studies suggest that ENSO and IOD are closely interconnected; the IOD has been thought to enhance or modulate the influence of ENSO on South Asia precipitation (e.g., Ashok et al. 2001). Whilst ENSO and IOD are understood to be the main source of predictability for the region (Johnson et al. 2017), many other drivers of variability have been proposed, for example, springtime snow depth in the Himalayas (Hahn et al. 1976), aerosols (Ramanathan et al. 2005) and SST and atmospheric patterns in the North Atlantic Ocean (Pai and Rajeevan 2006;Yadav 2009;Wang et al. 2018).
Historically, seasonal forecasts for SW monsoon precipitation have been produced using statistical methods (e.g., Walker 1924;van den Dool 2006;Rajeevan et al. 2007), but in recent years, advances in dynamical general circulation models (GCMs) have allowed them to become the predominant method for producing seasonal forecasts (e.g., Pillai et al. 2018;Scaife et al. 2019). GCMs used for seasonal forecasts are commonly referred to as seasonal dynamical models or prediction systems.
Numerous studies have assessed the ability of seasonal dynamical models to simulate and predict precipitation associated with the Asian monsoon on different timescales (Webster et al. 1998;Rajeevan et al. 2004;Kim et al. 2012;Pai et al. 2017;Ramu et al. 2017;Jain et al. 2018;Pillai et al. 2018;Mohanty et al. 2019). While improvements have been made in predicting the Asian monsoon large scale flow patterns, especially with the introduction of coupled atmosphere-ocean models, providing skilful predictions of seasonal precipitation over South Asia remains a challenge and is an active area of research. With the growing demand for climate services, a further challenge is to operationally assemble seasonal predictions with demonstrable skill from available modeling centers to objectively generate climate outlooks for the region.
Supported by the World Meteorological Organisation (WMO), many locations around the world hold a Regional Climate Outlook Forum to produce a consensus seasonal forecast for the upcoming season (Ogallo et al. 2008). In South Asia, the forum is known as the South Asian Seasonal Climate Outlook Forum (SASCOF). SASCOFs are held twice a year; in the last week of April for the SW monsoon season forecast period encompassing June to September (JJAS) (a lead time of about 1 month) and in September for the NE monsoon season referring to October to December (OND) forecast period (a lead time of less than 1 month). Additionally, a November update for the December to February season is held online. The India Meteorological Department Pune are the WMO-designated Regional Climate Centre for the region and have been coordinating the preparation and issuing of seasonal consensus forecasts for South Asia since inception in 2010. The forecast is presented as probabilities of precipitation amounts falling into tercile categories of below, near, or above normal, compared to the long-term climatological average; an example is provided in Fig. 1. The regional forecast is then refined by the National Meteorological and Hydrological Services (NMHSs) in each country to produce a seasonal forecast specific to their country and tailored to the requirements of different sector users, such as those in agriculture. Hence there is a requirement to issue the regional forecast in April, rather than May, to allow time for the NMHS to produce their national forecast before the monsoon season commences.
Following a global review, the WMO issued guidance for the development of objective seasonal forecasts at the Regional Climate Outlook Forums, stating the procedure should be traceable, reproducible, and well-documented (WMO 2020). They recommend producing a primary forecast based on a multi-model ensemble of dynamical climate models, which possess sufficient skill for the season and domain of interest. Several studies have shown that multimodel ensembles can be better predictors of observed climate than any single model over a long period (e.g., Hagedorn et al. 2005;Chakraborty and Krishnamurti 2006;Cash et al. 2019). Additionally, the WMO guidance recommends identifying and monitoring the key drivers of predictable climate variability and assessing their representation and prediction skill in models (WMO 2020).
To date, there are no studies that assess the skill of the operational models for the entire South Asia region, using the forecast period and initialisation month used within the SASCOF process, and for both the SW and NE monsoon seasons. For example, most skill assessments (e.g., Rajeevan et al. 2012;Jain et al. 2018) use the forecast with the most up to date initialisation month, for example May for the SW monsoon period. However, the SASCOF forum for the SW monsoon is held at the end of April, and therefore the forecasts initialised in May are not yet available. Holding the forum further in advance is essential to allow time for NMHSs to produce their own country-level seasonal forecast and appropriate national adaptation plans to be put in place ahead of the season. To conclude, skill assessments using the same domain, hindcast initialisation and forecast periods as those used at the SASCOF would directly inform the model selection process on which to base the South Asia seasonal outlook.
In this study, the skill of 12 dynamical seasonal prediction systems in capturing year-to-year precipitation variability in South Asia is assessed, with the aim of informing the model selection process when producing the seasonal forecast at the SASCOF, supporting an objective forecast methodology in line with WMO recommendations. Therefore, our focus is on the regional seasonal forecast for precipitation during the JJAS and OND seasons consistent with the SASCOF outlooks. Furthermore, we explore the relationship between model skill and the two major climate drivers of South Asia precipitation variability: ENSO and IOD.
The following section introduces the observational and model data used for this study and methods used in calculating the verification metrics. Section 3 presents and discusses the results of the skill assessment and explores the influence of ENSO and IOD. In Sect. 4 we suggest how these results could inform the model selection for the SASCOF outlook and explain potential reasons for the spatial variability in our results, followed by concluding remarks in Sect. 5.

Observations
Observations have been taken from the Climate Hazards Group InfraRed Precipitation with Station dataset (CHIRPS) (Funk et al. 2015). The CHIRPS dataset is based on precipitation estimates derived from high-resolution satellite imagery, blended with station rain-gauge data to create a near real-time gridded daily precipitation time series from 1981 to present, downloaded as monthly aggregates. The dataset covers 50°S to 50°N for land points only and has been downloaded from the IRI Climate Data Library (https:// iridl. ldeo. colum bia. edu/) at a resolution of 1.0° × 1.0° to align with model resolutions (it is also available at 0.05° × 0.05°). CHIRPS v2.0 was chosen as it covers the region and period of interest, is based on both satellite and station data, and is commonly used in the SASCOF forecast production process. Gridded observational products, such as CHIRPS, provide estimates of precipitation, and therefore, like models, are not completely reliable. For comparison, verification results were also obtained using the gridded CRU TS dataset, which are based on station observations (Harris et al. 2020). The results exposed some differences, which we discuss in Sect. 4.2.4, and consequently observational uncertainty should be considered when analysing the results. For example, precipitation totals can often be difficult for observations to capture so care should be taken when analysing areas with low precipitation amounts, as showers or more sporadic precipitation can be more easily missed by a rain gauge than an area of widespread dynamic precipitation. Other factors can also contribute to observational uncertainty, such as, interpolation schemes, temporal and spatial resolution of the dataset and poor rain gauge maintenance and land coverage, particularly in regions with large topographic variations.

Seasonal prediction systems
The observational data has been compared with corresponding forecasts from 12 seasonal dynamical prediction systems, hereafter referred to as "models"; details about each model can be found in Table 1. Models were chosen  (SASCOF-19, 2021) based on them being ocean-atmosphere coupled models, the availability of the data at the time of analysis and previous use in the region. All data has been downloaded from the IRI Climate Data Library (https:// iridl. ldeo. colum bia. edu/) and regridded to the CHIRPS dataset at a resolution of 1.0° × 1.0° to ensure a consistent grid. A common hindcast period of 1993-2016 has been used from all models. Since the purpose of this assessment is to inform the SASCOF process, the target period and initialisation months have been chosen to align with those used in SASCOF. For the SW monsoon, hindcasts are for the forecast period JJAS and initialised in April (a 2-month lead time). For the NE monsoon, hindcasts are for the forecast period OND and initialised in September (a 1-month lead time). An ensemble mean has been taken from all available members from each model; note that, as shown in Table 1, the number of ensemble members varies greatly between models.
Furthermore, a multi-model ensemble (MME) has been produced. To ensure the ensemble is equally weighted for the ROC score calculations (which groups each ensemble member into a tercile category), we replicate the model ensembles until they reach 42 (the maximum ensemble size from our models in Table 1). If the number of ensemble members does not multiply into 42 exactly, the additional ensemble members are chosen at random. For example, for the CanCM4i model with 10 ensemble members, we replicate the members as they are 4 times, then randomly pick another 2 members to make 42 total members. Note that once those members have been chosen at random, the ensemble is fixed so that the members are consistent for each skill plot (including between seasons).

Verification metrics
To display the spatial variations of deterministic skill, the Pearson correlation coefficient is calculated at each grid-point by comparing the total seasonal precipitation in the model hindcast ensemble mean with observations at each time step. All references to 'correlation' and/or 'r' throughout this document use the Pearson correlation. Pearson correlation has been chosen for its common usage and simplicity. For robustness, results have been compared to correlation maps using Kendall's tau (not shown), which relies on ranking and is less susceptible to extremes as it weights each year equally. The sets of results are spatially similar and therefore Pearson correlation is deemed acceptable for this purpose. Note that, we also calculate the correlation of the average precipitation over the specified domain, which has been calculated by first averaging both the observed and model precipitation over the specified area, and second calculating the correlation between these two values.
As discussed in Sect. 1, the SASCOF forecast presents the probability of the seasonal precipitation totals falling into one of three tercile categories, and thus, an assessment of how well models predict the correct tercile category probabilistically is vital for the SASCOF forecasts. This is achieved by using the relative operating characteristic (ROC; Mason 1982;Marzban 2004). In this study we calculate the ROC score which quantifies the probabilistic skill in terms of the occurrence or non-occurrence of a specific forecast "event". In this study, events are classified according to the climatological tercile category into which the seasonal precipitation total falls. We only present the above-and below-normal events, as seasonal prediction systems typically perform poorly in the near-normal tercile category due to a lack of signal in large-scale climate drivers. ROC curves and reliability diagrams were also computed for each tercile category; these results are not shown here but can be found in Stacey et al. (2021).

Study area and country domains
This analysis covers the South Asia region, encompassing Afghanistan, Bangladesh, Bhutan, India, Myanmar, Nepal, Pakistan, and Sri Lanka. An outline of the study area is shown in Fig. 2a. Due to limited observational data, the Maldives has not been included in this study.
To assess the variation in skill spatially, more in-depth analyses have been provided for Afghanistan, Bangladesh, Nepal, and Pakistan. These countries have been chosen specifically as are the four focal countries of the project funding this study. In consultation with their National Meteorological and Hydrological Services, rectangular boxes surrounding the countries have been used when calculating our area averages, rather than using the detailed borders, as presented in Fig. 2b. We take a larger domain box as an outline, firstly, to capture the large-scale climate features surrounding each of the countries. Secondly, precipitation falling in areas around the selected countries could also of interest. For example, precipitation may run into rivers within a country from the surrounding mountains (Fig. 2a); this is particularly important for Bangladesh. Also note that Pakistan has been divided into north and south domains due to its diverse climate, as requested by the Pakistan Meteorological department.

Teleconnection indices
The relationship between South Asian precipitation and its strongest climate drivers, ENSO and IOD, are investigated using observed and model SST data. Firstly, observational data for the Oceanic Niño Index (ONI) has been used as a proxy for ENSO and taken from the NOAA Climate Prediction Center (https:// origin. cpc. ncep. noaa. gov). The ONI is the rolling 3-month average temperature anomaly of the surface waters of the Niño 3.4 region in east-central tropical Pacific (5°N-5°S, 120°-170°W). A positive index over 5-month typically signifies an El Niño event, and a La Niña event is represented by a similar negative index. The Indian Ocean Dipole Mode Index (IODMI) represents IOD phase and captures SST anomalies between the west (50°E-70°E, 10°S-10°N) and southeast (90°E-110°E, 10°S-equator) of the tropical Indian Ocean. Observational data has been sourced from the NOAA Physical Science Laboratory (https:// psl. noaa. gov). Second, SST data for each of the 12 models has been obtained using the IRI Climate Data Library (https:// iridl. ldeo. colum bia. edu/) over the Niño 3.4 region, and spatially averaged with a resolution of 1.0° × 1.0°. The Pearson correlation has been used to assess the strength of the teleconnection between mean SST and precipitation for both observations and the 12 models. Each of the indices has been averaged for the concurrent period used for precipitation, i.e., no time lag has been used.

Precipitation observations
The SW monsoon from June to September is the wettest season for most of the region (Fig. 3a), with the highest precipitation totals occurring in Bangladesh, Nepal, the western Ghats of India, and the Arakan Mountains of Myanmar. In contrast, the monsoon does not affect the far northwest of the region, namely Afghanistan and southern Pakistan, where it is one of their driest seasons. The NE monsoon season from October to December is much drier for most of South Asia (Fig. 3b), with rainfall predominantly affecting the far south and southeast. Afghanistan and Pakistan receive precipitation with the passage of western disturbances, although this is still minimal compared to the NE monsoon rains in the southeast.
The coefficient of variance (CV) for year-to-year precipitation ( Fig. S1) is useful for identifying the extent of the variability in relation to the total precipitation. Typically, areas where CV are high (low) tend to coincide with areas of low (high) precipitation totals. For the SW monsoon season, CV is low (below 0.5) for much of South Asia but high (above 0.5) for southern Pakistan and northeastern India. Whereas for the NE monsoon season, CV is above 0.5 for most of the region, which is unsurprising given the lower precipitation totals, and it is especially high (above 0.8) in mid-west India.

Model skill
The different models vary greatly in terms of their mean biases (i.e., differences from the observed mean; Fig. S2).
In the SW monsoon season, most of the models have a dry bias, especially in the areas that receive high precipitation. For example, most of the models exhibit a dry bias over the western Ghats of India, the Arakan mountains, and the Indo-Gangetic plains. In the NE monsoon season, nearly all models have a slight wet bias over most of the region. However, in areas such as the far south and southeast of the region and Myanmar, most models exhibit a dry bias. During both seasons the MME seems to exhibit lower biases than most models.
The correlations for the SW and NE monsoon seasons between precipitation observations and the 12 models demonstrates considerable spatial and seasonal variability (Fig. 4). In general, areas that receive more precipitation correspond to higher and significant correlation (> 0.4) compared to areas with less precipitation. For instance, the SW monsoon precipitation is well captured by almost all models, as well as the MME, for large swathes of India, especially in central and northern areas, and for much of Nepal, especially in the west, with correlations of 0.4 to 0.8. Areas which receive greater rainfall during the NE monsoon season, such as Afghanistan and north Pakistan, also display positive and significant correlations (> 0.4) in many of the models and the MME, although precipitation totals are still relatively low and thus observational uncertainty could be a factor in these regions during both seasons. In the SW monsoon season, the far northwest of India exhibits especially low correlations, and even significantly negative (< − 0.4) in several models, including CFS2_NCEP and ECMWF-S5. This area of poor skill coincides with an area of high yearto-year CV in the observations (Fig. S1), and furthermore the corresponding CV simulated by the models (not shown) is much lower than seen in the observations (Fig. S1). In Fig. 3 Mean precipitation in mm/day from the CHIRPS dataset for period 1993 to 2016 for a JJAS and b OND contrast, correlations for the NE monsoon season results for the northwest India are mostly positive and interestingly, some of the same models that performed poorly in that area for the SW monsoon (e.g., ECMWF-S5) show significant positive correlations in the NE monsoon, highlighting the lack of consistency between seasons. Southern India and Sri Lanka receive significant rainfall during the NE monsoon season and results show mainly positive and significant correlations, particularly in GEM-NEMO and ECMWF-S5, but in the SW monsoon season, when comparatively less rain falls, correlations are lower and not significant in all models. One area that receives high precipitation amounts, especially in the SW monsoon season, but where models represent precipitation variability poorly, is in the east of the region encompassing Myanmar, Bhutan, Bangladesh and northeast India. In the SW monsoon season, Bangladesh, Bhutan and northeast India have very low correlations (< 0.4), although some moderate correlation (0.4-0.6) is shown over Myanmar. In the NE monsoon season, correlations are more mixed but generally higher in this area. For example, Bangladesh ranges from positive and significant correlation exceeding 0.4 in models such as CFS2 and ECMWF-S5, to negative and significant correlation (< − 0.4) in CanCM4. An area that has noticeably high skill in the models is the Gujarat region in far west Indi,a despite low precipitation amounts, which interestingly coincides with an area of high year-toyear CV (Fig. S1). In both seasons, the MME is a helpful indication of the areas where models generally perform well and not so well, and it looks to perform relatively well when compared to most models individually.
The spatial patterns of ROC scores show how well the models can predict the tercile category of the observed precipitation over a range of forecast probabilities, which is relevant for informing the SASCOF tercile forecast. Here, for brevity, we present ROC scores for the MME only; individual model ROC scores can be found in Figs. S3 and S4.
The ROC scores for the MME exhibit similar, albeit noisier, patterns to the spatial patterns of the correlations in Fig. 4, and display considerable variability between models and between seasons (Figs. 5, 6). In general, the ROC scores for the SW monsoon are higher (Figs. 5a, 6a) than ROC scores for the NE monsoon in both tercile categories. For the SW monsoon season, Nepal, central/ northern India and northern Afghanistan have good skill (ROC scores of 0.6-1.0), particularly in the lower tercile. Interestingly, ROC scores appear higher for the lower tercile in the SW monsoon season and the upper tercile in the NE monsoon season. During the NE monsoon (Fig. 5b,  6b), the MME has particularly good skill at predicting the Fig. 4 Pearson correlation between precipitation in observations and the 12 models plus the MME in a JJAS and b OND from 1993 to 2016, stippling marks statistical significance at the 5% level (correlations stronger than ± 0.4) correct tercile across Afghanistan. Furthermore, for the upper tercile, ROC scores are high (0.6-0.9) in much of Sri Lanka, the east coast of India, Bangladesh and coastal Myanmar; whereas, ROC scores are generally lower for much of these areas for the lower tercile. Although note that there is considerable variability between the models in this area for both terciles (Figs. S3b, S4b).

Spatially averaged correlation for region and country domains
Correlations have been averaged over the South Asia domain (the area outlined in Fig. 2a) as well as for the country-specific domains (outlined in Fig. 2b), to further explore model and spatial variability in skill. These results are also useful to assess the strength of the operational models at the national  level for climate services on the seasonal timescale. Note that only correlations stronger than ± 0.4 (to the outside of the orange dashed lines in Fig. 7) are significantly different to zero at the 5% level. The correlations averaged over the entire South Asia region for the SW monsoon season ( Fig. 7a; top-left panel) are significant and positive for 8 of the 12 models, with CMCC, GEM-NEMO, ECMWF-S5 and the MME having values over 0.5. However, this general picture masks a high degree of variability in the correlations for the individual country domain, in terms of the model rankings as well as the correlation magnitudes. All correlations for Afghanistan and both Pakistan domains fail to reach the 0.4 significance line, although some models such as CFS2, CMCC and GFDL-SPEAR come close. As seen in the skill maps, average correlations around Nepal are particularly high, with most models possessing significant skill, except for COLA-4 and CFS2. In contrast, correlations for the Bangladesh domain are especially low and even negative for about half of the models. The MME demonstrates typically above average skill scores relative to the individual models, performing especially well over the Nepal domain and South Asia regional domain.
Correlations averaged over the South Asia region are generally lower for the NE monsoon season ( Fig. 7b; top-left panel), with 5 models having significant skill and Meteo-France-7, COLA-4 and ECMWF-S5 having the highest average correlations. In contrast to the SW monsoon season, Afghanistan and Pakistan North have higher and, in some models, significant, correlations, with Meteo-France-7 exhibiting the highest correlation for both. While the Pakistan North domain correlations are higher for this season, the Pakistan South domain correlations are comparatively much lower and not significant. Similarly, the correlations for Nepal are especially low, although this is a very dry season. Correlations are also much higher in this season for Bangladesh, although only CFS2 demonstrates significant positive correlation. The MME has significant skill for the South Asia regional domain and performs particularly well over North Pakistan and Afghanistan domains.

Teleconnections
We explore the link between observed and model precipitation, and the two main climate drivers in the region: ENSO and IOD. Unless otherwise stated, we take the ONI and IODMI indices to be synonymous with their associated climate drivers, ENSO and IOD respectively (see Sect. 2.5). For brevity, only precipitation links with ENSO and IOD are covered in this section, but it is important to note that other teleconnections could also affect model skill; see Stacey et al. (2019) for descriptions of other climate drivers of South Asia precipitation. Fig. 7 The 12 seasonal prediction systems and MME ranked by their correlation with observations in the South Asia and country specific domains from 1993 to 2016 for a JJAS and b OND, correlations stronger than ± 0.4 are significantly different from zero

El Niño Southern Oscillation (ENSO)
The observed ENSO-precipitation relationship is investigated by calculating the correlation between observed precipitation and the observed ONI index (our proxy for ENSO) from 1993 to 2016 for the SW and NE monsoon season. During the SW monsoon (Fig. 8a), there is a largely negative correlation between ENSO and observed precipitation for much of South Asia, indicating El Niño events correspond with anomalously low precipitation, and La Niña events with anomalously high precipitation. The negative correlation is significant in parts of Nepal as well as central and southern India. Elsewhere, the correlations are much weaker. During the NE monsoon season, the relationship between ENSO and observed precipitation ranges from significant positive correlations over Afghanistan, north Pakistan, and Sri Lanka to significant negative correlations over western India, Bangladesh, and western Myanmar (Fig. 8b). Lower, less significant correlation values extend across most of Nepal and India, although precipitation totals are low during this season. The spatial patterns between the ENSO-precipitation observed relationship are broadly similar to spatial patterns in model skill (Figs. 4, 5, 6), most likely because a stronger ENSO teleconnection means there is higher predictability in that location.
The correlation between the simulated mean SST averaged over the Niño 3.4 region and precipitation in the 12 models ( Fig. 9) exhibits broadly similar patterns to those found in the observations in Fig. 8, although the models appear to exaggerate the ENSO teleconnection. For the SW monsoon season, all the models simulate the widespread negative and significant correlation across most of the region, but with pockets of positive correlation in the northeast areas, although the details of the spatial patterns vary greatly between the models. Notably, MeteoFrance-7 has an area of positive correlation across central India, and incidentally has low skill scores in this area. For the NE monsoon, again the teleconnections look much more exaggerated for both the positive and negative correlations, with most of the models adequately resembling the pattern shown in the observations. Interestingly, the models which capture the positive relationship over Sri Lanka tend to also have more skill in this area, for example, ECMWF-S5 and GEM-NEMO. Similarly, models with higher skill over northern Afghanistan and Pakistan, such as GloSea-6 and MeteoFrance-7, also appear to pick up on the strong, positive ENSO teleconnection.
To investigate the relationship between the strength of the ENSO teleconnection and model skill, the correlation between the spatially-averaged model and observed precipitation (a proxy for model skill) against the correlation between model SSTs in the Niño 3.4 region and model precipitation (a proxy for the ENSO teleconnection) is plotted in Fig. 10. Note that the uncertainty in these correlations is large because of the small sample size (24 years, 1993-2016).
During the SW monsoon season, the exaggerated ENSO teleconnection in the models apparent in Fig. 9 is also demonstrated in the scatter plot for the South Asia domain ( Fig. 10a; top-left panel). The observation line suggests that in the "real world" (albeit limited by the precision of the CHIRPS observations dataset) the ENSO-precipitation relationship is much weaker than many of the skilful models suggest. Therefore, even though the models appear oversensitive   to the influence of ENSO, this oversensitivity also appears to improve their skill in forecasting precipitation. This is likely because if the variability is largely driven by ENSO, then models that capture the ENSO response are likely to be more skilful than those that do not, even if they overestimate it. For the country-specific domains during the SW monsoon season (Fig. 10a), the ENSO teleconnection appears to have little influence on model skill. Although, the observation line for Nepal suggests a strong correlation between observed precipitation and ENSO, and the most skilful models also have a strong ENSO teleconnection. Thus, these results, alongside those showing the observed ENSO-precipitation correlation in Fig. 8a, suggest that ENSO is a major driver of precipitation variability in Nepal during JJAS. The correlations in the Afghanistan and Pakistan North domains are weaker, with the observation line near to zero for both domains, suggesting ENSO has negligible influence in this area. Our previous analyses have shown that both precipitation amount and skill are low here, especially in Afghanistan. However, interestingly CFS2 seems to perform well in the Afghanistan and Pakistan North domains and is the only model which has a highly positive ENSO-precipitation correlation. For the Bangladesh and Pakistan South domains there is no significant relationship between model skill and the ENSO teleconnection in the SW monsoon season, suggesting ENSO has less influence in these areas and thus there is less predictability for models to exploit.
The lack of ENSO teleconnection for the South Asia region during the NE monsoon (Fig. 10b, top left panel) is likely an averaging artefact of the mixed positive and negative correlations, as seen in the observations (Fig. 8b). The scatter plots for the Afghanistan and Pakistan North domains show a positive correlation between ENSO and precipitation, also shown by the observations. Our results suggest that all of the models appear to pick up on this teleconnection in these two domains, with the most skilful models typically simulating a slightly higher ENSO-precipitation correlation than the others. The results for the Pakistan South domain imply a positive relationship between model performance and the ENSO teleconnection. However, note that some models suggest a negative ENSO-precipitation correlation, whilst the models with most skill suggest a positive correlation; closer to the observations line, which is weakly positive. For the Bangladesh domain, the ENSO teleconnection looks to have more of an influence in the OND season than JJAS, with the most skilful models appearing to capture the negative ENSO-precipitation correlation as seen in the observations. Negligible ENSO influence is detected by the results for the Nepal domain, which is unsurprising given the low precipitation amounts during OND.

Indian ocean dipole (IOD)
There is a much weaker relationship between observed South Asia precipitation and the IOD compared with ENSO for the SW monsoon season, indicated by the low correlations (between − 0.4 and + 0.4) for much of the area (Fig. 11a). A positive and significant correlation between the IOD and observed precipitation is apparent for south Pakistan and part of east India, although model skill is low in these areas (correlations below 0.4) suggesting the models are not capturing the IOD teleconnection. For the NE monsoon season, there appear to be more areas with a significant relationship (Fig. 11b); although interestingly, the pattern is remarkably similar to the relationship with ENSO shown in Fig. 8b for the same season; this is discussed further in Sect. 4.2.1.

Fig. 11
Correlation between observed IOD index and precipitation over 1993-2016 for a JJAS and b OND, stippling marks statistical significance at the 5% level Similarly, the model simulations of the relationship between observed IOD and model precipitation (not shown) depict much weaker correlation values than those for ENSO (Fig. 9). For the SW monsoon season the picture is for a widespread, weak negative relationship. While most of the models appear to pick up on the positive correlation in the east of the region, the location and shape of this area varies greatly between models, plus all models fail to simulate the area of significant positive correlation over the south of Pakistan. For the NE monsoon season, again, patterns look very similar to those for ENSO but with slightly weaker correlations.
Scatterplots like those in Fig. 10 but comparing the observed IOD and model precipitation relationship with model skill for the SW and NE monsoon season (not shown) are unsurprising given the results so far. For the SW monsoon season, the relationship between model precipitation and the observed IOD is generally weak for all domains and does not appear to be linked to model skill. Whereas for the NE monsoon season, the results look similar to those for the ENSO relationship in Fig. 10b.

Which models should be incorporated into the SASCOF seasonal outlook?
The variation in model skill supports the WMO's case for using a multi-model ensemble as a basis for the SASCOF regional outlook (WMO 2020). We do not suggest dismissing any of the 12 models in the study since no single model performs consistently poorly in all areas. For example, one method for model selection could be to discount the models with lowest correlation when regionally averaged. However, these models still perform well in certain locations, e.g., skill for CanCM4 in the NE monsoon is low (correlation < 0.2) when regionally averaged ( Fig. 7b; top-left panel), but positive and significant (correlation > 0.4) for the Afghanistan domain ( Fig. 7b; top-right panel). The results also show stark differences in model skill from one season to the other; for instance, when averaged over the entire region, MeteoFrance-7 has the lowest correlation in the SW monsoon season (Fig. 7a) but highest in the NE monsoon season (Fig. 7b). At the country-level, there are clearly models that exhibit substantially more skill than others, and therefore, poorly performing models could be disregarded at the national level, such as DWD and CanCM4I over Bangladesh. One way to incorporate this into SASCOF, would be for each National Meteorological and Hydrological Service to produce their own national forecast based on the most skilful models for their country and use this to make appropriate recommendations for the regional outlook at the SASCOF.

What is driving the spatial variability in model skill?
The complex combination of influencing factors at seasonal and sub-seasonal scales that determine the spatial and temporal variability of monsoon precipitation are not always clear. In this section, we discuss some potential factors driving the variation in the results.

The ENSO and IOD teleconnections
For both the SW and NE monsoon seasons, we find similarities between the spatial patterns in model skill (Figs. 4,5,6) and the observed ENSO teleconnection (Fig. 8). When averaged over the entire region for the SW monsoon season ( Fig. 10a; top-left panel), we find, in general, that the skilful models simulate an even stronger teleconnection than in the observations. The strength of the ENSO and SW monsoon precipitation relationship has long been debated; Kumar et al. (1999) suggest the relationship has weakened since the 1980s, although other studies (e.g., Pai 2004;Delsole and Shukla 2012) attribute the apparent breakdown to other factors, such as sampling variability. The results in this study concur that ENSO is a major driver of seasonal precipitation variability when assessing the entire South Asia region from 1993 to 2016. For example, Nepal is an area where high model skill coincides with a strong ENSO teleconnection, particularly in the west (Fig. 8a), in agreement with other studies (e.g., Bohlinger and Sorteberg 2018;Sharma et al. 2020). The high model skill is impressive given the highly varied topography and climate of Nepal and is potentially highlighting the predictive power of the models when a strong ENSO teleconnection is present. The difference in the ROC scores between the lower (Fig. 5) and upper (Fig. 6) tercile categories suggest that in some places the MME has more skill at predicting one tercile over another. For example, around central India in the SW monsoon, higher ROC scores are seen for the lower tercile category than the upper tercile category, which could suggest that the models are typically more skilful at predicting the teleconnection during the warm phase of ENSO, i.e., an El Niño event, which is typically associated with drier years, as shown by the observations in Fig. 8a.
In our results, the IOD teleconnection appears less pronounced than ENSO for the SW monsoon season (Fig. 11a). However, a positive and significant IOD-precipitation relationship is observed in parts of south Pakistan and east India. The teleconnection in these areas is supported by studies (e.g., Ashok et al. 2001) that detect a correlation between positive (negative) IOD events and increased (decreased) SW monsoon precipitation totals over the monsoon trough region which on average runs from Pakistan to Myanmar. Models typically have low skill here and thus the IOD teleconnection could provide more predictability in this area; however, the monsoon trough region is also associated with considerable intraseasonal variability, as discussed in Sect. 4.2.2.
Interestingly, during the NE monsoon season, a strong ENSO and IOD teleconnection is observed across much of Afghanistan and north Pakistan (Fig. 8b), despite low precipitation totals during OND. The results suggest a positive and significant correlation with ENSO and IOD, with some models demonstrating significant correlation (> 0.4), such as MeteoFrance-7, suggesting a detectable source of predictability in this region. There have been very few studies on the influence of ENSO and IOD during the OND season; however, Kar and Rana (2014) find that ENSO is the dominant mode of variability from December to February. Similarly, our results suggest a strong ENSO and IOD teleconnection surrounding the Bay of Bengal coast and northwest India (Figs. 8b, 10b). These are both areas influenced by tropical cyclone activity during the NE monsoon season, which has also been found to negatively correlate with ENSO (Felton et al. 2013). Replicating this teleconnection appears to be important for model skill, as the models that perform well in these areas (Figs. 4b,5b,6b) are typically those that appear to capture this teleconnection (Fig. 9b). For example, in the areas surrounding the coastlines of Bangladesh and Myanmar, GloSea-6 possesses significant correlation (> 0.4;Figs. 4b,5b, 6b) alongside a significant and negative ENSO-precipitation correlation (Fig. 9b), albeit more widespread and exaggerated than apparent in the observations (Fig. 8b). In contrast, CanCM4I has poor skill in this area (correlations < 0; Figs. 4b, 5b, 6b) and simulates a weakly positive ENSO-precipitation correlation conflicting with the observed relationship.

Intraseasonal variability
Some of the areas of low model skill also coincide with areas of high intraseasonal variability since seasonal models still have limited ability to capture these short-range oscillations a season ahead (Lee et al. 2015). The South Asian monsoon exhibits substantial variability on the sub-seasonal timescale with active (enhanced) and break (suppressed) phases in precipitation, related to westward (10-20-day period) and northward (25-60-day period) propagation of convection and circulation anomalies (Yasunari 1979(Yasunari , 1981Krishnamurti and Ardanuy 1980;Annamalai and Slingo 2001). Variability around the monsoon trough region is particularly complicated, due to the north-south swing of the trough about its normal location and low-pressure systems from the Bay of Bengal moving along the trough axis. Some northeastern areas, such as Bangladesh, northeast India, and Myanmar, also lie in the monsoon trough region, and are therefore subject to considerable intraseasonal variability (Fujinami et al. 2011) which, alongside a lack of obvious teleconnections, is likely to play a factor in the low model skill there.

Geographical features
Models cannot accurately represent local geographical features, such as the diverse topography seen in South Asia as shown in Fig. 2a, in fine detail. Seasonal models have limited horizontal and vertical resolution on which to capture these local effects accurately (in this study, models are regridded to a 1.0° × 1.0° horizontal grid). For example, precipitation is typically enhanced on the windward side and suppressed on the leeward side of high ground, evident in the relatively higher precipitation totals along the Western Ghats of India and the Arakan Mountains in Myanmar in the SW monsoon season (Fig. 3a). In our results, the places where models possess higher skill are generally those with less complex terrain, for example over the more homogeneous Indo-Gangetic Plains and the Deccan Plateau during the SW monsoon. The terrain in the northwest of India consists of relatively lowlying terrain to the south rising to the Himalayan-mountain range in the north, which exhibits high year-to-year variability (Fig. S1) and may explain the consistent area of poor skill here for the SW monsoon (Figs. 4a, 5a, 6a). In contrast, models exhibit good skill in the SW monsoon season over west Nepal, which similarly has an extreme range in elevation; however, compared with northwest India, predictability may be enhanced by the stronger and more homogeneous prevailing winds from the flatter Indo-Gangetic Plain region and the notable ENSO teleconnection. For improved model skill, the sharp mountain ranges and vast coastlines of South Asia should be well represented to capture the complicated precipitation effects and further analysis could assess if skill improvements come from increasing model resolution.

Observational uncertainties
All results have been replicated using the CRU TS dataset (not shown) to compare with those using CHIRPS, and while the spatial patterns in model skill are similar to those with CHIRPS, skill is generally lower in the results with the CRU TS dataset. The ordering of the models in the regionally averaged bar plots (not shown) are also slightly different. This observational uncertainty within our results highlights the need to check skill results against additional observation datasets before making any decisions to disregard models from the multi-model ensemble.

Similarities between the ENSO and IOD teleconnection patterns for NE monsoon season
The remarkable likeness between the influence of ENSO and IOD on precipitation for the NE monsoon season (Figs. 8b,  10b), suggests that ENSO and IOD events can co-occur and/ or a positive (negative) IOD event has a similar influence on precipitation to an El Niño (La Niña) event. Although some studies have argued the IOD is an independent mode of interannual variability (Saji et al. 1999), other studies have found an observable influence from ENSO on the IOD (e.g., Allan et al. 2001;Krishnamurthy and Kirtman 2003;Yang et al. 2015). Furthermore, the basin wide warming or cooling in the Indian Ocean becomes more dominant during the boreal winter, which has been seen to have an ENSO correlation (Behera et al. 2006) and therefore this could amplify the similarities in the correlation patterns. While the combined influence of ENSO and IOD on South Asia monsoon precipitation has been well studied for the SW monsoon period, particularly over India (e.g., Ashok et al. 2001;Cherchi and Navarra 2013) we were unable to find similar studies for the NE monsoon period. Improving understanding of how the two main climate drivers affect NE monsoon precipitation could prove vital for improving model skill, especially for the east and southeast parts of the region where precipitation totals are highest (Fig. 3b).

Conclusions
In this study, the ability of 12 seasonal prediction systems to capture precipitation variability in South Asia has been assessed for both the southwest (June to September) and northeast (October to December) monsoon seasons, with further analysis into how spatial variability in model skill is influenced by the strength of the ENSO and IOD teleconnections.
There is a substantial range in model skill spatially for both seasons. In general, model skill is higher where a strong ENSO teleconnection exists, for example, in central and northern India and Nepal during the SW monsoon season, and in Afghanistan, northern Pakistan and southeast India in the NE monsoon season. When averaged over the region for the SW monsoon season, the models that simulate the strongest ENSO-precipitation relationship appear to exhibit more skill. In contrast, in places where the connection with ENSO is weaker, such as Bangladesh and northwest India during the SW monsoon, most models exhibit much lower skill. The IOD teleconnection is less pronounced than that for ENSO in the SW monsoon season, but stronger in places for the NE monsoon, when the spatial pattern looks very similar to the ENSO teleconnection. Most places with low skill coincide with low precipitation amounts, for example, Nepal and central India in the NE monsoon season. The places where skill is low but precipitation is high, for example, in the northeast of the region during the SW monsoon season, are areas where model improvements could be especially beneficial. However, the explanations for low skill are varied and complicated and may be partly explained by a combination of lack of identifiable climate drivers, high intraseasonal variability, observational uncertainty and/or the complex geographical features in the region. Our results highlight the importance of models accurately simulating the ENSO and IOD teleconnections to enhance their skill as well as improving our understanding of how the models behave in different locations when selecting models for the SASCOF forecast.
Assessing the spatially averaged skill over regional and national domains demonstrates the high variability in model skill depending on the chosen domain and season. When spatially averaged over the South Asia region, all models have positive skill in replicating precipitation totals in the SW and NE monsoon seasons, with 8 of these models having significant correlation for the SW monsoon season, and 5 for the NE monsoon. Since there are no standout high or low skilled models for all areas, we recommend including all 12 models in the multi-model ensemble as a basis for the SASCOF seasonal outlook. However, poorly performing models could be dismissed when considering forecasts at the national level. Further skill assessments comparing multi-model ensembles with differing model combinations could support this process.
The ability for models to accurately simulate large-scale climate drivers and their complex relationship with South Asia precipitation will be a continued challenge for many years to come. With further research, more widespread and reliable observations, and model developments from improved resolutions, parameterisations and data assimilation techniques, forecast skill will continue to improve into the future. While enhancing our understanding of model skill is vital, future work should also focus on producing subseasonal and seasonal forecasts which are user-relevant to increase their uptake and usability. By working closely with sector users to improve understanding on how forecasts are interpreted and applied in practice, seasonal forecasts can be co-developed to have the greatest societal impact for the South Asia region.
PB. The first draft of the manuscript was written by JS and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.