1 Introduction

The precipitation over China exhibits prominent variability on both interannual and decadal time scales (Kosaka et al. 2011; Li et al. 2004). The rainy season starts in southern China and the Yangtze River valley in May–June and then moves northward into northern China in July–August, exhibiting a large meridional migration. Summer is the primary rainy season in China, and anomalous summertime rainfall often causes devastating flooding, leading to heavy casualties, property damage, ecological destruction, and serious negative impacts on society. The middle and lower reaches of the Yangtze River (MLYZR), which are among the most densely populated areas and the main agricultural and industrial areas in China, have suffered from frequent flooding since the 1980s, especially in the summers of 1998, 2016, and 2020 (Zong and Chen 2000; Sun et al. 2016; Li et al. 2017; Zheng and Wang 2021). While most previous studies have focused on the summer precipitation anomalies, relatively little attention has been paid to the spring precipitation. In fact, the spring precipitation also contributes a significant amount to the rainfall in China (He et al. 2007; Pan et al. 2013). According to records, it accounts for as much as 30% of the annual total (Feng and Li 2011), of which more than one half occurs in late spring (May). The amount of precipitation in this first rainy period in China is crucial to the early planting season in southeastern China. Moreover, as a transitional period linking the preceding winter with the following summer, the late spring can be treated as an early onset indicator of the East Asian subtropical monsoon (Wang et al. 2004). Thus, a better understanding of the precipitation variations in late spring and their predictability will significantly benefit society and the economy, which inspired us to conduct this study.

Over the past few decades, significant progress has been made in understanding the mechanisms contributing to the variations in precipitation over China, and it has been well documented that both tropical and extratropical factors can influence the precipitation. In the tropics, it has long been demonstrated that pronounced precipitation anomalies tend to occur over the Yangtze River during the summer following El Niño events (Huang and Wu 1989; Lin and Lu 2009), and the Indian Ocean warming, especially during the summer when El Niño has dissipated, is also a major cause of the summer precipitation anomalies in China (Xie et al. 2009; He and Wu 2014). The western North Pacific anomalous anticyclone (WNPAC) is the essential atmospheric bridge conveying these tropical signals to the East Asian monsoon system, thus affecting the precipitation in China (Wang and Zhang 2002; Wu et al. 2003; Xie et al. 2009; Zhang et al. 2017). In the extratropics, the North Atlantic sea surface temperature anomalies (SSTA) can influence the climate over China through wave train-like atmospheric circulation anomalies across the Eurasian continent (Watanabe 2004; Wu et al. 2009). In addition, the anomalous snow cover over the Eurasian-Tibetan Plateau has been found to play an important role in modulating and predicting the summer precipitation in eastern China (Yang and Xu 1994; Wu and Qian 2003; Wu and Kirtman 2007; Zuo et al. 2012). Although research on the spring rainfall in China remains limited, existing studies have shown that its main characteristics are modulated by remote oceanic SST forcing. The first two modes of the April–May rainfall over southern China were found to be related to the El Niño–Southern Oscillation (ENSO) and a North Atlantic SSTA dipole, and using these two predictors in a regression model provides an effective way to improve the forecasting of the spring precipitation over southern China (You and Jia 2018). Wang et al. (2000) found that the winter El Niño-related SSTA generally favors the development of the WNPAC, which persists until the following spring or early summer, causing anomalously wet conditions in southern China. Feng and Li (2011) further revealed that El Niño Modoki events and traditional eastern El Niño events have different influences on the spring precipitation over southern China. Based on the above results, it is well established that the El Niño teleconnection warms the tropical Indian Ocean and leads to abnormally large summer rainfall in the MLYZR; however, whether the same mechanisms operate in late spring, and how well the climate models represent them, remains unclear.

Substantial effort has been devoted to investigating the seasonal forecasting abilities of numerical models and the predictability of precipitation over China. However, there are significant regional differences in the predictability sources of the climate models. Generally, the most predictable region is in the tropics, whereas the precipitation in the East Asian monsoon region exhibits relatively low predictability (Ding 2011) and large model dependence (Wang et al. 2009). The seasonal dependence and spatial patterns of the precipitation forecasting skill can be largely explained by ENSO and its related teleconnection (Kumar and Hoerling 2003). Particularly in spring, a season when climate models generally exhibit a spring predictability barrier associated with ENSO (Torrence and Webster 1998), outstanding questions remain concerning the predictability in late spring, the seasonal prediction of the anomalously heavy rainfall over southeastern China, and its controlling mechanisms. This motivates us to explore the predictability sources of the abnormally large rainfall over southeastern China in late spring in climate models, with the aim of helping to improve the late spring precipitation predictions.

In this study, we first examined the ability of state-of-the-art climate models to forecast the late spring rainfall variability in China. We then evaluated the impact of the SST anomalies on the rainfall forecasting skill. Finally, we studied the key processes in the seasonal climate models that influence the performance of the heavy rainfall prediction over southeastern China. This paper is organized as follows: four of the operational seasonal forecasting systems in the North American Multi-Model Ensemble (NMME; Kirtman et al. 2014) seasonal prediction experiment, the observations, the reanalysis data, and the methods are described in Sect. 2. An assessment of their abilities to forecast the late spring rainfall is presented in Sect. 3, and the major sources of the forecasting errors of the seasonal forecasting systems in late spring are discussed in Sect. 4. Section 5 presents the summary and discussion.

2 Data and methods

2.1 Hindcast data

The NMME is a multi-model forecasting system consisting of a series of coupled climate models from US and Canadian modeling centers, including the National Centers for Environmental Prediction (NCEP), the Geophysical Fluid Dynamics Laboratory (GFDL), the National Aeronautics and Space Administration (NASA), the National Center for Atmospheric Research (NCAR), and the Canadian Meteorological Centre (CMC). In this study, four of the models in the NMME were selected: CanCM4i (Merryfield et al. 2013), CCSM4 (Gent et al. 2011), GEM-NEMO (Côté et al. 1998), and GFDL-CM2p5-FLOR-B01 (hereafter GFDL-FLOR). These models were selected for two reasons. First, they all participated in the NMME real-time forecasting in 2020, which indicates that their predictions are well recognized by the NMME. Second, they all have continuous hindcasts from 1982 to 2018, ensuring that the forecast skill can be evaluated over this period. CanCM4i, CCSM4, and GEM-NEMO each have 10 members, while GFDL-FLOR has 12 members. The NMME hindcast data used in this paper are available on the International Research Institute (IRI) data server: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.

The hindcasts of precipitation and SST for the four models and of the 850 hPa wind for CanCM4i and GEM-NEMO were used. Following convention, a 1-month lead prediction was treated as a prediction initialized during the previous month; that is, for the 1-month lead predictions for May, the predictions of the monthly means are based on the initial conditions in April. In this study, we focused on deterministic assessments based on the target months, and the hindcasts with 1- and 3-month lead times were studied. The ensemble mean of each individual system was obtained before applying the multi-model ensemble (MME) mean, and the MME results were then interpolated onto a 1° × 1° grid. We obtained the ensemble climatological mean for each system for the period of 1983–2010 and then subtracted it from each system to obtain the anomalies for each variable.
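
The anomaly and MME steps described above can be illustrated with a minimal NumPy sketch. The function name `mme_anomalies`, the array layout `(member, year, lat, lon)`, and the assumption that all systems are already on a common grid are hypothetical choices made for this illustration and are not taken from the NMME archive.

```python
import numpy as np

def mme_anomalies(hindcasts, years, clim_start=1983, clim_end=2010):
    """Return per-system anomalies and their equally weighted MME mean.

    hindcasts: dict mapping a system name to an array of shape
               (member, year, lat, lon); all systems are assumed to
               share the same years and grid (an assumption of this sketch).
    years:     1-D array of target years matching the year axis.
    """
    years = np.asarray(years)
    in_clim = (years >= clim_start) & (years <= clim_end)
    anomalies = {}
    for name, data in hindcasts.items():
        ens_mean = data.mean(axis=0)            # ensemble mean of one system
        clim = ens_mean[in_clim].mean(axis=0)   # that system's own climatology
        anomalies[name] = ens_mean - clim       # anomaly relative to 1983-2010
    # Each system contributes equally, regardless of its member count.
    mme = np.mean(list(anomalies.values()), axis=0)
    return anomalies, mme
```

Taking each system's ensemble mean first means that a 12-member system does not outweigh a 10-member one in the MME, and removing each system's own climatology also removes its mean bias.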

2.2 Observations and reanalysis data

Two sets of widely used observational precipitation data for the period of 1983–2018 were used in this study, the Global Precipitation Climatology Project (GPCP) dataset, version 2.3 (GPCPv2.3; Adler et al. 2003) and the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP) dataset (Xie and Arkin 1997), both of which have horizontal resolutions of 2.5° × 2.5°. To reduce the uncertainty of using a single dataset, the observational precipitation results were obtained using the average of these two datasets. We used the optimum interpolation version 2 (OI) monthly mean sea surface temperature (Reynolds et al. 2002) to study the SSTA variability related to the heavy late spring rainfall in the MLYZR region. This dataset was constructed using a regular global 0.25° grid. In addition, the 850 hPa wind data from the National Centers for Environmental Prediction-Department of Energy (NCEP-DOE) Reanalysis 2 (Kanamitsu et al. 2002) were also used. This reanalysis dataset is available on a 2.5° × 2.5° grid, and it was also extracted for the period from 1983 to 2018.

For the convenience of comparing the results with the hindcast results, all of the observational and reanalysis datasets were linearly interpolated to a common 1° × 1° grid, which is consistent with the hindcast. It should be noted that the correlation, composite, and empirical orthogonal function (EOF) analyses were performed based on the anomalies, and the climatology of each variable was also derived based on the 1983–2010 period.

3 Ability to forecast the late spring rainfall in China

The climatology and standard deviation of the observed and MME May precipitation are presented in Fig. 1. As shown in Fig. 1a, the observed May precipitation over China has two centers: one located around the Pearl River Delta region, and a slightly weaker one located over the regions south of the lower reaches of the Yangtze River (LYZR). The MME with 1- and 3-month leads largely reproduces the observed spatial pattern, with comparable magnitudes over the LYZR but lower magnitudes over the Pearl River Delta region (Fig. 1b, c). The spatial distribution of the standard deviation (STD) of the observed May rainfall is similar to that of the climatological mean, and the STD in southeastern China is more pronounced than in other regions of China. The STD of the MME precipitation also exhibits a center located in southeastern China, which is consistent with the spatial distribution of the MME climatological mean. However, the STD is much lower for the MME (Fig. 1e, f), indicating that the variability of the MME precipitation is much weaker than observed.

Fig. 1

Climatology and standard deviation of the May precipitation over East Asia for 1983–2018. a, d are the results of the observations, b, e are the results of the MME precipitation with a 1-month lead, and c, f are the MME results with a 3-month lead. The red rectangle in b indicates the middle and lower reaches of the Yangtze River (MLYZR) region used in this study (25°–35° N, 110°–122° E) (unit: mm/day)

The anomaly correlation is used to measure the skill in predicting the May precipitation. Figure 2a, b shows the geographic distribution of the temporal correlation coefficient (TCC) skill of the 1- and 3-month lead precipitation predictions of the four-model MME. As shown in Fig. 2, the prediction skill is significant at the 95% confidence level over the tropical and subtropical oceans. Over land in China, significant positive correlations between the 1-month lead prediction and the observations were found over the central-northern and eastern parts of China, demonstrating that the MME has some ability to capture the precipitation anomalies in these areas. In contrast, for the 3-month lead, the prediction skill is insignificant over almost all of China, except for several sporadic areas in northwestern and eastern China. The prediction is particularly poor in northeastern and southwestern China. It should also be noted that the ability to predict the precipitation over the MLYZR is higher than that over the Pearl River Delta region. To test the consistency of the TCCs among the four individual models, only the grid points at which the TCCs of all four models have the same sign as that of the MME are marked with solid dots. In the MLYZR, the individual models are almost all consistent with the MME for the 1-month lead forecasts, but they differ markedly for the 3-month lead forecasts, reflecting the low skill at the 3-month lead.
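
As an illustration of the TCC map and the sign-agreement test described above, the following NumPy sketch computes a grid-point Pearson correlation along the time axis and flags the grid points at which every individual model agrees in sign with the MME. The function names and array shapes are hypothetical and chosen for this sketch only.

```python
import numpy as np

def tcc_map(forecast, observed):
    """Pearson correlation along axis 0 (time) at each grid point.

    forecast, observed: anomaly arrays of shape (year, lat, lon).
    Grid points with zero variance are returned as NaN.
    """
    f = forecast - forecast.mean(axis=0)
    o = observed - observed.mean(axis=0)
    cov = (f * o).sum(axis=0)
    denom = np.sqrt((f ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, cov / denom, np.nan)

def sign_agreement(model_tccs, mme_tcc):
    """True where every model's TCC has the same sign as the MME TCC.

    model_tccs: array of shape (model, lat, lon); mme_tcc: (lat, lon).
    """
    return np.all(np.sign(model_tccs) == np.sign(mme_tcc), axis=0)
```

A grid point would then be treated as skillful where `tcc_map(...)` exceeds the 0.32 threshold quoted in the caption of Fig. 2, and dotted where `sign_agreement(...)` is true.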

Fig. 2

The temporal correlation coefficient (TCC) for the May precipitation anomaly prediction skill using the MME with a a 1-month lead and b a 3-month lead during 1983–2018. The solid green contour is the TCC of 0.32 with a statistical significance of 95%. The solid black dots indicate the areas in which all four single models have the same TCC sign as the MME results

We conducted an EOF analysis of the rainfall anomalies over southeastern China to further identify the dominant rainfall patterns in the observations and model predictions. The first EOF mode of the observed May rainfall anomalies is characterized by uniformly enhanced rainfall over southeastern China, which is spatially similar to the climatological mean (Fig. 3a). The variance explained by EOF1 for the observed rainfall anomalies and the MME rainfall anomalies with 1- and 3-month leads is 59.3%, 56.4%, and 81.6%, respectively, indicating its dominant role. The MME can largely forecast the spatial pattern of the EOF1 of the May rainfall anomalies, but its center extends farther northward than observed and has only about half the observed magnitude. Moreover, the TCC of principal component 1 (PC1) between the observations and the MME with a 1-month lead is 0.56, which is well above the 95% significance level. However, the TCC of PC1 between the MME with a 3-month lead and the observations is only 0.17. In contrast to the EOF1 mode, the EOF2 of the observed rainfall anomalies is characterized by a north–south dipole structure, with opposite variability between the MLYZR and the Pearl River Delta region. The forecasts with 1- and 3-month leads also exhibit precipitation centers that are located relatively farther north, the southern edges of which have already crossed the Yangtze River. Furthermore, the TCCs of PC2 between the observations and the forecasts are low, i.e., only 0.21 for the forecast with a 1-month lead and −0.12 for that with a 3-month lead.
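
For readers unfamiliar with the technique, an EOF decomposition of this kind can be sketched with a singular value decomposition. The function name `eof_analysis` and the square-root cosine-latitude area weighting are assumptions of this sketch (the weighting is a common convention, not something stated in the text), and the input is assumed to contain no missing values.

```python
import numpy as np

def eof_analysis(anom, lat, n_modes=2):
    """Leading EOFs of an anomaly field via SVD.

    anom: array of shape (year, lat, lon) without missing values.
    lat:  1-D latitudes in degrees, used for area weighting (assumed).
    Returns (eofs, pcs, explained) with shapes
    (mode, lat, lon), (year, mode), and (mode,).
    """
    nt, ny, nx = anom.shape
    weights = np.sqrt(np.cos(np.deg2rad(lat)))[:, None]
    x = ((anom - anom.mean(axis=0)) * weights).reshape(nt, ny * nx)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)       # fraction of total variance
    pcs = u[:, :n_modes] * s[:n_modes]        # principal component time series
    eofs = vt[:n_modes].reshape(n_modes, ny, nx)
    return eofs, pcs, explained[:n_modes]
```

The sign of each EOF and its PC is arbitrary, so when observed and forecast PCs are compared, the modes are usually oriented the same way first.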

Fig. 3

As in Fig. 1, but for the spatial distributions of the first two leading EOF modes of the May precipitation anomalies. The black lines represent the 1 mm/day isoline, and the green line represents the 2 mm/day isoline

In summary, the above results indicate that in general, the models can forecast the climatological mean and the variability of the May rainfall. However, some deficiencies remain, such as the northward-shifted precipitation center and the much weaker intensity in the forecasts. Therefore, in this study, we focused on examining the variability of the heavy rainfall over the region marked by the red rectangle (MLYZR) in Figs. 1b and 2a, and we explore the main sources of the heavy precipitation forecasting ability over the MLYZR in the next section.

4 Major sources of heavy rainfall

In order to determine the primary temporal variations of the May rainfall anomalies over the MLYZR, the MLYZR index (MLYZRI) was constructed by averaging the May rainfall anomaly data over the MLYZR. Heavy rainfall years are defined as those years in which the MLYZRI was larger than its STD for 1983–2018. To ensure the consistency of the spatial distributions of the rainfall anomalies in the heavy rainfall years, we used the method described above; that is, only those grid points at which every individual heavy rainfall year has the same sign as the MME results are marked with black dots. This method is also used in the following figures. The composite of the observed May rainfall anomalies (Fig. 4a) shows that there are significant positive rainfall anomalies over southeastern China, but there are obvious negative rainfall anomalies over the South China Sea (SCS) and over the oceans to the east of the Philippines. Compared to the observed heavy rainfall over the MLYZR in May, the composite MME rainfall with a 1-month lead for the same years is much weaker (Fig. 4b), indicating a somewhat limited forecasting skill. For the MME rainfall anomalies with a 3-month lead, the corresponding composite is insignificant over almost all of the land area of China (Fig. 4c), showing no forecasting skill. Figure 4d, e shows the composite MME rainfall anomalies for the years of MME heavy rainfall over the MLYZR, which resemble the observations well. Therefore, the models are able to produce the rainfall pattern shown in Fig. 4a, but their heavy rainfall years do not coincide with the observed ones. The inconsistent spatial distributions of the rainfall anomalies shown above are consistent with the relatively low TCC between the MME and the observations, especially for the MME with a 3-month lead.
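
The index construction and compositing described above can be sketched as follows. The helper names are hypothetical, and the cosine-latitude weighting in the box average is an assumption of this sketch, since the text only states that the anomalies were averaged over the MLYZR box (25°–35° N, 110°–122° E).

```python
import numpy as np

def box_average(anom, lat, lon, lat_bounds, lon_bounds):
    """Cosine-latitude-weighted mean of anom (year, lat, lon) over a box."""
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    box = anom[:, in_lat][:, :, in_lon]
    weights = np.cos(np.deg2rad(lat[in_lat]))[:, None] * np.ones(in_lon.sum())
    return (box * weights).sum(axis=(1, 2)) / weights.sum()

def heavy_rainfall_years(index, years):
    """Years in which the index exceeds one standard deviation of the index."""
    return years[index > index.std()]

def composite(field, years, selected_years):
    """Mean of field (year, lat, lon) over the selected years."""
    return field[np.isin(years, selected_years)].mean(axis=0)
```

With `mlyzri = box_average(rain_anom, lat, lon, (25.0, 35.0), (110.0, 122.0))`, the composites of the kind shown in Fig. 4 would follow from `composite(field, years, heavy_rainfall_years(mlyzri, years))`, where `field` is either the observed or the MME anomaly field.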

Fig. 4

a Composite observed May precipitation anomalies (mm/day) for the years for which the observed May MLYZRI is larger than the STD, b, c are the composite MME May precipitation anomalies with 1- and 3-month leads, respectively, for years for which the observed May MLYZRI is larger than the STD, d, e are the composite MME May precipitation anomalies with 1- and 3-month leads, respectively, for the years for which the MME May MLYZRI is larger than the STD (unit: mm/day)

As discussed in the introduction, tropical SSTAs are generally considered to be the major sources of seasonal predictability. Therefore, the SSTAs that occurred concurrently with the heavy rainfall over the MLYZR (represented by those years for which the MLYZRI is larger than the STD) were diagnosed in the observations and models. The positive SSTAs concurrent with heavy rainfall over the MLYZR occur mainly over the tropical eastern Indian Ocean (EIO), the South China Sea, and its adjacent regions (Fig. 5a). In addition, significant positive SSTAs occur only in limited areas of the subtropical central Pacific Ocean, indicating that ENSO has no significant effect on the contemporaneous heavy rainfall over the MLYZR. The predicted SSTAs with a 1-month lead reproduce the observed SSTA pattern shown in Fig. 5a well (Fig. 5b). However, for the predicted SSTAs with a 3-month lead, the warm SSTAs that were originally located in the eastern tropical Indian Ocean extend into the southeastern and southwestern Indian Ocean (Fig. 5c). The heavy MME rainfall over the MLYZR is strongly related to ENSO in the models, which is demonstrated by the strong positive SSTAs over the tropical eastern Pacific Ocean for both the 1- and 3-month leads (Fig. 5d, e). The years of heavy rainfall over the MLYZR based on the observations and the model forecasts are listed in Table 1, which shows some differences between the observations and the models. The El Niño years are marked with asterisks in Table 1. Two of the observed heavy rainfall years over the MLYZR are El Niño years, compared with three for the model forecasts with a 1-month lead and five for the model forecasts with a 3-month lead. This inconsistency between the observations and the model forecasts also indicates that the occurrence of heavy precipitation in the models is likely to be erroneously modulated by El Niño and that this modulation may be amplified as the lead time increases.

Fig. 5

As in Fig. 4, but for the composite results of the SSTA (units: °C)

Table 1 The major heavy precipitation years based on observation and model forecasts

A correlation analysis was carried out between the rainfall over China and the Niño3 index for both the observations and the model results in order to explore their relationships. The observed rainfall over China, including the MLYZR, has no significant contemporaneous correlation with the observed Niño3 index (Fig. 6a), which is consistent with the spatial distribution of the SSTAs shown in Fig. 5a. However, in the models, the predicted rainfall anomalies over southeastern China are significantly correlated with the contemporaneous Niño3 index, consistently across the four individual models, for both the 1-month lead and the 3-month lead (Fig. 6b, c). The lead correlation reveals that the correlation between the observed rainfall over the MLYZR and the observed Niño3 index only slightly exceeds the 95% confidence level in the preceding December and the current February, indicating a lagged response of the atmospheric circulation to the El Niño signal. In contrast, the MME rainfall over the MLYZR is significantly correlated with the MME Niño3 index for both the 1- and 3-month leads. The forecasts of the MME and the individual models all show this strong lagged influence of ENSO on the rainfall over the MLYZR, and the results with a 3-month lead have a higher correlation coefficient than those with a 1-month lead (Fig. 6d, e). In short, based on the observations, ENSO is not an essential factor causing the heavy rainfall over the MLYZR. However, the models all overestimate the relationship between ENSO and the heavy rainfall over the MLYZR. Wu et al. (2000) concluded long ago that the correlation between the SSTAs in the equatorial central-eastern Pacific and China’s climate is only superficial and that the Indian Ocean SSTA is the factor with a direct causal relationship with China’s climate.

Fig. 6

a Correlation between the observed May grid point precipitation anomalies and the observed Niño3 SST index. b, c The correlations between MME May precipitation anomalies and the MME Niño3 SST index with 1- and 3-month lead times, respectively. d, e The lead TCCs between the averaged MME May precipitation anomalies over the MLYZR and the MME Niño3 SST index with 1- and 3-month lead times, respectively, from the previous December to the current May. The blue line is the TCC skill of 0.32, which is statistically significant at the 95% confidence level

Considering the importance of the SSTA in the EIO, we calculated the area-averaged SSTA over the EIO region (15° S–15° N, 80° E–110° E; Fig. 7a) and defined it as the EIO index. The relationship between the EIO index and the rainfall over China was then explored. The observed May EIO index is significantly correlated with the observed May rainfall anomalies in the MLYZR. The models all capture the relationship between the EIO index and the rainfall anomalies over the MLYZR well. The lead correlation shows that the maximum correlation coefficient occurs in May for both the observations and the models, verifying the direct influence of the EIO SSTA on the rainfall anomalies over the MLYZR.
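
A lead correlation of this kind can be sketched in a few lines of NumPy. The function name `lead_correlations` and the input layout, one column per calendar month from the preceding December to the current May, are assumptions made for this illustration.

```python
import numpy as np

def lead_correlations(monthly_index, may_rain_index):
    """Correlate each monthly SST index column with the May rainfall index.

    monthly_index:  array of shape (year, month), where the columns are
                    assumed to run from the preceding December to May.
    may_rain_index: array of shape (year,).
    Returns one correlation coefficient per column.
    """
    monthly_index = np.asarray(monthly_index, dtype=float)
    may_rain_index = np.asarray(may_rain_index, dtype=float)
    return np.array([
        np.corrcoef(monthly_index[:, k], may_rain_index)[0, 1]
        for k in range(monthly_index.shape[1])
    ])
```

The column with the largest coefficient indicates the lead at which the index is most closely related to the May rainfall; a maximum in the last column corresponds to a simultaneous relationship.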

Fig. 7

As in Fig. 6, but for the relationship between the May precipitation anomalies and the averaged SSTA over the eastern Indian Ocean. The rectangle indicates the eastern Indian Ocean (15° S–15° N, 80°–110° E)

Since we found a strong relationship between the SSTA in the EIO and the rainfall anomalies over the MLYZR in May, it was necessary to investigate the spatial–temporal evolution of the SSTA in the EIO. During the years in which there was heavy rainfall over the MLYZR, positive SSTAs were observed over the central tropical Indian Ocean from December (−1) to March (0) (Fig. 8a–d). Here, (−1) and (0) indicate the preceding and current year of the heavy rainfall over the MLYZR, respectively. The positive SSTA migrates to the northeast in April, resulting in the warming of the eastern part of the Bay of Bengal (Fig. 8e). In May, the positive SSTA keeps moving toward the South China Sea and the adjacent seas (Fig. 8f). Meanwhile, anomalous southwesterlies occur on the western edge of the low-level anticyclonic anomalies located around the northwestern Pacific, which transport more moisture from the South China Sea to the MLYZR, leading to the heavy rainfall over the MLYZR. This process is consistent with previous findings on the influence of the Indian Ocean SSTA on the summer rainfall over the MLYZR (Du et al. 2009; Xie et al. 2009). The SSTA transition from April to May is the most noteworthy, and it is of the greatest importance to the occurrence of the heavy rainfall in the MLYZR.

Fig. 8

Spatial distribution of the composite observed SSTA from the preceding December to May for the years in which the observed May precipitation anomalies over the MLYZR were larger than the standard deviation (unit: °C). The green arrows show the 850 hPa wind anomalies corresponding to the SSTA shown in f. The blue dashed rectangle tracks the movement of the SSTA from a to f

We further studied the SSTA transition from April to May in the models in order to explore the possible factors that affect the heavy rainfall over the MLYZR. During the transition period from April to May in the years in which heavy rainfall was observed over the MLYZR, the MME can forecast the warming process from the eastern Bay of Bengal to the southern part of the South China Sea with a 1-month lead (Fig. 9a, b), but it fails to do so with a 3-month lead (Fig. 9c, d). Correspondingly, there is an anticyclonic anomaly at 850 hPa in May for a 1-month lead, but the 850 hPa wind anomalies are disorganized for a 3-month lead. For the same transition period in the years with MME heavy rainfall over the MLYZR, the MME SSTA exhibits obvious warming for both the 1- and 3-month leads, and the warming is robust over almost the entire tropical Indian Ocean, showing a basin-wide pattern (Fig. 9e–h). Similarly, there are also low-level anticyclonic anomalies located around the northwestern Pacific, which correspond well with the heavy rainfall over the MLYZR (Fig. 4d, e).

Fig. 9

a, b The spatial distributions of the composite MME SSTA with a 1-month lead from April to May for the years in which the observed May precipitation anomalies over the MLYZR were larger than the standard deviation. c, d Similar to a and b, but for the composite MME SSTA with a 3-month lead time. e, f Similar to a and b, but for the composite MME SSTA results for the MME May precipitation anomalies with a 1-month lead time, and g, h are with respect to a 3-month lead time. The 850 hPa wind anomalies in May are shown in b, d, f, and h

The climate models have their highest forecasting skill for the SSTAs in the tropical oceans. The correlation coefficients between the observations and the forecasts of the Niño3 index and the EIO index all exceed 0.7 for every individual model, and the MME results exceed 0.8, indicating high levels of prediction skill. Thus, the forecasting of the monthly SSTA is not the main reason that the models can largely capture the rainfall anomalies in the MLYZR with a 1-month lead but fail with a 3-month lead. What really matters is whether the model can forecast the evolution of the SSTA from April to May. We computed the SSTA differences between May and April in the tropical Indian Ocean, the SCS, and its adjacent seas and calculated their correlation coefficients with the May rainfall anomalies in the MLYZR. The observed results show that the areas significantly related to the rainfall anomalies in the MLYZR in May are mainly located in the northeastern SCS and east of Taiwan Island (Fig. 10a). In comparison, in the models, the regions that are significantly related to the models’ rainfall anomalies in the MLYZR are located in the northeastern Indian Ocean and in the southern SCS (Fig. 10b, c). For a 1-month lead, the forecasted April-to-May SSTA differences averaged over the models’ key region (northeastern Indian Ocean, NEIO; 0°–18° N, 80°–120° E) are highly correlated with the observed April-to-May SSTA differences averaged over the observed key region (northeastern SCS, NSCS; 10°–25° N, 110°–125° E), with a correlation coefficient of 0.52 for the MME and 0.32–0.52 for the four individual models. However, for the forecasts with a 3-month lead, the correlation coefficient for the MME is only 0.1, and those for the four individual models are −0.08 to 0.24. This marked difference between the two lead times satisfactorily explains why the models can generally forecast the heavy rainfall over the MLYZR with a 1-month lead but fail with a 3-month lead. In addition, the relationship between the SSTA differences from April to May over the NEIO and the Niño3 index of the models was studied. As shown in Fig. 10d, the MME results with a 1-month lead have a correlation coefficient of almost 0.6, and those for the four individual models are 0.41–0.68. For the MME results with a 3-month lead, the correlation coefficient is 0.68, and those for the four individual models are 0.46–0.6. This indicates that in the models, El Niño events may trigger warming from April to May, especially over the NEIO, thus inducing the heavy rainfall over the MLYZR. It should also be noted that the observed SSTA differences from April to May over the NSCS have no significant correlation with the observed Niño3 index, i.e., the correlation coefficient is only 0.1. These results indicate that in the real world, the warming processes over the NSCS, which are vital to the occurrence of heavy rainfall over the MLYZR, are not the direct results of El Niño events, and this phenomenon should be studied in more depth in the future.
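
The comparison between the forecasted April-to-May SSTA change over the NEIO and the observed change over the NSCS can be sketched as below. The function names are hypothetical, the cosine-latitude weighting is an assumption of this sketch, and the model and observed fields are assumed to share one grid and the same years.

```python
import numpy as np

def box_mean(field, lat, lon, lat_bounds, lon_bounds):
    """Cosine-latitude-weighted mean of field (year, lat, lon) over a box."""
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = field[:, in_lat][:, :, in_lon]
    weights = np.cos(np.deg2rad(lat[in_lat]))[:, None] * np.ones(in_lon.sum())
    return (sub * weights).sum(axis=(1, 2)) / weights.sum()

def tendency_correlation(model_apr, model_may, obs_apr, obs_may, lat, lon):
    """Correlate the model NEIO and observed NSCS May-minus-April SSTA changes.

    All inputs are SSTA arrays of shape (year, lat, lon) on a common grid.
    The box limits follow the NEIO and NSCS definitions given in the text.
    """
    neio = box_mean(model_may - model_apr, lat, lon, (0.0, 18.0), (80.0, 120.0))
    nscs = box_mean(obs_may - obs_apr, lat, lon, (10.0, 25.0), (110.0, 125.0))
    return np.corrcoef(neio, nscs)[0, 1]
```

Working with the May-minus-April difference rather than the monthly SSTA isolates the warming tendency, which is the quantity that the text argues distinguishes the two lead times.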

Fig. 10

a Spatial distribution of the correlation between the observed MLYZRI and the SSTA differences for May minus April. b, c are the correlation results with 1- and 3-month leads, respectively (black dots indicate that all of the individual models have the same sign of the correlation as the MME). d Correlation between the Niño3 index and the models’ averaged SSTA differences for May minus April over the NEIO for the MME and the four individual models

5 Summary and discussion

The statistics of the observation data show that the precipitation in May can account for 15% of the annual precipitation in southeastern China, with two centers of precipitation located in the Pearl River Delta region and the southern part of the lower reaches of the Yangtze River. The May rainfall exhibits significant interannual variability, which can cause severe droughts and floods in southeastern China. In this study, four models participating in the NMME were selected to study their abilities to forecast the late spring rainfall in China and the major sources of their heavy rainfall forecasting abilities. Based on the evaluation of the climate state, standard deviation, main modes, and correlation analysis between the forecasts and observations, the results show that the models can describe the rainfall over the southern part of lower reaches of the Yangtze River better than over the Pearl River Delta. We found that the spatial distribution of the rainfall anomalies during the observed heavy rainfall over the MLYZR in May can be captured by the forecasts with a 1-month lead but not by the forecasts with a 3-month lead. However, the models themselves can forecast the spatial pattern of the observed rainfall anomalies when there is heavy rainfall over the MLYZR, implying that the models can reproduce the spatial distribution of the heavy rainfall, but they deviate from the observations in terms of occurrence time. Based on this, considering that the climate models mainly depend on the memory of the ocean for seasonal prediction, in this study, the key areas and possible processes influencing the heavy rainfall in the MLYZR were investigated from the perspective of the SSTA field.

It should be noted that when heavy May rainfall occurred over the MLYZR, the tropical eastern Indian Ocean always contained significant warm SSTAs, while only scattered areas in the central Pacific contained warm SSTAs. This indicates that the warm SSTAs in the tropical eastern Indian Ocean are important to the simultaneous precipitation over the MLYZR, but El Niño events are not a necessary condition for the rainfall over the MLYZR. Corresponding to the observed heavy rainfall over the MLYZR, the forecasts with a 1-month lead largely reproduce the observed spatial distribution of the SSTAs, but for a 3-month lead, the key area originally located in the tropical eastern Indian Ocean shifts to the southwestern and southeastern Indian Ocean. In contrast, corresponding to the predicted heavy rainfall over the MLYZR, the most significant spatial feature is that the models all show an El Niño-like SSTA distribution in the equatorial Pacific, accompanied by warm SSTAs in the tropical eastern Indian Ocean. The lead correlation analysis also shows that the relationship between the observed Niño3 index and the rainfall anomalies over the MLYZR in May is not significant, and even when the Niño3 index leads by 2–4 months, the correlation coefficients only marginally reach the 95% significance level. However, the correlation coefficients in the models far exceed the 95% significance level, especially when the Niño3 index leads by 1–2 months. In contrast, the SSTA in the tropical eastern Indian Ocean has a significant synchronous correlation with the rainfall over the MLYZR in May in both the observations and the models.
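As an illustration of the lead correlation analysis summarized above, the following minimal Python sketch correlates a monthly index (such as Niño3) at several lead times with a May rainfall index. The function names, the array layout (years by calendar months), the synthetic data, and the use of a caller-supplied Student-t critical value are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np


def lead_correlation(index_monthly, target, target_month=5, max_lead=4):
    """Pearson correlation between a monthly index and a target-month series.

    index_monthly : array (n_years, 12), hypothetical layout Jan..Dec per year
    target        : array (n_years,), e.g. a May rainfall index
    Returns {lead: r}, where the index leads the target by `lead` months.
    """
    index_monthly = np.asarray(index_monthly, dtype=float)
    target = np.asarray(target, dtype=float)
    result = {}
    for lead in range(max_lead + 1):
        col = target_month - 1 - lead  # zero-based column of the leading month
        if col < 0:
            raise ValueError("lead extends into the previous calendar year")
        result[lead] = float(np.corrcoef(index_monthly[:, col], target)[0, 1])
    return result


def critical_r(n_years, t_crit):
    """Convert a two-sided Student-t critical value (n_years - 2 degrees of
    freedom, supplied by the caller) into the equivalent threshold on |r|."""
    return t_crit / np.sqrt(t_crit ** 2 + n_years - 2)


# Synthetic example only: the "rainfall" is built from the April index column,
# so the correlation at a 1-month lead is 1 by construction.
rng = np.random.default_rng(0)
nino3 = rng.standard_normal((30, 12))
may_rain = 2.0 * nino3[:, 3] + 1.0
r_by_lead = lead_correlation(nino3, may_rain)
```

A coefficient would be judged significant when its absolute value exceeds `critical_r`. This threshold treats each year as an independent sample, which is a simplifying assumption of the sketch.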

Considering the importance of the SSTA in the eastern Indian Ocean to the heavy rainfall over the MLYZR in May, we studied the evolution characteristics of the SSTA. We found that when heavy rainfall occurs over the MLYZR in May, there is always warming from April to May: of the northeastern Indian Ocean (NEIO) in the models and of the northeastern South China Sea (NSCS) in the observations. It is well known that seasonal forecasting models have very high prediction skill for the monthly SSTAs in the tropical oceans; for example, the prediction skill for the Niño3 index and the EIO index can exceed 0.7 with a 3-month lead for all of the individual models. Thus, the skill in forecasting the monthly SSTAs is not the main reason that the models largely capture the rainfall anomalies in the MLYZR with a 1-month lead but fail with a 3-month lead. Instead, the key regions of the April-to-May SSTA change differ between the models and the observations, which explains why the rainfall anomalies in the MLYZR differ between them. The key areas for the observations are mainly located in the NSCS and its adjacent regions, while for the models, the key areas are located around the NEIO and in the southern SCS. The models’ averaged SSTA differences from April to May over the NEIO with a 1-month lead are highly correlated with the observed averaged SSTA differences from April to May over the NSCS, but there is no significant correlation for a 3-month lead. This marked contrast between the correlation coefficients at the two lead times explains why the models can generally forecast the heavy rainfall over the MLYZR with a 1-month lead but fail with a 3-month lead.
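The April-to-May SSTA change used above can be sketched as follows: the May-minus-April SSTA difference is averaged over a box and correlated with a rainfall index. This is a minimal Python sketch under stated assumptions. The function names, the cosine-latitude weighting, the box bounds, and the synthetic fields are hypothetical and are not taken from the study, which does not specify them in this section.

```python
import numpy as np


def box_mean(field, lats, lons, lat_range, lon_range):
    """Cosine-latitude weighted mean of field (n_years, n_lat, n_lon) over a
    latitude/longitude box. Assumes no missing values inside the box."""
    field = np.asarray(field, dtype=float)
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    sub = field[:, lat_mask][:, :, lon_mask]
    weights = np.cos(np.deg2rad(lats[lat_mask]))[None, :, None]
    weights = np.broadcast_to(weights, sub.shape)
    return (sub * weights).sum(axis=(1, 2)) / weights.sum(axis=(1, 2))


def tendency_correlation(ssta_april, ssta_may, lats, lons,
                         lat_range, lon_range, rain_index):
    """Correlate the box-averaged May-minus-April SSTA difference with a
    rainfall index (one value per year)."""
    diff = np.asarray(ssta_may, dtype=float) - np.asarray(ssta_april, dtype=float)
    tendency = box_mean(diff, lats, lons, lat_range, lon_range)
    return float(np.corrcoef(tendency, rain_index)[0, 1])


# Synthetic example only: a spatially uniform warming that varies by year,
# with a rainfall index that is an exact linear function of that warming.
lats = np.arange(-10.0, 31.0, 10.0)
lons = np.arange(80.0, 121.0, 10.0)
rng = np.random.default_rng(1)
yearly_warming = rng.standard_normal(20)
ssta_april = np.zeros((20, lats.size, lons.size))
ssta_may = ssta_april + yearly_warming[:, None, None]
rain_index = 3.0 * yearly_warming + 2.0

# Hypothetical box bounds, chosen only to exercise the function.
r = tendency_correlation(ssta_april, ssta_may, lats, lons,
                         (0.0, 20.0), (90.0, 110.0), rain_index)
```

With real data, the same function could be applied separately to the observed and forecast fields and to different boxes, so that the correlations at each lead time can be compared directly.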

Finally, the Niño3 index and the SSTA differences from April to May over the NEIO in the models are highly correlated with either a 1-month or a 3-month lead, suggesting that, in the models, El Niño events may promote the warming of the NEIO and thereby lead to heavy rainfall over the MLYZR. In the real world, by contrast, this warming is not a direct result of El Niño events, and its causes require further research.