Introduction

Both modeling results and observational data show an increase in the number of extreme precipitation events (Alexander et al. 2006; Beniston et al. 2007; Qian et al. 2007; Wang et al. 2008). This trend is typically attributed to climate change and is expected to intensify as greenhouse gas emissions increase (Easterling et al. 2000; Durman et al. 2001; Allen and Ingram 2002; Field et al. 2012; IPCC-AR5 2013). However, climate model simulations often underestimate the observed increase in heavy rainfall during the last five decades (Allen and Ingram 2002; Wilby and Wigley 2002; Min et al. 2011). This points to causes other than those traditionally represented in climate models and underlines the importance of considering anthropogenic factors (Allan and Soden 2008; Li et al. 2011). A recent study found that the climate models used in IPCC AR5 reasonably capture the temporal trends of extreme precipitation in western China during 1961–2000, but do not adequately reproduce the trends over eastern China, a region characterized by much more intense anthropogenic activity (Shepherd 2005). Studies have also found that convective rainfall has increased more than stratiform precipitation. Given that the former is influenced mostly by local interactions and the latter by planetary-scale circulation (Wang and Zhou 2005; Ou et al. 2013), the increased frequency of heavy rainfall events (Chen et al. 2010; Ou et al. 2013) might be due to human-induced local changes. This highlights the need to explore in more depth the specific roles of a range of anthropogenic processes and their relative contributions to heavy rainfall at regional scales.

Data and methods

Data source

Precipitation data for 659 meteorological stations and annual haze days are from the China Meteorological Administration; 29 large-scale climate factors influencing China's precipitation are from NOAA and the Chinese National Climate Center (Table 1); precipitable water and water vapor flux data are from the NCEP/NCAR and ERA/ECMWF reanalyses; horizontal visibility data for 1957–2005 are from the Chinese Academy of Meteorological Sciences; and 11 national-level socio-economic and environmental indicators (GDP; primary, secondary and tertiary industrial output; construction-sector output; energy consumption; total population; urban population; rural population; urbanization ratio; and annual average haze days), together with county-level population density and horizontal visibility data, are compiled from 60-year Statistics of New China and Prefectural (Municipal) Social and Economic Statistics Summary of China. The resulting 40 candidate factors comprise 29 climatic factors and 11 anthropogenic factors. The former mainly cover regional circulation, and the latter mainly represent human economic and social activities.

Table 1 The 29 large-scale climate factors and 11 socioeconomic factors

Inter-annual and decadal heavy rainfall amounts, heavy rainfall days and heavy rainfall intensity in China are calculated from the precipitation records of the 659 stations, whose configuration remains unchanged throughout the study period. Inverse distance weighting (IDW) interpolation is used to produce spatially continuous raster layers of the heavy rainfall variables.
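As a rough illustration of the IDW step, the sketch below interpolates station values onto arbitrary target points with weights proportional to \(1/d^{p}\). The power \(p = 2\), the use of planar (projected) coordinates and the use of all stations without a search radius are assumptions for illustration; the settings actually used in this study are not stated, and the function name `idw_interpolate` is hypothetical.

```python
import numpy as np


def idw_interpolate(station_xy, station_values, grid_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: each target value is a weighted mean of all
    station values, with weights 1 / d**power (assumed power = 2)."""
    station_xy = np.asarray(station_xy, dtype=float)          # (n_stations, 2)
    station_values = np.asarray(station_values, dtype=float)  # (n_stations,)
    grid_xy = np.asarray(grid_xy, dtype=float)                # (n_cells, 2)

    # Pairwise Euclidean distances between target cells and stations (planar coords)
    d = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=2)

    result = np.empty(len(grid_xy))
    exact = d < eps               # target cells that coincide with a station
    hit = exact.any(axis=1)
    # Cells lying on a station take that station's value directly (avoids 1/0)
    result[hit] = station_values[np.argmax(exact[hit], axis=1)]

    w = 1.0 / d[~hit] ** power    # inverse-distance weights for the other cells
    result[~hit] = (w @ station_values) / w.sum(axis=1)
    return result


vals = idw_interpolate([[0, 0], [2, 0]], [10.0, 20.0], [[1, 0], [0, 0], [5, 5]])
```

A point equidistant from two stations receives their plain average (15.0 in the usage line above); real latitude-longitude station coordinates would first need projecting, or a great-circle distance.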

Methods

Calculation of heavy rainfall

In this research, heavy rainfall is defined as daily rainfall greater than 50 mm. The total heavy rainfall amount \(HRA_{i}\), the number of heavy rainfall days \(HRD_{i}\) and the average heavy rainfall intensity \(HRI_{i}\) at each meteorological station were calculated with Eqs. (1)–(3) for each of the six decades 1951–1960, 1961–1970, 1971–1980, 1981–1990, 1991–2000 and 2001–2010.

$$HRA_{i} = \sum\limits_{j = 1}^{10} hra_{1940 + 10i + j} \tag{1}$$
$$HRD_{i} = \sum\limits_{j = 1}^{10} hrd_{1940 + 10i + j} \tag{2}$$
$$HRI_{i} = \sum\limits_{j = 1}^{10} hra_{1940 + 10i + j} \Bigg/ \sum\limits_{j = 1}^{10} hrd_{1940 + 10i + j} \tag{3}$$

where \(HRA_{i}\) is the total heavy rainfall amount at a meteorological station in the \(i\)th decade of the study period; \(hra_{1940 + 10i + j}\) is the total heavy rainfall amount at the station in the \(j\)th year of the \(i\)th decade; \(HRD_{i}\) is the total number of heavy rainfall days at the station in the \(i\)th decade; \(hrd_{1940 + 10i + j}\) is the number of heavy rainfall days at the station in the \(j\)th year of the \(i\)th decade; \(HRI_{i}\) is the heavy rainfall intensity at the station in the \(i\)th decade; \(i\) is the decade index (\(i = 1, 2, \ldots, 6\)); and \(j\) is the year index within the decade (\(j = 1, 2, \ldots, 10\)). For example, \(i = 1\) and \(j = 1\) correspond to the year 1951.
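A minimal sketch of Eqs. (1)–(3) for a single station, assuming daily records are available as parallel arrays of years and daily precipitation in mm. The function name `decadal_heavy_rain` and this input layout are hypothetical. Summing heavy-rain days directly over a decade gives the same result as first computing the yearly values \(hra\) and \(hrd\) and then summing them over the ten years.

```python
import numpy as np

HEAVY_RAIN_MM = 50.0  # heavy rainfall: daily rainfall greater than 50 mm
BASE_YEAR = 1940      # decade i spans BASE_YEAR + 10i + 1 ... BASE_YEAR + 10i + 10


def decadal_heavy_rain(years, daily_mm, n_decades=6):
    """Return (HRA, HRD, HRI) arrays of length n_decades for one station."""
    years = np.asarray(years)
    daily_mm = np.asarray(daily_mm, dtype=float)
    heavy = daily_mm > HEAVY_RAIN_MM

    hra = np.zeros(n_decades)
    hrd = np.zeros(n_decades, dtype=int)
    for i in range(1, n_decades + 1):
        lo, hi = BASE_YEAR + 10 * i + 1, BASE_YEAR + 10 * i + 10  # i = 1 -> 1951-1960
        sel = heavy & (years >= lo) & (years <= hi)
        hra[i - 1] = daily_mm[sel].sum()  # Eq. (1): total heavy rainfall amount
        hrd[i - 1] = sel.sum()            # Eq. (2): number of heavy rainfall days

    # Eq. (3): intensity = amount / days; set to NaN for decades without heavy rain
    with np.errstate(invalid="ignore", divide="ignore"):
        hri = np.where(hrd > 0, hra / hrd, np.nan)
    return hra, hrd, hri


hra, hrd, hri = decadal_heavy_rain([1951, 1951, 1955, 1962, 2010],
                                   [60.0, 40.0, 80.0, 51.0, 100.0])
```

In the usage line, the 40 mm day falls below the threshold, so the 1950s give HRA = 140 mm, HRD = 2 and HRI = 70 mm per day. Returning NaN for a decade with no heavy rainfall days is a design choice; the text does not say how such decades are handled.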

Model selection and validation

(1) Stepwise regression: We considered 40 factors (the 29 climate factors and 11 anthropogenic factors) as candidate predictors and heavy rainfall (HRA, HRD and HRI) as the target variables. Stationarity and cointegration tests were performed to rule out spurious regression. In each selection step, only variables significant at the 95% level are entered into the regression equation; in each removal step, variables not significant at the 90% level are dropped from the equation (Johansen 1994).

(2) AIC to confirm the optimal model variables: We use the Akaike Information Criterion (AIC) as a model selection criterion, which penalizes models with a large number of predictors, and search for models with small AIC values (Cahill 2003).

(3) Cross-validation to test the robustness and stability of the regression model: To address possible overfitting, we conducted cross-validation by deliberately leaving out up to 33% of the data and using them to verify the model predictions.
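The entry and removal rule in step (1) can be sketched as a bidirectional stepwise OLS selection with \(p < 0.05\) to enter and \(p > 0.10\) to remove. This is an illustrative reimplementation using NumPy and SciPy, not the code used in this study. It omits the stationarity and cointegration tests, and the names `ols_pvalues` and `stepwise` are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """OLS with an intercept; returns two-sided t-test p-values of the slopes."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - k - 1
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    t = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t), dof)[1:]  # drop the intercept


def stepwise(X, y, p_enter=0.05, p_remove=0.10, max_iter=100):
    """Return indices of the selected columns of X."""
    selected = []
    for _ in range(max_iter):
        changed = False
        # Forward step: enter the most significant remaining candidate if p < p_enter
        best_p, best_j = 1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p = ols_pvalues(X[:, selected + [j]], y)[-1]
            if p < best_p:
                best_p, best_j = p, j
        if best_j is not None and best_p < p_enter:
            selected.append(best_j)
            changed = True
        # Backward step: remove the least significant variable if p > p_remove
        if selected:
            pvals = ols_pvalues(X[:, selected], y)
            worst = int(np.argmax(pvals))
            if pvals[worst] > p_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected


# Synthetic example: only columns 0 and 3 drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=60)
sel = stepwise(X, y)
```

A removal threshold looser than the entry threshold (0.10 versus 0.05) makes it less likely that the same variable is repeatedly added and dropped. The `max_iter` cap guards against cycling in any case.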

MLR-based variance explanation rate

An MLR equation was established for the standardized series based on multiple regression theory (Harris 1992; Pedroni and Peter 1999; Mackinnon 2010):

$$Y_{i} = b_{1} X_{1i} + b_{2} X_{2i} + b_{3} X_{3i} + b_{4} X_{4i} + b_{5} X_{5i} + b_{6} X_{6i}$$

where \(i = 1, \ldots, n\), \(n = 60\) years, and \(b_{1}, \ldots, b_{6}\) are the regression coefficients.

Here, \(r_{1}, r_{2}, r_{3}, r_{4}, r_{5}\) and \(r_{6}\) are the correlation coefficients between heavy rainfall and WPSH, ENSO (for HRA and HRD) or AMO (for HRI), AAO, GDP2, UP and HD, respectively. It can be shown that:

$$c^{2} = b_{1} r_{1} + b_{2} r_{2} + b_{3} r_{3} + b_{4} r_{4} + b_{5} r_{5} + b_{6} r_{6}$$

where \(c\) is the multiple correlation coefficient, \(c^{2}\) is the proportion of the variance of heavy rainfall explained by the six factors, and \(b_{1} r_{1}, b_{2} r_{2}, b_{3} r_{3}, b_{4} r_{4}, b_{5} r_{5}\) and \(b_{6} r_{6}\) are the respective contributions of each factor to heavy rainfall in China.
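The identity \(c^{2} = \sum_{k} b_{k} r_{k}\) can be checked numerically. For standardized variables, the OLS coefficients \(b_k\) multiplied by the simple correlations \(r_k\) sum exactly to \(R^{2}\). The sketch below uses NumPy with synthetic data standing in for the six factors; the function name `contribution_decomposition` is hypothetical.

```python
import numpy as np


def contribution_decomposition(X, y):
    """Standardize predictors and target, fit OLS without intercept, and split
    R^2 into per-factor terms b_k * r_k."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    zy = (y - y.mean()) / y.std()
    b, *_ = np.linalg.lstsq(Z, zy, rcond=None)  # standardized coefficients b_k
    r = Z.T @ zy / len(zy)                      # Pearson r between each X_k and y
    contrib = b * r                             # per-factor terms b_k r_k
    r2 = contrib.sum()                          # c^2 = sum_k b_k r_k
    return b, r, contrib, r2


# Synthetic stand-in for 60 years of six factors
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = X @ np.array([0.5, -0.3, 0.2, 0.8, 0.1, 0.4]) + rng.normal(size=60)
b, r, contrib, r2 = contribution_decomposition(X, y)
```

Dividing `contrib` by `r2` gives each factor's share of the explained variance. An individual term \(b_k r_k\) can be negative when predictors are correlated with one another, so such shares need to be read with that caveat.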

Spatial correlation analysis

Spatial correlations are computed between the county-level raster images of HRA, HRD and HRI (generated using IDW) and those of population density (PD) and low visibility days (LVD). The more spatially similar two images are, the higher their spatial correlation (Gao and Deng 2002).
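One simple way to compute such a spatial correlation is a Pearson coefficient over co-registered raster cells that are valid in both images. This is an assumption for illustration; the exact procedure of Gao and Deng (2002) may differ. The function name `spatial_correlation` is hypothetical.

```python
import numpy as np


def spatial_correlation(img_a, img_b):
    """Pearson correlation between two co-registered rasters of equal shape,
    using only cells that are valid (not NaN) in both images."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    valid = ~(np.isnan(a) | np.isnan(b))
    return float(np.corrcoef(a[valid], b[valid])[0, 1])


hra_img = np.array([[1.0, 2.0], [3.0, np.nan]])  # toy HRA raster with a no-data cell
pd_img = np.array([[2.0, 4.0], [6.0, 1.0]])      # toy population-density raster
rho = spatial_correlation(hra_img, pd_img)
```

In the toy example the three shared valid cells are perfectly linearly related, so `rho` is 1.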

Results

Trend in heavy rainfall in China

Since the 1950s, total precipitation in China has shown no obvious trend, whereas both the intensity of heavy rainfall and the area affected by extreme precipitation events have increased (Zhai et al. 2005). HRA, HRD and HRI have increased significantly (Fig. 1): from the 1950s to the 2000s, HRA, HRD and HRI increased by 58.6–68.7, 46.5–60.2 and 7.1–11.5%, respectively. The higher figures are averages over the 659 stations, which tend to overestimate the increase because stations are denser in southeast China, where most of the increase occurred; the lower figures are from the 0.5° gridded data based on Ou et al. (2013), which tend to underestimate it because interpolation smooths the field. The share of heavy rainfall in total rainfall increased from 15.9% to 26.0% (Table 2). This increase shows a clear spatial shift, with high values of HRA (Fig. 2; Table 3) and HRD (Fig. 3; Table 4) moving progressively from the southeast coast to inland China over the last 60 years. Such spatio-temporal features are clearly inconsistent with those of warming temperatures (Yatagai and Yasunari 1994; Shi et al. 2014) and cannot be reasonably explained by the leading atmospheric and oceanic climate factors (Easterling et al. 2000; Liu et al. 2009; Wan et al. 2012). Below we use climate and socio-economic data to investigate whether, and if so to what extent, local and regional anthropogenic processes contributed to the observed trend and pattern.

Fig. 1

Yearly and decadal heavy rainfall amounts, days and intensity in China from 1951 to 2010. a Yearly HRA in China. b Decadal HRA in China. c Yearly HRD in China. d Decadal HRD in China. e Yearly HRI in China. f Decadal HRI in China

Table 2 The share of heavy rainfall in total rainfall in different decades in China
Fig. 2

Spatial distribution pattern of decadal HRA in China from 1951 to 2010

Table 3 Change of station number of decadal HRA in China from 1951 to 2010
Fig. 3

Spatial distribution pattern of decadal HRD in China from 1951 to 2010

Table 4 Change of station number of decadal HRD in China from 1951 to 2010

Temporal analysis: identifying key factors influencing heavy rainfall and their relative contributions

Factors influencing heavy rainfall

Twenty-nine climate factors known to influence East Asian precipitation and 11 socio-economic factors are considered as candidate predictors, with heavy rainfall as the target variable (Table 1). Seven factors are ultimately selected as significantly related to heavy rainfall (0.05 significance level): four climatic factors, namely WPSH (western Pacific Subtropical High), ENSO (El Niño–Southern Oscillation), AMO (Atlantic Multi-decadal Oscillation) and AAO (Antarctic Oscillation), and three socio-economic factors, namely the output value of the secondary industry (GDP2), urban population (UP) and annual average haze days (HD). The four climatic factors are determined by large-scale climate dynamics and are not directly influenced by local human activities. The three socio-economic factors are all closely related to land-use change and air pollution (Bai et al. 2012; Ding and Liu 2014): urban population is an indirect demographic indicator of land-use change, GDP2 is an indirect indicator of air pollution, and HD represents the environmental consequence of land-use change and air pollution and is the factor most closely linked to air quality.

A Pearson correlation analysis shows that all three anthropogenic factors correlate very strongly with heavy rainfall (all significant at the 0.01 level), whereas the climate factors tend to be correlated less strongly and with lower statistical significance (Table 5).

Table 5 Correlation coefficient and variance explained percentage of climate and anthropogenic factors

The influence of water vapor increment on heavy rainfall

Atmospheric precipitable water (PW) and the divergence of water vapor flux (WVF) can affect regional precipitation. We calculated regional total-column PW and the divergence of WVF (surface to 300 hPa) over eastern and central China, where heavy rainfall increased significantly. As shown in Fig. 4, PW and the divergence of WVF increase until the end of the 1980s but decline afterwards; neither matches the trend in heavy rainfall. The spatial distributions of the changes in the two variables between 1970 and 2010 indicate that both PW and WVF decreased in most of the areas where heavy rainfall actually increased. In addition, during the last two decades the proportion of convective HRD in total HRD increased from 81.8 to 86.0%, with a corresponding drop in the proportion of continuous HRD, suggesting that heavy rainfall is increasingly influenced by local conditions rather than by large-scale circulation and moisture fluxes.
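For reference, a common formulation (not necessarily the exact procedure applied to the NCEP/NCAR and ERA/ECMWF data here) integrates specific humidity \(q\), or the moisture flux \(q\mathbf{V}\), over pressure as \(\frac{1}{g}\int q\,dp\), and then takes the horizontal divergence of the integrated flux in spherical coordinates. The sketch assumes pressure levels ordered from the surface to the top (e.g. 1000 to 300 hPa) and a regular latitude-longitude grid; the function names are hypothetical.

```python
import numpy as np

G = 9.80665        # gravitational acceleration (m s^-2)
A_EARTH = 6.371e6  # mean Earth radius (m)


def column_integral(field, p_pa):
    """(1/g) * integral of field over pressure (trapezoidal rule, axis 0).
    p_pa (Pa) must run from the surface (high pressure) to the top (low pressure).
    For specific humidity (kg/kg) this gives precipitable water in kg m^-2."""
    p_pa = np.asarray(p_pa, dtype=float)
    field = np.asarray(field, dtype=float)
    dp = p_pa[:-1] - p_pa[1:]                    # positive layer thicknesses (Pa)
    layer_mean = 0.5 * (field[:-1] + field[1:])  # mean value within each layer
    return np.tensordot(dp, layer_mean, axes=1) / G


def flux_divergence(qu, qv, lat_deg, lon_deg):
    """Horizontal divergence of a vertically integrated vapor flux (kg m^-1 s^-1)
    on a regular lat-lon grid; qu, qv have shape (nlat, nlon). Result in kg m^-2 s^-1."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))
    coslat = np.cos(lat)[:, None]
    dqu_dlon = np.gradient(qu, lon, axis=1)           # d(qu)/d(lambda)
    dqv_dlat = np.gradient(qv * coslat, lat, axis=0)  # d(qv cos(phi))/d(phi)
    return (dqu_dlon + dqv_dlat) / (A_EARTH * coslat)


p = np.array([100000.0, 85000.0, 70000.0, 50000.0, 30000.0])
pw = column_integral(np.full(p.shape, 0.01), p)  # constant q = 10 g/kg
lat, lon = np.linspace(20, 40, 5), np.linspace(100, 120, 5)
div = flux_divergence(np.full((5, 5), 150.0), np.zeros((5, 5)), lat, lon)
```

The vertically integrated fluxes would come from `column_integral(q * u, p)` and `column_integral(q * v, p)`. Positive divergence indicates a net outflow of moisture, and negative values indicate convergence. A uniform zonal flux, as in the usage lines, has zero divergence.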

Fig. 4

Atmospheric precipitable water vapor and divergence of the water vapor flux (Div-WVF) (surface to 300 hPa) in central and east China: a Spatial distribution of changes in annual mean precipitable water vapor from NCEP/NCAR between 1971 and 2010 [kg/(m²·a)]; b Annual mean precipitable water (PW) vapor in central and east China (red rectangular area in Fig. 4a) from 1971 to 2010; c Spatial distribution of changes in annual mean divergence of the water vapor flux from NCEP/NCAR between 1971 and 2010 [kg/(msa)]; d Annual mean divergence of the water vapor flux in central and east China (red rectangular area in Fig. 4c) from 1971 to 2010

Quantifying relative contributions

To estimate the relative contributions of these seven factors to the observed increase in heavy rainfall, we performed multiple linear regression. The selected factors collectively explained 85.8, 84.7 and 87.5% of the total variance of HRA, HRD and HRI, respectively. Anthropogenic factors are the main contributors: each contributes a similar amount, and together they account for 71.7, 69.0 and 75.0% of the explained variance, whereas the climate factors account for only 28.3, 31.0 and 25.1% (Table 5). Each of the three anthropogenic factors thus contributes roughly as much as all the climate factors combined, and together the anthropogenic factors contribute about two to three times as much of the explained variance as the climate factors.
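The ratio of the anthropogenic share to the climate share of the explained variance follows directly from the percentages reported above. A quick check:

```python
# Shares of the explained variance (percent): (anthropogenic, climate)
shares = {"HRA": (71.7, 28.3), "HRD": (69.0, 31.0), "HRI": (75.0, 25.1)}

# Ratio of anthropogenic to climate contributions for each heavy rainfall variable
ratios = {name: round(anthro / climate, 2) for name, (anthro, climate) in shares.items()}
print(ratios)  # {'HRA': 2.53, 'HRD': 2.23, 'HRI': 2.99}
```

The ratio ranges from about 2.2 (HRD) to 3.0 (HRI).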

Robustness of the results

The robustness of our statistical model was tested through four different analyses. First, to evaluate the influence of possible lag effects of factors not included in our model, we performed power spectrum analysis to obtain the possible lag times of all input variables and conducted an analysis of the unexplained residuals. The lagged factors explain only 6.9, 6.3 and 5.3% of the residual variance of HRA, HRD and HRI, respectively (Table 6), which is very small compared with the variance explained by our model. This indicates that the factors selected through stepwise regression are robust, despite the limitations of the method. Second, using a different heavy rainfall threshold (the 95th percentile) gave consistent results. Third, we used the Akaike Information Criterion (AIC), which penalizes models with a large number of predictors, as a model selection criterion (Table 7). Finally, we performed a cross-validation analysis leaving out up to 33% of the data (Table 8). Both the AIC and the cross-validation results show a high level of stability and robustness of our model.
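The AIC and leave-out cross-validation checks can be sketched as follows. The sketch assumes ordinary least squares with Gaussian errors (AIC \(= n\ln(\mathrm{RSS}/n) + 2k\), up to an additive constant), an exhaustive search over predictor subsets, and a random hold-out of about 33% of the years repeated many times. These choices and all function names are illustrative assumptions, not necessarily the procedure behind Tables 7 and 8.

```python
import itertools

import numpy as np


def ols_fit(X, y):
    """OLS with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def ols_predict(beta, X):
    return beta[0] + X @ beta[1:]


def aic(X, y):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2k (k = slopes + intercept)."""
    n = len(y)
    rss = np.sum((y - ols_predict(ols_fit(X, y), X)) ** 2)
    return n * np.log(rss / n) + 2 * (X.shape[1] + 1)


def best_subset_by_aic(X, y):
    """Exhaustive search over all non-empty predictor subsets; lowest AIC wins."""
    best_score, best_cols = np.inf, ()
    for size in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            score = aic(X[:, list(cols)], y)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols


def holdout_cv(X, y, leave_out=0.33, n_repeats=50, seed=0):
    """Fit on a random (1 - leave_out) share of the years; return the correlation
    between predicted and observed values on the held-out years, per repeat."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(2, int(round(leave_out * n)))
    scores = []
    for _ in range(n_repeats):
        test = rng.choice(n, size=n_test, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        beta = ols_fit(X[train], y[train])
        scores.append(np.corrcoef(ols_predict(beta, X[test]), y[test])[0, 1])
    return np.array(scores)


rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.3, size=60)
score, cols = best_subset_by_aic(X, y)
cv_r = holdout_cv(X[:, list(cols)], y)
```

An exhaustive search is only practical for a small candidate set, such as the factors retained by stepwise regression, not all 40 candidates. Random hold-out also ignores serial correlation in annual series; holding out contiguous blocks of years would be a stricter alternative.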

Table 6 Percentage of the residual variance of HRA, HRD and HRI explained by lagged climate and anthropogenic factors
Table 7 Akaike information criterion (AIC) of MLR
Table 8 Cross validation correlation coefficient of regression

To further illustrate the relative importance of climatic and anthropogenic factors in increasing heavy rainfall, we produced normalized HRA, HRD and HRI. We also generated four integrated factors (an integrated heavy rainfall index, an integrated climatic indicator, an integrated anthropogenic indicator and an integrated natural-anthropogenic indicator) by normalizing the individual factors and combining them with each factor's variance explanation rate as its weight, and plotted scatter diagrams of these factors against the normalized heavy rainfall factors (Fig. 5). In all cases, the integrated anthropogenic and natural-anthropogenic indicators show trends much more synchronized with the normalized and integrated heavy rainfall factors, with R² values of the fitted curves typically around 0.90. In contrast, the R² values of the fitted curves for the integrated climatic indicator are typically around 0.40. These findings reinforce that anthropogenic factors explain much more of the documented increase in heavy rainfall in China.
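A sketch of how an integrated indicator could be built and compared with a heavy rainfall series. Min-max normalization, weights rescaled to sum to one, and a straight-line fit for R² are all assumptions, because the normalization method and the form of the fitted curves are not specified. The function names are hypothetical.

```python
import numpy as np


def minmax(x):
    """Rescale a series to [0, 1] (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def integrated_indicator(factors, weights):
    """Weighted sum of min-max normalized factors.
    factors: (n_years, n_factors); weights: variance explanation rate per factor."""
    w = np.asarray(weights, dtype=float)
    Z = np.column_stack([minmax(col) for col in np.asarray(factors, dtype=float).T])
    return Z @ (w / w.sum())


def r_squared_linear(x, y):
    """R^2 of a least-squares straight-line fit of y on x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


t = np.arange(60, dtype=float)  # toy 60-year record
ind = integrated_indicator(np.column_stack([t, 2 * t + 5]), [0.3, 0.1])
fit_r2 = r_squared_linear(ind, 3 * ind + 1)
```

In the toy example both factors normalize to the same series `t / 59`, so the weighted indicator equals it and a perfectly linear target gives R² = 1.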

Fig. 5

Correlation between integrated climatic and anthropogenic factors and normalized heavy rainfall factors over time. a Normalized HRA. b Normalized HRD. c Normalized HRI. d Integrated and normalized heavy rainfall index

Spatial correlation between anthropogenic factors and heavy rainfall

If anthropogenic processes indeed contributed more to the increasing trend in heavy rainfall, then the changing spatial pattern of these processes should be related to the shifting spatial pattern of the heavy rainfall factors. We tested this through a spatial correlation analysis between county-level socio-economic data and heavy rainfall indicators. Because long-term data at this fine resolution are limited, we used county-level population density (PD) to represent the spatial distribution of urbanization, and the annual average number of days with visibility below 10 km (LVD) as a proxy for HD, given that LVD can be affected by air pollution and that HD and LVD are highly and significantly correlated (r = 0.79, p < 0.01). The meteorological station data were interpolated to 1-km resolution images, from which prefecture-level mean values were derived. Figure 6 shows statistically significant high correlations between county-level heavy rainfall and both urbanization and air pollution, over time and across space, with r increasing steadily over time (Table 9). Urbanization in China was relatively stable until the late 1970s and then accelerated in scale and magnitude during the last three decades (Bai et al. 2014). This coincides with the increase in r, further supporting the view that land-use change resulting from urbanization, together with the associated air pollution, has indeed played a major role in the increase of heavy rainfall. Our tests show that this result is not affected by potential spatial autocorrelation of the variables.
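The step of averaging the 1-km interpolated images over administrative units can be sketched as a zonal mean. The sketch assumes a label raster that assigns each cell an integer unit ID from 0 to n−1, with no-data cells stored as NaN in the value raster. The function name and inputs are hypothetical.

```python
import numpy as np


def zonal_mean(raster, zone_ids, n_zones):
    """Mean raster value per administrative unit.
    raster: 2-D float array (e.g. an interpolated HRA image), NaN = no data.
    zone_ids: integer array of the same shape with unit IDs 0..n_zones-1."""
    vals = np.asarray(raster, dtype=float).ravel()
    ids = np.asarray(zone_ids).ravel()
    ok = ~np.isnan(vals)
    sums = np.bincount(ids[ok], weights=vals[ok], minlength=n_zones)
    counts = np.bincount(ids[ok], minlength=n_zones)
    # Units with no valid cells get NaN instead of a division by zero
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


img = np.array([[1.0, 3.0], [5.0, np.nan]])
units = np.array([[0, 0], [1, 1]])
means = zonal_mean(img, units, 3)
```

The resulting per-unit means can then be correlated with unit-level PD or LVD values.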

Fig. 6

Spatio-temporal correlation between county level annual average heavy rainfall (AAHRA, AAHRD, AAHRI) and population density (PD) and annual average low visibility days (AALVD). Data include 6 time sections for 2618 counties. Population density data are for 1953, 1964, 1982, 1990, 2000 and 2010; decadal horizontal visibility data are derived from data during 1957–2005; asterisk means correlation is significant at the 0.01 level

Table 9 Spatial correlation coefficient between county level heavy rainfall and PD & LVD

Conclusion

All the results of our analyses point to the same conclusion: the decadal increases in heavy rainfall in China during 1951–2010, and its shifting spatio-temporal patterns, are likely caused primarily by large-scale and rapid urbanization and industrialization. A likely explanation is the climatic impact of land-use change triggered by urbanization: land–atmosphere interactions are known to affect both temperature (Seneviratne et al. 2006; Sun et al. 2014) and precipitation (Lowry 1998; Thielen et al. 2000; Li et al. 2011; Kaufmann et al. 2007). China has been urbanizing rapidly over the last three decades (Bai et al. 2014), driven by economic growth (Lambin and Patrick 2011; Bai et al. 2012). Urban land use typically means more paved surfaces, less vegetation and taller buildings, which can cause more convective rainstorms (Wan et al. 2012; Han et al. 2014; Jin et al. 2015). Moreover, industrialization in China is concurrent with urbanization, with most secondary industrial activities concentrated in cities and towns. Emissions from industrial activities, heating demand and the rapidly growing use of private cars in cities have led to a significant increase in hazy days (Ding and Liu 2014); this air pollution in turn suppresses light rainfall and may enhance strong convective rainstorms (Mölders and Olson 2004; Li et al. 2011).

Previous studies linking urbanization to rainfall have mostly focused on total rainfall and mostly considered only the local or city scale (Rosenfeld 2000; Ramanathan et al. 2001; Kaufmann et al. 2007; Alexander et al. 2013). Kishtawal et al. (2010) identified urbanization as a likely cause of increased heavy rainfall in India over five decades. Our results support this finding but also show that urbanization is only one of the factors: air pollution contributes on a comparable scale. To our knowledge, our analysis is the first to statistically establish urbanization and air pollution as the likely primary causes of a national- or sub-continental-scale increase in heavy rainfall over decades, and to quantify the relative contributions of anthropogenic and climate factors.

Our findings indicate that local anthropogenic processes may shift the regional climate through mechanisms other than GHG emissions. The physical mechanism behind this statistically robust connection needs to be better understood, and socio-economic and human dimensions need to be better represented in climate models. With cities in China increasingly experiencing extreme rainfall events (Li et al. 2012), compounded by increasing extreme summer heat in the same region (Sun et al. 2014), our findings call for a careful reevaluation of the risks of extreme weather when formulating national policies on urbanization, industrialization and environmental management, in China and elsewhere. Rapidly growing and industrializing cities and nations will need to control air pollution better, and to anticipate and accommodate these regional climate consequences, if they are to reduce the risk of flooding and waterlogging.