1 Introduction

Drought-affected areas have remarkably increased over the past 50 years in China as a result of variation in precipitation (P) and temperature (Yang et al. 2013; Leng et al. 2015; Chang et al. 2016). In fact, a noticeable extreme winter-spring droughts events occurred in southwest China from August 2009 to May 2010 (Lu et al. 2011; Lu et al. 2012; Zhang et al. 2012; Yang et al. 2013), while in 2011 the middle and lower reaches of the Yellow River were impacted by the spring-summer drought (Lu et al. 2013; Zhao et al. 2013; Chang et al. 2016). Droughts have caused greatest damages in China during 1949–1995, and many damages have resulted to economic losses more than the US $12 billion (Xu et al., 2015a; Qin et al. 2015; Chang et al. 2016). Consequently, droughts are of prominent concern in the outlining and control of water resources. Studying meteorological droughts is necessary by itself, and also because they act as antecedents to the longer-lasting and more significant agricultural and hydrological droughts (Haslinger et al., 2014, Wilhite et al., 2014).

The popular method of assessing drought risk requires the calculation of the probability that a specific value of drought variable will be exceeded, which is equal to the evaluation of the recurrence interval (Prohaska et al. 2008; Chen et al. 2012). This procedure is normally focused on the univariate frequency analysis. In this analysis, the authors have concentrated on some characteristic of drought and assume that the other variables are constant. A direct attempt to simplify the multi-dimensional character of droughts has been investigated by several authors (Shiau, 2006; Ganguli and Reddy, 2014; Huang et al., 2014a). Although the simplistic analysis produces practical results, it does not strictly represent the actual mathematical interactions between the various dimensions of the phenomenon and obviously cannot give accurate results. Since drought variables are usually reliant on one other, a proper frequency analysis of droughts should consider such dependencies within a suitable multivariate framework.

It should be mentioned that the attention of researchers in multivariate modeling has increased considerably in the last decades, due to the application of copulas. The comprehensive theoretical structure proposed by Sklar (1959) can be gathered from Nelsen (2006), Joe (1997) and Salvadori et al. (2007). There have been diverse cases of copula applications in the setting of drought management (Serinaldi et al., 2009; Shiau and Modarres, 2009; Kao and Govindaraju, 2010; Wong et al., 2010; Song and Singh, 2010; Mishra and Singh, 2010; Mirabbasi et al., 2012). This includes return-period estimation (Salvadori and De Michele, 2010), multivariate simulation (AghaKouchak et al. 2010), propose a new drought indicator (Kao and Govindaraju, 2010), and many other theoretical studies of multivariate severe issues (Salvadori and De Michele, 2015; Zhang et al. 2015a; She et al. 2016). There are several types of copula functions, which have been described (e.g. Nelsen, 1999). The regularly applied copula for hydrological analysis belongs to four classes which includes: Archimedean class (AMH, Frank, Clayton, Gumbel, and Joe), elliptical class (student t and normal), extreme value class (Gumbel, Galambos, Tawn, Husler-Reiss, and t-EV), and miscellaneous class (Farlie–Gumbel–Morgenstern and Plackett) (Shiau and Modarres, 2009; Mirabbasi et al. 2012; Lee et al. 2013; Xu et al., 2015b). The Archimedean class of copulas are characterized by their simple structure and strong representativeness (Huang et al., 2014a; Tsakiris et al., 2016).

There are many applications of two-variate copula-based drought studies in many countries. For instance in China, Shiau (2006) built a joint distribution between drought severity and duration in Southern Taiwan while Shiau and Modarres (2009) employed Clayton copula to model drought severity and duration frequency curves for two climatic regions in Iran. Tsakiris et al. (2016) analyzed drought severity and areal extent utilizing Gumbel-Hougaard copula in Greece. The Frank and Gumbel–Hougaard have been considered the best copulas for illustrating drought occurrence frequency in Canada and Iran, respectively (Lee et al. 2013). Meta-Gaussian copula was applied for the joint modeling of drought variables within Texas (Song and Singh, 2010) while drought duration and severity of Sharafkhaneh in the northwest of Iran was modeled using Galambos copula (Mirakbari et al., 2010). In each of the cases, some bivariate probabilistic properties of droughts were investigated.

Three-variate Archimedean copulas have been employed for the joint distribution of rainfall, drought and flood events (Grimaldi and Serinaldi, 2006; Zhang and Singh 2007). Using asymmetric Archimedean copulas, Serinaldi and Grimaldi (2007) described an inference procedure to carry out a three-variate frequency analysis. Ma et al. (2013) created a joint distribution of drought duration, severity, and peak using elliptical, asymmetric and symmetric Archimedean copulas. Because of the flexibility of meta-elliptical copula (Fan et al. 2016), many meta-elliptical, Clayton, Frank, Gumbel-Hougaard, and Ali-MikhailHaq copulas have been employed to create a joint distribution of drought severity, duration, and interval time, and the best copula was chosen (Song and Singh, 2010).Three-variate Plackett copulas have been adopted for the investigations of extreme rainfall cases (Kao and Govindaraju, 2008) and for modeling a joint distribution of drought duration, severity and inter-arrival time (Song and Singh, 2010). Student’s t copula has been utilized to identify three-variate drought events under the control of La-Nina, EL-Nino, and natural climate state in New South Wales, Australia (Wong et al. 2010). Other studies on drought frequency analysis using copula includes Huang et al. (2014a, b), Zhao et al. (2015), Zhang et al. (2015b), Zhang et al. (2017) among others.

Crucial hurdles remain not only in drought characterization but also in the interpretation of drought variables to information relevant for regional monitoring, early warning, and water resources planning. Assuming that the territorial unit affected by drought is the entire region, there are two variables which together characterize each drought event; the severity and the duration. If the peak and interarrival time of drought are considered as an additional variable, a four-dimensional copulas approach may be used in which duration, severity, peak and interarrival time of droughts can be jointly analyzed. Recently, the applications of four-variate copula have been reported. De Michele et al. (2007) proposed a procedure towards building four-variate distributions, stating two copulas for a separate two-variate candidate case and utilizing the approach to give a four-variate nature of sea state.

Employing a four-variate student copula, Serinaldi et al. (2009) built the corresponding joint distributions of drought length, mean, minimum SPI values, and drought mean areal extent to investigate their joint probabilistic characteristics. So far, most of the research focused on parametric two-variate and symmetric three-variate copulas (Ganguli and Reddy, 2014), but four-variate copulas are somewhat limited because their constructions are very complicated. Although there have been some attempts to construct multivariate extensions of two-variate Archimedean copula (Embrechts et al., 2003; Whelan, 2004; Savu and Trede, 2010), however such studies seldom compares the performances of four-variate symmetric and asymmetric copulas in different climatic regions of China, considering probability and return periods of drought events. Further, knowing that droughts are multiplex phenomena, two- and three-variate analysis cannot give an exhaustive appraisal of droughts because the investigation could lead to a deficient drought probabilistic estimation (De Michele et al. 2005; Grimaldi and Serinaldi, 2006; Chebana and Ouarada, 2011; Ganguli and Reddy, 2014; Xu et al., 2015a; Tsakiris et al. 2016).

Given the challenges as mentioned above in drought characterization, a focused research is required to comprehensively analyze and delineate drought trends and spatiotemporal patterns more flexibly and intuitively. It is, therefore, the objective of this paper to produce a methodology using the four-variate copulas approach for analyzing the multivariate drought frequency of droughts. In particular, we built a joint distribution utilizing drought duration (Dd), severity (Ds), peak (Dp) and inter-arrival time (Di), based on four-variate symmetric (Clayton, Frank, and Gumbel) and asymmetric (Clayton, Frank, and Gumbel) Archimedean Copulas. We showed how it could be used (i) to observe the temporal evolution of a drought event, and (ii) to perform a real-time appraisal of drought in seven climatic regions of mainland China over 1961–2013. The risks of drought events have been described based on joint probability and return periods, which has become a conventional method for a risk-based plan of water resources structures.

The paper is prepared as follows. Section 2 describes details about the study area and data, Standardized Precipitation Evapotranspiration Index (SPEI) calculation and estimation of Dd, Ds, Dp, and Di. Section 3 introduces the four-variate symmetric and asymmetric Archimedean copula. Section 4 compares the performance of the copulas. Section 5 selects the best-fitted copula to build a joint distribution of Dd, Ds, Dp, and Di in seven climatic regions of China. Section 6 contains some concluding remarks.

2 Study Region and Data

Geographically, China lies between 150N - 600N and 75°E - 135°E. By its geographical extent, China is endowed with diverse landforms that include hills, mountains, high plateaus and deserts in the western reaches, while in the central and east areas, the land slopes into broad plains and deltas. From the higher elevations in the west, thousands of rivers drain the country eastwards; the most significant are the Yangtze (i.e., the longest river in China and third longest in the world), Heilong (Amur), Mekong, Pearl, and Yellow river. The climate of China fluctuates significantly from one region to another because of its tremendous space, complex topography as well as variation in monthly P and temperature (Zhai et al. 2010; Wu et al. 2011). The northern China and western are influenced by dry climate while humid and semi-humid predominates over eastern China (Wu et al. 2011).

Based on topography and climate, China is divided into seven climatic regions (Zhao, 1983; Ayantobo et al. 2017). Northeast humid/semi-humid warm region (NE, 72 stations), North China humid/semi-humid temperate zone (NC, 104 stations), Central and Southern China humid subtropical zone (CS, 165 stations), South China humid tropical zone (SC, 57 stations), Inner Mongolia steppe zone (IM, 44 stations), Northwest desert areas (NW, 61 stations), and Qinghai-Tibet Plateau (TP, 49 stations) (Fig. 1). Regions NE, NC, CS, SC are associated with Eastern Monsoon. NW is an arid region, and the replenishment of water in this region is mostly from melting glacial and perennial frozen soil, not from P.

Fig. 1
figure 1

Hydrometerological map of China showing the climatic regions, major rivers and national weather stations

The dataset employed in this research to appraise drought events are daily and monthly weather data from 552 national basic meteorological stations in mainland China from 1961 to 2013. These 53-year records were obtained from the China Meteorological Data Sharing Network. From Fig. 1, the meteorological stations are not uniformly spread as more are situated within the east than western regions particularly the Qinghai-Tibet Plateau. The non-parametric tests were used to cross-check the reliability and quality of the climatic data. According to Helsel and Hirsch (1992), Kendall autocorrelation test, Mann-Kendall trend test, and Mann-Whitney homogeneity tests for mean and variance were employed to test randomness, homogeneity, and absence of trends. The results showed that randomness and stationarity of the weather data were fixed between the critical points (5% statistical significance level).

3 Methods

3.1 Drought Indices and Drought Identification

Several indices have been suggested to identify and observe droughts, and there is a detailed article on modeling drought events utilizing these indices (Mishra and Singh, 2010; Zargar et al. 2011). Indices based exclusively on precipitation, do not consider the complexity of land surface processes and cannot directly account for the impacts of evapotranspiration on soil moisture. This may be a particularly serious disadvantage under warming conditions or other changes in regional hydrology. Based on these considerations, since SPEI have been widely applied in various copula-based drought-related studies in China (Liu et al. 2015; Xu et al., 2015b; Wang et al. 2015; Zhang et al. 2015b; Hao et al. 2017), it is therefore employed for this study. The SPEI is a multi-scalar drought index, recommended as an enhancement of the SPI (Vicente-Serrano et al. 2010). Firstly, in SPEI estimation, the simple water balance is calculated, which is described as a deviation in P and potential evapotranspiration (PET) for ith month as follows:

$$ {\mathrm{D}}_{\mathrm{i}}={\mathrm{P}}_{\mathrm{i}}-{\mathrm{P}\mathrm{ET}}_{\mathrm{i}} $$
(1)

PET is estimated according to the FAO-56 Penman-Monteith equation (Allen et al. 1998). The World Meteorological Organization commonly approves this approach, and the calculation is based on the minimum and maximum air temperature, wind speed, and sunshine duration. The Di values are synthesized in each time scale as:

$$ {\mathrm{D}}_{\mathrm{n}}^{\mathrm{k}}={\sum}_{\mathrm{i}=0}^{\mathrm{k}-1}\left({\mathrm{P}}_{\mathrm{n}-1}-{\mathrm{P}\mathrm{ET}}_{\mathrm{n}-1}\right) $$
(2)

where k is the time scales of synthesis, and n is the month used for calculation.

Next, Di is fitted with the three-parameter log-logistic distribution. Finally, the SPEI is obtained as standardized values, and details of the estimation can be found in Vicente-Serrano et al. (2010). The mean value of the SPEI is 0, and the standard deviation is 1. In this study, SPEI at the 6-month timescale, denoted by SPEI-6 is estimated for each station and seven regions. SPEI-6 is most useful for describing the shallow soil moisture available to crops (Reddy and Singh, 2014; Abdi et al. 2016).

Using the theory of runs (Fig. 2), drought identification is achieved by considering drought years as a continuous period where SPEI-6 is below 0 (Shiau, 2006). Next, significant drought variables which include Dd, Ds, Dp, and Di, are extracted. The definitions of Dd, Ds, and Dp, can be found in Shiau (2006), Mishra and Singh (2010) and Ayantobo et al. (2017) while Di is described as the interval between the commencement of a drought to the start of the subsequent drought (Song and Singh, 2010). Water manager needs to be aware of the risk of having drought scenarios of different Dd, Ds, Dp, and Di during drought scenarios. Therefore, a four-variate distribution must be constructed using copula functions.

Fig. 2
figure 2

Definition sketch of drought characteristics showing three drought events (labeled as 1, 2 and 3), on the basis of Run Theory. Note: Xo; Truncation level, Dd; Drought duration, Ds; Drought severity, Dp; Drought peak, Ld; Inter arrival time, Dn; Non-drought duration, ti; initiation time, te; termination time

3.2 Marginal Distribution of Drought Variable

The univariate distribution forms the framework of four-variate analysis using copulas. The marginal distributions for Dd, Ds, and Dp for different climatic regions across mainland China have been documented (Ayantobo et al. 2017). In the case of Di, ten marginal distributions which include exponential, two-parameter exponential, Weibull, three-parameter Weibull, generalized extreme value, inversion Gaussian, three-parameter inversion Gaussian, generalized Pareto, gamma, and three-parameter gamma are compared to choose the best distribution to fit Di.

The Kolmogorov–Smirnov (KS) and Anderson-darling (AD) tests are employed as a goodness-of-fit test, and the principle that the threshold value should be as small as possible to preserve the largest sample is employed (Shiau and Modarres, 2009; Song and Singh, 2010; Abdi et al. 2016). The fitted distributions of Di are compared against the empirical non-exceedance probabilities estimated utilizing Gringorten’s plotting position function as follows (Gringorten, 1963; Cunnane, 1978; Song and Singh, 2010):

$$ \widehat{F}\left({x}_i\right)=\frac{1}{n+1}{\sum}_{m=1}^i{N}_m $$
(3)

Here, n is sample size, Nm is the amount of xi regarded as xjxi, i = 1, …, n, 1≤ji. Utilizing the observed data (x), the marginal parameters are calculated employing the maximum likelihood estimation.

3.3 Empirical Four-Variate Distribution of Drought Variables

The empirical four-variate joint probability distribution of Dd, Ds, Dp, and Di can be modified from Song and Singh (2010):

$$ \widehat{F}\left({x}_i,{y}_i,{z}_i,{k}_i\right)=\frac{1}{n+1}{\sum}_{m=1}^i{\sum}_{n=1}^i{\sum}_{l=1}^i{\sum}_{p=1}^i{N}_{mnlp} $$
(4)

Here, n is size of the sample. Nmnlp is amount of (xi, yi, zi, ki) regarded as xj ≤ xi, yj ≤ yi, zj ≤ zi, and kj ≤ ki, i = 1, …, n, 1≤ji.

3.4 Joint Cumulative Probability Distribution of Drought Variables

Assuming four correlated random variables of Dd, Ds, Dp, and Di are represented using X1, X2, X3, and X4; and \( {u}_1={F}_{x_1}\left({x}_1\right) \), \( {u}_2={F}_{x_2}\left({x}_2\right) \), \( {u}_3={F}_{x_3}\left({x}_3\right) \), and \( {u}_4={F}_{x_4}\left({x}_4\right) \) denote their cumulative distribution function (CDF), respectively. Therefore, their joint CDF could be represented as (Song and Singh, 2010):

$$ {\displaystyle \begin{array}{l}F\left({x}_1,{x}_2,{x}_3,{x}_4\right)=P\left({X}_1\le {x}_1,{X}_2\le {x}_2,{X}_3\le {x}_3,{X}_4\le {x}_4\right)\\ {}={\int}_{-\infty}^{x_4}{\int}_{-\infty}^{x_3}{\int}_{-\infty}^{x_2}{\int}_{-\infty}^{x_1}f\left(x,y,z\right)\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}k\end{array}} $$
(5)

3.5 Modeling Four-Variate Drought Variables Using Copulas

For an explicit hydrological application of copulas, readers are referred to Joe (1997), Nelsen (2006), Genest and Favre (2007) and Salvadori et al. (2007). By way of definition, copulas are functions that combines marginal CDF’s to form multivariate CDF. This function allows one to characterize dependence properties more flexibly by separating effects of dependence from effects of margins. Assuming F1, 2, …. , n(x1, x2, …., xn) is many variables CDF of n associated arbitrary variables of X1, X2, . . . ., Xn with the corresponding candidate CDF F1(x1), F2(x2), …. , Fn(xn), next the n-dimensional CDF including univariate distributions of F1(x1), F2(x2), …. , Fn(xn) could be written as shown below (Sklar 1959; Nelsen, 2006; Ganguli and Reddy 2014):

$$ H\left({x}_1,{x}_2,\dots {x}_n\right)=C\left\{{F}_1\left({x}_1\right),{F}_2\left({x}_2\right),\dots .,{F}_n\left({x}_n\right)\right\}=C\left({u}_1,\dots \dots {u}_n\right) $$
(6)

Here, C is a d-dimensional copula in the form: [0, 1] d → [0, 1], with association parameter θ; Fk(xk) = uk for k = 1, . . ., n; Uk~U(0, 1).

Archimedean copulas have been utilized because the functions can be conveniently built, and can capture a wide range of dependence structures with many satisfactory properties. According to Joe (1997), Nelsen (1999), Embrechts et al. (2003), Whelan (2004), and Grimaldi and Serinaldi (2006) the generalization of n-dimensional Archimedean copula could be represented as (Chen et al. 2012):

$$ {\displaystyle \begin{array}{l}C\left({u}_1,{u}_2,\dots .,{u}_n\right)={C}_1\left({u}_n,{C}_2\left({u}_{n-1},\dots .,{C}_{n-1}\left({u}_2,{u}_1\right)\dots .\right)\right)\\ {}={\varphi}_1^{\left[-1\right]}\left({\varphi}_1\left({u}_n\right)+{\varphi}_1\left({\varphi}_2^{\left[-1\right]}\left({\varphi}_2\left({u}_{n-1}\right)+\cdots +{\varphi}_{n-1}^{\left[-1\right]}\left({\varphi}_{n-1}\left({u}_2\right)+{\varphi}_{n-1}\left({u}_1\right)\right)\dots \right)\right)\right)\end{array}} $$
(7)

Here, φ is the copula generator. As observed, \( {\varphi}_1,\dots, {\varphi}_{n^{-1}}=\varphi, \) Eq. (7) becomes Archimedean n-copula and can be rewritten as (Chen et al. 2012):

$$ C\left(\mathbf{u}\right)=\left[{\varphi}^{-1}{\sum}_{k=1}^n\varphi \left({u}_k\right)\right] $$
(8)

Many different parametric families of copulas are commonly used in dependence analysis. The two most frequently used are elliptical copulas and the Archimedean copulas. In this study, we consider three families of four-variate symmetric and asymmetric Archimedean copulas, paying particular attention to Clayton, Frank, and Gumbel (Table 1), because they are simple and easy to generate (Nelsen, 2006; Chen et al. 2012). These copulas are compared to ascertain the excellent copula for modeling drought variables across various climatic regions of mainland China.

Table 1 Description of the four-variate symmetric and asymmetric Archimedean copula

3.6 Copula Parameters Estimation

The parameters of Clayton, Frank and Gumbel copulas are calculated by employing the curve fitting method (CFM). The CFM is the minimum standards of squared residuals and determined as follows (Ganguli and Reddy, 2014):

$$ \mathrm{CFM}=\sqrt{\frac{1}{\mathrm{n}}{\sum}_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{X}}_{\mathrm{c}}(i)-{\mathrm{X}}_0(i)\right)}^2} $$
(9)

Where n, xc(i), x0(i)are sample size, ith calculated value, ith observed value respectively.

3.7 Selection of Appropriate Copula Function

The goodness-of-fit assessment indices (i.e., Root mean square error (RMSE), Akaike information criterion (AIC) and BIAS) are employed to select the best-fitted copula function (Akaike, 1974; Zhang and Singh, 2006, 2007; Genest and Favre, 2007).

$$ \mathrm{RMSE}=\sqrt{\mathrm{E}\left[{\left({\mathrm{x}}_{\mathrm{c}}-{\mathrm{x}}_0\right)}^2\right]}={\left\{\frac{1}{\mathrm{n}-\mathrm{k}}{\sum}_{\mathrm{i}=1}^{\mathrm{n}}{\left[{\mathrm{x}}_{\mathrm{c}}\left(\mathrm{i}\right)-{\mathrm{x}}_0\left(\mathrm{i}\right)\right]}^2\right\}}^{0.5} $$
(10)
$$ \mathrm{BIAS}={\sum}_{i=1}^n\frac{X_c(i)-{X}_0(i)}{X_c} $$
(11)
$$ \mathrm{AIC}=\mathrm{nlog}\left(\mathrm{MSE}\right)+2\mathrm{k} $$
(12)

Where E [] = Mean square error (MSE) is the expectation operator; n, xc(i),x0(i)are sample size, ith calculated value, ith observed value respectively and k is the number of parameters utilized in getting the computed value. The copula with least values of AIC and RMSE is selected as the best-fitted fuction.

Also, using the observed data (u), the empirical copula,Cn(u), and fitted parametric copula, Cp(u), are compared to select the best-fitted function. ConstructingCn(u) requires: (1) ranking Dd, Ds, Dp and Di variables considering their pseudo-observations (Ui,1, Ui,2, Ui,3, Ui,4), (2) the empirical CDFs are constructed based on the ranks and (3) the empirical four-variate copula are calculated using the empirical CDFs (Nelsen, 2006; Genest and Favre, 2007):

$$ {\displaystyle \begin{array}{l}{C}_n(u)=\frac{1}{n}{\sum}_{i=1}^nI\left\{{\widehat{U}}_{i,1}\le {u}_1,{\widehat{U}}_{i,2}\le {u}_2,{\widehat{U}}_{i,3}\le {u}_3,{\widehat{U}}_{i,4}\le {u}_4\right\}\\ {}=\frac{1}{n}{\sum}_{i=1}^nI\left\{\frac{R_{i,1}}{n+1}\le {u}_1,\frac{R_{i,2}}{n+1}\le {u}_2,\frac{R_{i,3}}{n+1}\le {u}_3,\frac{R_{i,4}}{n+1}\le {u}_4\right\}\\ {}u=\left\{{u}_1,{u}_2,{u}_3,{u}_4\right\}\in {\left[0,1\right]}^4\end{array}} $$
(13)

Where I (A) is the pointer variable of function A which gives a number of 1 if A is valid and 0 if A is invalid. Ranks of ith Dd, Ds, Dp and Di value are given as Ri, 1, Ri, 2, Ri, 3 and Ri, 4 respectively (Mirabbasi et al., 2012). The Quantile-Quantile (Q-Q) plot is then used to compare the closeness between Cn(u) and Cp(u). graphically.

3.8 Four-Variate Joint Drought Frequency Analysis

According to Salvadori and De Michele (2004), in a multivariate situation such that X1, X2,. . .,Xd exceeds their corresponding thresholds (X1 > x1, . . ., Xd > xd), the joint probability and return period of all drought events can be computed. In this study, the four-variate joint probabilities, PDSPI, of (Dd ≥ d, Ds ≥ s, Dp ≥ p, Di ≥ i) can be calculated utilizing the copula-based procedure modified from Shiau (2006), Ganguli and Reddy (2014):

$$ {\displaystyle \begin{array}{l}{P}_{DSPI}=1-{F}_{DSPI}\left(d,s,p,i\right)\\ {}=1-{C}_{DSPI}\left(d,s,p,,i\right)\end{array}} $$
(14)

Analogous to return periods, the four-variate joint return periods, TDSPI, of (Dd ≥ d, Ds ≥ s, Dp ≥ p, Di ≥ i) can be estimated as follows:

$$ {T}_{DSPI}=\frac{\zeta }{1-{F}_{DSPI}\left(d,s,p,i\right)}=\frac{\zeta }{1-{C}_{DSPI}\left(d,s,p,i\right)} $$
(15)

In the equations above, ζ = N/q, N = period of SPEI time-series (years), q = amount of drought events in N years. CDSPI(d, s, p, i)= four-variate joint distributions, PDSPI and TDSPI, expresses the joint probability and return periods of occurrence of Dd or Ds or Dp or Di exceeding a particular value of d, s, p and i, respectively.

4 Results and Discussions

4.1 Drought Properties and Joint Dependence

The statistical description of Di extracted from SPEI-6 time series are shown in Table 2. The average regional values of Di across NE, NC, CS, SC, IM, NW, TP, and EMC during 1961–2013 were 8.9, 11.7, 12.4, 10.1, 10.1, 14.7, 9.9, and 12.1 months, respectively. Further, the qualitative joint dependence of Di against Di, Ds, and Dp was examined by employing a graphical tool in the form of a scatter plot matrix (Salvadori et al. 2007; Genest and Favre, 2007). According to Serinaldi et al. (2009), the scatter plot helps to synthesize information on dependence structure and marginals. The joint dependence between Di, Ds and Dp have been studied (Ayantobo et al. 2017) and would not repeat again here. The joint dependence of Di against Di, Ds, and Dp in regions NE, NC, CS, SC, IM, NW, TP, and EMC are shown in Fig. 3. Figure 3 gave a visual judgment on the pair-wise dependence between drought variables (Genest and Favre, 2007). Although some regions showed the collection of points in the lower left corner, an uphill pattern relationship was observed from left to right. For each pair of variables, a growing positive linear relationship existed.

Table 2 Statistical description of drought inter-arrival time across different regions during 1961–2013
Fig. 3
figure 3

Scatter-plots of the pair-wise drought variables for different regions across mainland China during 1961–2013

The Kendall’s k, Spearman’s p, and Pearson’s classical correlation coefficients r are applied to test the strength of the relationship of drought variables, and their respective values in regions NE, NC, CS, SC, IM, NW, TP and EMC are shown (Table 3). Positive and significant associations between drought variables were observed. The correlations between Dd and Ds as well as Ds and Dp were high while the correlations between Dd and Dp were small. The dependence analysis demonstrated it in Fig. 3 and Table 3 that the mutual dependence of drought variables are tremendous and hence suitable for building a regional joint distribution using copula functions.

Table 3 Values of classical, Kendall and Spearman’s correlation coefficients for drought variables in different regions between 1961 and 2013

4.2 Marginal Distribution for Inter-Arrival Time

Different marginal distributions have been recommended as a uniform procedure to fit Dd, Ds, and Dp (Shiau and Modarres, 2009; Song and Singh, 2010; Wong et al. 2010; Lee et al. 2013). The marginal distributions and associated parameters for Dd, Ds, and Dp in different regions of mainland China have been documented (Ayantobo et al. 2017). These distributions showed that the empirical and theoretical distributions had an excellent agreement. In the case of Di, many marginal distributions were also compared and the goodness of fit evaluated. According to Table 4, the KS and AD values were compared for each distribution, and their respective ranks in each region are shown. It was demonstrated that the KS was better than the AD test. Therefore, based on the KS test, the generalized Pareto was the best for regions NE, NC, CS, SC, NW, TP and EMC while gamma was the best for region IM because they had the smallest values. As given in Table 5, the parameters of the distributions were calculated using the MLE and after that used to fit the drought data.

Table 4 Comparison of selected marginal distributions for drought inter-arrival time across various regions between 1961 and 2013
Table 5 The best-fitted marginal distributions with estimated parameters for different regions across mainland China between 1961 and 2013

4.3 Copula-Based Four-Variate Joint Distributions

Table 6 presents the copula parameters, AIC, RMSE and BIAS values in regions NE, NC, CS, SC, IM, NW, TP and EMC used to compare the performances of the symmetric and asymmetric copulas. It could be observed that the copula parameters are positive; connoting that four-variate dependence existed between variables of Dd, Ds, Dp, and Di. Generally, it was shown that symmetric copulas fit better than the asymmetric copulas because they had the lowest RMSE and AIC values in most regions. In the symmetric class, the RMSE and AIC values of Clayton and Frank were more or less the same but gave different fittings in the different region. Clayton copula was best for regions NE, CS and EMC while Frank copula was the best for NC, SC, IM, NW, and TP because they had the smallest values of RMSE and AIC.

Table 6 Copula parameter, RMSE, AIC and BIAS of four-variate symmetric and asymmetric Archimedean copulas for different regions between 1961 and 2013

The observed and theoretical probabilities of the symmetric and asymmetric copula functions are compared in Fig. 4. The figure showed that the theoretical probabilities for the Clayton in regions NE, CS, EMC, and Frank copula in NC, SC, IM, NW, TP fit the observed values very well.

Fig. 4
figure 4

Comparison between empirical and theoretical probabilities for symmetric and asymmetric Archimedean copulas in different regions of mainland China during 1961–2013

Further, the theoretical joint probabilities of the occurrence orders of the observed values were evaluated. As shown in Fig. 5, the theoretical joint probabilities were sketched against empirical probabilities, which showed that there were significant differences between empirical and theoretical joint probabilities. The plots of all the copulas deviate from the 450 diagonal line uniquely. The maximum deviations from the diagonal line were observed for Gumbel plots, showing that the functions might not be suitable for building joint dependence. The symmetric Clayton in regions NE, CS and EMC, and symmetric Frank in regions NC, SC, IM, NW, and TP exhibited good agreement between the theoretical and empirical probabilities, and seemed very satisfying in modeling drought variables. Therefore, these copulas were thereafter selected for regional drought frequency analysis.

Fig. 5
figure 5

Quantile-Quantile (Q-Q) plots of symmetric and asymmetric Archimedean copulas in different regions of mainland China during 1961–2013. The solid line indicates the 450 line

In the same way, the selected copulas were compared in each station, and the RMSE and AIC goodness of fit test results are presented in Fig. 6. It could be observed that the average AIC (Fig. 6a) and RMSE values (Fig. 6b) of all the 552 stations for symmetric Frank copula were smaller than that of Clayton and Gumbel copulas. Therefore, the four-variate symmetric Frank copula was selected for the station-based drought frequency analysis across EMC.

Fig. 6
figure 6

Station-based comparison of (a) AIC and (b) RMSE for symmetric and asymmetric Archimedean copulas in different regions of mainland China during 1961–2013

4.4 Four-Variate Joint Drought Frequency Analysis

4.4.1 Regional Four-Variate Probabilities of Drought Events

The average historical drought events for regions NE, NC, CS, SC, IM, NW, TP, and EMCas well as their corresponding 5-, 10-, 20-, 50- and 100-years univariate return periods defined by separate d, s, p and i were estimated and presented in Fig. 7. Next, the PDSPIin each of these regions were obtained using Eq. (14). Fig. 7 shows the PDSPI, representing the historical, 5-, 10-, 20-, 50- and 100-year return periods to provide adequate information about the potential drought risk associated with Dd, Ds, Dp and Di across different regions. Overall, as the return period increased, the trend of the joint probability decreased. Hence, the simultaneous existence of higher degree drought events in different regions was less frequent as the year increases. For example, in sub-region I, considering 20- years univariate return period, the values of d, s, p, and i, were 13.56 months, 17.46, 2.44 and 20.73 months, respectively. The PDSPI(Dd ≥ 13.56, Ds ≥ 17.46, Dp ≥ 2.44, Di ≥ 20.73)was 0.124.

Fig. 7
figure 7

Four-variate joint probability and return periods of Dd, Ds, Dp, and Di in different regions across mainland China during 1961–2013

These results could provide useful hints in appraising the drought risk in different regions. For instance, regions CS and NE had the highest probability of about 0.57, indicating that drought management and planning are needed within these regions. It was shown that regions SC and IM had a low probability of 0.48, which meant a low combination risk. The joint probability not only confirmed the occurrence of regional drought events but also gave a quantitative approach to analyze the probability of drought under varying d, s, p and i situations.

4.4.2 Regional Four-Variate Return Period of Drought Events

The ζ values were 0.78, 1.02, 1.04, 0.84, 0.84, 1.23, 0.83 and 1.00 months for regions NE, NC, CS, SC, IM, NW, TP and EMC, respectively. Using the average historical and univariate return periods of d, s, p and i estimated for 5-, 10-, 20-, 50- and 100-year, the TDSPI were obtained for each region. Figure 7 shows the TDSPI for the historical, 5-, 10-, 20-, 50- and 100-year return periods. The graph showed that the TDSPI trends increased with the year.

For example, in region NE, considering 20- years univariate return period, the values of d, s, p, and i, were 13.56 months, 17.46, 2.44 and 20.73 months, respectively. The TDSPI, (Dd ≥ 13.56, Ds ≥ 17.46, Dp ≥ 2.44, Di ≥ 20.73)was 6.27 years. TheTDSPIwere lowest in regions NE and CS. This implied that drought frequency would be higher in these regions compared to the other regions. The return period of drought variables obtained from univariate frequency analysis is higher than those by joint distribution for the AND case (TDSPI). This showed that the univariate analysis does not furnish satisfactory knowledge about the drought risks associated with the four-variates. If engineers design hydraulic structures based on the results from univariate frequency analysis, the drought variables may be overestimated and this will lead to an increased cost of the structure. These results could be valuable in risk evaluation of water resources operations under severe and extreme drought situations.

The PDSPI and TDSPI showed that the result computed agreed with the actual data and confirmed that the selected copula functions for each region fitted the data well. Figure 7 further explained that for a particular recurrence interval, the PDSPI showed a decreased trend while the TDSPI showed an increased trend. For example, given the occurrence of d, s, p, and i in region NE, the estimated PDSPI of 5- and 100-year plan drought are 0.332 and 0.029, respectively. Considering these illustrations, the calculated results seemed feasible. Therefore, from Fig. 7, the PDSPI and TDSPI of any other drought events in regions NE, NC, CS, SC, IM, NW, TP, and EMC can be obtained directly through interpolation.

4.4.3 Spatial Distribution of Four-Variate Probability and Return Periods

The average d, s, p, and i for EMC were 5.81 months, 4.96, 1.15 and 11.1 months respectively. Similar to the regional drought analysis, the station-based PDSPI and TDSPIof drought events exceeding these specific values, (i. e., Dd ≥ 5.81, Ds ≥ 4.96, Dp ≥ 1.15, Di ≥ 11.1)were estimated for each station using the symmetric Frank copula. The spatial distribution of thePDSPI and TDSPI are mapped in Fig. 8.

Fig. 8
figure 8

Spatial pattern of four-variate (a) joint probability and (b) joint return periods (years) of Dd, Ds, Dp, and Di in different regions across mainland China during 1961–2013

The PDSPIvaried from 0.34 to 0.75. In this case, most of northern China experienced higher probability spreading from the northwest to northeast China. Notably, it seems that the PDSPI were very high in NW and some stations in NC and TP. Based on the spatial pattern of drought events, the drought risk resulted from these regions are consistent. On the other hand, the results for southern China including SC and CS were slightly different as relatively lower PDSPI was observed. Accordingly, counter measures of drought hazards need to be set for proactive actions especially in regions that had greater drought risks.

The TDSPI ranged from 1.02 to 6.79 years. Long TDSPIwere found around NC while short TDSPI dominated parts of CS, NW, and TP. MediumTDSPIwere found in eastern China, most especially NE, NC, CS. The spatial pattern of PDSPI and TDSPI suggested tremendous variations within different regions. In most cases, regions with high PDSPIare often associated with short TDSPI and vice versa. Severe droughts are most liable to eventuate in most of the northwestern and southwestern regions because of short TDSPI. Also, because these areas are commercially advanced with huge populations, severe drought events could place grave danger on water resources in these regions. This requisite information is needed by water managers and other government agencies for adequate planning as well as management of water resources under severe drought situations. In this study, four-variate distribution was determined and successfully used.

5 Summary and Conclusions

Most meteorological phenomena are multiplicative, and the advantages of using a multivariate assessment strategy are evident in this study. In particular, drought events are characterized by Dd, Ds, Dp, and Di and the joint drought risk, which reveals their probability at the same time, is vital for management and planning of water resources under drought situations. Using the symmetric and asymmetric Archimedean copula functions, annual droughts were studied as four-dimensional phenomena incorporating these variables to build the joint four-variate distributions. Next, the PDSPI for different TDSPI in each region and EMC were then estimated.

The foremost outcomes of this research were summed as follows: (1) For Di, the generalized Pareto was the best distribution for regions NE, NC, CS, SC, NW, TP, and EMC while gamma was the best distribution for region IM. (2) The RMSE and AIC values were utilized to choose the suitable copula. We employed Clayton copula in regions NE, CS, and EMC, and Frank copula in regions NC, SC, IM, NW, and TP. Symmetric Frank copula was used for the station-based drought analysis. The selected copula functions provided the best fit for the dependence structures of Dd, Ds, Dp, and Di, and consequently, it was adopted for the joint risk drought analysis. (3) The risk of drought event is determined based on PDSPI and TDSPI, which furnished vital information for drought analysis. For instance, by using only the univariate information provided by either Dd, Ds, Dp, and Di, may yield under or overestimates of the actual drought state, and, in turn, of the corresponding risk. (4) Analyzing the probabilities and return periods of four-variate drought events, the trends or changes in drought events were then assessed over time. The concept of return period used in this study is a familiar concept in the community of water resources professionals. However, for a deep understanding of drought event, the ultimate evaluation was based on the hazard which the event possesses, and the vulnerability of this region to the drought event. Therefore, based on the results of our analysis, it is reasonable to increase drought control concerns especially in regions with very high PDSPI and low TDSPI. As a recommendation, a drought return period of below ten years seems to indicate that the drought is entering an alert or a dangerous state.

This study added to a better valuable knowledge in the sphere of disaster management, especially concerning the appraisal of drought issues and the performance of full drought-risk investigations. Our insights on the real-time evaluation of meteorological droughts of different regions may provide valuable information about the possible evolution of drought episode and also help the water manager to plan effective mitigation strategies. As a conclusion, in this paper, a multivariate frequency analysis of meteorological droughts in seven climatic regions is addressed using Copulas. Such an approach is flexible, comprehensive and offered other several advantages over previous definitions of multivariate frequency analysis (Salvadori et al., 2015). The copula-derived drought analysis considers complete interdependencies between the drought variables over a geographical region. Finally, in this paper, no consideration has been included for the climate change influence on the above analysis although climate change will affect the estimation of the return period of droughts. For this aspect, rigorous and systematic attempts should be made for estimating the anticipated changes in the meteorological parameters which directly or indirectly affect the Dd, Ds, Dp, and Di and ultimately the frequency of drought events.