Development of decomposition-based model using Copula-GARCH approach to simulate instantaneous peak discharge

Estimation of instantaneous peak discharge is important in the design of hydraulic structures and reservoir management. In this study, a new approach called CEEMD-Copula-GARCH is presented for simulating instantaneous peak discharge in the Qale Shahrokh basin, upstream of Zayanderood Dam, Iran. In the developed method, the Complementary ensemble empirical mode decomposition (CEEMD) algorithm was used to analyze the observed values and generate the intrinsic mode function values and residual series. For this purpose, the intrinsic mode function values were simulated based on vine copula and its tree sequence (C-vine, D-vine, R-vine and their independent and Gaussian modes), and the residual series of the CEEMD algorithm were simulated by the GARCH model. The results of simulating instantaneous peak discharge values (m3/s) using the CEEMD-Copula-GARCH approach in the study area showed that the amount of simulation error based on the RMSE statistic compared to the CEEMD-Copula model and simulation without decomposition has improved by about 20 and 70%, respectively. The model’s efficiency was also estimated based on the Nash–Sutcliffe efficiency in the proposed approach of 0.99, and the certainty of the proposed approach was also confirmed based on the presented violin plot. According to the presented results, the proposed approach has high accuracy and efficiency in the simulation of instantaneous peak discharge (m3/s), which can be used in the flood control system design and flood management. Using the methodology proposed in this study, multivariable models can be used in simulating univariate series with high accuracy.


Introduction
In recent years, the occurrence of floods in urban and rural areas in different parts of Iran has increased. Also, the amount of instantaneous peak discharge (IPD) of rivers has increased significantly due to climatic changes and changes in the land use of basins (Khalili et al. 2016). Accurate estimation of extreme values of river flow is necessary for the optimal management of water resources and design measures to deal with flood control systems in each region. One of the limitations of simulating and estimating these types of variables is that due to their extremely nature, there is not much correlation between them and other effective hydrological variables. This causes limitations in the use of multivariate models. If the observation series is divided into several subseries, this limitation will disappear. Complementary ensemble empirical mode decomposition (CEEMD) algorithm is one of the methods that can eliminate this limitation. CEEMD algorithm is a signal decomposition method developed based on Empirical Mode Decomposition (EMD). In recent years, this algorithm has received much attention (Zhang et al. 2021a(Zhang et al. , b, 2022a(Zhang et al. , 2022bBian et al. 2020;Lian 2022;Xu et al. 2022;Zhang and Lou 2021;Yan et al. 2021). Chamani and Roushangar (2020) used Gaussian Process Regression (GPR) models and CEEMD algorithm to simulate the river flow of the Arkansas River in the USA. The results of their research showed that using the CEEMD method improves the performance of other investigated models. Bahmani and Ouarda (2021) investigated the performance of the CEEMD algorithm in simulating changes in the groundwater level by using gene expression programming methods and the M5 model tree. Their results showed that the CEEMD algorithm has a high ability in producing 1 3 182 Page 2 of 16 hybrid models and significantly reduces the amount of simulation error. Zhang et al. (2021a, b) simulated monthly rainfall in Zhengzhou, China using CEEMD algorithm and long-short-term memory (LSTM) networks. The results of their research showed that the CEEMD-LSTM model has an average absolute error of 0.056 mm and a Nash-Sutcliffe efficiency (NSE) coefficient of 0.95. They stated that this method is very efficient for predicting regional rainfall. Yan et al. (2021) proposed a weighted CEEMD to predict monthly runoff in downstream stations of the Yellow River in order to overcome the nonlinear characteristics of runoff data and the limitations of univariate models. They used support vector regression integration, back propagation neural network, PSO algorithm, and LSTM neural network. Their results showed that the NSE value was more than 0.92 and the error indicators were minimized. They confirmed that this weighted integrated model can be applied for accurate runoff forecasting. Lian (2022) proposed the CEEMD model to improve the accuracy of runoff forecasting in a river in China. The decomposed parameters were predicted by using GPR, support vector machine (SVM) and autoregressive integrated moving average models. According to the prediction error and evaluation statistics, it showed that the proposed CEEMD model has higher accuracy compared to other models.
Similarly, the nonlinear nature of runoff contradicts the structure of many models. In addition, the individual prediction model (the model that uses only one algorithm) has limitations such as the phenomenon of overfitting and local optimization in the neural network, and there is no uniform standard in the selection of the kernel function and the calibration of the parameters in the support vector sampler (Han et al. 2017). One of the new methods that does not have the limitations of common models is the copula simulation model that is based on the conditional density of copulas and the marginal distribution of data. This approach has received much attention in recent years Khashei-Siuki et al. 2021;Pronoos Sedighi et al. 2022;Nazeri Tahroudi et al. 2022a, b, c;Tabatabaei et al. 2022). Nazeri-  developed the Copula-GARCH model from autocorrelation models with generalized conditional heteroskedasticity and copula-based model to simulate potential evaporation and transpiration values in eastern regions of Iran. The results showed that the Copula-GARCH approach had a lower error rate than the copulabased model and better simulated the range of data changes.
The literature review of studies applied the CEEMD algorithm shows that in most of the studies conducted regarding the simulation of IMF values, different dataoriented models have been used. In this study, to improve the performance of CEEMD algorithm in simulating IPD values, two hybrid models of CEEMD-Copula and CEEMD-Copula-GARCH are developed and compared.
For this purpose, the CEEMD algorithm in multidimensional simulations was developed by using marginal distributions, conditional density of vine copulas and heteroskedasticity. The proposed CEEMD-Copula-GARCH approach was applied to simulate IPD values in Zayanderood basin, Iran.

Study area
The studied area in this research is Qale Shahrokh sub-basin, upstream of Zayanderood Dam, Iran. In this study, IPD data (cubic meters per second) were used in the period of 1971-2019. The maximum, minimum, average, and standard deviation values of the observed data in the studied period are 853, 55.8, 299.42, and 170.63 m 3 /s, respectively.

Complementary ensemble empirical mode decomposition (CEEMD)
CEEMD is a modified version for signal decomposition method for EEMD presented by Yeh et al. (2010) that solves the problems of residual white noise in EEMD analysis Huang 2009, 2010). The decomposition process of the CEEMD decomposed the initial series into several intrinsic mode functions (IMFs) and residuals series (Chen et al. 2018). EMD can decompose a signal into some subsignals, called IMF, each of which has a unique frequency component that can characterize the input signal at different scales (Xu et al. 2009;Roushangar and Alizadeh 2019). Wu and Huang (2009) developed the ensemble of empirical mode decomposition (EEMD) where the signals of different scales based on this is mapped to the white noise background region (Roushangar et al. 2021). Yeh et al. (2010) proposed CEEMD, which has high consistency, as with EEMD, the analysis is also aided by adding white noise.
The operator E j (.) is first defined according to its initial signal and the jth state is generated by EMD. i (n) i (n) represents a zero Gaussian white noise with standard variance N(0, 1), where i = 1,…, I. The coefficients ε k allow to select the signal-to-noise ratio in each step. If x(t) is the target signal, the decomposition steps based on CEEMD are as follows (Zhao et al. 2015): a. The calculations of the IMF 1 (t)IMF 1 (t) method are similar to EEMD. At first different noises are used to realize the calculations, the average of the total value is calculated as follows: b. For k = 1, the first-order residual series r 1 (t)r 1 (t) is calculated: c .
E M D s a t i s f i e s e x p r e s s i o n R 1 (t) + 1 E 1 ( i (n))R 1 (t) + 1 E 1 ( i (n)) until the first condition IMF(t) is satisfied, and the overall mean is defined as IMF 2 (t)IMF 2 (t) with the following relation: d. For k = 2,…, K, the kth order of the residual series r k (t)r k (t) is estimated as follows: (1) are extracted and their overall average is calculated and the performance of the objective function IMF (k+1) (t)IMF (k+1) (t) is produced using the following equation: f. By repeating steps (4) and (5), we will have: g. where K is the sum of IMF(t). Therefore, the target signal x(t) is defined as: That is, CEEMD can guaranteeing residual noise interference and save computation time (Zhang et al. 2021a, b).

Vine-based simulation
The copula theory or Sklar (1959) theory is the initial point for multivariate distribution analysis, and their conditional densities (Khozeymehnezhad and Nazeri-Tahroudi 2020). There are different ways to build a copula tree. Canonical or C-vine, Drawable or D-vine and Regular or R-vine are different in tree sequence (Bedford and Cooke 2001;Nelsen 2006;Aas et al. 2009;Czado 2010;Salvadori and De Michele 2004, 2007, 2010De Michele et al. 2005). For type C of vine copula, multivariate density is Czado (2019): is the density function (for bivariate mode), and had the i,i+j |1∶(i−1) i,i+j |1∶(i−1) parameters ( i k ;i m i k ;i m means: i k , … , i m i k , … , i m ). The log-likelihood function of type C of vine copulas with the θ CV parameter is as Eq. 9: The F(x|v) or conditional distribution functions for the vector v with m dimension is: in the m tree. The log-likelihood function of type D of vine copula with the θ DV parameter is as Eq. 12: The R-vine or Regular vine is more flexible than the type C and D. According to Dißmann et al. (2013): The log-likelihood function pf type R of vine copula with parameter RV RV and

GARCH: generalized autoregressive conditional heteroscedasticity
The GARCH model was developed as the best model for modeling the conditional dependence of variance or heteroscedasticity of time series. Based on its ability to modeling the variables that has significant changes, has been widely used by researchers (Watanabe 2012;Tse and Tsui 2002;Modarres and Ouarda 2012;Ramezani and Nazeri Tahroudi 2020). Actually the GARCH model is a generalized version of the ARCH family that developed by Engle (1982).
The root mean square error (RMSE) and the Nash-Sutcliffe Efficient (NSE) statistics were used to evaluate the error values and the efficiency coefficient of the model, respectively.
N also is the number of data (Krause et al. 2005;Nash and Sutcliffe 1970).

Results and discussion
In this study, in order to simulation the IPD values (cubic meters per second) in Qale Shahrokh sub-basin in the period of 1971-2019, three models were used as follows: 1. Simulation of IPD values using delayed IPD values and the simulation model based on the vine copula, which will be called the Copula model from now on.
2. Decomposition the IPD observation series using the CEEMD model and simulating the decomposed IMF values using the copula-based model that will be called the CEEMD-Copula model from now on. 3. Simulating the residual series of the CEEMD model using the GARCH model and simulating the IMF values using the copula-based model, which will be called the CEEMD-Copula-GARCH model from now on.
The results of the evaluation and comparison of the mentioned models are presented as follows.

Modeling of IPD values using copula functions (Copula model)
For simulating the IPD values in the studied basin at the hydrometric station of Qale Shahrokh using copula functions of the vine family, firstly, appropriate delays were investigated using partial autocorrelation function (PACF) diagrams. Based on the PACF diagram presented in Fig. 2a, it can be seen that the best delay regarding the IPD values in the studied area is 3 (Input delay to confidence interval − 1). Considering the selected delay, various copulas of the vine family, including R-vine, C-vine and D-vine, as well as their rotational, independent and Gaussian modes, were investigated to simulate the IPD values.
Based on various evaluation statistics, the Gaussian R-vine with the tree sequence presented in Fig. 2b was chosen as the best copula and best tree sequence based on the Akaike information criterion equal to − 104.8 and the logarithm of likelihood equal to 88.4 among the examined copulas. By choosing the Gaussian R-vine as the best copula, the IPD values were simulated according to the delayed  Fig. 3. According to this figure, it can be seen that there is a good agreement between the observed and simulated values. The histogram presented in the margin of Fig. 3 shows that there is a slight difference between the histogram of simulated and observed values. The error values in the simulation of IPD values by vine copula and successive delays based on the RMSE statistic is equal to 55.57 m 3 /s. This error rate is about one-sixth of the average observational data. According to the NSE statistic, the efficiency of the vine copula model in simulating IPD values is equal to 89%. In this model, the correlation between simulated and observed IPD values is 0.93.

Decomposition of IPD series using CEEMD model
CEEMD model was used to model the IPD values in the studied station based on decomposition of the original series. Using this model, each time series was decomposed into 6 intrinsic mode function (IMF) series and a residual series. The decomposition algorithm is implemented in such a way that first, the IPD series is divided into 7 series (6 IMF series and one residual series) and the sum of all 7 series is equal to the IPD values in each event. The results of the decomposition of the IPD series at Qale Shahrokh station are shown in Fig. 4. According to Fig. 4, it is possible to observe the smoothing of the IMF values. Fluctuations decrease with increasing IMF number. One of the most important applications of basic decomposition models is the separation of time series into several series in order to use multivariate models. With this method, it is possible to decompose single-variable values into several series in order to obtain the ability to use multivariable models. In general, the decomposition results may be significantly different according to the length of the observation series. The longer the sequence of observation values, the more subsets are decomposed (Huang et al. 2015a, b).

Simulation of decomposed series of IPD using the simulation model based on vine copula (CEEMD-Copula)
Before simulating the IMF series, the appropriate delay was specified for each IMF using the PACF diagram. The PACF diagram of the analyzed series was presented in Fig. 5.
Based on the results presented in Fig. 5, it can be concluded that for the IMF1, IMF2, IMF3, IMF4, IMF5, and IMF6 series, lag 0, 4, 5, 4, 5, and 6 is the best lag for the mentioned values, respectively. In the values of IMF1, zero lag is introduced as the best lag, which means that it is not possible to use copula functions in this decomposed series. Therefore, for this decomposed series (IMF1), lag 2 was considered as a lag that has the highest correlation with IMF values. Based on this, two-dimensional copula were considered for this section. For the values of IMF2 to IMF6, vine Fig. 3 The simulated values of IPD (m 3 /s) using vine-based simulation and successive lags versus the observed values in the Qale Shahrokh station copulas were used for simulations. First, the correlation of IMF values and its delays was investigated using Kendall's Tau statistic. Kendall's Tau values related to IMF values are presented in Appendix 1. Finally, according to the Log likelihood, AIC, and BIC statistics, as well as Kendall's Tau values, appropriate tree sequences and vine-based simulation were investigated. The investigated tree sequences can also be seen in Appendix 2. In this regard, various parts of the vine family have been examined. Based on the best tree sequences presented in Appendix 2 and the AIC, Log likelihood, and BIC statistics, independent C-vine, D-vine, D-vine, Gaussian independent R-vine, Gaussian independent R-vine copulas were selected for IMF2, IMF3, IMF4, IMF5 and IMF6 series, respectively. In IMF2, IMF3, IMF4, IMF5, and IMF6 series, the vine-based simulation has been done in 5, 6, 5, 6 and 7 dimensional modes, respectively. For the IMF1 series, Frank's copula with a dependency parameter of − 5.87 was used for simulation. By choosing the best copulas from the vine family, as well as their conditional density and internal copulas (simple and rotated), IMF values were simulated and the results are presented in Fig. 6. According to Fig. 6, there is a good match between observed and simulated values. The vine-based simulation model was able to simulate the range of data changes as well as the extreme Fig. 4 IMFs and residuals series resulting from IPD decomposition (m 3 /s) and minimum points of IMFs. The RMSE values for IMF1, IMF2, IMF3, IMF4, IMF5, and IMF6 series were calculated as 3.9, 6, 10.6, 3.167.94 and 15.6 m 3 /s, respectively. The efficiency values of the model (Nash-Sutcliffe efficiency coefficient) in these variables were calculated as 99, 91, 95, 92, 98 and 82%, respectively. Finally, after simulating the IMF values using copula-based simulation, the simulated IMF values are aggregated with the residual values and compared with the observed IPD values. The simulation results of IPD values using the CEEMD model were presented in Fig. 7. The error rate is equal to 16.81 cubic meters per second and the efficiency of the model is 99% using RMSE and NSE statistics, respectively. According to Fig. 7, we can see a fine correlation between simulated and observed values. According to the histogram of observed and simulated values, the high efficiency of CEEMD-Copula model can be seen in the simulation of IPD values. Decomposition of the IPD series based on the CEEMD and combining it with the vine-based simulation model compared to the simulation of the IPD values without decomposition showed that the error rate in the CEEMD-Copula model has improved by about 70%. The efficiency of the CEEMD-Copula model is also improved by about 11% compared to the copula-based model.

Simulation of IPD values based on CEEMD-Copula-GARCH approach
In this section, in addition to simulating the IMF values using the vine-based simulation model, the decomposed residual series was also fitted using the GARCH model. The residual series extracted from the CEEMD model was fitted with the GARCH model in standard mode and the new series was simulated according to the conditional variance of the residual series. The simulated and observed values of the residual series were presented in Fig. 8. According to Fig. 8, it can be concluded that the major changes and the average changes of the residual series are well simulated using the GARCH model. In the simulated values of the residual series, several cases of underestimation and overestimation are observed. Finally, using the IMF values simulated by CEEMD-Copula approach and the residual series simulated by GARCH model, the efficiency and error rate of the proposed CEEMD-Copula-GARCH model were investigated. Figure 9 was presented by summing the simulated IMF values and the simulated residual series.
In Fig. 9, the simulated values are closer to the 1:1 line than in Fig. 7, and the histogram of observed and simulated values is more similar than Fig. 7. The error rate in the proposed CEEMD-Copula-GARCH approach in simulating IPD values using RMSE and NSE statistics is about 14.01 cubic meters per second and 99%, respectively. By comparing the results obtained with vine-based simulation model and successive lags of IPD values, the improvement of RMSE and NSE values are obtained 75% and 12%, respectively. By simulating the random part of the residual series using the GARCH model, compared to the CEEMD-Copula model, the error rate (RMSE) and model efficiency coefficient (NSE) have been improved by about 20 and 0.3 percentages, respectively. The improvement rate of the efficiency coefficient is not high, but a 20% reduction in the error rate Fig. 6 The results of IMF decomposition values using vine-based simulation in the simulation of IPD values, which is about 2.8 cubic meters per second, can be satisfactory. In order to check the degree of certainty of the studied models in the simulation of IPD values, the violin plot was used. The violin plot of observed and simulated values was presented in Fig. 10. In Fig. 10, the black rectangle shows the major data changes with upper (third quartile) and lower (first quartile) limits. According to Fig. 10, it can be seen that the simulated values by the CEEMD-Copula and CEEMD-Copula-GARCH approaches are completely similar to each other and are in perfect agreement with the observed values of IPD. Average IPD values are also well estimated by both CEEMD-Copula and CEEMD-Copula-GARCH models. According to Fig. 10, it can be seen that the copula-based simulation model in the state without decomposition of the IPD series has also been able to simulate the changes in the observed values of IPD to some extent, but compared to the CEEMD-Copula and CEEMD-Copula-GARCH approach is less accurate in simulating IPD values.
Due to the great similarity of CEEMD-Copula and CEEMD-Copula-GARCH results, Taylor's diagram was used to show the difference between the studied models, which is presented in Fig. 11. According to Fig. 11, it can be seen that CEEMD-Copula-GARCH model has a shorter distance to the Taylor diagram reference point. The two models CEEMD-Copula and CEEMD-Copula-GARCH do not differ much in terms of correlation with the observed values and are not far away from the reference point of the Taylor diagram, which indicates the sufficient accuracy of the two models in simulating the IPD values. The difference of the standard deviation of the Copula model is higher than the other two models up to the reference line.
Accurate prediction of extreme values of river flow can have countless benefits regarding the development of warning systems (Barredo 2007;Langridge et al. 2006). For this reason, it is necessary to use models that provide this accuracy and efficiency as much as possible so that water resource managers can use them to provide solutions for infrastructure operations (Anderson and Burt 1985). However, creating a suitable flood forecasting and also the warning system for communities at risk requires a combination of hydrological and meteorological data, forecasting tools and accurate simulations (Ali et al. 2020). The results of this research showed that the limitations of simulation can be eliminated by combining different models. Because this study showed that by decomposition of the IPD series by the CEEMD model, the values of this series are separated from the single-variable state into several series. By using the CEEMD signal decomposition method, a limited amount of white noise is introduced into the main signal, and by using the positive statistical aspects of white noise, which has a balanced distribution in the frequency domain, the effect of intermittent noise will be removed from the decomposition process. On the other hand, the combination of CEEMD with the copula-based simulation approach can also perform simulations based on the marginal distribution of data and localize the results in a way. In this study, limitations such as non-stationary and nonlinear problems as well as conditional variance have been resolved with the addition of the GARCH model. Comparing the results of the studied models showed that by analyzing the series of observations, the error values can be reduced to a great extent. This is

Conclusion
In this study, a CEEMD-Copula-GARCH hybrid approach was developed to simulate the IPD values in a sub-basin in the upstream of Zayanderood Dam in Iran. In this study, the results have been improved three times by integrating different models. At first, the simulation of IPD values with regard to successive lags was done by the copula-based simulation model. Efficiency, error rate and certainty of the model were measured using NSE and RMSE statistics and violin plot, respectively. In the next step, by decomposing IPD values by CEEMD algorithm into six intrinsic mode functions and a residual series, simulation of IMF values with vine copulas was done, and the CEEMD-Copula model was obtained. The results showed that by decomposition the observed values, the amount of simulation error can be reduced by more than one third. Decomposition the observations series, in addition to reducing the amount of simulation error, increases the dimension of the simulation, which makes it possible to use different multivariate models. Fluctuations are well simulated due to being divided into several series.
In this model, the minimum and maximum values in the simulated series are more consistent with the observed values. In the CEEMD-Copula model, the IMF values are simulated and the residual series values are added to the simulated IMF values. In the CEEMD-Copula-GARCH approach, the effect of variance heteroscedasticity in the time series was also investigated and considered. For this reason, the residual series of the decomposed series was modeled using the GARCH model. The results of modeling and simulating the IPD values using the CEEMD-Copula-GARCH approach showed that this approach reduces the error by a quarter compared to the copula-based model without decomposition. The CEEMD-Copula-GARCH approach was able to reduce the simulation error by about 2.8 cubic meters per second (about 20%) compared to the CEEMD-Copula model. Based on the presented violin plot, it can be indicated that the certainty of two models based on CEEMD has an acceptable certainty in the simulation of IPD values. The proposed CEEMD-Copula-GARCH approach produced a high correlation between observed and simulated values. Due to the use of marginal distributions, conditional density and consideration the heteroskedasticity, this approach has provided relible results in the simulation of IPD values. The results showed that the proposed model can estimate the IPD values in the Qale Shahrokh basin with very high accuracy, and its results can be used to design flood control and management systems.

Appendix 1
Kendall's tau correlation values of IMF decomposition series (a: IMF1, b: IMF2, c: IMF3, d: IMF4, e: IMF5, f: IMF6).       Funding The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.