Introduction

Synthetic generation of streamflow is one of the major areas of stochastic hydrology. Since the flow in a river is inherently stochastic, adequate information about it is essential for both the design and the operation of any water resources project. Such information is usually derived from observed flow records. In most cases, however, these records are short and do not give a complete picture of the variability of flows. A system designed from such limited data is shortsighted and risks being inadequate for the unknown flow sequences it may experience in the future. To deal with this limitation, a synthetic generation model is usually used to produce equally likely flow sequences that are statistically similar to the historical data. Two classes of models are generally used: (i) single-site models, for flow data of a single river, and (ii) multi-site models, for flow data of more than one adjacent river (or tributary).

Brief review of past works

Single-site models

The first streamflow generation model was introduced by Thomas and Fiering (1962): a first-order autoregressive model for generating monthly streamflows of the Clearwater River and its tributaries in Idaho. Since then, a number of models have been suggested for hydrologic time series in general and streamflow series in particular. Harms and Campbell (1967) extended the Thomas–Fiering model to preserve (a) the normal distribution of annual flows, (b) the log-normal distribution of monthly flows and (c) the correlation between annual flows. McMahon and Miller (1971) applied the Thomas–Fiering model to skewed hydrologic data using a gamma transformation. However, they noted an inconsistency in the transformation used in the model to convert random normal variates into random skewed variates. According to them, the transformation could be applied to larger skews by first taking the logarithm of all flows before calculating the model parameters. This procedure appreciably reduced the skewness, allowing the transformation to operate within its limits of consistency. Bobée and Robitaille (1975) proposed formulae for adjusting the average of sample estimates to obtain a better estimate of the population skewness. Phien and Ruksaslip (1981) considered four models for generating monthly streamflows and modified them where needed to reproduce the mean, standard deviation and skewness coefficient of each monthly sequence of the historical records. The four single-site models considered were (a) the Thomas–Fiering model (1962), (b) the first Spolia–Chander model (1974), (c) the second Spolia–Chander model (1977) and (d) the Sen model (1978). The modifications were made to preserve skewness. In evaluating the models, the number of negative values generated and the computer time required were also considered.
Their modified models satisfactorily preserved the mean, standard deviation and skewness coefficient of the historical records. Applications of single-site models can also be found in Sangal and Biswas (1970), McMahon and Miller (1971), Moss and Dawdy (1974), Wallis et al. (1974), McKerchar and Delleur (1974), Mejia and Rodriguez-Iturbe (1974), Shih (1978), Charbeneau (1978), Stedinger and Taylor (1982), Sim (1987), Savic et al. (1989), Awchi and Srivastava (2009) and Arselan (2012), among others.

Multi-site models

Most reported studies on streamflow generation are based on a single site. For the simultaneous generation of flows in a river–tributary system, however, multi-site models are more logical. If the flows in the river and its tributary are spatially uncorrelated, developing separate single-site models for the river and the tributary may be sufficient. Usually, however, flows in a river and its tributary show significant cross-correlation, since both receive runoff from the same parent rainfall over the basin. In this context, Fiering and Jackson (1971) explained that if a particular month is unusually wet at one site in an area, the same month is very likely to be wet at nearby sites. In such cases, it becomes necessary to develop multi-site models that preserve the cross-correlation in addition to the other required properties at each site (Matalas 1967).

Xu et al. (2001, 2003) used a Markov cross-correlation pulse model to extend synthetic streamflow generation from a single site to multiple sites with possibly high cross-correlations among the daily values at these sites. For simulating multi-site, multi-season streamflows, Srinivas and Srinivasan (2005) introduced a hybrid stochastic model that used a parsimonious periodic parametric model, without normalization, for partial pre-whitening of the streamflows at each site; the resulting residuals were resampled using the moving block bootstrap to reproduce site-to-site correlations. Szilagyi et al. (2006) applied a hybrid, seasonal Markov chain-based model for daily flow simulation at multiple catchment sites. The model combined components of shot noise models with a Markov chain-based approach and a conceptual description of flow recession that does not require precipitation data. It could generate arbitrarily long daily flow series that preserve, at least moderately well, the basic long-term statistics (mean, variance, skewness, autocorrelation structure and cross-correlations) as well as the short-term behavior of the original series. Multi-site models have also been used by Wang and Ding (2007), Hao and Singh (2013) and Srivastav and Simonovic (2014), among others.

Apart from these autoregressive (AR) models, studies based on autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models have been reported for both single- and multi-site cases (Box and Jenkins 1970; Moss and Dawdy 1974; McKerchar and Delleur 1974; Stedinger et al. 1985; Sim 1987), as have studies based on artificial neural network (ANN) models (Cigizoglu 2005; Kisi 2007; Ahmed and Sarma 2007; Yonaba et al. 2010; Mehr et al. 2014).

Because single-site models are simple and efficient for a single river, they are often employed for multi-river systems as well. However, since the cross-correlation structure is not included in such models, they cannot preserve the cross-correlation of multi-river flow data. As a comprehensive comparison of single-site and multi-site models applied to the same river–tributary system is not available, this study aims to provide such an analysis. A number of seasonal (monthly) AR models are developed for an existing river and its tributary, using both single-site and multi-site formulations. For the distribution of the flow data, both single-site and multi-site models are developed based on (i) the normal distribution and (ii) the gamma distribution.

The generated series are compared with the historical series in terms of long-term statistical parameters, namely the mean, standard deviation and coefficient of skewness of each month, the serial correlation between successive months, and the cross-correlation between the two sites in the same month.

Study area

The Damodar Valley (DV) reservoir system in India is a multi-purpose, multi-reservoir system. Its two upper reservoirs, Konar and Tilaiya, are built across the Konar and Barakar rivers, respectively, as shown in Fig. 1. The performance of the integrated operation of this system depends largely on the flows in these rivers. Simulation and optimization studies of its operation therefore require long sequences of possible future flows in these rivers that resemble the observed flow series. Hence, streamflow generation models are developed for the flows in these two rivers, which are the inflows to the two reservoirs.

Fig. 1 Location of Konar and Tilaiya reservoirs

The Konar dam is built across the Konar River, about 30.6 km upstream of its confluence with the Damodar River. The reservoir serves primarily for flood control and for supplying cooling water to the Bokaro thermal power station downstream. The Tilaiya dam was constructed across the Barakar River at Tilaiya, in Koderma district of the Indian state of Jharkhand, mainly to supply irrigation water during the dry season. It has a power generation capacity of 4 MW.

Streamflow generation models

Four single-site and two multi-site models are developed in this study, based on the normal and gamma distributions for the flow series. Many reported models use the normal distribution because of its simplicity, but, being symmetric, it cannot preserve skewness. Since streamflow values cannot be negative, their distribution is typically positively skewed, and a skewed distribution such as the gamma distribution is preferred.

Single-site model

The general form of a seasonal, first-order Thomas–Fiering model is given below (Haan 1977):

$$ x_{i,j + 1} = \overline{x}_{j + 1} + \frac{r_{x,j} S_{x,j + 1}}{S_{x,j}} \left[ x_{i,j} - \overline{x}_{j} \right] + z_{i,j + 1} S_{x,j + 1} \sqrt{1 - r_{x,j}^{2}} $$
(1)

in which \( x_{i,j} \) is the flow in the jth month of the ith year; \( \overline{x}_{j} \) and \( S_{x,j} \) are the mean and standard deviation of the flows in the jth month, respectively; \( r_{x,j} \) is the first-order serial correlation between months j and j + 1; and \( z_{i,j + 1} \) is a random component with zero mean and unit variance. For a monthly model, \( x_{i,j + 1} \) in the above equation is understood to be \( x_{i + 1,1} \) when j = 12.

Normal model

Equation (1) represents the normal model when the random component z is normally distributed with zero mean and unit standard deviation. Since the normal distribution is symmetric about the mean, some of the generated flows may be negative. Because flows cannot be negative, such values are usually set to zero after being used to generate the next value. Moreover, since the starting value is selected arbitrarily, the first few years of generated values are discarded.
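A minimal NumPy sketch of the normal Thomas–Fiering model of Eq. (1) is given below, as an illustration rather than the exact procedure used in this study. It assumes the monthly statistics are supplied as length-12 arrays indexed from 0 (January) to 11 (December), with `r[j]` the correlation between month j and month j + 1; the function name `thomas_fiering_normal`, the five-year warm-up and the constant example statistics are hypothetical. Negative values drive the next month and are then recorded as zero.

```python
import numpy as np


def thomas_fiering_normal(mean, std, r, n_years, warmup_years=5, seed=None):
    """Generate monthly flows with the seasonal Thomas-Fiering model, Eq. (1).

    mean, std : length-12 arrays of monthly means and standard deviations
    r         : length-12 array; r[j] is the lag-one correlation between
                month j and month j + 1 (r[11] links December to January)
    Returns an array of shape (n_years, 12) after dropping the warm-up years.
    """
    rng = np.random.default_rng(seed)
    total = n_years + warmup_years
    flows = np.empty(total * 12)
    x_prev = mean[11]  # arbitrary starting value: the December mean
    for t in range(total * 12):
        j = t % 12        # month being generated
        p = (j - 1) % 12  # preceding month (December when j == 0)
        z = rng.standard_normal()
        x = (mean[j]
             + r[p] * std[j] / std[p] * (x_prev - mean[p])
             + z * std[j] * np.sqrt(1.0 - r[p] ** 2))
        x_prev = x              # the raw value, even if negative, drives the next month
        flows[t] = max(x, 0.0)  # negative flows are recorded as zero
    return flows.reshape(total, 12)[warmup_years:]


# Example with hypothetical, constant monthly statistics
mean = np.full(12, 100.0)
std = np.full(12, 20.0)
r = np.full(12, 0.5)
sim = thomas_fiering_normal(mean, std, r, n_years=100, seed=1)
```

Keeping the unclipped value in `x_prev` while storing the clipped value mirrors the practice described above; clipping before the recursion would bias the next month upwards.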

Gamma model

If the observed series has appreciable skewness, a skewed distribution is preferable to the normal distribution (Haan 1977). The gamma distribution, used in this study, is one such distribution.

Equation (1) can also be used for the gamma model, except that the random component \( z_{i,j + 1} \) is replaced by \( \varepsilon_{i,j + 1} \) as follows:

$$ x_{i,j + 1} = \overline{x}_{j + 1} + \frac{r_{x,j} S_{x,j + 1}}{S_{x,j}} \left[ x_{i,j} - \overline{x}_{j} \right] + \varepsilon_{i,j + 1} S_{x,j + 1} \sqrt{1 - r_{x,j}^{2}} $$
(2)

The random component \( \varepsilon_{i,j} \) is calculated from the following equation (Haan 1977):

$$ \varepsilon_{i,j} = \frac{2}{c_{\varepsilon,j}} \left[ 1 + \frac{c_{\varepsilon,j} z_{i,j}}{6} - \frac{c_{\varepsilon,j}^{2}}{36} \right]^{3} - \frac{2}{c_{\varepsilon,j}} $$
(3)

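where \( z_{i,j} \) is normally distributed with zero mean and unit standard deviation, as before, and \( c_{\varepsilon,j} \) is the skewness coefficient of the random component \( \varepsilon_{i,j} \), given by the following expression, in which \( c_{x,j} \) is the skewness coefficient of the flows in month j: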

$$ c_{\varepsilon,j} = \frac{c_{x,j} - r_{x,j - 1}^{3} c_{x,j - 1}}{\left[ 1 - r_{x,j - 1}^{2} \right]^{1.5}} $$
(4)
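Equations (3) and (4) can be sketched in NumPy as follows. Because month j is generated from month j - 1, the lag-one correlation entering both the numerator and the denominator of the residual skewness is taken here as \( r_{x,j-1} \); this follows from requiring the generated flows to reproduce \( c_{x,j} \) when \( \varepsilon \) is independent of the previous month. The function names and the target skewness of 1.0 in the check are hypothetical.

```python
import numpy as np


def residual_skewness(c_x, r):
    """Skewness of the random component in each month (Eq. 4).

    c_x[j] : skewness coefficient of the flows in month j
    r[j]   : lag-one correlation between month j and month j + 1
    Month j is generated from month j - 1, so the lag-one terms use index
    j - 1; np.roll wraps January back to December.
    """
    c_prev = np.roll(c_x, 1)  # c_{x,j-1}
    r_prev = np.roll(r, 1)    # r_{x,j-1}
    return (c_x - r_prev ** 3 * c_prev) / (1.0 - r_prev ** 2) ** 1.5


def skewed_deviate(z, c_eps):
    """Wilson-Hilferty transform of standard normal z (Eq. 3).

    The result has approximately zero mean, unit variance and skewness
    c_eps; for c_eps close to zero the normal deviate is returned unchanged.
    """
    if abs(c_eps) < 1e-8:
        return z
    return (2.0 / c_eps) * (1.0 + c_eps * z / 6.0 - c_eps ** 2 / 36.0) ** 3 - 2.0 / c_eps


# Check on a large sample of deviates with a hypothetical target skewness of 1.0
rng = np.random.default_rng(0)
eps = skewed_deviate(rng.standard_normal(200_000), 1.0)
d = eps - eps.mean()
sample_skew = (d ** 3).mean() / (d ** 2).mean() ** 1.5
```

In a full gamma-model generator, `skewed_deviate(z, c_eps[j])` would simply replace `z` in the Thomas–Fiering recursion of Eq. (2).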

Multi-site models

Multi-site modeling was first proposed by Fiering (1964), using a principal component model. Later, Matalas (1967) proposed a lag-one multivariate model. The multi-site seasonal AR(1) model (Matalas 1967) may be written as:

$$ {\mathbf{Z}}_{i,j} = {\mathbf{A}}_{j} {\mathbf{Z}}_{i,j - 1} + {\mathbf{B}}_{j} {\varvec{\upvarepsilon}}_{i,j} $$
(5)

where \( {\mathbf{Z}}_{i,j} \) is an \( (n \times 1) \) vector of standardized streamflow values at \( n \) sites (reservoirs). The subscripts \( i \) and \( j \) denote the year and season, where \( j = 1, 2, \ldots, w \), and \( w \) is the number of seasons in the year (\( w = 12 \) for a monthly model). \( {\varvec{\upvarepsilon}}_{i,j} \) is an \( (n \times 1) \) vector of serially and mutually independent random variables with zero mean and unit variance. \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) are \( (n \times n) \) coefficient matrices.

The vector \( {\mathbf{Z}}_{i,j} \) is assumed to be derived from the original series \( {\mathbf{X}}_{i,j} \) through a two-step process of standardization and, for non-normal models, normalization:

$$ y_{i,j}^{k} = \left[ {x_{i,j}^{k} - \overline{x}_{j}^{k} } \right]/S_{x,j}^{k} ,\quad k = 1, \ldots , n $$
(6)
$$ z_{i,j}^{k} = g_{i}^{k} \left( {{\text{y}}_{i,j}^{k} } \right), \quad k = 1, \ldots , n $$
(7)

In Eq. (6), \( x_{i,j}^{k} \) is the observed streamflow at site \( k \) in year \( i \) and month \( j \); it is the kth element of the vector \( {\mathbf{X}}_{i,j} \). The terms \( \overline{x}_{j}^{k} \) and \( S_{x,j}^{k} \) are the monthly mean and monthly standard deviation of the series \( \varvec{x}_{i}^{k} \), respectively, and \( y_{i,j}^{k} \) is the kth element of the standardized vector \( \varvec{y}_{i,j} \). In Eq. (7), \( g_{i}^{k}(\cdot) \) is a transformation function applied, for non-normal distributions, to normalize the standardized series. After the Z series has been generated, the inverse of this function is applied to recover the desired distribution.

Estimation of parameters

The parameter matrices \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) of Eq. (5) are estimated as follows (Haan 1977):

$$ {\mathbf{A}}_{j} = {\mathbf{M}}_{1,j} {\mathbf{M}}_{0,j - 1}^{ - 1} $$
(8)
$$ {\mathbf{B}}_{j} {\mathbf{B}}_{j}^{\text{T}} = {\mathbf{M}}_{0,j} - {\mathbf{M}}_{1,j} {\mathbf{M}}_{0,j - 1}^{ - 1} {\mathbf{M}}_{1,j}^{\text{T}} $$
(9)

where \( {\mathbf{M}}_{0,j} \) and \( {\mathbf{M}}_{1,j} \) are the lag-zero and lag-one cross-covariance matrices for month \( j \), and \( {\mathbf{M}}_{0,j - 1} \) is the lag-zero matrix of the preceding month (December of the previous year when \( j = 1 \)). These matrices are obtained from the following equations:

$$ {\mathbf{M}}_{0,j} = E\left( {{\mathbf{Z}}_{j}\,{\mathbf{Z}}_{j}^{\text{T}} } \right) $$
(10)
$$ {\mathbf{M}}_{1,j} = E\left( {{\mathbf{Z}}_{j}\,{\mathbf{Z}}_{j - 1}^{\text{T}} } \right) $$
(11)
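In practice, the expectations in Eqs. (10) and (11) are replaced by sample averages over the years of record. The sketch below assumes the flows are stored as a NumPy array of shape (years, 12, sites) and shows the normal-model case, where Z is simply the standardized series of Eq. (6); for the gamma model, Eq. (12) would be applied first. The helper names `standardize` and `lag_covariances` and the random example data are hypothetical. Population (ddof = 0) statistics are used so that each \( {\mathbf{M}}_{0,j} \) has a unit diagonal, and for January the lag-one partner is December of the previous year, so one year of pairs is lost for that month.

```python
import numpy as np


def standardize(x):
    """Eq. (6) for every month and site; x has shape (years, 12, sites)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def lag_covariances(z):
    """Sample estimates of M0_j (Eq. 10) and M1_j (Eq. 11).

    z has shape (years, 12, sites). Returns M0 and M1, each of shape
    (12, sites, sites), with M1[j][k, l] = mean of z_j^k * z_{j-1}^l.
    """
    years, w, n = z.shape
    m0 = np.empty((w, n, n))
    m1 = np.empty((w, n, n))
    for j in range(w):
        cur = z[:, j, :]
        m0[j] = cur.T @ cur / years
        if j == 0:
            # January of year i is paired with December of year i - 1
            cur_lag, prev = z[1:, 0, :], z[:-1, w - 1, :]
        else:
            cur_lag, prev = cur, z[:, j - 1, :]
        m1[j] = cur_lag.T @ prev / cur_lag.shape[0]
    return m0, m1


# Example on hypothetical data: 37 years, 12 months, 2 sites
rng = np.random.default_rng(0)
flows = rng.gamma(2.0, 50.0, size=(37, 12, 2))
Z = standardize(flows)
M0, M1 = lag_covariances(Z)
```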

Equation (9) does not determine \( {\mathbf{B}}_{j} \) uniquely; it has several solutions. Matalas (1967) suggested principal component analysis, but a more straightforward solution was proposed by Young and Pisano (1968), who took \( {\mathbf{B}}_{j} \) to be a lower triangular matrix, which can be obtained by Cholesky decomposition of the right-hand side of Eq. (9).
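The estimation of \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) can be sketched as follows. Multiplying Eq. (5) by \( {\mathbf{Z}}_{i,j-1}^{\text{T}} \) and taking expectations gives \( {\mathbf{M}}_{1,j} = {\mathbf{A}}_{j} {\mathbf{M}}_{0,j-1} \), so the sketch uses \( {\mathbf{A}}_{j} = {\mathbf{M}}_{1,j} {\mathbf{M}}_{0,j-1}^{-1} \) and \( {\mathbf{B}}_{j}{\mathbf{B}}_{j}^{\text{T}} = {\mathbf{M}}_{0,j} - {\mathbf{A}}_{j}{\mathbf{M}}_{1,j}^{\text{T}} \), taking \( {\mathbf{B}}_{j} \) as the lower-triangular Cholesky factor. The function name `ar1_parameters` and the two-site matrices in the example are hypothetical; `np.linalg.cholesky` raises an error if the right-hand side is not positive definite, which can happen with short or inconsistent records.

```python
import numpy as np


def ar1_parameters(m0, m1):
    """Seasonal multi-site AR(1) parameters A_j and lower-triangular B_j.

    m0, m1 : arrays of shape (12, n, n) with the lag-zero and lag-one
             cross-covariance matrices of the standardized series.
    Month j is regressed on month j - 1 (December for January).
    """
    w, n, _ = m0.shape
    a = np.empty_like(m1)
    b = np.empty_like(m1)
    for j in range(w):
        m0_prev = m0[j - 1]  # index -1 wraps to December
        # Solve A M0_prev = M1 without forming the inverse explicitly
        a[j] = np.linalg.solve(m0_prev.T, m1[j].T).T
        c = m0[j] - a[j] @ m1[j].T
        c = 0.5 * (c + c.T)           # symmetrize against round-off
        b[j] = np.linalg.cholesky(c)  # lower-triangular solution
    return a, b


# Hypothetical two-site example with the same matrices in every month
m0 = np.tile(np.array([[1.0, 0.6], [0.6, 1.0]]), (12, 1, 1))
m1 = np.tile(np.array([[0.5, 0.3], [0.2, 0.4]]), (12, 1, 1))
A, B = ar1_parameters(m0, m1)
```

For these matrices the sketch gives \( {\mathbf{A}} = [[0.5, 0], [-0.0625, 0.4375]] \) and \( {\mathbf{B}}{\mathbf{B}}^{\text{T}} = [[0.75, 0.5], [0.5, 0.8375]] \) in every month.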

Normal model

If the normalization step (Eq. 7) is omitted, then the model acts as a normal model.

Gamma model

For developing gamma model, the original series \( {\mathbf{X}}_{i,j} \) is first standardized using Eq. (6) and the standardized series \( {\mathbf{Y}}_{i,j} \) is normalized using the Wilson–Hilferty transformation as follows:

$$ z_{i,j}^{k} = \left[ \frac{6}{C_{y,j}^{k}} \right] \left\{ \left[ \frac{C_{y,j}^{k} y_{i,j}^{k}}{2} + 1 \right]^{1/3} - 1 \right\} + \frac{C_{y,j}^{k}}{6}, \quad k = 1, \ldots, n $$
(12)

where \( C_{y,j}^{k} \) is the monthly skewness coefficient of the series \( {\mathbf{y}}_{i}^{k} \), and \( z_{i,j}^{k} \) is the kth element of the normalized vector \( {\mathbf{Z}}_{i,j} \).
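The effect of the Wilson–Hilferty normalization in Eq. (12) can be checked numerically. The sketch below uses a synthetic gamma sample (shape parameter 2, skewness about 1.41) rather than the Konar or Barakar records: it standardizes the sample, estimates its skewness and applies Eq. (12), after which the skewness should be close to zero. The function names are hypothetical; `np.cbrt` keeps the cube root real if \( C y / 2 + 1 \) dips slightly below zero in the lower tail.

```python
import numpy as np


def skewness(v):
    """Sample skewness coefficient (moment estimator, no bias correction)."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


def wilson_hilferty_normalize(y, c):
    """Eq. (12): map standardized skewed values y to approximately N(0, 1)."""
    return (6.0 / c) * (np.cbrt(c * y / 2.0 + 1.0) - 1.0) + c / 6.0


# Synthetic standardized gamma sample
rng = np.random.default_rng(42)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)
y = (x - x.mean()) / x.std()
c = skewness(y)
z = wilson_hilferty_normalize(y, c)
```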

The lag-zero and lag-one cross-covariance matrices are then estimated from the Z series using Eqs. (10) and (11), and the parameter matrices \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) are estimated from Eqs. (8) and (9), with \( {\mathbf{B}}_{j} \) taken as a lower triangular matrix.

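Next, standard normal random vectors \( {\varvec{\upvarepsilon}}_{i,j} \) are generated, one for each year and month, for a total of \( N \times w \times n \) deviates, where N is the number of years of flows to be generated and n is the number of sites; \( {\mathbf{Z}}_{i,j} \) is then generated recursively from Eq. (5).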

After the standard normal vectors \( {\mathbf{Z}}_{i,j} \) have been generated, the inverse of Eq. (12) is applied to transform them into the standardized gamma vectors \( {\mathbf{Y}}_{i,j} \):

$$ y_{i,j}^{k} = \left[ {\frac{2}{{C_{y,j}^{k} }}} \right]\left\{ {\left[ {\left( {\frac{{C_{y,j}^{k} }}{6}} \right)\left( {z_{i,j}^{k} - \frac{{C_{y,j}^{k} }}{6}} \right) + 1} \right]^{3} - 1} \right\},\quad k = 1, \ldots , n $$
(13)

Finally, the flow series is recovered by reversing the standardization of Eq. (6):

$$ x_{i,j}^{k} = S_{x,j}^{k} y_{i,j}^{k} + \overline{x}_{j}^{k} \quad , k = 1, \ldots , n $$
(14)
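The generation steps (recursive use of Eq. (5), the inverse transformation of Eq. (13) and the de-standardization of Eq. (14)) can be combined as in the sketch below. It assumes \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) have already been estimated, that every monthly skewness coefficient is non-zero, and that a few warm-up years are discarded as in the single-site case; the function names and example parameters are hypothetical. Negative flows, if any, are left unclipped in this sketch.

```python
import numpy as np


def inverse_wilson_hilferty(z, c):
    """Eq. (13): map normal values back to standardized skewed values."""
    return (2.0 / c) * (((c / 6.0) * (z - c / 6.0) + 1.0) ** 3 - 1.0)


def generate_multisite(a, b, skew, mean, std, n_years, warmup_years=5, seed=None):
    """Simulate Z with Eq. (5), then apply Eqs. (13) and (14).

    a, b            : (12, n, n) parameter matrices A_j and B_j
    skew, mean, std : (12, n) monthly skewness, mean and standard deviation
    Returns flows of shape (n_years, 12, n).
    """
    rng = np.random.default_rng(seed)
    w, n, _ = a.shape
    total = n_years + warmup_years
    z = np.empty((total, w, n))
    z_prev = np.zeros(n)  # arbitrary start; warm-up years are dropped
    for i in range(total):
        for j in range(w):
            eps = rng.standard_normal(n)  # one n-vector per year and month
            z_prev = a[j] @ z_prev + b[j] @ eps
            z[i, j] = z_prev
    y = inverse_wilson_hilferty(z[warmup_years:], skew)  # broadcasts over years
    return std * y + mean                                # Eq. (14)


# Hypothetical two-site parameters (lag-zero cross-correlation 0.6)
a = np.tile(np.array([[0.5, 0.0], [-0.0625, 0.4375]]), (12, 1, 1))
b = np.tile(np.linalg.cholesky(np.array([[0.75, 0.5], [0.5, 0.8375]])), (12, 1, 1))
skew = np.full((12, 2), 1.2)
mean = np.full((12, 2), 100.0)
std = np.full((12, 2), 30.0)
flows = generate_multisite(a, b, skew, mean, std, n_years=100, seed=3)
```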

Results and discussion

After the models were developed, 100-year sequences of monthly flows were generated for each of the two rivers, Konar and Barakar. The statistical parameters of the models were estimated from thirty-seven years of observed flows in these two rivers. The generated series of the two rivers are compared with the corresponding observed series in terms of the mean, standard deviation and skewness coefficient of each month, the serial correlation of successive months, and the cross-correlation between the flows in the two rivers. The comparisons are plotted in Figs. 2, 3, 4 and 5 for the Konar River with the single-site model, in Figs. 6, 7, 8 and 9 for the Barakar River with the single-site model, in Fig. 10 for both rivers with the single-site model, in Figs. 11, 12, 13 and 14 for the Konar River with the multi-site model, in Figs. 15, 16, 17 and 18 for the Barakar River with the multi-site model, and in Fig. 19 for both rivers with the multi-site model.
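The five comparison statistics can be computed month by month with a small helper, applied identically to the observed and generated series. The sketch assumes each series is stored as a NumPy array of shape (years, 12, sites) with at least two sites; the serial correlation for month j is taken between month j and month j + 1 (December paired with the following January), matching \( r_{x,j} \) in Eq. (1). The function name and the synthetic example data are hypothetical.

```python
import numpy as np


def monthly_statistics(x):
    """Monthly statistics for comparing generated and historical flows.

    x has shape (years, 12, sites). Returns a dict with mean, std and skew
    (each (12, sites)), serial correlation between month j and month j + 1
    (12, sites), and the cross-correlation between sites 0 and 1 (12,).
    """
    years, w, n = x.shape
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    skew = ((x - mean) ** 3).mean(axis=0) / std ** 3
    serial = np.empty((w, n))
    for j in range(w):
        if j == w - 1:
            # December of year i is paired with January of year i + 1
            cur, nxt = x[:-1, j, :], x[1:, 0, :]
        else:
            cur, nxt = x[:, j, :], x[:, j + 1, :]
        for k in range(n):
            serial[j, k] = np.corrcoef(cur[:, k], nxt[:, k])[0, 1]
    cross = np.array([np.corrcoef(x[:, j, 0], x[:, j, 1])[0, 1] for j in range(w)])
    return {"mean": mean, "std": std, "skew": skew, "serial": serial, "cross": cross}


# Synthetic check: site 1 is a linear function of site 0
rng = np.random.default_rng(0)
base = rng.gamma(2.0, 50.0, size=(37, 12, 1))
x = np.concatenate([base, 2.0 * base + 5.0], axis=2)
stats = monthly_statistics(x)
```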

Fig. 2 Historical and generated monthly means of flows in Konar River (single-site model)

Fig. 3 Historical and generated monthly standard deviations of flows in Konar River (single-site model)

Fig. 4 Historical and generated monthly skewness coefficients of flows in Konar River (single-site model)

Fig. 5 Historical and generated monthly serial correlations of flows in Konar River (single-site model)

Fig. 6 Historical and generated monthly means of flows in Barakar River (single-site model)

Fig. 7 Historical and generated monthly standard deviations of flows in Barakar River (single-site model)

Fig. 8 Historical and generated monthly skewness coefficients of flows in Barakar River (single-site model)

Fig. 9 Historical and generated monthly serial correlations of flows in Barakar River (single-site model)

Fig. 10 Historical and generated monthly cross-correlations of flows in Konar and Barakar (single-site model)

Fig. 11 Historical and generated monthly means of flows in Konar River (multi-site model)

Fig. 12 Historical and generated monthly standard deviations of flows in Konar River (multi-site model)

Fig. 13 Historical and generated monthly skewness coefficients of flows in Konar River (multi-site model)

Fig. 14 Historical and generated monthly serial correlations of flows in Konar River (multi-site model)

Fig. 15 Historical and generated monthly means of flows in Barakar River (multi-site model)

Fig. 16 Historical and generated monthly standard deviations of flows in Barakar River (multi-site model)

Fig. 17 Historical and generated monthly skewness coefficients of flows in Barakar River (multi-site model)

Fig. 18 Historical and generated monthly serial correlations of flows in Barakar River (multi-site model)

Fig. 19 Historical and generated monthly cross-correlations of flows in Konar and Barakar (multi-site model)

Results from single site models

Figure 2 presents the monthly means obtained from the single-site normal and gamma models, along with those of the observed series, for the Konar River. Both models generated mean values almost equal to those of the observed series, except for August. Figure 3 shows the corresponding monthly standard deviations, which also agree very closely with the observed values, except for August and September. The situation is different for the skewness coefficient (Fig. 4): the gamma model yields values similar to those of the observed series, whereas the normal model yields values close to zero. This is expected, since the normal distribution is symmetric; the small skewness that does appear results from setting negative generated values to zero. The serial correlations in Fig. 5 show that all three curves are quite close.

Similar comparative plots of the statistical parameters were obtained for the Barakar River, as shown in Figs. 6, 7, 8 and 9. For the mean values, Fig. 6 shows that both models yielded results very close to the observed series, except for August, and the gamma model performed comparatively better. For the monthly standard deviation (Fig. 7), both the normal and the gamma models also produced results very close to the observed series, except for August and September. For the skewness coefficient (Fig. 8), the results of the gamma model are quite similar to the observed series, whereas those of the normal model differ considerably, as for the Konar River. As for the serial correlation, Fig. 9 shows that the results of both models are very close to the observed values.

The single-site models were developed separately for each river, so the model for one river has no knowledge of the flows in the other. Purely for comparison, cross-correlations were nevertheless computed from the two generated series for Konar and Barakar, for each model. These values are shown in Fig. 10. As expected, the generated cross-correlations did not match those of the observed series at all.

Results from multi-site models

Plots comparing the mean, standard deviation, skewness and serial correlation obtained from the multi-site models under the two distributions with those of the observed series are shown in Figs. 11, 12, 13 and 14 for the Konar River and in Figs. 15, 16, 17 and 18 for the Barakar River. As with the single-site models, both the normal and the gamma models preserved the mean and standard deviation very well for both Konar and Barakar. For the skewness coefficient, the gamma model yielded much better results than the normal model. The serial correlations, however, differ considerably from those of the observed series for both models and both rivers.

Regarding the cross-correlation between the flows in the two rivers, Fig. 19 shows that both models produced excellent results, with values almost equal to those of the observed series.

Conclusion

A comparative study of the performance of single-site and multi-site AR models for the synthetic generation of flows in an existing river and its tributary is presented in this paper. For the distribution of the flows, both the normal and the gamma distributions are used and compared, the gamma distribution being used to account for any skewness in the series. Results indicate that both the single-site and the multi-site models, with either distribution, preserve the mean and standard deviation very well for both rivers; the single-site models also preserve the serial correlation closely, whereas the multi-site models reproduce it less well. The gamma model is, however, much better than the normal model at preserving skewness, as expected, since the normal distribution has zero skewness. Cross-correlation, which is not preserved at all by the single-site models, is preserved excellently by the multi-site models. Hence, where preservation of the mean, standard deviation, serial correlation and skewness is needed, a single-site model with the gamma distribution can be used. If preservation of cross-correlation is required, a multi-site model with the gamma distribution should be used. It should be noted that the performance of an AR model depends on its parameters, which in turn depend on the length of the observed record and the variations in flow characteristics captured in it. Hence, these conclusions are specific to the river system studied, although they may apply to river systems with similar flow characteristics.