Introduction

A subject of paramount interest in the planning and design of water works is low flow frequency analysis. Because design values are linked to a return period or to an exceedance probability, mathematical models known as probability distribution functions are required. Among the probability distribution functions most widely used in low flow frequency analysis are (Kite 1988; Salas and Smith 1980; Rao and Hamed 2000; Raynal-Villasenor 2010):

  1. Three-parameter log-normal (LN3)

  2. Pearson type III (PIII)

  3. Extreme value type III (EVIII)

  4. General extreme value for the minima (GEVM)

The first three probability distribution functions were applied to low flow frequency analysis by Matalas (1963). Gumbel (1958) developed the theoretical grounds and hydrological applications of the extreme value type III distribution for the minima (EVIIIM), the well-known Weibull distribution. This distribution has been applied since the first third of the twentieth century to the analysis of the dynamic breaking strength of materials (Weibull 1939a, b). Kite (1988) provided a computer program to estimate the parameters of the EVIIIM distribution using the method of moments (MOM) and the method of maximum likelihood (ML). More recently, Lee and Kim (2008) used the two-parameter Weibull distribution with Bayesian Markov chain Monte Carlo and maximum likelihood estimates to assess the uncertainty of low flow frequency analysis. The ML estimation of the parameters of the EVIIIM distribution presents some difficulties when the Newton–Raphson method is used, as pointed out by Offinger (1996).

Durrans and Tomic (2001) compared five methods of parameter estimation for the log-normal distribution in fitting the lower tail of that distribution. Smakhtin (2001) reviewed 20 years of research results in low flow hydrology. Yue and Wang (2004) studied the scaling of Canadian rivers to regionalize low flows. Taha et al. (2008) presented a brief review of the statistical models commonly used in the estimation of low flows, both at sites with a reliable stream flow record and at sites remote from data sources. Hao and Singh (2009) applied the maximum entropy method to the Burr III distribution and compared the results with those of the MOM, ML and probability-weighted moments (PWM); they found no differences in the quantiles for small return periods, while the differences increased for large return periods. Iacobellis (2008) studied the evaluation of flow duration curves with an assigned T-year return period using beta and complementary beta distributions.

The use of the general extreme value distribution for the minima (GEVM) with moment estimators for the parameters, quantiles and confidence limits is proposed in this paper. A complete example of application of the proposed methodology is included, using the common spreadsheet framework provided by Excel® (Excel is a registered trademark of Microsoft Corporation, Inc.). The results are compared with those of the other three distribution functions mentioned above.

Probability distribution and density functions of the GEVM

The probability distribution function of the GEVM distribution is (Raynal-Villasenor and Douriet-Cardenas 1994):

$$ \Uppi (x) = \exp \left\{ { - [1 - \beta (\omega - x)/\alpha ]^{1/\beta } } \right\} $$
(1)

where ω, α and β are the location, scale and shape parameters, respectively. Π(x) is the probability-distribution function of the random variable x and for the case of low flow frequency analysis is equal to the exceedance probability, Pr(X > x). The scale parameter must meet the condition that α > 0. The domain of variable x in GEVM distribution is as follows:

  1. For β < 0:

$$ - \infty < x \le \omega - \alpha /\beta $$
(2)
  2. For β > 0:

$$ \omega - \alpha /\beta \le x < \infty $$
(3)
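Eq. (1) and its domain can be evaluated with a few lines of code. The following Python sketch is illustrative only: the function name `gevm_exceedance` is hypothetical, and returning 1 or 0 outside the domain is an assumption that follows from the bound ω − α/β being a lower bound for β > 0 and an upper bound for β < 0.

```python
import math


def gevm_exceedance(x, omega, alpha, beta):
    """Exceedance probability Pr(X > x) of the GEVM, Eq. (1), for beta != 0."""
    if alpha <= 0.0:
        raise ValueError("the scale parameter alpha must be positive")
    if beta == 0.0:
        raise ValueError("Eq. (1) is written for a non-zero shape parameter")
    u = 1.0 - beta * (omega - x) / alpha
    if u <= 0.0:
        # x is outside the domain of the distribution (assumed handling):
        #   beta > 0: x is below the lower bound, so Pr(X > x) = 1
        #   beta < 0: x is above the upper bound, so Pr(X > x) = 0
        return 1.0 if beta > 0.0 else 0.0
    return math.exp(-(u ** (1.0 / beta)))
```

At x = ω the bracket in Eq. (1) equals 1, so the function returns exp(−1) for any admissible α and β, which is a convenient spot check.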

The probability density function for the GEVM distribution is (Raynal-Villasenor and Douriet-Cardenas 1994):

$$ \pi (x) = \frac{1}{\alpha }\exp \left\{ { - [1 - \beta (\omega - x)/\alpha ]^{1/\beta } } \right\}[1 - \beta (\omega - x)/\alpha ]^{1/\beta - 1} $$
(4)

where π(x) is the probability density function of the random variable x.
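Since Π(x) in Eq. (1) is an exceedance probability, the density in Eq. (4) should equal minus the derivative of Π(x). The sketch below, with hypothetical function names and an arbitrary step size `h`, codes Eq. (4) and compares it with a central-difference derivative of Eq. (1); it is a consistency check under these assumptions, not part of the original methodology.

```python
import math


def gevm_exceedance(x, omega, alpha, beta):
    """Pr(X > x) from Eq. (1); values outside the domain map to 1 or 0."""
    u = 1.0 - beta * (omega - x) / alpha
    if u <= 0.0:
        return 1.0 if beta > 0.0 else 0.0
    return math.exp(-(u ** (1.0 / beta)))


def gevm_pdf(x, omega, alpha, beta):
    """Probability density of the GEVM, Eq. (4); zero outside the domain."""
    u = 1.0 - beta * (omega - x) / alpha
    if u <= 0.0:
        return 0.0
    return (1.0 / alpha) * math.exp(-(u ** (1.0 / beta))) * u ** (1.0 / beta - 1.0)


def numerical_pdf(x, omega, alpha, beta, h=1e-6):
    """Minus the central-difference derivative of the exceedance probability."""
    upper = gevm_exceedance(x + h, omega, alpha, beta)
    lower = gevm_exceedance(x - h, omega, alpha, beta)
    return -(upper - lower) / (2.0 * h)
```

For interior points of the domain the two functions should agree to within the truncation error of the finite difference.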

Moment estimators for the parameters of the GEVM distribution

The moment estimators for the parameters of the GEVM distribution have the following expressions:

  1. Location parameter:

$$ \hat{\omega } = \hat{A} + \frac{{\hat{\alpha }}}{{\hat{\beta }}} $$
(5)
$$ \hat{A} = \hat{\mu } - \frac{{\hat{\alpha }}}{{\hat{\beta }}}\Upgamma (1 + \hat{\beta }) $$
(6)
  2. Scale parameter:

$$ \hat{\alpha } = |\hat{\beta }|\,\hat{B} $$
(7)
$$ \hat{B} = \left( {\frac{{\hat{\sigma }^{2} }}{{\sigma_{z}^{2} }}} \right)^{1/2} = \frac{{\hat{\sigma }}}{{\sigma_{z} }} $$
(8)
$$ \sigma_{z}^{2} = \Upgamma (1 + 2\beta ) - \Upgamma^{2} (1 + \beta ) $$
(9)
  3. Shape parameter:

For β < 0 and \( - 19.0 < \hat{\gamma } \le - 1.1396 \):

$$ \hat{\beta } = 0.24662 + 0.286678\hat{\gamma } + 0.072454\hat{\gamma }^{2} + 0.010176\hat{\gamma }^{3} + 0.000816\hat{\gamma }^{4} + 0.000037\hat{\gamma }^{5} $$
(10)

For β > 0 and \( - 1.1396 \le \,\,\hat{\gamma } < 11.35 \):

$$ \hat{\beta } = 0.279434 - 0.333535\hat{\gamma } + 0.048305\hat{\gamma }^{2} + 0.024414\hat{\gamma }^{3} + 0.003765\hat{\gamma }^{4} - 0.000263\hat{\gamma }^{5} $$
(11)

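where \( \hat{\omega } \), \( \hat{\alpha } \) and \( \hat{\beta } \) are the moment estimators of the location, scale and shape parameters of the GEVM distribution, \( \hat{\mu } \), \( \hat{\sigma } \) and \( \hat{\gamma } \) are the sample mean, standard deviation and skewness coefficient, and Γ(.) is the complete gamma function.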
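The estimation sequence (shape first, then scale, then location) can be sketched in Python as follows. Two points are assumptions of this sketch rather than the procedure described above: the shape parameter is obtained by bisection on the exact skewness relation implied by Eq. (1), instead of the polynomial approximations of Eqs. (10) and (11), and the scale parameter is taken as |β̂|B̂, which follows from Var(x) = (α/β)²σ_z² together with α > 0. All function names are hypothetical.

```python
import math

GUMBEL_SKEW = -1.1396  # skewness in the limit beta -> 0


def gevm_skewness(beta):
    """Theoretical skewness of the GEVM for a non-zero beta > -1/3."""
    g1 = math.gamma(1.0 + beta)
    g2 = math.gamma(1.0 + 2.0 * beta)
    g3 = math.gamma(1.0 + 3.0 * beta)
    skew = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5
    # alpha / beta changes sign with beta, which flips the skewness of x
    return skew if beta > 0.0 else -skew


def estimate_beta(skew_hat, iterations=200):
    """Solve gevm_skewness(beta) = skew_hat by bisection (assumed approach)."""
    if abs(skew_hat - GUMBEL_SKEW) < 0.01:
        raise ValueError("skewness too close to the beta -> 0 limit")
    lo, hi = (1e-3, 3.0) if skew_hat > GUMBEL_SKEW else (-0.33, -1e-3)
    if not gevm_skewness(lo) <= skew_hat <= gevm_skewness(hi):
        raise ValueError("skewness outside the bracketed range")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # the skewness increases with beta, so move the matching end point
        if gevm_skewness(mid) < skew_hat:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gevm_moment_estimates(mean, std, skew):
    """Return (omega, alpha, beta) from the sample mean, std and skewness."""
    beta = estimate_beta(skew)
    g1 = math.gamma(1.0 + beta)
    g2 = math.gamma(1.0 + 2.0 * beta)
    sigma_z = math.sqrt(g2 - g1 ** 2)      # Eq. (9)
    b = std / sigma_z                      # Eq. (8)
    alpha = abs(beta) * b                  # keeps alpha > 0 (assumption, see text)
    a = mean - (alpha / beta) * g1         # Eq. (6)
    omega = a + alpha / beta               # Eq. (5)
    return omega, alpha, beta
```

A simple check is the case ω = 0, α = 1, β = 1, for which x + 1 follows a unit exponential distribution with mean 0, standard deviation 1 and skewness 2; `gevm_moment_estimates(0.0, 1.0, 2.0)` should recover those parameters.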

Design values for the GEVM distribution

The design values (quantiles) for the GEVM distribution can be obtained by inverting the GEVM distribution function:

$$ Q_{T} = \omega + \frac{\alpha }{\beta }\left\{ {\left[ { - {\text{Ln}}\left( {1 - \frac{1}{{T_{\text{r}} }}} \right)} \right]^{\beta } - 1} \right\} $$
(12)

where \( Q_{T} \) are the design values and \( T_{\text{r}} \) is the return period associated with such design values.
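Eq. (12) translates directly into code. In the Python sketch below the function name `gevm_quantile` is hypothetical and the parameter values used in the check are arbitrary illustrations, not results from the paper.

```python
import math


def gevm_quantile(return_period, omega, alpha, beta):
    """Design value Q_T of the GEVM for a return period T_r > 1, Eq. (12)."""
    if return_period <= 1.0:
        raise ValueError("the return period must be greater than 1")
    y = -math.log(1.0 - 1.0 / return_period)
    return omega + (alpha / beta) * (y ** beta - 1.0)
```

Substituting the result back into Eq. (1) should return the exceedance probability 1 − 1/T_r, and for β > 0 the design value decreases as the return period grows, as expected for low flows.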

Confidence limits for the design values of the GEVM distribution

The moment confidence limits for the GEVM distribution are computed through the following formula:

$$ x_{\rm l} = Q_{T} \pm z_{\alpha } S_{T} $$
(13)

where \( x_{\rm l} \) is the confidence limit (lower or upper), \( Q_{T} \) is the design value, \( z_{\alpha } \) is the standard normal value corresponding to a confidence level of α, and \( S_{T} \) is the standard deviation of the estimates. The square of this standard deviation is:

$$ S_{T}^{2} = \frac{{\mu_{2} }}{N}\left\{ {1 + K_{T} \hat{\gamma } + \frac{{K_{T}^{2} }}{4}(\hat{\kappa } - 1) + \frac{{\partial K_{T} }}{{\partial \hat{\gamma }}}\left[ {2\hat{\kappa } - 3\hat{\gamma }^{2} - 6 + K_{T} \left( {\hat{\lambda }_{1} - \frac{{6\hat{\gamma }\,\hat{\kappa }}}{4} - \frac{{10\hat{\gamma }}}{4}} \right)} \right]} \right. $$
(14)

where \( S_{T}^{2} \) is the variance of the estimates, \( \mu_{2} \) is the sample variance, N is the sample size, \( K_{T} \) is the frequency factor, γ is the skewness coefficient, κ is the kurtosis coefficient and \( \lambda_{1} \) is a function of moments. Then, the frequency factor is:

$$ K_{T} \, = B_{K} \,\left[ {\left( { - {\text{Ln}}\left( {1 - \frac{1}{{T_{r} }}} \right)} \right)^{\beta } - A_{K} } \right] $$
(15)
$$ A_{K} = \Upgamma (1 + \beta ) $$
(16)
$$ B_{K} = \frac{1}{{\left[ {\Upgamma (1 + 2\beta ) - \Upgamma^{2} (1 + \beta )} \right]^{1/2} }} $$
(17)
$$ \frac{{\partial K_{T} }}{\partial \gamma } = \left( {\frac{{\partial K_{T} }}{\partial \beta }} \right)\left( {\frac{\partial \beta }{\partial \gamma }} \right) $$
(18)
$$ \frac{{\partial K_{T} }}{\partial \beta } = \left\{ {\frac{{[y^{\beta } {\text{Ln}}(y) - G_{1} P_{1} ] - \frac{1}{2}[y^{\beta } - G_{1} ]\left[ {G_{2} - G_{1}^{2} } \right]^{ - 1} \left[ {G_{2} P_{2} - 2G_{1}^{2} P_{1} } \right]}}{{\left[ {G_{2} - G_{1}^{2} } \right]^{1/2} }}} \right\} $$
(19)
$$ \frac{\partial \gamma }{\partial \beta } = \left\{ {\frac{{\left[ {G_{3} P_{3} + 3G_{1} \left( {P_{1} \left( {G_{1}^{2} - G_{2} } \right) - G_{2} P_{2} } \right)} \right] - \frac{3}{2}\left[ {G_{2} - G_{1}^{2} } \right]^{ - 1} \left[ {G_{3} - 3G_{1} G_{2} + 2G_{1}^{3} } \right]\left[ {G_{2} P_{2} - 2G_{1}^{2} P_{1} } \right]}}{{\left[ {G_{2} - G_{1}^{2} } \right]^{3/2} }}} \right\} $$
(20)
$$ y = - {\text{Ln}}\left( {1 - \frac{1}{{T_{r} }}} \right) $$
(21)
$$ G_{r} = \Upgamma (1 + r\beta ) $$
(22)
$$ P_{r} = \psi (1 + r\beta ) $$
(23)

where ψ(.) is the digamma function. Substituting Eqs. (19) and (20) into Eq. (18) gives:

$$ \frac{{\partial K_{T} }}{\partial \gamma } = \frac{{\left[ {G_{2} - G_{1}^{2} } \right]\left\{ {[y^{\beta } {\text{Ln}}(y) - G_{1} P_{1} ]} \right\} - \frac{1}{2}\left[ {y^{\beta } - G_{1} } \right]\left[ {G_{2} P_{2} - 2G_{1}^{2} P_{1} } \right]}}{{\left\{ {\left[ {G_{3} P_{3} + 3G_{1} \left( {P_{1} \left( {G_{1}^{2} - G_{2} } \right) - G_{2} P_{2} } \right)} \right] - \frac{3}{2}\left[ {G_{2} - G_{1}^{2} } \right]^{ - 1} \left[ {G_{3} - 3G_{1} G_{2} + 2G_{1}^{3} } \right]\left[ {G_{2} P_{2} - 2G_{1}^{2} P_{1} } \right]} \right\}}} $$
(24)
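A numerical cross-check of the frequency factor and of the chain rule in Eq. (18) can be sketched as follows. The sketch makes several assumptions that are not part of the original text: y is taken as −Ln(1 − 1/T_r), the argument that appears in Eq. (15); the derivatives are approximated by central differences rather than by the closed forms of Eqs. (19), (20) and (24); only β > 0 is considered; and the confidence limits of Eq. (13) are treated as a two-sided interval for a supplied value of S_T. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def frequency_factor(return_period, beta):
    """K_T from Eqs. (15)-(17), evaluated as written (checked here for beta > 0)."""
    y = -math.log(1.0 - 1.0 / return_period)
    g1 = math.gamma(1.0 + beta)
    g2 = math.gamma(1.0 + 2.0 * beta)
    return (y ** beta - g1) / math.sqrt(g2 - g1 ** 2)


def skewness(beta):
    """Skewness as a function of beta > 0, written with G_r from Eq. (22)."""
    g1 = math.gamma(1.0 + beta)
    g2 = math.gamma(1.0 + 2.0 * beta)
    g3 = math.gamma(1.0 + 3.0 * beta)
    return (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5


def dkt_dgamma(return_period, beta, h=1e-5):
    """Chain rule of Eq. (18) with central-difference derivatives (assumption)."""
    dk_dbeta = (frequency_factor(return_period, beta + h)
                - frequency_factor(return_period, beta - h)) / (2.0 * h)
    dgamma_dbeta = (skewness(beta + h) - skewness(beta - h)) / (2.0 * h)
    return dk_dbeta / dgamma_dbeta


def confidence_limits(q_t, s_t, level=0.95):
    """Lower and upper limits of Eq. (13), assuming a two-sided interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return q_t - z * s_t, q_t + z * s_t
```

For β > 0 the frequency factor reproduces the design value of Eq. (12) through Q_T = μ + K_T σ, which gives an independent way to test an implementation. Finite differences of this kind are also a practical way to validate hand-coded closed-form derivatives in a spreadsheet.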

Goodness of fit tests for the parameters of the GEVM distribution

The two goodness of fit tests considered in this paper are:

  1. Standard error of fit, SEF (Kite 1988)

    $$ {\text{SEF}} = \left[ {\frac{{\sum\nolimits_{i = 1}^{N} {(x_{i} - y_{i} )^{2} } }}{{(N - n_{\text{p}} )}}} \right]^{1/2} $$
    (25)

    where \( x_{i} \) are the historical values of the sample in descending order, \( y_{i} \) are the values produced by the distribution function for the same return periods as the historical values, N is the sample size, and \( n_{\text{p}} \) is the number of parameters of the distribution function, in this case \( n_{\text{p}} = 3 \).

  2. Mean absolute relative deviation, MARD (Jain and Singh 1987)

    $$ {\text{MARD}} = \frac{100}{N}\sum\limits_{i = 1}^{N} {\left| {\frac{{(x_{i} - y_{i} )}}{{x_{i} }}} \right|} $$
    (26)
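Both measures in Eqs. (25) and (26) are simple sums over paired observed and fitted values. The Python sketch below uses hypothetical function names, and the numbers in the usage note are made up for illustration only; they are not data from gauging station Villalba.

```python
import math


def standard_error_of_fit(observed, fitted, n_params=3):
    """SEF of Eq. (25); n_params is the number of distribution parameters."""
    n = len(observed)
    if n != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    if n <= n_params:
        raise ValueError("the sample size must exceed the number of parameters")
    squared_errors = sum((x - y) ** 2 for x, y in zip(observed, fitted))
    return math.sqrt(squared_errors / (n - n_params))


def mean_absolute_relative_deviation(observed, fitted):
    """MARD of Eq. (26), in percent; observed values must be non-zero."""
    n = len(observed)
    if n != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return 100.0 / n * sum(abs((x - y) / x) for x, y in zip(observed, fitted))
```

With the illustrative values `observed = [4.0, 3.0, 2.0, 1.0, 0.5]` and `fitted = [4.1, 2.9, 2.0, 1.1, 0.5]`, the sum of squared errors is 0.03, so the SEF is the square root of 0.03/2. Because the MARD divides by each observed value, a zero flow in the record would need separate treatment.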

Numerical example

The gauging station Villalba is located on the San Pedro River in Northwestern Mexico. Its sample of annual one-day low flows was selected for analysis using the GEVM distribution, with the MOM used to estimate the parameters, design values and confidence limits.

The geographical location of gauging station Villalba, Mexico is shown in Fig. 1.

Fig. 1 Location of gauging station Villalba, Mexico

The first step in the computations is to obtain the basic statistics of the one-day low flow sample. These statistics were computed in an Excel® spreadsheet and are shown in Fig. 2.

Fig. 2 Data statistics for gauging station Villalba, Mexico
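The basic statistics needed by the moment estimators are the sample mean, standard deviation and skewness coefficient. The sketch below assumes the estimators computed by the Excel functions AVERAGE, STDEV.S and SKEW; the source does not state which skewness estimator was used, so this choice and the function name `sample_statistics` are assumptions.

```python
import math


def sample_statistics(flows):
    """Return the mean, standard deviation and skewness of a sample.

    The standard deviation uses the n - 1 divisor and the skewness uses the
    n / ((n - 1)(n - 2)) adjustment, as in Excel's STDEV.S and SKEW (assumed).
    """
    n = len(flows)
    if n < 3:
        raise ValueError("at least three values are needed for the skewness")
    mean = sum(flows) / n
    variance = sum((x - mean) ** 2 for x in flows) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("the skewness is undefined for a constant sample")
    cubes = sum(((x - mean) / std) ** 3 for x in flows)
    skew = n / ((n - 1) * (n - 2)) * cubes
    return mean, std, skew
```

The three returned values play the roles of \( \hat{\mu } \), \( \hat{\sigma } \) and \( \hat{\gamma } \) in the moment estimators.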

The parameters, the goodness of fit measures, and the design values with their confidence limits, all obtained in the Excel® spreadsheet, are shown in Fig. 3.

Fig. 3 Estimation of parameters (GEVM-MOM) and goodness of fit measures for gauging station Villalba, Mexico

The comparison between the histogram of the low flow data and the theoretical probability density function is shown in Fig. 4. Figure 5 shows the empirical and theoretical frequency curves obtained by fitting the GEVM, PIII, EVIIIM and LN3 distributions by the MOM to the one-day low flow sample of gauging station Villalba, Mexico. Figure 6 shows the design values obtained by the MOM and their confidence limits. All of these figures were produced in the Excel® spreadsheet.

Fig. 4 Histogram and theoretical probability density function for gauging station Villalba, Mexico

Fig. 5 Empirical and theoretical frequency curves for several models applied to one-day low flow data at gauging station Villalba, Mexico

Fig. 6 Empirical and theoretical frequency curves and their confidence limits for one-day low flow data at gauging station Villalba, Mexico

Discussion of results

The numerical example shows how easy the proposed methodology is to use. In the Excel® spreadsheet the formulas and results are in view at all times, so a possible error can be spotted very easily.

The tables shown in Fig. 3 contain all the results required for a low flow frequency analysis of a particular set of low flow data: the values of the parameters, the goodness of fit measures, the design values for several return periods and their confidence limits. Two different measures of goodness of fit are provided to choose among competing models.

The graphs produced in the Excel® spreadsheet show how well a particular probability distribution function fits a particular set of data. They comprise the graph of the low flow data and the fitted model (Fig. 5), the graph of the theoretical probability density function and the histogram of the low flow data (Fig. 4), and the graph of the confidence limits, the fitted model and the low flow data (Fig. 6).

Conclusions

A methodology has been presented for low flow frequency analysis using the GEVM distribution coupled with the MOM. The use of the common spreadsheet framework provided by Excel® is particularly useful in education and training. The proposed methodology compares well with the existing probability distribution functions when the MOM is applied. Its straightforward application to real data, as shown in the example contained in the paper, makes it a versatile tool to train students or technical personnel in the field with a personal computer and a printer.