Probability of correct selection from lognormal and convective diffusion models based on the likelihood ratio

Original Paper, Stochastic Environmental Research and Risk Assessment

Abstract

The objective of the paper is to show that using a discrimination procedure to select a flood frequency model, without knowing how the procedure performs for the underlying distributions considered, may lead to erroneous conclusions. The problem considered is that of choosing between the lognormal (LN) and convective diffusion (CD) distributions for a given random sample of flood observations. The probability density functions of these distributions are similarly shaped in the range of the main probability mass, and the discrepancies grow as the coefficient of variation (\(C_V\)) increases. The problem was addressed using the likelihood ratio (LR) procedure. Simulation experiments were performed to determine the probability of correct selection (PCS) for the LR method. Pseudo-random samples were generated from each of the two distributions for several combinations of sample size and coefficient of variation. Surprisingly, the PCS of the LN model was half that of the CD model, rarely exceeding 50%. The simulation results were analyzed and compared both with those obtained using real data and with the results of another selection procedure, known as the QK method. The results from the QK method are just the opposite of those from the LR procedure.

References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr AC-19(6):716–722

  • Atkinson AC (1970) A method of discriminating between models. J Royal Statistical Society B 32:323–345

  • Bain LJ, Engelhardt M (1980) Probability of correct selection of Weibull versus Gamma based on likelihood ratio. Commun Statist-Theor Meth A 9(4):375–381

  • Berger JO (1985) Statistical decision theory and Bayesian analysis. Springer, Berlin Heidelberg New York

  • Bernardo JM, Smith AFM (2000) Bayesian theory (Wiley Series in Probability and Statistics). John Wiley & Sons

  • Bobee B, Rasmussen PF (1995) Recent advances in flood frequency analysis. Reviews of Geophysics, Supplement: 1111–1116

  • Bobee B, Cavadias G, Ashkar F, Bernier J, Rasmussen PF (1993) Towards a systematic approach to comparing distributions used in flood frequency analysis. J Hydrology 142:121–136

  • Cunnane C (1985) Factors affecting choice of distribution for flood series. Hydrological Sci J 30(1, 3):25–36

  • D’Agostino RB, Stephens MA (1986) Goodness-of-fit techniques. Marcel Dekker, Inc., New York, Basel

  • Dooge JCI (1973) Linear theory of hydrologic systems. Tech. Bull. 1468, Agricultural Research Service, Washington

  • Dumonceaux R, Antle CE, Haas G (1973) Likelihood ratio test for discrimination between two models with unknown location and scale parameters. Technometrics 2:55–65

  • Dyer AR (1973) Discrimination procedures for separate families of hypotheses. J American Statistical Association 68(344):970–974

  • Folks JL, Chhikara RS (1978) The inverse Gaussian distribution and its statistical application – a review. J R Stat Soc Ser B 40(3):263–289

  • Fortin V, Bernier J, Bobee B (1997a) Simulation, Bayes, and bootstrap in statistical hydrology. Water Resources Research 33(3):439–448

  • Fortin V, Bobee B, Bernier J (1997b) Rational approach to comparison of flood distribution by simulation. J Hydrologic Engineering 2(3):95–103

  • Gunasekara TA, Cunnane C (1991) Expected probabilities of exceedance from non-normal flood distributions. J Hydrology 128:101–113

  • Gunasekara TA, Cunnane C (1992) Split sampling technique for selecting a flood frequency analysis procedure. J Hydrology 130:189–200

  • Gupta VK (1970) Selection of flood frequency models. Water Resources Research 6(4):1193–1198

  • Hájek J, Šidák Z (1967) Theory of Rank Tests. Academic Press, New York, Sec. 2.2

  • Haktanir T (1992) Comparison of various flood frequency distributions using annual peaks data of rivers in Anatolia. J Hydrology 136:1–31

  • Halphen E (1941) Sur un nouveau type de courbe de fréquence. Comptes Rendus de l’Académie des Sciences. Tome 213, Paris 633–635

  • Hosking JRM, Wallis JR (1987) Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29:339–349

  • Hosking JRM, Wallis JR (1997) Regional Frequency Analysis. An Approach Based on L-Moments. Cambridge University Press, 224 pp

  • Hosking JRM, Wallis JR, Wood EF (1985) Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27:251–261

  • Johnson NL, Kotz S (1970) Distributions in statistics: Continuous Univariate Distributions 1. Houghton-Mifflin, Boston

  • Kappenman RF (1982) On a method for selecting a distributional model. Commun in Statistics – Theory Meth 11:663–672

  • Kappenman RF (1988) A simple method for choosing between the lognormal and Weibull models. Elsevier Sc. Publ., Statistics & Probability Letters 7:123–126

  • Kendall MG, Stuart A (1969) The advanced theory of statistics. V.1. Distribution Theory, Ch.11 and 12. Charles Griffin, London

  • Kuczera G (1982) Robust flood frequency models. Water Resour Res 18(2):315–324

  • Landwehr JM, Matalas NC, Wallis JR (1980) Quantile estimation with more or less floodlike distributions. Water Resour Res 16(1):547–555

  • Madsen H, Rosbjerg D (1997) Generalized least squares and empirical Bayes estimation in regional partial duration series index-flood modelling. Water Resources Research 33(4):771–781

  • Mitosek HT, Strupczewski WG, Singh VP (2002) Toward an objective choice of an annual flood peak distribution. Published on CD ROM: Advances in Hydro-Science and Engineering, The 5th International Conf. on Hydro-Science & Engineering, Warsaw

  • Morlat G (1956) Les lois de probabilité de Halphen. Revue de Statistique Appliquée, Paris 3:1–43

  • Mutua FM (1994) The use of the Akaike Information Criterion in the identification of an optimum flood frequency model. Hydrol Sc J 39(3):235–244

  • O’Connell DRH, Ostenaa DA, Levish DR, Klinger RE (2002) Bayesian flood frequency analysis with paleohydrologic data. Water Resources Research 38(5):16–1 to 16–14

  • Perrault L, Bobée B, Rasmussen PF (1999a) Halphen distribution system, I: Mathematical and statistical properties. J Hydrol Eng ASCE 4(3):189–199

  • Perrault L, Bobée B, Rasmussen PF (1999b) Halphen distribution system, II: Parameter and quantile estimation. J Hydrol Eng ASCE 4(3):200–208

  • Quesenberry CP, Kent J (1982) Selecting among probability distributions used in reliability. Technometrics 24(1):59–65

  • Raftery AE (1993) Bayesian model selection in structural equation models. In: Bollen KA, Long JS (eds) Testing Structural Equation Models. Sage, Beverly Hills, pp 163–180

  • Seshadri V (1994) The inverse Gaussian distribution: A Case Study in Exponential Families (Oxford Science Publications). Oxford University Press, p 256

  • Strupczewski WG, Singh VP, Feluch W (2001a) Non-stationary approach to at-site flood frequency modeling I. Maximum likelihood estimation. J Hydrol 248:123–142

  • Strupczewski WG, Singh VP, Weglarczyk S (2001b) Impulse response of linear diffusion analogy as a flood probability density function. Hydrol Sc J 46(5):761–780

  • Strupczewski WG, Weglarczyk S, Singh VP (2002a) Physics of flood frequency analysis. Part I. Linear convective diffusion wave model. Acta Geophys. Polonica 50(3):433–455

  • Strupczewski WG, Weglarczyk S, Singh VP (2002b) Model error in flood frequency estimation. Acta Geophys. Pol 50(2):279–319

  • Strupczewski WG, Singh VP, Weglarczyk S (2002c) Asymptotic bias of estimation methods caused by the assumption of false probability distribution. J of Hydrol 258(1–4):122–148

  • Strupczewski WG, Weglarczyk S, Singh VP (2003) Physics of flood frequency analysis. Part II. Convective diffusion model versus Lognormal model. Acta Geophys. Polonica 51(1):85–106

  • Takara K, Takasao T (1988) Evaluation criteria for probability distribution models in hydrologic frequency analysis. Proc. 5th IAHR International Symposium on Stochastic Hydraulics, 2–4 August, Birmingham, pp 10

  • Tang WH (1980) Bayesian frequency analysis. J Hydraulics Division ASCE 106(HY7):1203–1218

  • Tasker GD (1987) A comparison of methods for estimating low flow characteristics of streams. Water Resources Bulletin 23(6):1077–1083

  • Turkman KF (1985) The choice of extremal models by Akaike’s information criterion. J of Hydrol 82:307–315

  • Tweedie MCK (1957) Statistical properties of the inverse Gaussian distributions. I. Ann Math Stat 28:362–377

  • Vogel RM, Thomas WO, McMahon TA (1993) Flood flow frequency model selection in southwestern United States. J Water Resources Planning and Management 119(3):353–366

  • Wilks DS (1993) Comparison of three-parameter probability distributions for representing annual extreme and partial duration precipitation series. Water Resources Research 29(10):3543–3549

  • Wood EF, Rodriguez-Iturbe I (1975) A Bayesian approach to analyzing uncertainty among flood frequency models. Water Resources Research 11(6):839–848

Acknowledgements

The authors wish to express their appreciation to the anonymous reviewers for their useful comments and suggestions.

Author information

Corresponding author

Correspondence to W. G. Strupczewski.

Appendices

Appendix A: Pertinent characteristics of the CD and LN distributions

Some relations of the competing distributions used in this study are summarized in Table A.1.

Table A.1 Moment characteristics, quantile and ML-solution of CD and LN

Sample average log-likelihood function [Eq. (7)]:

The CD distribution

$$\hat{\Lambda}_{(CD)}^{(N)} = \ln \alpha - \ln \sqrt{\pi} - \frac{3}{2} \hat{\hbox{E}} (\ln X) - \alpha^{2} \hat{\hbox{E}} (X^{-1}) + 2 \beta - \left(\frac{\beta}{\alpha}\right)^{2} \hat{\hbox{E}} (\hbox{X}) $$
(A.1)

The LN distribution

$$\hat{\Lambda}_{(LN)}^{(N)} = - \hat{\hbox{E}} (\ln X) - \ln \sigma - \ln \sqrt{2\pi} - \frac{1}{2\sigma^{2}} \hat{\hbox{E}} (\ln \hbox{X} - \mu)^{2}$$
(A.2)

where

$$\hat{\hbox{E}} (Z) = \frac{1}{N} \sum\limits_{i = 1}^{N} z_{i} $$
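As an illustration of how (A.1) and (A.2) enter the LR procedure, the following Python sketch evaluates both average log-likelihoods from the sample averages and picks the model with the larger maximised value. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, and the CD estimates are obtained here by setting the partial derivatives of (A.1) with respect to α and β to zero, which gives α² = 1/{2[Ê(1/X) − 1/Ê(X)]} and β = α²/Ê(X). Whether this matches the form of the ML solution listed in Table A.1 is assumed, not checked.

```python
import math

def sample_averages(x):
    """Return the sample averages of X, 1/X, ln X and (ln X)^2."""
    n = len(x)
    logs = [math.log(v) for v in x]
    return (sum(x) / n,
            sum(1.0 / v for v in x) / n,
            sum(logs) / n,
            sum(l * l for l in logs) / n)

def avg_loglik_cd(x, alpha, beta):
    """Average log-likelihood of the CD model, Eq. (A.1)."""
    m_x, m_inv, m_log, _ = sample_averages(x)
    return (math.log(alpha) - 0.5 * math.log(math.pi) - 1.5 * m_log
            - alpha ** 2 * m_inv + 2.0 * beta - (beta / alpha) ** 2 * m_x)

def avg_loglik_ln(x, mu, sigma):
    """Average log-likelihood of the LN model, Eq. (A.2)."""
    _, _, m_log, m_log2 = sample_averages(x)
    mean_sq_dev = m_log2 - 2.0 * mu * m_log + mu ** 2  # average of (ln X - mu)^2
    return (-m_log - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
            - mean_sq_dev / (2.0 * sigma ** 2))

def ml_cd(x):
    """Stationary point of (A.1) in (alpha, beta); an assumption of this sketch."""
    m_x, m_inv, _, _ = sample_averages(x)
    alpha2 = 0.5 / (m_inv - 1.0 / m_x)  # positive unless all values are equal
    return math.sqrt(alpha2), alpha2 / m_x

def ml_ln(x):
    """ML estimates (mu, sigma) of the LN model: moments of ln X."""
    _, _, m_log, m_log2 = sample_averages(x)
    return m_log, math.sqrt(m_log2 - m_log ** 2)

def select_model(x):
    """Pick the model with the larger maximised average log-likelihood."""
    l_cd = avg_loglik_cd(x, *ml_cd(x))
    l_ln = avg_loglik_ln(x, *ml_ln(x))
    return "CD" if l_cd > l_ln else "LN"
```

Calling `select_model(sample)` on a list of positive observations returns the label of the preferred model; ties are resolved in favour of LN, which is an arbitrary choice of this sketch.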

The logarithms of the selection statistics \(S_{i}\) [Eq. (9)] for the CD and the LN models are of the form:

$$\begin{aligned} \ln \hbox{S}_{(CD)}^{(N)} &= \ln 2 + \frac{\hbox{N}}{2} \ln \left(\frac{\beta}{\pi}\right) + 2 N \beta - \frac{3\hbox{N}}{2} \hat{\hbox{E}} (\ln \hbox{X})\\ &\quad + \frac{1}{4} \left\{ \ln \left[\hat{\hbox{E}} (\hbox{X})\right] - \ln \left[\hat{\hbox{E}} (\hbox{X}^{-1}) \right] \right\}\\ &\quad + \ln \left\{K_{-N/2} \left[ 2N\beta \sqrt{\hat{\hbox{E}} (\hbox{X}) \hat{\hbox{E}} (\hbox{X}^{-1})}\right] \right\}\\ \end{aligned}$$
(A.3)

where \(K_{\nu}(z)\) denotes the modified Bessel function of the second kind of order \(\nu\).

$$\ln \hbox{S}_{(LN)}^{(N)} = (1 - N) \ln \left(\sigma \sqrt{2\pi}\right) - \frac{1}{2} \ln (\hbox{N}) - \hbox{N} \cdot \hat{\hbox{E}}(\ln \hbox{X}) - \frac{N}{2\sigma^{2}}\left[\hat{\hbox{E}} (\ln^{2} \hbox{X}) - \hat{\hbox{E}}^{2} (\ln \hbox{X}) \right] $$
(A.4)

As stated in the Introduction, the CD model can be derived from the Halphen type A distribution. To show this, let us reparameterize the Halphen type A probability density function (e.g., Perrault et al., 1999a):

$$f_{A} (x) = \frac{1}{2\left(\frac{\alpha^{2}}{\beta}\right)^{v} K_{v} (2\beta)} x^{v-1} \exp \left[- \left(\frac{\beta^{2}}{\alpha^{2}}x + \frac{\alpha^{2}}{x}\right) \right],\quad x > 0$$
(A.5)

For large values of the argument z the modified Bessel function of the second kind of order v can be approximated by the first term of the expansion:

$$K_{v} (z) = \sqrt{\frac{\pi}{2z}}e^{-z} \left[1 + \frac{u- 1}{8z} + \frac{(u - 1)(u - 9)}{2! (8z)^{2}}+\cdots\right]$$
(A.6)

where \(u = 4v^{2}\). Substituting it into Eq. (A.5) and putting \(v = -1/2\), one gets the CD density function [Eq. (10)]. Note from Table A.1 that large values of the argument \(z\) (\(z = 2\beta\)) correspond to small values of \(C_V\).
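The substitution can be written out as a short check. It assumes the Halphen type A density in the form \(x^{v-1}\exp[-\beta(x/m + m/x)]\) with scale \(m = \alpha^{2}/\beta\), and order \(v = -1/2\), the value needed to reproduce the \(x^{-3/2}\) factor implied by (A.1). For this order \(u = 4v^{2} = 1\), so every correction term in (A.6) vanishes and the first term is exact:

$$K_{-1/2} (2\beta) = \sqrt{\frac{\pi}{4\beta}}\, e^{-2\beta}$$

The normalizing constant of (A.5) then becomes

$$2\left(\frac{\alpha^{2}}{\beta}\right)^{-1/2} K_{-1/2} (2\beta) = 2\,\frac{\sqrt{\beta}}{\alpha} \cdot \frac{1}{2}\sqrt{\frac{\pi}{\beta}}\, e^{-2\beta} = \frac{\sqrt{\pi}}{\alpha}\, e^{-2\beta}$$

and the density reduces to

$$f(x) = \frac{\alpha}{\sqrt{\pi}}\, x^{-3/2} \exp \left[2\beta - \frac{\beta^{2}}{\alpha^{2}} x - \frac{\alpha^{2}}{x}\right] = \frac{\alpha}{\sqrt{\pi x^{3}}} \exp \left[- \frac{\left(\alpha - \beta x/\alpha\right)^{2}}{x}\right],\quad x > 0$$

Taking the logarithm and averaging over the sample reproduces (A.1) term by term.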

Appendix B: Distribution of the \(\left(\hat{\hbox{M}}^{(N)} (\hbox{CD}|\hbox{LN}),\; \hat{\hbox{M}}^{(N)} (\hbox{LN}|\hbox{LN})\right)\) variable

The \(\hat{\hbox{M}}^{(N)} (\hbox{CD}|\hbox{LN})\) and \(\hat{\hbox{M}}^{(N)} (\hbox{LN}|\hbox{LN})\) variables are highly correlated, so the scatter diagram of the points \(\left((\hat{\hbox{M}}_{s}^{(N)} (\hbox{CD}|\hbox{LN}),\; \hat{\hbox{M}}_{s}^{(N)} (\hbox{LN}|\hbox{LN})),\;s = 1,\ldots,S\right)\) is not instructive, as it can hardly be distinguished from the straight line

$$\hbox{M}^{(N)} (\hbox{CD}|\hbox{LN}) = \hbox{M}^{(N)} (\hbox{LN}|\hbox{LN})$$
(B.1)

Therefore, the whole area of possible occurrence of the estimates was divided into seven sub-areas (Fig. B.1a, b). The diagonal corresponds to the equality of the maximum log L of the true and false distributions [Eq. (B.1)]. Given the set of S elements, one can assign each element to its respective area and then obtain the rate of occurrence for each area. Our particular interest is in the cases where, despite the fact that (Fig. B.1a)

$$\hbox{Median}\; \hbox{M}^{(N)} (\hbox{LN}|\hbox{LN}) > \hbox{Median}\; \hbox{M}^{(N)} (\hbox{CD}|\hbox{LN})$$
(B.2)

the PCS is still lower than 0.5 (Table B.1). The PCS values of the LN model (the last column of Table B.1) are as in Table 1.

Fig. B.1 a, b Scheme of the arrangement of the respective areas

Table B.1 Rates of occurrence in each distinguished area for selected \((C_V, N)\) combinations

Looking at Fig. B.1, one learns that in the limiting cases PCS = 0 if p(F) = p(D) = 0.5 (i.e., with all other rates equal to zero), and PCS = 1 if p(A) = p(B) = 0.5 (i.e., with all other rates equal to zero). This leads to the conclusion that (B.2) is not informative with respect to the PCS and that the key is hidden in the form of log L of the two competing distributions. From the rate array of all \((C_V, N)\) combinations (not shown), certain regularities of the results are noted: (1) p(A) ≈ p(C), and both are large if the PCS is large, i.e., they grow with the sample size and with the \(C_V\) value; (2) p(D) ≈ p(F), and both are large if the PCS is small, i.e., they decrease with the sample size and with the \(C_V\) value; (3) p(B) ≈ p(E), and both are always small, growing with the sample size and with the \(C_V\) value; (4) p(G) ≈ 0.
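The comparison between the median condition (B.2) and the PCS can be explored with a small Monte Carlo experiment. The sketch below is illustrative only and rests on several assumptions: the M variables are taken to be the maximised average log-likelihoods of (A.1) and (A.2); the CD estimates are the stationary point of (A.1), i.e. α² = 1/{2[Ê(1/X) − 1/Ê(X)]} and β = α²/Ê(X); the LN samples use μ = 0 and σ² = ln(1 + C_V²); and the function names, default number of samples and seed are hypothetical. It does not reproduce the seven sub-areas of Fig. B.1.

```python
import math
import random
import statistics

def max_avg_loglik(x):
    """Return (M_CD, M_LN): Eqs. (A.1) and (A.2) evaluated at their maximising parameters."""
    n = len(x)
    m_x = sum(x) / n
    m_inv = sum(1.0 / v for v in x) / n
    logs = [math.log(v) for v in x]
    m_log = sum(logs) / n
    var_log = sum((l - m_log) ** 2 for l in logs) / n   # sigma^2 estimate of LN
    alpha2 = 0.5 / (m_inv - 1.0 / m_x)                  # alpha^2 of CD (assumed form)
    beta = alpha2 / m_x
    m_cd = (0.5 * math.log(alpha2) - 0.5 * math.log(math.pi) - 1.5 * m_log
            - alpha2 * m_inv + 2.0 * beta - beta ** 2 / alpha2 * m_x)
    # At mu = mean(ln X) and sigma^2 = var_log the last term of (A.2) equals 1/2.
    m_ln = -m_log - 0.5 * math.log(var_log) - 0.5 * math.log(2.0 * math.pi) - 0.5
    return m_cd, m_ln

def simulate_pcs_ln(cv, n, n_samples=2000, seed=1):
    """Estimate the PCS of the LN model when LN is the true distribution."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1.0 + cv ** 2))          # LN shape for the given C_V
    pairs = []
    for _ in range(n_samples):
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        pairs.append(max_avg_loglik(sample))
    n_correct = sum(1 for m_cd, m_ln in pairs if m_ln > m_cd)
    return {
        "pcs": n_correct / n_samples,
        "median_cd": statistics.median([p[0] for p in pairs]),
        "median_ln": statistics.median([p[1] for p in pairs]),
    }
```

For example, `simulate_pcs_ln(0.5, 20)` returns the estimated PCS together with the two medians, so that one can inspect whether (B.2) holds for a given \((C_V, N)\) combination and how the PCS behaves alongside it. Samples in which all values coincide would make the estimates undefined, but this has probability zero for continuous data.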

Cite this article

Strupczewski, W.G., Mitosek, H.T., Kochanek, K. et al. Probability of correct selection from lognormal and convective diffusion models based on the likelihood ratio. Stoch Environ Res Ris Assess 20, 152–163 (2006). https://doi.org/10.1007/s00477-005-0030-5
