Probability of correct selection from lognormal and convective diffusion models based on the likelihood ratio

Strupczewski, W. G.; Mitosek, H. T.; Kochanek, K.; Singh, V. P.; Weglarczyk, S.

doi:10.1007/s00477-005-0030-5


Original Paper
Published: 07 February 2006

Volume 20, pages 152–163, (2006)

Stochastic Environmental Research and Risk Assessment

W. G. Strupczewski¹, H. T. Mitosek¹, K. Kochanek¹, V. P. Singh² & S. Weglarczyk³


Abstract

The objective of the paper is to show that using a discrimination procedure to select a flood frequency model, without knowing how that procedure performs for the underlying distributions under consideration, may lead to erroneous conclusions. The problem considered is that of choosing between the lognormal (LN) and convective diffusion (CD) distributions for a given random sample of flood observations. The probability density functions of these distributions have similar shapes over the range containing the main probability mass, and the discrepancies grow as the coefficient of variation (C_V) increases. The problem was addressed using the likelihood ratio (LR) procedure. Simulation experiments were performed to determine the probability of correct selection (PCS) for the LR method. Pseudo-random samples were generated from each of the two distributions for several combinations of sample size and coefficient of variation. Surprisingly, the PCS of the LN model was half that of the CD model and rarely exceeded 50%. The simulation results were analyzed and compared both with results obtained from real data and with results from another selection procedure, known as the QK method. The QK results are the opposite of those of the LR procedure.


References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr AC-19(6):716–723
Atkinson AC (1970) A method for discriminating between models. J Royal Statistical Society B 32:323–345
Bain LJ, Engelhardt M (1980) Probability of correct selection of Weibull versus gamma based on likelihood ratio. Commun Statist Theor Meth A 9(4):375–381
Berger JO (1985) Statistical decision theory and Bayesian analysis. Springer, Berlin Heidelberg New York
Bernardo JM, Smith AFM (2000) Bayesian theory. Wiley Series in Probability and Statistics, John Wiley & Sons
Bobee B, Rasmussen PF (1995) Recent advances in flood frequency analysis. Reviews of Geophysics Supplement: 1111–1116
Bobee B, Cavadias G, Ashkar F, Bernier J, Rasmussen PF (1993) Towards a systematic approach to comparing distributions used in flood frequency analysis. J Hydrology 142:121–136
Cunnane C (1985) Factors affecting choice of distribution for flood series. Hydrological Sci J 30(1, 3):25–36
D'Agostino RB, Stephens MA (1986) Goodness-of-fit techniques. Marcel Dekker, New York, Basel
Dooge JCI (1973) Linear theory of hydrologic systems. Tech Bull 1468, Agricultural Research Service, Washington
Dumonceaux R, Antle CE, Hass G (1973) Likelihood ratio test for discrimination between two models with unknown location and scale parameters. Technometrics 2:55–65
Dyer AR (1973) Discrimination procedures for separate families of hypotheses. J American Statistical Association 68(344):970–974
Folks JL, Chhikara RS (1978) The inverse Gaussian distribution and its statistical application – a review. J R Stat Soc Ser B 40(3):263–289
Fortin V, Bernier J, Bobee B (1997a) Simulation, Bayes, and bootstrap in statistical hydrology. Water Resources Research 33(3):439–448
Fortin V, Bobee B, Bernier J (1997b) Rational approach to comparison of flood distributions by simulation. J Hydrologic Engineering 2(3):95–103
Gunasekara TA, Cunnane C (1991) Expected probabilities of exceedance for non-normal flood distributions. J Hydrology 128:101–113
Gunasekara TA, Cunnane C (1992) Split sampling technique for selecting a flood frequency analysis procedure. J Hydrology 130:189–200
Gupta VK (1970) Selection of flood frequency models. Water Resources Research 6(4):1193–1198
Hájek J, Šidák Z (1967) Theory of rank tests. Academic Press, New York, Sec. 2.2
Haktanir T (1992) Comparison of various flood frequency distributions using annual peaks data of rivers in Anatolia. J Hydrology 136:1–31
Halphen E (1941) Sur un nouveau type de courbe de fréquence. Comptes Rendus de l'Académie des Sciences, Tome 213, Paris, 633–635
Hosking JRM, Wallis JR (1987) Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29:339–349
Hosking JRM, Wallis JR (1997) Regional frequency analysis: an approach based on L-moments. Cambridge University Press, 224 pp
Hosking JRM, Wallis JR, Wood EF (1985) Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27:251–261
Johnson NL, Kotz S (1970) Distributions in statistics: continuous univariate distributions 1. Houghton-Mifflin, Boston
Kappenman RF (1982) On a method for selecting a distributional model. Commun Statist Theory Meth 11:663–672
Kappenman RF (1988) A simple method for choosing between the lognormal and Weibull models. Statistics & Probability Letters 7:123–126
Kendall MG, Stuart A (1969) The advanced theory of statistics, vol 1: distribution theory, Ch. 11 and 12. Charles Griffin, London
Kuczera G (1982) Robust flood frequency models. Water Resour Res 18(2):315–324
Landwehr JM, Matalas NC, Wallis JR (1980) Quantile estimation with more or less floodlike distributions. Water Resour Res 16(1):547–555
Madsen H, Rosbjerg D (1997) Generalized least squares and empirical Bayes estimation in regional partial duration series index-flood modelling. Water Resources Research 33(4):771–781
Mitosek HT, Strupczewski WG, Singh VP (2002) Toward an objective choice of an annual flood peak distribution. In: Advances in Hydro-Science and Engineering, 5th International Conference on Hydro-Science and Engineering, Warsaw (CD-ROM)
Morlat G (1956) Les lois de probabilités de Halphen. Revue de Statistique Appliquée, Paris 3:1–43
Mutua FM (1994) The use of the Akaike Information Criterion in the identification of an optimum flood frequency model. Hydrol Sci J 39(3):235–244
O'Connell DRH, Ostenaa DA, Levish DR, Klinger RE (2002) Bayesian flood frequency analysis with paleohydrologic data. Water Resources Research 38(5):16–1 to 16–14
Perrault L, Bobée B, Rasmussen PF (1999a) Halphen distribution system, I: Mathematical and statistical properties. J Hydrol Eng ASCE 4(3):189–199
Perrault L, Bobée B, Rasmussen PF (1999b) Halphen distribution system, II: Parameter and quantile estimation. J Hydrol Eng ASCE 4(3):200–208
Quesenberry CP, Kent J (1982) Selecting among probability distributions used in reliability. Technometrics 24(1):59–65
Raftery AE (1993) Bayesian model selection in structural equation models. In: Bollen KA, Long JS (eds) Testing structural equation models. Sage, Beverly Hills, pp 163–180
Seshadri V (1994) The inverse Gaussian distribution: a case study in exponential families. Oxford Science Publications, Oxford University Press, 256 pp
Strupczewski WG, Singh VP, Feluch W (2001a) Non-stationary approach to at-site flood frequency modeling I. Maximum likelihood estimation. J Hydrol 248:123–142
Strupczewski WG, Singh VP, Weglarczyk S (2001b) Impulse response of linear diffusion analogy as a flood probability density function. Hydrol Sci J 46(5):761–780
Strupczewski WG, Weglarczyk S, Singh VP (2002a) Physics of flood frequency analysis. Part I. Linear convective diffusion wave model. Acta Geophys Polonica 50(3):433–455
Strupczewski WG, Weglarczyk S, Singh VP (2002b) Model error in flood frequency estimation. Acta Geophys Polonica 50(2):279–319
Strupczewski WG, Singh VP, Weglarczyk S (2002c) Asymptotic bias of estimation methods caused by the assumption of false probability distribution. J Hydrol 258(1–4):122–148
Strupczewski WG, Weglarczyk S, Singh VP (2003) Physics of flood frequency analysis. Part II. Convective diffusion model versus lognormal model. Acta Geophys Polonica 51(1):85–106
Takara K, Takasao T (1988) Evaluation criteria for probability distribution models in hydrologic frequency analysis. Proc 5th IAHR International Symposium on Stochastic Hydraulics, 2–4 August, Birmingham, pp 10
Tang WH (1980) Bayesian frequency analysis. J Hydraulics Division ASCE 106(HY7):1203–1218
Tasker GD (1987) A comparison of methods for estimating low flow characteristics of streams. Water Resources Bulletin 23(6):1077–1083
Turkman KF (1985) The choice of extremal models by Akaike's information criterion. J Hydrol 82:307–315
Tweedie MCK (1957) Statistical properties of inverse Gaussian distributions. I. Ann Math Stat 28:362–377
Vogel RM, Thomas WO, McMahon TA (1993) Flood-flow frequency model selection in southwestern United States. J Water Resources Planning and Management 119(3):353–366
Wilks DS (1993) Comparison of three-parameter probability distributions for representing annual extreme and partial duration precipitation series. Water Resources Research 29(10):3543–3549
Wood EF, Rodriguez-Iturbe I (1975) A Bayesian approach to analyzing uncertainty among flood frequency models. Water Resources Research 11(6):839–848


Acknowledgements

The authors wish to express their appreciation to the anonymous reviewers for their useful comments and suggestions.

Author information

Authors and Affiliations

Water Resources Department, Institute of Geophysics, Polish Academy of Sciences, Ksiecia Janusza 64, 01–452, Warsaw, Poland
W. G. Strupczewski, H. T. Mitosek & K. Kochanek
Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA, 70803–6405, USA
V. P. Singh
Institute of Water Engineering and Water Management, Cracow Technical University, Warszawska 24, 31–155, Cracow, Poland
S. Weglarczyk


Corresponding author

Correspondence to W. G. Strupczewski.

Appendices

Appendix A: Pertinent characteristics of the CD and LN distributions

Some relations of the competing distributions used in this study are summarized in Table A.1.

Table A.1 Moment characteristics, quantile and ML-solution of CD and LN


Sample average log-likelihood function [Eq. (7)]:

The CD distribution

$$\hat{\Lambda}_{(CD)}^{(N)} = \ln \alpha - \ln \sqrt{\pi} - \frac{3}{2} \hat{\hbox{E}} (\ln X) - \alpha^{2} \hat{\hbox{E}} (X^{-1}) + 2 \beta - \left(\frac{\beta}{\alpha}\right)^{2} \hat{\hbox{E}} (\hbox{X}) $$

(A.1)

The LN distribution

$$\hat{\Lambda}_{(LN)}^{(N)} = - \hat{\hbox{E}} (\ln X) - \ln \sigma - \ln \sqrt{2\pi} - \frac{1}{2\sigma^{2}} \hat{\hbox{E}} (\ln \hbox{X} - \mu)^{2}$$

(A.2)

where

$$\hat{\hbox{E}} (Z) = \frac{1}{N} \sum\limits_{i = 1}^{N} z_{i} $$
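The sketch below evaluates (A.1) and (A.2) directly and can be used to check numerical work. The closed-form ML estimates come from our own derivation, obtained by setting the partial derivatives of (A.1) and (A.2) to zero. For CD this gives $\hat{\beta} = \hat{\alpha}^{2}/\hat{E}(X)$ and $\hat{\alpha}^{2} = 1/\{2[\hat{E}(X^{-1}) - 1/\hat{E}(X)]\}$. For LN the estimates are the mean and the root mean squared deviation of ln x. These expressions should be checked against the ML solution listed in Table A.1, which is not reproduced here. The function names and the sample values are hypothetical.

```python
import math


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def avg_loglik_cd(x, alpha, beta):
    """Sample average log-likelihood of the CD model, Eq. (A.1)."""
    e_log = _mean(math.log(v) for v in x)
    e_inv = _mean(1.0 / v for v in x)
    e_x = _mean(x)
    return (math.log(alpha) - 0.5 * math.log(math.pi) - 1.5 * e_log
            - alpha ** 2 * e_inv + 2.0 * beta - (beta / alpha) ** 2 * e_x)


def avg_loglik_ln(x, mu, sigma):
    """Sample average log-likelihood of the LN model, Eq. (A.2)."""
    e_log = _mean(math.log(v) for v in x)
    e_sq = _mean((math.log(v) - mu) ** 2 for v in x)
    return (-e_log - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
            - e_sq / (2.0 * sigma ** 2))


def ml_cd(x):
    """ML estimates (alpha, beta) of CD from the stationarity conditions of (A.1)."""
    e_x = _mean(x)
    e_inv = _mean(1.0 / v for v in x)
    alpha_sq = 1.0 / (2.0 * (e_inv - 1.0 / e_x))  # finite for any non-constant sample
    return math.sqrt(alpha_sq), alpha_sq / e_x


def ml_ln(x):
    """ML estimates (mu, sigma) of LN: mean and RMS deviation of ln x."""
    mu = _mean(math.log(v) for v in x)
    sigma = math.sqrt(_mean((math.log(v) - mu) ** 2 for v in x))
    return mu, sigma


if __name__ == "__main__":
    sample = [0.6, 0.8, 1.1, 1.3, 2.4]  # illustrative values, not data from the paper
    a, b = ml_cd(sample)
    m, s = ml_ln(sample)
    print("max avg log L, CD:", avg_loglik_cd(sample, a, b))
    print("max avg log L, LN:", avg_loglik_ln(sample, m, s))
```

Written out this way, the CD density is an inverse Gaussian with mean $\alpha^{2}/\beta$ and shape parameter $2\alpha^{2}$, which is a convenient cross-check on any implementation of (A.1).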

The logarithms of the selection statistics S_i [Eq. (9)] for the CD and LN models are of the form:

$$\begin{aligned} \ln \hbox{S}_{(CD)}^{(N)} &= \ln 2 + \frac{\hbox{N}}{2} \ln \left(\frac{\beta}{\pi}\right) + 2 N \beta - \frac{3\hbox{N}}{2} \hat{\hbox{E}} (\ln \hbox{X})\\ &\quad + \frac{\hbox{N}}{4} \left\{ \ln \left[\hat{\hbox{E}} (\hbox{X})\right] - \ln \left[\hat{\hbox{E}} (\hbox{X}^{-1}) \right] \right\}\\ &\quad + \ln \left\{K_{-N/2} \left[ 2N\beta \sqrt{\hat{\hbox{E}} (\hbox{X}) \hat{\hbox{E}} (\hbox{X}^{-1})}\right] \right\} \end{aligned}$$

(A.3)

where K_ν(z) is the modified Bessel function of the second kind of order ν.

$$\ln \hbox{S}_{(LN)}^{(N)} = (1 - N) \ln \left(\sigma \sqrt{2\pi}\right) - \frac{1}{2} \ln (\hbox{N}) - \hbox{N} \cdot \hat{\hbox{E}}(\ln \hbox{X}) - \frac{N}{2\sigma^{2}}\left[\hat{\hbox{E}} (\ln^{2} \hbox{X}) - \hat{\hbox{E}}^{2} (\ln \hbox{X}) \right] $$

(A.4)
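The statistics (A.3) and (A.4) can be read as the likelihood integrated over the scale parameter with the scale-invariant measure d(scale)/scale, with the shape held fixed. For LN the shape is σ and the integration runs over μ. For CD the shape is β and the integration runs over the mean m = α²/β. This reading is our assumption, not a statement from the paper. Under it, the CD integral gives N/4 as the coefficient of the log-ratio term of (A.3), and the sketch uses that value. K_ν is evaluated in log space from the integral representation K_ν(z) = ∫₀^∞ exp(−z cosh t) cosh(νt) dt, so no special-function library is needed. All function names are hypothetical.

```python
import math


def log_bessel_k(nu, z, steps=20000):
    """ln K_nu(z) for z > 0, from K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt.

    exp(-z) is factored out and the trapezoid sum is formed in log space, so
    the large arguments met for large N do not underflow.
    """
    nu = abs(nu)  # K_{-nu}(z) = K_nu(z)

    def g(t):
        # ln[exp(-z (cosh t - 1)) cosh(nu t)], with ln cosh written stably
        return (-z * (math.cosh(t) - 1.0) + nu * t
                + math.log1p(math.exp(-2.0 * nu * t)) - math.log(2.0))

    t_ref = math.asinh(nu / z)  # g is strictly decreasing for t > t_ref
    t_hi = t_ref + 1.0
    while g(t_hi) > g(t_ref) - 60.0:  # extend until the tail is negligible
        t_hi += 1.0
    h = t_hi / steps
    logs = [g(i * h) for i in range(steps + 1)]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    # The integrand is even in t, so the half-line trapezoid rule converges fast.
    return -z + top + math.log(h * (sum(w) - 0.5 * (w[0] + w[-1])))


def _sample_means(x):
    n = len(x)
    logs = [math.log(v) for v in x]
    return (n, sum(x) / n, sum(1.0 / v for v in x) / n,
            sum(logs) / n, sum(l * l for l in logs) / n)


def log_s_cd(x, beta):
    """ln S of the CD model, Eq. (A.3), shape beta fixed, N/4 on the log-ratio term."""
    n, e_x, e_inv, e_log, _ = _sample_means(x)
    return (math.log(2.0) + 0.5 * n * math.log(beta / math.pi) + 2.0 * n * beta
            - 1.5 * n * e_log + 0.25 * n * (math.log(e_x) - math.log(e_inv))
            + log_bessel_k(-n / 2.0, 2.0 * n * beta * math.sqrt(e_x * e_inv)))


def log_s_ln(x, sigma):
    """ln S of the LN model, Eq. (A.4), shape sigma fixed."""
    n, _, _, e_log, e_log2 = _sample_means(x)
    return ((1 - n) * math.log(sigma * math.sqrt(2.0 * math.pi)) - 0.5 * math.log(n)
            - n * e_log - n / (2.0 * sigma ** 2) * (e_log2 - e_log ** 2))
```

Both functions take the shape parameter as an input. How the shape is set in Eq. (9) is defined in the main text, which is not reproduced in this section.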

As stated in the Introduction, the CD model can be derived from the Halphen type A distribution. To show this, we reparameterize the Halphen type A probability density function (e.g., Perrault et al., 1999a):

$$f_{A} (x) = \frac{1}{2\left(\frac{\alpha^{2}}{\beta}\right)^{v} K_{v} (2\beta)} x^{v-1} \exp \left[- \left(\frac{\beta^{2}}{\alpha^{2}}x + \frac{\alpha^{2}}{x}\right) \right],\quad x > 0$$

(A.5)

For large values of the argument z, the modified Bessel function of the second kind of order v can be approximated by the first term of the asymptotic expansion:

$$K_{v} (z) = \sqrt{\frac{\pi}{2z}}e^{-z} \left[1 + \frac{u- 1}{8z} + \frac{(u - 1)(u - 9)}{2! (8z)^{2}}+\cdots\right]$$

(A.6)

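where u = 4v². Substituting the first term of (A.6) into Eq. (A.5) and putting v = −1/2, one obtains the CD density function [Eq. (10)]. Since u = 1 for v = −1/2, all correction terms in (A.6) vanish, so the substitution is exact. Note from Table A.1 that large values of the argument z (z = 2β) correspond to small values of C_V.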
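The reduction can be checked numerically. The sketch evaluates partial sums of the expansion (A.6) and the Halphen type A density (A.5), then compares the result for v = −1/2 with the CD density assembled from the terms of (A.1). In the Halphen density the second exponent term is written as α²/x, which is the form that matches the −α² Ê(X⁻¹) term of (A.1). Because u = 4v² = 1, the first term of (A.6) already equals K₋₁/₂(2β) exactly. The parameter values and function names are illustrative assumptions.

```python
import math


def bessel_k_asymptotic(v, z, terms):
    """Partial sum (first `terms` terms) of the large-z expansion (A.6) of K_v(z), u = 4 v^2."""
    u = 4.0 * v * v
    total, term = 1.0, 1.0
    for k in range(1, terms):
        # k-th term = prod_{j<=k} (u - (2j-1)^2) / (k! (8z)^k)
        term *= (u - (2 * k - 1) ** 2) / (k * 8.0 * z)
        total += term
    return math.sqrt(math.pi / (2.0 * z)) * math.exp(-z) * total


def halphen_a_pdf(x, alpha, beta, v, k_value):
    """Halphen type A density in the (alpha, beta, v) form of (A.5); k_value = K_v(2 beta)."""
    scale = alpha ** 2 / beta
    kernel = x ** (v - 1.0) * math.exp(-((beta / alpha) ** 2 * x + alpha ** 2 / x))
    return kernel / (2.0 * scale ** v * k_value)


def cd_pdf(x, alpha, beta):
    """CD density whose logarithm has the terms of (A.1)."""
    return (alpha / math.sqrt(math.pi) * x ** -1.5
            * math.exp(2.0 * beta - (beta / alpha) ** 2 * x - alpha ** 2 / x))


if __name__ == "__main__":
    alpha, beta = 1.3, 2.0  # illustrative parameter values
    k_half = bessel_k_asymptotic(-0.5, 2.0 * beta, terms=1)  # first term only
    for x in (0.2, 1.0, 3.5):
        print(x, halphen_a_pdf(x, alpha, beta, -0.5, k_half), cd_pdf(x, alpha, beta))
```

For other orders, such as v = 3/2, the series also terminates, after two terms. In general, however, (A.6) is only asymptotic, and the first term is accurate only for large z, that is, for small C_V.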

Appendix B: Distribution of the (M^(N)(CD|LN), M^(N)(LN|LN)) variable

The $\hat{\hbox{M}}^{(N)} (\hbox{CD}|\hbox{LN})$ and $\hat{\hbox{M}}^{(N)} (\hbox{LN}|\hbox{LN})$ variables are highly correlated, so the scatter diagram $\left((\hat{\hbox{M}}_{s}^{(N)} (\hbox{CD}|\hbox{LN}),\; \hat{\hbox{M}}_{s}^{(N)} (\hbox{LN}|\hbox{LN})),\;s = 1,\ldots,S\right)$ is not instructive, since it can hardly be distinguished from the straight line

$$\hbox{M}^{(N)} (\hbox{CD}|\hbox{LN}) = \hbox{M}^{(N)} (\hbox{LN}|\hbox{LN})$$

(B.1)

Therefore, the whole region in which the estimates can occur was divided into seven sub-areas (Fig. B.1a, b). The diagonal corresponds to equality of the maximum log L of the true and false distributions [Eq. (B.1)]. Given the set of S elements, one can assign each element to its area and then obtain the rate of occurrence for each area. Of particular interest are the cases where, despite the fact that (Fig. B.1a)

$$\mathrm{Median}\, \hbox{M}^{(N)} (\hbox{LN}|\hbox{LN}) > \mathrm{Median}\, \hbox{M}^{(N)} (\hbox{CD}|\hbox{LN})$$

(B.2)

the PCS is still lower than 0.5 (Table B.1). The PCS values of the LN model (last column of Table B.1) are the same as those in Table 1.

Table B.1 Rates of occurrence in each of the distinguished areas for selected (C_V, N) combinations


Fig. B.1 shows that, in the limiting cases, PCS = 0 if p(F) = p(D) = 0.5 (i.e., with all other rates equal to zero), and PCS = 1 if p(A) = p(B) = 0.5 (i.e., with all other rates equal to zero). Hence (B.2) is not informative with respect to the PCS; the key lies in the form of the log L of both competing distributions. The array of rates for all (C_V, N) combinations (not shown) exhibits the following regularities:
(1) p(A) ≈ p(C); both are large when the PCS is large, i.e., they grow with sample size and with C_V;
(2) p(D) ≈ p(F); both are large when the PCS is small, i.e., they decrease with sample size and with C_V;
(3) p(B) ≈ p(E); both are always small, growing with sample size and with C_V;
(4) p(G) ≈ 0.
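The LR selection and the resulting PCS can be sketched as a small Monte Carlo experiment. Several of its assumptions are ours:

- M^(N)(·|·) is read as the maximized sample average log-likelihood, i.e. (A.1) or (A.2) evaluated at the ML estimates.
- Samples are drawn with unit mean. This loses nothing, because both models are scale families, so the difference of their maximized log-likelihoods does not depend on scale.
- C_V is mapped to the shape parameters through σ² = ln(1 + C_V²) for LN and C_V² = 1/(2β) for CD. The latter treats CD as an inverse Gaussian with mean α²/β and shape parameter 2α².
- CD variates are drawn with the transformation method of Michael, Schucany and Haas (1976), a standard inverse Gaussian generator that the paper does not describe.

The function also returns the medians compared in (B.2), so that the two criteria can be contrasted.

```python
import math
import random
import statistics


def m_ln(x):
    """Maximized sample average log-likelihood of LN: Eq. (A.2) at the ML estimates."""
    logs = [math.log(v) for v in x]
    mu = statistics.fmean(logs)
    var = statistics.fmean([(l - mu) ** 2 for l in logs])
    # At the ML solution the last term of (A.2) equals 1/2.
    return -mu - 0.5 * math.log(var) - 0.5 * math.log(2.0 * math.pi) - 0.5


def m_cd(x):
    """Maximized sample average log-likelihood of CD: Eq. (A.1) at the ML estimates."""
    e_x = statistics.fmean(x)
    e_inv = statistics.fmean([1.0 / v for v in x])
    e_log = statistics.fmean([math.log(v) for v in x])
    alpha_sq = 1.0 / (2.0 * (e_inv - 1.0 / e_x))
    beta = alpha_sq / e_x
    return (0.5 * math.log(alpha_sq / math.pi) - 1.5 * e_log - alpha_sq * e_inv
            + 2.0 * beta - beta ** 2 / alpha_sq * e_x)


def sample_ln(n, cv, rng):
    """LN sample with mean 1 and coefficient of variation cv."""
    var = math.log(1.0 + cv * cv)
    return [rng.lognormvariate(-0.5 * var, math.sqrt(var)) for _ in range(n)]


def sample_cd(n, cv, rng):
    """CD (inverse Gaussian) sample with mean 1 and coefficient of variation cv,
    drawn with the Michael-Schucany-Haas transformation method."""
    mean, shape = 1.0, 1.0 / (cv * cv)  # cv^2 = mean / shape
    out = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0) ** 2
        root = (mean + mean * mean * y / (2.0 * shape)
                - mean / (2.0 * shape) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2))
        # Keep the smaller root with probability mean / (mean + root), else the larger one.
        out.append(root if rng.random() <= mean / (mean + root) else mean * mean / root)
    return out


def simulate(true_model, n, cv, reps=1000, seed=0):
    """Monte Carlo PCS of the LR rule plus medians of M(CD|true) and M(LN|true)."""
    rng = random.Random(seed)
    draw = sample_ln if true_model == "LN" else sample_cd
    cd_vals, ln_vals = [], []
    for _ in range(reps):
        x = draw(n, cv, rng)
        cd_vals.append(m_cd(x))
        ln_vals.append(m_ln(x))
    if true_model == "LN":
        correct = sum(l > c for l, c in zip(ln_vals, cd_vals))
    else:
        correct = sum(c > l for l, c in zip(ln_vals, cd_vals))
    return correct / reps, statistics.median(cd_vals), statistics.median(ln_vals)
```

For example, `pcs, med_cd, med_ln = simulate("LN", n=30, cv=0.6)` gives one (C_V, N) cell. Comparing `med_ln > med_cd` with `pcs < 0.5` shows the kind of contrast discussed above. The numbers depend on `reps` and `seed` and are not the values reported in the paper.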


About this article

Cite this article

Strupczewski, W.G., Mitosek, H.T., Kochanek, K. et al. Probability of correct selection from lognormal and convective diffusion models based on the likelihood ratio. Stoch Environ Res Ris Assess 20, 152–163 (2006). https://doi.org/10.1007/s00477-005-0030-5

Issue Date: April 2006