Abstract
The objective of the paper is to show that using a discrimination procedure to select a flood frequency model, without knowing how that procedure performs for the underlying distributions under consideration, may lead to erroneous conclusions. The problem considered is that of choosing between the lognormal (LN) and convective diffusion (CD) distributions for a given random sample of flood observations. The probability density functions of these distributions are similarly shaped in the range of the main probability mass, and the discrepancies grow as the coefficient of variation (\(C_V\)) increases. The problem was addressed using the likelihood ratio (LR) procedure. Simulation experiments were performed to determine the probability of correct selection (PCS) for the LR method. Pseudo-random samples were generated from each of the two distributions for several combinations of sample size and \(C_V\). Surprisingly, the PCS of the LN model was about half that of the CD model, rarely exceeding 50%. The simulation results were analyzed and compared both with results obtained from real data and with results from another selection procedure, known as the QK method. The QK results are the opposite of those of the LR procedure.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr AC-19(6):716–723
Atkinson AR (1970) A method of discriminating between models. J Royal Statistical Society B 32:323–345
Bain LJ, Engelhardt M (1980) Probability of correct selection of Weibull versus Gamma based on likelihood ratio. Commun Statist-Theor Meth A 9(4):375–381
Berger JO (1985) Statistical decision theory and Bayesian analysis. Springer, Berlin Heidelberg New York
Bernardo JM, Smith AFM (2000) Bayesian theory (Wiley Series in Probability and Statistics), John Wiley & Sons
Bobée B, Rasmussen PF (1995) Recent advances in flood frequency analysis. Reviews of Geophysics, Supplement: 1111–1116
Bobée B, Cavadias G, Ashkar F, Bernier J, Rasmussen PF (1993) Towards a systematic approach to comparing distributions used in flood frequency analysis. J Hydrology 142:121–136
Cunnane C (1985) Factors affecting choice of distribution for flood series. Hydrological Sci J 30(1, 3):25–36
D’Agostino RB, Stephens MA (1986) Goodness-of-fit techniques. Marcel Dekker, Inc., New York, Basel
Dooge JCI (1973) Linear theory of hydrologic systems. Tech. Bull. 1468, Agricultural Research Service, Washington
Dumonceaux R, Antle CE, Hass G (1973) Likelihood ratio test for discrimination between two models with unknown location and scale parameters. Technometrics 2:55–65
Dyer AR (1973) Discrimination procedures for separate families of hypotheses. J American Statistical Association 68(344):970–974
Folks JL, Chhikara RS (1978) The inverse Gaussian distribution and its statistical application – a review. J R Stat Soc Ser B 40(3):263–289
Fortin V, Bernier J, Bobée B (1997a) Simulation, Bayes, and bootstrap in statistical hydrology. Water Resources Research 33(3):439–448
Fortin V, Bobée B, Bernier J (1997b) Rational approach to comparison of flood distribution by simulation. J Hydrologic Engineering 2(3):95–103
Gunasekara TA, Cunnane C (1991) Expected probabilities of exceedance from non-normal flood distributions. J Hydrology 128:101–113
Gunasekara TA, Cunnane C (1992) Split sampling technique for selecting a flood frequency analysis procedure. J Hydrology 130:189–200
Gupta VK (1970) Selection of flood frequency models. Water Resources Research 6(4):1193–1198
Hájek J, Šidák Z (1967) Theory of Rank Tests, Academic Press, New York, Sec. 2.2
Haktanir T (1992) Comparison of various flood frequency distributions using annual peaks data of rivers in Anatolia. J Hydrology 136:1–31
Halphen E (1941) Sur un nouveau type de courbe de fréquence. Comptes Rendus de l’Académie des Sciences. Tome 213, Paris 633–635
Hosking JRM, Wallis JR (1987) Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29:339–349
Hosking JRM, Wallis JR (1997) Regional Frequency Analysis. An Approach Based on L–Moments. Cambridge University Press, 224 pp
Hosking JRM, Wallis JR, Wood EF (1985) Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27:251–261
Johnson NL, Kotz S (1970) Distributions in statistics: Continuous Univariate Distributions 1, Houghton-Mifflin, Boston
Kappenman RF (1982) On a method for selecting a distributional model. Commun in Statistics – Theory Meth 11:663–672
Kappenman RF (1988) A simple method for choosing between the lognormal and Weibull models. Elsevier Sc. Publ., Statistics & Probability Letters 7:123–126
Kendall MG, Stuart A (1969) The advanced theory of statistics. V.1. Distribution Theory, Ch.11 and 12. Charles Griffin, London
Kuczera G (1982) Robust flood frequency models. Water Resour Res 18(2):315–324
Landwehr JM, Matalas NC, Wallis JR (1980) Quantile estimation with more or less floodlike distributions. Water Resour Res 16(1):547–555
Madsen H, Rosbjerg D (1997) Generalized least squares and empirical Bayes estimation in regional partial duration series index-flood modelling. Water Resources Research 33(4):771–781
Mitosek HT, Strupczewski WG, Singh VP (2002) Toward an objective choice of an annual flood peak distribution. Published on CD ROM: Advances in Hydro-Science and Engineering, The 5th International Conf. on Hydro-Science and -Engineering, Warsaw
Morlat G (1956) Les lois de probabilités de Halphen. Revue de Statistique Appliquée, Paris 3:1–43
Mutua FM (1994) The use of the Akaike Information Criterion in the identification of an optimum flood frequency model. Hydrol Sc J 39(3):235–244
O’Connell DRH, Ostenaa DA, Levish DR, Klinger RE (2002) Bayesian flood frequency analysis with paleohydrologic data. Water Resources Research 38(5):16–1 to 16–14
Perrault L, Bobée B, Rasmussen PF (1999a) Halphen distribution system, I: Mathematical and statistical properties. J Hydrol Eng ASCE 4(3):189–199
Perrault L, Bobée B, Rasmussen PF (1999b) Halphen distribution system, II: Parameter and quantile estimation. J Hydrol Eng ASCE 4(3):200–208
Quesenberry CP, Kent J (1982) Selecting among probability distributions used in reliability. Technometrics 24(1):59–65
Raftery AE (1993) Bayesian model selection in structural equation models. In: Bollen KA, Long JS (eds) Testing Structural Equation Models. Sage, Beverly Hills, pp 163–180
Seshadri V (1994) The inverse Gaussian distribution: A Case Study in Exponential Families (Oxford Science Publications), Oxford University Press, p 256
Strupczewski WG, Singh VP, Feluch W (2001a) Non-stationary approach to at-site flood frequency modeling I. Maximum likelihood estimation. J Hydrol 248:123–142
Strupczewski WG, Singh VP, Weglarczyk S (2001b) Impulse response of linear diffusion analogy as a flood probability density function. Hydrol Sc J 46(5):761–780
Strupczewski WG, Weglarczyk S, Singh VP (2002a) Physics of flood frequency analysis. Part I. Linear convective diffusion wave model. Acta Geophys. Polonica 50(3):433–455
Strupczewski WG, Weglarczyk S, Singh VP (2002b) Model error in flood frequency estimation. Acta Geophys. Pol 50(2):279–319
Strupczewski WG, Singh VP, Weglarczyk S (2002c) Asymptotic bias of estimation methods caused by the assumption of false probability distribution. J of Hydrol 258(1–4):122–148
Strupczewski WG, Weglarczyk S, Singh VP (2003) Physics of flood frequency analysis. Part II. Convective diffusion model versus Lognormal model. Acta Geophys. Polonica 51(1):85–106
Takara K, Takasao T (1988) Evaluation criteria for probability distribution models in hydrologic frequency analysis. Proc. 5th IAHR International Symposium on Stochastic Hydraulics, 2–4 August, Birmingham, pp 10
Tang WH (1980) Bayesian frequency analysis. J Hydraulics Division ASCE 106(HY7):1203–1218
Tasker GD (1987) A comparison of methods for estimating low flow characteristics of streams. Water Resources Bulletin 23(6):1077–1083
Turkman KF (1985) The choice of extremal models by Akaike’s information criterion. J of Hydrol 82:307–315
Tweedie MCK (1957) Statistical properties of the inverse Gaussian distributions. I. Ann Math Stat 28:362–377
Vogel RM, Thomas WO, McMahon TA (1993) Flood flow frequency model selection in southwestern United States. J Water Resources Planning and Management 119(3):353–366
Wilks DS (1993) Comparison of three-parameter probability distributions for representing annual extreme and partial duration precipitation series. Water Resources Research 29(10):3543–3549
Wood EF, Rodriguez-Iturbe I (1975) A Bayesian approach to analyzing uncertainty among flood frequency models. Water Resources Research 11(6):839–848
Acknowledgements
The authors wish to express their appreciation to the anonymous reviewers for their useful comments and suggestions.
Appendices
Appendix A: Pertinent characteristics of the CD and LN distributions
Some relations of the competing distributions used in this study are summarized in Table A.1.
Sample average log-likelihood function [Eq. (7)]:
The CD distribution
The LN distribution
where
The logarithms of the selection statistics \(S_i\) [Eq. (9)] for the CD and LN models are of the form:
where \(K_\nu(z)\) represents the modified Bessel function of the second kind.
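To make the structure of the sample average log-likelihood of Eq. (7) concrete, the maximized values can be written in closed form. This is a sketch under assumptions not stated in this appendix: the LN model is parameterized by the mean \(\mu_y\) and standard deviation \(\sigma_y\) of \(y = \ln x\), and the CD model is written as the inverse Gaussian density with mean \(\mu\) and shape \(\lambda\); the paper's own parameterization may differ. Under these assumptions both models have two parameters, so the LR procedure amounts to selecting LN when \(\hat{\ell}_{\mathrm{LN}} > \hat{\ell}_{\mathrm{CD}}\).

```latex
% LN model: y_i = \ln x_i, \bar{y} = N^{-1}\sum_i y_i
\[
\bar{\ell}_{\mathrm{LN}}(\mu_y,\sigma_y) = -\ln\sigma_y - \tfrac{1}{2}\ln(2\pi) - \bar{y}
  - \frac{1}{2N\sigma_y^{2}}\sum_{i=1}^{N}(y_i-\mu_y)^{2}
\]
% MLE: \hat{\mu}_y = \bar{y}, \hat{\sigma}_y^{2} = N^{-1}\sum_i (y_i-\bar{y})^{2}
\[
\hat{\ell}_{\mathrm{LN}} = -\ln\hat{\sigma}_y - \tfrac{1}{2}\ln(2\pi) - \bar{y} - \tfrac{1}{2}
\]
% CD model written as the inverse Gaussian (assumption of this sketch)
\[
\bar{\ell}_{\mathrm{CD}}(\mu,\lambda) = \tfrac{1}{2}\ln\frac{\lambda}{2\pi} - \tfrac{3}{2}\bar{y}
  - \frac{\lambda}{2N\mu^{2}}\sum_{i=1}^{N}\frac{(x_i-\mu)^{2}}{x_i}
\]
% MLE: \hat{\mu} = \bar{x}, 1/\hat{\lambda} = N^{-1}\sum_i (1/x_i - 1/\bar{x}),
% hence N^{-1}\sum_i (x_i-\hat{\mu})^{2}/x_i = \hat{\mu}^{2}/\hat{\lambda}
\[
\hat{\ell}_{\mathrm{CD}} = \tfrac{1}{2}\ln\frac{\hat{\lambda}}{2\pi} - \tfrac{3}{2}\bar{y} - \tfrac{1}{2}
\]
```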
As stated in the Introduction, the CD model can be derived from the Halphen type A distribution. To show this, let us reparameterize the Halphen type A probability density function (e.g., Perrault et al., 1999a):
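For large values of the argument z, the modified Bessel function of the second kind of order \(\nu\) can be approximated by the first term of the asymptotic expansion:
\[
K_\nu(z) \sim \sqrt{\frac{\pi}{2z}}\,e^{-z}\left[1 + \frac{u-1}{8z} + \frac{(u-1)(u-9)}{2!\,(8z)^{2}} + \cdots\right] \approx \sqrt{\frac{\pi}{2z}}\,e^{-z}
\]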
where \(u = 4\nu^2\). Substituting it into Eq. (A.5) and putting \(\nu = -1/2\), one gets the CD density function [Eq. (10)]. Note from Table A.1 that large values of the argument z (z = 2β) correspond to small values of \(C_V\).
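The substitution can be checked numerically. The sketch below is a hedged illustration, not the authors' code. It assumes the Halphen type A density in the (m, α, ν) form f(x) = x^(ν−1) exp[−α(x/m + m/x)] / (2 m^ν K_ν(2α)) and takes the CD model to be the inverse Gaussian with mean μ and shape λ; the function names (`bessel_k`, `k_first_term`, `halphen_a_pdf`, `inverse_gaussian_pdf`) are hypothetical, and K_ν is evaluated from its integral representation with NumPy only. For ν = −1/2 the factor u − 1 vanishes, so the first term reproduces K_ν exactly; for other orders its relative error shrinks as z grows. With ν = −1/2, m = μ and α = λ/(2μ), the Halphen density coincides with the inverse Gaussian density.

```python
import numpy as np


def bessel_k(nu, z, t_max=10.0, n=20001):
    """K_nu(z), the modified Bessel function of the second kind, for z > 0.

    Uses the integral representation K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt,
    evaluated with the trapezoidal rule on [0, t_max]; the neglected tail is tiny
    for the moderate z and |nu| <= 1 used here.
    """
    t = np.linspace(0.0, t_max, n)
    f = np.exp(-z * np.cosh(t)) * np.cosh(nu * t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))


def k_first_term(z):
    """First term of the large-z expansion of K_nu(z): sqrt(pi / (2 z)) * exp(-z)."""
    return np.sqrt(np.pi / (2.0 * z)) * np.exp(-z)


def halphen_a_pdf(x, m, alpha, nu):
    """Halphen type A density in an assumed (m, alpha, nu) parameterization."""
    norm = 2.0 * m**nu * bessel_k(nu, 2.0 * alpha)
    return x ** (nu - 1.0) * np.exp(-alpha * (x / m + m / x)) / norm


def inverse_gaussian_pdf(x, mu, lam):
    """Inverse Gaussian density with mean mu and shape lam (assumed form of the CD model)."""
    return np.sqrt(lam / (2.0 * np.pi * x**3)) * np.exp(-lam * (x - mu) ** 2 / (2.0 * mu**2 * x))


if __name__ == "__main__":
    # Relative error of the first term: ~0 for nu = -1/2 (u = 1), shrinking with z otherwise.
    for z in (0.5, 2.0, 8.0, 32.0):
        errs = [abs(k_first_term(z) / bessel_k(nu, z) - 1.0) for nu in (-0.5, 0.0, 1.0)]
        print(f"z = {z:5.1f}   nu = -1/2, 0, 1: " + ", ".join(f"{e:.2e}" for e in errs))

    # Halphen A with nu = -1/2, m = mu, alpha = lam / (2 mu) versus the inverse Gaussian.
    mu, lam = 1.3, 2.0
    x = np.linspace(0.1, 5.0, 50)
    diff = halphen_a_pdf(x, mu, lam / (2.0 * mu), -0.5) - inverse_gaussian_pdf(x, mu, lam)
    print("max |difference| =", np.max(np.abs(diff)))
    # For these parameters z = 2 * alpha = lam / mu = 1 / CV**2: large z means small CV.
```

Under the inverse Gaussian assumption the Bessel argument is z = 2α = λ/μ = 1/C_V², which is one way to see why large z goes with small \(C_V\).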
Appendix B: Distribution of the \((\hat{\hbox{M}}^{(N)} (\hbox{CD}|\hbox{LN}),\; \hat{\hbox{M}}^{(N)} (\hbox{LN}|\hbox{LN}))\) variable
The \(\hat{\hbox{M}}^{(N)} (\hbox{CD}|\hbox{LN})\) and \(\hat{\hbox{M}}^{(N)} (\hbox{LN}|\hbox{LN})\) variables are highly correlated, so the scatter diagram of the points \(\left((\hat{\hbox{M}}_{s}^{(N)} (\hbox{CD}|\hbox{LN}),\; \hat{\hbox{M}}_{s}^{(N)} (\hbox{LN}|\hbox{LN})),\;s = 1,\ldots,S\right)\) is not instructive: it can hardly be distinguished from a straight line.
Therefore, the whole region in which the estimates can occur was divided into seven sub-areas (Fig. B.1a, b). The diagonal corresponds to equality of the maximum log L of the true and false distributions [Eq. (B.1)]. Given the set of S elements, one can assign each element to its sub-area and then obtain the rate of occurrence for each sub-area. Of particular interest are the cases where, despite the fact that (Fig. B.1a)
the PCS is still lower than 0.5 (Table B.1). The PCS values of the LN model (the last column of Table B.1) are the same as those in Table 1.
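The rates discussed in this appendix come from S simulated pairs. A minimal Monte Carlo sketch of that bookkeeping follows, under assumptions that are not taken from the paper: \(\hat{\hbox{M}}^{(N)}(\cdot|\hbox{LN})\) is read as the maximized sample-average log-likelihood of a model fitted to an LN sample of size N, and the CD model is written as the inverse Gaussian. The PCS of the LN model is estimated as the fraction of pairs lying on the LN side of the diagonal, and the correlation of the two variables is reported. The function names, \(C_V\), N, S and the seed are hypothetical, and the seven sub-areas of Fig. B.1 are not reproduced because their boundaries are not given here.

```python
import numpy as np


def max_avg_loglik_ln(x):
    """Maximized sample-average log-likelihood of the two-parameter lognormal."""
    y = np.log(x)
    return -np.log(y.std()) - 0.5 * np.log(2.0 * np.pi) - y.mean() - 0.5


def max_avg_loglik_cd(x):
    """Same quantity for the inverse Gaussian, used here as an assumed CD form."""
    mu = x.mean()
    lam = 1.0 / np.mean(1.0 / x - 1.0 / mu)
    # At the MLE the quadratic term averages to exactly 1/2.
    return 0.5 * np.log(lam / (2.0 * np.pi)) - 1.5 * np.log(x).mean() - 0.5


def simulate_pairs(cv, n, s, seed=0):
    """S pairs (M(CD|LN), M(LN|LN)) from lognormal samples with coefficient of variation cv."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(np.log1p(cv**2))  # lognormal CV = sqrt(exp(sigma^2) - 1)
    pairs = np.empty((s, 2))
    for k in range(s):
        x = rng.lognormal(mean=0.0, sigma=sigma, size=n)
        pairs[k] = max_avg_loglik_cd(x), max_avg_loglik_ln(x)
    return pairs


def pcs_ln(pairs):
    """Share of samples for which the LR rule correctly picks LN (points above the diagonal)."""
    return np.mean(pairs[:, 1] > pairs[:, 0])


if __name__ == "__main__":
    pairs = simulate_pairs(cv=0.6, n=50, s=2000)
    print("PCS(LN) =", pcs_ln(pairs))
    print("corr    =", np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1])
```

Both models are fitted to the same sample, so the shared term in \(\bar{y}\) dominates both coordinates; this is one reason the two variables plot almost on a straight line while the PCS depends only on which side of the diagonal each point falls.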
Looking at Fig. B.1, one learns that in the limiting cases PCS = 0 if p(F) = p(D) = 0.5 (i.e., with all other rates equal to zero), and PCS = 1 if p(A) = p(B) = 0.5 (i.e., with all other rates equal to zero). This leads to the conclusion that (B.2) is not informative with respect to the PCS; the key lies in the form of the log L of both competing distributions. The array of rates for all (\(C_V\), N) combinations (not shown) reveals certain regularities:
(1) p(A) ≈ p(C); these rates are large when the PCS is large, i.e., they grow with sample size and with \(C_V\);
(2) p(D) ≈ p(F); these rates are large when the PCS is small, i.e., they decrease with sample size and with \(C_V\);
(3) p(B) ≈ p(E); these rates are always small, growing with sample size and with \(C_V\);
(4) p(G) ≈ 0.
About this article
Cite this article
Strupczewski, W.G., Mitosek, H.T., Kochanek, K. et al. Probability of correct selection from lognormal and convective diffusion models based on the likelihood ratio. Stoch Environ Res Ris Assess 20, 152–163 (2006). https://doi.org/10.1007/s00477-005-0030-5