Abstract
Stochastic modeling of rainfall data is an important area of meteorology. The gamma distribution is a widely used probability model for non-zero rainfall. Typically, the choice of distribution in such meteorological studies is based on two goodness-of-fit tests: Pearson's Chi-square test and the Kolmogorov–Smirnov test. Inspired by the index of dispersion introduced by Fisher (Statistical methods for research workers. Hafner Publishing Company Inc., New York, 1925), Mooley (Mon Weather Rev 101:160–176, 1973) proposed the variance test as a goodness-of-fit measure in this context, and a number of researchers have implemented it since then. We show that the asymptotic distribution of the variance test statistic is generally not comparable to any central Chi-square distribution, and hence the test is erroneous. We also describe a method for checking the validity of the asymptotic distribution for a class of distributions. We apply the erroneous test to simulated as well as real datasets and demonstrate how it leads to wrong conclusions.
References
Barger GL, Thom HCS (1949) Evaluation of drought hazard. Agron J 41:519–526
Berkson J (1938) Some difficulties of interpretation encountered in the application of the chi-square test. J Am Stat Assoc 33:526–536
Berkson J (1940) A note on the chi-square test, the Poisson and the Binomial. J Am Stat Assoc 35:362–367
Biswas BC, Khambete NN, Mondal SS (1989) Weekly rainfall probability analysis of dry farming tract of Tamil Nadu. Mausam 40:197–206
Burgueño A, Martínez MD, Lana X, Serra C (2005) Statistical distributions of the daily rainfall regime in Catalonia (North-eastern Spain) for the years 1950–2000. Int J Climatol 25:1381–1403
Cochran WG (1954) Some methods for strengthening the common chi-square tests. Biometrics 10:417–451
Cramer H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
Duan J, Sikka AK, Grant GE (1995) A comparison of stochastic models for generating daily precipitation at the H. J. Andrews experiment forest. Northwest Sci 69:318–329
Fisher RA (1925) Statistical methods for research workers. Hafner Publishing Company Inc., New York
Fisher RA, Yates F (1957) Statistical tables for biological, agricultural and medical research. Oliver and Boyd, Edinburgh
Goel AK, Singh JK (1999) Incomplete gamma distribution for weekly rainfall of Unai, Himachal Pradesh. J Agric Eng 36:61–74
Hargreaves GH (1975) Water requirements manual for irrigated crops and rainfed agriculture. Utah State University Publication, Utah
Hazra A, Bhattacharya S, Banik P (2014) Modeling Nakshatra-wise rainfall data of the eastern plateau region of India. Mausam 65:264–270
Kwaku XS, Duke O (2007) Characterization and frequency analysis of one day annual maximum and two to five consecutive days maximum rainfall of Accra, Ghana. ARPN J Eng Appl Sci 2:27–31
Mooley DA, Crutcher HL (1968) An application of the gamma distribution function to Indian rainfall. ESSA Technical Report EDS 5, US Department of Commerce, Environmental Data Service, Silver Spring
Mooley DA (1973) Gamma distribution probability model for Asian summer monsoon monthly rainfall. Mon Weather Rev 101:160–176
Pramanik SK, Jagannathan P (1953) Climate changes in India (I)-rainfall. Indian J Meteorol Geophys 4:291–309
Rafa HAS, Khanbilvardi R (2014) Frequency analysis of the monthly rainfall data at Sulaimania Region, Iraq. Am J Eng Res 3:212–222
Rao CR, Chakravarti IM (1956) Some small sample tests of significance for a Poisson distribution. Biometrics 12:264–282
Sankaranarayanan D (1933) On the nature of frequency distribution of precipitation in India during monsoon months June to September. India Meteorol Dep Sci Notes 5:97–107
Sarker RP, Biswas BC, Khambete NN (1982) Probability analysis for short period rainfall in dry farming tract in India. Mausam 33:269–284
Sen Z, Eljadid AG (1999) Rainfall distribution functions for Libya and rainfall prediction. Hydrol Sci J 4:665–680
Sharma MA, Singh JB (2010) Use of probability distribution in rainfall analysis. N Y Sci J 3:40–49
Todorovic P, Woolhiser DA (1975) A stochastic model of n-day precipitation. J Appl Meteorol 14:17–24
Upadhyaya A, Kumar J, Kumar P, Sikka AK (2009) Analysis of rainfall in Patna Main Canal Command employing two parameter gamma probability distribution model. Indian J Soil Conserv 37:17–21
Acknowledgements
The authors would like to thank Professor Jennifer A. Hoeting from Colorado State University and an anonymous reviewer for a number of valuable suggestions and corrections.
Additional information
Responsible Editor: J. T. Fasullo.
Appendix
Theorem 1
The asymptotic distribution of the variance test statistic, under the null hypothesis that a random sample of size n comes from a one-parameter exponential distribution, is not comparable with the central \(\chi ^2\) distribution with \(n-1\) degrees of freedom.
Proof
Suppose that \(X_1,X_2,\ldots ,X_n\) is a random sample of size n from the exponential distribution with mean \(\lambda\), whose probability density function (PDF) is given by
\[f(x;\lambda )=\frac{1}{\lambda }{\text{e}}^{-x/\lambda },\quad x>0,\ \lambda >0.\]
In the above expression, the mean \(\lambda\) is assumed unknown and is estimated by its maximum likelihood estimator (MLE). The likelihood function is given by
\[L(\lambda )=\prod _{i=1}^{n}\frac{1}{\lambda }{\text{e}}^{-X_i/\lambda }=\lambda ^{-n}{\text{e}}^{-\frac{1}{\lambda }\sum _{i=1}^{n}X_i}.\]
Thus, the estimate of the unknown parameter \(\lambda\) is \(\lambda _{\text {MLE}}=\frac{X_1+ X_2+\cdots + X_n}{n}=\overline{X}\), the sample mean. The population variance is \(\sigma ^2_F = \lambda ^2\) and thus the MLE of the population variance is \(\widehat{\sigma ^2_F} = {\widehat{\lambda ^2}_{\text {MLE}}} = {({\widehat{\lambda }}_{\text {MLE}})}^2={\overline{X}}^2\). Hence, the test statistic in our case is given by
\[\chi ^2_{\nu }=\sum _{i=1}^{n}\frac{{(X_i-\overline{X})}^2}{{\overline{X}}^2}.\]
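To make the statistic concrete, the following minimal sketch (our own, not from the paper) computes \(\chi ^2_{\nu }\) for a simulated exponential sample; the sample size, seed, and mean are arbitrary illustrative choices:

```python
import numpy as np

def variance_test_statistic(x):
    """Variance test statistic under the exponential null:
    sum of squared deviations from the sample mean, divided by
    the MLE of the population variance, which here is xbar**2."""
    xbar = x.mean()
    return np.sum((x - xbar) ** 2) / xbar ** 2

rng = np.random.default_rng(0)
x = rng.exponential(scale=3.0, size=200)  # exponential with mean lambda = 3
stat = variance_test_statistic(x)
print(stat)  # a positive number of order n
```

The traditional (erroneous) practice examined in the paper would refer this value to a Chi-square table with \(n-1\) degrees of freedom.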
For the remaining part of the proof we follow Rao and Chakravarti (1956), who justify the asymptotic properties of the variance test for the Poisson distribution, making the modifications necessary to accommodate our continuous case.
From the above likelihood function it is easy to see, via the Neyman–Fisher factorization theorem, that the sample total \(T=X_1+ X_2+\cdots + X_n\) is a sufficient statistic for \(\lambda\). Moreover, T follows the gamma distribution with shape parameter n and scale parameter \(\lambda\) (easily shown using characteristic function or moment generating function arguments); its density is given by
\[g(t)=\frac{1}{\Gamma (n)\lambda ^{n}}\,t^{n-1}{\text{e}}^{-t/\lambda },\quad t>0.\]
The conditional density of \(X_1,X_2,\ldots ,X_n\) given \(T=t\) is given by
\[f(x_1,x_2,\ldots ,x_n\mid T=t)=\frac{\lambda ^{-n}{\text{e}}^{-t/\lambda }}{g(t)}=\frac{\Gamma (n)}{t^{n-1}},\quad x_i>0,\ \sum _{i=1}^{n}x_i=t,\]
i.e., independent of the values \(x_1,x_2,\ldots ,x_n\). Hence, \(E(X_i|T)= \overline{X}=T/n\) (each \(X_i\) receives the same weight, \(1/n\)) and thus we can express the variance test statistic in the form
\[\chi ^2_{\nu }=\frac{\sum _{i=1}^{n}{(X_i-\overline{X})}^2}{{\overline{X}}^2}=\frac{n^{2}\sum _{i=1}^{n}{(X_i-\overline{X})}^2}{T^{2}}.\]
Now, by the definition of conditional expectation, we have, for any measurable function \(\phi (x_1,x_2,\ldots ,x_n)\):
\[E(\phi )=\int _{0}^{\infty }E(\phi \mid T=t)\,g(t)\,{\text {d}}t,\]
which, in our case, translates into
\[E(\phi )=\frac{1}{\Gamma (n)\lambda ^{n}}\int _{0}^{\infty }E(\phi \mid T=t)\,t^{n-1}{\text{e}}^{-t/\lambda }\,{\text {d}}t.\]
Therefore, knowing the total expectation \(E(\phi )\), the conditional expectation \(E(\phi |T=t)\) can easily be obtained. Let us consider the statistic \(S^2=\sum _{i=1}^n(X_i-\overline{X})^2=\sum _{i=1}^{n}{X_i}^2-n\overline{X}^2\), whose moments are known functions of \(\lambda\). Using the above definition of conditional expectation, we derive the conditional moments of \(\phi (x_1,x_2,\ldots ,x_n)=S^2\) as follows.
The term \(\frac{1}{n-1}S^2\) is an unbiased estimator of the population variance \(\lambda ^2\) and hence \(E(S^2)=(n-1)\lambda ^2\). Thus, we have
\[(n-1)\lambda ^{n+2}=\frac{1}{\Gamma (n)}\int _{0}^{\infty }E(S^2\mid T=t)\,t^{n-1}{\text{e}}^{-t/\lambda }\,{\text {d}}t.\]
Now, we can write \(\lambda ^{n+2}=\int _{0}^{\infty }\frac{1}{\Gamma (n+2)}{\text{e}}^{-\frac{t}{\lambda }}t^{n+1}{\text {d}}t\), and thus it follows that
\[\frac{1}{\Gamma (n)}\int _{0}^{\infty }E(S^2\mid T=t)\,t^{n-1}{\text{e}}^{-t/\lambda }\,{\text {d}}t=\frac{n-1}{\Gamma (n+2)}\int _{0}^{\infty }t^{n+1}{\text{e}}^{-t/\lambda }\,{\text {d}}t.\]
We know that if \(\int _{0}^{\infty }f_1(x){\text{e}}^{-ax}\text {d}x=\int _{0}^{\infty }f_2(x){\text{e}}^{-ax}\text {d}x\) for all \(a>0\), where \(f_1(x)\) and \(f_2(x)\) are both continuous, then \(f_1=f_2\) by the uniqueness of the Laplace transform. As a consequence,
\[\frac{E(S^2\mid T=t)\,t^{n-1}}{\Gamma (n)}=\frac{(n-1)\,t^{n+1}}{\Gamma (n+2)},\quad \text{i.e.,}\quad E(S^2\mid T=t)=\frac{(n-1)\,t^{2}}{n(n+1)},\]
and hence
\[E(\chi ^2_{\nu }\mid T=t)=\frac{n^{2}}{t^{2}}\,E(S^2\mid T=t)=\frac{n(n-1)}{n+1}.\]
Similarly, using \(E(S^4)=\frac{(n-1)(n^{2}+7n-6)}{n}\lambda ^{4}\) for the exponential distribution, we obtain
\[E(S^4\mid T=t)=\frac{(n-1)(n^{2}+7n-6)}{n^{2}(n+1)(n+2)(n+3)}\,t^{4},\]
and
\[{\text {Var}}(S^2\mid T=t)=E(S^4\mid T=t)-{\left[ E(S^2\mid T=t)\right] }^{2}=\frac{4(n-1)}{(n+1)^{2}(n+2)(n+3)}\,t^{4}.\]
Thus,
\[{\text {Var}}(\chi ^2_{\nu }\mid T=t)=\frac{n^{4}}{t^{4}}\,{\text {Var}}(S^2\mid T=t)=\frac{4n^{4}(n-1)}{(n+1)^{2}(n+2)(n+3)},\]
and so, for large n,
\[{\text {Var}}(\chi ^2_{\nu }\mid T=t)\approx 4n.\]
Since both \(E(\chi ^2_\nu |T=t)\) and \({\text {Var}}(\chi ^2_\nu |T=t)\) are free of t, the law of total variance gives \({\text {Var}}(\chi ^2_\nu )={\text {Var}}(\chi ^2_\nu |T=t)\), which does not conform with the variance of the central Chi-square distribution with \((n-1)\) degrees of freedom, namely \(2(n-1)\). This proves the theorem. \(\square\)
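The theorem's conclusion is easy to verify by simulation. Under the exponential null, the conditional-variance calculation above puts the variance of the test statistic near 4n for large n, well away from the Chi-square value \(2(n-1)\). The following Monte Carlo sketch is our own illustration (sample size, replication count, and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check: under the exponential null, the variance test
# statistic has variance close to 4n for large n, not 2(n-1).
rng = np.random.default_rng(42)
n, reps = 100, 20000

samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
stats = ((samples - xbar[:, None]) ** 2).sum(axis=1) / xbar ** 2

print(stats.mean())   # near n(n-1)/(n+1), i.e. slightly below n
print(stats.var())    # near 4n = 400
print(2 * (n - 1))    # 198: what the chi-square(n-1) approximation assumes
```

Because the simulated variance is roughly twice the nominal \(2(n-1)\), referring the statistic to a \(\chi ^2_{n-1}\) table badly miscalibrates the test.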
Theorem 2
If a random sample of size n comes from a population with finite fourth moment, where under the null hypothesis the population variance is a non-zero continuously differentiable function f of the population mean, then under the condition (which we refer to as the “condition of approximate equality”)
\[\frac{\mu _4-\sigma ^4+{[f'(\mu )]}^{2}\sigma ^{2}-2f'(\mu )\mu _3}{\sigma ^{4}}\approx 2,\]
where \(\mu , \sigma ^2, \mu _3, \mu _4\) are the mean, variance, and third and fourth central moments of the population, respectively, the variance test statistic is asymptotically conformable with the central \(\chi ^2\) distribution with \(n-1\) degrees of freedom. If such a function f exists but the “condition of approximate equality” fails, then the variance test statistic is not asymptotically central \(\chi ^2_{n-1}\).
Proof
Suppose that \(X_1,X_2,\ldots ,X_n\) is a random sample from a population for which, under the null hypothesis, the fourth moment is finite and the population variance is a non-zero continuously differentiable function f of the population mean.
Applying the bivariate central limit theorem (CLT) to the sample moments, we obtain the joint asymptotic distribution of the sample mean \(\overline{X}_n\) and sample variance \(S^2_n\) as
\[\sqrt{n}\left( \begin{pmatrix} \overline{X}_n \\ S^2_n \end{pmatrix}-\begin{pmatrix} \mu \\ \sigma ^2 \end{pmatrix}\right) \overset{d}{\rightarrow }N_2\left( \mathbf{0} ,\begin{pmatrix} \sigma ^2 &{} \mu _3\\ \mu _3 &{} \mu _4-\sigma ^4 \end{pmatrix}\right) , \qquad (1)\]
where \(\mu\), \(\sigma ^2\), \(\mu _3\) and \(\mu _4\) have same meaning as mentioned earlier.
In the case of asymptotic normality of smooth functions of sample moments, Cramer (1946) shows that for a mapping \(\varvec{g}:{\mathbb {R}^d}\rightarrow {\mathbb {R}^k}\) such that \(\varvec{g}'(\mathbf x)\), the derivative of \(\varvec{g}(\mathbf x)\) at the point \(\mathbf x\), is continuous in a neighborhood of \(\varvec{\theta }\in \mathbb {R}^d\), if \(\mathbf T_n\) is a sequence of d-dimensional random vectors such that \(\sqrt{n}(\mathbf T_n-\varvec{\theta })\rightarrow N_d(\mathbf 0,\varvec{\Sigma })\), where \(\varvec{\Sigma }\) is a \(d\times d\) covariance matrix, then
\[\sqrt{n}\left( \varvec{g}(\mathbf T_n)-\varvec{g}(\varvec{\theta })\right) \overset{d}{\rightarrow }N_k\left( \mathbf 0 ,\varvec{g}'(\varvec{\theta })\,\varvec{\Sigma }\,{\varvec{g}'(\varvec{\theta })}^{T}\right) . \qquad (2)\]
In our case, the population mean is estimated by the sample mean, and since \(\sigma ^2=f(\mu )\), the population variance is estimated by \(f(\overline{X}_n)\), which is neither equal nor proportional to \(S^2_n\) (otherwise \(\sigma ^2\) could not be written as a function of \(\mu\) only). Hence \(\chi ^2_\nu =\sum _{i=1}^{n}\frac{{(X_i-\overline{X}_n)}^2}{f(\overline{X}_n)}= n\frac{{S_n}^2}{f(\overline{X}_n)}\) can be used as a test statistic. Then, applying the delta method (2) to (1) with \(g(a,b)=b/f(a)\), we have
\[\sqrt{n}\left( \frac{S^2_n}{f(\overline{X}_n)}-1\right) \overset{d}{\rightarrow }N(0,\alpha ),\quad \text{where }\ \alpha =\frac{\mu _4-\sigma ^4+{[f'(\mu )]}^{2}\sigma ^{2}-2f'(\mu )\mu _3}{\sigma ^{4}}.\]
Now, if the asymptotic variance satisfies \(\alpha =2\), then the asymptotic distribution of \(\chi ^2_{\nu }\) is N(n, 2n). Since \(N(n,2n)\approx \chi ^2_{n-1}\) (see the “Result” below), in this case the variance test statistic \(\chi ^2_\nu\) is asymptotically distributed as the central \(\chi ^2_{n-1}\) distribution.
On the other hand, if \(\alpha\) is significantly different from 2, then \(E(\chi ^2_\nu )=n\) but \({\text {Var}}(\chi ^2_\nu )\ne 2(n-1)\), even asymptotically. Hence, the asymptotic distribution of \(\chi ^2_\nu\) cannot be central \(\chi ^2_{n-1}\) in this case. \(\square\)
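The “condition of approximate equality” is easy to evaluate for specific null distributions. The sketch below is our own illustration; the closed form of \(\alpha\) is the delta-method variance from the proof, and the moment values plugged in are the standard central moments of the Poisson and exponential distributions. For the Poisson null (Fisher's original dispersion test, with \(f(\mu )=\mu\)) the condition holds exactly, while for the exponential null of Theorem 1 (\(f(\mu )=\mu ^2\)) it fails:

```python
def alpha(sigma2, mu3, mu4, f_prime):
    """Asymptotic variance of the (normalized) variance test statistic:
    compared with 2 in the 'condition of approximate equality'."""
    return (mu4 - sigma2**2 + f_prime**2 * sigma2 - 2 * f_prime * mu3) / sigma2**2

mu = 5.0  # arbitrary positive mean

# Poisson(mu): sigma^2 = f(mu) = mu, so f'(mu) = 1;
# central moments: mu3 = mu, mu4 = mu + 3*mu^2.
print(alpha(mu, mu, mu + 3 * mu**2, 1.0))            # 2.0 -> test valid

# Exponential with mean mu: sigma^2 = f(mu) = mu^2, f'(mu) = 2*mu;
# mu3 = 2*mu^3, mu4 = 9*mu^4.
print(alpha(mu**2, 2 * mu**3, 9 * mu**4, 2 * mu))    # 4.0 -> test fails
```

The value \(\alpha =4\) for the exponential case agrees with the conditional-variance calculation of Theorem 1, where the variance of the statistic behaves like 4n rather than \(2(n-1)\).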
Result. For large values of n, \(N(n,2n)\approx \chi ^2_{n-1}\).
Proof
Consider an independent and identically distributed (i.i.d.) sequence of random variables \(Y_1,\ldots ,Y_{n-1},\ldots\) following N(0, 1). Then \({Y_1^2}+\cdots +{Y_{n-1}^2}\) follows the central \(\chi ^2_{n-1}\) distribution. By the Central Limit Theorem, \(\frac{{Y_1^2}+\cdots +{Y_{n-1}^2}-(n-1)}{\sqrt{2(n-1)}} \overset{d}{\rightarrow }N(0,1)\), so \({Y_1^2}+\cdots +{Y_{n-1}^2}\) is asymptotically distributed as \(N(n-1,2(n-1))\). For large enough n, \(N(n-1, 2(n-1))\approx N(n, 2n)\). Thus, the test statistic can be referred to the central Chi-square table with \(n-1\) degrees of freedom. \(\square\)
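The normal approximation to the Chi-square can be seen numerically. This quick sketch (our own, with arbitrary n and seed) compares simulated quantiles of \(\chi ^2_{n-1}\) and N(n, 2n) for a moderately large n:

```python
import numpy as np

# For large n, chi-square(n-1) and N(n, 2n) quantiles nearly coincide.
rng = np.random.default_rng(7)
n, reps = 200, 200000

chi2_draws = rng.chisquare(n - 1, size=reps)
norm_draws = rng.normal(loc=n, scale=np.sqrt(2 * n), size=reps)

for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(chi2_draws, q), np.quantile(norm_draws, q))
```

For n = 200 the quantiles agree to within about one percent, which is why the normal limit can be referred to the Chi-square table in practice.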
Cite this article
Hazra, A., Bhattacharya, S., Banik, P. et al. A note on the misuses of the variance test in meteorological studies. Meteorol Atmos Phys 129, 645–658 (2017). https://doi.org/10.1007/s00703-016-0490-9