
A note on the misuses of the variance test in meteorological studies

  • Original Paper
  • Published in: Meteorology and Atmospheric Physics

Abstract

Stochastic modeling of rainfall data is an important area in meteorology. The gamma distribution is a widely used probability model for non-zero rainfall. Typically, the choice of the distribution for such meteorological studies is based on two goodness-of-fit tests: Pearson's Chi-square test and the Kolmogorov–Smirnov test. Inspired by the index of dispersion introduced by Fisher (Statistical methods for research workers. Hafner Publishing Company Inc., New York, 1925), Mooley (Mon Weather Rev 101:160–176, 1973) proposed the variance test as a goodness-of-fit measure in this context, and a number of researchers have used it since. We show that the asymptotic distribution of the test statistic for the variance test is generally not comparable to any central Chi-square distribution, and hence the test is erroneous. We also describe a method for checking the validity of the asymptotic distribution for a class of distributions. We apply the erroneous test to simulated as well as real datasets and demonstrate how it leads to wrong conclusions.


Fig. 1
Fig. 2
Fig. 3
Fig. 4


References

  • Barger GL, Thom HCS (1949) Evaluation of drought hazard. Agron J 41:519–526

  • Berkson J (1938) Some difficulties of interpretation encountered in the application of the chi-square test. J Am Stat Assoc 33:526–536

  • Berkson J (1940) A note on the chi-square test, the Poisson and the Binomial. J Am Stat Assoc 35:362–367

  • Biswas BC, Khambete NN, Mondal SS (1989) Weekly rainfall probability analysis of dry farming tract of Tamil Nadu. Mausam 40:197–206

  • Burgueño A, Martínez MD, Lana X, Serra C (2005) Statistical distributions of the daily rainfall regime in Catalonia (North-eastern Spain) for the years 1950–2000. Int J Climatol 25:1381–1403

  • Cochran WG (1954) Some methods for strengthening the common chi-square tests. Biometrics 10:417–451

  • Cramer H (1946) Mathematical methods of statistics. Princeton University Press, Princeton

  • Duan J, Sikka AK, Grant GE (1995) A comparison of stochastic models for generating daily precipitation at the H. J. Andrews Experimental Forest. Northwest Sci 69:318–329

  • Fisher RA (1925) Statistical methods for research workers. Hafner Publishing Company Inc., New York

  • Fisher RA, Yates F (1957) Statistical tables for biological, agricultural and medical research. Oliver and Boyd, Edinburgh

  • Goel AK, Singh JK (1999) Incomplete gamma distribution for weekly rainfall of Unai, Himachal Pradesh. J Agric Eng 36:61–74

  • Hargreaves GH (1975) Water requirements manual for irrigated crops and rainfed agriculture. Utah State University Publication, Utah

  • Hazra A, Bhattacharya S, Banik P (2014) Modeling Nakshatra-wise rainfall data of the eastern plateau region of India. Mausam 65:264–270

  • Kwaku XS, Duke O (2007) Characterization and frequency analysis of one day annual maximum and two to five consecutive days maximum rainfall of Accra, Ghana. ARPN J Eng Appl Sci 2:27–31

  • Mooley DA, Crutcher HL (1968) An application of gamma distribution function to Indian rainfall. ESSA Technical Report EDS 5, US Department of Commerce, Environmental Data Service, Silver Spring

  • Mooley DA (1973) Gamma distribution probability model for Asian summer monsoon monthly rainfall. Mon Weather Rev 101:160–176

  • Pramanik SK, Jagannathan P (1953) Climate changes in India (I)-rainfall. Indian J Meteorol Geophys 4:291–309

  • Rafa HAS, Khanbilvardi R (2014) Frequency analysis of the monthly rainfall data at Sulaimania Region, Iraq. Am J Eng Res 3:212–222

  • Rao CR, Chakravarti IM (1956) Some small sample tests of significance for a Poisson distribution. Biometrics 12:264–282

  • Sankaranarayanan D (1933) On the nature of frequency distribution of precipitation in India during monsoon months June to September. India Meteorol Dep Sci Notes 5:97–107

  • Sarker RP, Biswas BC, Khambete NN (1982) Probability analysis for short period rainfall in dry farming tract in India. Mausam 33:269–284

  • Sen Z, Eljadid AG (1999) Rainfall distribution functions for Libya and rainfall prediction. Hydrol Sci J 4:665–680

  • Sharma MA, Singh JB (2010) Use of probability distribution in rainfall analysis. N Y Sci J 3:40–49

  • Todorovic P, Woolhiser DA (1975) A stochastic model of n-day precipitation. J Appl Meteorol 14:17–24

  • Upadhyaya A, Kumar J, Kumar P, Sikka AK (2009) Analysis of rainfall in Patna Main Canal Command employing two parameter gamma probability distribution model. Indian J Soil Conserv 37:17–21

Acknowledgements

The authors would like to thank Professor Jennifer A. Hoeting from Colorado State University and an anonymous reviewer for a number of valuable suggestions and corrections.

Author information


Correspondence to Arnab Hazra.

Additional information

Responsible Editor: J. T. Fasullo.

Appendix


Theorem 1

The asymptotic distribution of the variance test statistic, under the null hypothesis that a random sample of size n comes from a one-parameter exponential distribution, is not comparable with the central \(\chi ^2\) distribution with \(n-1\) degrees of freedom.

Proof

Suppose that \(X_1,X_2,\ldots ,X_n\) is a random sample of size n from the exponential distribution with mean \(\lambda\), where the probability density function (PDF) is given by

$$\begin{aligned} f(x)&= {\left\{ \begin{array}{ll} \frac{1}{\lambda } {\text{e}}^{-\frac{x}{\lambda }} &{} \quad \text {if }\; x > 0,\\ 0 &{} \quad \text {if } \; x \le 0. \end{array}\right. } \end{aligned}$$

In the above expression, the mean \(\lambda\) is assumed unknown and estimated by its MLE. The likelihood function is given by

$$\begin{aligned} L(\lambda | X_1, \ldots , X_n)= & {} \prod _{i=1}^{n} f(X_i) = \frac{1}{\lambda ^n} {\text{e}}^{- \frac{X_1+ X_2+\cdots + X_n}{\lambda }} \end{aligned}$$

Thus, the estimate of the unknown parameter \(\lambda\) is \(\lambda _{\text {MLE}}=\frac{X_1+ X_2+\cdots + X_n}{n}=\overline{X}\), the sample mean. The population variance is \(\sigma ^2_F = \lambda ^2\) and thus, the MLE of the population variance is \(\widehat{\sigma ^2_F} = {\widehat{\lambda ^2}_{\text {MLE}}} = {({\widehat{\lambda }}_{\text {MLE}})}^2={\overline{X}}^2\). Hence, the test statistic in our case is given by

$$\begin{aligned} \chi ^2_\nu= & {} \sum _{i=1}^{n}\frac{{(X_i-\overline{X})}^2}{\overline{X}^2}. \end{aligned}$$
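As a quick numerical sketch (a hypothetical simulation, not from the paper; NumPy assumed available), the statistic can be computed directly from a sample:

```python
import numpy as np

# Simulate a sample from the exponential distribution with mean lam
rng = np.random.default_rng(0)
n, lam = 100, 2.0
x = rng.exponential(scale=lam, size=n)

# Variance test statistic: sum of (X_i - Xbar)^2 / Xbar^2,
# i.e. n * (sample variance) / (MLE of the population variance)
xbar = x.mean()
chi2_nu = np.sum((x - xbar) ** 2) / xbar ** 2
```

Under the (erroneous) usage criticized here, this value would be referred to a central Chi-square table with \(n-1\) degrees of freedom.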

For the remaining part of the proof we follow Rao and Chakravarti (1956), who justified the asymptotic properties of the variance test in the case of the Poisson distribution, making the modifications necessary to accommodate our case of a continuous distribution.

From the above likelihood function, it is easy to see that the sample total \(T=X_1+ X_2+\cdots + X_n\) is a sufficient statistic for \(\lambda\) (by the Neyman–Fisher factorization theorem). Moreover, T follows the gamma distribution with shape parameter n and scale parameter \(\lambda\) (easily shown via characteristic function or moment generating function arguments); its density is given by

$$\begin{aligned} f_T(t)= & {} \frac{1}{\lambda ^n\Gamma (n)}{\text{e}}^{-\frac{t}{\lambda }}t^{n-1}. \end{aligned}$$

The conditional density of \(X_1,X_2,\ldots ,X_n\) given \(T=t\) is given by

$$\begin{aligned} f_{X_1,X_2,\ldots ,X_n |T=t}(x_1,x_2,\ldots ,x_n )=\frac{\Gamma (n)}{t^{n-1}}, \end{aligned}$$

which is independent of the values \(x_1,x_2,\ldots ,x_n\). Hence, by symmetry, \(E(X_i|T)= \overline{X}=T/n\) (each \(X_i\) receives the same weight \(1/n\)), and thus we can express the variance test statistic in the form

$$\begin{aligned} \chi ^2_\nu= & {} \sum _{i=1}^{n}\frac{{(X_i-E(X_i|T))}^2}{E(X_i|T)^2}. \end{aligned}$$

Now, by the definition of conditional expectation, we have, for any measurable function \(\phi (x_1,x_2,\ldots ,x_n)\):

$$\int _{0}^{\infty }E(\phi |T=t)f_T(t){\text {d}}t= E(\phi ),$$

which, in our case, translates into

$$\int _{0}^{\infty }E(\phi |T=t){\text{e}}^{-\frac{t}{\lambda }}t^{n-1}{\text {d}}t= E(\phi ){\lambda ^n}\Gamma (n).$$

Therefore, knowing the unconditional expectation \(E(\phi )\), the conditional expectation \(E(\phi |T=t)\) can be obtained. Let us consider the statistic \(S^2=\sum _{i=1}^n(X_i-\overline{X})^2=\sum _{i=1}^{n}{X_i}^2-n\overline{X}^2\), whose moments are known functions of \(\lambda\). Using the above identity for conditional expectation, we derive the conditional moments of \(\phi (x_1,x_2,\ldots ,x_n)=S^2\) as follows.

The term \(\frac{1}{n-1}S^2\) is an unbiased estimator of the population variance \(\lambda ^2\) and hence, \(E(S^2)=(n-1)\lambda ^2\). Thus, we have

$$\int _{0}^{\infty }E(S^2|T=t){\text{e}}^{-\frac{t}{\lambda }}t^{n-1}{\text {d}}t= (n-1){\lambda ^{n+2}}\Gamma (n).$$

Now, we can write \(\lambda ^{n+2}=\int _{0}^{\infty }\frac{1}{\Gamma (n+2)}{\text{e}}^{-\frac{t}{\lambda }}t^{n+1}{\text {d}}t\), and thus it follows that

$$\begin{aligned} \int _{0}^{\infty }E(S^2|T=t){\text{e}}^{-\frac{t}{\lambda }}t^{n-1}{\text {d}}t= & {} \int _{0}^{\infty }\frac{(n-1)\Gamma (n)}{\Gamma (n+2)}{\text{e}}^{-\frac{t}{\lambda }}t^{n+1}{\text {d}}t. \end{aligned}$$

We know that if \(\int _{0}^{\infty }f_1(x){\text{e}}^{-ax}\text {d}x=\int _{0}^{\infty }f_2(x){\text{e}}^{-ax}\text {d}x\) for every positive constant a, where \(f_1\) and \(f_2\) are both continuous, then \(f_1=f_2\) by the uniqueness of the Laplace transform. As a consequence,

$$\begin{aligned} E(S^2|T=t)= \frac{(n-1)}{n(n+1)}t^2, \end{aligned}$$
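This conditional mean can be checked by simulation: given \(T=t\), the flat conditional density above means the scaled vector \((X_1,\ldots ,X_n)/t\) is uniformly distributed on the simplex, i.e. flat Dirichlet. A sketch with hypothetical parameter values (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, reps = 50, 10.0, 200_000

# Given T = t, (X_1,...,X_n)/t is uniform on the simplex (flat Dirichlet)
x = t * rng.dirichlet(np.ones(n), size=reps)
s2 = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

mc = s2.mean()                            # Monte Carlo estimate of E(S^2 | T = t)
theory = (n - 1) * t**2 / (n * (n + 1))   # (n-1) t^2 / (n (n+1)) from the proof
```

The two values agree to within Monte Carlo error.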

and hence

$$\begin{aligned} E(\chi ^2_\nu |T=t)&= E\left( S^2\frac{n^2}{T^2}|T=t\right) \\&= \frac{n^2}{t^2}\frac{(n-1)}{n(n+1)}t^2\\&= \frac{(n-1)n}{n+1}\\&\approx n-1. \end{aligned}$$

Similarly, we obtain

$$E(S^4)= \frac{(n-1)(n^2+7n-6)}{n}\lambda ^4$$

and

$$\begin{aligned} E(S^4|T=t)= & {} \frac{(n-1)(n^2+7n-6)\Gamma (n)}{n\Gamma (n+4)}t^4. \end{aligned}$$

Thus,

$$\begin{aligned} E((\chi _\nu ^2)^2|T=t)&= E({S^4}\frac{n^4}{T^4}|T=t) \\&= \frac{n^4}{t^4}\frac{(n-1)(n^2+7n-6)\Gamma (n)}{n\Gamma (n+4)}t^4 \\&= \frac{{n^2}(n-1)(n^2+7n-6)}{(n+3)(n+2)(n+1)}, \end{aligned}$$

and so,

$$\begin{aligned} {\text{Var}}(\chi _{\nu }^{2} |T = t) & = E((\chi _{\nu }^{2} )^{2} |T = t) - (E(\chi _{\nu }^{2} |T = t))^{2} \\ & = 4(n - 1)\frac{1}{{(1 + \frac{1}{n})^{2} (1 + \frac{2}{n})(1 + \frac{3}{n})}} \\ & \approx 4(n - 1). \\ \end{aligned}$$

Since both \(E(\chi ^2_\nu |T=t)\) and \({\text {Var}}(\chi ^2_\nu |T=t)\) are free of t, the variance decomposition \({\text {Var}}(\chi ^2_\nu )=E[{\text {Var}}(\chi ^2_\nu |T)]+{\text {Var}}[E(\chi ^2_\nu |T)]\) gives \({\text {Var}}(\chi ^2_\nu )={\text {Var}}(\chi ^2_\nu |T=t)\approx 4(n-1)\), which does not conform with the variance of the central Chi-square distribution with \(n-1\) degrees of freedom, namely \(2(n-1)\). This proves the theorem. \(\square\)
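A small Monte Carlo experiment (a hypothetical sketch, not from the paper; NumPy assumed) illustrates the discrepancy: for exponential samples the statistic's variance tracks \(4(n-1)\), roughly double the \(2(n-1)\) that the Chi-square reference distribution assumes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20_000

# Many exponential samples; one variance test statistic per sample
x = rng.exponential(scale=1.0, size=(reps, n))
xbar = x.mean(axis=1)
stats = ((x - xbar[:, None]) ** 2).sum(axis=1) / xbar ** 2

mc_var = stats.var()   # empirically close to 4*(n-1), not 2*(n-1)
```

The empirical mean stays near \(n-1\), so the mismatch is invisible in the first moment and shows up only in the spread, which is exactly why the test's p-values are miscalibrated.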

Theorem 2

If a random sample of size n comes from a population with finite fourth moment where the population variance is a non-zero continuously differentiable function f of the population mean under the null hypothesis, then under the condition (which we refer to as the “condition of approximate equality”)

$$\frac{1}{f(\mu )^2}\left( \sigma ^2{(f'(\mu ))^2}-2\mu _3f'(\mu )+ (\mu _4-\sigma ^4)\right)\approx 2,$$

where \(\mu , \sigma ^2, \mu _3, \mu _4\) are the mean, variance, and third and fourth central moments of the population, respectively, the variance test statistic is asymptotically conformable with the central \(\chi ^2\) distribution with \(n-1\) degrees of freedom. If such a function f exists but the “condition of approximate equality” fails, then the variance test statistic is not asymptotically central \(\chi ^2_{n-1}\).

Proof

Suppose that \(X_1,X_2,\ldots ,X_n\) is a random sample from a population where the sufficient condition on moment existence and the existence of a non-zero continuously differentiable function f are satisfied under the null hypothesis.

Applying the bivariate central limit theorem (CLT) in the context of sample moments, we obtain the joint asymptotic distribution of sample mean \(\overline{X}_n\) and sample variance \(S^2_n\) as

$$\begin{aligned} \sqrt{n}\left[ \begin{pmatrix} \bar{X}_n\\ {S_n}^2 \end{pmatrix}- \begin{pmatrix} \mu \\ \sigma ^2 \end{pmatrix}\right] \overset{d}{\rightarrow }&N_2\left( \begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} \sigma ^2 \quad \mu _3\\ \mu _3 \quad \mu _4-{\sigma }^4 \end{pmatrix}\right) \end{aligned}$$
(1)

where \(\mu\), \(\sigma ^2\), \(\mu _3\) and \(\mu _4\) have the same meanings as before.

In the case of asymptotic normality of smooth functions of sample moments, Cramer (1946) shows that for a mapping \(\varvec{g}:{\mathbb {R}^d}\rightarrow {\mathbb {R}^k}\) such that \(\varvec{g}'(\mathbf x)\), the derivative of \(\varvec{g}(\mathbf x)\) at the point \(\mathbf x\), is continuous in a neighborhood of \(\varvec{\theta }\in \mathbb {R}^d\), if \(\mathbf T_n\) is a sequence of d-dimensional random vectors such that \(\sqrt{n}(\mathbf T_n-\varvec{\theta })\rightarrow N_d(\mathbf 0,\varvec{\Sigma })\) where \(\varvec{\Sigma }\) is a \(d\times d\) covariance matrix, then

$$\begin{aligned} \sqrt{n} (\varvec{g}(\mathbf {T}_n)-\varvec{g}(\varvec{\theta })) \overset{d}{\rightarrow } N_k(\mathbf {0},\varvec{g}'(\varvec{\theta })\varvec{\Sigma }\varvec{g}'(\varvec{\theta })^T). \end{aligned}$$
(2)

In our case, the population mean is estimated by the sample mean, and since \(\sigma ^2=f(\mu )\), the population variance is estimated by \(f(\overline{X}_n)\), which is neither equal nor proportional to \(S^2_n\) (otherwise \(\sigma ^2\) could not be written as a function of \(\mu\) alone). Hence \(\chi ^2_\nu =\sum _{i=1}^{n}\frac{{(X_i-\overline{X}_n)}^2}{f(\overline{X}_n)}= n\frac{{S_n}^2}{f(\overline{X}_n)}\) can be used as a test statistic. Then, applying the delta method (2) to (1), we have

$$\begin{aligned} \sqrt{n}\left( \frac{S^2_n}{f(\overline{X}_n)}-1\right)&\overset{d}{\rightarrow } N\left( 0, \frac{1}{f(\mu )^2}\left( \sigma ^2{(f'(\mu ))^2}-2\mu _3f'(\mu )+ (\mu _4-\sigma ^4)\right) \right) . \nonumber \\ \end{aligned}$$
(3)

Now, if

$$\begin{aligned} \frac{1}{f(\mu )^2}\left( \sigma ^2{(f'(\mu ))^2}-2\mu _3f'(\mu )+ (\mu _4-\sigma ^4)\right)\approx \alpha , \end{aligned}$$

where \(\alpha =2\), then the asymptotic distribution of \(\chi ^2_{\nu }\) is N(n, 2n). Since \(N(n,2n)\approx \chi ^2_{n-1}\) (see the next “Result”), in this case the variance test statistic \(\chi ^2_\nu\) is asymptotically distributed as the central \(\chi ^2_{n-1}\) distribution.

On the other hand, if \(\alpha\) is significantly different from 2, then \(E(\chi ^2_\nu )=n\) but \(Var(\chi ^2_\nu )\ne 2(n-1)\), even asymptotically. Hence, the asymptotic distribution of \(\chi ^2_\nu\) can not be central \(\chi ^2_{n-1}\) in this case. \(\square\)
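As an illustration (a sketch using the standard central moments of the Poisson and exponential families, with a hypothetical parameter value), the left-hand side of the “condition of approximate equality” can be evaluated directly. It equals exactly 2 for the Poisson distribution, for which Fisher's index of dispersion was designed, and exactly 4 for the exponential distribution of Theorem 1, consistent with the \(4(n-1)\) variance derived there.

```python
def dispersion_condition(mu, sigma2, mu3, mu4, f, fprime):
    """Left-hand side of the 'condition of approximate equality' (Theorem 2)."""
    return (sigma2 * fprime(mu) ** 2
            - 2 * mu3 * fprime(mu)
            + (mu4 - sigma2 ** 2)) / f(mu) ** 2

lam = 3.0

# Exponential(mean lam): sigma^2 = lam^2, mu_3 = 2 lam^3, mu_4 = 9 lam^4, f(mu) = mu^2
alpha_exp = dispersion_condition(lam, lam**2, 2 * lam**3, 9 * lam**4,
                                 lambda m: m**2, lambda m: 2 * m)  # 4.0, not 2

# Poisson(mean lam): sigma^2 = lam, mu_3 = lam, mu_4 = lam + 3 lam^2, f(mu) = mu
alpha_pois = dispersion_condition(lam, lam, lam, lam + 3 * lam**2,
                                  lambda m: m, lambda m: 1.0)      # exactly 2.0
```

The value is parameter-free in both families, which is why the dispersion test is calibrated for the Poisson case but not for the exponential one.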

Result. For large values of n, \(N(n,2n)\approx \chi ^2_{n-1}\).

Proof

Consider an independent and identically distributed (i.i.d.) sequence of random variables \(Y_1,\ldots ,Y_{n-1},\ldots\) following N(0, 1). Then \({Y_1^2}+\cdots +{Y_{n-1}^2}\) follows the central \(\chi ^2_{n-1}\) distribution. By the central limit theorem, \(\frac{{Y_1^2}+\cdots +{Y_{n-1}^2}-(n-1)}{\sqrt{2(n-1)}} \overset{d}{\rightarrow }N(0,1)\), so \({Y_1^2}+\cdots +{Y_{n-1}^2}\) is asymptotically distributed as \(N(n-1,2(n-1))\). For large enough n, \(N(n-1, 2(n-1))\approx N(n, 2n)\). Thus, the test statistic can be referred to the central Chi-square table with \(n-1\) degrees of freedom. \(\square\)
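A quick simulated comparison (a hypothetical sketch; NumPy assumed) shows the two distributions agreeing closely for large n:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 100_000

chisq = rng.chisquare(df=n - 1, size=m)                   # central chi^2_{n-1}
gauss = rng.normal(loc=n, scale=np.sqrt(2 * n), size=m)   # N(n, 2n)

# The means differ by about 1 (n versus n-1) and the standard deviations are
# nearly equal; both differences are small relative to sqrt(2n) ~ 31.6
```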


Rights and permissions

Reprints and permissions

About this article


Cite this article

Hazra, A., Bhattacharya, S., Banik, P. et al. A note on the misuses of the variance test in meteorological studies. Meteorol Atmos Phys 129, 645–658 (2017). https://doi.org/10.1007/s00703-016-0490-9

