Abstract
Stochastic modeling of rainfall data is an important area of meteorology. The gamma distribution is a widely used probability model for non-zero rainfall. Typically, the choice of distribution in such meteorological studies is based on two goodness-of-fit tests: Pearson's Chi-square test and the Kolmogorov–Smirnov test. Inspired by the index of dispersion introduced by Fisher (Statistical methods for research workers. Hafner Publishing Company Inc., New York, 1925), Mooley (Mon Weather Rev 101:160–176, 1973) proposed the variance test as a goodness-of-fit measure in this context, and a number of researchers have implemented it since then. We show that the asymptotic distribution of the variance test statistic is generally not comparable to any central Chi-square distribution, and hence the test is erroneous. We also describe a method for checking the validity of the asymptotic distribution for a class of distributions. We apply the erroneous test to simulated as well as real datasets and demonstrate how it leads to wrong conclusions.
References
Barger GL, Thom HCS (1949) Evaluation of drought hazard. Agron J 41:519–526
Berkson J (1938) Some difficulties of interpretation encountered in the application of the chi-square test. J Am Stat Assoc 33:526–536
Berkson J (1940) A note on the chi-square test, the Poisson and the Binomial. J Am Stat Assoc 35:362–367
Biswas BC, Khambete NN, Mondal SS (1989) Weekly rainfall probability analysis of dry farming tract of Tamil Nadu. Mausam 40:197–206
Burgueño A, Martínez MD, Lana X, Serra C (2005) Statistical distributions of the daily rainfall regime in Catalonia (North-eastern Spain) for the years 1950–2000. Int J Climatol 25:1381–1403
Cochran WG (1954) Some methods for strengthening the common chi-square tests. Biometrics 10:417–451
Cramer H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
Duan J, Sikka AK, Grant GE (1995) A comparison of stochastic models for generating daily precipitation at the H. J. Andrews experiment forest. Northwest Sci 69:318–329
Fisher RA (1925) Statistical methods for research workers. Hafner Publishing Company Inc., New York
Fisher RA, Yates F (1957) Statistical tables for biological, agricultural and medical research. Oliver and Boyd, Edinburgh
Goel AK, Singh JK (1999) Incomplete gamma distribution for weekly rainfall of Unai, Himachal Pradesh. J Agric Eng 36:61–74
Hargreaves GH (1975) Water requirements manual for irrigated crops and rainfed agriculture. Utah State University Publication, Utah
Hazra A, Bhattacharya S, Banik P (2014) Modeling Nakshatra-wise rainfall data of the eastern plateau region of India. Mausam 65:264–270
Kwaku XS, Duke O (2007) Characterization and frequency analysis of one day annual maximum and two to five consecutive days maximum rainfall of Accra, Ghana. ARPN J Eng Appl Sci 2:27–31
Mooley DA, Crutcher HL (1968) An application of the gamma distribution function to Indian rainfall. ESSA Technical Report EDS 5, US Department of Commerce, Environmental Data Service, Silver Spring
Mooley DA (1973) Gamma distribution probability model for Asian summer monsoon monthly rainfall. Mon Weather Rev 101:160–176
Pramanik SK, Jagannathan P (1953) Climate changes in India (I)-rainfall. Indian J Meteorol Geophys 4:291–309
Rafa HAS, Khanbilvardi R (2014) Frequency analysis of the monthly rainfall data at Sulaimania Region, Iraq. Am J Eng Res 3:212–222
Rao CR, Chakravarti IM (1956) Some small sample tests of significance for a Poisson distribution. Biometrics 12:264–282
Sankaranarayanan D (1933) On the nature of frequency distribution of precipitation in India during monsoon months June to September. India Meteorol Dep Sci Notes 5:97–107
Sarker RP, Biswas BC, Khambete NN (1982) Probability analysis for short period rainfall in dry farming tract in India. Mausam 33:269–284
Sen Z, Eljadid AG (1999) Rainfall distribution functions for Libya and rainfall prediction. Hydrol Sci J 4:665–680
Sharma MA, Singh JB (2010) Use of probability distribution in rainfall analysis. N Y Sci J 3:40–49
Todorovic P, Woolhiser DA (1975) A stochastic model of n-day precipitation. J Appl Meteorol 14:17–24
Upadhyaya A, Kumar J, Kumar P, Sikka AK (2009) Analysis of rainfall in Patna Main Canal Command employing two parameter gamma probability distribution model. Indian J Soil Conserv 37:17–21
Acknowledgements
The authors would like to thank Professor Jennifer A. Hoeting from Colorado State University and an anonymous reviewer for a number of valuable suggestions and corrections.
Additional information
Responsible Editor: J. T. Fasullo.
Appendix
Theorem 1
The asymptotic distribution of the variance test statistic, under the null hypothesis that a random sample of size n comes from a one-parameter exponential distribution, is not comparable with the central \(\chi ^2\) distribution with \(n-1\) degrees of freedom.
Proof
Suppose that \(X_1,X_2,\ldots ,X_n\) is a random sample of size n from the exponential distribution with mean \(\lambda\), whose probability density function (PDF) is given by
\[f(x;\lambda )=\frac{1}{\lambda }{\text{e}}^{-x/\lambda },\quad x>0,\ \lambda >0.\]
In the above expression, the mean \(\lambda\) is assumed unknown and is estimated by its maximum likelihood estimator (MLE). The likelihood function is given by
\[L(\lambda )=\prod _{i=1}^{n}\frac{1}{\lambda }{\text{e}}^{-X_i/\lambda }=\lambda ^{-n}{\text{e}}^{-\frac{1}{\lambda }\sum _{i=1}^{n}X_i}.\]
Thus, the estimate of the unknown parameter \(\lambda\) is \(\lambda _{\text {MLE}}=\frac{X_1+ X_2+\cdots + X_n}{n}=\overline{X}\), the sample mean. The population variance is \(\sigma ^2_F = \lambda ^2\) and thus the MLE of the population variance is \(\widehat{\sigma ^2_F} = {\widehat{\lambda ^2}_{\text {MLE}}} = {({\widehat{\lambda }}_{\text {MLE}})}^2={\overline{X}}^2\). Hence, the test statistic in our case is given by
\[\chi ^2_{\nu }=\sum _{i=1}^{n}\frac{{(X_i-\overline{X})}^2}{{\overline{X}}^2}.\]
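To make the statistic concrete, the following minimal sketch (our own, not from the paper) computes \(\chi ^2_{\nu }\) for a simulated exponential sample; the sample size, seed, and mean are arbitrary illustrative choices:

```python
import numpy as np

def variance_test_statistic(x):
    """Variance test statistic under the exponential null:
    sum of squared deviations from the sample mean, divided by
    the MLE of the population variance, which here is xbar**2."""
    xbar = x.mean()
    return np.sum((x - xbar) ** 2) / xbar ** 2

rng = np.random.default_rng(0)
x = rng.exponential(scale=3.0, size=200)  # exponential with mean lambda = 3
stat = variance_test_statistic(x)
print(stat)  # a positive number of order n
```

The traditional (erroneous) practice examined in the paper would refer this value to a Chi-square table with \(n-1\) degrees of freedom.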
For the remaining part of the proof we follow Rao and Chakravarti (1956), who justify the asymptotic properties of the variance test for the Poisson distribution, making the modifications necessary to accommodate our continuous case.
From the above likelihood function it is easy to see, via the Neyman–Fisher factorization theorem, that the sample total \(T=X_1+ X_2+\cdots + X_n\) is a sufficient statistic for \(\lambda\). Moreover, T follows the gamma distribution with shape parameter n and scale parameter \(\lambda\) (easily shown using characteristic function or moment generating function arguments); its density is given by
\[g(t)=\frac{1}{\Gamma (n)\lambda ^{n}}\,t^{n-1}{\text{e}}^{-t/\lambda },\quad t>0.\]
The conditional density of \(X_1,X_2,\ldots ,X_n\) given \(T=t\) is given by
\[f(x_1,x_2,\ldots ,x_n\mid T=t)=\frac{\lambda ^{-n}{\text{e}}^{-t/\lambda }}{g(t)}=\frac{\Gamma (n)}{t^{n-1}},\quad x_i>0,\ \sum _{i=1}^{n}x_i=t,\]
i.e., independent of the values \(x_1,x_2,\ldots ,x_n\). Hence, \(E(X_i|T)= \overline{X}=T/n\) (each \(X_i\) receives the same weight, \(1/n\)) and thus we can express the variance test statistic in the form
\[\chi ^2_{\nu }=\frac{\sum _{i=1}^{n}{(X_i-\overline{X})}^2}{{\overline{X}}^2}=\frac{n^{2}\sum _{i=1}^{n}{(X_i-\overline{X})}^2}{T^{2}}.\]
Now, by the definition of conditional expectation, we have, for any measurable function \(\phi (x_1,x_2,\ldots ,x_n)\):
\[E(\phi )=\int _{0}^{\infty }E(\phi \mid T=t)\,g(t)\,{\text {d}}t,\]
which, in our case, translates into
\[E(\phi )=\frac{1}{\Gamma (n)\lambda ^{n}}\int _{0}^{\infty }E(\phi \mid T=t)\,t^{n-1}{\text{e}}^{-t/\lambda }\,{\text {d}}t.\]
Therefore, knowing the total expectation \(E(\phi )\), the conditional expectation \(E(\phi |T=t)\) can easily be obtained. Let us consider the statistic \(S^2=\sum _{i=1}^n(X_i-\overline{X})^2=\sum _{i=1}^{n}{X_i}^2-n\overline{X}^2\), whose moments are known functions of \(\lambda\). Using the above definition of conditional expectation, we derive the conditional moments of \(\phi (x_1,x_2,\ldots ,x_n)=S^2\) as follows.
The term \(\frac{1}{n-1}S^2\) is an unbiased estimator of the population variance \(\lambda ^2\) and hence \(E(S^2)=(n-1)\lambda ^2\). Thus, we have
\[(n-1)\lambda ^{n+2}=\frac{1}{\Gamma (n)}\int _{0}^{\infty }E(S^2\mid T=t)\,t^{n-1}{\text{e}}^{-t/\lambda }\,{\text {d}}t.\]
Now, we can write \(\lambda ^{n+2}=\int _{0}^{\infty }\frac{1}{\Gamma (n+2)}{\text{e}}^{-\frac{t}{\lambda }}t^{n+1}{\text {d}}t\), and thus it follows that
\[\frac{1}{\Gamma (n)}\int _{0}^{\infty }E(S^2\mid T=t)\,t^{n-1}{\text{e}}^{-t/\lambda }\,{\text {d}}t=\frac{n-1}{\Gamma (n+2)}\int _{0}^{\infty }t^{n+1}{\text{e}}^{-t/\lambda }\,{\text {d}}t.\]
We know that if \(\int _{0}^{\infty }f_1(x){\text{e}}^{-ax}\text {d}x=\int _{0}^{\infty }f_2(x){\text{e}}^{-ax}\text {d}x\) for all \(a>0\), where \(f_1(x)\) and \(f_2(x)\) are both continuous, then \(f_1=f_2\) by the uniqueness of the Laplace transform. As a consequence,
\[\frac{E(S^2\mid T=t)\,t^{n-1}}{\Gamma (n)}=\frac{(n-1)\,t^{n+1}}{\Gamma (n+2)},\quad \text{i.e.,}\quad E(S^2\mid T=t)=\frac{(n-1)\,t^{2}}{n(n+1)},\]
and hence
\[E(\chi ^2_{\nu }\mid T=t)=\frac{n^{2}}{t^{2}}\,E(S^2\mid T=t)=\frac{n(n-1)}{n+1}.\]
Similarly, using \(E(S^4)=\frac{(n-1)(n^{2}+7n-6)}{n}\lambda ^{4}\) for the exponential distribution, we obtain
\[E(S^4\mid T=t)=\frac{(n-1)(n^{2}+7n-6)}{n^{2}(n+1)(n+2)(n+3)}\,t^{4},\]
and
\[{\text {Var}}(S^2\mid T=t)=E(S^4\mid T=t)-{\left[ E(S^2\mid T=t)\right] }^{2}=\frac{4(n-1)}{(n+1)^{2}(n+2)(n+3)}\,t^{4}.\]
Thus,
\[{\text {Var}}(\chi ^2_{\nu }\mid T=t)=\frac{n^{4}}{t^{4}}\,{\text {Var}}(S^2\mid T=t)=\frac{4n^{4}(n-1)}{(n+1)^{2}(n+2)(n+3)},\]
and so, for large n,
\[{\text {Var}}(\chi ^2_{\nu }\mid T=t)\approx 4n.\]
Since both \(E(\chi ^2_\nu |T=t)\) and \({\text {Var}}(\chi ^2_\nu |T=t)\) are free of t, the law of total variance gives \({\text {Var}}(\chi ^2_\nu )={\text {Var}}(\chi ^2_\nu |T=t)\), which does not conform with the variance of the central Chi-square distribution with \((n-1)\) degrees of freedom, namely \(2(n-1)\). This proves the theorem. \(\square\)
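The theorem's conclusion is easy to verify by simulation. Under the exponential null, the conditional-variance calculation above puts the variance of the test statistic near 4n for large n, well away from the Chi-square value \(2(n-1)\). The following Monte Carlo sketch is our own illustration (sample size, replication count, and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check: under the exponential null, the variance test
# statistic has variance close to 4n for large n, not 2(n-1).
rng = np.random.default_rng(42)
n, reps = 100, 20000

samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
stats = ((samples - xbar[:, None]) ** 2).sum(axis=1) / xbar ** 2

print(stats.mean())   # near n(n-1)/(n+1), i.e. slightly below n
print(stats.var())    # near 4n = 400
print(2 * (n - 1))    # 198: what the chi-square(n-1) approximation assumes
```

Because the simulated variance is roughly twice the nominal \(2(n-1)\), referring the statistic to a \(\chi ^2_{n-1}\) table badly miscalibrates the test.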
Theorem 2
If a random sample of size n comes from a population with finite fourth moment, where under the null hypothesis the population variance is a non-zero continuously differentiable function f of the population mean, then under the condition (which we refer to as the “condition of approximate equality”)
\[\frac{\mu _4-\sigma ^4+{[f'(\mu )]}^{2}\sigma ^{2}-2f'(\mu )\mu _3}{\sigma ^{4}}\approx 2,\]
where \(\mu , \sigma ^2, \mu _3, \mu _4\) are the mean, variance, and third and fourth central moments of the population, respectively, the variance test statistic is asymptotically conformable with the central \(\chi ^2\) distribution with \(n-1\) degrees of freedom. If such a function f exists but the “condition of approximate equality” fails, then the variance test statistic is not asymptotically central \(\chi ^2_{n-1}\).
Proof
Suppose that \(X_1,X_2,\ldots ,X_n\) is a random sample from a population for which, under the null hypothesis, the fourth moment is finite and the population variance is a non-zero continuously differentiable function f of the population mean.
Applying the bivariate central limit theorem (CLT) to the sample moments, we obtain the joint asymptotic distribution of the sample mean \(\overline{X}_n\) and sample variance \(S^2_n\) as
\[\sqrt{n}\left( \begin{pmatrix} \overline{X}_n \\ S^2_n \end{pmatrix}-\begin{pmatrix} \mu \\ \sigma ^2 \end{pmatrix}\right) \overset{d}{\rightarrow }N_2\left( \mathbf{0} ,\begin{pmatrix} \sigma ^2 &{} \mu _3\\ \mu _3 &{} \mu _4-\sigma ^4 \end{pmatrix}\right) , \qquad (1)\]
where \(\mu\), \(\sigma ^2\), \(\mu _3\) and \(\mu _4\) have same meaning as mentioned earlier.
In the case of asymptotic normality of smooth functions of sample moments, Cramer (1946) shows that for a mapping \(\varvec{g}:{\mathbb {R}^d}\rightarrow {\mathbb {R}^k}\) such that \(\varvec{g}'(\mathbf x)\), the derivative of \(\varvec{g}(\mathbf x)\) at the point \(\mathbf x\), is continuous in a neighborhood of \(\varvec{\theta }\in \mathbb {R}^d\), if \(\mathbf T_n\) is a sequence of d-dimensional random vectors such that \(\sqrt{n}(\mathbf T_n-\varvec{\theta })\rightarrow N_d(\mathbf 0,\varvec{\Sigma })\), where \(\varvec{\Sigma }\) is a \(d\times d\) covariance matrix, then
\[\sqrt{n}\left( \varvec{g}(\mathbf T_n)-\varvec{g}(\varvec{\theta })\right) \overset{d}{\rightarrow }N_k\left( \mathbf 0 ,\varvec{g}'(\varvec{\theta })\,\varvec{\Sigma }\,{\varvec{g}'(\varvec{\theta })}^{T}\right) . \qquad (2)\]
In our case, the population mean is estimated by the sample mean, and since \(\sigma ^2=f(\mu )\), the population variance is estimated by \(f(\overline{X}_n)\), which is neither equal nor proportional to \(S^2_n\) (otherwise \(\sigma ^2\) could not be written as a function of \(\mu\) only). Hence \(\chi ^2_\nu =\sum _{i=1}^{n}\frac{{(X_i-\overline{X}_n)}^2}{f(\overline{X}_n)}= n\frac{{S_n}^2}{f(\overline{X}_n)}\) can be used as a test statistic. Then, applying the delta method (2) to (1) with \(g(a,b)=b/f(a)\), we have
\[\sqrt{n}\left( \frac{S^2_n}{f(\overline{X}_n)}-1\right) \overset{d}{\rightarrow }N(0,\alpha ),\quad \text{where }\ \alpha =\frac{\mu _4-\sigma ^4+{[f'(\mu )]}^{2}\sigma ^{2}-2f'(\mu )\mu _3}{\sigma ^{4}}.\]
Now, if the asymptotic variance satisfies \(\alpha =2\), then the asymptotic distribution of \(\chi ^2_{\nu }\) is N(n, 2n). Since \(N(n,2n)\approx \chi ^2_{n-1}\) (see the “Result” below), in this case the variance test statistic \(\chi ^2_\nu\) is asymptotically distributed as the central \(\chi ^2_{n-1}\) distribution.
On the other hand, if \(\alpha\) is significantly different from 2, then \(E(\chi ^2_\nu )=n\) but \({\text {Var}}(\chi ^2_\nu )\ne 2(n-1)\), even asymptotically. Hence, the asymptotic distribution of \(\chi ^2_\nu\) cannot be central \(\chi ^2_{n-1}\) in this case. \(\square\)
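The “condition of approximate equality” is easy to evaluate for specific null distributions. The sketch below is our own illustration; the closed form of \(\alpha\) is the delta-method variance from the proof, and the moment values plugged in are the standard central moments of the Poisson and exponential distributions. For the Poisson null (Fisher's original dispersion test, with \(f(\mu )=\mu\)) the condition holds exactly, while for the exponential null of Theorem 1 (\(f(\mu )=\mu ^2\)) it fails:

```python
def alpha(sigma2, mu3, mu4, f_prime):
    """Asymptotic variance of the (normalized) variance test statistic:
    compared with 2 in the 'condition of approximate equality'."""
    return (mu4 - sigma2**2 + f_prime**2 * sigma2 - 2 * f_prime * mu3) / sigma2**2

mu = 5.0  # arbitrary positive mean

# Poisson(mu): sigma^2 = f(mu) = mu, so f'(mu) = 1;
# central moments: mu3 = mu, mu4 = mu + 3*mu^2.
print(alpha(mu, mu, mu + 3 * mu**2, 1.0))            # 2.0 -> test valid

# Exponential with mean mu: sigma^2 = f(mu) = mu^2, f'(mu) = 2*mu;
# mu3 = 2*mu^3, mu4 = 9*mu^4.
print(alpha(mu**2, 2 * mu**3, 9 * mu**4, 2 * mu))    # 4.0 -> test fails
```

The value \(\alpha =4\) for the exponential case agrees with the conditional-variance calculation of Theorem 1, where the variance of the statistic behaves like 4n rather than \(2(n-1)\).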
Result. For large values of n, \(N(n,2n)\approx \chi ^2_{n-1}\).
Proof
Consider an independent and identically distributed (i.i.d.) sequence of random variables \(Y_1,\ldots ,Y_{n-1},\ldots\) following N(0, 1). Then \({Y_1^2}+\cdots +{Y_{n-1}^2}\) follows the central \(\chi ^2_{n-1}\) distribution. By the Central Limit Theorem, \(\frac{{Y_1^2}+\cdots +{Y_{n-1}^2}-(n-1)}{\sqrt{2(n-1)}} \overset{d}{\rightarrow }N(0,1)\), so \({Y_1^2}+\cdots +{Y_{n-1}^2}\) is asymptotically distributed as \(N(n-1,2(n-1))\). For large enough n, \(N(n-1, 2(n-1))\approx N(n, 2n)\). Thus, the test statistic can be referred to the central Chi-square table with \(n-1\) degrees of freedom. \(\square\)
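The normal approximation to the Chi-square can be seen numerically. This quick sketch (our own, with arbitrary n and seed) compares simulated quantiles of \(\chi ^2_{n-1}\) and N(n, 2n) for a moderately large n:

```python
import numpy as np

# For large n, chi-square(n-1) and N(n, 2n) quantiles nearly coincide.
rng = np.random.default_rng(7)
n, reps = 200, 200000

chi2_draws = rng.chisquare(n - 1, size=reps)
norm_draws = rng.normal(loc=n, scale=np.sqrt(2 * n), size=reps)

for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(chi2_draws, q), np.quantile(norm_draws, q))
```

For n = 200 the quantiles agree to within about one percent, which is why the normal limit can be referred to the Chi-square table in practice.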
Cite this article
Hazra, A., Bhattacharya, S., Banik, P. et al. A note on the misuses of the variance test in meteorological studies. Meteorol Atmos Phys 129, 645–658 (2017). https://doi.org/10.1007/s00703-016-0490-9