New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes

Alonso-Revenga, J. M.; Martín, N.; Pardo, L.

doi:10.1007/s11222-015-9616-z

Published: 23 November 2015

Volume 27, pages 193–217, (2017)
Statistics and Computing

J. M. Alonso-Revenga¹,
N. Martín² &
L. Pardo³

Abstract

It is usual to rely on quasi-likelihood methods for deriving statistical methods applied to clustered multinomial data with no underlying distribution. Even though an extensive literature exists for this kind of data set, few investigations deal with unequal cluster sizes. This paper aims to help fill this gap by proposing new estimators for the intracluster correlation coefficient.

References

Ahn, H., James, J.C.: Generation of over-dispersed and under-dispersed binomial variates. J. Comput. Graph. Stat. 4, 55–64 (1995)
Altham, P.M.E.: Discrete variable analysis for individuals grouped into families. Biometrika 63, 263–269 (1976)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Brier, S.S.: Analysis of contingency tables under cluster sampling. Biometrika 67, 591–596 (1980)
Budowle, B., Moretti, T.R.: Genotype profiles for six population groups at the 13 CODIS short tandem repeat core loci and other PCR-based loci. Forensic Science Communications 1999 (1999). http://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/july1999/budowle.htm
Cohen, J.E.: The distribution of the chi-squared statistic under clustered sampling from contingency tables. J. Am. Stat. Assoc. 71, 665–670 (1976)
Cressie, N., Pardo, L.: Minimum $\phi $-divergence estimator and hierarchical testing in loglinear models. Statistica Sinica 10, 867–884 (2000)
Cressie, N., Pardo, L.: Model checking in loglinear models using $\phi $-divergences and MLEs. J. Stat. Plan. Inference 103, 437–453 (2002)
Cressie, N., Pardo, L., Pardo, M.C.: Size and power considerations for testing loglinear models using $\phi $-divergence test statistics. Statistica Sinica 13, 555–570 (2003)
Fienberg, S.E., Rinaldo, A.: Maximum likelihood estimation in log-linear models. Ann. Stat. 40, 996–1023 (2012)
Grizzle, J.E., Starmer, C.F., Koch, G.G.: Analysis of categorical data by linear models. Biometrics 25, 489–504 (1969)
Haberman, S.J.: The Analysis of Frequency Data. University of Chicago Press, Chicago (1974)
Hall, D.B.: Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics 56, 1030–1039 (2000)
Martín, N., Pardo, L.: New families of estimators and test statistics in log-linear models. J. Multivar. Anal. 99, 1590–1609 (2008a)
Martín, N., Pardo, L.: Minimum phi-divergence estimators for loglinear models with linear constraints and multinomial sampling. Stat. Pap. 49, 15–36 (2008b)
Martín, N., Pardo, L.: A new measure of leverage cells in multinomial loglinear models. Commun. Stat. 39, 517–530 (2010)
Martín, N., Pardo, L.: Fitting DNA sequences through log-linear modelling with linear constraints. Statistics 45, 605–621 (2011)
Martín, N., Pardo, L.: Poisson loglinear modeling with linear constraints on the expected cell frequencies. Sankhya 74B, 238–267 (2012)
Menéndez, M.L., Morales, D., Pardo, L., Vajda, I.: Divergence-based estimation and testing of statistical models of classification. J. Multivar. Anal. 54, 329–354 (1995)
Menéndez, M.L., Morales, D., Pardo, L., Vajda, I.: About divergence-based goodness-of-fit tests in the Dirichlet-multinomial model. Commun. Stat. 25, 1119–1133 (1996)
Morel, J.G., Nagaraj, N.K.: A finite mixture distribution for modelling multinomial extra variation. Biometrika 80, 363–371 (1993)
Morel, J.G., Neerchal, N.K.: Overdispersion Models in SAS. SAS Press, Cary (2012)
Mosimann, J.E.: On the compound multinomial distributions, the multivariate $\beta $-distribution and correlation among proportions. Biometrika 49, 65–82 (1962)
Neerchal, N.K., Morel, J.G.: Large cluster results for two parametric multinomial extra variation models. J. Am. Stat. Assoc. 93, 1078–1087 (1998)
Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton (2006)
Raim, A.M.: Computational Methods for Finite Mixtures Using Approximate Information and Regression Linked to the Mixture Mean. PhD Thesis, University of Maryland (2014)
Raim, A.M., Neerchal, N.K., Morel, J.G.: Modeling overdispersion in R. Technical Report HPCI-2015-1, UMBCH High Performance Computing Facility, University of Maryland (2015)
Vos, P.W.: Minimum f-divergence estimators and quasi-likelihood functions. Ann. Inst. Stat. Math. 44, 261–279 (1992)
Wedderburn, R.W.M.: Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–447 (1974)
Weir, B.S., Hill, W.G.: Estimating F-statistics. Annu. Rev. Genet. 36, 721–750 (2002)

Acknowledgments

We would like to thank the referees for their helpful comments and suggestions. This research is supported by the Spanish Grant MTM2012-33740 from Ministerio de Economia y Competitividad.

Author information

Authors and Affiliations

Department of Statistics and O.R. III, Complutense University of Madrid, Madrid, Spain
J. M. Alonso-Revenga
Department of Statistics and Operations Research (Decision Methods), Complutense University of Madrid, Madrid, Spain
N. Martín
Department of Statistics and O.R. I, Complutense University of Madrid, Madrid, Spain
L. Pardo

Corresponding author

Correspondence to N. Martín.

Appendix

1.1 Zero-inflated binomial distribution

The binomial distribution with zero inflation in the first cell, i.e., n-inflation in the second cell, is given by

$$\begin{aligned}&\left( \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=v\right) \\&\quad =\left\{ \begin{array}{lll} \mathcal {M}\left( n, \begin{pmatrix} p_{1}(\varvec{\theta })\\ p_{2}(\varvec{\theta }) \end{pmatrix} \right) , &{} \quad \text {if }v=1, &{} \text {with }\Pr (V=1)=w\\ n\varvec{e}_{2}, &{} \quad \text {if }v=0, &{} \text {with }\Pr (V=0)=1-w \end{array} \right. . \end{aligned}$$

Its first order moment vector is given by

$$\begin{aligned} E\left[ \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right]&=E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] \Pr \left( V=1\right) \\&\quad +E\left[ \left. n\varvec{e}_{2}\right| V=0\right] \Pr \left( V=0\right) \\&=n \begin{pmatrix} wp_{1}(\varvec{\theta })\\ 1-wp_{1}(\varvec{\theta }) \end{pmatrix} . \end{aligned}$$

The second-order moment matrix is derived as follows:

$$\begin{aligned} E\left[ Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right]&=Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] \Pr \left( V=1\right) \\&\quad +Var\left[ \left. n\varvec{e}_{2}\right| V=0\right] \Pr \left( V=0\right) \\&=Var\left[ \mathcal {M}\left( n, \begin{pmatrix} p_{1}(\varvec{\theta })\\ p_{2}(\varvec{\theta }) \end{pmatrix} \right) \right] w\\&=nwp_{1}(\varvec{\theta })\left( 1-p_{1}(\varvec{\theta })\right) \begin{pmatrix} 1 &{}\,\, -1\\ -1 &{}\,\, 1 \end{pmatrix}, \end{aligned}$$

$$\begin{aligned}&Var\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&\quad -E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] E^{T}\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] w\\&\quad +E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=0\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=0\right] (1-w)\\&\quad -n^{2} \begin{pmatrix} wp_{1}(\varvec{\theta })\\ 1-wp_{1}(\varvec{\theta }) \end{pmatrix} \begin{pmatrix} wp_{1}(\varvec{\theta })&1-wp_{1}(\varvec{\theta }) \end{pmatrix} \\&=n^{2}(1-w)wp_{1}^{2}(\varvec{\theta }) \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} , \end{aligned}$$

and hence

$$\begin{aligned}&Var\left[ \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right] \\&\quad =E\left[ Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] +Var\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&\quad =nwp_{1}(\varvec{\theta })\left[ \left( 1-p_{1}(\varvec{\theta })\right) +n(1-w)p_{1}(\varvec{\theta })\right] \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} \\&\quad =nwp_{1}(\varvec{\theta })(1-wp_{1}(\varvec{\theta }))(1+\rho ^{2}(n-1)) \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} , \end{aligned}$$

where

$$\begin{aligned} \rho ^{2}=\frac{(1-w)p_{1}(\varvec{\theta })}{1-wp_{1}(\varvec{\theta } )},\quad \text {for any }w\in (0,1). \end{aligned}$$

This result matches the one given in Morel and Neerchal (2012, p. 83). Let

$$\begin{aligned}&\left( \left. \varvec{Y}\right| V=v\right) \nonumber \\&\quad =\left\{ \begin{array}{l@{\quad }l@{\quad }l} \mathcal {M}\left( n,\varvec{p}(\varvec{\theta })\right) , &{} \text {if }v=1, &{} \text {with }\Pr (V=1)=w\\ n\varvec{e}_{M}, &{} \text {if }v=0, &{} \text {with }\Pr (V=0)=1-w \end{array} \right. \end{aligned}$$

be the multinomial distribution with zero inflation in the first $M-1$ cells, i.e., n-inflation in the M-th cell.

For $M\ge 3$, a single homogeneous intracluster correlation coefficient, $\rho ^{2}$, does not seem to be an appropriate measure of the variability of this distribution, since the intracluster correlation appears to be heterogeneous across cells. The reason is that, for $M\ge 3$, the variance-covariance matrix of this zero-inflated multinomial distribution cannot be written as a matrix not depending on the parameters multiplied by a scalar that carries all the information about the parameters of the distribution.
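The moment formulas of this subsection can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names are hypothetical, `zib_moments` enumerates the probability mass function of $Y_{1}$ in the binomial case, and `cell_design_effects` computes, for each cell $r$, the ratio $Var[Y_{r}]/(n\pi _{r}(1-\pi _{r}))$ with $\pi _{r}=E[Y_{r}]/n$, obtained from the same conditional-variance decomposition used above.

```python
from math import comb


def zib_moments(n, p1, w):
    """Exact mean and variance of Y1 for the zero-inflated binomial,
    obtained by enumerating its probability mass function."""
    pmf = [w * comb(n, k) * p1**k * (1 - p1) ** (n - k) for k in range(n + 1)]
    pmf[0] += 1 - w  # V = 0 puts all n counts in the second cell, so Y1 = 0
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


def zib_closed_form(n, p1, w):
    """Closed-form mean and variance of Y1 from the derivation above."""
    rho2 = (1 - w) * p1 / (1 - w * p1)
    mean = n * w * p1
    var = n * w * p1 * (1 - w * p1) * (1 + rho2 * (n - 1))
    return mean, var


def cell_design_effects(n, p, w):
    """Var[Y_r] / (n * pi_r * (1 - pi_r)) for each cell r of the
    zero-inflated multinomial whose last cell is n-inflated."""
    M = len(p)
    effects = []
    for r in range(M):
        m1 = n * p[r]                     # E[Y_r | V = 1]
        m0 = n if r == M - 1 else 0.0     # E[Y_r | V = 0]
        within = w * n * p[r] * (1 - p[r])      # E[Var[Y_r | V]]
        between = w * (1 - w) * (m1 - m0) ** 2  # Var[E[Y_r | V]]
        pi_r = (w * m1 + (1 - w) * m0) / n
        effects.append((within + between) / (n * pi_r * (1 - pi_r)))
    return effects
```

For $M=2$ both cells return the common factor $1+\rho ^{2}(n-1)$, whereas for $M=3$ with unequal $p_{1}(\varvec{\theta })$ and $p_{2}(\varvec{\theta })$ the ratios of the first two cells differ, which is the heterogeneity described above.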

1.2 Proof of Theorem 3.2

Let

$$\begin{aligned} \varvec{S}_{\varvec{Y}}=\frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) ^{T}, \end{aligned}$$

be the matrix of quasi-variances and quasi-covariances of the simple random sample $\varvec{Y}^{(1)},\ldots ,\varvec{Y}^{(N)}$, and let

$$\begin{aligned} \overline{\varvec{S}}_{\varvec{Y}}&=\mathrm {diag}(\varvec{S} _{\varvec{Y}})= \begin{pmatrix} S_{Y_{1}}^{2} &{} &{} \\ &{} \ddots &{} \\ &{} &{} S_{Y_{M}}^{2} \end{pmatrix} ,\\ S_{Y_{r}}^{2}&=\frac{1}{N-1}\sum _{\ell =1}^{N}(Y^{(\ell ,r)}-n\widehat{p} _{r})^{2}. \end{aligned}$$

It is well known that each diagonal element of $\overline{\varvec{S} }_{\varvec{Y}}$ is an unbiased and consistent estimator of the corresponding diagonal element of $\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })}$, i.e.,

$$\begin{aligned} \mathrm {E}\left[ \overline{\varvec{S}}_{\varvec{Y}}\right] =\mathrm {diag}\{\mathrm {E}\left[ \varvec{S}_{\varvec{Y}}\right] \}= & {} \mathrm {diag}\{\mathrm {Var}[\varvec{Y}^{(\ell )}]\}\\= & {} \mathrm {diag} \{\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })}\}, \end{aligned}$$

and

$$\begin{aligned}&S_{Y_{r}}^{2}\overset{P}{\underset{N\rightarrow \infty }{\longrightarrow } }\vartheta _{n}np_{r}(\varvec{\theta })\left( 1-p_{r}(\varvec{\theta })\right) ,\quad r=1,\ldots ,M,\nonumber \\&\text {or}\quad \overline{\varvec{S}}_{\varvec{Y}} \overset{P}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathrm {diag} (\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta } )}). \end{aligned}$$

(8.1)

It is not difficult to establish that

$$\begin{aligned} \mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}})= & {} \sum _{r=1} ^{M}S_{Y_{r}}^{2}=\mathrm {trace}(\varvec{S}_{\varvec{Y}})\nonumber \\= & {} \frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) , \end{aligned}$$

(8.2)

which is consistent for $\mathrm {trace}(\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })})=\vartheta _{n}n\sum _{r=1}^{M} p_{r}(\varvec{\theta })\left( 1-p_{r}(\varvec{\theta })\right) $. We know that the chi-square test statistic $X^{2}(\widetilde{\varvec{Y}})$, given in (3.3), has an asymptotic $\mathcal {\chi }_{(N-1)(M-1)}^{2}$ distribution for a fixed number of clusters $N$ and an increasing cluster size $n$, under the assumption of inter-cluster homogeneity. However, this distribution is not useful for the present proof. Instead, we establish (3.4) by expressing the chi-square test statistic $X^{2}(\widetilde{\varvec{Y} })$ in terms of the variance-covariance matrix and following the same steps used to obtain the expression and consistency of (8.2). We have

$$\begin{aligned}&\mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })}^{-1})\\&\quad =\frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p} }\right) ^{T}\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta } )}^{-1}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \end{aligned}$$

and

$$\begin{aligned}&\mathrm {E}\left[ \mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y} }\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })} ^{-1})\right] \\&\quad =\mathrm {trace}\,\mathrm {E}\left[ \overline{\varvec{S} }_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p} (\varvec{\theta })}^{-1}\right] =\mathrm {trace}\left( \mathrm {E}\left[ \overline{\varvec{S}}_{\varvec{Y}}\right] \tfrac{1}{n}\varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\mathrm {trace}\left( \vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })} \tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\vartheta _{n}\mathrm {trace}\left( \varvec{\Sigma }_{\varvec{p} (\varvec{\theta })}\varvec{D}_{\varvec{p}(\varvec{\theta } )}^{-1}\right) \\&\quad =\vartheta _{n}\mathrm {trace}\left( \left( \varvec{D} _{\varvec{p}(\varvec{\theta })}-\varvec{p}(\varvec{\theta })\varvec{p}^{T}(\varvec{\theta })\right) \varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\vartheta _{n}\left[ \mathrm {trace}(\varvec{I}_{M})-\mathrm {trace} (\varvec{p}(\varvec{\theta })\varvec{1}_{M}^{T})\right] =\vartheta _{n}(M-1). \end{aligned}$$

Hence,

$$\begin{aligned}&\mathrm {E}\left[ \frac{1}{M-1}\mathrm {trace}(\overline{\varvec{S} }_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p} (\varvec{\theta })}^{-1})\right] \\&\quad =\mathrm {E}\Bigg [ \frac{1}{(N-1)(M-1)}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\\&\qquad \times \tfrac{1}{n}\varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \Bigg ] =\vartheta _{n}, \end{aligned}$$

and taking into account that $\widehat{\varvec{p}}$ is a consistent estimator of $\varvec{p}(\varvec{\theta })$, as $N\rightarrow \infty $, as well as (8.1),

$$\begin{aligned}&\frac{1}{M-1}\mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}} \tfrac{1}{n}\varvec{D}_{\widehat{\varvec{p}}}^{-1})\\&\quad =\frac{1}{(N-1)(M-1)}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\tfrac{1}{n}\varvec{D} _{\widehat{\varvec{p}}}^{-1}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) \\&\quad =\frac{X^{2}(\widetilde{\varvec{Y}} )}{(N-1)(M-1)} \end{aligned}$$

tends in probability to $\vartheta _{n}$, as $N\rightarrow \infty $. In other words,

$$\begin{aligned}&\frac{X^{2}(\widetilde{\varvec{Y}})}{(N-1)(M-1)}\\&\quad =\frac{1}{(M-1)n} \sum _{r=1}^{M}\frac{1}{\widehat{p}_{r}}S_{Y_{r}}^{2}\overset{P}{\underset{N\rightarrow \infty }{\longrightarrow }}\frac{\vartheta _{n}n}{(M-1)n}\\&\qquad \sum _{r=1}^{M}\frac{p_{r}(\varvec{\theta })}{p_{r} (\varvec{\theta })}\left( 1-p_{r}(\varvec{\theta })\right) =\vartheta _{n}. \end{aligned}$$

In addition, taking into account (1.9), the right-hand side of (3.4) follows. Finally, we would like to mention that, even though $X^{2}(\widetilde{\varvec{Y}})$ and $\vartheta _{n}(N-1)(M-1)$ have the same expectation for a fixed value of $N$, this proof is not trivial, since both $\vartheta _{n}(N-1)(M-1)$ and $X^{2}(\widetilde{\varvec{Y}})$ tend to infinity as $N\rightarrow \infty $.
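As an illustration, the statistic whose consistency is proved in this subsection, $X^{2}(\widetilde{\varvec{Y}})/((N-1)(M-1))$, can be computed as in the following minimal sketch. It assumes equal cluster sizes $n$ and takes $\widehat{\varvec{p}}$ to be the pooled vector of sample proportions; both are assumptions of this illustration, as is the function name, since the exact form of (3.3) is not reproduced in this Appendix.

```python
def overdispersion_estimate(Y, n):
    """Estimate vartheta_n by X^2 / ((N - 1)(M - 1)).

    Y is a list of N clusters, each a list of M cell counts summing to n.
    p_hat is taken to be the pooled proportion vector (an assumption of
    this sketch); every p_hat[r] must be strictly positive.
    """
    N, M = len(Y), len(Y[0])
    p_hat = [sum(y[r] for y in Y) / (n * N) for r in range(M)]
    # X^2 = sum over clusters of (Y - n p_hat)^T (1/n) D_{p_hat}^{-1} (Y - n p_hat)
    x2 = sum(
        (y[r] - n * p_hat[r]) ** 2 / (n * p_hat[r])
        for y in Y
        for r in range(M)
    )
    return x2 / ((N - 1) * (M - 1)), x2, p_hat
```

The same value is obtained from the quasi-variances as $\frac{1}{(M-1)n}\sum _{r=1}^{M}S_{Y_{r}}^{2}/\widehat{p}_{r}$, which is the form used in the last display of the proof.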

1.3 Proof of Theorem 2.2

By the Central Limit Theorem, (3.1) holds. Hence, from Pardo (2006, formula (7.10)), the minimum phi-divergence estimator of $\varvec{\theta }$ in a log-linear model satisfies

$$\begin{aligned} \sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})= & {} \left( \varvec{\varvec{W}}^{T}\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\right) ^{-1}\varvec{W}^{T} \varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }\nonumber \\&\times \,\varvec{D} _{\varvec{p}\left( \theta _{0}\right) }^{-1}\sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}\left( \varvec{1}_{M_{0}}\right) , \end{aligned}$$

(8.3)

and the variance-covariance matrix of $\sqrt{N}(\widehat{\varvec{\theta } }_{\phi }-\varvec{\theta }_{0})$ is

$$\begin{aligned}&\tfrac{\vartheta _{n}}{n}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}\varvec{W}^{T}\varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1}\nonumber \\&\qquad \times \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{W}\left( \varvec{\varvec{W}}^{T}\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\right) ^{-1}\nonumber \\&\quad =\tfrac{\vartheta _{n}}{n}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}. \end{aligned}$$

(8.4)

The last equality comes from

$$\begin{aligned} \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1} \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }=\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }. \end{aligned}$$
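For completeness, this identity can be verified directly. The following sketch writes $\varvec{p}$, $\varvec{D}$ and $\varvec{\Sigma }$ for $\varvec{p}(\varvec{\theta }_{0})$, $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}$ and $\varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}$ (a shorthand introduced only here) and uses $\varvec{\Sigma }=\varvec{D}-\varvec{p}\varvec{p}^{T}$, $\varvec{p}^{T}\varvec{D}^{-1}=\varvec{1}_{M}^{T}$, $\varvec{1}_{M}^{T}\varvec{D}=\varvec{p}^{T}$ and $\varvec{1}_{M}^{T}\varvec{p}=1$:

$$\begin{aligned} \varvec{\Sigma }\varvec{D}^{-1}\varvec{\Sigma }&=\left( \varvec{I}_{M}-\varvec{p}\varvec{1}_{M}^{T}\right) \left( \varvec{D}-\varvec{p}\varvec{p}^{T}\right) \\&=\varvec{D}-\varvec{p}\varvec{p}^{T}-\varvec{p}\varvec{1}_{M}^{T}\varvec{D}+\varvec{p}\left( \varvec{1}_{M}^{T}\varvec{p}\right) \varvec{p}^{T}\\&=\varvec{D}-\varvec{p}\varvec{p}^{T}-\varvec{p}\varvec{p}^{T}+\varvec{p}\varvec{p}^{T}=\varvec{\Sigma }. \end{aligned}$$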

From the Taylor expansion of $\varvec{p}(\widehat{\varvec{\theta } }_{\phi })$ around $\varvec{p}(\varvec{\theta }_{0})$ we obtain

$$\begin{aligned} \sqrt{N}(\varvec{p}(\widehat{\varvec{\theta }}_{\phi })-\varvec{p} (\varvec{\theta }_{0}))=\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0}) +o_{p}\left( \varvec{1}_{M}\right) , \end{aligned}$$

(8.5)

and the variance-covariance matrix of $\sqrt{N}(\varvec{p} (\widehat{\varvec{\theta }}_{\phi })-\varvec{p}(\varvec{\theta } _{0}))$ is

$$\begin{aligned} \tfrac{\vartheta _{n}}{n}\varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }}W}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}\varvec{W}^{T}\varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }. \end{aligned}$$

(8.6)

Since $\sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) $ is asymptotically normal and centred, (2.8) follows from (8.3) and (8.4). Similarly, since $\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})$ is asymptotically normal and centred, (2.9) follows from (8.5) and (8.6).

1.4 Derivation of Formula (4.4)

Multiplying (4.3) by $\sqrt{N_{g}}n_{g}\big / \sum \limits _{h=1}^{G}n_{h}N_{h}$ yields

$$\begin{aligned} w_{g}(\widehat{\varvec{p}}^{(g)}-\varvec{p}\left( \varvec{\theta }_{0}\right) )\overset{\mathcal {L}}{\underset{N_{g}\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0}_{M},\tfrac{n_{g} N_{g}\vartheta _{n_{g}}}{\left( \sum \nolimits _{h=1}^{G}n_{h}N_{h}\right) ^{2}}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta } _{0}\right) }\right) , \end{aligned}$$

hence, summing over $g=1,\ldots ,G$ and using the independence of clusters,

$$\begin{aligned}&\sum \limits _{g=1}^{G}w_{g}(\widehat{\varvec{p}}^{\left( g\right) }-\varvec{p}\left( \varvec{\theta }_{0}\right) )\\&\quad =\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \\&\qquad \overset{\mathcal {L}}{\underset{N_{g}\rightarrow \infty ,\;g=1,\ldots ,G}{\longrightarrow }}\mathcal {N}\left( \varvec{0}_{M} ,\tfrac{\sum \nolimits _{g=1}^{G}n_{g}N_{g}\vartheta _{n_{g}}}{\left( \sum \nolimits _{h=1}^{G}n_{h}N_{h}\right) ^{2}}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\right) . \end{aligned}$$

Finally, multiplying the previous expression by $\sum \nolimits _{h=1} ^{G}n_{h}N_{h}\big / \sqrt{\sum \nolimits _{g=1}^{G}n_{g}N_{g}\vartheta _{n_{g}}}$, the desired expression is obtained.
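The weights and the scalar variance factor appearing in this derivation can be sketched in code as follows. The function name and argument layout are hypothetical; `n`, `N` and `vartheta` hold the cluster sizes $n_{g}$, the numbers of clusters $N_{g}$ and the values $\vartheta _{n_{g}}$ for the $G$ groups.

```python
def pooled_weights_and_factor(n, N, vartheta):
    """Weights w_g = n_g N_g / sum_h n_h N_h and the scalar factor
    sum_g n_g N_g vartheta_{n_g} / (sum_h n_h N_h)^2 that multiplies
    Sigma_p in the limiting covariance of the pooled p_hat."""
    total = sum(ng * Ng for ng, Ng in zip(n, N))
    weights = [ng * Ng / total for ng, Ng in zip(n, N)]
    factor = sum(ng * Ng * tg for ng, Ng, tg in zip(n, N, vartheta)) / total**2
    return weights, factor
```

The weights sum to one, and when all $\vartheta _{n_{g}}$ are equal to a common value $\vartheta $ the factor reduces to $\vartheta \big / \sum \nolimits _{h=1}^{G}n_{h}N_{h}$.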

1.5 Algorithms for Dirichlet-multinomial, n-inflated and random-clumped distributions

The usual parameters of the M-dimensional random variable $\varvec{Y} =(Y_{1},\ldots ,Y_{M})^{T}$ with Dirichlet-multinomial distribution are $\varvec{\alpha }=\left( \alpha _{11},\ldots ,\alpha _{M1}\right) ^{T}$, where $\alpha _{r1}=\frac{1-\rho ^{2}}{\rho ^{2}}p_{r}\left( \varvec{\theta }\right) , r=1,\ldots ,M$. For convenience, the distribution is parametrized here by $\varvec{\beta }= {\begin{pmatrix} \rho ^{2}\\ \varvec{p}(\varvec{\theta }) \end{pmatrix}} $, with $\varvec{p}\left( \varvec{\theta }\right) =\left( p_{1}\left( \varvec{\theta }\right) ,\ldots ,p_{M}\left( \varvec{\theta }\right) \right) ^{T}$, and is generated as follows:

STEP 1. Generate $ B_{1}\sim Beta(\alpha _{11},\alpha _{12})$, with $\alpha _{11}=\frac{1-\rho ^{2}}{\rho ^{2}}p_{1}\left( \varvec{\theta }\right) , \alpha _{12}=\frac{1-\rho ^{2}}{\rho ^{2}}(1-p_{1}\left( \varvec{\theta }\right) )$.

STEP 2. Generate $\left( Y_{1} |B_{1}=b_{1}\right) \sim Bin(n,b_{1})$.

STEP 3. For $r=2,\ldots ,M-1$ do:

Generate $B_{r}\sim Beta(\alpha _{r1},\alpha _{r2})$, with $\alpha _{r1} =\frac{1-\rho ^{2}}{\rho ^{2}}p_{r}\left( \varvec{\theta }\right) , \alpha _{r2}=\frac{1-\rho ^{2}}{\rho ^{2}}\left( 1-\sum _{h=1}^{r} p_{h}\left( \varvec{\theta }\right) \right) $.

Generate $( Y_{r}|Y_{1}=y_{1},\ldots ,Y_{r-1} =y_{r-1},B_{r}=b_{r}) \sim Bin\left( n-\sum _{h=1}^{r-1} y_{h},b_{r}\right) $.

STEP 4. Set $\left( Y_{M}|Y_{1}=y_{1},\ldots ,Y_{M-1}=y_{M-1}\right) =n-\sum _{h=1}^{M-1}y_{h} $.
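The four steps above can be sketched in Python using only the standard library. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the binomial draw is a simple sum of Bernoulli trials, and $0<\rho ^{2}<1$ with all $p_{r}(\varvec{\theta })>0$ is assumed.

```python
import random


def rbinom(rng, n, p):
    """Binomial(n, p) draw as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def r_dirichlet_multinomial(rng, n, p, rho2):
    """One draw of Y = (Y_1, ..., Y_M) following STEPS 1-4.

    Assumes 0 < rho2 < 1 and every p[r] > 0 with sum(p) == 1.
    """
    c = (1 - rho2) / rho2
    M = len(p)
    y = []
    remaining_n = n      # n minus the counts already allocated
    remaining_p = 1.0    # 1 minus the sum of p_1, ..., p_r
    for r in range(M - 1):
        remaining_p -= p[r]
        # B_r ~ Beta(alpha_r1, alpha_r2) with the parameters of STEPS 1 and 3
        b = rng.betavariate(c * p[r], c * remaining_p)
        # Y_r | earlier counts, B_r = b ~ Bin(n - allocated counts, b)
        y_r = rbinom(rng, remaining_n, b)
        y.append(y_r)
        remaining_n -= y_r
    y.append(remaining_n)  # STEP 4: the last cell takes what is left
    return y
```

A call such as `r_dirichlet_multinomial(random.Random(1), 20, [0.2, 0.3, 0.5], 0.25)` returns one vector of three non-negative counts summing to 20.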

The random variable $\varvec{Y}=(Y_{1},\ldots ,Y_{M})^{T}$ of the n-inflated multinomial distribution with parameters $\varvec{\beta }, \varvec{p}\left( \varvec{\theta }\right) $, is generated as follows:

STEP 1. Generate $V\sim Ber(\rho ^{2})$.

STEP 2. Generate
$$\begin{aligned}&\left( \varvec{Y|}V=v\right) =\left\{ \begin{array}{l@{\quad }l} \mathcal {M}(n,\varvec{p}\left( \varvec{\theta }\right) ), &{} \text {if }v=0\\ n\mathcal {M}(1,\varvec{p}\left( \varvec{\theta }\right) ), &{} \text {if }v=1 \end{array} \right. . \end{aligned}$$
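A minimal Python sketch of these two steps follows. The function names are hypothetical, and the multinomial draw is built from `random.Random.choices`, which is a choice made for this illustration only.

```python
import random


def rmultinom(rng, n, p):
    """Multinomial(n, p) draw: allocate n items to cells with probabilities p."""
    counts = [0] * len(p)
    for r in rng.choices(range(len(p)), weights=p, k=n):
        counts[r] += 1
    return counts


def r_n_inflated(rng, n, p, rho2):
    """One draw from the n-inflated multinomial following STEPS 1-2."""
    v = rng.random() < rho2  # STEP 1: V ~ Ber(rho2)
    if v:
        # v = 1: all n counts fall in one cell chosen with probabilities p
        return [n * c for c in rmultinom(rng, 1, p)]
    # v = 0: ordinary multinomial draw
    return rmultinom(rng, n, p)
```

With `rho2 = 1.0` every draw has a single cell equal to $n$, and with `rho2 = 0.0` the draw is an ordinary multinomial vector.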

The random variable $\varvec{Y}=(Y_{1},\ldots ,Y_{M})^{T}$ of the random clumped distribution with parameters $\varvec{\beta }, \varvec{p} \left( \varvec{\theta }\right) $, is generated as follows:

STEP 1. Generate $\varvec{Y}_{0}=(Y_{01} ,\ldots ,Y_{0M})^{T}\sim \mathcal {M}(1,\varvec{p}\left( \varvec{\theta }\right) )$.

STEP 2. Generate $K_{1}\sim Bin(n,\rho )$.

STEP 3. Generate $\left( \varvec{Y}_{1}|K_{1}=k_{1}\right) =\big ( (Y_{11},\ldots ,Y_{1M})^{T} | K_{1}=k_{1}\big ) \sim \mathcal {M}(n-k_{1},\varvec{p}\left( \varvec{\theta }\right) )$.

STEP 4. Set $\left( \varvec{Y|}K_{1}=k_{1}\right) \varvec{=Y}_{0}k_{1}+\left( \varvec{Y}_{1}|K_{1}=k_{1}\right) $.

For the details about the equivalence of this algorithm and (1.12), see Morel and Nagaraj (1993).
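The random-clumped steps can be sketched in the same way. The function names are hypothetical, and `rho` denotes $\rho $, the positive square root of $\rho ^{2}$, as in STEP 2.

```python
import random


def rmultinom(rng, n, p):
    """Multinomial(n, p) draw: allocate n items to cells with probabilities p."""
    counts = [0] * len(p)
    for r in rng.choices(range(len(p)), weights=p, k=n):
        counts[r] += 1
    return counts


def r_random_clumped(rng, n, p, rho):
    """One draw from the random-clumped multinomial following STEPS 1-4."""
    y0 = rmultinom(rng, 1, p)                        # STEP 1: Y_0 ~ M(1, p)
    k1 = sum(rng.random() < rho for _ in range(n))   # STEP 2: K_1 ~ Bin(n, rho)
    y1 = rmultinom(rng, n - k1, p)                   # STEP 3: Y_1 | K_1 ~ M(n - k1, p)
    return [k1 * a + b for a, b in zip(y0, y1)]      # STEP 4: Y = Y_0 k1 + Y_1
```

Here `k1` counts the units that are clumped into the cell selected by $\varvec{Y}_{0}$, while the remaining $n-k_{1}$ units are allocated independently.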

Note that the package “Modeling overdispersion in R” can be used to generate the distributions considered in this Appendix. For more details, see Raim et al. (2015).

About this article

Cite this article

Alonso-Revenga, J.M., Martín, N. & Pardo, L. New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes. Stat Comput 27, 193–217 (2017). https://doi.org/10.1007/s11222-015-9616-z

Received: 04 April 2015
Accepted: 12 November 2015
Published: 23 November 2015
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11222-015-9616-z
