New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes

Abstract

It is usual to rely on quasi-likelihood methods when deriving statistical procedures for clustered multinomial data with no underlying distributional assumption. Although there is an extensive literature on this kind of data set, few investigations deal with unequal cluster sizes. This paper aims to help fill that gap by proposing new estimators of the intracluster correlation coefficient.


References

  1. Ahn, H., James, J.C.: Generation of over-dispersed and under-dispersed binomial variates. J. Comput. Graph. Stat. 4, 55–64 (1995)

  2. Altham, P.M.E.: Discrete variable analysis for individuals grouped into families. Biometrika 63, 263–269 (1976)

  3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  4. Brier, S.S.: Analysis of contingency tables under cluster sampling. Biometrika 67, 591–596 (1980)

  5. Budowle, B., Moretti, T.R.: Genotype profiles for six population groups at the 13 CODIS short tandem repeat core loci and other PCR-based loci. Forensic Science Communications (1999). http://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/july1999/budowle.htm

  6. Cohen, J.E.: The distribution of the chi-squared statistic under clustered sampling from contingency tables. J. Am. Stat. Assoc. 71, 665–670 (1976)

  7. Cressie, N., Pardo, L.: Minimum \(\phi \)-divergence estimator and hierarchical testing in loglinear models. Statistica Sinica 10, 867–884 (2000)

  8. Cressie, N., Pardo, L.: Model checking in loglinear models using \(\phi \)-divergences and MLEs. J. Stat. Plan. Inference 103, 437–453 (2002)

  9. Cressie, N., Pardo, L., Pardo, M.C.: Size and power considerations for testing loglinear models using \(\phi \)-divergence test statistics. Statistica Sinica 13, 555–570 (2003)

  10. Fienberg, S.E., Rinaldo, A.: Maximum likelihood estimation in log-linear models. Ann. Stat. 40, 996–1023 (2012)

  11. Grizzle, J.E., Starmer, C.F., Koch, G.G.: Analysis of categorical data by linear models. Biometrics 25, 489–504 (1969)

  12. Haberman, S.J.: The Analysis of Frequency Data. University of Chicago Press, Chicago (1974)

  13. Hall, D.B.: Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics 56, 1030–1039 (2000)

  14. Martín, N., Pardo, L.: New families of estimators and test statistics in log-linear models. J. Multivar. Anal. 99, 1590–1609 (2008a)

  15. Martín, N., Pardo, L.: Minimum phi-divergence estimators for loglinear models with linear constraints and multinomial sampling. Stat. Pap. 49, 15–36 (2008b)

  16. Martín, N., Pardo, L.: A new measure of leverage cells in multinomial loglinear models. Commun. Stat. 39, 517–530 (2010)

  17. Martín, N., Pardo, L.: Fitting DNA sequences through log-linear modelling with linear constraints. Statistics 45, 605–621 (2011)

  18. Martín, N., Pardo, L.: Poisson loglinear modeling with linear constraints on the expected cell frequencies. Sankhya 74B, 238–267 (2012)

  19. Menéndez, M.L., Morales, D., Pardo, L., Vajda, I.: Divergence-based estimation and testing of statistical models of classification. J. Multivar. Anal. 54, 329–354 (1995)

  20. Menéndez, M.L., Morales, D., Pardo, L., Vajda, I.: About divergence-based goodness-of-fit tests in the Dirichlet-multinomial model. Commun. Stat. 25, 1119–1133 (1996)

  21. Morel, J.G., Nagaraj, N.K.: A finite mixture distribution for modelling multinomial extra variation. Biometrika 80, 363–371 (1993)

  22. Morel, J.G., Neerchal, N.K.: Overdispersion Models in SAS. SAS Press, Cary (2012)

  23. Mosimann, J.E.: On the compound multinomial distributions, the multivariate \(\beta \)-distribution and correlation among proportions. Biometrika 49, 65–82 (1962)

  24. Neerchal, N.K., Morel, J.G.: Large cluster results for two parametric multinomial extra variation models. J. Am. Stat. Assoc. 93, 1078–1087 (1998)

  25. Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton (2006)

  26. Raim, A.M.: Computational Methods for Finite Mixtures Using Approximate Information and Regression Linked to the Mixture Mean. PhD thesis, University of Maryland (2014)

  27. Raim, A.M., Neerchal, N.K., Morel, J.G.: Modeling overdispersion in R. Technical Report HPCI-2015-1, UMBC High Performance Computing Facility, University of Maryland (2015)

  28. Vos, P.W.: Minimum f-divergence estimators and quasi-likelihood functions. Ann. Inst. Stat. Math. 44, 261–279 (1992)

  29. Wedderburn, R.W.M.: Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–447 (1974)

  30. Weir, B.S., Hill, W.G.: Estimating F-statistics. Annu. Rev. Genet. 36, 721–750 (2002)


Acknowledgments

We would like to thank the referees for their helpful comments and suggestions. This research is supported by the Spanish Grant MTM2012-33740 from Ministerio de Economia y Competitividad.

Author information

Corresponding author

Correspondence to N. Martín.

Appendix

Zero-inflated binomial distribution

The binomial distribution with zero inflation in the first cell, i.e., n-inflation in the second cell, is given by

$$\begin{aligned}&\left( \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=v\right) \\&\quad =\left\{ \begin{array}{lll} \mathcal {M}\left( n, \begin{pmatrix} p_{1}(\varvec{\theta })\\ p_{2}(\varvec{\theta }) \end{pmatrix} \right) , &{} \quad \text {if }v=1, &{} \text {with }\Pr (V=1)=w\\ n\varvec{e}_{2}, &{} \quad \text {if }v=0, &{} \text {with }\Pr (V=0)=1-w \end{array} \right. . \end{aligned}$$

Its first order moment vector is given by

$$\begin{aligned} E\left[ \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right]&=E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] \Pr \left( V=1\right) \\&\quad +E\left[ \left. n\varvec{e}_{2}\right| V=0\right] \Pr \left( V=0\right) \\&=n \begin{pmatrix} wp_{1}(\varvec{\theta })\\ 1-wp_{1}(\varvec{\theta }) \end{pmatrix} . \end{aligned}$$

The second-order moment matrix is derived as follows:

$$\begin{aligned} E\left[ Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right]&=Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] \Pr \left( V=1\right) \\&\quad +Var\left[ \left. n\varvec{e}_{2}\right| V=0\right] \Pr \left( V=0\right) \\&=Var\left[ \mathcal {M}\left( n, \begin{pmatrix} p_{1}(\varvec{\theta })\\ p_{2}(\varvec{\theta }) \end{pmatrix} \right) \right] w\\&=nwp_{1}(\varvec{\theta })\left( 1-p_{1}(\varvec{\theta })\right) \begin{pmatrix} 1 &{}\,\, -1\\ -1 &{}\,\, 1 \end{pmatrix}, \end{aligned}$$
$$\begin{aligned}&Var\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&\quad -E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] E^{T}\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] w\\&\quad +E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=0\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=0\right] (1-w)\\&\quad -n^{2} \begin{pmatrix} wp_{1}(\varvec{\theta })\\ 1-wp_{1}(\varvec{\theta }) \end{pmatrix} \begin{pmatrix} wp_{1}(\varvec{\theta })&1-wp_{1}(\varvec{\theta }) \end{pmatrix} \\&=n^{2}(1-w)wp_{1}^{2}(\varvec{\theta }) \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} , \end{aligned}$$

and hence

$$\begin{aligned}&Var\left[ \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right] \\&\quad =E\left[ Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] +Var\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&\quad =nwp_{1}(\varvec{\theta })\left[ \left( 1-p_{1}(\varvec{\theta })\right) +n(1-w)p_{1}(\varvec{\theta })\right] \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} \\&\quad =nwp_{1}(\varvec{\theta })(1-wp_{1}(\varvec{\theta }))(1+\rho ^{2}(n-1)) \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} , \end{aligned}$$

where

$$\begin{aligned} \rho ^{2}=\frac{(1-w)p_{1}(\varvec{\theta })}{1-wp_{1}(\varvec{\theta } )},\quad \text {for any }w\in (0,1). \end{aligned}$$

This result matches the one given in Morel and Neerchal (2012, p. 83). Let

$$\begin{aligned}&\left( \left. \varvec{Y}\right| V=v\right) \nonumber \\&\quad =\left\{ \begin{array}{l@{\quad }l@{\quad }l} \mathcal {M}\left( n,\varvec{p}(\varvec{\theta })\right) , &{} \text {if }v=1, &{} \text {with }\Pr (V=1)=w\\ n\varvec{e}_{M}, &{} \text {if }v=0, &{} \text {with }\Pr (V=0)=1-w \end{array} \right. \end{aligned}$$

be the multinomial distribution with zero inflation in the first \(M-1\) cells, i.e., n-inflation in the M-th cell.

For \(M\ge 3\), a univariate homogeneous intracluster correlation coefficient, \(\rho ^{2}\), does not seem to be an appropriate measure for characterizing the variability of this distribution, since the intracluster correlation appears to be heterogeneous across cells. The reason is that for \(M\ge 3\) the variance-covariance matrix of this distribution cannot be written as a parameter-free matrix multiplied by a scalar carrying all the information about the parameters of the distribution.
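The moment identities above can be checked by Monte Carlo simulation. The following sketch (with illustrative values of n, w and \(p_{1}(\varvec{\theta })\), not taken from the paper) draws from the zero-inflated binomial and compares the empirical mean and variance of \(Y_{1}\) with \(nwp_{1}\) and \(nwp_{1}(1-wp_{1})(1+\rho ^{2}(n-1))\):

```python
import numpy as np

# Monte Carlo check of the zero-inflated binomial moments derived above.
# n, w, p1 are illustrative values, not from the paper.
rng = np.random.default_rng(0)
n, w, p1 = 20, 0.7, 0.3
N = 200_000

v = rng.random(N) < w                         # V = 1 with probability w
y1 = np.where(v, rng.binomial(n, p1, N), 0)   # Y1 = 0 when V = 0 (so Y2 = n)

rho2 = (1 - w) * p1 / (1 - w * p1)            # intracluster correlation
var_theory = n * w * p1 * (1 - w * p1) * (1 + rho2 * (n - 1))

print(y1.mean(), n * w * p1)                  # empirical vs. theoretical mean
print(y1.var(), var_theory)                   # empirical vs. theoretical variance
```

With 200,000 replicates the empirical moments agree with the formulas to two decimal places or so.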

Proof of Theorem 3.2

Let

$$\begin{aligned} \varvec{S}_{\varvec{Y}}=\frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) ^{T}, \end{aligned}$$

the matrix of quasi-variances and quasi-covariances of the simple random sample \(\varvec{Y}^{(1)},\ldots ,\varvec{Y}^{(N)}\) and

$$\begin{aligned} \overline{\varvec{S}}_{\varvec{Y}}&=\mathrm {diag}(\varvec{S} _{\varvec{Y}})= \begin{pmatrix} S_{Y_{1}}^{2} &{} &{} \\ &{} \ddots &{} \\ &{} &{} S_{Y_{M}}^{2} \end{pmatrix} ,\\ S_{Y_{r}}^{2}&=\frac{1}{N-1}\sum _{\ell =1}^{N}(Y^{(\ell ,r)}-n\widehat{p} _{r})^{2}. \end{aligned}$$

It is well-known that each diagonal element of \(\overline{\varvec{S} }_{\varvec{Y}}\) is a consistent estimator of each diagonal element of \(\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })}\), i.e.,

$$\begin{aligned} \mathrm {E}\left[ \overline{\varvec{S}}_{\varvec{Y}}\right] =\mathrm {diag}\{\mathrm {E}\left[ \varvec{S}_{\varvec{Y}}\right] \}= & {} \mathrm {diag}\{\mathrm {Var}[\varvec{Y}^{(\ell )}]\}\\= & {} \mathrm {diag} \{\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })}\}, \end{aligned}$$

and

$$\begin{aligned}&S_{Y_{r}}^{2}\overset{P}{\underset{N\rightarrow \infty }{\longrightarrow } }\vartheta _{n}np_{r}(\varvec{\theta })\left( 1-p_{r}(\varvec{\theta })\right) ,\quad r=1,\ldots ,M,\nonumber \\&\text {or}\quad \overline{\varvec{S}}_{\varvec{Y}} \overset{P}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathrm {diag} (\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta } )}). \end{aligned}$$
(8.1)

It is not difficult to establish that

$$\begin{aligned} \mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}})= & {} \sum _{r=1} ^{M}S_{Y_{r}}^{2}=\mathrm {trace}(\varvec{S}_{\varvec{Y}})\nonumber \\= & {} \frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) , \end{aligned}$$
(8.2)

which is consistent for \(\mathrm {trace}(\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })})=\vartheta _{n}n\sum _{r=1}^{M} p_{r}(\varvec{\theta })\left( 1-p_{r}(\varvec{\theta })\right) \). We know that the chi-square test statistic \(X^{2}(\widetilde{\varvec{Y}})\), given in (3.3), has an asymptotic \(\mathcal {\chi }_{(N-1)(M-1)}^{2}\) distribution for a fixed number of clusters, N, and an increasing cluster size, n, under the assumption of inter-cluster level homogeneity. However, this asymptotic distribution is not a useful device for the proof. Based on the expression of the chi-square test statistic \(X^{2}(\widetilde{\varvec{Y}})\) in terms of the variance-covariance matrix, and following the same steps used to obtain the expression and the consistency of (8.2), we now establish (3.4). We have

$$\begin{aligned}&\mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })}^{-1})\\&\quad =\frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p} }\right) ^{T}\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta } )}^{-1}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \end{aligned}$$

and

$$\begin{aligned}&\mathrm {E}\left[ \mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y} }\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })} ^{-1})\right] \\&\quad =\mathrm {trace}\,\mathrm {E}\left[ \overline{\varvec{S} }_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p} (\varvec{\theta })}^{-1}\right] =\mathrm {trace}\left( \mathrm {E}\left[ \overline{\varvec{S}}_{\varvec{Y}}\right] \tfrac{1}{n}\varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\mathrm {trace}\left( \vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })} \tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\vartheta _{n}\mathrm {trace}\left( \varvec{\Sigma }_{\varvec{p} (\varvec{\theta })}\varvec{D}_{\varvec{p}(\varvec{\theta } )}^{-1}\right) \\&\quad =\vartheta _{n}\mathrm {trace}\left( \left( \varvec{D} _{\varvec{p}(\varvec{\theta })}-\varvec{p}(\varvec{\theta })\varvec{p}^{T}(\varvec{\theta })\right) \varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\vartheta _{n}\left[ \mathrm {trace}(\varvec{I}_{M})-\mathrm {trace} (\varvec{p}(\varvec{\theta })\varvec{1}_{M}^{T})\right] =\vartheta _{n}(M-1). \end{aligned}$$

Hence,

$$\begin{aligned}&\mathrm {E}\left[ \frac{1}{M-1}\mathrm {trace}(\overline{\varvec{S} }_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p} (\varvec{\theta })}^{-1})\right] \\&\quad =\mathrm {E}\Bigg [ \frac{1}{(N-1)(M-1)}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\\&\qquad \times \tfrac{1}{n}\varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \Bigg ] =\vartheta _{n}, \end{aligned}$$

and taking into account that \(\widehat{\varvec{p}}\) is a consistent estimator of \(\varvec{p}(\varvec{\theta })\), as \(N\rightarrow \infty \), as well as (8.1),

$$\begin{aligned}&\frac{1}{M-1}\mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}} \tfrac{1}{n}\varvec{D}_{\widehat{\varvec{p}}}^{-1})\\&\quad =\frac{1}{(N-1)(M-1)}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\tfrac{1}{n}\varvec{D} _{\widehat{\varvec{p}}}^{-1}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) \\&\quad =\frac{X^{2}(\widetilde{\varvec{Y}} )}{(N-1)(M-1)} \end{aligned}$$

tends in probability to \(\vartheta _{n}\), as \(N\rightarrow \infty \). In other words,

$$\begin{aligned}&\frac{X^{2}(\widetilde{\varvec{Y}})}{(N-1)(M-1)}\\&\quad =\frac{1}{(M-1)n} \sum _{r=1}^{M}\frac{1}{\widehat{p}_{r}}S_{Y_{r}}^{2}\overset{P}{\underset{N\rightarrow \infty }{\longrightarrow }}\frac{\vartheta _{n}n}{(M-1)n}\\&\qquad \sum _{r=1}^{M}\frac{p_{r}(\varvec{\theta })}{p_{r} (\varvec{\theta })}\left( 1-p_{r}(\varvec{\theta })\right) =\vartheta _{n}. \end{aligned}$$

In addition, taking into account (1.9), the right-hand side of (3.4) follows. Finally, we would like to mention that even though \(X^{2}(\widetilde{\varvec{Y}})\) and \(\vartheta _{n}(N-1)(M-1)\) have the same expectation for a fixed value of N, this proof is not trivial, since both \(\vartheta _{n}(N-1)(M-1)\) and \(X^{2}(\widetilde{\varvec{Y}})\) tend to infinity as \(N\rightarrow \infty \).
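As a numerical illustration of the consistency just proved, the following sketch computes the estimator \(X^{2}(\widetilde{\varvec{Y}})/((N-1)(M-1))\) from the expression used in the proof, on simulated Dirichlet-multinomial clusters of equal size (an assumed setup with illustrative values of \(n\), \(\rho ^{2}\) and \(\varvec{p}\), not from the paper), and compares it with \(\vartheta _{n}=1+\rho ^{2}(n-1)\):

```python
import numpy as np

# Sketch of the overdispersion estimator X^2 / ((N-1)(M-1)) from the
# proof above, on simulated Dirichlet-multinomial clusters.
# n, N, p, rho2 are illustrative values, not from the paper.
rng = np.random.default_rng(1)
n, N = 50, 400
p = np.array([0.2, 0.3, 0.5])
rho2 = 0.15
alpha = (1 - rho2) / rho2 * p

# Dirichlet-multinomial: draw cluster probabilities, then counts.
probs = rng.dirichlet(alpha, size=N)
Y = np.array([rng.multinomial(n, pr) for pr in probs])    # N x M counts

p_hat = Y.sum(axis=0) / (n * N)                # pooled cell proportions
resid = Y - n * p_hat                          # per-cluster residuals
X2 = (resid**2 / (n * p_hat)).sum()            # chi-square statistic
vartheta_hat = X2 / ((N - 1) * (len(p) - 1))   # overdispersion estimate

# For the Dirichlet-multinomial, vartheta_n = 1 + rho2 * (n - 1).
print(vartheta_hat, 1 + rho2 * (n - 1))
```

With N = 400 clusters the estimate lands close to the target, and it tightens as N grows, in line with the consistency statement.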

Proof of Theorem 2.2

By applying the Central Limit Theorem, (3.1) holds. Hence, from Pardo (2006, formula (7.10)), the minimum phi-divergence estimator of \(\varvec{\theta }\) in a log-linear model satisfies

$$\begin{aligned} \sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})= & {} \left( \varvec{\varvec{W}}^{T}\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\right) ^{-1}\varvec{W}^{T} \varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }\nonumber \\&\times \,\varvec{D} _{\varvec{p}\left( \theta _{0}\right) }^{-1}\sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}\left( \varvec{1}_{M_{0}}\right) , \end{aligned}$$
(8.3)

and the variance-covariance matrix of \(\sqrt{N}(\widehat{\varvec{\theta } }_{\phi }-\varvec{\theta }_{0})\) is

$$\begin{aligned}&\tfrac{\vartheta _{n}}{n}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}\varvec{W}^{T}\varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1}\nonumber \\&\qquad \times \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{W}\left( \varvec{\varvec{W}}^{T}\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\right) ^{-1}\nonumber \\&\quad =\tfrac{\vartheta _{n}}{n}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}. \end{aligned}$$
(8.4)

The last equality comes from

$$\begin{aligned} \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1} \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }=\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }. \end{aligned}$$

From the Taylor expansion of \(\varvec{p}(\widehat{\varvec{\theta } }_{\phi })\) around \(\varvec{p}(\varvec{\theta }_{0})\) we obtain

$$\begin{aligned} \sqrt{N}(\varvec{p}(\widehat{\varvec{\theta }}_{\phi })-\varvec{p} (\varvec{\theta }_{0}))=\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0}) +o_{p}\left( \varvec{1}_{M}\right) , \end{aligned}$$
(8.5)

and the variance-covariance matrix of \(\sqrt{N}(\varvec{p} (\widehat{\varvec{\theta }}_{\phi })-\varvec{p}(\varvec{\theta } _{0}))\) is

$$\begin{aligned} \tfrac{\vartheta _{n}}{n}\varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }}W}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}\varvec{W}^{T}\varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }. \end{aligned}$$
(8.6)

Since \(\sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \) is normal and centred, from (8.3) and (8.4), (2.8) is obtained. Similarly, since \(\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})\) is normal and centred, from (8.5) and (8.6), (2.9) is obtained.

Derivation of Formula (4.4)

Multiplying (4.3) by \(\sqrt{N_{g}}n_{g}\big / \sum \limits _{h=1}^{G}n_{h}N_{h}\) yields

$$\begin{aligned} w_{g}(\widehat{\varvec{p}}^{(g)}-\varvec{p}\left( \varvec{\theta }_{0}\right) )\overset{\mathcal {L}}{\underset{N_{g}\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0}_{M},\tfrac{n_{g} N_{g}\vartheta _{n_{g}}}{\left( \sum \nolimits _{h=1}^{G}n_{h}N_{h}\right) ^{2}}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta } _{0}\right) }\right) , \end{aligned}$$

hence, summing from \(g=1\) to G and using the independence of the clusters,

$$\begin{aligned}&\sum \limits _{g=1}^{G}w_{g}(\widehat{\varvec{p}}^{\left( g\right) }-\varvec{p}\left( \varvec{\theta }_{0}\right) )\\&\quad =\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \\&\qquad \overset{\mathcal {L}}{\underset{N_{g}\rightarrow \infty ,\;g=1,\ldots ,G}{\longrightarrow }}\mathcal {N}\left( \varvec{0}_{M} ,\tfrac{\sum \nolimits _{g=1}^{G}n_{g}N_{g}\vartheta _{n_{g}}}{\left( \sum \nolimits _{h=1}^{G}n_{h}N_{h}\right) ^{2}}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\right) . \end{aligned}$$

Finally multiplying the previous expression by \(\sum \nolimits _{h=1} ^{G}n_{h}N_{h}\big / \sqrt{\sum \nolimits _{g=1}^{G}n_{g}N_{g}\vartheta _{n_{g}}}\), the desired expression is obtained.
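The pooling step underlying this derivation, with weights \(w_{g}=n_{g}N_{g}\big /\sum _{h}n_{h}N_{h}\), can be sketched numerically; the group sizes and per-group estimates below are hypothetical values chosen only for illustration:

```python
import numpy as np

# Minimal sketch of the pooling with weights w_g = n_g N_g / sum_h n_h N_h.
# Group sizes and per-group estimates are illustrative, not from the paper.
n = np.array([10, 20, 40])                 # cluster size per group
N = np.array([30, 25, 15])                 # number of clusters per group
p_hat_g = np.array([[0.22, 0.31, 0.47],
                    [0.18, 0.33, 0.49],
                    [0.21, 0.29, 0.50]])   # per-group estimates of p(theta)

w = n * N / (n * N).sum()                  # weights w_g (sum to 1)
p_hat = w @ p_hat_g                        # pooled estimator sum_g w_g p_hat^(g)
print(w, p_hat, p_hat.sum())
```

Since each per-group estimate lies on the probability simplex and the weights sum to one, the pooled estimate also sums to one.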

Algorithms for Dirichlet-multinomial, n-inflated and random-clumped distributions

The usual parameters of the M-dimensional random variable \(\varvec{Y} =(Y_{1},\ldots ,Y_{M})^{T}\) with Dirichlet-multinomial distribution are \(\varvec{\alpha }=\left( \alpha _{11},\ldots ,\alpha _{M1}\right) ^{T}\), where \(\alpha _{r1}=\frac{1-\rho ^{2}}{\rho ^{2}}p_{r}\left( \varvec{\theta }\right) \), \(r=1,\ldots ,M\). For convenience, it is parametrized here by \(\varvec{\beta }= {\begin{pmatrix} \rho ^{2}\\ \varvec{p}(\varvec{\theta }) \end{pmatrix}} \), with \(\varvec{p}\left( \varvec{\theta }\right) =\left( p_{1}\left( \varvec{\theta }\right) ,\ldots ,p_{M}\left( \varvec{\theta }\right) \right) ^{T}\), and is generated as follows:

STEP 1. :

Generate \( B_{1}\sim Beta(\alpha _{11},\alpha _{12})\), with \(\alpha _{11}=\frac{1-\rho ^{2}}{\rho ^{2}}p_{1}\left( \varvec{\theta }\right) , \alpha _{12}=\frac{1-\rho ^{2}}{\rho ^{2}}(1-p_{1}\left( \varvec{\theta }\right) )\).

STEP 2. :

Generate \(\left( Y_{1} |B_{1}=b_{1}\right) \sim Bin(n,b_{1})\).

STEP 3. :

For \(r=2,\ldots ,M-1\) do:

Generate \(B_{r}\sim Beta(\alpha _{r1},\alpha _{r2})\) , with \(\alpha _{r1} =\frac{1-\rho ^{2}}{\rho ^{2}}p_{r}\left( \varvec{\theta }\right) , \alpha _{r2}=\frac{1-\rho ^{2}}{\rho ^{2}}\left( 1-\sum _{h=1}^{r} p_{h}\left( \varvec{\theta }\right) \right) \).

Generate \(( Y_{r}|Y_{1}=y_{1},\ldots ,Y_{r-1} =y_{r-1},B_{r}=b_{r}) \sim Bin\left( n-\sum _{h=1}^{r-1} y_{h},b_{r}\right) \).

STEP 4. :

Do \(\left( Y_{M}|Y_{1}=y_{1},\ldots ,Y_{M-1}=y_{M-1}\right) =n-\sum _{h=1}^{M-1}y_{h} \).
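STEPS 1-4 above translate directly into code. The following is a sketch of the sequential beta-binomial construction (illustrative parameter values; `rdirmult` is a hypothetical name, not from the paper):

```python
import numpy as np

# Direct transcription of STEPS 1-4: sequential beta-binomial generation
# of a Dirichlet-multinomial vector. Parameter values are illustrative.
def rdirmult(rng, n, p, rho2):
    """One draw of (Y_1,...,Y_M) from the Dirichlet-multinomial(n, p, rho2)."""
    M = len(p)
    c = (1 - rho2) / rho2              # common scale of the alpha parameters
    y = np.zeros(M, dtype=int)
    remaining = n                      # trials not yet allocated
    cum = 0.0                          # sum of p_1, ..., p_{r-1}
    for r in range(M - 1):             # STEPS 1-3
        a1 = c * p[r]                  # alpha_{r1}
        a2 = c * (1 - cum - p[r])      # alpha_{r2} = c * (1 - sum_{h<=r} p_h)
        b = rng.beta(a1, a2)
        y[r] = rng.binomial(remaining, b)
        remaining -= y[r]
        cum += p[r]
    y[M - 1] = remaining               # STEP 4: last cell takes the rest
    return y

rng = np.random.default_rng(2)
p = np.array([0.2, 0.3, 0.5])
sample = np.array([rdirmult(rng, 30, p, 0.1) for _ in range(50_000)])
print(sample.mean(axis=0), 30 * p)     # empirical means vs. n * p
```

The empirical cell means approach \(n\varvec{p}(\varvec{\theta })\), as expected for the Dirichlet-multinomial.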

The random variable \(\varvec{Y}=(Y_{1},\ldots ,Y_{M})^{T}\) of the n-inflated multinomial distribution with parameters \(\varvec{\beta }, \varvec{p}\left( \varvec{\theta }\right) \), is generated as follows:

STEP 1. :

Generate \(V\sim Ber(\rho ^{2})\).

STEP 2. :

Generate

$$\begin{aligned}&\left( \varvec{Y|}V=v\right) =\left\{ \begin{array}{l@{\quad }l} \mathcal {M}(n,\varvec{p}\left( \varvec{\theta }\right) ), &{} \text {if }v=0\\ n\mathcal {M}(1,\varvec{p}\left( \varvec{\theta }\right) ), &{} \text {if }v=1 \end{array} \right. . \end{aligned}$$
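The two-step generator above can be sketched as follows (illustrative parameter values; `rninflated` is a hypothetical name, not from the paper):

```python
import numpy as np

# Two-step generator of the n-inflated multinomial: with probability rho2
# all n observations clump into a single multinomial(1, p) draw.
# Parameter values are illustrative.
def rninflated(rng, n, p, rho2):
    if rng.random() < rho2:                 # STEP 1: V = 1 (n-inflation)
        return n * rng.multinomial(1, p)    # STEP 2: n copies of one cell
    return rng.multinomial(n, p)            # V = 0: ordinary multinomial

rng = np.random.default_rng(3)
p = np.array([0.2, 0.3, 0.5])
sample = np.array([rninflated(rng, 20, p, 0.2) for _ in range(50_000)])
print(sample.mean(axis=0), 20 * p)          # E[Y] = n * p in both branches
```

Both branches have mean \(n\varvec{p}(\varvec{\theta })\); the inflation only increases the spread.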

The random variable \(\varvec{Y}=(Y_{1},\ldots ,Y_{M})^{T}\) of the random-clumped distribution with parameters \(\varvec{\beta }, \varvec{p} \left( \varvec{\theta }\right) \), is generated as follows:

STEP 1. :

Generate \(\varvec{Y}_{0}=(Y_{01} ,\ldots ,Y_{0M})^{T}\sim \mathcal {M}(1,\varvec{p}\left( \varvec{\theta }\right) )\).

STEP 2. :

Generate \(K_{1}\sim Bin(n,\rho )\).

STEP 3. :

Generate \(\left( \varvec{Y}_{1}|K_{1}=k_{1}\right) =\big ( (Y_{11},\ldots ,Y_{1M})^{T} | K_{1}=k_{1}\big ) \sim \mathcal {M}(n-k_{1},\varvec{p}\left( \varvec{\theta }\right) )\).

STEP 4. :

Do \(\left( \varvec{Y}|K_{1}=k_{1}\right) =\varvec{Y}_{0}k_{1}+\left( \varvec{Y}_{1}|K_{1}=k_{1}\right) \).

For the details about the equivalence of this algorithm and (1.12), see Morel and Nagaraj (1993).
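STEPS 1-4 of the random-clumped generator translate into a few lines of code (illustrative parameter values; `rrandomclumped` is a hypothetical name, not from the paper):

```python
import numpy as np

# Transcription of STEPS 1-4 of the random-clumped multinomial generator
# of Morel and Nagaraj (1993). Parameter values are illustrative.
def rrandomclumped(rng, n, p, rho):
    y0 = rng.multinomial(1, p)          # STEP 1: direction of the clump
    k1 = rng.binomial(n, rho)           # STEP 2: size of the clump
    y1 = rng.multinomial(n - k1, p)     # STEP 3: independent remainder
    return k1 * y0 + y1                 # STEP 4: combine clump and remainder

rng = np.random.default_rng(4)
p = np.array([0.2, 0.3, 0.5])
sample = np.array([rrandomclumped(rng, 25, p, 0.3) for _ in range(50_000)])
print(sample.mean(axis=0), 25 * p)      # E[Y] = n * p
```

The clumped part contributes \(n\rho \varvec{p}\) in mean and the independent part \(n(1-\rho )\varvec{p}\), so the overall mean is again \(n\varvec{p}(\varvec{\theta })\).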

It is interesting to note that the distributions considered in this Appendix can be generated with the R package accompanying the technical report “Modeling overdispersion in R”; see Raim et al. (2015) for details.


About this article


Cite this article

Alonso-Revenga, J.M., Martín, N. & Pardo, L. New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes. Stat Comput 27, 193–217 (2017). https://doi.org/10.1007/s11222-015-9616-z


Keywords

  • Clustered multinomial data
  • Consistent intracluster correlation estimator
  • Log-linear model
  • Overdispersion
  • Quasi minimum divergence estimator