
New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes

Published in Statistics and Computing

Abstract

Quasi-likelihood methods are the usual tool for deriving statistical procedures for clustered multinomial data when no underlying distribution is assumed. Even though there is an extensive literature on this kind of data set, few investigations deal with unequal cluster sizes. This paper aims to help fill this gap by proposing new estimators for the intracluster correlation coefficient.


References

  • Ahn, H., James, J.C.: Generation of over-dispersed and under-dispersed binomial variates. J. Comput. Graph. Stat. 4, 55–64 (1995)

  • Altham, P.M.E.: Discrete variable analysis for individuals grouped into families. Biometrika 63, 263–269 (1976)

  • Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  • Brier, S.S.: Analysis of contingency tables under cluster sampling. Biometrika 67, 591–596 (1980)

  • Budowle, B., Moretti, T.R.: Genotype profiles for six population groups at the 13 CODIS short tandem repeat core loci and other PCR-based loci. Forensic Science Communications (1999). http://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/july1999/budowle.htm

  • Cohen, J.E.: The distribution of the chi-squared statistic under clustered sampling from contingency tables. J. Am. Stat. Assoc. 71, 665–670 (1976)

  • Cressie, N., Pardo, L.: Minimum \(\phi \)-divergence estimator and hierarchical testing in loglinear models. Statistica Sinica 10, 867–884 (2000)

  • Cressie, N., Pardo, L.: Model checking in loglinear models using \(\phi \)-divergences and MLEs. J. Stat. Plan. Inference 103, 437–453 (2002)

  • Cressie, N., Pardo, L., Pardo, M.C.: Size and power considerations for testing loglinear models using \(\phi \)-divergence test statistics. Statistica Sinica 13, 555–570 (2003)

  • Fienberg, S.E., Rinaldo, A.: Maximum likelihood estimation in log-linear models. Ann. Stat. 40, 996–1023 (2012)

  • Grizzle, J.E., Starmer, C.F., Koch, G.G.: Analysis of categorical data by linear models. Biometrics 25, 489–504 (1969)

  • Haberman, S.J.: The Analysis of Frequency Data. University of Chicago Press, Chicago (1974)

  • Hall, D.B.: Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics 56, 1030–1039 (2000)

  • Martín, N., Pardo, L.: New families of estimators and test statistics in log-linear models. J. Multivar. Anal. 99, 1590–1609 (2008a)

  • Martín, N., Pardo, L.: Minimum phi-divergence estimators for loglinear models with linear constraints and multinomial sampling. Stat. Pap. 49, 15–36 (2008b)

  • Martín, N., Pardo, L.: A new measure of leverage cells in multinomial loglinear models. Commun. Stat. 39, 517–530 (2010)

  • Martín, N., Pardo, L.: Fitting DNA sequences through log-linear modelling with linear constraints. Statistics 45, 605–621 (2011)

  • Martín, N., Pardo, L.: Poisson loglinear modeling with linear constraints on the expected cell frequencies. Sankhya 74B, 238–267 (2012)

  • Menéndez, M.L., Morales, D., Pardo, L., Vajda, I.: Divergence-based estimation and testing of statistical models of classification. J. Multivar. Anal. 54, 329–354 (1995)

  • Menéndez, M.L., Morales, D., Pardo, L., Vajda, I.: About divergence-based goodness-of-fit tests in the Dirichlet-multinomial model. Commun. Stat. 25, 1119–1133 (1996)

  • Morel, J.G., Nagaraj, N.K.: A finite mixture distribution for modelling multinomial extra variation. Biometrika 80, 363–371 (1993)

  • Morel, J.G., Neerchal, N.K.: Overdispersion Models in SAS. SAS Press, Cary (2012)

  • Mosimann, J.E.: On the compound multinomial distribution, the multivariate \(\beta \)-distribution and correlation among proportions. Biometrika 49, 65–82 (1962)

  • Neerchal, N.K., Morel, J.G.: Large cluster results for two parametric multinomial extra variation models. J. Am. Stat. Assoc. 93, 1078–1087 (1998)

  • Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton (2006)

  • Raim, A.M.: Computational Methods for Finite Mixtures Using Approximate Information and Regression Linked to the Mixture Mean. PhD thesis, University of Maryland (2014)

  • Raim, A.M., Neerchal, N.K., Morel, J.G.: Modeling overdispersion in R. Technical Report HPCF-2015-1, UMBC High Performance Computing Facility, University of Maryland (2015)


Acknowledgments

We would like to thank the referees for their helpful comments and suggestions. This research is supported by the Spanish Grant MTM2012-33740 from Ministerio de Economia y Competitividad.

Author information


Correspondence to N. Martín.

Appendix

1.1 Zero-inflated binomial distribution

The binomial distribution with zero inflation in the first cell, i.e., n-inflation in the second cell, is given by

$$\begin{aligned}&\left( \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=v\right) \\&\quad =\left\{ \begin{array}{lll} \mathcal {M}\left( n, \begin{pmatrix} p_{1}(\varvec{\theta })\\ p_{2}(\varvec{\theta }) \end{pmatrix} \right) , &{} \quad \text {if }v=1, &{} \text {with }\Pr (V=1)=w\\ n\varvec{e}_{2}, &{} \quad \text {if }v=0, &{} \text {with }\Pr (V=0)=1-w \end{array} \right. . \end{aligned}$$

Its first order moment vector is given by

$$\begin{aligned} E\left[ \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right]&=E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] \Pr \left( V=1\right) \\&\quad +E\left[ \left. n\varvec{e}_{2}\right| V=0\right] \Pr \left( V=0\right) \\&=n \begin{pmatrix} wp_{1}(\varvec{\theta })\\ 1-wp_{1}(\varvec{\theta }) \end{pmatrix} . \end{aligned}$$

The second order moment matrix is derived as follows:

$$\begin{aligned} E\left[ Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right]&=Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] \Pr \left( V=1\right) \\&\quad +Var\left[ \left. n\varvec{e}_{2}\right| V=0\right] \Pr \left( V=0\right) \\&=Var\left[ \mathcal {M}\left( n, \begin{pmatrix} p_{1}(\varvec{\theta })\\ p_{2}(\varvec{\theta }) \end{pmatrix} \right) \right] w\\&=nwp_{1}(\varvec{\theta })\left( 1-p_{1}(\varvec{\theta })\right) \begin{pmatrix} 1 &{}\,\, -1\\ -1 &{}\,\, 1 \end{pmatrix}, \end{aligned}$$
$$\begin{aligned}&Var\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&\quad -E\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] E^{T}\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&=E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=1\right] w\\&\quad +E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=0\right] E^{T}\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V=0\right] (1-w)\\&\quad -n^{2} \begin{pmatrix} wp_{1}(\varvec{\theta })\\ 1-wp_{1}(\varvec{\theta }) \end{pmatrix} \begin{pmatrix} wp_{1}(\varvec{\theta })&1-wp_{1}(\varvec{\theta }) \end{pmatrix} \\&=n^{2}(1-w)wp_{1}^{2}(\varvec{\theta }) \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} , \end{aligned}$$

and hence

$$\begin{aligned}&Var\left[ \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right] \\&\quad =E\left[ Var\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] +Var\left[ E\left[ \left. \begin{pmatrix} Y_{1}\\ Y_{2} \end{pmatrix} \right| V\right] \right] \\&\quad =nwp_{1}(\varvec{\theta })\left[ \left( 1-p_{1}(\varvec{\theta })\right) +n(1-w)p_{1}(\varvec{\theta })\right] \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} \\&\quad =nwp_{1}(\varvec{\theta })(1-wp_{1}(\varvec{\theta }))(1+\rho ^{2}(n-1)) \begin{pmatrix} 1 &{}\quad -1\\ -1 &{}\quad 1 \end{pmatrix} , \end{aligned}$$

where

$$\begin{aligned} \rho ^{2}=\frac{(1-w)p_{1}(\varvec{\theta })}{1-wp_{1}(\varvec{\theta } )},\quad \text {for any }w\in (0,1). \end{aligned}$$

This result matches the one given in Morel and Neerchal (2012, p. 83). Let

$$\begin{aligned}&\left( \left. \varvec{Y}\right| V=v\right) \nonumber \\&\quad =\left\{ \begin{array}{l@{\quad }l@{\quad }l} \mathcal {M}\left( n,\varvec{p}(\varvec{\theta })\right) , &{} \text {if }v=1, &{} \text {with }\Pr (V=1)=w\\ n\varvec{e}_{M}, &{} \text {if }v=0, &{} \text {with }\Pr (V=0)=1-w \end{array} \right. \end{aligned}$$

be the multinomial distribution with zero inflation in the first \(M-1\) cells, i.e., n-inflation in the M-th cell.

For \(M\ge 3\), a single homogeneous intracluster correlation coefficient, \(\rho ^{2}\), does not seem to be an appropriate measure to characterize the variability of this distribution, since the intracluster correlation appears to vary across cells. The reason is that for \(M\ge 3\) the variance-covariance matrix of the multinomial distribution cannot be written as a parameter-free matrix multiplied by a scalar carrying all the information about the parameters of the distribution.
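As a numerical sanity check of the zero-inflated binomial variance formula above, the following sketch (assuming numpy is available; the values \(n=10\), \(p_{1}=0.3\), \(w=0.7\) are arbitrary) simulates the mixture and compares the empirical variance of \(Y_{1}\) with \(nwp_{1}(1-wp_{1})(1+\rho ^{2}(n-1))\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, w, N = 10, 0.3, 0.7, 200_000

# Zero-inflated binomial draws: with probability w a Bin(n, p1) count,
# otherwise 0 (all n observations fall in the second cell).
v = rng.random(N) < w
y1 = np.where(v, rng.binomial(n, p1, size=N), 0)

rho2 = (1 - w) * p1 / (1 - w * p1)   # intracluster correlation coefficient
var_theory = n * w * p1 * (1 - w * p1) * (1 + rho2 * (n - 1))
```

The empirical variance of `y1` should agree with `var_theory`, which also equals the sum of the two conditional pieces \(nwp_{1}(1-p_{1})+n^{2}w(1-w)p_{1}^{2}\) derived above.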

1.2 Proof of Theorem 3.2

Let

$$\begin{aligned} \varvec{S}_{\varvec{Y}}=\frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) ^{T}, \end{aligned}$$

be the matrix of quasi-variances and quasi-covariances of the simple random sample \(\varvec{Y}^{(1)},\ldots ,\varvec{Y}^{(N)}\), and let

$$\begin{aligned} \overline{\varvec{S}}_{\varvec{Y}}&=\mathrm {diag}(\varvec{S} _{\varvec{Y}})= \begin{pmatrix} S_{Y_{1}}^{2} &{} &{} \\ &{} \ddots &{} \\ &{} &{} S_{Y_{M}}^{2} \end{pmatrix} ,\\ S_{Y_{r}}^{2}&=\frac{1}{N-1}\sum _{\ell =1}^{N}(Y^{(\ell ,r)}-n\widehat{p} _{r})^{2}. \end{aligned}$$

It is well-known that each diagonal element of \(\overline{\varvec{S} }_{\varvec{Y}}\) is a consistent estimator of each diagonal element of \(\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })}\), i.e.,

$$\begin{aligned} \mathrm {E}\left[ \overline{\varvec{S}}_{\varvec{Y}}\right] =\mathrm {diag}\{\mathrm {E}\left[ \varvec{S}_{\varvec{Y}}\right] \}= & {} \mathrm {diag}\{\mathrm {Var}[\varvec{Y}^{(\ell )}]\}\\= & {} \mathrm {diag} \{\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })}\}, \end{aligned}$$

and

$$\begin{aligned}&S_{Y_{r}}^{2}\overset{P}{\underset{N\rightarrow \infty }{\longrightarrow } }\vartheta _{n}np_{r}(\varvec{\theta })\left( 1-p_{r}(\varvec{\theta })\right) ,\quad r=1,\ldots ,M,\nonumber \\&\text {or}\quad \overline{\varvec{S}}_{\varvec{Y}} \overset{P}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathrm {diag} (\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta } )}). \end{aligned}$$
(8.1)

It is not difficult to establish that

$$\begin{aligned} \mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}})= & {} \sum _{r=1} ^{M}S_{Y_{r}}^{2}=\mathrm {trace}(\varvec{S}_{\varvec{Y}})\nonumber \\= & {} \frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) , \end{aligned}$$
(8.2)

which is consistent for \(\mathrm {trace}(\vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })})=\vartheta _{n}n\sum _{r=1}^{M} p_{r}(\varvec{\theta })\left( 1-p_{r}(\varvec{\theta })\right) \). We know that the chi-square test-statistic \(X^{2}(\widetilde{\varvec{Y}})\), given in (3.3), has an asymptotic \(\mathcal {\chi }_{(N-1)(M-1)}^{2}\) distribution for a fixed number of clusters, N, and an increasing cluster size, n, under the assumption of inter-cluster homogeneity. However, this distribution is not a useful device for the proof. Based on the expression of the chi-square test-statistic, \(X^{2}(\widetilde{\varvec{Y}})\), in terms of the variance-covariance matrix, and following the same steps used to obtain the expression and consistency of (8.2), we shall establish (3.4). We have

$$\begin{aligned}&\mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })}^{-1})\\&\quad =\frac{1}{N-1}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p} }\right) ^{T}\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta } )}^{-1}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \end{aligned}$$

and

$$\begin{aligned}&\mathrm {E}\left[ \mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y} }\tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })} ^{-1})\right] \\&\quad =\mathrm {traceE}\left[ \overline{\varvec{S} }_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p} (\varvec{\theta })}^{-1}\right] =\mathrm {trace}\left( \mathrm {E}\left[ \overline{\varvec{S}}_{\varvec{Y}}\right] \tfrac{1}{n}\varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\mathrm {trace}\left( \vartheta _{n}n\varvec{\Sigma }_{\varvec{p}(\varvec{\theta })} \tfrac{1}{n}\varvec{D}_{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\vartheta _{n}\mathrm {trace}\left( \varvec{\Sigma }_{\varvec{p} (\varvec{\theta })}\varvec{D}_{\varvec{p}(\varvec{\theta } )}^{-1}\right) \\&\quad =\vartheta _{n}\mathrm {trace}\left( \left( \varvec{D} _{\varvec{p}(\varvec{\theta })}-\varvec{p}(\varvec{\theta })\varvec{p}^{T}(\varvec{\theta })\right) \varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\right) \\&\quad =\vartheta _{n}\left[ \mathrm {trace}(\varvec{I}_{M})-\mathrm {trace} (\varvec{p}(\varvec{\theta })\varvec{1}_{M}^{T})\right] =\vartheta _{n}(M-1). \end{aligned}$$

Hence,

$$\begin{aligned}&\mathrm {E}\left[ \frac{1}{M-1}\mathrm {trace}(\overline{\varvec{S} }_{\varvec{Y}}\tfrac{1}{n}\varvec{D}_{\varvec{p} (\varvec{\theta })}^{-1})\right] \\&\quad =\mathrm {E}\Bigg [ \frac{1}{(N-1)(M-1)}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\\&\qquad \times \tfrac{1}{n}\varvec{D} _{\varvec{p}(\varvec{\theta })}^{-1}\left( \varvec{Y}^{(\ell )}-n\widehat{\varvec{p}}\right) \Bigg ] =\vartheta _{n}, \end{aligned}$$

and taking into account that \(\widehat{\varvec{p}}\) is a consistent estimator of \(\varvec{p}(\varvec{\theta })\), as \(N\rightarrow \infty \), as well as (8.1),

$$\begin{aligned}&\frac{1}{M-1}\mathrm {trace}(\overline{\varvec{S}}_{\varvec{Y}} \tfrac{1}{n}\varvec{D}_{\widehat{\varvec{p}}}^{-1})\\&\quad =\frac{1}{(N-1)(M-1)}\sum _{\ell =1}^{N}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) ^{T}\tfrac{1}{n}\varvec{D} _{\widehat{\varvec{p}}}^{-1}\left( \varvec{Y}^{(\ell )} -n\widehat{\varvec{p}}\right) \\&\quad =\frac{X^{2}(\widetilde{\varvec{Y}} )}{(N-1)(M-1)} \end{aligned}$$

tends in probability to \(\vartheta _{n}\), as \(N\rightarrow \infty \). In other words,

$$\begin{aligned}&\frac{X^{2}(\widetilde{\varvec{Y}})}{(N-1)(M-1)}\\&\quad =\frac{1}{(M-1)n} \sum _{r=1}^{M}\frac{1}{\widehat{p}_{r}}S_{Y_{r}}^{2}\overset{P}{\underset{N\rightarrow \infty }{\longrightarrow }}\frac{\vartheta _{n}n}{(M-1)n}\\&\qquad \sum _{r=1}^{M}\frac{p_{r}(\varvec{\theta })}{p_{r} (\varvec{\theta })}\left( 1-p_{r}(\varvec{\theta })\right) =\vartheta _{n}. \end{aligned}$$

In addition, taking into account (1.9), the right-hand side of (3.4) follows. Finally, we would like to mention that even though \(X^{2}(\widetilde{\varvec{Y}})\) and \(\vartheta _{n}(N-1)(M-1)\) have the same expectation for a fixed value of N, this proof is not trivial, since both \(\vartheta _{n}(N-1)(M-1)\) and \(X^{2}(\widetilde{\varvec{Y}})\) tend to infinity as \(N\rightarrow \infty \).
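The limit just proved suggests estimating \(\vartheta _{n}\) by \(X^{2}(\widetilde{\varvec{Y}})/((N-1)(M-1))\). A minimal simulation sketch (assuming numpy; in the plain multinomial case \(\vartheta _{n}=1\), so the estimator should be close to one):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 2000, 20                          # number of clusters and common cluster size
p = np.array([0.2, 0.3, 0.5])
M = len(p)

Y = rng.multinomial(n, p, size=N)        # iid multinomial clusters: no overdispersion
p_hat = Y.mean(axis=0) / n               # pooled cell-proportion estimate

# X^2 = sum_l (Y^(l) - n p_hat)^T (1/n) D_{p_hat}^{-1} (Y^(l) - n p_hat)
resid = Y - n * p_hat
X2 = (resid**2 / (n * p_hat)).sum()
vartheta_hat = X2 / ((N - 1) * (M - 1))  # moment estimator of vartheta_n
```

With overdispersed clusters (e.g. Dirichlet-multinomial draws) the same statistic would stabilize around \(\vartheta _{n}>1\) instead.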

1.3 Proof of Theorem 2.2

By applying the Central Limit Theorem, (3.1) holds. Hence, from Pardo (2006, formula (7.10)), for the minimum phi-divergence estimator of \(\varvec{\theta }\) in a log-linear model it holds that

$$\begin{aligned} \sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})= & {} \left( \varvec{\varvec{W}}^{T}\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\right) ^{-1}\varvec{W}^{T} \varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }\nonumber \\&\times \,\varvec{D} _{\varvec{p}\left( \theta _{0}\right) }^{-1}\sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}\left( \varvec{1}_{M_{0}}\right) , \end{aligned}$$
(8.3)

and the variance-covariance matrix of \(\sqrt{N}(\widehat{\varvec{\theta } }_{\phi }-\varvec{\theta }_{0})\) is

$$\begin{aligned}&\tfrac{\vartheta _{n}}{n}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}\varvec{W}^{T}\varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1}\nonumber \\&\qquad \times \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{W}\left( \varvec{\varvec{W}}^{T}\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\right) ^{-1}\nonumber \\&\quad =\tfrac{\vartheta _{n}}{n}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}. \end{aligned}$$
(8.4)

The last equality comes from

$$\begin{aligned} \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\varvec{D}_{\varvec{p}\left( \theta _{0}\right) }^{-1} \varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }=\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }. \end{aligned}$$
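This identity is easy to verify numerically (a sketch assuming numpy; the probability vector is arbitrary). Since \(\varvec{\Sigma }_{\varvec{p}}\varvec{D}_{\varvec{p}}^{-1}=\varvec{I}_{M}-\varvec{p}\varvec{1}_{M}^{T}\) and \(\varvec{1}_{M}^{T}\varvec{\Sigma }_{\varvec{p}}=\varvec{0}_{M}^{T}\), the product collapses back to \(\varvec{\Sigma }_{\varvec{p}}\), and the same computation confirms \(\mathrm {trace}(\varvec{\Sigma }_{\varvec{p}}\varvec{D}_{\varvec{p}}^{-1})=M-1\) used in the previous proof:

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])      # arbitrary probability vector p(theta_0)
D = np.diag(p)                     # D_{p(theta_0)}
Sigma = D - np.outer(p, p)         # Sigma_{p(theta_0)} = D - p p^T
lhs = Sigma @ np.linalg.inv(D) @ Sigma   # should equal Sigma itself
```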

From the Taylor expansion of \(\varvec{p}(\widehat{\varvec{\theta } }_{\phi })\) around \(\varvec{p}(\varvec{\theta }_{0})\) we obtain

$$\begin{aligned} \sqrt{N}(\varvec{p}(\widehat{\varvec{\theta }}_{\phi })-\varvec{p} (\varvec{\theta }_{0}))=\varvec{\Sigma \varvec{_{\varvec{p} \left( \theta _{0}\right) }}W}\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0}) +o_{p}\left( \varvec{1}_{M}\right) , \end{aligned}$$
(8.5)

and the variance-covariance matrix of \(\sqrt{N}(\varvec{p} (\widehat{\varvec{\theta }}_{\phi })-\varvec{p}(\varvec{\theta } _{0}))\) is

$$\begin{aligned} \tfrac{\vartheta _{n}}{n}\varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }}W}\left( \varvec{\varvec{W}}^{T} \varvec{\Sigma \varvec{_{\varvec{p}\left( \theta _{0}\right) }} W}\right) ^{-1}\varvec{W}^{T}\varvec{\Sigma }_{p\left( \varvec{\theta }_{0}\right) }. \end{aligned}$$
(8.6)

Since \(\sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \) is normal and centred, from (8.3) and (8.4), (2.8) is obtained. Similarly, since \(\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})\) is normal and centred, from (8.5) and (8.6), (2.9) is obtained.

1.4 Derivation of Formula (4.4)

Multiplying (4.3) by \(\sqrt{N_{g}}n_{g}\big / \sum \limits _{h=1}^{G}n_{h}N_{h}\) yields

$$\begin{aligned} w_{g}(\widehat{\varvec{p}}^{(g)}-\varvec{p}\left( \varvec{\theta }_{0}\right) )\overset{\mathcal {L}}{\underset{N_{g}\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0}_{M},\tfrac{n_{g} N_{g}\vartheta _{n_{g}}}{\left( \sum \nolimits _{h=1}^{G}n_{h}N_{h}\right) ^{2}}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta } _{0}\right) }\right) , \end{aligned}$$

hence, summing from \(g=1\) to G and using the independence of the clusters,

$$\begin{aligned}&\sum \limits _{g=1}^{G}w_{g}(\widehat{\varvec{p}}^{\left( g\right) }-\varvec{p}\left( \varvec{\theta }_{0}\right) )\\&\quad =\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \\&\qquad \overset{\mathcal {L}}{\underset{N_{g}\rightarrow \infty ,\;g=1,\ldots ,G}{\longrightarrow }}\mathcal {N}\left( \varvec{0}_{M} ,\tfrac{\sum \nolimits _{g=1}^{G}n_{g}N_{g}\vartheta _{n_{g}}}{\left( \sum \nolimits _{h=1}^{G}n_{h}N_{h}\right) ^{2}}\varvec{\Sigma }_{\varvec{p}\left( \varvec{\theta }_{0}\right) }\right) . \end{aligned}$$

Finally, multiplying the previous expression by \(\sum \nolimits _{h=1} ^{G}n_{h}N_{h}\big / \sqrt{\sum \nolimits _{g=1}^{G}n_{g}N_{g}\vartheta _{n_{g}}}\), we obtain the desired expression.
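Reading off the weights, each group enters the pooled estimator \(\widehat{\varvec{p}}\) in proportion to its total number of observations \(n_{g}N_{g}\), so \(\widehat{\varvec{p}}\) is simply the overall sample proportion. A small sketch (assuming numpy; the group sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.2, 0.3, 0.5])
n = np.array([5, 10, 20])            # cluster size n_g in group g
N = np.array([300, 200, 100])        # number of clusters N_g in group g

# Per-group data and cell-proportion estimates p_hat^(g)
groups = [rng.multinomial(n[g], p, size=N[g]) for g in range(len(n))]
p_g = [Y.sum(axis=0) / (n[g] * N[g]) for g, Y in enumerate(groups)]

w = n * N / (n * N).sum()            # weights proportional to n_g N_g, summing to 1
p_pool = sum(w[g] * p_g[g] for g in range(len(n)))
total = sum(Y.sum(axis=0) for Y in groups)   # overall cell counts
```

The weighted combination `p_pool` coincides exactly with the overall sample proportion `total / total.sum()`.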

1.5 Algorithms for Dirichlet-multinomial, n-inflated and random-clumped distributions

The usual parameters of the M-dimensional random variable \(\varvec{Y} =(Y_{1},\ldots ,Y_{M})^{T}\) with Dirichlet-multinomial distribution are \(\varvec{\alpha }=\left( \alpha _{11},\ldots ,\alpha _{M1}\right) ^{T}\), where \(\alpha _{r1}=\frac{1-\rho ^{2}}{\rho ^{2}}p_{r}\left( \varvec{\theta }\right) , r=1,\ldots ,M\). For convenience, it is considered here with parameters \(\varvec{\beta }= {\begin{pmatrix} \rho ^{2}\\ \varvec{p}(\varvec{\theta }) \end{pmatrix}} , \varvec{p}\left( \varvec{\theta }\right) =\left( p_{1}\left( \varvec{\theta }\right) ,\ldots ,p_{M}\left( \varvec{\theta }\right) \right) ^{T}\), and it is generated as follows:

STEP 1. :

Generate \( B_{1}\sim Beta(\alpha _{11},\alpha _{12})\), with \(\alpha _{11}=\frac{1-\rho ^{2}}{\rho ^{2}}p_{1}\left( \varvec{\theta }\right) , \alpha _{12}=\frac{1-\rho ^{2}}{\rho ^{2}}(1-p_{1}\left( \varvec{\theta }\right) )\).

STEP 2. :

Generate \(\left( Y_{1} |B_{1}=b_{1}\right) \sim Bin(n,b_{1})\).

STEP 3. :

For \(r=2,\ldots ,M-1\) do:

Generate \(B_{r}\sim Beta(\alpha _{r1},\alpha _{r2})\) , with \(\alpha _{r1} =\frac{1-\rho ^{2}}{\rho ^{2}}p_{r}\left( \varvec{\theta }\right) , \alpha _{r2}=\frac{1-\rho ^{2}}{\rho ^{2}}\left( 1-\sum _{h=1}^{r} p_{h}\left( \varvec{\theta }\right) \right) \).

Generate \(( Y_{r}|Y_{1}=y_{1},\ldots ,Y_{r-1} =y_{r-1},B_{r}=b_{r}) \sim Bin\left( n-\sum _{h=1}^{r-1} y_{h},b_{r}\right) \).

STEP 4. :

Do \(\left( Y_{M}|Y_{1}=y_{1},\ldots ,Y_{M-1}=y_{M-1}\right) =n-\sum _{h=1}^{M-1}y_{h} \).
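The four steps above can be sketched as follows (assuming numpy; `rdirmult` is a hypothetical helper name, and the argument values in the usage line are arbitrary):

```python
import numpy as np

def rdirmult(n, p, rho2, rng):
    """Stick-breaking generation of a Dirichlet-multinomial vector (STEPs 1-4)."""
    M = len(p)
    c = (1.0 - rho2) / rho2              # common factor in alpha_{r1} and alpha_{r2}
    y = np.zeros(M, dtype=int)
    remaining, cum = n, 0.0
    for r in range(M - 1):
        cum += p[r]
        b = rng.beta(c * p[r], c * (1.0 - cum))   # B_r ~ Beta(alpha_{r1}, alpha_{r2})
        y[r] = rng.binomial(remaining, b)         # Y_r | past ~ Bin(n - sum y_h, b_r)
        remaining -= y[r]
    y[M - 1] = remaining                          # STEP 4: Y_M takes what is left
    return y

rng = np.random.default_rng(3)
y = rdirmult(30, np.array([0.2, 0.3, 0.5]), 0.3, rng)   # one draw with n = 30
```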

The random variable \(\varvec{Y}=(Y_{1},\ldots ,Y_{M})^{T}\) of the n-inflated multinomial distribution with parameters \(\varvec{\beta }, \varvec{p}\left( \varvec{\theta }\right) \), is generated as follows:

STEP 1. :

Generate \(V\sim Ber(\rho ^{2})\).

STEP 2. :

Generate

$$\begin{aligned}&\left( \varvec{Y|}V=v\right) =\left\{ \begin{array}{l@{\quad }l} \mathcal {M}(n,\varvec{p}\left( \varvec{\theta }\right) ), &{} \text {if }v=0\\ n\mathcal {M}(1,\varvec{p}\left( \varvec{\theta }\right) ), &{} \text {if }v=1 \end{array} \right. . \end{aligned}$$
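A compact sketch of these two steps (assuming numpy; `rninflated` is a hypothetical helper name):

```python
import numpy as np

def rninflated(n, p, rho2, rng):
    """n-inflated multinomial: STEP 1 draws V ~ Ber(rho2), STEP 2 mixes."""
    if rng.random() < rho2:          # V = 1: all n counts land in one random cell
        return n * rng.multinomial(1, p)
    return rng.multinomial(n, p)     # V = 0: ordinary multinomial draw

rng = np.random.default_rng(5)
y = rninflated(10, np.array([0.2, 0.3, 0.5]), 0.3, rng)
```

Both branches have mean \(n\varvec{p}(\varvec{\theta })\); only the dispersion differs.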

The random variable \(\varvec{Y}=(Y_{1},\ldots ,Y_{M})^{T}\) of the random clumped distribution with parameters \(\varvec{\beta }, \varvec{p} \left( \varvec{\theta }\right) \), is generated as follows:

STEP 1. :

Generate \(\varvec{Y}_{0}=(Y_{01} ,\ldots ,Y_{0M})^{T}\sim \mathcal {M}(1,\varvec{p}\left( \varvec{\theta }\right) )\).

STEP 2. :

Generate \(K_{1}\sim Bin(n,\rho )\).

STEP 3. :

Generate \(\left( \varvec{Y}_{1}|K_{1}=k_{1}\right) =\big ( (Y_{11},\ldots ,Y_{1M})^{T} | K_{1}=k_{1}\big ) \sim \mathcal {M}(n-k_{1},\varvec{p}\left( \varvec{\theta }\right) )\).

STEP 4. :

Do \(\left( \varvec{Y|}K_{1}=k_{1}\right) \varvec{=Y}_{0}k_{1}+\left( \varvec{Y}_{1}|K_{1}=k_{1}\right) \).

For the details about the equivalence of this algorithm and (1.12), see Morel and Nagaraj (1993).
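The four random-clumped steps can be sketched as (assuming numpy; `rrandomclumped` is a hypothetical helper name):

```python
import numpy as np

def rrandomclumped(n, p, rho, rng):
    """Random-clumped multinomial: one clump of size K1 plus an independent rest."""
    y0 = rng.multinomial(1, p)          # STEP 1: cell indicator of the clump
    k1 = rng.binomial(n, rho)           # STEP 2: clump size K1 ~ Bin(n, rho)
    y1 = rng.multinomial(n - k1, p)     # STEP 3: remaining n - K1 independent trials
    return k1 * y0 + y1                 # STEP 4: combine clump and remainder

rng = np.random.default_rng(7)
y = rrandomclumped(10, np.array([0.2, 0.3, 0.5]), 0.4, rng)
```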

It is worth noting that the package described in “Modeling overdispersion in R” can be used to generate the distributions considered in this Appendix. For more details, see Raim et al. (2015).


Cite this article

Alonso-Revenga, J.M., Martín, N. & Pardo, L. New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes. Stat Comput 27, 193–217 (2017). https://doi.org/10.1007/s11222-015-9616-z

