Several Proportions or Probabilities

Seber, George A. F.

doi:10.1007/978-3-642-39041-8_3

Chapter
First Online: 01 January 2013

Part of the book series: SpringerBriefs in Statistics (BRIEFSSTATIST)

Abstract

We discuss the Multi-hypergeometric and Multinomial distributions and their properties, with a focus on exact and large-sample inference for comparing two proportions or probabilities from the same or different populations. Relative risks and odds ratios are also considered. Maximum likelihood estimation, asymptotic normality theory, and simultaneous confidence intervals are given for the Multinomial distribution. The chapter closes with some applications to animal populations, including multiple-recapture methods, and the delta method.

References

Agresti, A. (1999). On logit confidence intervals for the odds ratio with small samples. Biometrics, 55(2), 597–602.
Agresti, A., & Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician, 54(4), 280–288.
Agresti, A., & Min, Y. (2005). Frequentist performance of Bayesian confidence intervals for comparing proportions in $2 \times 2$ contingency tables. Biometrics, 61(2), 515–523.
Andrés, A. M., & Tejedor, I. H. (2002). Comment on “equivalence testing for binomial random variables: Which test to use?” The American Statistician, 56(3), 253–254.
Brown, L., & Li, X. (2005). Confidence intervals for two sample binomial distribution. Journal of Statistical Planning and Inference, 130, 359–375.
Darroch, J. N. (1958). The multiple-recapture census. I. Estimation of a closed population. Biometrika, 45, 343–359.
Fagerland, M. W., Lydersen, S., & Laake, P. (2011). Recommended confidence intervals for two independent binomial proportions. Statistical Methods in Medical Research, to appear. doi:10.1177/0962280211415469.
Gart, J. J. (1966). Alternative analyses of contingency tables. Journal of the Royal Statistical Society, Series B, 28, 164–179.
Goodman, L. A. (1965). On simultaneous confidence intervals for multinomial proportions. Technometrics, 7, 247–254.
Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. New York: Wiley.
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1997). Discrete multivariate distributions. New York: Wiley.
Katz, D., Baptista, J., Azen, S. P., & Pike, M. C. (1978). Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics, 34, 469–474.
Krishnamoorthy, K., & Thomson, J. (2002). Hypothesis testing about proportions in two finite populations. The American Statistician, 56(3), 215–222.
Lahiri, S. N., Chatterjee, A., & Maiti, T. (2007). Normal approximation to the hypergeometric distribution in nonstandard cases and a sub-Gaussian Berry-Esseen theorem. Journal of Statistical Planning and Inference, 137, 3570–3590.
Mee, R. W. (1984). Confidence bounds for the difference between two probabilities. Biometrics, 40, 1175–1176.
Miettinen, O., & Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4, 213–226.
Miller, R. G., Jr. (1981). Simultaneous statistical inference (2nd edn.). New York: Springer-Verlag.
Newcombe, R. G. (1998b). Interval estimation for the difference between two independent proportions: Comparison of eleven methods. Statistics in Medicine, 17(8), 873–890.
Scott, A. J., & Seber, G. A. F. (1983). Difference of proportions from the same survey. The American Statistician, 37(4), 319–320.
Seber, G. A. F. (1982). The estimation of animal abundance and related parameters (2nd edn.). London: Griffin. Also reprinted as a paperback in 2002 by Blackburn Press, Caldwell, NJ.
Seber, G. A. F. (2008). A matrix handbook for statisticians. New York: Wiley.
Seber, G. A. F., & Lee, A. J. (2003). Linear regression analysis (2nd edn.). New York: Wiley.
Wild, C. J., & Seber, G. A. F. (1993). Comparing two proportions from the same survey. The American Statistician, 47(3), 178–181. (Correction: 1994, 48(3):269).
Woolf, B. (1955). On estimating the relation between blood group and disease. Annals of Human Genetics, 19, 251–253.

Author information

Authors and Affiliations

Department of Statistics, The University of Auckland, Princes Street 38, Auckland, 1010, New Zealand
George A. F. Seber

Corresponding author

Correspondence to George A. F. Seber.

Appendix: Delta Method

In this section we consider a well-known method for finding large sample variances. The theory is then applied to the Multinomial distribution. We also consider functions of Normal random variables.

3.1.1 General Theory

We consider general ideas only without getting too involved with technical details about limits. Let $X$ be a random variable with mean $\mu $ and variance $\sigma _X^2$, and let $Y=g(X)$ be a “well-behaved” function of $X$ that has a Taylor expansion

$$\begin{aligned} g(X)-g(\mu )=(X-\mu )g^{\prime }(\mu ) +\frac{1}{2}(X-\mu )^2g^{\prime \prime }(X_0), \end{aligned}$$

where $X_0$ lies between $X$ and $\mu $ and $g^{\prime }(\mu )$ is the derivative of $g$ evaluated at $X=\mu $. Assuming second order terms can be neglected, we have $\mathrm{E}(Y)\approx g(\mu )$ and

$$\begin{aligned} \mathrm{{var}}(Y)&\approx \mathrm{E}[(g(X)-g(\mu ))^2]\\&\approx \mathrm{E}[(X-\mu )^2][g^{\prime }(\mu )]^2\\&= \sigma _X^2[g^{\prime }(\mu )]^2. \end{aligned}$$

For example, if $g(X)=\log X$ then, for large $\mu $,

$$\begin{aligned} \mathrm{{var}}(\log X)\approx \frac{\sigma _X^2}{\mu ^2}. \end{aligned}$$

(3.31)
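As a rough numerical check of (3.31), the following sketch compares the first-order approximation $\sigma _X^2/\mu ^2$ with a simulated variance of $\log X$. The choice of a Normal $X$ with $\mu =100$ and $\sigma _X=5$, the function names, the number of draws and the seed are all illustrative assumptions, not taken from the text.

```python
import math
import random
import statistics


def delta_var_log(mu, sigma):
    """First-order delta-method approximation (3.31): var(log X) ~ sigma^2 / mu^2."""
    return sigma ** 2 / mu ** 2


def simulated_var_log(mu, sigma, draws=20000, seed=1):
    """Sample variance of log X for X ~ Normal(mu, sigma^2), by simulation.

    Assumes mu is many standard deviations above zero, so that a
    non-positive draw (for which log is undefined) is practically impossible.
    """
    rng = random.Random(seed)
    logs = [math.log(rng.gauss(mu, sigma)) for _ in range(draws)]
    return statistics.variance(logs)


approx_var = delta_var_log(100.0, 5.0)        # sigma^2 / mu^2 = 25 / 10000
simulated_var = simulated_var_log(100.0, 5.0)
print(approx_var, simulated_var)
```

The two values should agree closely here because $\sigma _X/\mu $ is small; as that ratio grows, the neglected second order terms matter more and the approximation can be expected to deteriorate.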

If $\mathbf{X }=(X_1,X_2,\ldots ,X_k)^{\prime }$ is a vector with mean ${\varvec{\mu }}$, then for suitable $g$,

$$\begin{aligned} Y=g(\mathbf{X })-g({\varvec{\mu }})\approx \sum _{i=1}^k(X_i-\mu _i)g_i^{\prime }({\varvec{\mu }}) +\ldots , \end{aligned}$$

where $g_i^{\prime }({\varvec{\mu }})$ is $\partial g/\partial X_i$ evaluated at $\mathbf{X }={\varvec{\mu }}$. If second order terms can be neglected, we have

$$\begin{aligned} \mathrm{{var}}(Y)&\approx \mathrm{E}[(g(\mathbf{X })-g({\varvec{\mu }}))^2]\nonumber \\&\approx \mathrm{E}\left[ \sum _{i=1}^k\sum _{j=1}^k(X_i-\mu _i)(X_j-\mu _j)g_{i}^{\prime }({\varvec{\mu }})g_j^{\prime }({\varvec{\mu }})\right] \nonumber \\&= \sum _{i=1}^k\sum _{j=1}^k\mathrm{{cov}}(X_i,X_j)g_{i}^{\prime }({\varvec{\mu }})g_j^{\prime }({\varvec{\mu }}). \end{aligned}$$

(3.32)
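Formula (3.32) is a quadratic form in the partial derivatives, so it is straightforward to evaluate numerically. The sketch below is one possible implementation that approximates the derivatives by central differences; the helper names, the step size and the example function $g(\mathbf{X })=X_1X_2$ with its mean vector and covariance matrix are hypothetical choices made only for illustration.

```python
def numeric_gradient(g, mu, h=1e-6):
    """Central-difference approximation to the partial derivatives of g at mu."""
    grad = []
    for i in range(len(mu)):
        step = h * max(1.0, abs(mu[i]))
        up = list(mu)
        down = list(mu)
        up[i] += step
        down[i] -= step
        grad.append((g(up) - g(down)) / (2.0 * step))
    return grad


def delta_var(g, mu, cov):
    """Evaluate (3.32): sum_i sum_j cov(X_i, X_j) g_i'(mu) g_j'(mu)."""
    grad = numeric_gradient(g, mu)
    k = len(mu)
    return sum(cov[i][j] * grad[i] * grad[j] for i in range(k) for j in range(k))


# Illustrative inputs: g(X) = X1 * X2, so the gradient at mu is (mu2, mu1).
mu_example = [10.0, 20.0]
cov_example = [[4.0, 1.0], [1.0, 9.0]]
product_var = delta_var(lambda x: x[0] * x[1], mu_example, cov_example)
print(product_var)  # by hand: 4*20^2 + 9*10^2 + 2*1*20*10 = 2900
```

When the derivatives are available in closed form, passing them directly would avoid the finite-difference error; the numerical gradient is used here only to keep the sketch generic.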

3.1.2 Application to the Multinomial Distribution

Suppose $\mathbf{X }$ has the Multinomial distribution given by (3.10) and

$$\begin{aligned} g(\mathbf{X })=\frac{X_1X_2\cdots X_r}{X_{r+1}X_{r+2}\cdots X_s}\quad (s\le k). \end{aligned}$$

Then, using the above approach with $\mu _i=np_i$,

$$\begin{aligned} \frac{g(\mathbf{X })-g({\varvec{\mu }})}{g({\varvec{\mu }})}\approx \sum _{i=1}^r\frac{X_i-\mu _i}{\mu _i }-\sum _{i=r+1}^s\frac{X_i-\mu _i}{\mu _i },\end{aligned}$$

and it can be shown that (Seber 1982, pp. 8–9)

$$\begin{aligned} \mathrm{{var}}[g(\mathbf{X })]\approx \frac{[g({\varvec{\mu }})]^2}{n}\left\{ \sum _{i=1}^s\frac{1}{p_i}-(s-2r)^2\right\} . \end{aligned}$$

(3.33)
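The closed form (3.33) can be checked numerically against the general expression (3.32). The sketch below assumes the usual Multinomial moments, $\mathrm{cov}(X_i,X_j)=np_i(\delta _{ij}-p_j)$, and uses the fact that $\partial g/\partial X_i=\pm g/X_i$ for the ratio above; the function names and the example values of $n$ and $p_i$ are illustrative only.

```python
import math


def multinomial_cov(n, p):
    """Covariance matrix of a Multinomial(n, p) vector: n * (diag(p) - p p')."""
    k = len(p)
    return [[n * p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(k)]
            for i in range(k)]


def ratio_at_mean(n, p, r, s):
    """g(mu) = mu_1...mu_r / (mu_{r+1}...mu_s) with mu_i = n * p_i."""
    mu = [n * pi for pi in p]
    return math.prod(mu[:r]) / math.prod(mu[r:s])


def var_ratio_general(n, p, r, s):
    """Variance of the ratio from the general double sum (3.32)."""
    mu = [n * pi for pi in p]
    g_mu = ratio_at_mean(n, p, r, s)
    # dg/dX_i at mu: +g/mu_i in the numerator, -g/mu_i in the denominator, else 0.
    grad = [g_mu / mu[i] if i < r else (-g_mu / mu[i] if i < s else 0.0)
            for i in range(len(p))]
    cov = multinomial_cov(n, p)
    k = len(p)
    return sum(cov[i][j] * grad[i] * grad[j] for i in range(k) for j in range(k))


def var_ratio_closed_form(n, p, r, s):
    """Variance of the ratio from the closed form (3.33)."""
    g_mu = ratio_at_mean(n, p, r, s)
    return g_mu ** 2 / n * (sum(1.0 / p[i] for i in range(s)) - (s - 2 * r) ** 2)


p_example = [0.1, 0.2, 0.3, 0.4]
for r, s in [(1, 2), (1, 3), (2, 4)]:
    print(r, s, var_ratio_general(200, p_example, r, s),
          var_ratio_closed_form(200, p_example, r, s))
```

The two columns agree up to rounding for each $(r,s)$, including a case with $s\ne 2r$ where the correction term $(s-2r)^2$ is nonzero.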

Two cases of interest in this monograph are $s=2r=2$ and $s=2r=4$. In the first case $g(\mathbf{X })=X_1/X_2$ and

$$\begin{aligned} \mathrm{{var}}[g(\mathbf{X })]\approx [g({\varvec{\mu }})]^2\left( \frac{1}{\mu _1}+\frac{1}{\mu _2}\right) . \end{aligned}$$

(3.34)

We are particularly interested in $Y=\log g(\mathbf{X })$, so that from (3.31),

$$\begin{aligned} \mathrm{{var}}(Y)\approx \frac{\mathrm{{var}}[g(\mathbf{X })]}{[g({\varvec{\mu }})]^2}=\frac{1}{\mu _1}+\frac{1}{\mu _2}. \end{aligned}$$

(3.35)

If $g(\mathbf{X })$ is a product of two such ratios from independent Binomial distributions, then we just add two more terms to $\mathrm{{var}}(Y)$. We can estimate $\mathrm{{var}}(Y)$ by replacing each $\mu _i$ by $X_i$ in (3.35).
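The plug-in estimate of (3.35) can be sketched in a few lines. The counts used below are made up, and the Normal-based interval for the ratio (with multiplier $z=1.96$) is an assumption of this sketch that leans on large-sample normality rather than something derived in this section.

```python
import math


def log_ratio_interval(x1, x2, z=1.96):
    """Plug-in estimate of var(log(X1/X2)) from (3.35), plus an approximate
    interval for the ratio obtained by exponentiating log(x1/x2) +/- z * se.

    Assumes both counts are positive; the interval is an illustrative
    large-sample construction, not an exact one.
    """
    log_ratio = math.log(x1 / x2)
    var_hat = 1.0 / x1 + 1.0 / x2      # each mu_i replaced by the observed X_i
    half_width = z * math.sqrt(var_hat)
    return var_hat, (math.exp(log_ratio - half_width),
                     math.exp(log_ratio + half_width))


# Hypothetical observed counts.
var_hat, (lower, upper) = log_ratio_interval(30, 50)
print(var_hat, lower, upper)
```

Working on the log scale and transforming back keeps the interval inside $(0,\infty )$ and makes it symmetric about the observed ratio in the multiplicative sense, which is why the endpoints multiply to the squared ratio.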

Using similar algebra, we find that

$$\begin{aligned} \mathrm{{var}}\left[ \log \left( \frac{X_1X_2}{X_3X_4}\right) \right] \approx \sum _{i=1}^4\frac{1}{\mu _i}. \end{aligned}$$

(3.36)
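A small simulation gives a sanity check on (3.36). The sketch below draws repeated Multinomial samples with four cells using only the standard library and compares the sample variance of $\log (X_1X_2/X_3X_4)$ with $\sum 1/\mu _i$. The values of $n$ and the $p_i$, the number of replicates and the seed are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter


def simulate_log_cross_ratio_var(n, p, reps=2000, seed=7):
    """Sample variance of log(X1*X2 / (X3*X4)) over repeated Multinomial(n, p) draws.

    Assumes every expected count n*p_i is large enough that a zero cell
    (which would make the log undefined) is practically impossible.
    """
    rng = random.Random(seed)
    cells = range(len(p))
    values = []
    for _ in range(reps):
        counts = Counter(rng.choices(cells, weights=p, k=n))
        x1, x2, x3, x4 = (counts[i] for i in cells)
        values.append(math.log(x1 * x2 / (x3 * x4)))
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)


n_cells = 500
p_cells = [0.2, 0.3, 0.25, 0.25]
approx_336 = sum(1.0 / (n_cells * pi) for pi in p_cells)   # sum of 1/mu_i
simulated_336 = simulate_log_cross_ratio_var(n_cells, p_cells)
print(approx_336, simulated_336)
```

With expected counts of 100 or more in every cell, the simulated variance should sit within a few percent of the approximation; much smaller expected counts would be a harder test of the first-order argument.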

3.1.3 Asymptotic Normality

In later chapters we are interested in functions of a maximum likelihood estimator, which we know is asymptotically Normally distributed under fairly general conditions. For example, suppose $\sqrt{n}(\widehat{{\varvec{\mu }}}_n - {\varvec{\mu }})$ is asymptotically $N(\mathbf{{0}}, {\varvec{\Sigma }}({\varvec{\mu }}))$. Then using the delta method above, $\sqrt{n}(g(\widehat{{\varvec{\mu }}}_n)-g({\varvec{\mu }}))$ is asymptotically distributed as $N(0,\sigma _g^2)$ as $n\rightarrow \infty $, where

$$\begin{aligned} \sigma _g^2=\left[ \left( \frac{\partial g}{\partial {\varvec{\mu }}}\right) {\varvec{\Sigma }}({\varvec{\mu }})\left( \frac{\partial g}{\partial {\varvec{\mu }}}\right) ^{\prime }\right] . \end{aligned}$$

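This result also holds if we replace $g$ by a vector function $\mathbf{g }$, giving us $N(\mathbf{{0}}, {{\varvec{\Sigma }}}_{\mathbf{g }})$, where ${{\varvec{\Sigma }}}_{\mathbf{g }}$ has the same form as $\sigma _g^2$ with $\partial g/\partial {\varvec{\mu }}$ replaced by the matrix of partial derivatives of $\mathbf{g }$.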
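The vector case can be illustrated with a short computation of ${{\varvec{\Sigma }}}_{\mathbf{g }}$ as Jacobian times ${\varvec{\Sigma }}$ times Jacobian transpose. The example is an assumption chosen for illustration: Multinomial sample proportions, for which the limiting covariance of $\sqrt{n}(\widehat{\mathbf{p }}-\mathbf{p })$ is $\mathrm{diag}(\mathbf{p })-\mathbf{p }\mathbf{p }^{\prime }$, and $\mathbf{g }(\mathbf{p })=(\log (p_1/p_2),\log (p_1/p_3))^{\prime }$. The helper names and the probabilities are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def delta_cov(jacobian, sigma):
    """Limiting covariance of sqrt(n) * (g(estimate) - g(truth)): J Sigma J'."""
    return matmul(matmul(jacobian, sigma), transpose(jacobian))


# Illustrative cell probabilities for a three-cell Multinomial.
p_vec = [0.5, 0.3, 0.2]
sigma_p = [[(p_vec[i] if i == j else 0.0) - p_vec[i] * p_vec[j]
            for j in range(3)] for i in range(3)]

# Jacobian of g(p) = (log(p1/p2), log(p1/p3)) evaluated at p_vec.
jac = [[1.0 / p_vec[0], -1.0 / p_vec[1], 0.0],
       [1.0 / p_vec[0], 0.0, -1.0 / p_vec[2]]]

sigma_g = delta_cov(jac, sigma_p)
print(sigma_g)
```

The first diagonal entry equals $1/p_1+1/p_2$, which after division by $n$ is $1/\mu _1+1/\mu _2$, in agreement with (3.35); the off-diagonal entry $1/p_1$ reflects the shared numerator of the two log ratios.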

About this chapter

Cite this chapter

Seber, G.A.F. (2013). Several Proportions or Probabilities. In: Statistical Models for Proportions and Probabilities. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39041-8_3

DOI: https://doi.org/10.1007/978-3-642-39041-8_3
Published: 31 July 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39040-1
Online ISBN: 978-3-642-39041-8
eBook Packages: Mathematics and Statistics (R0)
