Several Proportions or Probabilities

Part of the book series: SpringerBriefs in Statistics

Abstract

We discuss the Multi-hypergeometric and Multinomial distributions and their properties, with a focus on exact and large sample inference for comparing two proportions or probabilities from the same or different populations. Relative risks and odds ratios are also considered. Maximum likelihood estimation, asymptotic normality theory, and simultaneous confidence intervals are given for the Multinomial distribution. The chapter closes with some applications to animal populations, including multiple-recapture methods, and the delta method.

References

  • Agresti, A. (1999). On logit confidence intervals for the odds ratio with small samples. Biometrics, 55(2), 597–602.

  • Agresti, A., & Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician, 54(4), 280–288.

  • Agresti, A., & Min, Y. (2005). Frequentist performance of Bayesian confidence intervals for comparing proportions in \(2 \times 2\) contingency tables. Biometrics, 61(2), 515–523.

  • Andrés, A. M., & Tejedor, I. H. (2002). Comment on “equivalence testing for binomial random variables: Which test to use?” The American Statistician, 56(3), 253–254.

  • Brown, L., & Li, X. (2005). Confidence intervals for two sample binomial distribution. Journal of Statistical Planning and Inference, 130, 359–375.

  • Darroch, J. N. (1958). The multiple-recapture census. I. Estimation of a closed population. Biometrika, 45, 343–359.

  • Fagerland, M. W., Lydersen, S., & Laake, P. (2011). Recommended confidence intervals for two independent binomial proportions. Statistical Methods in Medical Research, to appear. doi:10.1177/0962280211415469.

  • Gart, J. J. (1966). Alternative analyses of contingency tables. Journal of the Royal Statistical Society, Series B, 28, 164–179.

  • Goodman, L. A. (1965). On simultaneous confidence intervals for multinomial proportions. Technometrics, 7, 247–254.

  • Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. New York: Wiley.

  • Johnson, N. L., Kotz, S., & Balakrishnan, N. (1997). Discrete multivariate distributions. New York: Wiley.

  • Katz, D., Baptista, J., Azen, S. P., & Pike, M. C. (1978). Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics, 34, 469–474.

  • Krishnamoorthy, K., & Thomson, J. (2002). Hypothesis testing about proportions in two finite populations. The American Statistician, 56(3), 215–222.

  • Lahiri, S. N., Chatterjee, A., & Maiti, T. (2007). Normal approximation to the hypergeometric distribution in nonstandard cases and a sub-Gaussian Berry-Esseen theorem. Journal of Statistical Planning and Inference, 137, 3570–3590.

  • Mee, R. W. (1984). Confidence bounds for the difference between two probabilities. Biometrics, 40, 1175–1176.

  • Miettinen, O., & Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4, 213–226.

  • Miller, R. G., Jr. (1981). Simultaneous statistical inference (2nd edn.). New York: Springer-Verlag.

  • Newcombe, R. G. (1998b). Interval estimation for the difference between two independent proportions: Comparison of eleven methods. Statistics in Medicine, 17(8), 873–890.

  • Scott, A. J., & Seber, G. A. F. (1983). Difference of proportions from the same survey. The American Statistician, 37(4), 319–320.

  • Seber, G. A. F. (1982). The estimation of animal abundance and related parameters (2nd edn.). London: Griffin. Also reprinted as a paperback in 2002 by Blackburn Press, Caldwell, NJ.

  • Seber, G. A. F. (2008). A matrix handbook for statisticians. New York: Wiley.

  • Seber, G. A. F., & Lee, A. J. (2003). Linear regression analysis (2nd edn.). New York: Wiley.

  • Wild, C. J., & Seber, G. A. F. (1993). Comparing two proportions from the same survey. The American Statistician, 47(3), 178–181. (Correction: 1994, 48(3):269).

  • Woolf, B. (1955). On estimating the relation between blood group and disease. Annals of Human Genetics, 19, 251–253.

Correspondence to George A. F. Seber.

Appendix: Delta Method

In this section we consider a well-known method for finding large sample variances. The theory is then applied to the Multinomial distribution. We also consider functions of Normal random variables.

3.1.1 General Theory

We consider general ideas only without getting too involved with technical details about limits. Let \(X\) be a random variable with mean \(\mu \) and variance \(\sigma _X^2\), and let \(Y=g(X)\) be a “well-behaved” function of \(X\) that has a Taylor expansion

$$\begin{aligned} g(X)-g(\mu )=(X-\mu )g^{\prime }(\mu ) +\frac{1}{2}(X-\mu )^2g^{\prime \prime }(X_0), \end{aligned}$$

where \(X_0\) lies between \(X\) and \(\mu \), \(g^{\prime }(\mu )\) is the derivative of \(g\) evaluated at \(X=\mu \), and \(g^{\prime \prime }\) is the second derivative of \(g\). Assuming second order terms can be neglected, we have \(\mathrm{E}(Y)\approx g(\mu )\) and

$$\begin{aligned} \mathrm{{var}}(Y)&\approx \mathrm{E}[(g(X)-g(\mu ))^2]\\&\approx \mathrm{E}[(X-\mu )^2][g^{\prime }(\mu )]^2\\&= \sigma _X^2[g^{\prime }(\mu )]^2. \end{aligned}$$

For example, if \(g(X)=\log X\) then \(g^{\prime }(\mu )=1/\mu \) and, for large \(\mu \),

$$\begin{aligned} \mathrm{{var}}(\log X)\approx \frac{\sigma _X^2}{\mu ^2}. \end{aligned}$$
(3.31)
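As a rough numerical check of (3.31), the sketch below compares the delta-method value \(\sigma _X^2/\mu ^2\) with the variance of \(\log X\) obtained by direct enumeration. The choice of a Binomial\((n,p)\) variable, the values \(n=1000\), \(p=0.3\), and the function names are assumptions made for illustration only; the enumeration conditions on \(X\ge 1\) because \(\log 0\) is undefined.

```python
import math

def binomial_pmf(x, n, p):
    """Binomial(n, p) probability of x, computed in log space to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(log_pmf)

def exact_var_log(n, p):
    """Variance of log X for X ~ Binomial(n, p), conditional on X >= 1."""
    xs = range(1, n + 1)
    weights = [binomial_pmf(x, n, p) for x in xs]
    total = sum(weights)  # equals P(X >= 1)
    mean = sum(w * math.log(x) for w, x in zip(weights, xs)) / total
    return sum(w * (math.log(x) - mean) ** 2 for w, x in zip(weights, xs)) / total

def delta_var_log(mu, sigma2):
    """Delta-method approximation (3.31): var(log X) ~ sigma^2 / mu^2."""
    return sigma2 / mu ** 2

# Hypothetical illustration values, not taken from the text.
n, p = 1000, 0.3
approx = delta_var_log(n * p, n * p * (1 - p))
exact = exact_var_log(n, p)
print(f"delta method: {approx:.6f}, enumeration: {exact:.6f}")
```

With a mean of 300 the two values should agree closely; the agreement is expected to deteriorate as \(\mu \) decreases, which is why the text restricts (3.31) to large \(\mu \).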

If \(\mathbf{X }=(X_1,X_2,\ldots ,X_k)^{\prime }\) is a vector with mean \({\varvec{\mu }}\), then for suitable \(g\),

$$\begin{aligned} Y=g(\mathbf{X })-g({\varvec{\mu }})\approx \sum _{i=1}^k(X_i-\mu _i)g_i^{\prime }({\varvec{\mu }}) +\ldots , \end{aligned}$$

where \(g_i^{\prime }({\varvec{\mu }})\) is \(\partial g/\partial X_i\) evaluated at \(\mathbf{X }={\varvec{\mu }}\). If second order terms can be neglected, we have

$$\begin{aligned} \mathrm{{var}}(Y)&\approx \mathrm{E}[(g(\mathbf{X })-g({\varvec{\mu }}))^2]\nonumber \\&\approx \mathrm{E}\left[ \sum _{i=1}^k\sum _{j=1}^k(X_i-\mu _i)(X_j-\mu _j)g_{i}^{\prime }({\varvec{\mu }})g_j^{\prime }({\varvec{\mu }})\right] \nonumber \\&= \sum _{i=1}^k\sum _{j=1}^k\mathrm{{cov}}(X_i,X_j)g_{i}^{\prime }({\varvec{\mu }})g_j^{\prime }({\varvec{\mu }}). \end{aligned}$$
(3.32)
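The double sum in (3.32) can be sketched in a few lines. The version below is a minimal illustration, not a definitive implementation: it assumes the covariance matrix is supplied as a nested list and approximates the partial derivatives by central differences. The helper names `numerical_gradient` and `delta_variance`, and the two test cases, are hypothetical.

```python
def numerical_gradient(g, mu, h=1e-6):
    """Central-difference approximation to the partial derivatives of g at mu."""
    grad = []
    for i in range(len(mu)):
        step = h * max(1.0, abs(mu[i]))
        up = list(mu)
        up[i] += step
        down = list(mu)
        down[i] -= step
        grad.append((g(up) - g(down)) / (2 * step))
    return grad

def delta_variance(g, mu, cov):
    """Approximate var[g(X)] by sum_i sum_j cov(X_i, X_j) g_i'(mu) g_j'(mu), as in (3.32)."""
    grad = numerical_gradient(g, mu)
    k = len(mu)
    return sum(cov[i][j] * grad[i] * grad[j] for i in range(k) for j in range(k))

# Linear case g = X1 - X2: the approximation is exact,
# var = var(X1) + var(X2) - 2 cov(X1, X2).
cov = [[2.0, 1.0], [1.0, 3.0]]
var_diff = delta_variance(lambda x: x[0] - x[1], [5.0, 7.0], cov)

# Product of two uncorrelated variables: mu2^2 var(X1) + mu1^2 var(X2).
indep = [[4.0, 0.0], [0.0, 9.0]]
var_prod = delta_variance(lambda x: x[0] * x[1], [10.0, 20.0], indep)

print(var_diff, var_prod)
```

For the linear function the first-order expansion has no remainder, so the result is the exact variance; for the product it is only the leading term.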

3.1.2 Application to the Multinomial Distribution

Suppose \(\mathbf{X }\) has the Multinomial distribution given by (3.10) and

$$\begin{aligned} g(\mathbf{X })=\frac{X_1X_2\cdots X_r}{X_{r+1}X_{r+2}\cdots X_s}\quad (s\le k). \end{aligned}$$

Then, using the above approach with \(\mu _i=np_i\),

$$\begin{aligned} \frac{g(\mathbf{X })-g({\varvec{\mu }})}{g({\varvec{\mu }})}\approx \sum _{i=1}^r\frac{X_i-\mu _i}{\mu _i }-\sum _{i=r+1}^s\frac{X_i-\mu _i}{\mu _i },\end{aligned}$$

and it can be shown that (Seber 1982, pp. 8–9)

$$\begin{aligned} \mathrm{{var}}[g(\mathbf{X })]\approx \frac{[g({\varvec{\mu }})]^2}{n}\left\{ \sum _{i=1}^s\frac{1}{p_i}-(s-2r)^2\right\} . \end{aligned}$$
(3.33)
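The closed form (3.33) can be checked against the general expression (3.32) by inserting the Multinomial covariances \(\mathrm{{cov}}(X_i,X_j)=n p_i(\delta _{ij}-p_j)\) and the partial derivatives \(\pm g({\varvec{\mu }})/\mu _i\). The following sketch does this numerically; the values of \(n\), \(\mathbf{p }\), \(r\) and \(s\), and all function names, are assumptions chosen for illustration.

```python
import math

def multinomial_cov(n, p):
    """Covariance matrix of Multinomial(n, p): n p_i (delta_ij - p_j)."""
    k = len(p)
    return [[n * p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(k)]
            for i in range(k)]

def ratio_at_mean(mu, r, s):
    """g(mu) = mu_1 ... mu_r / (mu_{r+1} ... mu_s)."""
    return math.prod(mu[:r]) / math.prod(mu[r:s])

def var_general(n, p, r, s):
    """Double sum (3.32) with the analytic gradient of the ratio g."""
    mu = [n * pi for pi in p]
    g_mu = ratio_at_mean(mu, r, s)
    grad = [0.0] * len(p)
    for i in range(s):
        sign = 1.0 if i < r else -1.0  # numerator terms +, denominator terms -
        grad[i] = sign * g_mu / mu[i]
    cov = multinomial_cov(n, p)
    k = len(p)
    return sum(cov[i][j] * grad[i] * grad[j] for i in range(k) for j in range(k))

def var_closed_form(n, p, r, s):
    """Closed form (3.33)."""
    mu = [n * pi for pi in p]
    g_mu = ratio_at_mean(mu, r, s)
    return g_mu ** 2 / n * (sum(1.0 / p[i] for i in range(s)) - (s - 2 * r) ** 2)

# Hypothetical illustration values: g = X1 / (X2 X3) with k = 4 categories.
n, p = 500, [0.1, 0.2, 0.3, 0.4]
general = var_general(n, p, r=1, s=3)
closed = var_closed_form(n, p, r=1, s=3)
print(general, closed)
```

The two numbers should agree up to rounding, since (3.33) is an algebraic simplification of (3.32) for this choice of \(g\).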

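Two cases of interest in this monograph are \(s=2r=2\) and \(s=2r=4\). In both cases \(s-2r=0\), so the last term in (3.33) vanishes and, since \(np_i=\mu _i\), the bracketed sum divided by \(n\) becomes \(\sum _i 1/\mu _i\). In the first case \(g(\mathbf{X })=X_1/X_2\) and

$$\begin{aligned} \mathrm{{var}}[g(\mathbf{X })]\approx [g({\varvec{\mu }})]^2\left( \frac{1}{\mu _1}+\frac{1}{\mu _2}\right) . \end{aligned}$$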
(3.34)

We are particularly interested in \(Y=\log g(\mathbf{X })\), so that from (3.31),

$$\begin{aligned} \mathrm{{var}}(Y)\approx \frac{\mathrm{{var}}[g(\mathbf{X })]}{[g({\varvec{\mu }})]^2}=\frac{1}{\mu _1}+\frac{1}{\mu _2}. \end{aligned}$$
(3.35)

If \(g(\mathbf{X })\) is a product of two such ratios, each from an independent Binomial distribution, then the variances of the two log ratios add and we just add two more terms to \(\mathrm{{var}}(Y)\). We can estimate \(\mathrm{{var}} (Y)\) by replacing each \(\mu _i\) by \(X_i\) in (3.35).

Using similar algebra, we find that

$$\begin{aligned} \mathrm{{var}}\left[ \log \left( \frac{X_1X_2}{X_3X_4}\right) \right] \approx \sum _{i=1}^4\frac{1}{\mu _i}. \end{aligned}$$
(3.36)
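In practice (3.35) and (3.36) are used with each \(\mu _i\) replaced by the observed \(X_i\), giving a sum of reciprocal counts. The sketch below shows that plug-in calculation; the counts are invented for illustration and the function name `log_ratio_variance` is hypothetical. All counts must be positive for the estimate to exist.

```python
import math

def log_ratio_variance(counts):
    """Plug-in estimate of var(log g) when s = 2r: the sum of 1 / X_i."""
    if any(x <= 0 for x in counts):
        raise ValueError("all counts must be positive")
    return sum(1.0 / x for x in counts)

# Hypothetical counts, not data from the text.
x1, x2, x3, x4 = 30, 20, 10, 40

# Case s = 2r = 2, as in (3.35): Y = log(X1 / X2).
var_two = log_ratio_variance([x1, x2])

# Case s = 2r = 4, as in (3.36): Y = log(X1 X2 / (X3 X4)).
log_g = math.log((x1 * x2) / (x3 * x4))
var_hat = log_ratio_variance([x1, x2, x3, x4])
std_err = math.sqrt(var_hat)

print(f"log g = {log_g:.4f}, estimated variance = {var_hat:.4f}, s.e. = {std_err:.4f}")
```

Small counts dominate the sum, so a single sparse cell can make the estimated variance large.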

3.1.3 Asymptotic Normality

In later chapters we are interested in functions of a maximum likelihood estimator, which we know is asymptotically Normally distributed under fairly general conditions. For example, suppose \(\sqrt{n}(\widehat{{\varvec{\mu }}}_n - {\varvec{\mu }})\) is asymptotically \(N(\mathbf{{0}}, {\varvec{\Sigma }}({\varvec{\mu }}))\). Then using the delta method above, \(\sqrt{n}(g(\widehat{{\varvec{\mu }}}_n)-g({\varvec{\mu }}))\) is asymptotically distributed as \(N(0,\sigma _g^2)\) as \(n\rightarrow \infty \), where

$$\begin{aligned} \sigma _g^2=\left[ \left( \frac{\partial g}{\partial {\varvec{\mu }}}\right) {\varvec{\Sigma }}({\varvec{\mu }})\left( \frac{\partial g}{\partial {\varvec{\mu }}}\right) ^{\prime }\right] . \end{aligned}$$

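This result also holds if we replace \(g\) by a vector function \(\mathbf{g }\), giving us \(N(\mathbf{{0}}, {{\varvec{\Sigma }}}_{\mathbf{g }})\), where \({{\varvec{\Sigma }}}_{\mathbf{g }}\) has the same form as \(\sigma _g^2\) with \(\partial g/\partial {\varvec{\mu }}\) replaced by the matrix of partial derivatives of \(\mathbf{g }\).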
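The vector version can be illustrated with Multinomial proportions. The sketch assumes the standard result that \(\sqrt{n}(\widehat{\mathbf{p }}-\mathbf{p })\) has asymptotic covariance matrix \(\mathrm{diag}(\mathbf{p })-\mathbf{p }\mathbf{p }^{\prime }\), and takes \(\mathbf{g }(\mathbf{p })=(\log (p_1/p_2),\log (p_1/p_3))^{\prime }\) as an example. The probabilities and the helper names `proportion_cov` and `sandwich` are hypothetical choices for illustration.

```python
def proportion_cov(p):
    """Asymptotic covariance of sqrt(n)(p_hat - p): diag(p) - p p'."""
    k = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(k)]
            for i in range(k)]

def sandwich(G, Sigma):
    """Return G Sigma G', where row a of G holds the partial derivatives of g_a."""
    m, k = len(G), len(Sigma)
    return [[sum(G[a][i] * Sigma[i][j] * G[b][j]
                 for i in range(k) for j in range(k))
             for b in range(m)]
            for a in range(m)]

# Hypothetical probabilities, not values from the text.
p = [0.2, 0.3, 0.5]

# Matrix of partial derivatives of g(p) = (log(p1/p2), log(p1/p3)).
G = [[1.0 / p[0], -1.0 / p[1], 0.0],
     [1.0 / p[0], 0.0, -1.0 / p[2]]]

Sigma_g = sandwich(G, proportion_cov(p))
for row in Sigma_g:
    print([round(v, 6) for v in row])
```

The diagonal entries reduce to \(1/p_1+1/p_2\) and \(1/p_1+1/p_3\), which is (3.35) multiplied by \(n\); the off-diagonal entry \(1/p_1\) reflects the shared numerator.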

© 2013 The Author(s)

Seber, G.A.F. (2013). Several Proportions or Probabilities. In: Statistical Models for Proportions and Probabilities. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39041-8_3
