Bias Correction in Estimating Proportions by Imperfect Pooled Testing

Journal of Agricultural, Biological and Environmental Statistics

Abstract

In the estimation of proportions by pooled testing, the MLE is biased. Hepworth and Biggerstaff (JABES, 22:602–614, 2017) proposed an estimator based on the bias correction method of Firth (Biometrika 80:27–38, 1993) and showed that it is almost unbiased across a range of pooled testing problems involving no misclassification. We now extend their work to allow for imperfect testing. We derive the estimator, provide a Newton–Raphson iterative formula for its computation and test it in situations involving equal or unequal pool sizes, drawing on problems encountered in plant disease assessment and prevalence estimation of mosquito-borne viruses. Our estimator is highly effective at reducing the bias for prevalences consistent with the pooled testing procedure employed.

References

  • Burkhalter, K.L., Horiuchi, K., Biggerstaff, B.J., Savage, H.M. and Nasci, R.S. (2014) Evaluation of a Rapid Analyte Measurement Platform and real-time reverse-transcriptase polymerase chain reaction assay West Nile Virus detection system in mosquito pools. Journal of the American Mosquito Control Association, 30, 21–30.

  • Burrows, P.M. (1987)  Improved estimation of pathogen transmission rates by group testing. Phytopathology, 77, 363–365.

  • Colon, S., Patil, G.P. and Taillie, C.  (2001) Estimating prevalence using composites. Environmental and Ecological Statistics, 8, 213–236.

  • Cowling, D.W., Gardner, I.A. and Johnson, W.O.  (1999) Comparison of methods for estimation of individual-level prevalence based on pooled samples. Preventive Veterinary Medicine, 39, 211–225.

  • Dhand, N.K., Eppleston, J., Whittington, R.J. and Toribio, J.L. (2007) Risk factors for ovine Johne’s disease in infected sheep flocks in Australia. Preventive Veterinary Medicine, 82, 51–71.

  • Firth, D. (1993)  Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.

  • Gart, J.J. (1991)  An application of score methodology: Confidence intervals and tests of fit for one-hit curves. In: Handbook of Statistics, C. R. Rao, R. Chakraborty (eds), 395–406. Amsterdam: Elsevier.

  • Hepworth, G. (2005)  Confidence intervals for proportions estimated by group testing with groups of unequal size. JABES, 10, 478–497.

  • Hepworth, G. and Biggerstaff, B.J. (2017) Bias correction in estimating proportions by pooled testing. JABES, 22, 602–614.

  • Hepworth, G. and Watson, R. (2009)  Debiased estimation of proportions in group testing. JRSS-C, 58, 105–121.

  • Hepworth, G. (2013)  Improved estimation of proportions using inverse binomial group testing. JABES, 18, 102–119.

  • Hughes-Oliver, J.M. (2006) Pooling experiments for blood screening and drug discovery. In: Dean, A., Lewis, S. (eds) Screening. Springer, New York, NY.

  • Komar, N., Colborn, J.M., Horiuchi, K., Delorey, M., Biggerstaff, B.J., Damian, D., Smith, K. and Townsend, J. (2015)  Reduced West Nile Virus transmission around communal roosts of Great-Tailed Grackle (Quiscalus mexicanus). EcoHealth, 12, 144–151.

  • Liu, S.C., Chiang, K.S., Lin, C.H., Chung, W.C., Lin, S.H. and Yang, T.C. (2011)  Cost analysis in choosing group size when group testing for Potato virus Y in the presence of classification errors. Annals of Applied Biology, 159, 491–502.

  • Liu, A., Liu, C., Zhang, Z. and Albert, P.S. (2012)  Optimality of group testing in the presence of misclassification. Biometrika, 99, 245–251.

  • McMahan, C.S., Tebbs, J.M. and Bilder, C.R. (2013) Regression models for group testing data with pool dilution effects. Biostatistics, 14, 284–298.

  • Messam, L.L., Branscum, A.J., Collins, M.T. and Gardner, I.A. (2008)  Frequentist and Bayesian approaches to prevalence estimation using examples from Johne’s disease. Animal Health Research Reviews, 9, 1–23.

  • Mitchell, S. and Pagano, M. (2012)  Pooled testing for effective estimation of the prevalence of Schistosoma mansoni. American Journal of Tropical Medicine and Hygiene, 87, 850–861.

  • Reiczigel, J., Foldi, J. and Ozsvari, L. (2010)  Exact confidence limits for prevalence of a disease with an imperfect diagnostic test. Epidemiology and Infection, 138, 1674–1678.

  • Roy, S. and Banerjee, T. (2019)  Estimation of log-odds ratio from group testing data using Firth correction. Biometrical Journal, 61, 714–728.

  • Swallow, W.H. (1985)  Group testing for estimating infection rates and probabilities of disease transmission. Phytopathology, 75, 882–889.

  • Tebbs, J.M., McMahan, C.S. and Bilder, C.R. (2013)  Two-stage hierarchical group testing for multiple infections with application to the Infertility Prevention Project. Biometrics, 69, 1064–1073.

  • Tu, X.M., Litvak, E. and Pagano, M. (1994)  Screening tests: can we get more by doing less? Statistics in Medicine, 13, 1905–1919.

  • Youden, W.J. (1950)  Index for rating diagnostic tests. Cancer, 3, 32–35.

  • Zhang, Z., Liu, C., Kim, S. and Liu, A. (2014)  Prevalence estimation subject to misclassification: the mis-substitution bias and some remedies. Statistics in Medicine, 33, 4482–4500.


Author information

Corresponding author

Correspondence to Graham Hepworth.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Disclaimer: The findings and conclusions herein are those of the authors and do not necessarily represent the official position of the US Centers for Disease Control and Prevention.

Appendices

Appendix 1: Derivation of Firth’s Bias Correction Applied to Imperfect Pooled Testing

Recall \(\pi (p)= a - r(1-p)^m\). The first two derivatives are

$$\begin{aligned} \frac{\text{ d }\pi (p)}{\text{ d }p}= & {} mr(1-p)^{m-1} = \frac{m}{1-p} r(1-p)^m = \frac{m}{1-p} \left[ a-\pi (p)\right] , \\ \frac{\text{ d}^2\pi (p)}{\text{ d }p^2}= & {} -m(m-1)r(1-p)^{m-2} = -\frac{m(m-1)}{(1-p)^2} \left[ a-\pi (p)\right] . \end{aligned}$$

From here on, we drop the functional notation on \(\pi (p)\). We need the following derivatives of the log-likelihood:

$$\begin{aligned} \frac{\text{ d }\,l}{\text{ d }\pi }= & {} \frac{x}{\pi } - \frac{n-x}{1-\pi }, \\ \frac{\text{ d}^2 l}{\text{ d }\pi ^2}= & {} -\frac{x}{\pi ^2} - \frac{n-x}{(1-\pi )^2}, \\ \frac{\text{ d}^3 l}{\text{ d }\pi ^3}= & {} \frac{2x}{\pi ^3} - \frac{2(n-x)}{(1-\pi )^3}. \end{aligned}$$

From these, we have the score function

$$\begin{aligned} S(p) = \frac{\text{ d }\,l}{\text{ d }p}= & {} \frac{\text{ d }\,l}{\text{ d }\pi } \frac{\text{ d }\pi }{\text{ d }p} = \frac{m (a-\pi ) (x - n\pi )}{(1-p)\pi (1-\pi )}. \end{aligned}$$
(A.1)
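As a quick numerical sanity check of Eq. (A.1), the following Python sketch compares the closed-form score with a central finite difference of the binomial log-likelihood. The function names and parameter values are illustrative assumptions, not taken from the paper, and \(a\) and \(r\) are treated simply as the constants in \(\pi (p)= a - r(1-p)^m\).

```python
import math

def pi_of_p(p, m, a, r):
    """Probability that a pool of size m tests positive: pi(p) = a - r(1-p)^m."""
    return a - r * (1.0 - p) ** m

def loglik(p, x, n, m, a, r):
    """Binomial log-likelihood (constant term omitted) for x positive pools out of n."""
    pi = pi_of_p(p, m, a, r)
    return x * math.log(pi) + (n - x) * math.log(1.0 - pi)

def score(p, x, n, m, a, r):
    """Closed-form score S(p) as written in Eq. (A.1)."""
    pi = pi_of_p(p, m, a, r)
    return m * (a - pi) * (x - n * pi) / ((1.0 - p) * pi * (1.0 - pi))

# Hypothetical values chosen only for illustration.
p, x, n, m, a, r = 0.05, 7, 20, 10, 0.95, 0.93
h = 1e-6
numeric_score = (loglik(p + h, x, n, m, a, r) - loglik(p - h, x, n, m, a, r)) / (2 * h)
closed_score = score(p, x, n, m, a, r)
print(numeric_score, closed_score)
```

The two printed values should agree up to finite-difference error.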

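The chain rule and product rule can be used to compute the higher-order derivatives of the log-likelihood with respect to p. The third derivative also requires \(\text{ d}^3\pi / \text{ d }p^3 = m(m-1)(m-2)(a-\pi )/(1-p)^3\), obtained by differentiating \(\pi (p)\) once more: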

$$\begin{aligned} \frac{\text{ d}^2 l}{\text{ d }p^2}= & {} \frac{\text{ d}^2 l}{\text{ d }\pi ^2} \left( \frac{\text{ d }\pi }{\text{ d }p}\right) ^2 + \; \frac{\text{ d }\,l}{\text{ d }\pi }\, \frac{\text{ d}^2\pi }{\text{ d }p^2}, \\ \frac{\text{ d}^3 l}{\text{ d }p^3}= & {} \frac{\text{ d}^3 l}{\text{ d }\pi ^3} \left( \frac{\text{ d }\pi }{\text{ d }p}\right) ^3 + 3\, \frac{\text{ d}^2 l}{\text{ d }\pi ^2}\, \frac{\text{ d}^2 \pi }{\text{ d }p^2}\, \frac{\text{ d }\pi }{\text{ d }p} + \frac{\text{ d }\,l}{\text{ d }\pi }\, \frac{\text{ d}^3 \pi }{\text{ d }p^3}. \end{aligned}$$

We can therefore obtain the information I(p), using \(E[x] = n\pi \), as:

$$\begin{aligned} I(p)= & {} E\left[ -\frac{\text{ d}^2 l}{\text{ d }p^2}\right] \\= & {} E\left[ -\left( -\frac{x}{\pi ^2} - \frac{n-x}{(1-\pi )^2}\right) \left( \frac{m}{1-p}(a-\pi )\right) ^2 - \left( \frac{x}{\pi } - \frac{n-x}{1-\pi }\right) \left( -\frac{m(m-1)(a-\pi )}{(1-p)^2}\right) \right] \\= & {} \left( \frac{n}{\pi } + \frac{n}{1-\pi }\right) \left( \frac{m}{1-p}(a-\pi )\right) ^2\\= & {} \frac{n m^2 (a-\pi )^2}{(1-p)^2 \pi (1-\pi )}. \end{aligned}$$
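The expectation above can be checked by brute force. The sketch below, under the assumption that \(x\) is binomial with parameters \(n\) and \(\pi \), sums \(-\text{d}^2 l/\text{d}p^2\) over the binomial probabilities and compares the result with the closed form for I(p). All function names and numerical values are hypothetical.

```python
import math

def pi_of_p(p, m, a, r):
    """pi(p) = a - r(1-p)^m."""
    return a - r * (1.0 - p) ** m

def neg_d2l_dp2(p, x, n, m, a, r):
    """-d^2 l / dp^2 assembled with the chain rule from the pi-derivatives."""
    pi = pi_of_p(p, m, a, r)
    dpi = m * (a - pi) / (1.0 - p)
    d2pi = -m * (m - 1) * (a - pi) / (1.0 - p) ** 2
    dl = x / pi - (n - x) / (1.0 - pi)
    d2l = -x / pi**2 - (n - x) / (1.0 - pi) ** 2
    return -(d2l * dpi**2 + dl * d2pi)

def info_closed(p, n, m, a, r):
    """Closed-form information I(p) from the last line of the derivation."""
    pi = pi_of_p(p, m, a, r)
    return n * m**2 * (a - pi) ** 2 / ((1.0 - p) ** 2 * pi * (1.0 - pi))

def info_by_enumeration(p, n, m, a, r):
    """E[-d^2 l / dp^2], summing over x = 0..n with binomial weights."""
    pi = pi_of_p(p, m, a, r)
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * pi**x * (1.0 - pi) ** (n - x)
        total += pmf * neg_d2l_dp2(p, x, n, m, a, r)
    return total

# Hypothetical values chosen only for illustration.
p, n, m, a, r = 0.05, 20, 10, 0.95, 0.93
print(info_by_enumeration(p, n, m, a, r), info_closed(p, n, m, a, r))
```

Because \(-\text{d}^2 l/\text{d}p^2\) is linear in \(x\), the enumeration should match the closed form to rounding error.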

Computation of the bias (see Eq. 3) requires \(\text{ d }I(p) / \text{ d }p\) and \(E\left[ \text{ d}^3 l / \text{ d }p^3\right] \) in addition to I(p). The derivation of \(\text{ d }I(p) / \text{ d }p\) is more tedious than for perfect testing, because our expression for I(p) includes both p and \(\pi .\) We remedy this (i.e., put it in terms of \(\pi \) alone and p only implicitly) by writing

$$\begin{aligned} I(p)= & {} \frac{nm^2(a-\pi )^2}{(1-p)^2\pi (1-\pi )} = \frac{nm^2(a-\pi )^2}{\left( \frac{a-\pi }{r}\right) ^\frac{2}{m} \pi (1-\pi )}\,. \end{aligned}$$
(A.2)

Then

$$\begin{aligned} \frac{\text{ d }I(p)}{\text{ d }p}= & {} \frac{\text{ d }I(p)}{\text{ d }\pi } \frac{\text{ d }\pi }{\text{ d }p} \nonumber \\= & {} nm^2 \frac{\left( \frac{a-\pi }{r}\right) ^\frac{2}{m} \pi (1-\pi ) [-2(a-\pi )] - (a-\pi )^2\left[ \left( \frac{a-\pi }{r}\right) ^\frac{2}{m} (1-2\pi ) + \pi (1-\pi ) \frac{2}{m} \left( \frac{a-\pi }{r}\right) ^\frac{2-m}{m}\left( -\frac{1}{r}\right) \right] }{ \left( \frac{a-\pi }{r}\right) ^\frac{4}{m} \pi ^2 (1-\pi )^2} \nonumber \\&\times \left[ \frac{m}{1-p} (a - \pi )\right] \nonumber \\= & {} -\frac{nm^2(a-\pi )^2}{(1-p)^3\pi ^2(1-\pi )^2}\left[ 2(m-1)\pi (1-\pi ) + m(a-\pi )(1-2\pi )\right] \end{aligned}$$
(A.3)

noting that \(\frac{r}{a-\pi } = (1-p)^{-m}\) is useful during the simplification. The final quantity for computing the bias b(p) is

$$\begin{aligned} E\left[ \frac{\text{ d}^3 l}{\text{ d }p^3}\right]= & {} E\left[ \left( \frac{2x}{\pi ^3} - \frac{2(n-x)}{(1-\pi )^3}\right) \left( \frac{m}{1-p}(a-\pi )\right) ^3 \right. \\&+ 3\, \left( -\frac{x}{\pi ^2} - \frac{n-x}{(1-\pi )^2}\right) \left( \frac{m}{1-p}(a-\pi )\right) \left( -\frac{m(m-1)(a-\pi )}{(1-p)^2}\right) \\&+\left. \left( \frac{x}{\pi } - \frac{n-x}{1-\pi }\right) \left( \frac{m(m-1)(m-2)(a-\pi )}{(1-p)^3}\right) \right] \\= & {} \left( \frac{2n}{\pi ^2} - \frac{2n}{(1-\pi )^2}\right) \left( \frac{m}{1-p}(a-\pi )\right) ^3 + 3\,\left( \frac{n}{\pi } + \frac{n}{1-\pi }\right) \left( \frac{m^2(m-1)(a-\pi )^2}{(1-p)^3}\right) \\= & {} \frac{n m^2 (a-\pi )^2}{(1-p)^3\pi ^2(1-\pi )^2} \left[ 2m(1-2\pi )(a-\pi ) + 3(m-1)\pi (1-\pi )\right] . \end{aligned}$$

Further simplification results in the following expression for the numerator of the bias (see Eq. 3):

$$\begin{aligned} 2\frac{\text{ d }I(p)}{\text{ d }p} + E\left[ \frac{\text{ d}^3 l}{\text{ d }p^3}\right]= & {} -\frac{nm^2(m-1)(a-\pi )^2}{(1-p)^3\pi (1-\pi )}. \end{aligned}$$
(A.4)
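Eq. (A.4) can be checked numerically without any algebra. The sketch below approximates \(\text{d}I(p)/\text{d}p\) by a central finite difference, computes \(E[\text{d}^3 l/\text{d}p^3]\) by summing the chain-rule expression over binomial probabilities, and compares their combination with the closed form in (A.4). Function names and parameter values are hypothetical, and \(x\) is assumed binomial with parameters \(n\) and \(\pi \).

```python
import math

def pi_of_p(p, m, a, r):
    """pi(p) = a - r(1-p)^m."""
    return a - r * (1.0 - p) ** m

def info(p, n, m, a, r):
    """Information I(p) in its closed form."""
    pi = pi_of_p(p, m, a, r)
    return n * m**2 * (a - pi) ** 2 / ((1.0 - p) ** 2 * pi * (1.0 - pi))

def expected_d3l_dp3(p, n, m, a, r):
    """E[d^3 l / dp^3], summing the chain-rule expression over x = 0..n."""
    pi = pi_of_p(p, m, a, r)
    q = 1.0 - p
    dpi = m * (a - pi) / q
    d2pi = -m * (m - 1) * (a - pi) / q**2
    d3pi = m * (m - 1) * (m - 2) * (a - pi) / q**3
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * pi**x * (1.0 - pi) ** (n - x)
        dl = x / pi - (n - x) / (1.0 - pi)
        d2l = -x / pi**2 - (n - x) / (1.0 - pi) ** 2
        d3l = 2 * x / pi**3 - 2 * (n - x) / (1.0 - pi) ** 3
        total += pmf * (d3l * dpi**3 + 3 * d2l * d2pi * dpi + dl * d3pi)
    return total

def a4_closed(p, n, m, a, r):
    """Right-hand side of Eq. (A.4)."""
    pi = pi_of_p(p, m, a, r)
    return -n * m**2 * (m - 1) * (a - pi) ** 2 / ((1.0 - p) ** 3 * pi * (1.0 - pi))

# Hypothetical values chosen only for illustration.
p, n, m, a, r = 0.05, 20, 10, 0.95, 0.93
h = 1e-6
dI_dp = (info(p + h, n, m, a, r) - info(p - h, n, m, a, r)) / (2 * h)
lhs = 2 * dI_dp + expected_d3l_dp3(p, n, m, a, r)
rhs = a4_closed(p, n, m, a, r)
print(lhs, rhs)
```

Dividing (A.4) by \(2I(p)\) gives \(-(m-1)/\{2(1-p)\}\), which is convenient when a score adjustment of the form \(I(p)\,b(p)\) is needed.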

Appendix 2: R Code for Newton–Raphson Iteration to Find Firth’s Bias-Corrected Estimate of \({{\varvec{p}}}\)

[Figure a: R code listing; image not available.]
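The following Python sketch (not the authors' R code) illustrates one way such an iteration could look for equal pool sizes. It rests on several assumptions that this appendix does not state: that the bias takes the standard first-order form \(b(p) = -\{2\,\text{d}I/\text{d}p + E[\text{d}^3 l/\text{d}p^3]\}/\{2I(p)^2\}\), that the estimate solves Firth's adjusted score equation \(S(p) - I(p)b(p) = 0\), and that \(a - r< x/n < a\) so the MLE is interior and can serve as a starting value. With (A.1), (A.4) and the expression for I(p), the adjusted score reduces to \(S(p) - (m-1)/\{2(1-p)\}\). All function names are hypothetical.

```python
def pi_of_p(p, m, a, r):
    """pi(p) = a - r(1-p)^m."""
    return a - r * (1.0 - p) ** m

def modified_score(p, x, n, m, a, r):
    """Assumed adjusted score: S(p) from (A.1) minus (m-1)/(2(1-p))."""
    pi = pi_of_p(p, m, a, r)
    s = m * (a - pi) * (x - n * pi) / ((1.0 - p) * pi * (1.0 - pi))
    return s - (m - 1) / (2.0 * (1.0 - p))

def modified_score_deriv(p, x, n, m, a, r):
    """Derivative of the adjusted score: d^2 l/dp^2 - (m-1)/(2(1-p)^2)."""
    pi = pi_of_p(p, m, a, r)
    dpi = m * (a - pi) / (1.0 - p)
    d2pi = -m * (m - 1) * (a - pi) / (1.0 - p) ** 2
    dl = x / pi - (n - x) / (1.0 - pi)
    d2l = -x / pi**2 - (n - x) / (1.0 - pi) ** 2
    return d2l * dpi**2 + dl * d2pi - (m - 1) / (2.0 * (1.0 - p) ** 2)

def firth_estimate(x, n, m, a, r, tol=1e-12, max_iter=100):
    """Newton-Raphson on the adjusted score, started at the MLE."""
    if not (a - r < x / n < a):
        raise ValueError("sketch assumes a - r < x/n < a (interior MLE)")
    p = 1.0 - ((a - x / n) / r) ** (1.0 / m)  # MLE: solve pi(p) = x/n
    for _ in range(max_iter):
        step = modified_score(p, x, n, m, a, r) / modified_score_deriv(p, x, n, m, a, r)
        p_new = p - step
        while not (0.0 < p_new < 1.0):  # halve the step to stay inside (0, 1)
            step /= 2.0
            p_new = p - step
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical data chosen only for illustration.
x, n, m, a, r = 7, 20, 10, 0.95, 0.93
p_mle = 1.0 - ((a - x / n) / r) ** (1.0 / m)
p_firth = firth_estimate(x, n, m, a, r)
print(p_mle, p_firth)
```

In this example the adjusted estimate falls below the MLE, as expected when the MLE is biased upward. Unequal pool sizes and boundary cases such as \(x = 0\) are outside the scope of this sketch.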

Cite this article

Hepworth, G., Biggerstaff, B.J. Bias Correction in Estimating Proportions by Imperfect Pooled Testing. JABES 26, 90–104 (2021). https://doi.org/10.1007/s13253-020-00411-5
