1 Introduction

Pooling biospecimens (e.g., blood, urine, and swabs) to improve the efficiency of disease screening and monitoring has gained a great deal of popularity. Pooled testing was formally introduced by Dorfman (1943) in the context of screening American soldiers for syphilis. Dorfman suggested pooling a fixed number of individual blood specimens, conducting initial pool tests, and resolving the positive pools to identify diseased cases. Since then, this method, commonly known as two-stage hierarchical testing, has been extended to more advanced testing protocols (Quinn et al. 2000; Pilcher et al. 2005). Following Dorfman’s work, pooled testing has been used in applications for various infectious diseases, such as human immunodeficiency virus (HIV) and hepatitis B/C (Hourfar et al. 2008; Stramer et al. 2013), chlamydia and gonorrhea (Lindan et al. 2005), and influenza (Van et al. 2012). Pooled testing is also useful in genetics (Chi et al. 2009), animal disease testing (Dhand et al. 2010), and other fields. During the recent COVID-19 pandemic, pooling has received widespread attention because of the urgent need for rapid disease screening (Abdalhamid et al. 2020).

Statistical research in pooled testing has developed for both case identification and estimation. The aim in case identification research is to develop efficient testing protocols and assess the accuracy of classification (Kim et al. 2007). In estimation research, the primary goal is to estimate disease probabilities. Estimation can be performed either in a homogeneous population setting (Liu et al. 2012; Speybroeck et al. 2012; Ding and Xiong 2016; Haber et al. 2018; Nguyen et al. 2018) or in a regression context using individual covariates, such as age, race, sex, and symptoms (Xie 2001; Delaigle et al. 2014; Wang et al. 2014). Whether covariates are used or not, the probabilities can be estimated using only initial pooled results or a combination of pooled and individual retest results—often with a fraction of the tests required by the usual individual testing.

This article focuses on optimizing the disease prevalence estimates calculated from pooled data. We consider a general pooled testing scenario, where both pooled and subsequent retesting responses are observed from individuals or pools. Two features make modeling such data challenging. First, the observed pooled responses are correlated because an individual may be assigned to multiple pools across multiple stages. Second, the true disease statuses of all individuals are latent (i.e., unobservable) because of errors that occur during the diagnostic procedure. These challenges are commonly overcome by resorting to the “missing data” technique, where parameter estimation is accomplished using the expectation-maximization (EM) algorithm. However, evaluating the estimator’s efficiency and determining the optimal pool size are especially challenging because they require deriving the expected variance. Using the methods available in the literature, this is rarely possible for pooled data when retesting responses are involved.

Optimization of the estimator of a disease prevalence using pooled data has been investigated by many researchers, but most studies used only initial pool tests (Hughes-Oliver and Swallow 1994; Liu et al. 2012; Tu et al. 1995). The advantages of such simple pooling are that the testing cost can be lowered and the asymptotic efficiency results can be derived from the binomial distribution. However, this approach cannot leverage the wealth of retesting information naturally observed in public health applications. Zhang et al. (2020a) explored the benefits of using retesting data for estimator precision in two-stage hierarchical testing. Brookmeyer (1999) explored pooled testing performed in multiple hierarchical stages, but his approach is limited to scenarios where a perfect diagnostic assay is available; i.e., Brookmeyer’s method cannot account for misclassification errors (i.e., false negatives/positives). Furthermore, existing approaches cannot be used for other advanced protocols, such as array testing (Kim et al. 2007). That is, statistical methods that can use multistage pooled responses—potentially contaminated by misclassification errors—remain highly limited.

This article aims to address this research gap and provide methods that can be useful in designing pooled testing studies. We consider a general model framework, which can accommodate pooled data of any complexity, where the approach used for estimation is maximum likelihood. We assess the asymptotic efficiency and cost efficiency of the estimator and determine optimal pool sizes in the context of the surveillance application at the Louisiana Department of Health. It is worth noting that evaluating the optimality measures using analytical approaches is often challenging for multistage pooling methods. For such instances, we present a computation algorithm that can be widely used. Furthermore, because implementing such methods is non-trivial, we provide ready-to-use software tools using R (R Core Team 2023) for hierarchical and array testing, which can also be expanded for quality control (Hanson et al. 2006) and numerous other scenarios introduced during the COVID-19 pandemic (Daniel et al. 2021; Mutesa et al. 2021).

The subsequent sections are organized as follows. In Sect. 2, we summarize the screening results for four infectious diseases from the Louisiana Department of Health. In Sect. 3, we discuss different pooled testing protocols, assumptions, and the likelihood-based estimation framework. In Sect. 4, we assess the measures of efficiency and suggest optimal pooling configurations under three different constraints. In Sect. 5, we briefly describe our software tools. In Sect. 6, we conclude with a brief discussion. Additional information is provided in the electronic supplementary materials (Web Appendices A-D). Our software tools and code used to produce all results in this article have been uploaded as supplementary material.

2 Louisiana Infectious Disease Data

The Louisiana Department of Health (LDH) is a governing agency that oversees public health-related issues in the state of Louisiana. The LDH conducts screening and surveillance of various infectious diseases to monitor their prevalence, spread, and distribution. This includes regular testing, data collection, reporting, and analysis to identify outbreaks. In this article, we explore the practical aspects of pooled testing using LDH’s screening data to improve the screening and surveillance efficiency.

We obtained three separate datasets from the LDH that have test outcomes collected in the year 2021. The first dataset comprises HIV test results, while the second has outcomes for Neisseria gonorrhoeae (gonorrhea) and Chlamydia trachomatis (chlamydia). The third dataset consists of test results for SARS-CoV-2. The data collection procedure at LDH typically involves collecting specimens at different sites across the state, transporting them to designated laboratories, performing tests, and integrating the test outcomes into a central database for analysis.

The testing protocol adopted for these infections was the traditional one-at-a-time approach (i.e., individual testing). Both males and females were tested for HIV using blood serum samples. For chlamydia and gonorrhea, the testing was conducted on female subjects using urine specimens, while for SARS-CoV-2, both males and females were tested using nasopharyngeal swab samples. The assays used for HIV and SARS-CoV-2 were ARCHITECT HIV Ag/Ab Combo (AHAC) and Biofire Respiratory Panel 2.1 (BRP2.1), respectively, while the Aptima Combo 2 Assay (AC2A) was used for both chlamydia and gonorrhea. The sensitivity (\(S_e\)) and specificity (\(S_p\)) of these assays are reported in Table 1; these values can be found in the assay product literature available at www.fishersci.com, www.biofiredx.com, and www.hologic.com.

Table 1 Historical data and estimates from the LDH for infectious diseases

In this article, we treat the test outcomes as historical data. Table 1 provides a summary of the collected data, which consists of the number of individuals tested (i.e., sample size, N) and the count of positive cases identified. The table also presents an estimate of the true prevalence for each infection, denoted by p. These estimates have been calculated by adjusting for testing errors using the prevalence estimator formula in Eq. (1), with pool size 1 for this special scenario of individual testing. The prevalence of HIV is low, while the prevalences of gonorrhea and chlamydia are moderate, which makes them ideal for the implementation of pooled testing. The prevalence of SARS-CoV-2 is fairly high for pooled testing to be advantageous. This wide range of disease prevalence enables us to explore the effectiveness of pooled testing under different scenarios. We treat these estimates as the true values of p and use them in the methods described in Sects. 4.1, 4.2, 4.3.
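To make the adjustment concrete, the following is a minimal sketch in R of the error-adjusted estimator, i.e., Eq. (1) with pool size \(k=1\) (a Rogan–Gladen-type correction). The counts and assay accuracies below are illustrative placeholders, not the LDH values in Table 1.

```r
## Sketch: error-adjusted prevalence estimate, Eq. (1) with k = 1.
## All inputs are illustrative placeholders.
Se <- 0.99; Sp <- 0.99                # assay sensitivity and specificity
r  <- Se + Sp - 1
N  <- 10000; positives <- 150         # individuals tested; positive results
lambda <- positives / N               # apparent (test-based) prevalence
p.hat  <- (min(Se, max(1 - Sp, lambda)) - (1 - Sp)) / r
p.hat                                 # adjusted estimate of the true prevalence
```

Without the adjustment, the apparent prevalence \(\lambda\) would be reported directly; the correction removes the bias induced by false positives and false negatives.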

3 Preliminaries

Consider a public health scenario where N individuals are tested for a disease, such as HIV. Instead of conducting separate tests for each individual, we consider the use of pooled testing. Let p denote the disease prevalence, which is the proportion of individuals who truly have the disease, and let \(\widehat{p}\) denote the maximum likelihood estimator (MLE) of p. In this article, \(\widehat{p}\) is calculated from pooled testing data, and studying its precision is of primary interest.

Pooled testing can be performed in many ways, depending on the needs and context of an application. The simplest one involves assigning individual specimens to a fixed number of non-overlapping initial pools and conducting only the initial pooled tests (i.e., retesting is not performed). This is commonly referred to as “master pool testing” and is often used only for prevalence estimation purposes (Tu et al. 1995). When the goal is to identify positive cases, initial pools that test positive in stage 1 are resolved in the second stage by individual retesting or resolved in multiple stages in a hierarchical manner. Non-hierarchical methods, such as array testing and quality control-type testing, are also commonly used. The work presented in this article can accommodate both hierarchical and non-hierarchical data.

Suppose N individuals are tested using a pooling protocol, with a total number of T tests expended. Let \(Z_i\) denote the binary test response for pool i, where \(Z_i = 1\) if the ith pool is diagnosed as positive and \(Z_i = 0\) otherwise, for \(i=1, 2,..., T\). Denote by \(\textbf{Z}=(Z_1, Z_2,..., Z_T)'\) the vector of all test responses. Let \(S_e\) and \(S_p\) denote the assay sensitivity and specificity, respectively. We assume throughout that \(S_e\) and \(S_p\) do not depend on the pool size—a common and reasonable assumption when the pool size is not too large (Kim et al. 2007). We also assume \(S_e\) and \(S_p\) are known; in practice, they can be estimated from a pilot study or obtained from the assay product literature.

A few remarks are as follows. First, to simplify our notation, we use \(Z_i\) generically for any test response, whether it is from a pool, subpool, or individual (where the pool size is 1). Second, the test result \(Z_i\) is error-prone and potentially different from its true status, denoted by \(\widetilde{Z_i}\). As is common in the literature, our method accounts for test errors (false negatives and false positives) through the specification of the sensitivity and specificity of the assay, defined as \(S_e = \text {pr}(Z_i=1|\widetilde{Z_i}=1)\) and \(S_p=\text {pr}(Z_i=0|\widetilde{Z_i}=0)\). Third, rather than parameter estimation, our work focuses on a design framework in which the number of pooled tests needed for case identification cannot be determined prior to the study. Consequently, T, the number of expended tests in a pooling application, is best regarded as random when a multistage protocol, such as hierarchical or array testing, is used.

To illustrate maximum likelihood estimation from pooled testing data, consider master pool testing where n initial pools are tested; i.e., in this particular case, T is fixed and \(T=n\). Then \(X=\sum _{i=1}^n Z_i\), the number of positive pooled tests, follows a binomial distribution. In this case, using the standard statistical techniques of maximum likelihood and information theory, the MLE \(\widehat{p}\) and its large-sample variance can be derived in closed form as

$$\begin{aligned} \widehat{p}&= 1 - \left[ \frac{S_e - \text {min}\{S_e, \text {max}(1-S_p, \lambda )\}}{r}\right] ^{1/k} \end{aligned}$$
(1)
$$\begin{aligned} \text {var}(\widehat{p})&= \frac{\{S_e - r(1-p)^k \}\{1 - S_e + r(1-p)^k\}}{r^2 k^2 n (1-p)^{2(k-1)}}, \end{aligned}$$
(2)

where k is the initial pool size, \(r=S_e + S_p - 1\), and \(\lambda = X/n\); for more details, refer to Tu et al. (1995). Based on the variance in Eq. (2), the asymptotic properties of \(\widehat{p}\) have been broadly studied, but analogous results are sparse or unavailable for multistage pooling scenarios.
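As a quick illustration, the sketch below evaluates Eqs. (1) and (2) in R for master pool testing; the pool counts, pool size, and assay accuracies are illustrative placeholders.

```r
## Sketch: closed-form MLE (Eq. (1)) and large-sample variance (Eq. (2))
## for master pool testing; all inputs are illustrative.
mle.mpt <- function(X, n, k, Se, Sp) {
  r <- Se + Sp - 1
  lambda <- X / n                     # proportion of positive pools
  1 - ((Se - min(Se, max(1 - Sp, lambda))) / r)^(1 / k)
}
var.mpt <- function(p, n, k, Se, Sp) {
  r <- Se + Sp - 1
  (Se - r * (1 - p)^k) * (1 - Se + r * (1 - p)^k) /
    (r^2 * k^2 * n * (1 - p)^(2 * (k - 1)))
}
## Example: 30 of 100 pools of size 5 test positive
p.hat <- mle.mpt(X = 30, n = 100, k = 5, Se = 0.99, Sp = 0.99)
c(estimate = p.hat, std.error = sqrt(var.mpt(p.hat, 100, 5, 0.99, 0.99)))
```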

For a multistage protocol, such as hierarchical or array testing, the likelihood function is often intractable, which makes the computation of the MLE and its variance using observed-data likelihood difficult. These issues have been discussed in the literature for regression problems, and the EM algorithm has been introduced for estimation (Xie 2001; Zhang et al. 2013; Warasi 2023). In the absence of covariates, as is the case in this article, we outline the EM algorithm for the estimation of p and the variance estimation technique in Web Appendix A. It is worth noting that our objective is not parameter estimation; we instead focus on optimizing the estimation and cost efficiencies.

4 Efficiency

We study efficiency from three perspectives. We first consider minimizing \(E[(\widehat{p}-p)^2]\), the mean squared error of the MLE \(\widehat{p}\), to achieve the best precision in estimation. Our next consideration is to minimize the expected number of tests, E[T], focusing on improving the screening efficiency. The third criterion is to minimize the cost per unit information \(E[T(\widehat{p}-p)^2]\). This combines the consideration of both screening and estimation and is aimed at achieving precise estimation while reducing testing costs. To assess these criteria while comparing to individual testing, we study three relative measures of optimality: relative testing efficiency (RTE), relative estimation efficiency (REE), and relative cost efficiency (RCE), which are defined as

$$\begin{aligned} \text {RTE}(\widehat{p}_G, \widehat{p}_I)&= \frac{E_G[T]}{E_I[T]}\\ \text {REE}(\widehat{p}_G, \widehat{p}_I)&= \frac{E_G[(\widehat{p}-p)^2]}{E_I[(\widehat{p}-p)^2]} \\ \text {RCE}(\widehat{p}_G, \widehat{p}_I)&= \frac{E_G[T(\widehat{p}-p)^2]}{E_I[T(\widehat{p}-p)^2]}, \end{aligned}$$

where ‘G’ and ‘I’ stand for group/pooled testing and individual testing, respectively. When the relative value is 1, both pooled testing and individual testing offer the same efficiency. On the other hand, when the relative value is smaller than 1 (or greater than 1), pooled testing offers higher (or lower) efficiency when compared to individual testing. Thus, finding an optimal pooling strategy requires minimizing each relative measure as a function of the pool size(s). We do so under three constraints, as described in the following subsections.

4.1 Efficiency with a Fixed Number of Individuals

We first consider the setting where the number of individuals, N, tested in a study is fixed. For master pool testing, the number of expended tests \(T=N/k\) is also fixed, so \(E_G[T]=N/k\) and \(\text {RTE}(\widehat{p}_G, \widehat{p}_I) = 1/k\) because \(E_I[T]=N\) for individual testing; i.e., the expected number of tests decreases with the pool size k. However, with fewer test responses, the precision of the MLE \(\widehat{p}\) may be compromised. We explore this using the large-sample properties of the MLE. In that case, the MLE \(\widehat{p}\) is unbiased, and consequently, \(E_G[(\widehat{p}-p)^2]\) reduces to the variance expression in Eq. (2). The variance of \(\widehat{p}\) for individual testing can also be found from Eq. (2) by setting \(n=N\) and \(k=1\). Then the relative estimation efficiency is

$$\begin{aligned} \text {REE}(\widehat{p}_G, \widehat{p}_I) = \frac{\{S_e - r(1-p)^k\}\{1 - S_e + r(1-p)^k\}}{k(1-p)^{2(k-1)}(1-S_p+rp)(S_p - rp)}. \end{aligned}$$
(3)

As for cost efficiency, we find \(E_G[T(\widehat{p}-p)^2] = (N/k)\times \text {var}(\widehat{p}_G)\) which results in \(\text {RCE}(\widehat{p}_G, \widehat{p}_I)=(1/k)\times \text {REE}(\widehat{p}_G, \widehat{p}_I)\) and can be evaluated using Eq. (3).
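A short sketch in R of these calculations, evaluating Eq. (3) and \(\text {RCE}=(1/k)\times \text {REE}\) over a grid of pool sizes; the p, \(S_e\), and \(S_p\) values are illustrative stand-ins for the Table 1 entries.

```r
## Sketch: REE (Eq. (3)) and RCE = REE/k for MPT with N fixed;
## p, Se, Sp below are illustrative.
ree.mpt <- function(k, p, Se, Sp) {
  r <- Se + Sp - 1
  num <- (Se - r * (1 - p)^k) * (1 - Se + r * (1 - p)^k)
  den <- k * (1 - p)^(2 * (k - 1)) * (1 - Sp + r * p) * (Sp - r * p)
  num / den
}
k   <- 1:16
ree <- ree.mpt(k, p = 0.005, Se = 0.99, Sp = 0.99)
rce <- ree / k
c(k.opt.REE = k[which.min(ree)], k.opt.RCE = k[which.min(rce)])
```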

For multistage protocols, expressions of \(E_G(T)\) are provided in Kim et al. (2007) for a few particular cases (e.g., two-stage hierarchical and array protocols) but not for the general scenario. Whenever applicable, we use those expressions for the evaluation of \(\text {RTE}(\widehat{p}_G, \widehat{p}_I)\). For two-stage hierarchical testing, \(\text {REE}(\widehat{p}_G, \widehat{p}_I)\) can be derived using the Dorfman-type work in Zhang et al. (2020a). Unfortunately, for array testing, hierarchical testing with three or more stages, and other multistage protocols, no methods are available in the literature for the calculation of \(E_G[(\widehat{p}-p)^2]\). Evaluation of \(\text {RCE}(\widehat{p}_G, \widehat{p}_I)\) involves even further complexity because \(E_G[T(\widehat{p}-p)^2]\) requires deriving the joint distribution of \(\widehat{p}\) and T. To overcome these challenges, we provide a computation algorithm that enables us to numerically evaluate \(E_G[(\widehat{p}-p)^2]\), \(E_G[T]\), \(E_G[T(\widehat{p}-p)^2]\), or any complex quantities of interest.

To present a general computation technique, we let \(\textbf{Z}\) denote the collection of all test responses for a pooling protocol and let \(\widehat{p}\) denote the MLE of p calculated from this dataset. For complex pooling scenarios, we compute \(\widehat{p}\) using the EM algorithm. To approximate the expected Fisher information \(\mathcal {I}(p)\), we compute the observed Fisher information, denoted by I(p), using the missing data principle and Louis’s (1982) method; see Web Appendix A. It is worth mentioning that I(p) is a random quantity because it depends on the observed data \(\textbf{Z}\) and, thus, cannot be directly used for \(\mathcal {I}(p)\) when assessing efficiency. The computation algorithm is provided below.

1. Specify \(p=p_0\) as the true value, where \(p_0\) is an estimate from historical or pilot data.

2. Choose B, the number of repetitions of steps 2(a)-2(c). For each \(s=1,2,...,B\):

   (a) Simulate the pooled data \(\textbf{Z}^{(s)}\) for a given pooling protocol and record \(T^{(s)}\), the number of tests expended.

   (b) Compute the MLE \(\widehat{p}^{(s)}\) using the EM algorithm and find \(T^{(s)}(\widehat{p}^{(s)}-p)^2\).

   (c) Compute the observed Fisher information \(I(p)^{(s)}\) using Louis's method.

3. Calculate the sample means: \(\overline{I}=\frac{1}{B}\sum _{s=1}^B I(p)^{(s)}\), \(\overline{T}=\frac{1}{B}\sum _{s=1}^B T^{(s)}\), and \(\overline{V}=\frac{1}{B}\sum _{s=1}^B T^{(s)}(\widehat{p}^{(s)}-p)^2\).

This algorithm provides \(\overline{I}\), \(\overline{T}\), and \(\overline{V}\), which serve as approximations of \(\mathcal {I}(p)\), E[T], and \(E[T(\widehat{p}-p)^2]\), respectively, and then \(\text {REE}(\widehat{p}_G, \widehat{p}_I)\), \(\text {RTE}(\widehat{p}_G, \widehat{p}_I)\) and \(\text {RCE}(\widehat{p}_G, \widehat{p}_I)\) can be evaluated. When B is large, by the law of large numbers, these approximations are reasonable. Note that the EM algorithm and Louis’s method in steps 2(b)-2(c) involve implementing a Gibbs sampler to recover individual latent true statuses, where we use \(G=3000\) Gibbs iterates, which generally provides sufficient random draws to accurately estimate \(\widehat{p}\) and I(p); refer to Web Appendix A for more details. We explored convergence of the computation algorithm under different scenarios of the disease prevalence, p, and found that \(B=5000\) repetitions may be sufficient for convergence; see Web Appendix B. Therefore, we used \(B=5000\) repetitions throughout. Interested readers may also refer to Warasi et al. (2022), where a similar computation algorithm was proposed for a multiple-infection estimation problem of animal diseases.
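To make the algorithm concrete, the sketch below implements it in R for the simplest protocol, MPT, where the MLE is available in closed form via Eq. (1); in this special case, the EM algorithm and Louis's method of steps 2(b)-2(c) are unnecessary, so the sketch tracks the Monte Carlo mean squared error directly. The function name and inputs are illustrative. For a multistage protocol, step 2(b) would instead run the EM algorithm and step 2(c) Louis's method, as described in Web Appendix A.

```r
## Sketch of the computation algorithm, specialized to MPT so that the MLE
## has the closed form of Eq. (1); all names and inputs are illustrative.
mc.efficiency.mpt <- function(p0, Se, Sp, n, k, B = 5000) {
  r <- Se + Sp - 1
  T.mpt <- n                          # for MPT, T is fixed at n initial pools
  mse <- numeric(B)
  for (s in seq_len(B)) {
    ## Step 2(a): simulate pooled responses; a pool is truly positive
    ## when it contains at least one truly positive individual
    Ztil <- rbinom(n, 1, 1 - (1 - p0)^k)
    Z    <- rbinom(n, 1, ifelse(Ztil == 1, Se, 1 - Sp))
    ## Step 2(b): closed-form MLE from Eq. (1)
    lambda <- mean(Z)
    p.hat  <- 1 - ((Se - min(Se, max(1 - Sp, lambda))) / r)^(1 / k)
    mse[s] <- (p.hat - p0)^2
  }
  ## Step 3: sample means approximate E[(p.hat-p)^2], E[T], E[T(p.hat-p)^2]
  c(MSE = mean(mse), ET = T.mpt, ETV = T.mpt * mean(mse))
}

## Example with illustrative inputs
mc.efficiency.mpt(p0 = 0.05, Se = 0.99, Sp = 0.99, n = 100, k = 5)
```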

Now, we explore the efficiency of five commonly used protocols: master pool testing (MPT), hierarchical testing with two stages (H2) and three stages (H3), and array testing without a master pool test (A2) and with a master pool test (A2M). Our software tools can provide efficiency results for four-stage hierarchical testing (H4) as well, but we do not use it here because the prevalence rates in our LDH infectious disease data are too high for a four-stage protocol. As mentioned in Sect. 3, MPT uses only initial pools. In H2, initial positive pools are resolved by individual retesting in stage 2, while H3 uses an intermediate subpooling stage before resolving positive pools. For A2, individual specimens are first placed on the rows and columns of a square array, and then pooled samples comprising the specimens from each row/column are tested in stage 1. In stage 2, individual retesting is conducted for case identification based on the strategy described in Kim et al. (2007). Similarly, the A2M protocol implements array testing but conducts an initial screening test for all specimens of the rows and columns. Note that testing for H3 and H4 proceeds under the convention that the pool size in one stage is evenly divisible by the pool size used in the immediately following stage—an assumption broadly adopted in the pooled testing literature; see Kim et al. (2007).
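As a point of reference, for H2 the expected number of tests has a simple closed form (cf. Kim et al. 2007): each pool of size k requires one initial test, plus k individual retests whenever the pool tests positive, so \(\text {RTE}(\widehat{p}_G, \widehat{p}_I) = E_G[T]/N = 1/k + S_e - r(1-p)^k\). A minimal sketch with illustrative inputs:

```r
## Sketch: RTE for H2 (two-stage hierarchical testing) as a function of
## the pool size k; p, Se, Sp are illustrative.
rte.h2 <- function(k, p, Se, Sp) {
  r <- Se + Sp - 1
  ## P(initial pool tests positive) = Se - r(1 - p)^k
  1 / k + Se - r * (1 - p)^k
}
k <- 2:16
k[which.min(rte.h2(k, p = 0.05, Se = 0.99, Sp = 0.99))]  # optimal pool size
```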

Fig. 1 Relative efficiency results as a function of the initial pool size, k. Results are depicted for MPT (left), H2 (middle), and A2 (right). The top, middle, and bottom panels correspond to RTE, REE, and RCE, respectively. Note that, for the MPT protocol, the RTE results are identical for all four infections. For MPT, the REE values for chlamydia and SARS-CoV-2 in the left-middle subplot exceed the upper limit of the vertical axis because the prevalence is high for these two infections

We use \(k=16\) as the maximum initial pool size, as it is commonly used for screening infectious diseases (Dodd et al. 2002; Bilder and Tebbs 2012). Although our work can accommodate pools of any size, larger pools are generally avoided as a precaution against potential dilution effects. The relative measures are calculated using the LDH data prevalence estimates (p) and the assay sensitivity/specificity (\(S_e\) and \(S_p\)) from Table 1. These calculations are performed over all possible configurations of pool size and are depicted in Fig. 1. For convenience, Fig. 1 shows results for only MPT, H2, and A2, while the complete results are presented in the electronic supplementary materials. Note that the RTE for all protocols is calculated using the closed-form expressions in Kim et al. (2007), and the REE for H2 is calculated using the variance presented in Web Appendix A. For other multistage scenarios, REE and RCE are evaluated using the computation algorithm above.

For MPT, the RTE decreases with the initial pool size k, because \(\text {RTE}(\widehat{p}_G, \widehat{p}_I) = 1/k\) as discussed above. For the H2 protocol, RTE is minimized at pool sizes 9, 6, 4, and 3 for HIV, gonorrhea, chlamydia, and SARS-CoV-2, respectively. When estimation is of concern, interesting patterns are observed. First, when only initial pools are tested (i.e., MPT), there could be a substantial loss in estimation efficiency if the pool size is not carefully chosen, as depicted in the left-middle subplot of Fig. 1. Second, one might expect smaller pools to be better because they would generate more test responses. While this is generally true, in cases of very low prevalence, larger pools (i.e., fewer tests) may actually be more effective for precise estimation, as seen with HIV. For a multistage protocol, such as H2, estimation precision is not much affected by the pool size (the lines are fairly flat). Consequently, when the number of tests is taken into account, the cost-efficiency values are optimized at pool sizes that are identical or close to the optimal pool sizes based on RTE. For the A2 protocol, similar patterns are observed, but the optimal efficiencies occur at larger pool sizes. Refer to Web Appendix D, where we show the three best pooling configurations, which might be useful when selecting an optimal or suboptimal configuration depending on the goal of a surveillance study.

4.2 Efficiency with a Fixed Number of Tests

This section proceeds assuming that the number of tests is fixed, a scenario commonly encountered when the testing budget is limited. In this case, we use the MPT protocol where only initial pools are tested. We do not use a multistage protocol, such as H2, because the number of tests, T, required for case identification is random and depends on whether the initial pools test positive or not. In this section, T is fixed, but the pool size k may vary.

Under this constraint, \(\text {RTE}(\widehat{p}_G, \widehat{p}_I)=1\) and \(\text {RCE}(\widehat{p}_G, \widehat{p}_I)\) reduces to \(\text {REE}(\widehat{p}_G, \widehat{p}_I)\), because T is a constant and identical for both pooled and individual testing. After simple algebraic steps, we find that \(\text {REE}(\widehat{p}_G, \widehat{p}_I)\) is (1/k) times the expression shown in Eq. (3). This indicates that the estimation efficiency discussed in Sect. 4.1 can be increased by a factor of k. The \(\text {REE}(\widehat{p}_G, \widehat{p}_I)\) expression can be simplified using the approximations \((1-p)^k \approx 1-p k\) and \((1-p)^{2(k-1)} \approx 1-2p(k-1)\), and more theoretical insights can be extracted as a function of p, \(S_e\), \(S_p\), and k. Many of the theoretical properties have been studied in the literature; e.g., see Liu et al. (2012) and Tu et al. (1995). However, we limit our discussion to determining only the optimal pool size and developing software tools so clinicians can directly use the optimality methods in applications.
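For illustration, the fixed-T version of the REE can be evaluated directly, since it is simply (1/k) times Eq. (3); the inputs below are illustrative rather than the Table 1 values.

```r
## Sketch: REE when the number of tests T is fixed (Sect. 4.2), i.e.,
## (1/k) times Eq. (3); p, Se, Sp are illustrative.
ree.fixed.T <- function(k, p, Se, Sp) {
  r <- Se + Sp - 1
  num <- (Se - r * (1 - p)^k) * (1 - Se + r * (1 - p)^k)
  den <- k * (1 - p)^(2 * (k - 1)) * (1 - Sp + r * p) * (Sp - r * p)
  (num / den) / k                     # the extra 1/k reflects fixed T
}
k <- 2:35
ree <- ree.fixed.T(k, p = 0.05, Se = 0.99, Sp = 0.99)
c(k.opt = k[which.min(ree)], REE.min = min(ree))
```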

Table 2 Relative estimation efficiency (REE) when the number of expended tests, T, is fixed

Using the p, \(S_e\), and \(S_p\) values from Table 1, \(\text {REE}(\widehat{p}_G, \widehat{p}_I)\) is calculated and presented in Table 2. Here, the optimal pool size is much larger than that calculated under the constraint of a fixed number of individuals in Sect. 4.1. For gonorrhea, the minimum REE value is 0.0674, corresponding to a pool size of \(k=28\). This implies that the variance of the MLE, \(\widehat{p}\), under individual testing is approximately 15 times greater than under pooled testing with \(k=28\). However, increasing the pool size is not always better, as seen in Table 2. Another important aspect is that 28 times more individuals can be screened using pooled testing, when compared to individual testing. This can be useful in blood transfusion, animal testing, or other applications where a quick screening test is necessary. For diseases with larger prevalence, the optimal precision is achieved at smaller pool sizes, and the loss in estimation precision can be enormous if the pool size is not optimally determined. For example, for SARS-CoV-2, the optimal REE at \(k=6\) is 0.33, while the REE value at \(k=35\) is 783.67. For HIV, where the prevalence is very low, the minimum REE value is 0.0130, which occurs at \(k=102\). However, in practical applications, pools of much smaller size may be used. In such cases, the optimal pool size can be determined within an upper threshold, such as \(k=16\), as discussed in Sect. 4.1.

4.3 Efficiency with a Fixed Minimum Level of Precision

In this section, our goal is to determine the pooling configuration that allows us to maintain a minimum level of precision in estimation. To accomplish this, we set an upper bound on the mean squared error \(E_G[(\widehat{p}-p)^2]\) or, equivalently, the standard error. Unlike the first two constraints in Sects. 4.1 and 4.2, the constraint here allows both the pool size, k, and the number of tests, T, to vary over their possible values. As in Sect. 4.2, we use only the MPT protocol for this investigation.

Let \(E_0\) denote the maximum standard error allowed for the estimator \(\widehat{p}\) and let \(T^*\) denote the minimum number of tests required to achieve that precision. Then, imposing the constraint \(E_G[(\widehat{p}-p)^2] \le E_0^2\) and using the large-sample property of the MLE, we find

$$\begin{aligned} T^* = \frac{1}{E_0^2}\times \frac{\{S_e - r(1-p)^k\}\{1-S_e + r(1-p)^k\}}{r^2 k^2 (1-p)^{2(k-1)}} \end{aligned}$$

for a given pool size k; i.e., \(T^*\) is the smallest T that satisfies the above constraint. Note that the final result needs to be reported as \(\big \lceil T^*\big \rceil \), the smallest integer greater than or equal to \(T^*\) (i.e., the ceiling of \(T^*\)).
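The display above translates directly into a small R routine; the sketch below computes \(\big \lceil T^*\big \rceil \) over a grid of pool sizes with illustrative inputs.

```r
## Sketch: minimum number of MPT tests (ceiling of T*) needed so that the
## standard error of the MLE is at most E0; inputs are illustrative.
tests.needed <- function(k, p, Se, Sp, E0) {
  r <- Se + Sp - 1
  num <- (Se - r * (1 - p)^k) * (1 - Se + r * (1 - p)^k)
  den <- r^2 * k^2 * (1 - p)^(2 * (k - 1))
  ceiling(num / den / E0^2)
}
k <- 2:35
Tstar <- tests.needed(k, p = 0.05, Se = 0.99, Sp = 0.99, E0 = 0.005)
k[which.min(Tstar)]                   # pool size minimizing the test burden
```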

Table 3 Minimum number of tests, \(T^*\), required for a certain level of precision in estimation

At the prevalence, sensitivity, and specificity values in Table 1, we calculate \(T^*\) for four choices of standard error: \(E_0 = 0.005, 0.010, 0.015, 0.020\). The results, presented in Table 3 for pool sizes \(k = 2, 3,..., 35\), reveal interesting patterns. For gonorrhea, the number of tests decreases sharply as the pool size increases, but becomes stable at around \(k = 10\). However, increasing the pool size is not always beneficial; for instance, when \(E_0 = 0.005\), \(T^*\) reaches its optimal (i.e., minimum) value at \(k = 27\) but then begins to increase. Similar patterns are noticed for the other infections as well. A key observation is that as the prevalence p increases, the minimum number of tests required to reach a certain precision also increases. This can be explained as follows. If a pool comprises multiple positive individuals, which is roughly the case when \(p > 0.10\), a positive pooled response yields less estimation information than what would be generated by the usual one-at-a-time testing. With a larger p, this information loss further increases. Another observation is that with a smaller standard error (i.e., when a more precise estimate is desired), more tests are needed, and the optimal \(T^*\) is found at a larger pool size. However, caution is needed when pool dilution is a concern. In that case, the optimal configuration can be sought within a range of pool sizes, such as \(k=2, 3,..., 16\).

5 Software Package

Implementing optimization techniques in real-world applications is crucial for practitioners. To facilitate this, we offer two software tools: a package in R and an application written using the shiny package. While the package is developed for R users, the software application is designed for a broader audience. For any range of initial pool sizes, our software can provide the efficiency results discussed in Sects. 4.1, 4.2, 4.3.


In the R package, we provide the function mle.prop.eff, which can be used to evaluate the relative efficiencies (RTE, REE, and RCE) for six pooling protocols; an illustrative call is sketched below. Initial pool sizes can be specified through ‘initial.psz.’ Because the computation algorithm in Sect. 4.1 is computationally intensive (as it involves a Gibbs sampler), we developed compiled Fortran and C programs and integrated them into the R function to boost its computing power. For further efficiency gains, our software offers the option to use multiple processing cores through ‘ncore.’ For the H3 protocol with pool sizes 9, 3, and 1 in stages 1-3 (with \(B=5000\) repetitions, each having \(G=3000\) Gibbs iterates), the computing times were approximately 0.01, 20, and 33 minutes for RTE, REE, and RCE, respectively. These times were recorded when executed using a single core on a computer with an Intel Core i7-10750 processor @2.60GHz and 32GB of RAM. With multiple cores, the execution can be substantially faster; refer to Web Appendix C, where the function mle.prop.eff is briefly illustrated. Additional information and examples can be found in the documentation of the R package.
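The snippet below sketches a hypothetical call. Only the arguments ‘initial.psz’ and ‘ncore’ are referenced above; the remaining arguments would need to be taken from the package documentation, so the call is left schematic.

```r
## Hypothetical usage of mle.prop.eff from the groupTesting package.
## Consult help("mle.prop.eff") for the exact argument list; only
## 'initial.psz' and 'ncore' are named in the text above.
library(groupTesting)

## e.g., evaluate efficiency over initial pool sizes 2-16 using 4 cores:
# out <- mle.prop.eff(..., initial.psz = 2:16, ncore = 4)
```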

We have uploaded our software tools and code as supplementary material with this article, which can be used to reproduce all results presented. Furthermore, we provided the R function in the package groupTesting (Warasi 2024), which is available on the Comprehensive R Archive Network (CRAN), and uploaded the software application to https://mdwarasi.shinyapps.io/optimizeGT-app/.

6 Discussion

Pooled testing has been studied for decades and is recognized as an attractive alternative to individual testing. However, two challenges often limit its practical implementation. First, determining an optimal pooling configuration is not always possible using existing methods, except in simple pooling scenarios. Second, even where optimal methods exist, their implementation is not straightforward due to the complicated nature of pooled data and models. In this article, we have made an effort to unify numerous contributions in a common framework and presented a computation technique that is conceptually simple but can be broadly useful. We have also developed software tools for optimization in disease surveillance studies, focusing on their user-friendliness, computing efficiency, and portability.

An important assumption adopted in our methods is that the prevalence p can be reliably estimated from historical or pilot study data and can be regarded as the true value of p. This assumption, although somewhat restrictive, is not unreasonable because a plethora of disease screening data becomes available every year to clinicians and public health officials. However, if the uncertainty in the prior estimates of p is of major concern, more advanced methods, such as multistage adaptive pooling (Hughes-Oliver and Swallow 1994) or a Bayesian formulation (Atkinson et al. 1993), can be pursued in future work to make the optimization methods more robust. Another simplification in our work is that it focuses on the most natural and basic model structure. As a result, we did not address additional complexities, such as dilution effects (Zenios and Wein 1998), differential errors (Zhang et al. 2020b), and correlated data (Lendle et al. 2012), in our model framework. We conjecture that deriving analytical results for such advanced methods may rarely be possible. However, the computation algorithm outlined in this article can be extended to calculate the results numerically, as illustrated in Sect. 4.1. The software we provide is also easily expandable with straightforward effort.

We illustrated our work using infectious disease data from the LDH and demonstrated that major cost savings can be realized when pool sizes are chosen optimally. Similar benefits may be attained for other screenings in the LDH, including hepatitis B/C and influenza. In addition, our approach has the potential to be applied to infectious disease studies elsewhere, such as chlamydia/gonorrhea surveillance studies in Iowa (Tebbs et al. 2013) and rapid SARS-CoV-2 screening practices worldwide, particularly in resource-limited regions (Mutesa et al. 2021).