Nested Group Testing Procedure

Xiong, Wenjun; Ding, Juan; Zhang, Wei; Liu, Aiyi; Li, Qizhai

doi:10.1007/s40304-021-00269-0


Abstract

We investigate the false-negative, true-negative, false-positive, and true-positive predictive values of a general group testing procedure for a heterogeneous population. Compared with the individual testing procedure, a specimen's false (true)-negative predictive value under this procedure is larger (smaller), and its false (true)-positive predictive value is smaller (larger). The larger false-negative predictive value is undesirable. We then propose a nested group testing procedure. It keeps these favorable characteristics and also improves the false-negative predictive value of a specimen, so that it is no larger than that from individual testing. These characteristics are studied from both theoretical and numerical points of view. The nested group testing procedure outperforms individual testing on both false-positive and false-negative predictive values. It also retains efficiency, a basic characteristic of a group testing procedure. Applications to Dorfman’s, Halving and Sterrett procedures are discussed. Results from extensive simulation studies and an application to malaria infection in microscopy-negative Malawian women exemplify the findings.


1 Introduction

Screening for diseases, especially those with low prevalence, can be very costly and time-consuming. Group testing is a cost-effective strategy that has been widely used to identify diseased subjects in many fields, for example, genetics [10], infectious disease screening [9, 11, 26], the pharmaceutical industry [14], and agriculture [20]. The strategy was first used by Dorfman [6] to test pooled blood samples for the syphilis antigen during US Army recruitment. In Dorfman’s [6] procedure, the blood samples of subjects are pooled into groups before testing. If a group tests negative, all subjects in the group are declared negative. Otherwise, at least one subject in the group is presumed infected, and all subjects in the group are then retested individually to identify the diseased ones. When the disease prevalence is low, most groups test negative, so this procedure can greatly reduce cost compared with individual testing. It can also shorten the turnaround time for test results. Recently, group testing has been used for SARS-CoV-2 detection (Lagopati et al. [16]).

Since Dorfman’s [6] seminal work, a large body of work on group testing has appeared [12, 22, 23, 25]. When subject-level information is available, informative group testing procedures have been developed to further improve screening accuracy in terms of certain operating characteristics [2, 3, 21]. Common operating characteristics include efficiency (the expected number of tests per subject), pooling sensitivity and specificity, and positive and negative predictive values [18].

Predictive values are among the most important measures of a diagnostic test’s accuracy [4, 7], and they have also been used to evaluate the performance of group testing procedures [15, 19]. Four predictive values are usually investigated: the true-positive, false-positive, true-negative, and false-negative predictive values [15, 17]. The true (false)-positive predictive value is the probability that a subject who tests positive is truly diseased (disease-free). The true (false)-negative predictive value is the probability that a subject who tests negative is truly disease-free (diseased). The smaller the false-negative and false-positive predictive values, the better a group testing procedure. A low false-negative predictive value is particularly desirable for life-threatening diseases such as human immunodeficiency virus infection, where a missed diagnosis delays treatment with serious consequences. It is equally desirable for coronavirus disease, which spreads quickly among people.

In this work, we study the four predictive values of a general group testing procedure in which tests are conducted in multiple stages. Compared with the individual testing procedure, the group testing procedure has a larger false-negative predictive value and a smaller false-positive predictive value. Moreover, under certain conditions, the false-negative predictive value increases with the stage at which a specimen is declared negative. We therefore propose a nested group testing strategy that retests negative groups. With a suitably chosen retest group size, it yields false-negative and false-positive predictive values smaller than those of the individual testing procedure.

The remainder of the paper is organized as follows. In Sect. 2, we introduce the predictive values of a general group testing procedure and establish the comparisons with individual testing stated above. We then propose a nested group testing procedure that improves the false-negative predictive value. Applications of the new method to Dorfman’s, Halving and Sterrett procedures are discussed in Sect. 3. Extensive simulation studies and a real data analysis are presented in Sect. 4 to investigate the performance of the proposed method. Section 5 concludes. Technical details are provided in the Appendix.

2 Main Results

2.1 Notations

Consider a general multi-stage group testing procedure, denoted by ${\mathcal {A}}_{O}$ and shown in Fig. 1a, in which groups that test positive are successively split at random into subgroups for retesting. In the first stage, all available specimens are randomly divided into groups, and testing is conducted at the group level. In each subsequent stage, a group that tests negative is not split further, and all its members are declared negative. Groups that test positive are further split into subgroups and tested until all subgroups are declared negative or individual testing occurs. Assume that n specimens $X_1, \ldots , X_n$ are tested with ${\mathcal {A}}_{O}$ over $L(\ge 1)$ stages. Denote the probability that the sth specimen is truly diseased by $p_{s}$, $s=1,\ldots ,n$; these probabilities may differ among subjects. Suppose specimens are tested by an assay with sensitivity $S_e$ and specificity $S_p$, and assume that $S_e$ and $S_p$ do not depend on the group size.

Denote the test results of these n specimens by ${\mathcal {I}}_1^{(l_1)},\ldots ,{\mathcal {I}}_n^{(l_n)}$, each taking the value 0 if the specimen is declared negative and 1 otherwise. Denote the corresponding true disease statuses by $\widetilde{{\mathcal {I}}}_1,\ldots ,\widetilde{{\mathcal {I}}}_n$. Here $l_s\in \{1,\ldots ,L\}$ is the stage at which the disease status of the sth specimen is declared, $s=1,\ldots ,n$. For example, if $X_1$ is declared negative in the third stage, then $l_1=3$. For $s=1,\ldots , n$, the false-negative and true-negative predictive values are defined, respectively, as $\xi _{1,{\mathcal {A}}_{O}}(X_s)= \mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_s=1|{\mathcal {I}}^{(l_s)}_s=0\big )~\hbox {and}~\xi _{2,{\mathcal {A}}_{O}}(X_s)= \mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_s=0|{\mathcal {I}}^{(l_s)}_s=0\big ).$ Similarly, the false-positive and true-positive predictive values are $\eta _{1,{\mathcal {A}}_{O}}(X_s)= \mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_s=0|{\mathcal {I}}^{(l_s)}_s=1\big )~\hbox {and}~\eta _{2,{\mathcal {A}}_{O}}(X_s)= \mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_s=1|{\mathcal {I}}^{(l_s)}_s=1\big )$, respectively. Denote the true and tested statuses of the group containing $X_{s}$ in the jth stage by ${\widetilde{G}}^{(j)}(X_s)$ and $G^{(j)}(X_s)$, respectively, $j\le l_s$, $s=1,\ldots ,n$.

We use a toy example to illustrate the benefit of retesting. Consider Dorfman’s algorithm with a fixed master pool size, and suppose testing is performed in one of two ways. The first uses a group size of k and is denoted by ${\mathcal {A}}_D(k)$. The second uses a group size of 2k with each group tested twice and is denoted by ${\mathcal {A}}'_D(2k)$. The overall number of tests is then almost the same for ${\mathcal {A}}_D(k)$ and ${\mathcal {A}}'_D(2k)$. For a specimen X that tests negative in the first stage, the corresponding false-negative predictive values are $\xi _{1,{\mathcal {A}}_D(k)}(X) =\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}=1|G_{k}=0\big )$ and $\xi _{1,{\mathcal {A}}'_D(2k)}(X) =\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}=1|G_{1,2k}=0,G_{2,2k}=0\big )$, respectively. Here $G_{k}$ is the test result of the size-k group containing X, and $G_{1,2k}$ and $G_{2,2k}$ are the results of the two tests of the size-2k group containing X. Using Bayes’ rule, we obtain $\frac{\xi _{1,{\mathcal {A}}_D(k)}(X)}{\xi _{1,{\mathcal {A}}'_D(2k)}(X)}=1+\frac{S_p(S_e+S_p-1)(1-p)^k}{(1-S_e)[1-S_e+(S_e+S_p-1)(1-p)^k]}.$ The detailed derivation is given in Appendix A. This ratio is far larger than 1 when the sensitivity $S_e$ is close to 1. For example, with $S_e=S_p=0.99$, $p=0.01$ and $k=10,$ the ratio is 98.9. Hence, retesting could substantially reduce the false-negative predictive value while using almost the same number of tests as the ordinary Dorfman algorithm. This toy example shows the advantage of incorporating retesting into group testing. It motivates a thorough investigation of the properties of group testing algorithms with retesting.
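The figure of 98.9 can be reproduced in a few lines. The Python sketch below evaluates the closed-form ratio exactly as stated above. It also computes $\xi _{1,{\mathcal {A}}_D(k)}(X)$ for a homogeneous prevalence p, which is the stage-1 Dorfman formula of Sect. 3.1 with $\varphi =(1-p)^k$. The function names are illustrative, and the shorthand r = S_e + S_p - 1 appears only inside the code.

```python
def fnpv_dorfman_k(p, k, se, sp):
    """xi_{1,A_D(k)}(X) = Pr(I = 1 | G_k = 0) for a homogeneous group of size k."""
    r = se + sp - 1  # shorthand, as used in Sect. 3.1
    return (1 - se) * p / (1 - se + r * (1 - p) ** k)


def toy_ratio(p, k, se, sp):
    """Closed-form ratio xi_{1,A_D(k)}(X) / xi_{1,A'_D(2k)}(X) as stated in the text."""
    r = se + sp - 1
    q = (1 - p) ** k
    return 1 + sp * r * q / ((1 - se) * (1 - se + r * q))


ratio = toy_ratio(0.01, 10, 0.99, 0.99)
xi_k = fnpv_dorfman_k(0.01, 10, 0.99, 0.99)
print(round(ratio, 1))     # 98.9
print(xi_k, xi_k / ratio)  # xi for A_D(k) and the value the ratio implies for A'_D(2k)
```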

2.2 Predictive Values of ${\mathcal {A}}_{O}$

For some $s\in \{1,2,\ldots ,n\}$, suppose specimen $X_s$ is declared negative at stage $l_s$. Then every group containing $X_s$ must have tested positive at all previous stages, that is, $G^{(j)}(X_s)=1,j< l_{s}$ and $G^{(l_s)}(X_s)=0.$ On the other hand, if $X_s$ is declared positive at stage $l_s$, then the groups containing $X_s$ tested positive at every stage up to and including stage $l_s$, that is, $G^{(j)}(X_s)=1,j\le l_{s}.$ Therefore, the predictive values defined above are determined by the testing path a specimen follows until it is finally declared positive or negative. For $s=1,\ldots ,n$, the false-negative, true-negative, false-positive, and true-positive predictive values of ${\mathcal {A}}_{O}$ are, respectively,

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_{O}}(X_{s}) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{(j)}(X_s)=1,j< l_{s}, G^{(l_s)}(X_s)=0\right) , \\ \xi _{2,{\mathcal {A}}_{O}}(X_{s}) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=0|G^{(j)}(X_s)=1,j< l_{s}, G^{(l_s)}(X_s)=0\right) ,\\ \eta _{1,{\mathcal {A}}_{O}}(X_{s}) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=0|G^{(j)}(X_s)=1,j\le l_{s}\right) ,\\ \eta _{2,{\mathcal {A}}_{O}}(X_{s}) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{(j)}(X_s)=1,j\le l_{s}\right) . \end{array} \end{aligned}$$

If specimen $X_s$ is declared negative at the second stage, that is, $l_s=2$, then its false- and true-negative predictive values are $\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_{s}=1|G^{(1)}(X_s)=1, G^{(2)}(X_s)=0 \big )$ and $\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_{s}=0|G^{(1)}(X_s)=1,G^{(2)}(X_s)=0\big )$, respectively. Clearly, $\xi _{1,{\mathcal {A}}_{O}}(X_{s})+\xi _{2,{\mathcal {A}}_{O}}(X_{s})=1$. The expressions for $\eta _{1,{\mathcal {A}}_{O}}(X_{s})$ and $\eta _{2,{\mathcal {A}}_{O}}(X_{s})$ can be derived similarly. When $L>1$, a specimen can be declared positive only after the first stage, so the false- and true-positive predictive values are defined only for $l_s>1$. After some algebraic manipulation (see Appendix A), we have

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_{O}}\left( X_s\right) &{}=&{}\left( 1-S_e\right) S_e^{l_s-1}p_{s}/\phi \left( l_s,s\right) , \\ \xi _{2,{\mathcal {A}}_{O}}\left( X_{s}\right) &{}=&{} 1-\left( 1-S_e\right) S_e^{l_s-1}p_{s}/\phi \left( l_s,s\right) ,\\ \eta _{1,{\mathcal {A}}_{O}}\left( X_{s}\right) &{}=&{} 1-S^{l_s}_ep_{s}/\psi \left( l_s,s\right) ,\\ \eta _{2,{\mathcal {A}}_{O}}\left( X_{s}\right) &{}=&{} S^{l_s}_ep_{s}/\psi \left( l_s,s\right) , \end{array} \end{aligned}$$

where

$$\begin{aligned} \begin{array}{lll} \phi (l_s,s)&{}=&{}S_e^{l_s-1} (1-S_e) \left( 1-\varphi \left( G^{(l_s)}(X_s)\right) \right) +S_p(1-S_p)^{l_s-1}\varphi \left( G^{(1)}(X_s)\right) \\ &{}&{}+\sum \limits _{\tau =1}^{l_s-1}S_p S^{\tau }_e(1-S_p)^{l_s-1-\tau }\left( 1-\varphi \left( G^{(\tau )}(X_s)\backslash G^{(\tau +1)}(X_s)\right) \right) \varphi \left( G^{(\tau +1)}(X_s)\right) ,\\ \psi (l_s,s)&{}=&{}S_e^{l_s}\left( 1-\varphi \left( G^{(l_s)}(X_s)\right) \right) +(1-S_p)^{l_s}\varphi \left( G^{(1)}(X_s)\right) \\ &{}&{}+\sum \limits _{\tau =1}^{l_s-1} S_e^{\tau }(1-S_p)^{l_s-\tau } \left( 1-\varphi \left( G^{(\tau )}(X_s)\backslash G^{(\tau +1)}(X_s)\right) \right) \varphi \left( G^{(\tau +1)}(X_s)\right) ,\\ \end{array} \end{aligned}$$

with $\varphi (A)=\prod _{\{j:X_j\in A\}}(1-p_j)$.
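Once the chain of groups containing $X_s$ is known, the closed forms above can be evaluated mechanically. The Python sketch below is one way to do this and is not the authors’ code. It encodes $G^{(1)}(X_s),\ldots ,G^{(l_s)}(X_s)$ as a list of nested Python sets of specimen indices (an assumption of this sketch), and the helper names are hypothetical. The product form of $\phi$ and $\psi$ treats test outcomes as conditionally independent given the true statuses, and the sketch inherits that assumption.

```python
from math import prod


def varphi(members, p):
    """varphi(A): probability that every specimen indexed in A is truly negative."""
    return prod(1.0 - p[j] for j in members)


def phi(groups, p, se, sp):
    """phi(l_s, s) = Pr(G^(j) = 1 for j < l_s, G^(l_s) = 0).

    groups[0], ..., groups[l_s - 1] are G^(1), ..., G^(l_s), each a superset of the next.
    """
    l = len(groups)
    total = se ** (l - 1) * (1 - se) * (1 - varphi(groups[-1], p))
    total += sp * (1 - sp) ** (l - 1) * varphi(groups[0], p)
    for tau in range(1, l):
        outer, inner = groups[tau - 1], groups[tau]  # G^(tau), G^(tau+1)
        total += (sp * se ** tau * (1 - sp) ** (l - 1 - tau)
                  * (1 - varphi(outer - inner, p)) * varphi(inner, p))
    return total


def psi(groups, p, se, sp):
    """psi(l_s, s) = Pr(G^(j) = 1 for all j <= l_s)."""
    l = len(groups)
    total = se ** l * (1 - varphi(groups[-1], p))
    total += (1 - sp) ** l * varphi(groups[0], p)
    for tau in range(1, l):
        outer, inner = groups[tau - 1], groups[tau]
        total += (se ** tau * (1 - sp) ** (l - tau)
                  * (1 - varphi(outer - inner, p)) * varphi(inner, p))
    return total


def xi1(s, groups, p, se, sp):
    """False-negative predictive value of X_s declared negative at stage len(groups)."""
    return (1 - se) * se ** (len(groups) - 1) * p[s] / phi(groups, p, se, sp)


def eta1(s, groups, p, se, sp):
    """False-positive predictive value of X_s declared positive at stage len(groups)."""
    return 1 - se ** len(groups) * p[s] / psi(groups, p, se, sp)


p = [0.01] * 10
dorfman = [set(range(10)), {0}]   # X_0 declared at stage 2 of Dorfman's procedure
individual = [{0}]                # L = 1: individual testing
print(xi1(0, dorfman, p, 0.95, 0.95) > xi1(0, individual, p, 0.95, 0.95))    # True
print(eta1(0, dorfman, p, 0.95, 0.95) < eta1(0, individual, p, 0.95, 0.95))  # True
```

For this configuration, the two printed comparisons agree with Theorem 2.1. As a further check, take `p = [0.005] * 80`, the chain `[set(range(80)), set(range(40)), {0}]` and $S_e=S_p=0.99$. Then `phi` evaluates to about 0.173, the stage-3 Halving probability reported in Sect. 3.1.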

When $L=1$, the procedure ${\mathcal {A}}_{O}$ reduces to the individual testing procedure, denoted by ${\mathcal {A}}_I.$ The four predictive values of individual testing for $s\in \{1,2,\ldots ,n\}$ are then

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_I}\left( X_{s}\right) &{}=&{} \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|{\mathcal {I}}^{\left( l_s\right) }_{s}=0\right) =\frac{\left( 1-S_e\right) p_{s}}{S_p\left( 1-p_{s}\right) +\left( 1-S_e\right) p_{s}}, \\ \xi _{2,{\mathcal {A}}_I}\left( X_{s}\right) &{}=&{} \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=0|{\mathcal {I}}^{\left( l_s\right) }_{s}=0\right) =\frac{S_p\left( 1-p_{s}\right) }{S_p\left( 1-p_{s}\right) +\left( 1-S_e\right) p_{s}},\\ \eta _{1,{\mathcal {A}}_I}\left( X_{s}\right) &{}=&{} \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=0|{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) =\frac{\left( 1-S_p\right) \left( 1-p_{s}\right) }{S_ep_{s}+\left( 1-S_p\right) \left( 1-p_{s}\right) },\\ \eta _{2,{\mathcal {A}}_I}\left( X_{s}\right) &{}=&{} \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) =\frac{S_ep_{s}}{S_ep_{s}+\left( 1-S_p\right) \left( 1-p_{s}\right) }. \end{array} \end{aligned}$$

2.3 Predictive Value Comparison Between ${\mathcal {A}}_O$ and ${\mathcal {A}}_I$

Predictive values are important in real applications. A good group testing strategy should have lower false-negative and false-positive predictive values than the individual testing procedure. Intuitively, ${\mathcal {A}}_{O}$ should yield a better false-positive predictive value than ${\mathcal {A}}_I$, because a specimen declared positive is tested more than once when $L>1$. However, its false-negative predictive value is expected to be larger, because the groups containing the specimen tested positive at the previous stages. Theorem 2.1 below confirms these intuitions; its proof is given in Appendix B.

Theorem 2.1

Suppose a specimen $X_{s}$ is tested using ${\mathcal {A}}_{O}$ and ${\mathcal {A}}_I$ respectively, $s=1,\ldots ,n$, then

$$\begin{aligned} \begin{aligned}&\xi _{1,{\mathcal {A}}_{O}}\left( X_{s}\right)>\xi _{1,{\mathcal {A}}_I}\left( X_{s}\right) ,~ \xi _{2,{\mathcal {A}}_{O}}\left( X_{s}\right)<\xi _{2,{\mathcal {A}}_I}\left( X_{s}\right) ,\\&\eta _{1,{\mathcal {A}}_{O}}\left( X_{s}\right) <\eta _{1,{\mathcal {A}}_I}\left( X_{s}\right) ,~ \eta _{2,{\mathcal {A}}_{O}}\left( X_{s}\right) >\eta _{2,{\mathcal {A}}_I}\left( X_{s}\right) . \end{aligned} \end{aligned}$$

The values $\xi _{1,{\mathcal {A}}_{O}}(X_{s})$ and $\xi _{2,{\mathcal {A}}_{O}}(X_{s})$ depend on the stage at which a specimen is declared negative. Intuitively, specimens declared negative at different stages should have different false-negative predictive values. Theorem 2.2 below, whose proof is given in Appendix C, shows that under certain conditions the false-negative predictive value of a specimen increases with the stage at which it is declared negative. Let $\xi _{1,{\mathcal {A}}_{O}}(X_{s}|l_s=l)$ denote the false-negative predictive value of a specimen $X_s$ declared negative in the lth stage, $l\le L$. To simplify notation, denote $V_{s,z}=\mathrm{Pr}\Big ({\widetilde{W}}_{G^{(z)}(X_s)\backslash G^{(z+1)}(X_s)}=0\Big )=\prod \limits _{\{j:X_j\in G^{(z)}(X_s)\backslash G^{(z+1)}(X_s)\}}(1-p_j),$ where ${\widetilde{W}}_{A\backslash B}$ denotes the true disease status of the specimens in $A\backslash B$, the difference between two groups A and B containing $X_s$. This status equals 0 if all of these specimens are disease-free, and $z\le L$.

Theorem 2.2

Suppose a specimen $X_s$ is declared negative in the lth stage using ${\mathcal {A}}_{O}$. Then the following relationships hold:

(1)
$\xi _{1,{\mathcal {A}}_{O}}(X_{s}|l_s=1)<\xi _{1,{\mathcal {A}}_{O}}(X_{s}|l_s=2)$ for $l=1$,
(2)
$\xi _{1,{\mathcal {A}}_{O}}(X_{s}|l_s=l)<\xi _{1,{\mathcal {A}}_{O}}(X_{s}|l_s=l+1)$ if $S_pV_{s,l}(1-V_{s,l-1})>S_e(1-V_{s,l})$ for $l>1$.

Group testing is usually applied to rare diseases, so the condition in result (2) of Theorem 2.2 is frequently satisfied. For a homogeneous population, for example, this condition reduces to $(S_p/S_e) V_{s,l+1}(1+V_{s,l+1}+\ldots +V_{s,l+1}^{\lfloor a_{l}\rfloor -1})>1$ and $a_{l}=(k_{l-1}-k_{l})/(k_{l}-k_{l+1})\ge 2$. Here $V_{s,l+1}=(1-p)^{k_l-k_{l+1}}$, $k_{l}$ is the group size in the lth stage, and $\lfloor a \rfloor $ is the largest integer less than or equal to a. This condition is easily satisfied, for example, if $S_p\ge 0.8$ and $(1-p)^{k_2}\ge 0.75$.
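The homogeneous-population condition can be checked numerically. In the Python sketch below, the stage group sizes $k_1, k_2, \ldots$ are passed as a list `k`, and the stage index `l` is 1-based with $l\ge 2$. This encoding and the function name are illustrative assumptions. A return value of `False` when $a_l<2$ means only that this sufficient condition does not apply.

```python
from math import floor


def reduced_condition_holds(se, sp, p, k, l):
    """Homogeneous-population condition stated after Theorem 2.2 (a sketch).

    k = [k_1, k_2, ...] are the stage group sizes; l >= 2 is a 1-based stage index.
    """
    k_prev, k_cur, k_next = k[l - 2], k[l - 1], k[l]   # k_{l-1}, k_l, k_{l+1}
    a = (k_prev - k_cur) / (k_cur - k_next)            # a_l
    if a < 2:
        return False                                   # the stated reduction needs a_l >= 2
    v = (1 - p) ** (k_cur - k_next)                    # V_{s,l+1}
    geometric = sum(v ** i for i in range(floor(a)))   # 1 + v + ... + v^(floor(a_l) - 1)
    return sp / se * v * geometric > 1


print(reduced_condition_holds(0.99, 0.99, 0.01, [40, 20, 10, 1], 2))   # True
print(reduced_condition_holds(0.99, 0.99, 0.005, [80, 40, 1], 2))      # False
```

In the second call (three-stage halving with sizes 80, 40, 1), $a_2=40/39<2$, so the reduced form is not available at that stage.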

2.4 Nested Group Testing Procedure

A good group testing procedure is expected to have low false-negative and false-positive predictive values and high true-negative and true-positive predictive values. As shown in Theorem 2.1, ${\mathcal {A}}_{O}$ yields higher false-negative predictive values than ${\mathcal {A}}_I$. To improve this, we propose a nested group testing procedure based on ${\mathcal {A}}_{O}$ in which negative groups are retested at each stage. Specifically, all specimens in the groups that test negative at stage $l_s$ are randomly split into groups for retesting using Dorfman’s procedure (${\mathcal {A}}_D$). We call this the nested group testing procedure and denote it by ${\mathcal {A}}_N$; see Fig. 1b for an illustration.

Denote the false-negative predictive value of the specimen $X_{s}$ under ${\mathcal {A}}_N$ by $\xi _{1,{\mathcal {A}}_{N}}(X_{s};k)$, which is expressed as $\xi _{1,{\mathcal {A}}_{N}}(X_{s};k)=\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}_{s}=1|G^{(l_{{s}})}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0\big ),s\in \{1,\ldots ,n\},$ where $B_s(k)$ denotes the retesting group containing $X_{s}$ and k is the retesting group size. Theorem 2.3 below, whose proof is given in Appendix D, shows that with a sufficiently small retest group size, ${\mathcal {A}}_N$ yields a false-negative predictive value no larger than that of ${\mathcal {A}}_I$.

Theorem 2.3

If $(1-S_e)/S_p\le \mathrm{Pr}\big ({\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=1\big )$, then there exists a group size $k^{(l_s)}_{s}$ such that, for any $k_1 \le k^{(l_s)}_{s}$ and $k_2 > k^{(l_s)}_{s}$ the false-negative predictive value of a specimen $X_s$ using nested group testing procedure ${\mathcal {A}}_{N}$ satisfies $\xi _{1,{\mathcal {A}}_{N}}(X_{s};k_1)\le \xi _{1,{\mathcal {A}}_I}(X_{s}),$ and $ \xi _{1,{\mathcal {A}}_{N}}(X_{s};k_2)> \xi _{1,{\mathcal {A}}_I}(X_{s}),s\in \{1,\ldots ,n\}.$

Under the proposed procedure, a retest group size larger than $k_{s}^{(l_s)}$ makes the false-negative predictive value of $X_s$ larger than that from the individual testing procedure; otherwise, it is no larger. Consider a homogeneous population, in which all specimens have the same probability of being positive (that is, $p_s=p$ for all $s\in \{1,\ldots ,n\}$). There, the condition in Theorem 2.3 reduces to $(1-S_e)/S_p\le 1-(1-p)^{k_{l_s-1}-k_{l_s}}.$ The upper bound $k_{s}^{(l_s)}$ on the retest group size may differ across specimens. The nested group testing procedure therefore splits the specimens in the negative groups according to the order of their bounds. Specifically, at each stage, we calculate $k_{s}^{(l_s)}$ for all specimens in the negative groups and sort them in ascending order; the specimens are then split into retest groups based on the ordered bounds.
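The text does not specify exactly how the sorted specimens are grouped, so the Python sketch below uses one hypothetical rule. After sorting by bound, each retest group takes the smallest remaining bound as its size, which guarantees that no member’s $k_{s}^{(l_s)}$ is exceeded. The optional cap `k_max` mirrors the maximum tolerated group size $k_{\max }$ used in Sect. 3.1. The function and parameter names are illustrative.

```python
def split_by_bounds(bounds, k_max=None):
    """Partition negative-group specimens into retest groups (hypothetical rule).

    bounds maps a specimen id to its upper-bounded retest size k_s^{(l_s)}.
    """
    order = sorted(bounds, key=bounds.get)   # ascending order of bounds
    groups, i = [], 0
    while i < len(order):
        size = max(1, bounds[order[i]])      # smallest remaining bound caps this group
        if k_max is not None:
            size = min(size, k_max)
        groups.append(order[i:i + size])     # later members have bounds >= size
        i += size
    return groups


print(split_by_bounds({"a": 3, "b": 2, "c": 5, "d": 4, "e": 6}))
# [['b', 'a'], ['d', 'c', 'e']]
```

Taking the smallest bound first is conservative. It may produce more retest groups than strictly necessary, but it never places a specimen in a group larger than its bound.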

For convenience, the group size $k_{s}^{(l_s)}$ can be used directly for retesting. In this way, the false-negative predictive values of the nested group testing procedure ${\mathcal {A}}_{N}$ are reduced and more stable across stages. As Theorem 2.2 indicates, the false-negative predictive value of ${\mathcal {A}}_{O}$ can be far larger than that of ${\mathcal {A}}_{I}$, especially at later stages. Because the retest group size $k_{s}^{(l_s)}$ is chosen with individual testing as the target, the false-negative predictive value of ${\mathcal {A}}_{N}$ is close to that of ${\mathcal {A}}_{I}$. At the same time, the efficiency of group testing is maintained. In other words, ${\mathcal {A}}_{N}$ is more stable than ${\mathcal {A}}_{O}.$ To measure this stability, let ${\bar{\xi }}_{1,{\mathcal {A}}_N}=\frac{1}{n}\sum _{s=1}^n\xi _{1,{\mathcal {A}}_N}(X_{s})$, and define

$$\begin{aligned} \varDelta _{{\mathcal {A}}_N}= \frac{1}{n}\sum _{s=1}^n\left( \xi _{1,{\mathcal {A}}_N}(X_{s})-{\bar{\xi }}_{1,{\mathcal {A}}_N}\right) ^2. \end{aligned}$$

For individual testing in a homogeneous population, $\varDelta _{{\mathcal {A}}_I}=0$. The measurement $\varDelta _{{\mathcal {A}}_N}$ quantifies how much the false-negative predictive values differ across specimens; we call it the false-negative (FN)-alike measurement. In the next section, we show in detail that the nested group testing procedure has a smaller FN-alike measurement than the original group testing procedure ${\mathcal {A}}_O.$ In other words, the nested group testing procedure has more stable false-negative predictive values across stages.
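The FN-alike measurement is the population variance (denominator n) of the specimens’ false-negative predictive values. The following is a minimal Python sketch with a hypothetical function name. The example list is made up purely for illustration and does not come from the paper’s tables.

```python
from statistics import fmean, pvariance


def fn_alike(xi_values):
    """FN-alike measurement: mean squared deviation of xi_1 values around their average."""
    xbar = fmean(xi_values)
    return sum((x - xbar) ** 2 for x in xi_values) / len(xi_values)


xi = [0.8e-3, 0.9e-3, 1.2e-3, 2.0e-3]   # made-up xi_1 values for four specimens
print(fn_alike(xi), pvariance(xi))      # the two agree up to rounding
print(fn_alike([0.886e-3] * 4))         # equal values, as for A_I in a homogeneous population: 0.0
```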

3 Applications to the Existing Group Testing Procedures

Dorfman’s [6] group testing strategy has received considerable attention since its introduction, resulting in various extensions and improvements. Here we focus on hierarchical group testing algorithms, including the two-stage Dorfman procedure, the three-stage Halving procedure and the one-step Sterrett procedure [13, 24].

3.1 Dorfman’s and Halving Procedure

To investigate the performance of the proposed method, we apply it to two common group testing procedures. These are Dorfman’s procedure (${\mathcal {A}}_{D}$) with $L=2$ (see Fig. 2a) and the Halving procedure (${\mathcal {A}}_{H}$) with $L=3$ (see Fig. 2c) [1, 3, 5]. We also construct nested versions of these two procedures (see Fig. 2b, d).

A specimen can be declared negative in stage 1 or 2 of Dorfman’s procedure (Fig. 2a), and in stage 1, 2, or 3 of the three-stage Halving procedure (Fig. 2c). The false-negative predictive values for these two procedures are as follows.

Dorfman’s procedure:

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_{D}}\left( X_{s}|l_s=1\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =0\right) = \frac{\left( 1-S_e\right) p_{s}}{1-S_e+r\varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) }, \\ \xi _{1,{\mathcal {A}}_{D}}\left( X_{s}|l_s=2\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =1,G^{\left( 2\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\left( 1-S_e\right) S_ep_{s}}{S_e\left( 1-S_e\right) +rS_e\left( 1-p_{s}\right) -rS_p\varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) }.\\ \end{array} \end{aligned}$$

Halving procedure:

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_{H}}\left( X_{s}|l_s=1\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =0\right) = \frac{\left( 1-S_e\right) p_{s}}{1-S_e+r\varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) },\\ \xi _{1,{\mathcal {A}}_{H}}\left( X_{s}|l_s=2\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =1,G^{\left( 2\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\left( 1-S_e\right) S_ep_{s}}{S_e\left( 1-S_e\right) +rS_e\varphi \left( G^{\left( 2\right) }\left( X_s\right) \right) -rS_p\varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) },\\ \xi _{1,{\mathcal {A}}_{H}}\left( X_{s}|l_s=3\right) &{}=&{} \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =1,G^{\left( 2\right) }\left( X_s\right) =1,G^{\left( 3\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\left( 1-S_e\right) S_e^2p_{s}}{S_e^2\left( 1-S_e\right) +rS_e^2\varphi \left( G^{\left( 3\right) }\left( X_s\right) \right) -rS_eS_p\varphi \left( G^{\left( 2\right) }\left( X_s\right) \right) -rS_p\left( 1-S_p\right) \varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) }, \end{array} \end{aligned}$$

where $\varphi (A)=\prod _{\{j:X_j\in A\}}(1-p_j)$ and $r=S_e+S_p-1$. The detailed derivations are given in Appendix A.
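For a homogeneous population ($p_s=p$, so that $\varphi (A)=(1-p)^{|A|}$), the Dorfman and Halving formulas above reduce to simple functions of $p$, $k_1$, $S_e$ and $S_p$. The Python sketch below evaluates them. It assumes, as in the three-stage Halving procedure of Fig. 2c, that the stage sizes are $k_1$, $k_1/2$ and 1. The function names are illustrative.

```python
def xi_dorfman(p, k1, se, sp, stage):
    """Homogeneous-population xi_1 for Dorfman's procedure with group size k1."""
    r = se + sp - 1
    phi1 = (1 - p) ** k1
    if stage == 1:
        return (1 - se) * p / (1 - se + r * phi1)
    return (1 - se) * se * p / (se * (1 - se) + r * se * (1 - p) - r * sp * phi1)


def xi_halving(p, k1, se, sp, stage):
    """Homogeneous-population xi_1 for three-stage halving (stage sizes k1, k1/2, 1)."""
    r = se + sp - 1
    phi1, phi2, phi3 = (1 - p) ** k1, (1 - p) ** (k1 // 2), 1 - p
    if stage == 1:
        return (1 - se) * p / (1 - se + r * phi1)
    if stage == 2:
        return (1 - se) * se * p / (se * (1 - se) + r * se * phi2 - r * sp * phi1)
    return ((1 - se) * se ** 2 * p
            / (se ** 2 * (1 - se) + r * se ** 2 * phi3
               - r * se * sp * phi2 - r * sp * (1 - sp) * phi1))


def xi_individual(p, se, sp):
    """Individual-testing false-negative predictive value."""
    return (1 - se) * p / (sp * (1 - p) + (1 - se) * p)


args = (0.005, 80, 0.85, 0.85)
for label, value in [("D1", xi_dorfman(*args, 1)), ("D2", xi_dorfman(*args, 2)),
                     ("H2", xi_halving(*args, 2)), ("A_I", xi_individual(0.005, 0.85, 0.85))]:
    print(label, f"{value * 1e3:.3f}e-3")
```

With $p=0.005$, $k_1=80$ and $S_e=S_p=0.85$, the loop prints 1.212e-3, 1.985e-3, 2.952e-3 and 0.886e-3. These agree with the values of $D_1$, $D_2$, $H_2$ and ${\mathcal {A}}_I$ quoted in the discussion of Table 1.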

We set the prevalence to $p=0.005$, 0.01 and 0.03 and the initial group size to $k_1=80$, 40 and 20. The sensitivity and specificity $(S_e,S_p)$ are set to (0.8,0.8), (0.85,0.85), (0.9,0.9), (0.95,0.95), and (0.99,0.99). Table 1 presents the false-negative predictive values of Dorfman’s procedure, the Halving procedure and their corresponding nested procedures, together with the results for ${\mathcal {A}}_I$. The notation is as follows. Denote the lth stage of Dorfman’s procedure and of the corresponding nested group testing procedure by $D_{l}$ and $nD_{l}$, respectively. Likewise, denote the lth stage of the three-stage Halving procedure and of the corresponding nested group testing procedure by $H_{l}$ and $nH_{l}$, respectively. Since $D_1$ and $H_1$ are the same, we omit the results of $H_1$. Denote the retest group size in the $l^{th}$ stage by $k^{(l)}$. Suppose a maximum tolerated group size is imposed, say, $k_{\max }=100.$ Then the group size of the Dorfman procedure implemented within our nested group testing procedure is $k_*^{(l)}=\min \{k^{(l)},k_{\max }\}$.

Table 1 The false-negative predictive values of $D_1$ and $D_2$ (the first and second stages of Dorfman’s procedure), $nD_1$ and $nD_2$ (the nested Dorfman procedure), $H_2$ and $H_3$ (the second and third stages of the Halving procedure), and $nH_2$ and $nH_3$ (the nested Halving procedure)


As expected, the false-negative predictive values of ${\mathcal {A}}_I$ are lower than those of $D_1$, $D_2$, $H_2$ and $H_3$. For example, when $p=0.005$, $k_1=80$ and $S_e=S_p=0.85$, the false-negative predictive values of $D_1$, $D_2$, $H_2$, and $H_3$ are $1.212\times 10^{-3}$, $1.985\times 10^{-3}$, $2.952\times 10^{-3}$ and $4.329\times 10^{-3}$, respectively. All are far larger than the value $0.886\times 10^{-3}$ of ${\mathcal {A}}_I$. In contrast, with the nested group testing procedure and group sizes $k_*^{(l)}=100, 40, 35$ and 7, these values become $0.248\times 10^{-3}$, $0.522\times 10^{-3}$, $0.565\times 10^{-3}$, and $0.78\times 10^{-3}$, respectively. The false-negative predictive values are thus greatly reduced.

Next, we examine the probability that a specimen is declared negative at stage $l_s$ with $l_s>1$, namely $\phi (l_s,s)=\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s,G^{(l_s)}(X_s)=0)$. Table 2 displays the results for selected prevalences, group sizes and testing error rates. The table shows that this probability is not negligible. For example, when $p=0.005$, $k_1=80$, $S_e=0.99$, and $S_p=0.99$, the probability that a specimen is declared negative in the third stage of the Halving procedure is 0.173.

Table 2 The probability of a specimen being declared negative at the lth stage for Dorfman’s procedure ($l=1,2$) and the Halving procedure ($l=2,3$), and the FN-alike measurements $\varDelta _{{\mathcal {A}}_D}$ and $\varDelta _{{\mathcal {A}}_{nD}}$


In Table 2, we also report the FN-alike measurement $\varDelta _{{\mathcal {A}}_D}$ for Dorfman’s algorithm and $\varDelta _{{\mathcal {A}}_{nD}}$ for the nested Dorfman algorithm, along with their ratio $r_D =\varDelta _{{\mathcal {A}}_{nD}}/\varDelta _{{\mathcal {A}}_{D}}.$ These measurements are computed from Table 1. A small ratio $r_D$ indicates that specimens have similar false-negative predictive values even though they may have gone through different testing paths. This is an appealing characteristic, since all specimens are expected to have comparable false-negative predictive values. The results are similar for the Halving algorithm.

3.2 One-Step Sterrett Procedure

Sterrett’s procedure differs from Dorfman’s in how a positive group is resolved. Units randomly chosen from the positive group are tested one at a time until a positive one is identified, after which the remaining units are pooled into a group for testing. A graphical presentation of the one-step Sterrett procedure, denoted by ${\mathcal {A}}_S$, is given in Fig. 3.

Denote the $l^{th}$ stage of the Sterrett procedure by $S_l$, $l=1,2,3$. Note that in stage 2, specimens can be declared negative in two different ways. Without loss of generality, assume that the first group in stage 1 tests positive. Assume also that, in stage 2, the first $d-1$ individuals test negative while the $d^{th}$ individual tests positive. Denote by $S_{2^{(1)}}$ the specimens tested individually in stage 2. Denote by $S_{2^{(2)}}$ the remaining specimens in that positive group, which are tested as a group in stage 2. We use $\xi _{1,{\mathcal {A}}_S}(X_{s}|l_s=2^{(1)})$ or $\xi _{1,{\mathcal {A}}_S}(X_{s}|l_s=2^{(2)})$ to denote the false-negative predictive value of $X_s$ declared negative in stage 2 within $S_{2^{(1)}}$ or $S_{2^{(2)}}$, respectively. Specifically, the false-negative predictive values of $X_s$ are

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_S}\left( X_{s}|l_s=1\right) &{}=&{}\hbox {Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_{s}\right) =0\right) ,~~~~~~~~~~ \\ \xi _{1,{\mathcal {A}}_S}\left( X_{s}|l_s=2^{\left( 1\right) }\right) &{}=&{}\hbox {Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_{s}\right) =1,\sum \limits _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\right) , \\ \xi _{1,{\mathcal {A}}_S}\left( X_{s}|l_s=2^{\left( 2\right) }\right) &{}=&{}\hbox {Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_{s}\right) =1,\sum \limits _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1,G^{\left( 2\right) }\left( X_s\right) =0\right) , \\ \xi _{1,{\mathcal {A}}_S}\left( X_s|l_s=3\right) &{}=&{}\hbox {Pr}\left( \widetilde{{\mathcal {I}}}_s=1|\sum \limits _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1,G^{\left( l\right) }\left( X_s\right) =1,l\le 2,G^{\left( 3\right) }\left( X_s\right) =0\right) . \end{array} \end{aligned}$$

The other predictive values of $X_s$, such as the true-positive predictive value, can be obtained similarly. We summarize the results in the following theorem.

Theorem 3.1

Suppose a specimen $X_s$ is tested using one-step Sterrett procedure ${\mathcal {A}}_S$ and individual testing ${\mathcal {A}}_I$ respectively, $s=1,\ldots ,n,$ then

$$\begin{aligned} \begin{aligned}&\xi _{1,{\mathcal {A}}_S}\left( X_{s}\right)>\xi _{1,{\mathcal {A}}_I}\left( X_{s}\right) ,~ \xi _{2,{\mathcal {A}}_S}\left( X_{s}\right)<\xi _{2,{\mathcal {A}}_I}\left( X_{s}\right) ,\\&\eta _{1,{\mathcal {A}}_S}\left( X_{s}\right) <\eta _{1,{\mathcal {A}}_I}\left( X_{s}\right) ,~ \eta _{2,{\mathcal {A}}_S}\left( X_{s}\right) >\eta _{2,{\mathcal {A}}_I}\left( X_{s}\right) . \end{aligned} \end{aligned}$$

This result parallels Theorem 2.1; its proof is given in Appendix E. Note that the Sterrett procedure differs slightly from Dorfman’s and the Halving procedures. Theorem 3.1 shows that an improvement is also needed for the Sterrett procedure. As before, we propose to retest the specimens that are declared negative. For example, suppose a specimen $X_{s}$ belongs to the set $S_{2^{(1)}}$, and denote by $B_{s}(k)$ the retesting group of size $k$ that contains $X_s$. Then, for retesting group size $k_2$, the false-negative predictive value of $X_{s}$ after retesting is defined as

$$\begin{aligned} \begin{array}{lcl} \xi _{1,{\mathcal {A}}_{N}}\left( X_{s};k_2\right) =\hbox {Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_{s}\right) =1,\sum \limits _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1, B_s\left( k_2\right) =0\right) . \end{array} \end{aligned}$$

The false-negative predictive value of $X_{s}$ belonging to $S_1$, $S_{2^{(2)}}$ or $S_3$ is defined in the same way. The following theorem shows that it is an increasing function of the retesting group size $k_r$. Because stage 2 of the Sterrett procedure has a more complex structure, we cannot obtain a result parallel to Theorem 2.3; we can nevertheless establish Theorem 3.2, whose proof is given in Appendix F. The simulations reported in Table 3 show that the nested Sterrett procedure considerably reduces the false-negative predictive value.

Theorem 3.2

The false-negative predictive values of a specimen under the nested Sterrett procedure are strictly increasing in the retesting group size $k_r.$

To investigate the false-negative predictive values of the Sterrett procedure and the nested Sterrett procedure, we run simulations with prevalence $p=0.005$, 0.01 and 0.03 and corresponding initial group sizes $k_1=80$, 40 and 20. The number of groups is set to 200, and each simulation is repeated 1000 times. Recall that the specimens in a positive group are tested individually until the first positive specimen is found, so the position of this specimen varies from group to group. We therefore report in Table 3 the average false-negative predictive value for each type of specimen. The retesting group sizes $k^*$ are omitted because they also vary.
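A minimal Monte Carlo sketch of the unnested one-step Sterrett decoding follows. It is not the authors’ simulation code, and it rests on assumptions of ours: tests are conditionally independent given the true statuses, there is no dilution effect, and a positive stage-2 group of leftover specimens is resolved by individual tests in stage 3. The nested retesting step is not modelled. The labels "S1", "S2(1)", "S2(2)" and "S3" stand for $S_1$, $S_{2^{(1)}}$, $S_{2^{(2)}}$ and $S_3$, and the function names are hypothetical. Each false-negative predictive value is estimated as the pooled proportion of truly diseased specimens among those declared negative in that category.

```python
import random
from collections import Counter


def assay(truly_positive, se, sp, rng):
    """One imperfect test: positive w.p. Se if truly positive, else w.p. 1 - Sp."""
    return rng.random() < (se if truly_positive else 1.0 - sp)


def sterrett_one_step(status, se, sp, rng):
    """Decode one initial group; return (specimen type, declared positive) per specimen.

    Assumption: a positive stage-2 group of leftover specimens is resolved by
    individual tests, labelled "S3".
    """
    k = len(status)
    if not assay(any(status), se, sp, rng):            # stage 1 negative
        return [("S1", False)] * k
    labels = [None] * k
    first_pos = None
    for i in range(k):                                  # stage 2, one at a time
        if assay(status[i], se, sp, rng):
            labels[i] = ("S2(1)", True)
            first_pos = i
            break
        labels[i] = ("S2(1)", False)
    if first_pos is None or first_pos == k - 1:         # nothing left to pool
        return labels
    rest = range(first_pos + 1, k)
    if not assay(any(status[j] for j in rest), se, sp, rng):
        for j in rest:                                  # leftover group negative
            labels[j] = ("S2(2)", False)
        return labels
    for j in rest:                                      # stage 3 (assumed individual)
        labels[j] = ("S3", assay(status[j], se, sp, rng))
    return labels


def stagewise_fnpv(p, k1, se, sp, n_groups=200, reps=100, seed=1):
    """Pooled share of truly diseased specimens among those declared negative, by type."""
    rng = random.Random(seed)
    negatives, false_negatives = Counter(), Counter()
    for _ in range(reps):
        for _ in range(n_groups):
            status = [rng.random() < p for _ in range(k1)]
            decoded = sterrett_one_step(status, se, sp, rng)
            for diseased, (kind, declared_pos) in zip(status, decoded):
                if not declared_pos:
                    negatives[kind] += 1
                    false_negatives[kind] += diseased
    return {kind: false_negatives[kind] / negatives[kind] for kind in negatives}


print(stagewise_fnpv(0.005, 80, 0.9, 0.9, reps=20))
```

The call uses 20 replicates to keep the pure-Python run short; increasing `reps` toward the 1000 used above reduces the Monte Carlo noise, which is largest for the rarely reached categories.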

Table 3 The false-negative predictive values (the values have been multiplied by $10^3$) and the FN-alike measurement ratio $\varDelta _{{\mathcal {A}}_{nS}}/\varDelta _{{\mathcal {A}}_S}$. $S_{2^{(1)}}$ ($nS_{2^{(1)}}$), $S_{2^{(2)}}$ ($nS_{2^{(2)}}$) and $S_3$ ($nS_3$) represent the second and third stages of the one-step Sterrett procedure and of the corresponding nested group testing procedure, respectively


Table 3 shows that the false-negative predictive value from individual testing (${\mathcal {A}}_I$) is lower than those from the Sterrett procedure ($S_{2^{(1)}}$, $S_{2^{(2)}}$ and $S_3$). For example, when $p=0.005$, $k_1=80$ and $S_e=S_p=0.9$, the false-negative predictive values for $S_{2^{(1)}}$, $S_{2^{(2)}}$ and $S_3$ are $1.460\times 10^{-3}$, $3.847\times 10^{-3}$ and $1.813\times 10^{-3}$, respectively, all far larger than the value of $0.552\times 10^{-3}$ for individual testing. Under the nested group testing procedure, these values become $0.282\times 10^{-3}$, $0.758\times 10^{-3}$ and $0.371\times 10^{-3}$, respectively. Although the false-negative predictive value of $nS_{2^{(2)}}$ is still slightly higher than $0.552\times 10^{-3}$, all false-negative predictive values are greatly reduced relative to the original Sterrett procedure. Table 3 also reports the FN-alike measurement for the Sterrett algorithm. Not surprisingly, the nested Sterrett procedure performs more stably in terms of the false-negative predictive value. Additionally, the ratios $\varDelta _{{\mathcal {A}}_{nS}}/\varDelta _{{\mathcal {A}}_{S}}$ are all smaller than 0.5.
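As a quick arithmetic check on these figures, the individual-testing value has the closed form $(1-S_e)p/[(1-S_e)p+S_p(1-p)]$ (Bayes’ rule with a homogeneous prevalence). The sketch below evaluates it and the nested-to-original quotients of the quoted values. These quotients are simple ratios of the numbers in the text, not the FN-alike ratio $\varDelta _{{\mathcal {A}}_{nS}}/\varDelta _{{\mathcal {A}}_{S}}$ of Table 3; the function name is ours.

```python
def fnpv_individual(p, se, sp):
    """Pr(truly diseased | negative individual test), by Bayes' rule."""
    return (1 - se) * p / ((1 - se) * p + sp * (1 - p))


# Values quoted in the text for p = 0.005, k1 = 80, Se = Sp = 0.9 (units of 1e-3).
sterrett = {"S2(1)": 1.460, "S2(2)": 3.847, "S3": 1.813}
nested = {"S2(1)": 0.282, "S2(2)": 0.758, "S3": 0.371}

xi_individual = fnpv_individual(0.005, 0.9, 0.9) * 1e3
ratios = {stage: nested[stage] / sterrett[stage] for stage in sterrett}

print(f"individual testing: {xi_individual:.3f} x 1e-3")   # 0.558
for stage, ratio in ratios.items():
    print(f"{stage}: nested / original = {ratio:.3f}")     # 0.193, 0.197, 0.205
```

The analytic value, about $0.558\times 10^{-3}$, is close to the simulated $0.552\times 10^{-3}$, and each nested value is roughly one fifth of its unnested counterpart.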

4 Further Evaluation of Nested Group Testing Procedures

4.1 Pooling Sensitivity and Specificity

In this part, we compare the pooling sensitivity and specificity of individual testing, Dorfman’s and the Halving procedures, and the corresponding nested group testing procedures. The initial group size $k_1$ is set to 40, and the prevalence $p$ ranges from 0.003 to 0.03. The retesting group sizes are calculated based on Theorem 2.3, with a maximum tolerated group size $k_{\max }=100$. The sensitivity and specificity are set to $S_e=S_p=0.95$, and the number of replications is $M=1000$. We simulate the four group testing procedures, calculate their pooling sensitivity and pooling specificity, and present the results in Fig. 4.

From this figure, we observe that the nested Dorfman procedure always has a larger pooling sensitivity than individual testing. For example, when the prevalence is $p=0.01$, the pooling sensitivities of individual testing, Dorfman’s procedure and the nested group testing method are 0.9494, 0.9013 and 0.9907, respectively. The nested group testing method has a slightly lower pooling specificity than Dorfman’s procedure. Nevertheless, it outperforms individual testing on all commonly used operating characteristics, including the false (true)-negative predictive value, the false (true)-positive predictive value, pooling sensitivity and pooling specificity. Similar conclusions hold for the Halving procedure.
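For Dorfman’s procedure the two pooling accuracies have simple closed forms, assuming no dilution effect and tests that are conditionally independent given the true statuses. A diseased specimen is declared positive only if its group test and its individual retest are both positive, so the pooling sensitivity is $S_e^2$; a non-diseased specimen is misclassified only if its group tests positive and its retest is a false positive, which gives a pooling specificity of $1-(1-S_p)[S_e-(S_e+S_p-1)(1-p)^{k-1}]$. The sketch below (function names are ours) evaluates both and cross-checks them with a small simulation. With $S_e=0.95$ the pooling sensitivity is $0.9025$, in line with the simulated value 0.9013 quoted above.

```python
import random


def dorfman_pooling_accuracy(p, k, se, sp):
    """Closed-form pooling sensitivity and specificity of Dorfman's procedure."""
    r = se + sp - 1
    pool_se = se * se
    # Pr(group tests positive | a given member is non-diseased)
    pr_group_pos = se - r * (1 - p) ** (k - 1)
    pool_sp = 1 - (1 - sp) * pr_group_pos
    return pool_se, pool_sp


def simulate_dorfman(p, k, se, sp, n_groups, seed=0):
    """Monte Carlo estimates of the same two quantities."""
    rng = random.Random(seed)
    true_pos = n_pos = true_neg = n_neg = 0
    for _ in range(n_groups):
        status = [rng.random() < p for _ in range(k)]
        group_pos = rng.random() < (se if any(status) else 1 - sp)
        for diseased in status:
            # Only members of a positive group are retested individually.
            declared_pos = group_pos and rng.random() < (se if diseased else 1 - sp)
            if diseased:
                n_pos += 1
                true_pos += declared_pos
            else:
                n_neg += 1
                true_neg += not declared_pos
    return true_pos / n_pos, true_neg / n_neg


exact = dorfman_pooling_accuracy(0.01, 40, 0.95, 0.95)
approx = simulate_dorfman(0.01, 40, 0.95, 0.95, n_groups=5000)
print(exact, approx)
```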

4.2 Malaria Infection Group Testing

Zhou et al. [27] reported a study detecting malaria infection in microscopy-negative Malawian women using nested PCR (nPCR). They found that about 3.2% of subjects in the histology-negative group (433 dried blood spots) were nPCR positive. The PCR assay had a median sensitivity of 96% and specificity of 99.1%, and each pool contained 10 specimens. We therefore set $p=0.032$, $S_e=0.96$ and $S_p=0.991$. Following [8], the maximum group size is set to $k_{\max }=20$. With this parameter configuration, we record the process of decoding the specimens; upon completion of the procedure, the observed status of each specimen is obtained. We then calculate the false-negative predictive values for specimens at different stages, the efficiency, the pooling sensitivity and the pooling specificity.
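To get a feel for these parameters, the stage-wise false-negative predictive values of Dorfman’s procedure can be evaluated directly in a homogeneous population: for a specimen declared negative at stage 1 it is $(1-S_e)p/\mathrm{Pr}(G^{(1)}(X_s)=0)$, and for stage 2 it is the Dorfman expression derived in Appendix A with $\prod _{j\in G^{(1)}(X_s)}(1-p_j)=(1-p)^k$. The sketch below assumes an initial group size of $k=10$ (the pool size in Zhou et al.’s study); it is an illustration only, not a reproduction of Table 4, and the function names are ours.

```python
def fnpv_individual(p, se, sp):
    """Pr(diseased | negative individual test)."""
    return (1 - se) * p / ((1 - se) * p + sp * (1 - p))


def fnpv_dorfman(p, k, se, sp):
    """Stage-1 and stage-2 false-negative predictive values of Dorfman's
    procedure, assuming a homogeneous prevalence p and group size k."""
    r = se + sp - 1
    q = (1 - p) ** k                          # Pr(group truly negative)
    stage1 = (1 - se) * p / (sp * q + (1 - se) * (1 - q))
    stage2 = (1 - se) * se * p / (se * (1 - se) + r * se * (1 - p) - r * sp * q)
    return stage1, stage2


p, se, sp, k = 0.032, 0.96, 0.991, 10         # k = 10 is an assumption
xi_i = fnpv_individual(p, se, sp)
xi_d1, xi_d2 = fnpv_dorfman(p, k, se, sp)
print(f"individual {xi_i * 1e3:.3f}, Dorfman stage 1 {xi_d1 * 1e3:.3f}, "
      f"stage 2 {xi_d2 * 1e3:.3f} (x 1e-3)")
```

Under these assumptions the three values are about 1.33, 1.76 and 5.09 ($\times 10^{-3}$): both Dorfman stages exceed individual testing, and the specimens cleared at stage 2 are the worst off, the ordering described by Theorems 2.1 and 2.2.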

The results are summarized in Table 4, which shows that Dorfman’s, the Halving and the Sterrett procedures have slightly higher false-negative predictive values than individual testing. The proposed nested group testing method can substantially reduce the false-negative predictive values, to below those of individual testing. For example, the false-negative predictive value at the third stage of the nested Halving procedure ($nH_3$) is $0.986\times 10^{-3}$, whereas that of the Halving procedure ($H_3$) is $10.81\times 10^{-3}$. We also considered two further settings, $S_e=S_p=0.95$ and $S_e=S_p=0.99$, and the performance was similar. Moreover, compared with individual testing and the original group testing procedures, the proposed nested method can improve the pooling sensitivity.

Table 4 The false-negative predictive values ($\xi _{1,{\mathcal {A}}}$, the values have been multiplied by $10^3$), efficiency (Eff), pooling sensitivity (poolSe), and pooling specificity (poolSp) for malaria infection


5 Conclusion

Group testing is cost-effective for rare diseases. However, this efficiency gain often comes with a higher false-negative predictive value, which is undesirable for life-threatening diseases such as malaria infection during pregnancy; such infection may result in severe maternal anemia, prematurity and low birth weight, increasing the risk of maternal and neonatal deaths [27]. In the present paper, we investigated the false (true)-negative and false (true)-positive predictive values of group testing algorithms. Compared with individual testing, we showed theoretically that multi-stage group testing procedures have a higher false-negative predictive value and a lower false-positive predictive value. Through careful selection of group sizes, the proposed nested group testing procedure can reduce the false-negative predictive value to below that of individual testing. We provide formulas and demonstrate their use in detail for commonly used group testing procedures, including Dorfman’s, the Halving and the Sterrett algorithms.

As alternatives to hierarchical group testing procedures, non-hierarchical procedures such as array testing (or matrix pooling) and three-dimensional procedures are also used. Because their groups overlap, extending the proposed method to non-hierarchical procedures is not trivial. In addition, we assumed that the sensitivity and specificity are known and do not depend on the group size; it would be preferable to account for dilution effects across group sizes. Both issues are topics for future research.

References

[1] Avrahami-Moyal, L., Tam, Y., Brumin, M., Prakash, S., Leibman, D., Pearlsman, M., Bornstein, M., Sela, N., Zeidan, M., Dar, Z., Zig, U.: Detection of potato virus Y in industrial quantities of seed potatoes by TaqMan Real Time PCR. Phytoparasitica 45(4), 591–598 (2017)
[2] Bilder, C.R., Tebbs, J.M., Chen, P.: Informative retesting. J. Am. Stat. Assoc. 105(491), 942–955 (2010)
[3] Black, M.S., Bilder, C.R., Tebbs, J.M.: Group testing in heterogeneous populations by using halving algorithms. J. Roy. Stat. Soc. C-App. 61(2), 277–290 (2012)
[4] Demoly, P., Romano, A., Botelho, C., Bousquet-Rouanet, L., Gaeta, F., Silva, R., Rumi, G., Rodrigues Cernadas, J., Bousquet, P.J.: Determining the negative predictive value of provocation tests with beta-lactams. Allergy 65(3), 327–332 (2010)
[5] Dinesha, T.R., Boobalan, J., Sivamalar, S., Subashini, D., Solomon, S.S., Murugavel, K.G., Balakrishnan, P., Smith, D.M., Saravanan, S.: Occult HBV infection in HIV-infected adults and evaluation of pooled NAT for HBV. J. Viral Hepatitis 25(6), 718–723 (2018)
[6] Dorfman, R.: The detection of defective members of large populations. Ann. Math. Stat. 14(4), 436–440 (1943)
[7] Duston, M.A., Skinner, M., Meenan, R.F., Cohen, A.S.: Sensitivity, specificity, and predictive value of abdominal fat aspiration for the diagnosis of amyloidosis. Arthritis Rheum. 32(1), 82–85 (1989)
[8] Edouard, S., Prudent, E., Gautret, P., Memish, Z.A., Raoult, D.: Cost-effective pooling of DNA from nasopharyngeal swab samples for large-scale detection of bacteria by real-time PCR. J. Clin. Microbiol. 53(3), 1002–1004 (2015)
[9] Finucane, M.M., Rowley, C.F., Paciorek, C.J., Essex, M., Pagano, M.: Estimating the prevalence of transmitted HIV drug resistance using pooled samples. Stat. Methods Med. Res. 25(2), 917–935 (2016)
[10] Gastwirth, J.L.: The efficiency of pooling in the detection of rare mutations. Am. J. Hum. Genet. 67(4), 1036–1039 (2000)
[11] Gregory, K.B., Wang, D., McMahan, C.S.: Adaptive elastic net for group testing. Biometrics 75(1), 13–23 (2019)
[12] Hou, P., Tebbs, J.M., Bilder, C.R., McMahan, C.S.: Hierarchical group testing for multiple infections. Biometrics 73(2), 656–665 (2017)
[13] Johnson, N.L., Kotz, S., Wu, X.Z.: Inspection Errors for Attributes in Quality Control. Chapman and Hall, New York (1991)
[14] Jones, C.M., Zhigljavsky, A.A.: Comparison of costs for multi-stage group testing methods in the pharmaceutical industry. Commun. Stat.-Theory. 30(10), 2189–2209 (2001)
[15] Kim, H.Y., Hudgens, M.G., Dreyfuss, J.M., Westreich, D.J., Pilcher, C.D.: Comparison of group testing algorithms for case identification in the presence of test error. Biometrics 63(4), 1152–1163 (2007)
[16] Lagopati, N., Tsioli, P., Mourkioti, I., Polyzou, A., Papaspyropoulos, A., Zafiropoulos, A., Evangelou, K., Sourvinos, G., Gorgoulis, V.G.: Sample pooling strategies for SARS-CoV-2 detection. J. Virol. Methods 289, 114044 (2020)
[17] Litvak, E., Tu, X.M., Pagano, M.: Screening for the presence of a disease by pooling sera samples. J. Am. Stat. Assoc. 89(426), 424–434 (1994)
[18] McMahan, C.S., Tebbs, J.M., Bilder, C.R.: Informative Dorfman screening. Biometrics 68(1), 287–296 (2012)
[19] McMahan, C.S., Tebbs, J.M., Bilder, C.R.: Two-dimensional informative array testing. Biometrics 68(3), 793–804 (2012)
[20] Montesinos-López, O.A., Eskridge, K., Montesinos-López, A., Crossa, J., Cortés-Cruz, M., Wang, D.: A regression model for pooled data in a two-stage survey under informative sampling with application for detecting and estimating the presence of transgenic corn. Seed Sci. Res. 26(2), 182–197 (2016)
[21] Wang, D., McMahan, C.S., Gallagher, C.M.: A general regression framework for group testing data, which incorporates pool dilution effects. Stat. Med. 34(27), 3606–3621 (2015)
[22] Wang, D., McMahan, C.S., Tebbs, J.M., Bilder, C.R.: Group testing case identification with biomarker information. Comput. Stat. Data Anal. 122, 156–166 (2018)
[23] Xiong, W., Ding, J., He, Y., Li, Q.: Improved matrix pooling. Stat. Methods Med. Res. 28(1), 211–222 (2019)
[24] Xiong, W., Ding, J., Li, Q.: The Review of Statistical Methods in Group Testing. J. Appl. Stat. Manag. 35(5), 838–855 (2016)
[25] Zhang, W., Liu, A., Li, Q., Albert, P.S.: Incorporating retesting outcomes for estimation of disease prevalence. Stat. Med. 39(6), 687–697 (2020)
[26] Zhang, W., Liu, A., Li, Q., Albert, P.S.: Nonparametric estimation of distributions and diagnostic accuracy based on group-tested results with differential misclassification. Biometrics 76(4), 1147–1156 (2020)
[27] Zhou, Z., Mitchell, R.M., Gutman, J., Wiegand, R.E., Mwandama, D.A., Mathanga, D.P., Skarbinski, J., Shi, Y.P.: Pooled PCR testing strategy and prevalence estimation of submicroscopic infections using Bayesian latent class models in pregnant women receiving intermittent preventive treatment at Machinga District Hospital, Malawi, 2010. Malaria J. 13(1), 509 (2014)


Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant Nos. 11801102, 11861017), Beijing Natural Science Foundation (Z180006), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health.

Author information

Authors and Affiliations

School of Mathematics and Statistics, Guangxi Normal University, Guilin, 541004, People’s Republic of China
Wenjun Xiong
Department of Information and Computing Science, College of Sciences, Hohai University, Nanjing, 210098, People’s Republic of China
Juan Ding
LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, People’s Republic of China
Wei Zhang & Qizhai Li
Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health, Bethesda, 20817, USA
Aiyi Liu
University of Chinese Academy of Sciences, Beijing, 100049, People’s Republic of China
Qizhai Li


Corresponding author

Correspondence to Qizhai Li.

Appendix

A. Derivation of Predictive Values

In the toy example, we investigated the false-negative predictive value of Dorfman’s algorithm by comparing $\xi _{1,{\mathcal {A}}_D(k)}(X) =\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}=1|G_{k}=0\big )$ with $\xi _{1,{\mathcal {A}}'_D(2k)}(X) =\mathrm{Pr}\big (\widetilde{{\mathcal {I}}}=1|G_{1,2k}=0,G_{2,2k}=0\big )$, where $G_{1,2k}$ and $G_{2,2k}$ represent the groups that are tested twice. A detailed derivation is as follows:

$$\begin{aligned} \begin{array}{ccl} \frac{\xi _{1,{\mathcal {A}}_D\left( k\right) }\left( X\right) }{\xi _{1,{\mathcal {A}}'_D\left( 2k\right) }\left( X\right) }&{}=&{}\frac{\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}=1|G_{k}=0\right) }{\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}=1|G_{1,2k}=0,G_{2,2k}=0\right) } =\frac{\left( 1-S_e\right) p}{\mathrm{Pr}\left( G_k=0\right) }\frac{\mathrm{Pr}\left( G_{1,2k}=0,G_{2,2k}=0\right) }{\left( 1-S_e\right) ^2p}\\ &{}=&{}\frac{S^2_p\left( 1-p\right) ^k+\left( 1-S_e\right) ^2[1-\left( 1-p\right) ^k]}{S_p\left( 1-p\right) ^k+\left( 1-S_e\right) [1-\left( 1-p\right) ^k]}\frac{1}{\left( 1-S_e\right) } =1+\frac{S_p\left( S_e+S_p-1\right) \left( 1-p\right) ^k}{\left( 1-S_e\right) [1-S_e+\left( S_e+S_p-1\right) \left( 1-p\right) ^k]}. \end{array} \end{aligned}$$
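The algebra above is easy to confirm numerically. The sketch below evaluates both sides of the final identity for a few parameter values, treating the two tests of the same group as conditionally independent given the group’s true status, which is what the expression for $\mathrm{Pr}(G_{1,2k}=0,G_{2,2k}=0)$ uses; the function names are ours.

```python
from math import isclose


def ratio_direct(p, k, se, sp):
    """xi_{1,A_D(k)} / xi_{1,A'_D(2k)} computed from the two predictive values."""
    q = (1 - p) ** k                                  # Pr(group truly negative)
    xi_once = (1 - se) * p / (sp * q + (1 - se) * (1 - q))
    xi_twice = (1 - se) ** 2 * p / (sp ** 2 * q + (1 - se) ** 2 * (1 - q))
    return xi_once / xi_twice


def ratio_closed_form(p, k, se, sp):
    """Right-hand side of the final identity."""
    q = (1 - p) ** k
    r = se + sp - 1
    return 1 + sp * r * q / ((1 - se) * (1 - se + r * q))


cases = [(0.01, 10, 0.95, 0.95), (0.05, 4, 0.9, 0.8), (0.002, 40, 0.99, 0.97)]
for p, k, se, sp in cases:
    a, b = ratio_direct(p, k, se, sp), ratio_closed_form(p, k, se, sp)
    print(p, k, se, sp, round(a, 4), round(b, 4))
```

Because the added term is positive whenever $S_e+S_p>1$, the ratio exceeds 1, that is, retesting the negative group lowers the false-negative predictive value.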

Next, for a general hierarchical group testing procedure, the false-negative predictive value is derived as follows:

$$\begin{aligned} \begin{array}{l} \xi _{1,{\mathcal {A}}_O}\left( X_{s}\right) =\frac{\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1,G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0\right) }{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0\right) }, \end{array} \end{aligned}$$

in which the numerator is derived as follows:

$$\begin{aligned}&\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1,G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0\right) \\&\quad =\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,{\widetilde{G}}^{\left( j\right) }\left( X_s\right) =1,j< l_s,G^{\left( l_s\right) }\left( X_s\right) =0,{\widetilde{G}}^{\left( l_s\right) }\left( X_s\right) =1,\widetilde{{\mathcal {I}}}_{s}=1\right) \\&\quad =\left( 1-S_e\right) S_e^{l_s-1}p_{s}. \end{aligned}$$

Denote the denominator by $\phi (l_s,s)$, and

$$\begin{aligned} \begin{array}{ccl} \phi \left( l_s,s\right) &{}=&{}\sum _{\varOmega _s} \mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0,{\widetilde{G}}^{\left( u\right) }\left( X_s\right) =d_u,u\le l_s\right) \\ &{}=&{}\sum _{\varOmega _s} \mathrm{Pr}\left( G^{\left( l_s\right) }\left( X_s\right) =0|{\widetilde{G}}^{\left( l_s\right) }\left( X_s\right) =d_{l_s}\right) \\ &{}&{}\times \prod _{j=1}^{l_s-1}\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1|{\widetilde{G}}^{\left( j\right) }\left( X_s\right) =d_j\right) \mathrm{Pr}\left( {\widetilde{G}}^{\left( u\right) }\left( X_s\right) =d_u,u\le l_s\right) \\ &{}=&{}\sum _{\varOmega _s} \left( 1-S_e\right) ^{d_{l_s}}S_p^{1-d_{l_s}}\prod _{j=1}^{l_s-1}S_e^{d_j}\left( 1-S_p\right) ^{1-d_j} \\ &{}&{}\times \mathrm{Pr}\left( {\widetilde{G}}^{\left( u\right) }\left( X_s\right) =d_u,u\le l_s\right) , \end{array} \end{aligned}$$

where $\varOmega _s=\{d_j,j=1,\ldots ,l_s|d_{l_s}\le \cdots \le d_1\in \{0,1\}\}.$ Note that $d_{l_s}\le \cdots \le d_1$, that is, if a group ${\widetilde{G}}^{(j_1)}(X_s)$ is positive, then ${\widetilde{G}}^{(j_2)}(X_s)=1$ for all $j_2\le j_1.$ Therefore we have

I: if ${\widetilde{G}}^{(l_s)}(X_s)=1$, then we have ${\widetilde{G}}^{(j)}(X_s)=1$ for $j\in \{1,\ldots ,l_s\}.$

II: if ${\widetilde{G}}^{(l_s)}(X_s)=0,$ then there exists $\tau \in \{0,\ldots ,l_s-1\}$ such that ${\widetilde{G}}^{(j)}(X_s)=1$ for $j\in \{1,\ldots ,\tau \}$ and ${\widetilde{G}}^{(j)}(X_s)=0$ for $j\in \{\tau +1,\ldots ,l_s\}.$ Here $\tau =0$ means ${\widetilde{G}}^{(1)}(X_s)=0$, which is the only possibility when $l_s=1$.

Hence the function $\phi (l_s,s)$ can be expressed as follows, where the first term corresponds to case I, the second term to case II with $\tau =0$, and the sum to case II with $1\le \tau \le l_s-1$:

$$\begin{aligned} \begin{array}{ccl} \phi \left( l_s,s\right) &{}=&{}S^{l_s-1}_e\left( 1-S_e\right) \mathrm{Pr}\left( {\widetilde{G}}^{\left( l_s\right) }\left( X_s\right) =1\right) +S_p\left( 1-S_p\right) ^{l_s-1}\mathrm{Pr}\left( {\widetilde{G}}^{\left( 1\right) }\left( X_s\right) =0\right) \\ &{}&{}+\sum \limits _{\tau =1}^{l_s-1}S_p S^{\tau }_e\left( 1-S_p\right) ^{l_s-1-\tau } \mathrm{Pr}\left( {\widetilde{G}}^{\left( j_1\right) }\left( X_s\right) =1,j_1\le \tau ,{\widetilde{G}}^{\left( j_2\right) }\left( X_s\right) =0,j_2\ge \tau +1\right) . \end{array} \end{aligned}$$

Denote by ${\widetilde{W}}_{A\backslash B}$ the true disease status of the set difference $A\backslash B$ of two sets $A$ and $B$. Because $G^{(\tau +1)}(X_s)$ and $G^{(\tau )}(X_s)\backslash G^{(\tau +1)}(X_s)$ are disjoint and the true statuses of different specimens are independent, $\mathrm{Pr}({\widetilde{G}}^{(j_1)}(X_s)=1,j_1\le \tau ,{\widetilde{G}}^{(j_2)}(X_s)=0,j_2\ge \tau +1)$ is equal to $\mathrm{Pr}({\widetilde{G}}^{(\tau +1)}(X_s)=0)\mathrm{Pr}({\widetilde{W}}_{G^{(\tau )}(X_s)\backslash G^{(\tau +1)}(X_s)}=1).$ Let $\varphi (A)=\prod _{\{j:j\in A\}}(1-p_j).$ In $\varphi (\cdot )$, the notation $G^{(j)}(X_s)$ represents the set of specimens containing $X_s$ at the $j^{th}$ stage. Therefore,

$$\begin{aligned} \begin{array}{ccl} \phi \left( l_s,s\right) &{}=&{}S_e^{l_s-1} \left( 1-S_e\right) \left( 1-\varphi \left( G^{\left( l_s\right) }\left( X_s\right) \right) \right) +S_p\left( 1-S_p\right) ^{l_s-1}\varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) \\ &{}&{}+\sum \limits _{\tau =1}^{l_s-1}S_p S^{\tau }_e\left( 1-S_p\right) ^{l_s-1-\tau }\left( 1-\varphi \left( G^{\left( \tau \right) }\left( X_s\right) \backslash G^{\left( \tau +1\right) }\left( X_s\right) \right) \right) \varphi \left( G^{\left( \tau +1\right) }\left( X_s\right) \right) . \end{array} \end{aligned}$$
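The closed form for $\phi (l_s,s)$ can be checked against a brute-force sum over all true-status configurations of the first-stage group. In the sketch below, the function names, prevalences and group layouts are ours: `groups[j]` holds the specimen indices of $G^{(j+1)}(X_s)$ as a shrinking sequence of nested sets, tests are assumed conditionally independent given the true statuses, and the false-negative predictive value is $(1-S_e)S_e^{l_s-1}p_s/\phi (l_s,s)$, with the numerator derived above.

```python
from itertools import product
from math import isclose, prod


def varphi(idx, p):
    """Probability that every specimen indexed by idx is truly negative."""
    return prod(1 - p[j] for j in idx)


def phi(groups, p, se, sp):
    """Closed-form Pr(positive at stages 1..l-1, negative at stage l)."""
    l = len(groups)
    total = se ** (l - 1) * (1 - se) * (1 - varphi(groups[-1], p))   # case I
    total += sp * (1 - sp) ** (l - 1) * varphi(groups[0], p)        # tau = 0
    for tau in range(1, l):                                         # 1 <= tau <= l-1
        diff = groups[tau - 1] - groups[tau]
        total += (sp * se ** tau * (1 - sp) ** (l - 1 - tau)
                  * (1 - varphi(diff, p)) * varphi(groups[tau], p))
    return total


def fnpv_closed_form(groups, s, p, se, sp):
    l = len(groups)
    return (1 - se) * se ** (l - 1) * p[s] / phi(groups, p, se, sp)


def fnpv_brute_force(groups, s, p, se, sp):
    """Sum over all true-status vectors of the stage-1 group."""
    members = sorted(groups[0])
    num = den = 0.0
    for bits in product([False, True], repeat=len(members)):
        status = dict(zip(members, bits))
        w = prod(p[j] if status[j] else 1 - p[j] for j in members)
        for stage, g in enumerate(groups):
            truly_pos = any(status[j] for j in g)
            if stage < len(groups) - 1:       # observed positive
                w *= se if truly_pos else 1 - sp
            else:                             # observed negative at the last stage
                w *= 1 - se if truly_pos else sp
        den += w
        if status[s]:
            num += w
    return num / den


p = [0.01, 0.03, 0.005, 0.02, 0.015, 0.01, 0.04, 0.008]
layouts = ([set(range(8))],                          # negative at stage 1
           [set(range(8)), {0}],                     # Dorfman, stage 2
           [set(range(8)), {0, 1, 2, 3}, {0}])       # Halving, stage 3
for groups in layouts:
    print(len(groups), fnpv_closed_form(groups, 0, p, 0.95, 0.9),
          fnpv_brute_force(groups, 0, p, 0.95, 0.9))
```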

Specifically, for Dorfman’s procedure ${\mathcal {A}}_D$ and the Halving procedure ${\mathcal {A}}_H$, the false-negative predictive values are

$$\begin{aligned} \begin{array}{ccl} \xi _{1,{\mathcal {A}}_D}\left( X_{s}|l_s=2\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =1,G^{\left( 2\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\left( 1-S_e\right) S_ep_{s}}{S_e\left( 1-S_e\right) +rS_e\left( 1-p_{s}\right) -rS_p\prod _{j\in G^{\left( 1\right) }\left( X_s\right) }\left( 1-p_{j}\right) },\\ \xi _{1,{\mathcal {A}}_H}\left( X_{s}|l_s=2\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =1,G^{\left( 2\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\left( 1-S_e\right) S_ep_{s}}{S_e\left( 1-S_e\right) +rS_e\prod _{j\in G^{\left( 2\right) }\left( X_s\right) }\left( 1-p_{j}\right) -rS_p\prod _{j \in G^{\left( 1\right) }\left( X_s\right) }\left( 1-p_{j}\right) },\\ \xi _{1,{\mathcal {A}}_H}\left( X_{s}|l_s=3\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( 1\right) }\left( X_s\right) =1,G^{\left( 2\right) }\left( X_s\right) =1,G^{\left( 3\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\left( 1-S_e\right) S_e^2p_{s}}{S_e^2\left( 1-S_e\right) +rS_e^2\varphi \left( G^{\left( 3\right) }\left( X_s\right) \right) -rS_eS_p\varphi \left( G^{\left( 2\right) }\left( X_s\right) \right) -rS_p\left( 1-S_p\right) \varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) }, \end{array} \end{aligned}$$

where $\varphi (A)=\prod _{\{j:j\in A\}}(1-p_j)$ and $r=S_e+S_p-1.$
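The three closed forms can be transcribed directly. The sketch below (our function names; `p` is a list of individual prevalences and the groups are index sets) checks two of their properties: the Halving stage-2 expression reduces to Dorfman’s when $G^{(2)}(X_s)=\{X_s\}$, and in a small homogeneous example ($p=0.01$, a group of 8 specimens split into halves of 4, $S_e=S_p=0.95$) the false-negative predictive values increase from individual testing to Halving stage 2 to Halving stage 3, consistent with Theorems 2.1 and 2.2.

```python
from math import isclose, prod


def varphi(idx, p):
    return prod(1 - p[j] for j in idx)


def xi_dorfman_2(s, g1, p, se, sp):
    r = se + sp - 1
    den = se * (1 - se) + r * se * (1 - p[s]) - r * sp * varphi(g1, p)
    return (1 - se) * se * p[s] / den


def xi_halving_2(s, g1, g2, p, se, sp):
    r = se + sp - 1
    den = se * (1 - se) + r * se * varphi(g2, p) - r * sp * varphi(g1, p)
    return (1 - se) * se * p[s] / den


def xi_halving_3(s, g1, g2, g3, p, se, sp):
    r = se + sp - 1
    den = (se ** 2 * (1 - se) + r * se ** 2 * varphi(g3, p)
           - r * se * sp * varphi(g2, p) - r * sp * (1 - sp) * varphi(g1, p))
    return (1 - se) * se ** 2 * p[s] / den


def xi_individual(ps, se, sp):
    return (1 - se) * ps / ((1 - se) * ps + sp * (1 - ps))


# Heterogeneous example: Halving with G2 = {X_s} collapses to Dorfman.
p = [0.01, 0.03, 0.005, 0.02, 0.015, 0.01, 0.04, 0.008]
g1 = set(range(8))
halving_as_dorfman = xi_halving_2(0, g1, {0}, p, 0.95, 0.9)
dorfman = xi_dorfman_2(0, g1, p, 0.95, 0.9)

# Homogeneous example: eight specimens, halves of four, then individuals.
q = [0.01] * 8
stage2 = xi_halving_2(0, g1, {0, 1, 2, 3}, q, 0.95, 0.95)
stage3 = xi_halving_3(0, g1, {0, 1, 2, 3}, {0}, q, 0.95, 0.95)
print(halving_as_dorfman, dorfman)
print(xi_individual(0.01, 0.95, 0.95), stage2, stage3)
```

In the homogeneous example the three values are roughly 0.53, 5.9 and 15.3 ($\times 10^{-3}$).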

Next, we derive the false-positive predictive value $ \eta _{1,{\mathcal {A}}_O}(X_{s})=1-\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|G^{(j)}(X_s)=1,j\le l_s).$ It follows directly that $\eta _{1,{\mathcal {A}}_O}(X_{s})=1-S_e^{l_s}p_s/\mathrm{Pr}(G^{(j)}(X_s)=1,j\le l_s).$ Denote the denominator $\mathrm{Pr}(G^{(j)}(X_s)=1,j\le l_s)$ by $\psi (l_s,s)$. Proceeding as in the derivation of $\phi (l_s,s),$ we obtain

$$\begin{aligned} \psi \left( l_s,s\right)= & {} S_e^{l_s}\mathrm{Pr}\left( {\widetilde{G}}^{\left( l_s\right) }=1\right) +\left( 1-S_p\right) ^{l_s}\mathrm{Pr}\left( {\widetilde{G}}^{\left( 1\right) }=0\right) \\&+\sum \limits _{\tau =1}^{l_s-1}S_e^{\tau }\left( 1-S_p\right) ^{l_s-\tau }\mathrm{Pr}\left( {\widetilde{G}}^{\left( j_1\right) }=1,j_1\le \tau , {\widetilde{G}}^{\left( j_2\right) }=0,j_2\ge \tau +1\right) \\= & {} S_e^{l_s}\left( 1-\varphi \left( G^{\left( l_s\right) }\left( X_s\right) \right) \right) +\left( 1-S_p\right) ^{l_s}\varphi \left( G^{\left( 1\right) }\left( X_s\right) \right) \\&+\sum \limits _{\tau =1}^{l_s-1} S_e^{\tau }\left( 1-S_p\right) ^{l_s-\tau } \left( 1-\varphi \left( G^{\left( \tau \right) }\left( X_s\right) \backslash G^{\left( \tau +1\right) }\left( X_s\right) \right) \right) \varphi \left( G^{\left( \tau +1\right) }\left( X_s\right) \right) . \end{aligned}$$
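The same bookkeeping gives $\psi (l_s,s)$ and hence $\eta _{1,{\mathcal {A}}_O}(X_s)=1-S_e^{l_s}p_s/\psi (l_s,s)$. In the sketch below (hypothetical names; groups given as nested index sets; tests conditionally independent given the true statuses), the Dorfman case is compared with a direct two-stage calculation, $p_sS_e^2+(1-p_s)(1-S_p)[S_e(1-\varphi _o)+(1-S_p)\varphi _o]$ with $\varphi _o=\varphi (G^{(1)}(X_s)\backslash \{X_s\})$, and the false-positive predictive value is compared with that of individual testing.

```python
from math import isclose, prod


def varphi(idx, p):
    return prod(1 - p[j] for j in idx)


def psi(groups, p, se, sp):
    """Closed-form Pr(positive at every stage 1..l)."""
    l = len(groups)
    total = se ** l * (1 - varphi(groups[-1], p))
    total += (1 - sp) ** l * varphi(groups[0], p)
    for tau in range(1, l):
        diff = groups[tau - 1] - groups[tau]
        total += (se ** tau * (1 - sp) ** (l - tau)
                  * (1 - varphi(diff, p)) * varphi(groups[tau], p))
    return total


def eta1(groups, s, p, se, sp):
    """False-positive predictive value of a specimen declared positive at the last stage."""
    return 1 - se ** len(groups) * p[s] / psi(groups, p, se, sp)


def eta1_individual(ps, se, sp):
    return 1 - se * ps / (se * ps + (1 - sp) * (1 - ps))


p = [0.01, 0.03, 0.005, 0.02, 0.015, 0.01, 0.04, 0.008]
se, sp, s = 0.95, 0.9, 0
g1 = set(range(8))

# Dorfman: psi from the general formula versus a direct two-stage calculation.
others = varphi(g1 - {s}, p)
direct = p[s] * se ** 2 + (1 - p[s]) * (1 - sp) * (se * (1 - others) + (1 - sp) * others)
print(psi([g1, {s}], p, se, sp), direct)

layouts = ([g1, {s}], [g1, {0, 1, 2, 3}, {s}])       # Dorfman, Halving
for groups in layouts:
    print(len(groups), eta1(groups, s, p, se, sp), eta1_individual(p[s], se, sp))
```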

B. Proof of Theorem 2.1

Suppose a specimen $X_{s}$ is tested negative at the $l_s$th stage. We rewrite $\xi _{1,{\mathcal {A}}_O}(X_{s})$ as follows:

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_O}\left( X_{s}\right) &{}=&{}\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0\right) \\ &{}=&{}\frac{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0|\widetilde{{\mathcal {I}}}_{s}=1\right) \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1\right) }{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =0\right) }\\ &{}=&{}\frac{\mathrm{Pr}\left( G^{\left( l_s\right) }\left( X_s\right) =0|\widetilde{{\mathcal {I}}}_{s}=1\right) \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1\right) }{\mathrm{Pr}\left( G^{\left( l_s\right) }\left( X_s\right) =0\right) }\cdot \frac{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s|\widetilde{{\mathcal {I}}}_{s}=1\right) }{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s|G^{\left( l_s\right) }\left( X_s\right) =0\right) }. \end{array} \end{aligned}$$

If $S_e>0.5$ and $S_p>0.5,$ then

$$\begin{aligned}&\frac{\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s |\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s|G^{(l_s)}(X_s)=0)}\\&\quad =\frac{S^{l_s-1}_e}{\sum _\mathbf{{b}}\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s|{\widetilde{G}}^{(j)}(X_s)=b_j,j<l_s)\mathrm{Pr}({\widetilde{G}}^{(j)}(X_s)=b_j,j<l_s|G^{(l_s)}(X_s)=0)}\\&\quad \ge \frac{S^{l_s-1}_e}{\sum _\mathbf{{b}}S^{l_s-1}_e \mathrm{Pr}({\widetilde{G}}^{(j)}(X_s)=b_j,j<l_s|G^{(l_s)}(X_s)=0)}=1. \end{aligned}$$

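The inequality holds because, given the true statuses $b_j$, the stage-$j$ test is positive with probability $S_e$ if $b_j=1$ and $1-S_p$ if $b_j=0$, so that $\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s|{\widetilde{G}}^{(j)}(X_s)=b_j,j<l_s)=\prod _{j<l_s}S_e^{b_j}(1-S_p)^{1-b_j}\le S^{l_s-1}_e$, since $1-S_p<0.5<S_e$. Therefore, we have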

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_O}(X_{s})\ge \frac{\mathrm{Pr}(G^{(l_s)}(X_s)=0|\widetilde{{\mathcal {I}}}_{s}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(l_s)}(X_s)=0)} =\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|G^{(l_s)}(X_s)=0). \end{array} \end{aligned}$$

For individual testing ${\mathcal {A}}_{I},$ the false-negative predictive value is defined as follows: $\xi _{1,{\mathcal {A}}_I}(X_{s})=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|{\mathcal {I}}^{(l_s)}_{s}=0).$ We have

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_O}(X_{s})&{}\ge \mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|G^{(l_s)}(X_s)=0)=\frac{\mathrm{Pr}(G^{(l_s)}(X_s)=0|\widetilde{{\mathcal {I}}}_{s}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(l_s)}(X_s)=0)}\\ &{}\ge \frac{(1-S_e)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}({\mathcal {I}}^{(l_s)}_{s}=0)}=\xi _{1,{\mathcal {A}}_I}(X_{s}). \end{array} \end{aligned}$$

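The second inequality holds because $\mathrm{Pr}(G^{(l_s)}(X_s)=0|\widetilde{{\mathcal {I}}}_{s}=1)=1-S_e$ and $\mathrm{Pr}(G^{(l_s)}(X_s)=0)=(1-S_e)+(S_e+S_p-1)\varphi (G^{(l_s)}(X_s))\le (1-S_e)+(S_e+S_p-1)(1-p_s)=\mathrm{Pr}({\mathcal {I}}^{(l_s)}_{s}=0)$, as $\varphi (G^{(l_s)}(X_s))\le 1-p_s$. Equality holds if and only if the group $G^{(l_s)}(X_s)$ has size 1, in which case the procedure reduces to individual testing.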

Next, the false-positive predictive value is defined as follows:

$$\begin{aligned} \eta _{1,{\mathcal {A}}_O}\left( X_{s}\right)&=1-\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{\left( j\right) }\left( X_s\right) =1,j<l_s,G^{\left( l_s\right) }\left( X_s\right) =1\right) \\= & {} 1-\eta _{2,{\mathcal {A}}_O}\left( X_{s}\right) . \end{aligned}$$

Note that $G^{(l_s)}(X_s)$ is the singleton containing only the specimen $X_s$, so it is identical to ${\mathcal {I}}^{(l_s)}_{s}.$ We rewrite $\eta _{2,{\mathcal {A}}_O}(X_{s})$ as follows:

$$\begin{aligned} \begin{array}{lll} \eta _{2,{\mathcal {A}}_O}\left( X_{s}\right)&=\frac{S^{l_s}_ep_{s}}{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s,{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) } =\frac{S_ep_{s}}{\mathrm{Pr}\left( {\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) }\cdot \frac{S^{l_s-1}_e}{\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s|{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) }. \end{array} \end{aligned}$$

The false-positive predictive value of individual testing is defined as

$$\begin{aligned} \begin{array}{lll} \eta _{1,{\mathcal {A}}_I}\left( X_{s}\right) =1- \eta _{2,{\mathcal {A}}_I}\left( X_{s}\right) =1-\frac{S_ep_{s}}{\mathrm{Pr}\left( {\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) }. \end{array} \end{aligned}$$

We proceed to show that $\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s|{\mathcal {I}}^{(l_s)}_{s}=1)\le S_e^{l_s-1}.$

$$\begin{aligned}&\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s|{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) \\&=\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s|\widetilde{{\mathcal {I}}}_{s}=1,{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) \mathrm{Pr}\left( \tilde{{\mathcal {I}}}_{s}=1|{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) \\&+\mathrm{Pr}\left( G^{\left( j\right) }\left( X_s\right) =1,j<l_s|\widetilde{{\mathcal {I}}}_{s}=0,{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=0|{\mathcal {I}}^{\left( l_s\right) }_{s}=1\right) \\&\le S_e^{l_s-1}. \end{aligned}$$

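Since $\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s|\widetilde{{\mathcal {I}}}_{s}=1,{\mathcal {I}}^{(l_s)}_{s}=1)=S^{l_s-1}_e$ and $\mathrm{Pr}(G^{(j)}(X_s)=1,j<l_s|\widetilde{{\mathcal {I}}}_{s}=0,{\mathcal {I}}^{(l_s)}_{s}=1)\le S^{l_s-1}_e$ (each stage-$j$ test is positive with probability $S_e$ or $1-S_p<S_e$), the last inequality holds. Therefore, $\eta _{2,{\mathcal {A}}_O}(X_{s})\ge \eta _{2,{\mathcal {A}}_I}(X_{s})$, that is, $\eta _{1,{\mathcal {A}}_O}(X_{s})\le \eta _{1,{\mathcal {A}}_I}(X_{s}).$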

C. Proof of Theorem 2.2

Suppose a specimen $X_s$ is diagnosed negative at stage $l_s$; its false-negative predictive value is then $\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s)$. We aim to compare the false-negative predictive values of a specimen $X_s$ tested negative at different stages, comparing $l_s=l$ with $l_s=l+1$. First, consider $l=1$. Then

$$\begin{aligned} \begin{array}{lll} \frac{\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=2)}{\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=1)}-1&{}=\frac{\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|G^{(2)}(X_s)=0,G^{(1)}(X_s)=1)}{\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|G^{(1)}(X_s)=0)}-1\\ &{}=\frac{S_e\mathrm{Pr}(G^{(1)}(X_s)=0)-\mathrm{Pr}(G^{(2)}(X_s)=0,G^{(1)}(X_s)=1)}{\mathrm{Pr}(G^{(2)}(X_s)=0,G^{(1)}(X_s)=1)}\\ &{}=\frac{(S_e+S_p-1)\mathrm{Pr}({\widetilde{G}}^{(2)}(X_s)=0)[S_p-(S_e+S_p)\mathrm{Pr}({\widetilde{W}}_{G^{(1)}(X_s)\backslash G^{(2)}(X_s)}=1)]}{\mathrm{Pr}(G^{(2)}(X_s)=0,G^{(1)}(X_s)=1)}. \end{array} \end{aligned}$$

The probability $\mathrm{Pr}({\widetilde{W}}_{G^{(1)}(X_s)\backslash G^{(2)}(X_s)}=1)$ is usually very small, because group testing is typically used for rare diseases; in particular, it is usually smaller than $S_p/(S_e+S_p)$. Hence, when $S_e+S_p>1$, two-stage hierarchical group testing procedures such as Dorfman’s procedure satisfy $\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=2)> \xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=1).$

Next, consider $l\in \{2,\ldots ,L-1\}$, and denote by $D_l$ the event $\{G^{(j)}(X_s)=1,j<l\}.$ Then

$$\begin{aligned} \begin{array}{lll} &{}\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=l)=\frac{(1-S_e)S_e^{l-1}\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(l)}(X_s)=0|D_l)\mathrm{Pr}(D_l)}=\frac{(1-S_e)S_e^{l}\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{S_e\mathrm{Pr}(G^{(l)}(X_s)=0|D_l)\mathrm{Pr}(D_l)},\\ &{}\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=l+1) =\frac{(1-S_e)S_e^{l}\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(l+1)}(X_s)=0,G^{(l)}(X_s)=1|D_l)\mathrm{Pr}(D_l)}.\\ \end{array} \end{aligned}$$

Let

$$\begin{aligned} p_{ij|D_l}=\mathrm{Pr}\left( {\widetilde{G}}^{(l+1)}(X_s)=i,{\widetilde{W}}_{G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)}=j|D_l\right) . \end{aligned}$$

We proceed to verify that

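$$\begin{aligned}&\mathrm{Pr}\left( G^{\left( l+1\right) }\left( X_s\right) =0,G^{\left( l\right) }\left( X_s\right) =1|D_l\right) -S_e \mathrm{Pr}\left( G^{\left( l\right) }\left( X_s\right) =0|D_l\right) \\&=-r[S_p p_{00|D_l}-S_ep_{01|D_l}]\le 0, \end{aligned}$$

where $r=S_e+S_p-1>0$. The equality follows by conditioning on the true statuses of $G^{(l+1)}(X_s)$ and $G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)$, and the inequality amounts to $S_pp_{00|D_l}\ge S_ep_{01|D_l}$, which we establish next.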
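The identity only involves the four probabilities $p_{ij|D_l}$. The sketch below checks it for arbitrary distributions of $({\widetilde{G}}^{(l+1)}(X_s),{\widetilde{W}}_{G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)})$ given $D_l$, taking $r=S_e+S_p-1$ and assuming, as in the derivation, that the stage-$l$ and stage-$(l+1)$ tests are conditionally independent of $D_l$ given the true statuses (no dilution effect). The helpers `lhs` and `rhs` are hypothetical names for the two sides.

```python
import random


def lhs(p, se, sp):
    """Pr(G^(l+1)=0, G^(l)=1 | D_l) - S_e * Pr(G^(l)=0 | D_l).

    p maps (i, j) to Pr(G~^(l+1) = i, W~ = j | D_l); the stage-l pool is
    truly positive iff i == 1 or j == 1.
    """
    total = 0.0
    for (i, j), pij in p.items():
        reads_pos_l = se if (i or j) else 1 - sp   # Pr(G^(l) reads positive)
        reads_neg_next = 1 - se if i else sp      # Pr(G^(l+1) reads negative)
        total += pij * (reads_pos_l * reads_neg_next - se * (1 - reads_pos_l))
    return total


def rhs(p, se, sp):
    """-r [S_p p_00 - S_e p_01] with r = S_e + S_p - 1."""
    r = se + sp - 1
    return -r * (sp * p[(0, 0)] - se * p[(0, 1)])


rng = random.Random(2021)
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
for _ in range(5):
    raw = [rng.random() for _ in cells]
    norm = sum(raw)
    p = {cell: x / norm for cell, x in zip(cells, raw)}
    se, sp = rng.uniform(0.6, 1.0), rng.uniform(0.6, 1.0)
    print(f"{lhs(p, se, sp):+.12f}  {rhs(p, se, sp):+.12f}")
```

The cells with $i=1$ contribute nothing, since $G^{(l)}(X_s)$ and $G^{(l+1)}(X_s)$ are then both truly positive and the two terms cancel.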

First,

$$\begin{aligned} p_{00|D_l}&=\frac{1}{\mathrm{Pr}(D_l)}\mathrm{Pr}\left( D_l|{\widetilde{G}}^{(l+1)}(X_s)=0,{\widetilde{W}}_{G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)}=0\right) \\&\quad \times \mathrm{Pr}\left( {\widetilde{G}}^{(l+1)}(X_s)=0,{\widetilde{W}}_{G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)}=0\right) \\&=\frac{1}{\mathrm{Pr}(D_l)}\mathrm{Pr}\left( D_l,{\widetilde{G}}^{(l)}(X_s)=0\right) \\&>\frac{1}{\mathrm{Pr}(D_l)}\mathrm{Pr}\left( D_l,{\widetilde{G}}^{(l)}(X_s)=0,{\widetilde{W}}_{G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)}=1\right) \\&=\frac{1}{\mathrm{Pr}(D_l)}\mathrm{Pr}\left( D_l|{\widetilde{G}}^{(l)}(X_s)=0,{\widetilde{W}}_{G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)}=1\right) \\&\quad \times \mathrm{Pr}\left( {\widetilde{G}}^{(l)}(X_s)=0,{\widetilde{W}}_{G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)}=1\right) , \end{aligned}$$

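Every group $G^{(j)}(X_s)$, $j<l$, is truly positive both when ${\widetilde{W}}_{G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)}=1$ and when ${\widetilde{W}}_{G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)}=1$, so the conditional probability of $D_l$ equals $S_e^{l-1}$ in both cases. Writing $p_{01|D_l}$ in the same way as $p_{00|D_l}$ above and dividing, we therefore obtain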

$$\begin{aligned} \frac{S_pp_{00|D_l}}{S_ep_{01|D_l}}>\frac{S_p\mathrm{Pr}\left( {\widetilde{G}}^{(l)}(X_s)=0,{\widetilde{W}}_{G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)}=1\right) }{S_e\mathrm{Pr}\left( {\widetilde{G}}^{(l+1)}(X_s)=0,{\widetilde{W}}_{G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)}=1\right) }. \end{aligned}$$

For independent specimens with prevalences $p_j$, the right-hand side of the above inequality equals

$$\begin{aligned} \begin{array}{lll} &{}\frac{S_p\prod _{\{j:X_j\in G^{(l)}(X_s)\}}(1-p_{j})[1-\prod _{\{j:X_j\in G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)\}}(1-p_{j})]}{S_e\prod _{\{j:X_j \in G^{(l+1)}(X_s)\}}(1-p_{j})[1-\prod _{\{j:X_j \in G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)\}}(1-p_{j})]}\\ &{}=\frac{S_p\prod _{\{j:X_j \in G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)\}}(1-p_{j})[1-\prod _{\{j:X_j\in G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)\}}(1-p_{j})]}{S_e[1-\prod _{\{j:X_j \in G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)\}}(1-p_{j})]}. \end{array} \end{aligned}$$

Write $V_{s,l}=\mathrm{Pr}({\widetilde{W}}_{G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)}=0)$; the last expression above is then $S_pV_{s,l+1}(1-V_{s,l})/(S_e(1-V_{s,l+1})).$ If this quantity exceeds 1 for $l>1$, that is, $S_pV_{s,l+1}(1-V_{s,l})/(S_e(1-V_{s,l+1}))>1$, then $S_pp_{00|D_l}>S_ep_{01|D_l}$ and the conclusion $\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=l+1)>\xi _{1,{\mathcal {A}}_O}(X_{s}|l_s=l)$ follows. We now examine this condition for a homogeneous population. In this case, $V_{s,l}=(1-p)^{k_{l-1}-k_{l}}$ and

$$\begin{aligned} \begin{array}{lll} \frac{S_pp_{00|D_l}}{S_ep_{01|D_l}} &{}>\frac{S_p\prod _{\{j:X_j \in G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)\}}(1-p_{j})[1-\prod _{\{j:X_j\in G^{(l-1)}(X_s)\backslash G^{(l)}(X_s)\}}(1-p_{j})]}{S_e[1-\prod _{\{j:X_j \in G^{(l)}(X_s)\backslash G^{(l+1)}(X_s)\}}(1-p_{j})]}\\ &{}=\frac{S_pV_{s,l+1}[1-V_{s,l}]}{S_e (1-V_{s,l+1})}=\frac{S_pV_{s,l+1}[1-V_{s,l+1}^{\frac{k_{l-1}-k_{l}}{k_{l}-k_{l+1}}}]}{S_e (1-V_{s,l+1})}\\ &{}\ge \frac{S_pV_{s,l+1}[1-V_{s,l+1}^{\lfloor \frac{k_{l-1}-k_{l}}{k_{l}-k_{l+1}}\rfloor }]}{S_e (1-V_{s,l+1})}=\frac{S_pV_{s,l+1}[1-V_{s,l+1}^{\lfloor a_{l} \rfloor }]}{S_e (1-V_{s,l+1})}\\ &{}=\frac{S_p}{S_e } V_{s,l+1}(1+V_{s,l+1}+\cdots +V_{s,l+1}^{\lfloor a_{l} \rfloor -1}), \end{array} \end{aligned}$$

where $a_{l}=(k_{l-1}-k_{l})/(k_{l}-k_{l+1})$ and $k_{l}$ denotes the group size at the $l^{th}$ stage. Suppose each group at stage $l-1$ is split into $m_l$ groups at stage $l$, so that $k_l=k_{l-1}/m_l$; then $a_{l}=(m_l-1)m_{l+1}/(m_{l+1}-1)$. If $m_{l}\ge 3$, then $a_{l}>2$; if $m_{l}=m_{l+1}=2$, then $a_{l}=2$. Hence $a_l\ge 2$ for most hierarchical group testing procedures. In the Halving procedure with $l=L-1$, we have $m_{L-1}=2$ and $m_L=k_{L-1}>2$, so $a_{L-1}<2$; for $l<L-1$, however, $m_l=m_{l+1}=2$ and $a_l=2$. In this case, the condition becomes $(S_p/S_e)V_{s,l+1}(1+V_{s,l+1})>1.$ So the conclusion $\xi _{1,{\mathcal {A}}_H}(X_{s}|l_s=l)<\xi _{1,{\mathcal {A}}_H}(X_{s}|l_s=l+1)$ is established if $V_{s,l+1}=(1-p)^{k_{l+1}}\ge (-1+(1+4S_e/S_p)^{1/2})/2$, where $V_{s,l+1}=(1-p)^{k_l-k_{l+1}}=(1-p)^{k_{l+1}}$ because $k_l=2k_{l+1}$, and the right-hand side is the positive root of $V^2+V-S_e/S_p=0$. Suppose $S_e,S_p \ge 0.8$; then $S_e/S_p\le 1/0.8$ and $(-1+(1+4S_e/S_p)^{1/2})/2\le 0.73.$ Since $k_{l+1}\le k_{2}$, the condition $(1-p)^{k_{2}}\ge 0.75$ is sufficient.
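The quantities in this paragraph are easy to tabulate. The sketch below, for a homogeneous population, computes $a_l$ from the group sizes, compares $S_pV_{s,l+1}(1-V_{s,l})/(S_e(1-V_{s,l+1}))$ with its geometric lower bound, and evaluates the threshold $(-1+(1+4S_e/S_p)^{1/2})/2$. The design `k = [16, 8, 4, 1]` (halve twice, then test individually) and all function names are hypothetical illustrations, not taken from the paper.

```python
from math import floor, sqrt


def a_ratio(k, l):
    """a_l = (k_{l-1} - k_l) / (k_l - k_{l+1}); k[0] holds the stage-1 size k_1."""
    return (k[l - 2] - k[l - 1]) / (k[l - 1] - k[l])


def exact_term(p, k, l, se, sp):
    """S_p V_{s,l+1} (1 - V_{s,l}) / (S_e (1 - V_{s,l+1})), homogeneous prevalence p."""
    v_l = (1 - p) ** (k[l - 2] - k[l - 1])
    v_next = (1 - p) ** (k[l - 1] - k[l])
    return sp * v_next * (1 - v_l) / (se * (1 - v_next))


def geometric_bound(p, k, l, se, sp):
    """(S_p/S_e) V (1 + V + ... + V^(floor(a_l) - 1)) with V = V_{s,l+1}."""
    v = (1 - p) ** (k[l - 1] - k[l])
    return sp / se * v * sum(v ** i for i in range(floor(a_ratio(k, l))))


def halving_threshold(se, sp):
    """Positive root of V^2 + V - S_e/S_p = 0 (the case floor(a_l) = 2)."""
    return (-1 + sqrt(1 + 4 * se / sp)) / 2


k = [16, 8, 4, 1]   # hypothetical design: halve twice, then test individually
print(a_ratio(k, 2), a_ratio(k, 3))      # a_2 = 2, a_3 = 4/3 < 2
print(exact_term(0.01, k, 2, 0.9, 0.9), geometric_bound(0.01, k, 2, 0.9, 0.9))
print(halving_threshold(1.0, 0.8))       # worst case for S_e, S_p >= 0.8
```

With `p = 0.01` and `l = 2`, $a_2=2$ exactly, so the exact term and the geometric bound coincide up to rounding; the last line prints about 0.7247, just below the 0.73 used in the text.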

1.4 D. The Proof of Theorem 2.3

The proof of Theorem 2.3 relies on the following two lemmas. As defined earlier, $B_s(k)$ denotes the group of size $k$ containing $X_{s}$ in which $X_s$ is retested, and $\xi _{1,{\mathcal {A}}_N}(X_{s}; k)$ denotes the false-negative predictive value of $X_s$ after it has been retested in this group. It is given by

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_N}(X_{s}; k)&{}=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0)\\ &{}=\frac{(1-S_e)^2S^{l_s-1}_e\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0)}. \end{array} \end{aligned}$$

Lemma 5.1

Let $h(k)=\xi _{1,{\mathcal {A}}_N}(X_{s}; k)$, regarded as a function of the group size $k$ used to retest the specimen $X_s$. Then $h(k)$ is monotonically increasing in $k$.

Proof

Since the numerator of $\xi _{1,{\mathcal {A}}_N}(X_{s}; k)$ does not depend on $k$, it suffices to show that its denominator, $\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0)$, is decreasing in $k$. Add a specimen ${\mathcal {I}}^{(l_s)}_{*}$ to the group $B_{s}(k)$, forming a new group $B_{s}(k+1)$ of size $k+1$. Then

$$\begin{aligned}&\mathrm{Pr}\left( G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k+1)=0\right) \\&=\sum _{i,j \in \{0,1\}} \mathrm{Pr}({\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j)\\&\times \mathrm{Pr}\left( G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k+1)=0|{\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j\right) \\&=\sum _{i,j \in \{0,1\}} \mathrm{Pr}\left( G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|{\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j\right) \\&\times \mathrm{Pr}(B_{s}(k+1)=0|{\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j) \mathrm{Pr}({\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j)\\&=S_p A_{00} p_{00}+(1-S_e) A_{01} p_{01}+(1-S_e) (A_{10} p_{10}+A_{11} p_{11}), \end{aligned}$$

where $A_{ij}=\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|{\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j)$ and $p_{ij}=\mathrm{Pr}({\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j).$ Similarly, we have

$$\begin{aligned}&\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0)\\&\quad =\sum _{i,j \in \{0,1\}} \mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|{\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j)\\&\qquad \times \, \mathrm{Pr}(B_{s}(k)=0|{\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j) \mathrm{Pr}({\widetilde{B}}_{s}(k)=i,\widetilde{{\mathcal {I}}}^{(l_s)}_{*}=j)\\&\quad =S_p A_{00} p_{00}+S_p A_{01} p_{01}+(1-S_e)(A_{10} p_{10}+A_{11} p_{11}). \end{aligned}$$

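Subtracting the first expression from the second gives $(S_e+S_p-1)A_{01}p_{01}$, which is positive whenever $S_e+S_p>1$, in particular when $S_p>0.5$ and $S_e>0.5$. In that case,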

$$\begin{aligned}&\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k+1)=0)\\&<\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0). \end{aligned}$$

Thus, the function

$$\begin{aligned} \begin{aligned} \xi _{1,{\mathcal {A}}_N}(X_{s}; k)=\frac{(1-S_e)^2S^{l_s-1}_e\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s,B_{s}(k)=0)} \end{aligned} \end{aligned}$$

is an increasing function with respect to the retest group size k.
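Lemma 5.1 can be illustrated in a simple setting. The sketch below is a hypothetical Dorfman-type example with $l_s=1$: $X_s$ belongs to a stage-1 pool of size $n$ that tests negative, and $B_s(k)$ consists of $X_s$ and $k-1$ other members of that pool. Statuses are assumed i.i.d. Bernoulli($p$), with independent test errors and no dilution effect. It computes $\xi _{1,{\mathcal {A}}_N}(X_{s}; k)$ exactly for $k=1,\ldots ,n$ by splitting on the true statuses of the two pools, and compares it with the individual-testing value $(1-S_e)p/((1-S_e)p+S_p(1-p))$. All names and parameter values are illustrative.

```python
def xi_nested(k, n, p, se, sp):
    """Pr(X_s truly positive | stage-1 pool of size n reads -, retest pool of size k reads -).

    Hypothetical set-up: the retest pool B_s(k) holds X_s and k-1 other
    members of its stage-1 pool; statuses are i.i.d. Bernoulli(p).
    """
    u = (1 - p) ** (k - 1)   # the other k-1 retest-pool members are all negative
    v = (1 - p) ** (n - k)   # the remaining n-k stage-1 members are all negative
    fn = 1 - se              # a truly positive pool reads negative
    # X_s positive: both pools truly positive and both read negative.
    pos = p * fn * fn
    # X_s negative, split by the true status of the two pools.
    neg = (1 - p) * (u * v * sp * sp            # both pools truly negative
                     + u * (1 - v) * sp * fn    # only the stage-1 pool positive
                     + (1 - u) * fn * fn)       # both pools truly positive
    return pos / (pos + neg)


def xi_individual(p, se, sp):
    """False-negative predictive value of individual testing."""
    return p * (1 - se) / (p * (1 - se) + (1 - p) * sp)


n, p, se, sp = 10, 0.02, 0.95, 0.98
values = [xi_nested(k, n, p, se, sp) for k in range(1, n + 1)]
print([f"{v:.2e}" for v in values])
print(f"individual testing: {xi_individual(p, se, sp):.2e}")
```

In this set-up the denominator changes with $k$ only through the term $(1-p)^{k}(1-S_e)(S_e+S_p-1)$, up to constants, which mirrors the difference $(S_e+S_p-1)A_{01}p_{01}$ in the proof.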

Lemma 5.2

Suppose a specimen $X_s$ is declared negative at stage $l_s$ by the group testing procedure and, if $l_s>1$, suppose $(1-S_e)/S_p\le \mathrm{Pr}({\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=1)$. Then retesting $X_s$ individually yields a false-negative predictive value no larger than that of individual testing, $\xi _{1,{\mathcal {A}}_N}(X_{s}; 1)\le \xi _{1,{\mathcal {A}}_I}(X_{s}).$

Proof

For $l_s=1$, we have

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_N}(X_{s}; k|l_s=1)&{}=\mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|G^{(1)}(X_s)=0,B_{s}(k)=0\right) \\ &{}=\frac{\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1,B_{s}(k)=0)}{\mathrm{Pr}(B_{s}(k)=0)} \frac{1-S_e}{\mathrm{Pr}(G^{(1)}(X_s)=0|B_{s}(k)=0)}\\ &{}<\frac{\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1,B_{s}(k)=0)}{\mathrm{Pr}(B_{s}(k)=0)}. \end{array} \end{aligned}$$

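The inequality holds because $\mathrm{Pr}(G^{(1)}(X_s)=0|B_{s}(k)=0)$ is a weighted average of $S_p$ and $1-S_e$, and $S_p>1-S_e$ when $S_e+S_p>1$. If the retest group size is $k=1$, then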

$$\begin{aligned} \xi _{1,{\mathcal {A}}_N}\left( X_{s}; k=1|l_s=1\right) \le \mathrm{Pr}\left( \widetilde{{\mathcal {I}}}_{s}=1|B_{s}(1)=0\right) =\xi _{1,{\mathcal {A}}_I}(X_{s}). \end{aligned}$$

For $l_s>1$, we have

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_N}(X_{s}; k) &{}=&{}\frac{\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1,B_{s}(k)=0)}{\mathrm{Pr}(B_{s}(k)=0)} \frac{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|\widetilde{{\mathcal {I}}}_{s}=1,B_{s}(k)=0)}{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|B_{s}(k)=0)}\\ &{}=&{}\frac{\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1,B_{s}(k)=0)}{\mathrm{Pr}(B_{s}(k)=0)} \frac{(1-S_e)S^{l_s-1}_e}{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|B_{s}(k)=0)}. \end{array} \end{aligned}$$

So, $\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|B_{s}(k)=0)-\xi _{1,{\mathcal {A}}_N}(X_{s}; k)\ge 0$ is equivalent to

$$\begin{aligned} \frac{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|B_{s}(k)=0)}{(1-S_e)S^{l_s-1}_e}-1\ge 0. \end{aligned}$$

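Setting $k=1$ and dropping the nonnegative contribution of the event $\{{\widetilde{G}}^{(l_s)}(X_s)=0,{\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=0\}$, we obtain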

$$\begin{aligned}&\frac{\mathrm{Pr}(G^{(l_s)}(X_s)=0,G^{(j)}(X_s)=1,j<l_s|B_{s}(1)=0)}{(1-S_e)S^{l_s-1}_e}-1\\&>\mathrm{Pr}({\widetilde{G}}^{(l_s)}(X_s)=1|B_{s}(1)=0)-1\\&\quad +\frac{S_p}{1-S_e}\mathrm{Pr}({\widetilde{G}}^{(l_s)}(X_s)=0,{\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=1|B_{s}(1)=0)\\&=\mathrm{Pr}({\widetilde{G}}^{(l_s)}(X_s)=0|B_{s}(1)=0)\left[ \frac{S_p}{1-S_e}\mathrm{Pr}({\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=1)-1 \right] . \end{aligned}$$

The last equality uses the independence of ${\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}$ from ${\widetilde{G}}^{(l_s)}(X_s)$ and $B_{s}(1)$. Therefore, $\xi _{1,{\mathcal {A}}_N}(X_{s}; 1)<\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1|B_{s}(1)=0)=\xi _{1,{\mathcal {A}}_I}(X_{s})$ if $(1-S_e)/S_p\le \mathrm{Pr}({\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=1)$ for $l_s>1.$ For a homogeneous population, the probability $\mathrm{Pr}({\widetilde{W}}_{G^{(l_s-1)}(X_s)\backslash G^{(l_s)}(X_s)}=1)$ equals $1-(1-p)^{k_{l_s-1}-k_{l_s}}$, so the condition in Lemma 5.2 reduces to $(1-S_e)/S_p\le 1-(1-p)^{k_{l_s-1}-k_{l_s}}.$
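For a homogeneous population, the condition of Lemma 5.2 translates into a minimum size difference $k_{l_s-1}-k_{l_s}$ between consecutive stages, namely the smallest integer $g\ge 1$ with $(1-S_e)/S_p\le 1-(1-p)^{g}$. A minimal sketch (the function name `min_gap` and the parameter values are hypothetical):

```python
def min_gap(se, sp, p, max_gap=10_000):
    """Smallest integer gap k_{l_s-1} - k_{l_s} >= 1 meeting the Lemma 5.2 condition.

    Condition (homogeneous prevalence p): (1 - S_e)/S_p <= 1 - (1 - p)**gap.
    Returns None if no gap up to max_gap works.
    """
    target = (1 - se) / sp
    for gap in range(1, max_gap + 1):
        if 1 - (1 - p) ** gap >= target:
            return gap
    return None


print(min_gap(0.90, 0.90, 0.01))  # 12
print(min_gap(0.99, 0.99, 0.05))  # 1
```

A less accurate assay or a rarer disease requires a larger gap: with $S_e=S_p=0.9$ and $p=0.01$ the gap must be at least 12, whereas with $S_e=S_p=0.99$ and $p=0.05$ any gap works.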

By Lemma 5.2, $\xi _{1,{\mathcal {A}}_N}(X_{s}; 1)\le \xi _{1,{\mathcal {A}}_I}(X_{s})$, and by Lemma 5.1, $\xi _{1,{\mathcal {A}}_N}(X_{s}; k)$ is increasing in $k$. Therefore, there exists a retest group size $k^{(l_s)}_{s}$ satisfying Theorem 2.3, which completes the proof of Theorem 2.3.

1.5 E. The Proof of Theorem 3.1

As shown in Fig. 3a, specimens are classified into four types. The false-negative predictive value of $S_1$ is the same as $\xi _{1,{\mathcal {A}}_D}(X_{s}|l_s=1)$ or $\xi _{1,{\mathcal {A}}_H}(X_{s}|l_s=1),$ so we obtain that $\xi _{1,{\mathcal {A}}_S}(X_{s_1})>\xi _{1,{\mathcal {A}}_I}(X_{s_1})$ for $s_1\in S_1.$

For $s_2\in S_{2^{(1)}},$ note that the event $\{\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\}$ shares no specimen with $\{\widetilde{{\mathcal {I}}}_{s_2}=1\}$, so the two are independent. Then we have

$$\begin{aligned} \xi _{1,{\mathcal {A}}_S}(X_{s_2})&=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_2}=1|G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\\&=\mathrm{Pr}(G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0|\widetilde{{\mathcal {I}}}_{s_2}=1,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\\&\quad \times \frac{\mathrm{Pr}(\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1|\widetilde{{\mathcal {I}}}_{s_2}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_2}=1)}{\mathrm{Pr}(G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}\\&=\frac{S_e(1-S_e)\mathrm{Pr}(\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_2}=1)}{\mathrm{Pr}(G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}. \end{aligned}$$

To verify that it is larger than $ \xi _{1,{\mathcal {A}}_I}(X_{s_2}) = \frac{(1-S_e)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_2}=1)}{\mathrm{Pr}({\mathcal {I}}_{s_2}=0)}$, we only need to verify

$$\begin{aligned} \begin{array}{lll} \frac{S_e\cdot \mathrm{Pr}\left( \sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\right) }{\mathrm{Pr}\left( G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\right) }\ge \frac{1}{\mathrm{Pr}({\mathcal {I}}_{s_2}=0)}, \end{array} \end{aligned}$$

which is equivalent to

$$\begin{aligned}&\mathrm{Pr}\left( G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0|\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\right) \\&\le S_e\cdot \mathrm{Pr}\left( {\mathcal {I}}_{s_2}=0\right) . \end{aligned}\tag{5.1}$$

Since

$$\begin{aligned}&\mathrm{Pr}(G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\\&=\mathrm{Pr}({\mathcal {I}}_{s_2}=0)\mathrm{Pr}(\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\times \mathrm{Pr}\left( G^{(1)}(X_{s_2})=1|{\mathcal {I}}_{s_2}=0,\right. \\&\quad \left. \sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\right) , \end{aligned}$$

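we can rewrite (5.1) as $\mathrm{Pr}(G^{(1)}(X_{s_2})=1|{\mathcal {I}}_{s_2}=0,\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\le S_e.$ This inequality holds because, given the true statuses, the pooled test is independent of the individual tests, so the left-hand side equals $S_e\mathrm{Pr}({\widetilde{G}}^{(1)}(X_{s_2})=1|\cdot )+(1-S_p)\mathrm{Pr}({\widetilde{G}}^{(1)}(X_{s_2})=0|\cdot )$, a weighted average of $S_e$ and $1-S_p$, and $1-S_p\le S_e$ whenever $S_e+S_p\ge 1$.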

Next, consider $s_3 \in S_{2^{(2)}}.$ The event $\{\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\}$ shares no specimen with $G^{(2)}(X_{s_3})$, so the two are independent. Then we have

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_S}(X_{s_3})&{}=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_3}=1|G^{(1)}(X_{s_3})=1,G^{(2)}(X_{s_3})=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\\ &{}=\frac{S_e(1-S_e)\mathrm{Pr}(\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_3}=1)}{\mathrm{Pr}(G^{(1)}(X_{s_3})=1,G^{(2)}(X_{s_3})=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}. \end{array} \end{aligned}$$

To verify that $\xi _{1,{\mathcal {A}}_S}(X_{s_3})$ is larger than $\xi _{1,{\mathcal {A}}_D}(X_{s}|l_s=1)=\frac{(1-S_e)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s}=1)}{\mathrm{Pr}(G^{(1)}(X_{s})=0)},$ we need to verify that

$$\begin{aligned} \begin{array}{lll} \frac{S_e \cdot \mathrm{Pr}(\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}{\mathrm{Pr}(G^{(1)}(X_{s_3})=1,G^{(2)}(X_{s_3})=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}\ge \frac{1}{\mathrm{Pr}(G^{(2)}(X_{s_3})=0)}, \end{array} \end{aligned}$$

which is equivalent to

$$\begin{aligned} \mathrm{Pr}(G^{(1)}(X_{s_3})=1,G^{(2)}(X_{s_3})=0|\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\le S_e\cdot \mathrm{Pr}(G^{(2)}(X_{s_3})=0). \end{aligned}\tag{5.2}$$

Since $\{\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\}$ and $\{G^{(2)}(X_{s_3})\}$ share no specimen, we can rewrite (5.2) as

$$\begin{aligned} \begin{array}{lll} \mathrm{Pr}(G^{(1)}(X_{s_3})=1|G^{(2)}(X_{s_3})=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\le S_e. \end{array} \end{aligned}$$

This inequality also holds, since the probability that a pooled test reads positive never exceeds $\max (S_e,1-S_p)=S_e$ when $S_e+S_p\ge 1$. Finally, consider $s_4\in S_3.$ The event $\{\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\}$ shares no specimen with $\{G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0\},$ so they are independent. Then we have

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_S}(X_{s_4})&{}=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_4}=1|G^{(1)}(X_{s_4})=1,G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0,\\ &{}\quad \sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\\ &{}=\frac{S^2_e(1-S_e)\mathrm{Pr}(\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_4}=1)}{\mathrm{Pr}(G^{(1)}(X_{s_4})=1,G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}. \end{array} \end{aligned}$$

We next show that $\xi _{1,{\mathcal {A}}_S}(X_{s_4})$ is larger than $ \xi _{1,{\mathcal {A}}_H}(X_{s_4}|l_s=2) =\frac{(1-S_e)S_e \cdot \mathrm{Pr}(\widetilde{{\mathcal {I}}}_{s_4}=1)}{\mathrm{Pr}(G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0)};$ it suffices to prove

$$\begin{aligned}&\frac{S_e\cdot \mathrm{Pr}(\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}{\mathrm{Pr}(G^{(1)}(X_{s_4})=1,G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)}\\&\quad \ge \frac{1}{\mathrm{Pr}(G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0)}, \end{aligned}\tag{5.3}$$

which is equivalent to

$$\begin{aligned}&\mathrm{Pr}\left( G^{(1)}(X_{s_4})=1,G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0|\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\right) \nonumber \\&\quad \le S_e\cdot \mathrm{Pr}(G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0). \end{aligned}$$

Since $\{\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\}$ and $\{G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0\}$ share no specimen, we can rewrite (5.3) as

$$\begin{aligned} \begin{array}{lll} \mathrm{Pr}(G^{(1)}(X_{s_4})=1|G^{(2)}(X_{s_4})=1,{\mathcal {I}}_{s_4}=0,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1)\le S_e. \end{array} \end{aligned}$$

The above inequality holds as well, so the false-negative predictive value of the Sterrett procedure is larger than that of individual testing. We now turn to the true-positive predictive value. Suppose the first group in stage 1 tests positive and that, in stage 2, the $d^{th}$ specimen is the first to test positive individually, after which the remaining specimens, tested as a group, also test positive. Denote these two sets by $T_1=\{d:{\mathcal {I}}_{d}=1\}$ and $T_2=\{j:d<j\le k\}$, respectively. The true-positive predictive values are

$$\begin{aligned} \eta _{2,{\mathcal {A}}_S}(X_{t_1})&=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_1}=1|G^{(1)}(X_{t_1})=1,{\mathcal {I}}_{d}=1,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0),\quad t_1\in T_1,\\ \eta _{2,{\mathcal {A}}_S}(X_{t_2})&=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_2}=1|G^{(1)}(X_{t_2})=1,{\mathcal {I}}_{d}=1,\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0,G^{(2)}(X_{t_2})=1,{\mathcal {I}}_{t_2}=1),\quad t_2 \in T_2. \end{aligned}$$

We give the proof for $\eta _{2,{\mathcal {A}}_S}(X_{t_1})$; the proof for $\eta _{2,{\mathcal {A}}_S}(X_{t_2})$ is similar and is omitted. For $t_1\in T_1$, let $A=\{G^{(1)}(X_{t_1})=1,{\mathcal {I}}_{d}=1\}$ and $B=\{\sum _{j=1}^{d-1}{\mathcal {I}}_{j}=0\}.$ Since $B$ involves no specimen in common with $X_{t_1}$, it is independent of $\widetilde{{\mathcal {I}}}_{t_1}$. So we have

$$\begin{aligned} \begin{array}{lll} \eta _{2,{\mathcal {A}}_S}(X_{t_1})&{}=\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_1}=1|A,B)=\frac{\mathrm{Pr}(A|\widetilde{{\mathcal {I}}}_{t_1}=1,B)\mathrm{Pr}(B|\widetilde{{\mathcal {I}}}_{t_1}=1)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_1}=1)}{\mathrm{Pr}(A,B)}\\ &{}=\frac{S^2_e\mathrm{Pr}(B)\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_1}=1)}{\mathrm{Pr}(A,B)}. \end{array} \end{aligned}$$

To verify that it is larger than the true-positive predictive value of individual testing, $ \eta _{2,{\mathcal {A}}_I}(X_{t_1}) = \mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_1}=1|{\mathcal {I}}_{t_1}=1)=\frac{S_e\mathrm{Pr}(\widetilde{{\mathcal {I}}}_{t_1}=1)}{\mathrm{Pr}({\mathcal {I}}_{t_1}=1)}$, we need to show

$$\begin{aligned} \begin{array}{lll} \frac{S_e\mathrm{Pr}(B)}{\mathrm{Pr}(A,B)}\ge \frac{1}{\mathrm{Pr}({\mathcal {I}}_{t_1}=1)}, ~\text {which is}~ \mathrm{Pr}(A|B)\le S_e\mathrm{Pr}({\mathcal {I}}_{t_1}=1). \end{array} \end{aligned}$$

This inequality holds because ${\mathcal {I}}_{d}$ is independent of $B$, $t_1=d$, and the probability that the pooled test reads positive never exceeds $\max (S_e,1-S_p)=S_e$ when $S_e+S_p\ge 1$:

$$\begin{aligned} \mathrm{Pr}(A|B)&=\mathrm{Pr}(G^{(1)}(X_{t_1})=1,{\mathcal {I}}_{d}=1|B)\\&=\mathrm{Pr}(G^{(1)}(X_{t_1})=1|{\mathcal {I}}_{d}=1,B)\,\mathrm{Pr}({\mathcal {I}}_{d}=1)\\&\le S_e\mathrm{Pr}({\mathcal {I}}_{d}=1). \end{aligned}$$
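Both comparisons in this proof can be checked by brute force for a small one-step Sterrett group. The sketch below assumes independent specimens with hypothetical prevalences, independent test errors, and no dilution effect; it enumerates all true-status vectors and conditions only on the events used in the proof (stage-1 group positive, specimens before the $d^{th}$ negative, the $d^{th}$ positive), ignoring later tests. It returns the false-negative predictive value of an individually negative specimen (type $S_{2^{(1)}}$) and the true-positive predictive value of the $d^{th}$ specimen (type $T_1$), together with their individual-testing counterparts. Specimens are 0-indexed in testing order, so `d = 2` means the third specimen is the first to test positive; all names are illustrative.

```python
from itertools import product
from math import prod


def reads(result, truly_positive, se, sp):
    """Pr(a test returns `result`, 1 = positive); no dilution effect assumed."""
    if truly_positive:
        return se if result == 1 else 1 - se
    return 1 - sp if result == 1 else sp


def sterrett_posterior(prev, d, target, se, sp):
    """Pr(specimen `target` truly positive | observed one-step Sterrett path).

    Path: the stage-1 pool reads positive, specimens 0..d-1 read negative
    individually, and specimen d reads positive. Later tests are ignored.
    """
    num = den = 0.0
    for status in product((0, 1), repeat=len(prev)):
        weight = prod(q if z else 1 - q for z, q in zip(status, prev))
        like = reads(1, any(status), se, sp)       # stage-1 pool positive
        for j in range(d):
            like *= reads(0, status[j], se, sp)    # earlier specimens negative
        like *= reads(1, status[d], se, sp)        # specimen d positive
        den += weight * like
        if status[target]:
            num += weight * like
    return num / den


prev = [0.02, 0.05, 0.03, 0.01, 0.04]   # hypothetical prevalences, testing order
se, sp, d = 0.95, 0.98, 2

xi_s = sterrett_posterior(prev, d, 0, se, sp)    # FNPV, type S_{2^(1)}
eta_s = sterrett_posterior(prev, d, d, se, sp)   # TPPV, type T_1
xi_i = (1 - se) * prev[0] / ((1 - se) * prev[0] + sp * (1 - prev[0]))
eta_i = se * prev[d] / (se * prev[d] + (1 - sp) * (1 - prev[d]))
print(f"FNPV: Sterrett {xi_s:.6f} vs individual {xi_i:.6f}")
print(f"TPPV: Sterrett {eta_s:.6f} vs individual {eta_i:.6f}")
```

In both cases the Sterrett value equals the individual-testing value multiplied by $S_e/\mathrm{Pr}(G^{(1)}=1|\cdot )$, which is at least 1 by the weighted-average argument.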

1.6 F. The Proof of Theorem 3.2

For $S_{2^{(1)}},$ let $r_2=r_1+1,$ and denote the events $A_1=\{B_{s_2}(k_{r_1})=0\},$ $A_2=\{B_{s_2}(k_{r_2})=0\},$ $B=\{G^{(1)}(X_{s_2})=1,{\mathcal {I}}_{s_2}=0\}$ and $C=\{\sum _{j=1,j\ne s_2}^{d-1}{\mathcal {I}}_{j}=0,{\mathcal {I}}_{d}=1\}.$ Since $C$ shares no specimen with $X_{s_2}$, it is independent of $\widetilde{{\mathcal {I}}}_{s_2}$. Then we have

$$\begin{aligned} \begin{array}{lll} \xi _{1,{\mathcal {A}}_S}(X_{s_2};k_{i})&{}=P\left( \widetilde{{\mathcal {I}}}_{s_2}=1|A_i,B,C\right) \\ &{}=\frac{P\left( A_i,B|\widetilde{{\mathcal {I}}}_{s_2}=1,C\right) P(C|\widetilde{{\mathcal {I}}}_{s_2}=1)P(\widetilde{{\mathcal {I}}}_{s_2}=1)}{P(A_i,B,C)}\\ &{}=\frac{S_e(1-S_e)^2P(C)P(\widetilde{{\mathcal {I}}}_{s_2}=1)}{P(A_i,B,C)}. \end{array} \end{aligned}$$

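Moreover, with nested retest groups $B_{s_2}(k_{r_1})\subset B_{s_2}(k_{r_2})$, the event ${\widetilde{B}}_{s_2}(k_{r_2})=0$ forces ${\widetilde{B}}_{s_2}(k_{r_1})=0$, so that

$$\begin{aligned} \begin{array}{lll} &{}P\left( B_{s_2}(k_{r_1})=0,B,C\right) - P\left( B_{s_2}(k_{r_2})=0,B,C\right) \\ &{} =\left[ P(B_{ s_2}(k_{r_1})=0|{\widetilde{B}}_{s_2}(k_{r_2})=1,B,C)-(1-S_e)\right] P\left( {\widetilde{B}}_{s_2}(k_{r_2})=1,B,C\right) > 0, \end{array} \end{aligned}$$

where the bracket is positive because, given ${\widetilde{B}}_{s_2}(k_{r_2})=1$, the probability that $B_{s_2}(k_{r_1})$ reads negative is a weighted average of $S_p$ and $1-S_e$ with positive weight on $S_p$, and $S_p>1-S_e$. Since the numerator of $\xi _{1,{\mathcal {A}}_S}(X_{s_2};k_{i})$ does not depend on the retest group, this proves $\xi _{1,{\mathcal {A}}_S}(X_{s_2};k_{r_2})\ge \xi _{1,{\mathcal {A}}_S}(X_{s_2};k_{r_1})$ for any positive integer $r_1.$ The other types, $S_{2^{(2)}}$ and $S_3$, can be handled in the same way. $\square $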
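A brute-force illustration of Theorem 3.2 for type $S_{2^{(1)}}$: a specimen that tested negative individually before the first positive is retested in a pool made of itself and $k-1$ additional specimens, so that the retest pools are nested in $k$. The sketch assumes, for illustration only, independent specimens with made-up prevalences, extra retest specimens independent of the stage-1 group, independent test errors, and no dilution effect. The posterior probability that the specimen is truly positive, given all of these results, should increase with $k$. Function names are hypothetical.

```python
from itertools import product
from math import prod


def reads(result, truly_positive, se, sp):
    """Pr(a test returns `result`, 1 = positive); no dilution effect assumed."""
    if truly_positive:
        return se if result == 1 else 1 - se
    return 1 - sp if result == 1 else sp


def retest_posterior(prev, extras, k, d, se, sp):
    """Pr(specimen 0 truly positive | Sterrett path, retest pool of size k negative).

    Path: stage-1 pool positive, specimens 0..d-1 negative, specimen d positive.
    Hypothetical retest pool: specimen 0 plus the first k-1 entries of `extras`,
    which are independent of the stage-1 pool.
    """
    probs = list(prev) + list(extras)
    n = len(prev)
    num = den = 0.0
    for status in product((0, 1), repeat=len(probs)):
        weight = prod(q if z else 1 - q for z, q in zip(status, probs))
        like = reads(1, any(status[:n]), se, sp)  # stage-1 pool positive
        for j in range(d):
            like *= reads(0, status[j], se, sp)   # earlier specimens negative
        like *= reads(1, status[d], se, sp)       # specimen d positive
        pool_positive = status[0] == 1 or any(status[n:n + k - 1])
        like *= reads(0, pool_positive, se, sp)   # retest pool reads negative
        den += weight * like
        if status[0]:
            num += weight * like
    return num / den


prev = [0.02, 0.05, 0.03, 0.01, 0.04]   # hypothetical stage-1 prevalences
extras = [0.03, 0.03, 0.03]             # hypothetical extra retest specimens
values = [retest_posterior(prev, extras, k, 2, 0.95, 0.98) for k in range(1, 5)]
print([round(v, 8) for v in values])
```

The numerator is the same for every $k$, so the increase comes entirely from the shrinking probability that a larger nested pool reads negative.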


About this article

Cite this article

Xiong, W., Ding, J., Zhang, W. et al. Nested Group Testing Procedure. Commun. Math. Stat. 11, 663–693 (2023). https://doi.org/10.1007/s40304-021-00269-0


Received: 09 May 2021
Revised: 27 June 2021
Accepted: 13 October 2021
Published: 01 October 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s40304-021-00269-0
