11.1 Introduction

When testing for a disease such as COVID-19, the standard method is individual testing: we take a sample from each individual and test these samples separately. Under the convenient mathematical model of perfect testing, a sample from an infected individual always gives a positive result, while a sample from a noninfected individual always gives a negative result. For N individuals, this requires N tests, and we can accurately classify all the individuals as infected or noninfected. The infected individuals can be advised to self-isolate and their contacts can be traced, while the noninfected individuals are reassured that they are free of the disease.

An alternative to individual testing is pooled testing, also called group testing. Instead of testing individual samples, we pool samples together and test the pooled sample. Again under the convenient model of perfect testing, a pool consisting entirely of noninfected samples gives a negative result, while a pool containing one or more infected samples gives a positive result. Thus a negative result demonstrates that every individual in the pool is noninfected, while a positive result requires further information to work out which individuals in the pool are infected.

As we shall see in this chapter, when the prevalence of a disease is low enough and the accuracy of a test is high enough, pooled testing can accurately classify individuals as infected or noninfected in fewer than N tests. This can be more efficient—and often much more efficient—than individual testing.

This chapter is structured as follows. In the remainder of this section, we introduce background material. In Sect. 11.2, we analyse some algorithms for pooled testing under an idealised model of perfect tests. In Sect. 11.3, we adapt this analysis to more realistic models of testing with errors. In Sect. 11.4, we discuss some practical issues with the application of pooled testing for COVID-19. In Sect. 11.5, we survey some uses of pooled testing during the pandemic so far. In Sect. 11.6, we conclude, and give some of our own views on potential applications of pooled testing for COVID-19.

11.1.1 Testing for COVID-19

As well as discussing the general theory of pooled testing, much of this chapter concerns applications of pooled testing in the COVID-19 pandemic, so we proceed to give some background on the existing tests for detecting current SARS-CoV-2 infection.

In the real world, testing is not perfect. We distinguish between two types of test errors:

  • False positive test errors, where a sample (individual or pool) that does not contain any infection wrongly gives a positive result. The probability that an infection-free sample correctly gives a negative result is called the specificity.

  • False negative test errors, where a sample (individual or pool) that does contain infection wrongly gives a negative result. The probability that an infected sample correctly gives a positive result is called the sensitivity.

The most commonly used test for current SARS-CoV-2 infection is the RT-PCR test (reverse transcription polymerase chain reaction test, or just ‘PCR test’ for short). A PCR test for SARS-CoV-2 infection typically works as follows. First, a swab is taken from the nose or upper throat of the individual to be tested. The swab is then sent to a laboratory, where material from the swab is analysed to find out whether it contains genetic material from the SARS-CoV-2 virus. We refer the reader to the previous chapter of this book, by Dunbar and Tang, for more details. The process typically takes from four to six hours from the receipt of the swab until the output of the result, depending on the laboratory (Mahase 2020).

The PCR test is very highly specific, with estimates of its specificity ranging from 97.4 to 99.98% (Skitrall et al. 2021), meaning that false positive test errors are extremely rare. (The tests used in the UK’s ONS Coronavirus Infection Survey, for example, have a specificity of more than 99.92%; see Office for National Statistics (2020).) On the other hand, the PCR test is only moderately sensitive, with sensitivity in the range 70–\(90\%\) being typical (Böger et al. 2021; Woloshin et al. 2020). The sensitivity depends on the laboratory protocol being used, and can be affected by shortages of reagent or improper procedures. Another significant source of insensitivity is improperly taken swabs; this can depend on the level of training of the person taking the swabs, so sensitivity can be lower in community settings than in healthcare settings (Watson and Whiting 2020). Sensitivity for a given individual also depends on the viral load and on how long after illness onset the swab was taken. The mathematical lesson from all this is that a negative test does not definitively rule out the individual being infected.
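To make the last point concrete, the following sketch applies Bayes' rule to a single negative result. The function name and the numerical inputs are illustrative assumptions: the sensitivity (80%) and specificity (99.9%) are picked from within the ranges quoted above, and the 5% prior probability of infection is purely hypothetical.

```python
def prob_infected_given_negative(prior, sensitivity, specificity):
    """Posterior probability of infection after one negative test (Bayes' rule)."""
    false_negative = prior * (1.0 - sensitivity)   # infected, but the test is negative
    true_negative = (1.0 - prior) * specificity    # noninfected, and the test is negative
    return false_negative / (false_negative + true_negative)


# Hypothetical inputs: 5% prior, 80% sensitivity, 99.9% specificity.
posterior = prob_infected_given_negative(0.05, 0.80, 0.999)
print(f"P(infected | negative test) = {posterior:.4f}")
```

With these assumed inputs the posterior is roughly 1%: much lower than the 5% prior, but not zero.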

Another test for SARS-CoV-2 infection is the RT-LAMP (reverse transcription loop-mediated isothermal amplification) test. Pooled testing can certainly be used with RT-LAMP tests; however, they are not yet widely available, so we will focus our attention here on PCR tests when discussing COVID-related applications of pooled testing.

A third test is the lateral flow test, which at the time of writing is being used in the UK for mass-testing in certain areas (such as areas where there is high prevalence, or where certain new variants of SARS-CoV-2 have been detected), and for the regular screening of secondary-school pupils (with pupils being asked to self-test twice per week, at home, from the week beginning 15 March 2021). Lateral flow testing kits are cheap, easily portable, require little training to use, and produce a result in around 30 min; on the flip side, they have much lower sensitivity than PCR tests. For example, in a large pilot study in the city of Liverpool, the sensitivity of community-based lateral flow testing was estimated at 48.9% (95% CI: 33.7–64.2%) (García-Fiñana et al. 2020). We believe that pooled testing is unlikely to be compatible with the lateral flow testing programme in the UK, in view of the low sensitivity of the test, the level of training of those administering the tests, and the premium placed on rapid turnaround time.

Much of our analysis in this chapter is applicable to testing for other pandemic diseases, such as pandemic influenza, sometimes with adjustments to the assumed sensitivities and specificities of the tests being used. In particular, the mathematical models used are independent of the disease in question, though the assumptions may be more or less appropriate in the case of other diseases. For example, if very accurate and rapid tests are available for a certain pandemic disease, then pooled testing algorithms with more than two sequential stages may be well worth considering, as they can yield even greater resource-savings than pooled testing algorithms with one or two stages.

11.1.2 Stages of a Pooled Testing Algorithm

Pooled testing was first proposed in 1943 by Dorfman (1943) for the detection of cases of syphilis in those called up for US army service during the Second World War. (The textbook of Du and Hwang (2020, Chap. 1) gives more information about the early history of pooled testing.) Dorfman’s algorithm is perhaps the simplest of all pooled testing algorithms, and has also been the most widely-used one in disease control, both prior to and during the COVID-19 pandemic. It proceeds as follows. We assume for the moment that tests are perfectly accurate, with \(100\%\) sensitivity and \(100\%\) specificity.

Suppose we have N individuals, and we wish to identify who among those N individuals is infected.

  1. We choose a pool size s, and we divide the N individuals into N/s disjoint groups of size s each. (We assume, for simplicity, that N is an exact multiple of s.) We take a sample from each of the N individuals, and then, for each of the N/s groups, we pool the samples from that group into a single pooled sample. We then run a test on each of the N/s pooled samples.

  2. a. If a pool tests negative, we know all the individuals in the corresponding group are noninfected.

     b. If a pool tests positive, we then follow up by individually testing all the individuals in the corresponding group. These individual tests discover which of the samples in the pool were infected or noninfected.

At the end of this process, under our perfect testing model, we have correctly classified all the individuals as infected or noninfected. This is illustrated in Fig. 11.1, in the case \(N=15\) and \(s=5\).

Fig. 11.1 Schematic illustration of the use of Dorfman’s algorithm (under perfect testing) to identify all the infected individuals in a group of 15, using pools of size 5. In the above case, there are two infected individuals, and only eight tests are required to identify them. If the two infected individuals had been in different pools at the first step, then 13 tests would have been required.
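The test counts quoted for this example can be checked with a short sketch of Dorfman's algorithm under perfect testing. The function names and the particular positions of the infected individuals are illustrative assumptions, chosen only to reproduce the 'same pool' and 'different pools' scenarios.

```python
def dorfman_tests(infected, s):
    """Number of tests Dorfman's algorithm uses under perfect testing.

    infected: list of booleans, one per individual, in pooling order.
    s: pool size; len(infected) is assumed to be a multiple of s.
    """
    if len(infected) % s != 0:
        raise ValueError("N must be a multiple of the pool size s")
    tests = 0
    for start in range(0, len(infected), s):
        pool = infected[start:start + s]
        tests += 1          # stage 1: one pooled test per group
        if any(pool):       # perfect test: positive iff the pool contains an infection
            tests += s      # stage 2: individually test everyone in the group
    return tests


def with_infected(n, positions):
    """Helper: n individuals, infected exactly at the given (hypothetical) positions."""
    return [i in positions for i in range(n)]


same_pool = dorfman_tests(with_infected(15, {0, 1}), s=5)
different_pools = dorfman_tests(with_infected(15, {0, 5}), s=5)
print(same_pool, different_pools)  # 8 13
```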

We shall see later that:

  • Under perfect testing, if the prevalence is lower than \(30\%\), then Dorfman’s algorithm uses fewer than N tests on average, so is more efficient than individual testing. (See Sect. 11.2.3.)

  • Under perfect testing, the optimal pool size s is easy to calculate, and is approximately \(s = 1/\sqrt{p}\), where p is the prevalence of the disease. (See Sect. 11.2.1 for the formal definition of prevalence.)

  • Even under imperfect noisy testing, Dorfman’s algorithm can be more efficient than individual testing for sufficiently low prevalence. (See Sect. 11.3.2.)

  • For even lower prevalence and higher accuracy, other pooled testing algorithms can not only outperform individual testing but outperform Dorfman’s algorithm as well. (See Sects. 11.2.4, 11.2.5, and 11.3.2.)

Note that individual testing is a one-stage or nonadaptive algorithm, in that all the tests are designed in advance and can be carried out in parallel. Meanwhile, Dorfman’s algorithm is a two-stage algorithm: the first stage of pooled tests is designed in advance and carried out in parallel, but then the results must be analysed before designing and carrying out the follow-up individual tests in a second stage. There are also pooled testing algorithms with more than two stages. There is typically a tradeoff between the number of stages and the efficiency of the algorithm: more stages allow one to use fewer tests, but take more laboratory time.

It is estimated that approximately 70% (95% CI: 52–90%) of the transmission of COVID-19 typically takes place either before symptom onset or in the first 48 h after symptom onset (see He et al. (2020), and the very slight correction in Ashcroft et al. (2020)). Hence, a fast turnaround time is an important factor to consider when choosing which protocol to use for case detection. If the tests were to have a very rapid processing time, it might be possible to use algorithms with many stages. However, as stated above, PCR tests for COVID-19 typically have a processing time of four to six hours. It is likely that many laboratories worldwide will be able to perform two sequential stages in a 24-h period (for logistical reasons, a turnaround time of less than 24 h from swabbing to result announcement is often hard to achieve anyway), but adding more sequential stages may increase turnaround time too much. Moreover, laboratories under pressure may struggle to keep track of samples over more than two sequential stages. For these reasons, we focus our attention in this chapter on pooled testing algorithms with at most two sequential stages. For the state of the art in fully adaptive algorithms, with no limit on the number of stages, see Aldridge (2019), as surveyed in Aldridge et al. (2019, Sect. 5.5).

For more comprehensive (but pre-pandemic) surveys of the mathematics of pooled testing, we refer the reader to Aldridge et al. (2019) and Du and Hwang (2020).

In this chapter, we only consider pooled testing where each test-result is simply either ‘positive’ or ‘negative’. A different form of pooled testing is where an attempt is made to measure how much viral RNA is present in each pooled sample, and to make use of this information; this is known as quantitative pooled testing, or quantitative group testing, and it is a special case of the well-studied ‘compressed sensing’ problem. For a comprehensive introduction to the mathematics of compressed sensing, with emphasis on algorithms, the reader is referred to Foucart and Rauhut (2013). For an account of a protocol for using quantitative pooled testing for COVID-19 case identification, the reader is referred to Ghosh et al. (2020).

11.1.3 Who and Why to Test

There are two different potential applications of pooled testing for a pandemic disease such as COVID-19. The first application, which we have discussed so far, is for case identification, where it is desired to identify which members of a group are infected, for the purposes of infection control. There is also a second application, for surveillance, where the goal is only to estimate the infection prevalence, without necessarily identifying which individuals are infected.

In this chapter, we focus mainly on the first application, for case identification, as we believe this is where the most useful applications of pooled testing for COVID-19 are most likely to be found. Briefly, we believe that in the UK, for example, the utility of pooled testing for surveillance on a national scale may be limited in the medium term, because, first, the UK already has well-developed and extensive national surveillance programmes based on individual testing using random population sampling, such as the ONS Coronavirus Infection Survey, and second, using pooled testing for surveillance only yields large efficiency gains over individual testing when prevalence is lower than it has often been in the UK since the start of the pandemic. Pooled testing for surveillance does, however, still have some potential utility; for example, if prevalence in the UK becomes sufficiently low and it is desired to reduce the resource requirements of the ONS Infection Survey while still monitoring the prevalence of infection. It is also quite possible that pooled testing could be useful for the surveillance of new variants of the coronavirus, which is important in view of the risks posed by the latter to the effectiveness of vaccination programmes. We return to these issues in Sect. 11.6.

We also draw a distinction between testing symptomatic people, among whom the prevalence is likely to be high, and testing asymptomatic people, where the prevalence is likely to be lower. As we shall argue in Sect. 11.6, we believe that pooled testing for case identification is most likely to be useful for the screening of asymptomatic people—and possibly for the testing of contacts of confirmed cases, provided the prevalence of infection among the group to be tested is thought to be sufficiently low. On the other hand, we believe that pooled testing is unlikely to be useful for the testing of symptomatic people, because the prevalence of COVID-19 infection among those presenting symptoms is usually sufficiently high that the resource savings of pooled testing would be modest compared to individual testing, and are arguably outweighed by the down-sides of pooled testing, such as increased turnaround time compared to individual testing.

11.2 Pooled Testing Algorithms for Perfect Tests

11.2.1 Outline and Model

In this section, we look at some algorithms for pooled testing for a disease, and assess their performance under the mathematically convenient model of perfect test results. (Later, in Sect. 11.3, we look at the performance of these algorithms in the more realistic model of tests that are highly-but-imperfectly specific and moderately sensitive.)

Unsurprisingly, a key quantity in our model is the prevalence of the disease, denoted by p: this is the fraction of individuals in the population in question who are infected with the disease at the time when testing is being done. Equivalently, it is the probability that an individual selected at random from the population is infected.

We assume that N individuals are being tested, and that these individuals are drawn from a large population. Each member of the population is assumed to be infected with probability p (where p is the prevalence, as defined above), independently of all other members of the population. (This independence assumption is not quite realistic in many settings, since some of the individuals being tested will often be contacts of one another, so clustering can occur. However, as we shall see, clustering actually makes pooled testing algorithms more efficient than if clustering is not present.)

Assuming that tests are perfect, we usually aim to correctly classify all N individuals as infected or noninfected. We often summarise the performance of algorithms through the expected tests per individual. If an algorithm uses a (possibly random) number of tests T to classify N individuals as either ‘infected’ or ‘noninfected’, then the expected tests per individual is \((\mathbb ET)/ N\), where \(\mathbb {E} T\) denotes the expectation (or mean, or average) of the random variable T. Clearly, it is desirable for the expected tests per individual to be as small as possible. Note that individual testing clearly has \((\mathbb ET )/ N = N/N = 1\). The expected tests per individual is useful for comparing how much better (or worse) an algorithm is than individual testing.

A standard information-theoretic bound called the counting bound (see, for example, Aldridge (2019), Aldridge et al. (2019), Baldassini et al. (2013)) states that, for any successful pooled testing procedure, the expected tests per individual satisfies the lower bound

$$\begin{aligned} \frac{\mathbb ET }{ N} \ge H(p). \end{aligned}$$
(11.1)

Here, p is the prevalence of the disease, and H(p) is the binary entropy function, defined by

$$\begin{aligned} H(p) = p \log _2 \frac{1}{p} + (1-p)\log _2 \frac{1}{1-p}. \end{aligned}$$

The bound (11.1) immediately implies that, when the prevalence is high enough, pooled testing cannot significantly outperform individual testing (under the model of perfect tests). For example, when the prevalence of infection in the population being tested is 20% (\(p = 0.2\)), we have \(H(0.2) \approx 0.72\), and therefore no pooled testing algorithm can use fewer than 72% of the number of tests per individual required by individual testing, at this prevalence level (under the model of perfect tests). In fact, a deeper mathematical result of Fischer et al. (1999) states that (under the model of perfect tests), individual testing is optimal among all pooled testing algorithms whenever the prevalence is at least 38.2% (\(p \ge 0.382\)).
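As a quick numerical check of the counting bound, the binary entropy function can be evaluated directly. This is a minimal sketch; the function name and the list of prevalence values are illustrative choices.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))


# Lower bound (11.1) on the expected tests per individual at a few prevalences.
for p in (0.001, 0.01, 0.05, 0.2, 0.5):
    print(f"p = {p:<6} H(p) = {binary_entropy(p):.4f}")
```

At \(p = 0.2\) this gives approximately 0.72, matching the figure quoted above, and at \(p = 0.5\) the bound equals 1, so no saving over individual testing is possible there.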

It is useful to consider, for different pooled testing algorithms, how close their ‘expected tests per individual’ is to the counting bound (11.1), at different prevalence levels, under the model of perfect tests, and this is what we shall do in this section.

Fig. 11.2 Performance of one- and two-stage group testing algorithms under perfect testing, as measured by (a) the expected tests per item, and (b) the rate. ‘Grid’ and ‘(rs)-regular’ refer to the conservative two-stage variants. At all but the lowest prevalences, the (rs)-regular design has optimal parameter \(r = 1\), so is equivalent to Dorfman’s algorithm, or \(r = 2\), so is equivalent to the grid algorithm.

In this section, we study the following four classes of pooled testing algorithms:

  • individual testing;

  • Dorfman’s algorithm, where each individual’s sample appears in exactly one pool, in the first stage;

  • grid-based designs, where each individual’s sample appears in exactly two pools, in the first stage;

  • (rs)-regular designs, where each individual’s sample appears in exactly r pools in the first stage, each pool containing s samples.

Figure 11.2 shows the performance of these algorithms (with the grid and (rs)-regular algorithms in what we will later call their ‘conservative two-stage’ variants; see Sect. 11.2.4.1). The top subfigure shows the expected tests per item, and indicates the potential benefit of using pooled testing (compared to individual testing) when the prevalence is below \(20\%\).

However, this top subfigure does not sufficiently convey the difference between the pooling algorithms at very low prevalence. The bottom subfigure shows the rate, defined by

$$\begin{aligned} \text {rate} = \frac{H(p)}{\mathbb ET / N} = \frac{H(p)N}{\mathbb ET}. \end{aligned}$$

The rate measures how close an algorithm gets to the counting bound (11.1)—higher rates are better. (We remark that the rate can also be interpreted information-theoretically, as the number of bits of information learned per test. See for example Aldridge et al. (2019), Baldassini et al. (2013).) The rate illustrates better the comparison between the different pooling methods, and shows the advantage of the (rs)-regular design at very low prevalences (e.g., below \(2\%\)).
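The definition of the rate can be sketched in a few lines. The function names are illustrative, and the value of 0.2 expected tests per individual used in the second example is a hypothetical input, not a result taken from this chapter.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))


def rate(p, tests_per_individual):
    """Rate = H(p) / (E[T] / N); by the counting bound it is at most 1."""
    return binary_entropy(p) / tests_per_individual


# Individual testing always uses 1 test per individual, so its rate is H(p).
individual_rate = rate(0.01, 1.0)

# Hypothetical pooled algorithm using 0.2 expected tests per individual at p = 1%.
pooled_rate = rate(0.01, 0.2)

print(f"individual testing: {individual_rate:.3f}, hypothetical pooled: {pooled_rate:.3f}")
```

The comparison shows why the rate separates algorithms more clearly at low prevalence: individual testing has a rate close to zero there, while any algorithm approaching the counting bound has a rate close to one.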

11.2.2 Individual Testing

Clearly, the individual testing of N individuals requires \(T = N\) tests, yielding an expected number of tests per individual equal to \(T/N = N/N = 1\).

An obvious advantage of individual testing is that one does not need an estimate of the prevalence. It is a one-stage algorithm, so has the fastest possible turnaround time. It is also the simplest testing algorithm to implement. However, we will see that other algorithms yield huge resource savings over individual testing at low prevalence levels, and are therefore well worth considering when testing capacity is constrained.

11.2.3 Dorfman’s Algorithm

In Sect. 11.1.2, we introduced Dorfman’s original two-stage algorithm. In this section, we discuss it in more detail.

Suppose we receive samples from N individuals in a large population where the prevalence of infection is p, and we run Dorfman’s algorithm on these N individuals, using pools of size \(s \ge 2\), where N is assumed to be a multiple of s. Clearly, one test for each pool is always used at the first pooled testing stage. A pool will also require an extra s individual tests in the second stage if the pooled test was positive. The pooled test will be positive unless all s individuals are noninfected, so the probability it will test positive is \(1-(1-p)^s\). So for each of the N/s pools we definitely have 1 pooled test, then with probability \(1 - (1-p)^s\) another s individual tests are required, giving the expected number of tests

$$\begin{aligned} \mathbb ET = \frac{N}{s} \left( 1 + s\big (1 - (1-p)^s\big ) \right) = \left( \frac{1}{s} + 1 - (1-p)^s \right) N . \end{aligned}$$

Hence the expected tests per individual is

$$\begin{aligned} \frac{\mathbb ET}{N} = \frac{1}{s} + 1 - (1-p)^s . \end{aligned}$$

It is easy to check that \(s = 2\) is never the best choice of pool size s, since \(s = 3\) always requires fewer expected tests, and that \(s = 3\) improves on individual testing for \(p < 1 - (1/3)^{1/3} \approx 0.307\). That is, Dorfman’s algorithm improves on individual testing for prevalences below roughly \(30\%\).
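Both claims can be checked numerically from the formula for the expected tests per individual. This is a minimal sketch; the function name and the grid of prevalence values used for the check are illustrative assumptions.

```python
def dorfman_expected(p, s):
    """Expected tests per individual for Dorfman's algorithm: 1/s + 1 - (1-p)^s."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s


# Prevalence below which pools of size 3 beat individual testing.
threshold = 1.0 - (1.0 / 3.0) ** (1.0 / 3.0)

# Check on a grid of prevalences that s = 3 always uses fewer expected tests than s = 2.
prevalences = [i / 1000 for i in range(1, 1000)]
s2_never_best = all(
    dorfman_expected(p, 3) < dorfman_expected(p, 2) for p in prevalences
)

print(f"threshold = {threshold:.4f}, s = 2 never best: {s2_never_best}")
```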

If the prevalence p is known accurately beforehand, we should choose the pool size s so as to minimise the quantity \(1/s + 1-(1-p)^s\), and thereby minimise the expected number of tests required. For fixed p, the function \(s \mapsto 1/s + 1-(1-p)^s\) is very well-behaved, and it is very easy to numerically find the integer s that minimises it. But it is useful also to note that when p is small, we may use the approximation

$$\begin{aligned} \frac{1}{s} +1-(1-p)^s \approx \frac{1}{s} +1 - (1 -ps) = \frac{1}{s} +ps , \end{aligned}$$

which is minimised over the reals at \(s = 1/\sqrt{p}\). This gives an expected tests per individual of approximately \(2\sqrt{p}\). More formally, choosing \(s = \lfloor 1/\sqrt{p} \rfloor \) yields that the expected number of tests per individual is \(1/s +1-(1-p)^{s} = 2\sqrt{p} + O(p)\), where the error term O(p) is small compared to \(\sqrt{p}\) when p is small. Hence, when p is small, a good estimate for p is known beforehand, and the pool-size s is chosen sensibly, our model predicts that Dorfman’s algorithm uses, on average, approximately \(2\sqrt{p} N\) tests to identify all the infected individuals among a population of size N.
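The numerical optimisation of the pool size, and its comparison with the approximations \(1/\sqrt{p}\) and \(2\sqrt{p}\), can be sketched as follows. The function names, the search limit `s_max` and the list of prevalences are illustrative assumptions; the brute-force search is simply one convenient way to find the minimising integer.

```python
import math


def dorfman_expected(p, s):
    """Expected tests per individual for Dorfman's algorithm: 1/s + 1 - (1-p)^s."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s


def optimal_pool_size(p, s_max=1000):
    """Integer pool size in 2..s_max minimising the expected tests per individual."""
    return min(range(2, s_max + 1), key=lambda s: dorfman_expected(p, s))


for p in (0.001, 0.01, 0.05, 0.1):
    s = optimal_pool_size(p)
    print(
        f"p = {p:<6} optimal s = {s:<4} "
        f"tests per individual = {dorfman_expected(p, s):.4f}  "
        f"1/sqrt(p) = {1 / math.sqrt(p):.1f}  2*sqrt(p) = {2 * math.sqrt(p):.4f}"
    )
```

For example, at \(p = 1\%\) the search returns \(s = 11\) with about 0.196 expected tests per individual, close to the approximations \(1/\sqrt{p} = 10\) and \(2\sqrt{p} = 0.2\).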

Table 11.1 shows the predicted resource requirements (under the simple model above, i.e., perfect tests), of using Dorfman’s algorithm to test N individuals, at different prevalence levels, when the prevalence level is known accurately beforehand, and the pool-size s is chosen optimally.

Table 11.1 Resource requirements of using Dorfman’s algorithm to test N individuals, at different prevalence levels, with tests of perfect sensitivity and specificity

There are shortcomings in this analysis. We mention three here. Firstly, tests used in the real world do not have perfect sensitivity or perfect specificity; we will refine the model to take this into account, in the following section.

Secondly, an accurate estimate for the prevalence may not be known beforehand, so it may not be possible to choose the pool size s optimally beforehand. We return to this issue in Sect. 11.4.

Thirdly, the independence assumption may fail because infections of different individuals being tested by the laboratory are unlikely to be truly independent. But if the overall prevalence remains the same, then Dorfman’s algorithm will not perform any worse than the above analysis predicts. In fact, it is advantageous in terms of resource requirements if there is a ‘clustering’ of infected individuals in the same pool. In terms of resource use, the worst case for Dorfman’s algorithm is when infected individuals are spread between many pools, and the best case is when infected individuals are concentrated in few pools.
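The effect of clustering can be illustrated by comparing the best and worst cases for a fixed number of infected individuals. This is a sketch under assumptions that differ slightly from the probabilistic model above: the number k of infected individuals is treated as fixed rather than random, N is a multiple of s, and the function names and example numbers are hypothetical.

```python
import math


def dorfman_best_case(n, s, k):
    """Fewest tests: the k infected individuals are packed into as few pools as possible."""
    return n // s + s * math.ceil(k / s)


def dorfman_worst_case(n, s, k):
    """Most tests: the k infected individuals are spread over as many pools as possible."""
    return n // s + s * min(k, n // s)


# Hypothetical example: 100 individuals, pools of size 10, 10 infected.
best = dorfman_best_case(100, 10, 10)    # all infected in a single pool
worst = dorfman_worst_case(100, 10, 10)  # one infected individual in every pool
print(best, worst)  # 20 110
```

In this example the same number of infections costs 20 tests when fully clustered but 110 tests when maximally spread, which is more than individual testing would use.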

11.2.4 Grid Algorithms

In Dorfman’s algorithm, each individual was tested once in the first stage. In a family of algorithms called grid algorithms, each item is tested twice in the first stage. One variant attempts to classify samples as infected or noninfected just from this single stage, while other variants follow up with a second stage of individual testing.

11.2.4.1 Variants of Grid Algorithms

The grid algorithms always begin as follows. Suppose we have N individuals to test. We split these into \(N/s^2\) groups, each of size \(s^2\). (We assume for simplicity that N is a multiple of \(s^2\).) Let us concentrate on a single group. Each of the \(s^2\) individuals is swabbed, and the sample from each swab is divided in two, so it can later be a part of two different sample pools. We now picture the \(s^2\) individuals as laid out on an \(s \times s\) grid. In the first stage we conduct 2s pooled tests: we make one pool from each of the s rows of the grid, and one pool from each of the s columns. A PCR test is run on each of these sample pools.

We assume, again, that tests are perfectly accurate. What can we learn from the results of these pooled tests?

Case 0: If none of the \(s^2\) individuals are infected, then all 2s tests will be negative. We can confidently state that all the samples are noninfected.

Case 1: If exactly one out of the \(s^2\) individuals is infected, then exactly one row test and one column test will be positive. We can confidently state that the individual at the intersection of that row and column in the grid is infected and that all the other individuals are noninfected.

Case 2: If two or more of the \(s^2\) individuals are infected, then the test results may be ambiguous. If we are lucky, it could be that all the infected individuals are in the same row or the same column of the grid, in which case they can be identified with complete confidence (we call this Case 2A), but more often we cannot be certain exactly which individuals are infected (we call this Case 2B).
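The case analysis above can be sketched as a small decoder that sees only the row and column test results. The function names and the example grids are illustrative assumptions. The classification relies on the observation that, under perfect testing, the results are unambiguous exactly when at most one row test or at most one column test is positive.

```python
def first_stage_results(infected):
    """Row and column pooled-test results for an s x s grid under perfect testing."""
    s = len(infected)
    row_positive = [any(infected[i][j] for j in range(s)) for i in range(s)]
    col_positive = [any(infected[i][j] for i in range(s)) for j in range(s)]
    return row_positive, col_positive


def classify_grid(row_positive, col_positive):
    """Classify the first-stage outcome using only the pooled-test results."""
    rows, cols = sum(row_positive), sum(col_positive)
    if rows == 0 and cols == 0:
        return "Case 0"      # nobody infected
    if rows == 1 and cols == 1:
        return "Case 1"      # the single intersection is the only infected individual
    if rows == 1 or cols == 1:
        return "Case 2A"     # all infected share one row or one column
    return "Case 2B"         # at least two positive rows and columns: ambiguous


def make_grid(s, infected_cells):
    """Helper: s x s grid with infections at the given (row, column) positions."""
    return [[(i, j) in infected_cells for j in range(s)] for i in range(s)]


for cells in (set(), {(1, 2)}, {(0, 0), (0, 2)}, {(0, 0), (1, 1)}):
    print(sorted(cells), classify_grid(*first_stage_results(make_grid(3, cells))))
```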

What should one do after receiving the pooled test results? Here, we briefly look at three possible choices, leading to three different variants of the grid algorithm:

  • One-stage grid algorithm: In the one-stage variant, we do not perform any follow-up tests. Any individual that was in at least one negative-testing pool is confidently declared to be noninfected. In Cases 0, 1 or 2A, the remaining individuals (i.e., those who appeared in two positive-testing pools) can be confidently declared to be infected, whereas in Case 2B, the remaining individuals are declared to have unclear results. Running just one single stage has the benefit that a laboratory can quickly process the results, but the downside of sometimes producing inconclusive results.

  • Standard two-stage grid algorithm: In this variant, we can run a second stage of individual testing, to clear up ambiguous results, if there are any. That is, in Cases 0, 1 or 2A, the algorithm is exactly the same as the one-stage grid algorithm above, but in Case 2B, all individuals who were not in at least one negative-testing pool are given an individual test in the second stage; those who test negative are then declared negative, and those who test positive are declared positive.

  • Conservative two-stage grid algorithm: Alternatively, we can run a second stage where all individuals that appeared in two positive-testing pools are given an individual test, even in Case 1 or Case 2A, where the infected individuals can be logically determined after the first stage. This has the advantage that every infected individual can be definitely confirmed as infected by the ‘gold standard’ of an individual (non-pooled) PCR test, which might be personally reassuring for the individual or their employer, beneficial when tests are imperfect, or required by regulators. (Two-stage algorithms where infection must be confirmed with an individual test are often known in the literature as trivial two-stage algorithms, but in this chapter we will use the term conservative two-stage algorithm, as it is more descriptive. Conservative two-stage algorithms can be easier to analyse mathematically than the standard two-stage ones.)

In real-life situations, when using the one-stage variant of the grid algorithm, one must make a decision on what to do with individuals with an inconclusive result. One option would be to inform all those individuals they should self-isolate as a precaution; another option would be to individually re-test each of them at a later date (effectively running a two-stage algorithm with a delayed second stage); another option would be to restart the testing from scratch with different grids; a more reckless option would be to inform all the individuals being tested that the test results were inconclusive and that they should continue their lives as if they had tested negative. Which option the laboratory or the regulatory authorities choose may depend on the impact of letting an infection go undetected: if the individual in question is a school pupil or a member of the general community in a mass-testing programme, the impact is likely to be much less than if the individual is a healthcare worker working with highly vulnerable patients or a resident-facing social care worker, for example.

In the next subsection, we give a brief analysis of the conservative two-stage variant of the grid algorithm, under our convenient assumption that the tests are perfectly accurate. We summarise these results in Table 11.2.

Table 11.2 Resource requirements of using two variants of the grid algorithm at different prevalence levels, with tests of perfect sensitivity and specificity. In the one-stage variant, parameters are chosen such that each grid of \(s^2\) individuals is correctly classified with probability at least 0.99. A dash (-) means that the method is worse than individual testing

11.2.4.2 An Analysis of the Conservative Two-Stage Grid Algorithm

Recall that in the conservative two-stage variant of the grid algorithm, every individual in two positive pooled tests receives an individual test. Our analysis here is similar to that in Aldridge (2020) and Broder and Kumar (2020).

For each of the \(N/s^2\) grids of \(s^2\) individuals, there are 2s pooled tests in the first stage. An individual can then receive an individual test for one of two reasons. One is that the individual is infected, which occurs with probability p. The second is that the individual is uninfected (which occurs with probability \(q=1-p\)), and that further, its row contains an infected individual (this happens with probability \(1 - q^{s-1}\)), and its column also contains an infected individual (this also happens with probability \(1 - q^{s-1}\)).

Hence, using the linearity of expectation, the expected number of tests is

$$\begin{aligned} \mathbb ET = \frac{N}{s^2} \big (2s + s^2(p + q(1 - q^{s-1})^2 )\big ) = N \left( \frac{2}{s} + p + q(1-q^{s-1})^2 \right) , \end{aligned}$$

and the expected tests per individual is \((\mathbb {E}T)/N = \frac{2}{s} + p + q(1-q^{s-1})^2\).

As before, given the prevalence p, this is a well-behaved function of s, so it is easy to numerically choose the optimal s. Results were shown in Table 11.2 above, and far outperform the one-stage variant when a second stage is available. Comparing with Table 11.1, we see that the conservative two-stage grid algorithm also outperforms Dorfman’s algorithm for the values of p under consideration, though not by more than a factor of approximately two. We note, however, that the optimal choice of s (the pool-size in the conservative two-stage grid algorithm) for \(p=0.1\%\) is 106, and large pool-sizes do have practical down-sides (usually requiring automation, reducing sensitivity, and posing regulatory problems; see Sect. 11.4).

From a more mathematical perspective, a similar approximation to that for Dorfman’s algorithm shows that, for small p, the optimal s is of the order \(1/p^{2/3}\); this yields an expected number of tests per individual of the order \(p^{2/3}\). For p sufficiently small, this is an improvement on Dorfman’s algorithm (where the expected number of tests per individual is of the order \(p^{1/2}\)).
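The numerical choice of s described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce Table 11.2: the function names and the search bound `s_max` are assumptions made here, and the comparison with Dorfman's algorithm uses the perfect-test expression \(1/s + 1 - q^s\).

```python
def grid_tests_per_individual(p: float, s: int) -> float:
    """Expected tests per individual for the conservative two-stage grid
    algorithm with s-by-s grids, under perfect tests."""
    q = 1.0 - p
    return 2.0 / s + p + q * (1.0 - q ** (s - 1)) ** 2


def dorfman_tests_per_individual(p: float, s: int) -> float:
    """Expected tests per individual for Dorfman's algorithm with pools of
    size s, under perfect tests."""
    q = 1.0 - p
    return 1.0 / s + 1.0 - q ** s


def best_pool_size(cost, p: float, s_max: int = 2000):
    """Brute-force search over s = 2, ..., s_max (s_max is an arbitrary cap)."""
    s_best = min(range(2, s_max + 1), key=lambda s: cost(p, s))
    return s_best, cost(p, s_best)


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):
        s_grid, t_grid = best_pool_size(grid_tests_per_individual, p)
        s_dorf, t_dorf = best_pool_size(dorfman_tests_per_individual, p)
        print(f"p={p}: grid s={s_grid} ({t_grid:.4f} tests/individual), "
              f"Dorfman s={s_dorf} ({t_dorf:.4f} tests/individual)")
```

Because the search is exhaustive, a laboratory constraint on the pool-size can be imposed simply by lowering `s_max`.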

11.2.5 Pooling Algorithms Based on (rs)-Regular Designs

The (rs)-regular designs are a family of algorithms that generalise the algorithms we have seen so far. These algorithms have a first stage where each individual’s sample appears in exactly r different pools, and each pool contains samples from exactly s individuals. The second stage (if there is a second stage) consists of individual testing. As with the grid algorithm discussed in the previous section, in the ‘standard’ variant, an individual is given an individual test at the second stage only when their infection status cannot be determined from the first stage, whereas in the ‘conservative’ variant (known in the literature as the ‘trivial’ variant), an individual is given an individual test at the second stage whenever all the pools they appear in test positive at the first stage.

Individual testing is a special case of an (rs)-regular design, with \(r = 1\), \(s = 1\) and a single stage. Dorfman’s algorithm is a special case with \(r = 1\), \(s > 1\), with two stages. The grid algorithms we discussed above are special cases with \(r = 2\).

There are a number of ways to construct a pooling design that is (rs) regular.

  • Randomly: Given a number of individuals N, a testing procedure that tests each individual in r pools with each pool consisting of s samples can be chosen uniformly at random from all such procedures. This is easy to do computationally, and convenient for proving mathematical statements. However, the random choice means that rare bad designs are possible, and the lack of structure can make it awkward to carry out in a laboratory setting.

  • Hypercube: This method generalises the grid algorithm to higher dimensions. It is required that \(s = a^{r-1}\) for some positive integer a. Assume that N is a multiple of \(a^r\). We split the N individuals into groups of size \(a^r\), and we focus our attention on just one group. Imagine that those \(a^r\) individuals are placed on an r-dimensional \(a \times a \times \cdots \times a\) hypercube. Each pool corresponds to an \((r-1)\)-dimensional slice of this hypercube, containing \(a^{r-1} = s\) individuals. Note that each individual is sampled in r pools, one for each of the r slice directions. Taking \(r = 2\), we obtain the grid algorithm. The structure of the hypercube can be convenient for implementation, although automation is usually required for pooling the samples, and for \(r \ge 3\), the conditions give a somewhat restricted set of possibilities for s.

  • Code-based: A classical construction of Kautz and Singleton shows how to construct an (rs)-regular design from an error-correcting linear code with appropriate parameters. We point readers to Aldridge et al. (2019, Sect. 5.7) or Kautz and Singleton’s original work (Kautz and Singleton 1964) for further details. The extra structure often gives good performance when N is small, although for some values of r and s it is not possible to find a code with appropriate parameters.

We note that, by counting the number of times a sample appears in a pool in two different ways, the number of pooled tests \(T_1\) used by the first stage of an (rs)-regular design satisfies \(Nr = T_1s\), and therefore the number of tests per individual used by the first stage of an (rs)-regular design is \(T_1/N = r/s\).
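To make the hypercube construction and the counting identity \(Nr = T_1 s\) concrete, the following sketch builds the pools for a single group of \(a^r\) individuals and checks regularity. The function names are hypothetical, and labelling individuals by coordinate tuples is simply one convenient choice.

```python
from itertools import product


def hypercube_pools(a: int, r: int):
    """Pools for one group of a**r individuals on an r-dimensional hypercube.

    Individuals are labelled by coordinate tuples in {0, ..., a-1}**r.
    Each pool is a slice: all individuals sharing a fixed value in one
    coordinate direction, so each pool has a**(r-1) members.
    """
    individuals = list(product(range(a), repeat=r))
    pools = []
    for axis in range(r):
        for value in range(a):
            pools.append([x for x in individuals if x[axis] == value])
    return individuals, pools


def check_regular(individuals, pools, r: int, s: int) -> bool:
    """True if every individual is in exactly r pools and every pool has size s."""
    counts = {x: 0 for x in individuals}
    for pool in pools:
        if len(pool) != s:
            return False
        for x in pool:
            counts[x] += 1
    return all(c == r for c in counts.values())


if __name__ == "__main__":
    a, r = 4, 3
    individuals, pools = hypercube_pools(a, r)
    n, t1, s = len(individuals), len(pools), a ** (r - 1)
    print(n, t1, s, check_regular(individuals, pools, r, s), n * r == t1 * s)
```

Taking `r = 2` recovers the row and column pools of the grid algorithm.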

As stated above, an (rs)-regular pooled testing algorithm can be used in the form of a one-stage, a ‘standard’ two-stage, or a conservative two-stage algorithm, just as with the grid algorithm.

We briefly present a summary of an analysis of the conservative two-stage variant, following Aldridge (2020) and Broder and Kumar (2020). With this variant, any individual whose r stage-one pooled tests are all positive receives an individual test in the second stage. For large N, the expected tests per individual in the random (rs)-design (described above) satisfies the following with high probability:

$$\begin{aligned} \frac{\mathbb ET}{N} \sim \frac{r}{s} + p + q (1 - q^{s-1})^r , \end{aligned}$$
(11.2)

where p is the prevalence and \(q = 1-p\). Here, r/s is the number of tests per individual in the first stage. An individual requires retesting in the second stage either if it is infected, with probability p, or if it is noninfected, with probability q, but all r of its tests are positive, each of which happens with probability \(1 - q^{s-1}\). Thus if the results of the tests containing a given noninfected individual were independent, then (11.2) would hold exactly. It turns out that a randomly sampled (rs)-design satisfies this independence condition for most individuals (with high probability); in fact, with positive probability, it satisfies the condition for all individuals.

Further, a lower bound is given in Aldridge (2020), which shows that, among all conservative two-stage algorithms, the random (rs)-regular design is extremely close to optimal for all \(p < 0.3\).

Table 11.3 Resource requirements of using an (rs)-regular design in a conservative two-stage algorithm to test N individuals, at different prevalence levels, with tests of perfect sensitivity and specificity

Table 11.3 shows the performance of the (rs)-regular design according to (11.2) with an optimal choice of r and s. We note that the (rs)-regular algorithm outperforms individual testing, Dorfman’s algorithm, and the grid algorithms for all values of p in the table.

Table 11.3 shows results with the mathematically optimal choice of (rs), but as the prevalence gets small, these parameter choices can get quite large. This could be unwieldy or even infeasible for a laboratory to carry out, and large values of s (the pool-size) provoke worries about sensitivity (with imperfectly sensitive tests). However, typically r can be reduced somewhat and s reduced quite a lot with only a marginal reduction in performance. For example, at \(p = 0.5\%\), the optimal choice is \(r = 7\), \(s = 147\), giving an expected tests per individual of 0.063. But reducing the parameters to the much more manageable \(r = 3\), \(s = 62\) still gives an expected tests per individual of 0.072, which is only slightly worse. The practically best choice of parameters will depend on a laboratory’s capability for carrying out complicated procedures, and worries about the impact of dilution on test sensitivity (see Sect. 11.4).
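The trade-off between optimal and more manageable parameters can be explored directly from (11.2). The sketch below evaluates the right-hand side of (11.2) and performs a plain grid search; the function names and the caps `r_max` and `s_max` are assumptions introduced here for illustration.

```python
def regular_tests_per_individual(p: float, r: int, s: int) -> float:
    """Right-hand side of (11.2): r/s + p + q * (1 - q**(s-1))**r."""
    q = 1.0 - p
    return r / s + p + q * (1.0 - q ** (s - 1)) ** r


def best_design(p: float, r_max: int = 12, s_max: int = 400):
    """Exhaustive search over 1 <= r <= r_max and 1 <= s <= s_max.

    The caps are arbitrary; lowering them models a laboratory that cannot
    handle large pools or many pools per sample.
    """
    candidates = ((r, s) for r in range(1, r_max + 1) for s in range(1, s_max + 1))
    r, s = min(candidates, key=lambda rs: regular_tests_per_individual(p, *rs))
    return r, s, regular_tests_per_individual(p, r, s)


if __name__ == "__main__":
    p = 0.005
    print(round(regular_tests_per_individual(p, 7, 147), 3))  # parameters quoted in the text
    print(round(regular_tests_per_individual(p, 3, 62), 3))   # the more manageable choice
    print(best_design(p, r_max=3, s_max=64))                  # search under tighter caps
```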

11.3 Pooled Testing Algorithms for Imperfect Tests

11.3.1 The Model

In this section, we refine the model of the previous section to take into account the fact that the tests we are dealing with do not always give the correct answer. Recall that the PCR test has very high specificity, typically higher than \(99\%\), meaning false positive test results are extremely rare, and has moderate sensitivity, typically between 70 and \(90\%\), meaning that false negative results are not uncommon.

Here, we use a very simple model for such tests. We assume that each test on a pool containing at least one infected sample has a fixed probability u of correctly returning a positive result, and that each test on a pool of entirely noninfected samples has a fixed probability v of correctly returning a negative result, independently of the outcomes of all other tests (including of tests on overlapping pools), and independently of the size of the pool (that is, with no ‘dilution’ effect).
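As a minimal sketch of this model, a single pooled test can be simulated as follows. The function name and the use of Python's `random.Random` generator are choices made here for illustration; note that the pool size enters only through whether any sample is infected, reflecting the no-dilution assumption.

```python
import random


def pooled_test(pool_statuses, u: float, v: float, rng: random.Random) -> bool:
    """Simulate one pooled test under the simple noisy model.

    pool_statuses: list of booleans (True = infected sample).
    u: sensitivity, v: specificity.
    Returns True for a positive result.
    """
    if any(pool_statuses):
        return rng.random() < u   # true positive with probability u
    return rng.random() >= v      # false positive with probability 1 - v


if __name__ == "__main__":
    rng = random.Random(2020)
    trials = 10_000
    positives = sum(pooled_test([True, False, False], 0.8, 0.995, rng)
                    for _ in range(trials))
    print(positives / trials)  # empirical frequency, close to u for an infected pool
```

Each call draws a fresh random number, which corresponds to the assumption that test outcomes are independent, including for overlapping pools.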

Whether or not this is a realistic model will depend upon the main sources of false negatives and of false positives, and therefore on the precise protocol being used and the practical situation. If the main source of insensitivity or nonspecificity is a shortage of reagents, faulty equipment, or faulty lab-procedures, then it is probably quite realistic. If individuals are frequently swabbed incorrectly (as can happen when individuals are asked to self-swab), then incorrectly taken swabs will be an important source of insensitivity, and in this case, unless individuals are re-swabbed at each successive stage of a group-testing algorithm and there are no overlapping pools at any single step, the independence assumption will not be valid. Moreover, the assumption that dilution (where a small number of positive samples are diluted by a large number of negative samples) does not affect sensitivity is likely to be fairly realistic with pool-sizes of 10 or less and with typical viral loads, but will be less realistic with pool-sizes of 100 or more. See Sect. 11.4 for a further discussion of this issue.

In the rest of this section, we look at some results regarding two-stage and one-stage algorithms, under this model for noisy tests.

11.3.2 Analysis of Individual Testing and Dorfman’s Algorithm

For a given algorithm, there are (at least) three things we want to know: First, how many tests do we expect to use? Second, how many false negative declarations do we expect to make? Third, how many false positive declarations do we expect to make? A useful quantity for comparing algorithms is the expected number of tests per isolated individual (ETI): that is, the expected number of tests used, divided by the expected number of infected individuals correctly discovered and instructed to isolate. Since the isolation of infected individuals is the main public-health goal of a screening programme, ETI is a good measure of how much benefit we are getting per test used (though it does not take into account turnaround time).

Let us start with individual testing. To test N individuals, this requires exactly N tests. There are pN infected individuals on average, and we find each one if its test correctly gives a positive result, which happens with probability u. So on average we correctly find upN infected individuals but falsely miss \((1-u)pN\) of them; so the ETI is \(N/(upN) = 1/(up)\). Similarly, also on average, of the \((1-p)N\) noninfected individuals, we correctly identify \(v(1-p)N\) of them, but falsely declare \((1-v)(1-p)N\) of them to be infected.

Now consider using Dorfman’s algorithm to test N individuals with pools of size s, with N a multiple of s, and suppose we use the protocol of declaring an individual to be infected only if both their pooled test and their individual test are positive. (If a pool tests positive in the first stage but all the corresponding individual tests in the second stage are negative, the pooled test is assumed to be a false positive.)

First, an individual will make it through to the second stage if either the pool is infected, and correctly gives a positive result; or if the pool is noninfected, but incorrectly gives a positive result. Thus the expected tests per individual is

$$\begin{aligned} \frac{\mathbb E T}{N} = \frac{1}{s} + u(1-q^s) + (1-v)q^s , \end{aligned}$$

where \(q = 1-p\). Here, 1/s represents the requirements of the first stage (i.e., the pooled tests), \(u(1-q^s)\) represents the requirements of the second stage in the case of a true positive pool result, and \((1-v)q^s\) represents the requirements of the second stage in the case of a false positive pool result. For small p, it turns out that an essentially optimal choice for minimising this quantity is \(s = \lfloor 1/\sqrt{up} \rfloor \); this can be shown in a similar way to the corresponding result in the previous section. This yields an expected number of tests per individual which is approximately \(2\sqrt{up} + (1-v)\), compared to 1 for individual testing. For \(p = 0.02\), \(u = 0.8\), \(v = 0.995\), we get an improvement from 1 test per individual to approximately 0.25 tests per individual.

Second, the expected number of infected individuals found is \(u^2pN\), as there are pN infected individuals on average, and they are found if both their pooled test and their individual test are correctly positive. The other (on average) \((1-u^2)pN\) infected individuals get false negative declarations. Compared to individual testing, where the total expected number of false negatives is simply \((1-u)pN\), we have

$$\begin{aligned} \frac{(1-u^2)pN}{(1-u)pN} = 1+u \le 2 , \end{aligned}$$

and therefore the expected number of false negatives under Dorfman’s algorithm can never be more than twice that when individual testing is used.

Third, a noninfected individual is falsely declared infected if both their pooled test is positive—either due to a false positive test or the presence of an infected individual and a true positive test—and their individual test is a false positive. This event has probability

$$\begin{aligned} \big ((1-q^{s-1})u + q^{s-1} (1-v)\big )(1-v) . \end{aligned}$$

Hence, the expected number of false positives is

$$\begin{aligned} \big ((1-q^{s-1})u + q^{s-1} (1-v)\big )(1-v)qN. \end{aligned}$$

Again, for small p with \(s = \lfloor 1/\sqrt{up} \rfloor \), this is approximately \((\sqrt{pu} + 1 - v)(1-v)qN\). Compared to the expected number of false positives under individual testing, which is \((1-v)qN\), Dorfman gives an improvement by a factor of \(\sqrt{up} + 1 - v\). For \(p = 0.02\), \(u = 0.8\), \(v = 0.995\), Dorfman gives an expected number of false positives which is approximately 0.12 times its value under individual testing. Thus Dorfman produces far fewer false positives than individual testing, a feature which is common to many other pooled testing algorithms.

Table 11.4 Expected resource requirements and impact of using individual testing or Dorfman’s algorithm to test N individuals, at different prevalence levels, with tests of sensitivity 80% and specificity 99.5%

Under the assumption that the sensitivity u of each test is 0.8 and the specificity v of each test is 0.995, Table 11.4 summarises, for different prevalences, the expected number of tests, false negatives, and false positives for individual testing and for Dorfman’s algorithm. Note that, compared to individual testing, Dorfman’s algorithm dramatically decreases the number of tests required and the number of false positives, but roughly doubles the number of false negatives.

The ETI of Dorfman’s algorithm (used in the way above) is \((1/s + u(1-q^s) + (1-v)q^s)/(u^2 p)\); if p is small and we choose \(s = \lfloor 1/\sqrt{up} \rfloor \), as described above, then this is approximately \((2\sqrt{up} +1-v)/(u^2 p)\).
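The quantities derived in this subsection can be collected into a small calculator. The sketch below evaluates the exact expressions (not the small-p approximations) per individual tested; the function names and dictionary keys are hypothetical, chosen here for illustration.

```python
from math import floor, sqrt


def dorfman_noisy(p: float, u: float, v: float, s: int) -> dict:
    """Per-individual expectations for Dorfman's algorithm with imperfect tests.

    An individual is declared infected only if both their pooled test and
    their individual test are positive.
    """
    q = 1.0 - p
    tests = 1.0 / s + u * (1.0 - q ** s) + (1.0 - v) * q ** s
    found = u * u * p                      # infected and correctly declared
    false_neg = (1.0 - u * u) * p
    # Probability that the pool of a given noninfected individual tests positive.
    pool_pos = (1.0 - q ** (s - 1)) * u + q ** (s - 1) * (1.0 - v)
    false_pos = pool_pos * (1.0 - v) * q
    return {"tests": tests, "false_neg": false_neg,
            "false_pos": false_pos, "eti": tests / found}


def individual_noisy(p: float, u: float, v: float) -> dict:
    """The same quantities for individual testing."""
    return {"tests": 1.0, "false_neg": (1.0 - u) * p,
            "false_pos": (1.0 - v) * (1.0 - p), "eti": 1.0 / (u * p)}


if __name__ == "__main__":
    p, u, v = 0.02, 0.8, 0.995
    s = floor(1.0 / sqrt(u * p))           # the pool-size suggested in the text
    print(s, dorfman_noisy(p, u, v, s), individual_noisy(p, u, v))
```

Multiplying the `tests`, `false_neg` and `false_pos` entries by N gives expected totals of the kind reported in Table 11.4.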

11.3.3 One-Stage Testing

We now briefly consider general one-stage (nonadaptive) pooling algorithms, under our simple model of imperfect tests. Here, we see the results of the tests, and we must try to come up with a ‘best guess’, from those results, as to which individuals were infected. The precise meaning of ‘best guess’ depends (for example) on the down-sides of missing infected cases (false negatives), and of false positives. This kind of problem is known as an inference problem.

We suppose the pooling design is chosen according to a pooling matrix \(\mathsf {A} = (a_{ti}) \in \{0,1\}^{T \times N}\), a matrix of zeros and ones, where \(a_{ti} = 1\) if the sample from individual i is included in the tth pooled test, and \(a_{ti} = 0\) otherwise. The T rows of the matrix \(\mathsf {A}\) represent the T pooled tests, and the N columns represent the N individuals being tested.

Some further notation is useful. Let \(\mathbf {x} = (x_i) \in \{0,1\}^N\) be the vector of zeros and ones where \(x_i = 1\) if individual i is infected, and \(x_i = 0\) if individual i is uninfected; the vector \(\mathbf {x}\) represents which individuals are truly infected and which are not, and it is what we really want to guess; we refer to it as the ‘infection vector’.

We write \(\mathbf {y} = (y_t) \in \{0,1\}^T\) for the actual outcomes of the tests, i.e., \(y_t = 1\) if the tth pool tests positive, and \(y_t = 0\) otherwise. Finally, we write \(\tilde{\mathbf {y}} = \tilde{\mathbf {y}}(\mathsf {A}, \mathbf{x})\) for what the T outcomes of the pooled tests on the infection vector \(\mathbf {x}\) would be, under perfect testing, i.e., \(\tilde{y}_t = 1\) if the tth pool would test positive under perfect testing, and \(\tilde{y}_t=0\) otherwise. Explicitly, \(\tilde{y}_t=1\) if \((\mathsf {A} \mathbf {x})_t \ge 1\), and \(\tilde{y}_t = 0\) if \((\mathsf {A} \mathbf {x})_t = 0\).

Given \(\mathbf {y}\) and \(\mathsf {A}\), we must come up with an estimate \(\hat{\mathbf {x}}\) for \(\mathbf {x}\). If we are only interested in estimating the most likely set of infected individuals (i.e., we do not want to err on the side of caution when we report to the individuals whether they are infected or not), then it makes sense to report a maximum a posteriori (MAP) estimate (though there are other reasonable alternatives, e.g. minimising the expected number of false positives plus false negatives). It can be shown using standard techniques that, writing \(\tilde{\mathbf {y}} = \tilde{\mathbf {y}}(\mathsf {A},\hat{\mathbf {x}})\), the MAP choice for \(\hat{\mathbf {x}}\) is one that minimises the ‘penalty function’

$$ f(\hat{\mathbf {x}}) = a\times \#\{i : \hat{x}_i = 1\} + b\times \#\{t : y_t = 1, \tilde{y}_t = 0\} +c\times \#\{t : y_t = 0, \tilde{y}_t = 1\}, $$

for some constants \(a, b, c \ge 0\); these constants depend on the assumed prevalence, and the assumed sensitivity/specificity of the (pooled) tests. As a simple example, the P-BEST algorithm (see Sect. 11.5.2) minimises the penalty function with \(a = 0\) and \(b = c = 1\). We note that, if the estimated cost of declaring an infected person uninfected is much greater than the estimated cost of declaring an uninfected person infected, then the estimate with minimum expected cost may be different from the MAP estimate, and the penalty function to be minimised should be changed, but Bayesian analysis can still be applied. We do not discuss here how to perform such an analysis efficiently.
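For very small N, the minimisation of the penalty function can be carried out by exhaustive search, which makes the definitions above concrete. The sketch below is purely illustrative: the function names are hypothetical, the constants passed in are arbitrary rather than derived from a prevalence or test accuracy, and the \(2^N\) search is not a practical decoder.

```python
from itertools import product


def perfect_outcomes(A, x):
    """y-tilde: pool t is positive iff it contains at least one infected individual."""
    return [int(any(a_ti and x_i for a_ti, x_i in zip(row, x))) for row in A]


def penalty(A, y, x_hat, a: float, b: float, c: float) -> float:
    """Penalty f(x_hat) for observed outcomes y and pooling matrix A."""
    y_tilde = perfect_outcomes(A, x_hat)
    declared = sum(x_hat)
    pos_unexplained = sum(1 for yt, zt in zip(y, y_tilde) if yt == 1 and zt == 0)
    neg_contradicted = sum(1 for yt, zt in zip(y, y_tilde) if yt == 0 and zt == 1)
    return a * declared + b * pos_unexplained + c * neg_contradicted


def brute_force_decode(A, y, a: float, b: float, c: float):
    """Search all 2**N candidate infection vectors (feasible only for tiny N)."""
    n = len(A[0])
    return min(product((0, 1), repeat=n), key=lambda x: penalty(A, y, x, a, b, c))


if __name__ == "__main__":
    # 3-by-3 grid design: three row pools followed by three column pools.
    A = ([[1 if k // 3 == i else 0 for k in range(9)] for i in range(3)]
         + [[1 if k % 3 == j else 0 for k in range(9)] for j in range(3)])
    x_true = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    y = perfect_outcomes(A, x_true)        # error-free outcomes for this example
    print(brute_force_decode(A, y, a=0.5, b=1.0, c=1.0))
```

When several candidates attain the minimum (as can happen with \(a = 0\)), `min` returns the first one encountered, so a real decoder would need an explicit tie-breaking rule.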

11.4 Practical Challenges for Pooled Testing

The practical challenges and the downsides of implementing a pooled testing algorithm for COVID-19 testing—either as part of a national or local testing programme or within an autonomous institution or company—depend to some extent on the algorithm in question. A cost-benefit analysis is of course desirable, in each setting where pooled testing may be considered.

Small benefit at high prevalence.    As mentioned earlier, when prevalence is greater than 38.2%, no pooled testing algorithm can outperform individual testing (Fischer et al. 1999) (under the assumption of perfect tests). Even when prevalence is significantly less than 38.2%, it will often be judged that the down-sides of pooled testing outweigh the advantage of resource-savings. For example, pooled testing has been approved in India only for use in areas where the population prevalence is 2% or less (see Sect. 11.5.4).

Increased turnaround time.    As mentioned above, a single PCR test can typically be performed in four to six hours. If pooling can be automated, using e.g. a pipetting robot, then one-stage (non-adaptive) pooled testing algorithms will not have a significantly longer turnaround time than individual testing. For multistage pooled testing algorithms, such as Dorfman’s algorithm, conservative two-stage pooled testing algorithms based on (rs)-regular designs, or the multistage algorithm piloted in Rwanda (see Sect. 11.5.3), the increase in turnaround time (relative to individual testing) will depend partly on the laboratory set-up. The impact of increased turnaround times clearly depends upon the damage done by letting infections go undetected for longer; in the case of screening healthcare workers or social care workers, who work with individuals highly vulnerable to severe illness if exposed to SARS-CoV-2, this impact is likely to be much greater than in the case of screening university students or factory workers (for example).

Laboratory infrastructure.    Dorfman’s algorithm does not require any sophisticated equipment to implement: the pooling of the samples can be done by hand. The pooling of the samples can even be done, as at the University of Cambridge (see Sect. 11.5.1), by the individuals to be tested, thus imposing no extra workload on laboratory staff. It only requires the laboratory to keep track of which individuals correspond to which pooled samples, and a capability to perform individual follow-up tests on the individuals whose pools test positive (or alternatively, for part-samples from each individual to be kept back during the first step, in case follow-up testing is needed on that individual in the second step). Some laboratories, e.g., those suffering from a shortage of well-trained personnel, or of equipment, would struggle to implement even Dorfman’s algorithm (the simplest to implement): even keeping track of which samples belong to which individuals, once these have been divided in half, may prove challenging under conditions of extreme pressure and consequent disorganisation.

The grid algorithm is most efficiently implemented using a pipetting robot with an arm that can move in two dimensions; this piece of equipment, while commercially available, may be too expensive for organisations operating with a low budget, and for poorer countries. An alternative is to do the pooling manually, if sufficient manpower is available.

The impact of dilution on test sensitivity.    When one infected sample is pooled with several others that are uninfected, the viral RNA is diluted, and this dilution leads to a decrease in the sensitivity of the PCR test on the pooled sample. The precise impact on sensitivity depends upon the laboratory protocol used, and also upon the distribution of viral loads in the samples being tested (which, in turn, depends upon the stage of the illness in the individuals being tested, as well as on individual biological factors). The following, however, gives a rough idea of the impact of dilution on sensitivity. Using a common protocol, and a set of 838 SARS-CoV-2 positive specimens, Bateman et al. (2021) found that dilution by a factor of five led (on average) to a 7% reduction in sensitivity, dilution by a factor of ten led (on average) to a 9% reduction in sensitivity, and dilution by a factor of 50 led (on average) to a 19% reduction in sensitivity. By contrast, a systematic review of the accuracy of individual PCR tests found false negative rates ranging from 2 to 29%, using repeated PCR-testing as the gold standard (for true positivity). Using repeated PCR testing as the gold standard for positivity is likely to underestimate the true rate of false negatives.

More important, for tests taken in the field, is the fact that swabs are sometimes taken incorrectly (see the previous chapter of this book, by Dunbar and Tang, for further discussion of this issue). One community-based study of close contacts of confirmed COVID-19 cases in China found an overall sensitivity of 71% for upper-throat swabbing (by trained medical personnel) followed by an individual PCR test, using repeated PCR tests as the gold standard (for true positivity). When individuals self-swab, sensitivity is likely to be lower, unless the individuals themselves are appropriately trained (e.g., healthcare workers). Compared to these factors, the overall impact of dilution on sensitivity is relatively minor.

In the theoretical results we have described here, the number of samples per test s can get extremely large when the prevalence p is very small. In real applications, laboratories would be unwilling (due to dilution concerns) or unable (due to equipment capacity) to pool together very high numbers of samples. Thus for extremely low prevalences, the gains of pooled testing are unlikely to be as high as the theory suggests.

How to deal with inconsistent test results.    When the sensitivity or the specificity of the tests being used is less than 100%, a pooled testing algorithm can yield inconsistent results. For example, when Dorfman’s algorithm is implemented, a particular pool can test positive due to the presence of one infected individual (a true positive), but then in the second stage of follow-up (individual) testing, all the individuals in that pool can test negative, due to the infected individual testing negative (a false negative). In this case, the testing authorities are faced with a dilemma: they could assume (wrongly, in this case) that the first (pooled) test was a false positive and that the follow-up tests were true negatives, or they could simply choose to declare the individuals in question free from infection (to avoid having to recall them for further testing), but they could also decide to repeat the second round of individual tests on that group (pool) of individuals, to hedge against the possibility that there was a false negative in the second round of individual tests. This could be regarded as a third, ‘confirmatory’, testing-step in the algorithm. (If extra samples cannot be taken from the individuals in question, a third, confirmatory, testing-step may in fact be impossible.)

Which option the authorities choose may depend on the impact of letting an infection go undetected; if the individual in question is a school pupil or a member of the general community (in a mass-testing programme), the impact will be less than if the individual is a healthcare worker working with patients highly vulnerable to severe illness or death in the case of COVID-19 infection, or if the individual is a resident-facing social care worker. The extra resource-requirements, and the delay, of a third (confirmatory) round of testing, should it be judged necessary, would have to be taken into account when deciding whether to adopt the pooled testing strategy.

Prior prevalence estimates.    In situations where surveillance is poor, or infection levels are changing very rapidly, a laboratory may have very little idea of the prevalence of infection in an incoming batch of samples to be tested for SARS-CoV-2 infection. In such circumstances, a suboptimal choice of the parameters in a pooled testing algorithm (due to an underestimate of the prevalence) can lead to an inefficient second step (less efficient, in fact, than individual testing). For example, if the prevalence is 25%, then using Dorfman’s algorithm with pools of size 10 requires on average approximately 1.04 tests per individual, which is slightly worse than individual testing (in addition to having a longer turnaround time). However, if an upper bound on the prevalence is known with a reasonably high degree of certainty (and this upper bound is not too high), then the parameters in a pooled testing algorithm can still be chosen so as to achieve a significant resource-saving over individual testing, even though the parameters cannot be fine-tuned to the exact prevalence. For example, the number of tests per individual for Dorfman’s algorithm with fixed pool size s is increasing in p, so the ‘worst case’ is when p is maximal. Hence, if the prevalence is known to be at most \(1\%\), then Dorfman’s algorithm with pools of size 10 will require at most \(\tfrac{1}{10} + 1 - (1-p)^{10} \le \tfrac{1}{10} + 1 - (1-0.01)^{10} \approx 0.196\) tests per individual (on average). Thus we get at least a five-fold improvement on individual testing, as long as the prevalence does not rise above \(1\%\).
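The two numerical examples in this paragraph are easy to reproduce. The following sketch assumes perfect tests, as in the expression above; the function names are hypothetical.

```python
def dorfman_tests_per_individual(p: float, s: int) -> float:
    """Expected tests per individual for Dorfman's algorithm under perfect tests."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s


def worst_case_tests(s: int, p_max: float) -> float:
    """For fixed s the cost is increasing in p, so the worst case over
    prevalences in [0, p_max] is attained at p_max."""
    return dorfman_tests_per_individual(p_max, s)


if __name__ == "__main__":
    print(round(dorfman_tests_per_individual(0.25, 10), 2))  # misjudged prevalence
    print(round(worst_case_tests(10, 0.01), 3))              # prevalence bounded by 1%
```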

Regulatory approval.    All of the issues listed above may be obstacles to regulatory approval. An additional obstacle to regulatory approval is the complexity of pooled testing algorithms, compared to individual testing. Often, policymakers will need to give their approval, bearing in mind public opinion, and a procedure which cannot be understood by a large percentage of the public (or by policymakers without the requisite quantitative training), may be less likely to gain such approval. On the other hand, the simpler pooled testing algorithms, such as Dorfman’s algorithm and the grid algorithm, almost certainly can be explained in such a way that policymakers (and the majority of the public) can understand them—and this may well be part of the reason why these two algorithms have seen the widest use, of all pooled testing algorithms, during the COVID-19 pandemic so far.

11.5 Uses of Pooled Testing in the COVID-19 Pandemic

Hitherto in the COVID-19 pandemic, Dorfman’s algorithm has been the most widely-used pooled testing strategy. This is almost certainly because it is (i) easy to implement without necessarily needing large changes in laboratory equipment or infrastructure, (ii) relatively robust to changes in prevalence, (iii) simple and transparent enough for non-scientific decision makers or the public to understand and for regulators to approve, and (iv) of easily predictable and controllable sensitivity. At the same time, it can yield large efficiency gains over individual testing, as outlined above. In this section, we give some examples of places where pooled testing has been applied during the COVID-19 pandemic, focussing on examples where detailed information is available.

11.5.1 Dorfman’s Algorithm at the University of Cambridge

During Autumn Term of 2020 (6 October–4 December 2020), University of Cambridge students in college accommodation were asked to participate in the University’s asymptomatic COVID-19 screening programme (University of Cambridge 2020).

Students were divided into ‘bubbles’ of average size 8 and maximum size 10, with each bubble consisting of students sharing facilities (e.g., a bathroom, kitchen or living room). In a typical week, half of the students in each ‘bubble’ were requested to provide nasal swabs, which were then collected together into a single container by one of the students. This container was then sent to a local laboratory, where a pooled PCR test was performed on the pooled samples. If a pooled sample tested positive, each student in the corresponding bubble was informed, and instructed to take an individual PCR test at one of the National Testing sites. A simple rota determined which students were asked to provide swabs on which weeks; on average, a student was asked to provide a swab approximately once per fortnight. Students who were symptomatic did not take part on weeks when they were symptomatic, as all students experiencing symptoms were instructed to seek an individual test. Students who had recently tested PCR-positive did not take part either. Participation was voluntary, but consent rates were high, starting at 75% of all 15,479 eligible students during the first week of term, and steadily increasing to 82% of all 15,310 eligible students in the last week.

Students were requested not to socialise outside their bubbles. Assuming a high level of compliance, this would mean that positive cases were more likely to cluster within bubbles, leading to a resource-saving at the second stage (of follow-up individual testing): this is the ‘best case’ in Dorfman’s algorithm, in terms of resource-use at the second step, when the infected individuals are distributed among as few pools as possible. In the sixth week of term, for example, 80 students individually tested positive across 59 positive-testing pools (Warne 2020), giving an average of approximately 1.4 infected students per positive-testing pool. Had the infected individuals been more evenly distributed, there would have been only one positive-testing student per positive-testing pool.
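The effect of clustering on the second-stage workload can be illustrated with a little arithmetic. The sketch below is ours and is not part of the Cambridge programme. It assumes, purely for illustration, that every bubble has the average size of 8 and that each positive pool triggers an individual test for every member of the bubble; the function name `followup_tests` is hypothetical.

```python
def followup_tests(positive_pools: int, bubble_size: int) -> int:
    """Individual tests triggered at stage two: one per member of each positive bubble."""
    return positive_pools * bubble_size


infected = 80        # week-six individual positives (Warne 2020)
positive_pools = 59  # week-six positive-testing pools (Warne 2020)
bubble_size = 8      # ASSUMPTION: every bubble has the average size

# Average number of infected students per positive-testing pool.
mean_infected_per_pool = infected / positive_pools

# Follow-up tests with the observed clustering.
observed = followup_tests(positive_pools, bubble_size)

# Hypothetical comparison: each infected student sits in a different bubble,
# so there are as many positive pools as infected students.
spread_out = followup_tests(infected, bubble_size)

print(round(mean_infected_per_pool, 2))
print(observed, spread_out)
```

Under these assumptions, the observed clustering needs fewer follow-up tests than the hypothetical case in which no two infected students share a bubble.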

11.5.2 The Grid and P-BEST Algorithms in Israel

In August 2020, Israel’s Ministry of Health approved two single-step pooled testing protocols for use in clinical laboratories in the country: one based upon the ‘grid algorithm’, and one based upon the ‘P-BEST’ algorithm of Shental et al. (2020). The grid algorithm has been described earlier.

P-BEST uses an \((r = 6, s = 48)\)-regular design with a code-based construction. This deals with \(N = 384\) individuals in \(T = Nr/s = 48\) pooled tests. A ‘best guess’ for the identities of the infected individuals is obtained from the results of the pooled tests, using the method described in Sect. 11.3.3, above.

It should be noted that the P-BEST algorithm does require software and computing resources to implement, in addition to a pipetting robot with an arm that can move in two dimensions. However, this equipment is affordable for most countries.

In trials where at most 5 out of 384 individuals are PCR-positive under individual PCR testing (corresponding to a prevalence of 1.3% or lower), the P-BEST algorithm usually correctly identified all infected and uninfected individuals. Problems can arise when a batch of samples is received with a much higher prevalence than 1.3%; in this case, even if there are no false positive or false negative test results, it is often not possible to use P-BEST to determine which individuals are infected and which are not.
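The following sketch illustrates why a single-step design with these parameters struggles at high prevalence. It is not the P-BEST construction or decoder: as a stand-in, it builds a random \((6, 48)\)-regular design by repeated shuffling (an assumption of ours), assumes perfect tests, and applies only a simple elimination rule in which anyone appearing in a negative pool is cleared. All function names are hypothetical.

```python
import random


def random_regular_design(n, r, s, rng):
    """Return a list of pools (sets of individual indices).

    Each of the r rounds shuffles all n individuals and cuts them into
    n // s pools of size s, so every individual is in exactly r pools.
    """
    assert n % s == 0
    pools = []
    for _ in range(r):
        order = list(range(n))
        rng.shuffle(order)
        pools.extend(set(order[i:i + s]) for i in range(0, n, s))
    return pools


def candidates_after_elimination(pools, infected):
    """Clear everyone who appears in at least one negative pool (perfect tests)."""
    cleared = set()
    for pool in pools:
        if not (pool & infected):  # no infected sample: pool tests negative
            cleared |= pool
    everyone = set().union(*pools)
    return everyone - cleared


rng = random.Random(0)
N, r, s = 384, 6, 48
pools = random_regular_design(N, r, s, rng)
print(len(pools))  # 48 = N * r / s

# Number of individuals who cannot be cleared, for increasing numbers infected.
for k in (2, 5, 20, 40):
    infected = set(rng.sample(range(N), k))
    suspects = candidates_after_elimination(pools, infected)
    print(k, len(suspects))
```

As the number of infected individuals grows, fewer pools test negative, so fewer individuals can be cleared and the set of remaining suspects grows well beyond the truly infected.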

In Israel, in clinical laboratories where P-BEST (or the grid algorithm) has been employed, data analysis and machine learning have also been used to predict which batches of samples are likely to have much higher prevalence rates than the national average (based on origin); such batches were typically dealt with using individual testing.

11.5.3 A Multi-stage (r, s)-Regular Algorithm in Rwanda

Starting in August 2020, a multi-stage algorithm was piloted in Rwanda, where infection prevalence was low but the supply of PCR tests was limited (Mutesa et al. 2021).

The stages are as follows. It is required to pick two integer parameters, a and \(r_2\).

  1. A Dorfman-like stage with \(r_1 = 1\) pooled test for each individual and \(s_1 = a^{r_2}\) samples in each test. A negative result shows that all \(s_1\) individuals are noninfected; individuals in a positive pool go through to stage two.

  2. An \((r_2, s_2 = a^{r_2-1})\)-regular design, with a hypercube construction (see Sect. 11.2.5).

  3. If the hypercube contains zero or one infected individuals, they can be identified. Otherwise, further stages of testing are used to disambiguate the results; we don’t go into details here.

The parameters are chosen to be as efficient as possible while still ensuring that the chance of more than two stages being required is very small. One common choice is \(a = 3, r_2 = 3\), so that the first stage is a (1, 27)-regular Dorfman-like design, and the second stage is a (3, 9)-regular hypercube design of 9 tests for 27 individuals.
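The stage parameters and the second-stage decoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it applies the formulas \(s_1 = a^{r_2}\) and \(s_2 = a^{r_2-1}\) from the stage descriptions, assumes perfect tests, and assumes a hypercube construction in which individuals sit at the points of an \(a \times \cdots \times a\) grid in \(r_2\) dimensions and each pool is a slice with one coordinate fixed. The function names are hypothetical.

```python
def stage_parameters(a, r2):
    """Return (s1, s2, number of second-stage tests) for parameters a and r2."""
    s1 = a ** r2        # first-stage pool size
    s2 = a ** (r2 - 1)  # individuals in each hypercube slice
    tests2 = a * r2     # one test per slice: a slices along each of r2 axes
    return s1, s2, tests2


def slice_results(a, r2, infected):
    """Perfect-test result for every slice, keyed by (axis, coordinate value)."""
    return {(axis, v): any(point[axis] == v for point in infected)
            for axis in range(r2) for v in range(a)}


def decode_single(a, r2, results):
    """Return the infected point if exactly one slice per axis is positive, else None."""
    coords = []
    for axis in range(r2):
        hits = [v for v in range(a) if results[(axis, v)]]
        if len(hits) != 1:
            return None  # no infection, or ambiguous: further testing is needed
        coords.append(hits[0])
    return tuple(coords)


print(stage_parameters(3, 3))
print(decode_single(3, 3, slice_results(3, 3, {(2, 0, 1)})))
print(decode_single(3, 3, slice_results(3, 3, {(0, 0, 0), (1, 0, 0)})))
```

With a single infected individual, exactly one slice along each axis is positive and the positive slices intersect in one point. With two or more infected individuals, some axis has several positive slices, which is the situation in which further stages are needed.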

11.5.4 Other Uses of Pooled Testing

Here is a brief (and far from exhaustive) list of some other examples of the use of pooled testing in the COVID-19 pandemic.

  • PCR testing using Dorfman’s algorithm in Wuhan, China (Fan 2020). Between 12 May and 1 June 2020, 9.9 million Wuhan residents were tested; the vast majority were asymptomatic. Dorfman’s algorithm was reportedly used for approximately 25% of this testing, with pools of sizes between 5 and 10. Only 300 positive cases were identified.

  • Screening of students on university campuses, using Dorfman’s algorithm or the grid algorithm, during the Autumn/Fall term of 2020: Université de Liège, Belgium (saliva samples using Dorfman’s algorithm with pools of size 8) (Université de Liège 2020); Duke University, USA (saliva samples using Dorfman’s algorithm with pools of size between 5 and 10; participation mandatory for students on campus) (Denny et al. 2020); Michigan State University, USA (saliva samples using a grid algorithm) (Michigan State University 2020); Syracuse University, USA (saliva samples using Dorfman’s algorithm with pools of size between 20 and 25) (Syracuse University 2020); Shenandoah University, USA (saliva samples using Dorfman’s algorithm with pools of size 4 or 5) (Shenandoah University 2020).

  • PCR testing using Dorfman’s algorithm with pools of size up to 20, by Fundación Biomédica Galicia Sur, Galicia, Spain, to screen asymptomatic healthcare workers, social care workers, industrial workers and port workers in the province of Galicia from September 2020 onwards, raising screening-capacity to 100,000 screenings per month (with health and social care workers being screened twice per week). Pipetting robots have been used (La Voz de Galicia 2020).

  • PCR testing using Dorfman’s algorithm with pools of size up to 30, by Saarland University Hospital, Germany, for the regular screening of asymptomatic hospital patients and hospital staff, and care home residents in Saarland, from March 2020 onwards. Approximately 22,000 people screened (Universität des Saarlandes 2020).

  • PCR testing using Dorfman’s algorithm with pools of size 10, by Noguchi Memorial Institute for Medical Research, Ghana, to test contacts of confirmed cases. Initially 10,000 people tested per day, from April 2020 onwards (World Health Organization 2020).

  • PCR testing using Dorfman’s algorithm with pools of size 5, by the states of Uttar Pradesh (Sharda 2020) and West Bengal (Yengkhom 2020), India, in areas with estimated prevalence of 2% or lower.

11.6 Applications of Pooled Testing for COVID-19: Some Conclusions

In this section, we conclude by drawing some of the above analysis together and discussing our own personal perspective on the practical settings where pooled testing for COVID-19 is likely to be useful, or at least may merit serious consideration.

11.6.1 Pooled Testing for Asymptomatic Subpopulations

As stated in the introduction, we believe that pooled testing is most likely to be useful for the screening of asymptomatic people, for surveillance, and possibly for the testing of contacts of confirmed cases, provided the prevalence of infection among the group to be tested is sufficiently low. On the other hand, we believe that in most countries, pooled testing is unlikely to be useful for the testing of symptomatic people. (Here, we use the term ‘asymptomatic’ to denote someone who is not experiencing the recognised symptoms at the time of their test. This includes ‘true asymptomatics’, who never experience symptoms, and ‘pre-symptomatics’, who go on to develop symptoms after their test.)

There are two main reasons why we believe that, in most countries, pooled testing will be of limited use for the testing of symptomatic people. First, the prevalence of COVID-19 infection among those presenting symptoms is usually sufficiently high that the resource savings of pooled testing are modest compared to individual testing, and may be outweighed by the down-sides of pooled testing, such as increased turnaround time. Among those presenting COVID-like symptoms, prevalences of between 4 and \(33\%\) are realistic, depending upon the setting, the location, the symptoms used in the definition of ‘symptomatic’, and the prevalence of other respiratory viruses (which in turn depends on the time of year) (Menni 2020; Pueyo 2020). Second, many countries already have well-established testing programmes using individual testing for those presenting symptoms. In many countries, including the UK, an individual test on symptomatic people is mandated by the regulatory authorities to confirm infection, even if they can be proved positive solely via pooled tests.

On the other hand, in many countries, the prevalence of COVID-19 infection among the general population has for quite long periods been at levels low enough that pooled testing of asymptomatic people can yield large efficiency gains over individual testing. For example, the prevalence of current SARS-CoV-2 infection in the general community in England, as estimated by the ONS Infection Survey (Office for National Statistics 2021), has ranged from 0.026% in early July 2020 to 2.1% in early January 2021 (and 3.6% in London in early January 2021). Hence, for the prevalence among asymptomatics in England, values of p between 0.02 and \(1.6\%\) are good estimates. Countries that have adopted more stringent non-pharmaceutical interventions (such as stricter lockdowns or strongly enforced quarantines for international arrivals) have experienced lower prevalence rates; for example, New Zealand probably eliminated COVID-19 infections in the general community between early May and mid-August 2020, except for international arrivals, who were quarantined (Baker et al. 2020).
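To make the contrast between symptomatic and asymptomatic prevalences concrete, the sketch below evaluates the standard expected number of tests per individual for Dorfman’s algorithm with pools of size \(s\), namely \(1/s + 1 - (1-p)^s\). It assumes perfect tests and independent infections, and it ignores dilution, turnaround time and laboratory constraints on pool size; the cap of 100 on the pool size and the function names are our own illustrative choices.

```python
def dorfman_tests_per_person(p, s):
    """Expected tests per individual: one pooled test shared by s people,
    plus an individual retest whenever the pool is positive."""
    return 1 / s + 1 - (1 - p) ** s


def best_pool_size(p, max_s=100):
    """Pool size in 2..max_s minimising the expected tests per individual."""
    return min(range(2, max_s + 1), key=lambda s: dorfman_tests_per_person(p, s))


# Prevalences quoted in the text: general community (low and high) and
# the range given for people presenting symptoms.
for p in (0.00026, 0.021, 0.04, 0.33):
    s = best_pool_size(p)
    cost = dorfman_tests_per_person(p, s)
    print(f"p = {p:.5f}: best s = {s}, tests per person = {cost:.3f}")
```

A value below 1 means pooling uses fewer tests than individual testing. At the lowest community prevalence the saving is very large; at 4% it is more modest; and at 33% even the best pool size needs more than one test per person, so Dorfman’s algorithm offers no saving at all under these assumptions.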

It is often desirable to screen subpopulations where the prevalence of infection is likely to be significantly higher than in the general population: for example, patient-facing healthcare workers, resident-facing social care workers, and factory workers in high-risk environments such as meat-processing plants. But it may also be desirable to screen subpopulations where the prevalence of infection is likely to be similar to the general population: see some of the examples below.

We list here some of the settings where we believe pooled testing for COVID-19 is most likely to be useful. Of course, a careful cost-benefit analysis should be carried out for each potential application, with the decision to adopt or not depending on certain factors, including the prevalence level, laboratory resources and capabilities, the impact of increasing the turnaround time, and regulatory constraints.

  • University students. As may be apparent from the relatively large number of examples of this in Sect. 11.5, screening of asymptomatic university students is one of the less controversial applications of pooled testing. Severe illness is very rare among those of student age, so the impact on students of an increase in the turnaround time associated with pooled testing is slight. If a significant amount of in-person teaching is taking place, there are higher risks to older members of staff in the event that they are infected, so the impact on them of increased turnaround time should be taken into account.

  • Key workers, for example factory workers, warehouse workers and port workers (but excluding patient-facing healthcare workers and resident-facing social care workers), provided the prevalence is not too high among the workers in question. We recall from Sect. 11.5.4 that pooled testing has been used in Galicia for the screening of factory and port workers.

  • School pupils and staff. The logistical challenges of the regular screening of asymptomatic school pupils are greater than in the case of university students (who can, if necessary, organise much of the process themselves; see Sect. 11.5.1). Schoolchildren, particularly younger schoolchildren, cannot do this, so the additional organisational burden is placed on schools, which are already overstretched. A further problem is that low-income families have little incentive to agree to their children participating in regular screening, since if a child tests positive and this is reported, the parents are likely to have to take time off work. (Financial compensation for this may help.) Another concern is that school-aged children, particularly those of primary-school age, may not tolerate regular swabbing, although saliva tests would not have this problem.

  • Members of sports teams. Screening of asymptomatic members of sports teams is suggested in Mutesa et al. (2021).

  • Airline passengers. This is also suggested in Mutesa et al. (2021).

  • Non-household contacts of confirmed cases, provided the estimated prevalence among these is sufficiently low. We recall from Sect. 11.5.4 that this has been done in Ghana.

One potential application of pooled testing that many authors are circumspect about is the screening of asymptomatic healthcare workers and social care workers. Given the major risks to vulnerable patients and social care residents associated with any additional delay in finding positive cases among these workers, screening using individual testing is often thought to be preferable, if there are sufficient resources. Even in the presence of very severe resource constraints, a careful cost-benefit analysis should be performed to compare the impact of screening based on individual testing with that based on pooled testing, taking into account the increased turnaround time. Even Mutesa et al. (2021), who are in general strong advocates of the use of pooled testing for COVID-19, state explicitly that they do not advocate its use for the screening of healthcare workers. We do note, however, that pooled testing was used by Saarland University Hospital for this purpose (see Sect. 11.5.4).

11.6.2 Pooled Testing and Vaccination Programmes

At the time of writing (March 2021), vaccination programmes are proceeding rapidly in many developed countries and are having a large effect in reducing the number of COVID-19 cases (Aran 2021). It might be thought that, in such countries, there will soon be no need to consider pooled testing. We believe that such an assumption may be premature at this stage, mainly because we do not yet have reliable data on the extent to which the vaccines currently being distributed reduce the number of cases (symptomatic and asymptomatic) caused by new variants of SARS-CoV-2, particularly the South African and Brazilian variants (Mahase 2021). (See Aldridge and Ellis (2021) for a discussion of the evidence base on this to date.) Given this uncertainty, and the risk of further new variants arising that are resistant to available vaccines, we believe it would be wise for decision-makers to bear in mind the possibility that it may be desirable to rapidly increase testing-capacity using pooled testing in the medium term. In poorer countries, vaccination programmes are likely to be long delayed. In such countries, pooled testing may still be a valuable tool to consider for the foreseeable future.

11.6.3 Pooled Testing for Surveillance

At low prevalence levels, pooled testing has the potential for very large resource-saving in national COVID-19 surveillance programmes. In some countries with a very large testing capacity, this may not be necessary. For example, the UK’s ONS (Coronavirus) Infection Survey currently tests a random sample of approximately 400,000 members of the community population in England, once per fortnight, using individual testing (Office for National Statistics 2020); compared to the UK’s Pillar II testing capacity of more than 200,000 tests per day, this is not too great a resource requirement. However, if it is desired to reduce the resource requirements of a nationwide surveillance programme, pooled testing provides a way of doing so. Using pools of size up to 100, the hypercube-based algorithm piloted in Rwanda by Mutesa et al. (2021) can estimate the prevalence fairly accurately, while achieving an approximately 100-fold reduction in the number of tests used when the prevalence level is at \(0.05\%\). The main concern here is the reduction in sensitivity caused by dilution, but Mutesa et al. (2021) report proof-of-concept experiments which suggest that, using an appropriate protocol, sensitivities of 98% or 92% (depending on the gene targeted) can be achieved at a 100-fold level of dilution. This suggests that their scheme can be used reliably to monitor prevalence. The extra turnaround time compared to individual testing is likely to be much less of an issue with surveillance than with case identification, particularly at low prevalence levels.
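As an illustration of how prevalence can be estimated from pooled results alone, the sketch below uses a textbook estimator: if a fraction \(x/m\) of pools of size \(s\) test positive, then \(\hat{p} = 1 - (1 - x/m)^{1/s}\). This is not necessarily the estimator used by Mutesa et al. (2021); it assumes perfect tests, independent infections and equal pool sizes, and the counts in the example are hypothetical.

```python
def estimate_prevalence(positive_pools, total_pools, pool_size):
    """Estimate prevalence from the fraction of pools that test positive.

    A pool is negative with probability (1 - p) ** pool_size, so we invert
    the observed fraction of negative pools.
    """
    if not 0 <= positive_pools < total_pools:
        raise ValueError("need 0 <= positive_pools < total_pools")
    fraction_negative = 1 - positive_pools / total_pools
    return 1 - fraction_negative ** (1 / pool_size)


# HYPOTHETICAL example: 400,000 samples in 4,000 pools of 100,
# of which 195 pools test positive.
p_hat = estimate_prevalence(positive_pools=195, total_pools=4000, pool_size=100)
print(f"estimated prevalence: {p_hat:.4%}")
```

In this hypothetical example, 4,000 pooled tests stand in for 400,000 individual tests. If every pool tests positive the estimator breaks down, which is one reason this approach suits low prevalence levels.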

It is also plausible that pooled testing could be used to monitor the prevalence of new variants. This may become particularly important if new variants begin to seriously hinder the success of vaccination programmes. A new PCR-testing method (involving only a minor update to existing PCR tests) that can detect which variant of SARS-CoV-2 a patient is carrying is currently undergoing clinical trials by the biotechnology firm Novozymes (Merrifield 2021).