Background

Detection of current infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a crucial component of targeted policy responses to the COVID-19 pandemic that involve minimising infection within vulnerable groups. For instance, residents and staff in care homes may be tested regularly to minimise outbreaks among elderly populations [1]. Alternatively, healthcare workers (HCWs) may be routinely tested to prevent nosocomial transmission to patients who may have other comorbidities [2, 3]. Both of these populations have a substantially higher risk of fatality from COVID-19 infection than the general population [4, 5].

In the UK, testing commonly uses polymerase chain reaction (PCR) to detect the presence of viral ribonucleic acid (RNA) in the nasopharynx of those sampled [6]. The sensitivity of PCR tests at any given point during infection depends upon the amount of viral RNA present; this increases at the start of the infection up to the peak viral load, which appears to occur just before, or at, the time of symptom onset [7,8,9]. Viral load then decreases, but infected individuals continue to shed viral RNA for an average of 17 days after initial infection, although shedding can last far longer than the average: the longest observed duration is 83 days [10]. A greater severity of illness is frequently associated with a significantly longer duration of viral shedding [11,12,13]. Asymptomatic infections have been found to have similar viral loads to symptomatic cases around the time of infection, but to exhibit shorter durations of viral shedding in some studies [14].

Estimates of temporal variation in the probability of detecting infections by PCR are crucial for planning effective routine asymptomatic testing strategies in settings with vulnerable populations. The testing frequency required to detect the majority of infections before they can transmit onwards will depend on both how soon after infection an individual becomes positive by PCR test and how long they remain positive. Measuring the probability that testing will detect SARS-CoV-2 at a given time since infection is challenging for two main reasons. First, it requires knowledge of the timing of infection, which is almost always unobserved. Second, it requires a representative sample of tests on people with and without symptoms, performed at many different times relative to the time of infection. Testing is usually performed on symptomatic infections after symptom onset, leading to an unrepresentative sample [15].

To address these challenges, we analysed data that covered the regular testing of healthcare workers in London, UK. We inferred their likely time of infection and used the results of the repeated tests performed over the course of their infection to infer the probability of testing positive depending on the amount of time elapsed since infection. This overcame the bias towards testing around the time of symptom onset, although we focused on data from symptomatic infections so that the timing of symptom onset could be used to infer the likely time of infection.

Methods

We used data from the SAFER study [16], conducted at University College London Hospitals between 26 March and 5 May 2020, which repeatedly tested 200 patient-facing HCWs by PCR and collected data on COVID-19 symptoms at the time of sampling. Samples were tested using the pipeline established by the COVID-Crick-Consortium. Individuals were asymptomatic at enrolment and were tested for SARS-CoV-2 antibodies at the beginning and end of the study period. During the study, HCWs were asked to try to provide two samples per week. Out of the 200 HCWs enrolled in the study, 46 were seropositive at the first antibody test, which occurred some time between 27 March and 6 April. Out of the remaining HCWs, 36 seroconverted over the study period, and 42 returned a positive PCR test at some point during the study (a detailed analysis of the characteristics of this HCW cohort can be found in Houlihan et al. [16]). We focused on a subset of 27 of these HCWs who seroconverted during the study period and reported COVID-19 symptoms at one or more sampling times (Fig. 1); the other 15 seropositive individuals were excluded because they had asymptomatic infections. Combining data on 241 PCR tests performed on self-administered nasopharyngeal samples from these 27 individuals, we estimated the time of infection for each HCW while simultaneously estimating the probability of a positive test depending on the time since infection.

Fig. 1

Testing and symptom data for the 27 individuals used in the analysis. Each point represents a symptom report and PCR test result. The border of the point is green if the PCR test result was positive and purple if it was negative. The inside of the point is red if the individual reported symptoms and white if they did not. Black crosses show the date of the initial negative serological test. Points are aligned along the x-axis by the timing of each participant’s last asymptomatic report

We developed a Bayesian model to jointly infer both the likely infection time for each individual and the probability of a positive PCR test depending on the time since infection across all individuals. We used a likelihood function specifically for inferring parameters from censored data [17] to derive a posterior distribution for the time of infection. This accounts for the fact that the true onset time is censored, i.e. symptom onset for each individual could have occurred anywhere between their last asymptomatic report and their first symptomatic report. Specifically, individual i has their likely infection time, \( {T}_i \), inferred based on the interval between their last asymptomatic report, \( {t}_i^{last} \), and their first symptomatic report, \( {t}_i^{first} \). The log-likelihood for the infection time for person i is as follows:

$$ \mathcal{L}\left({T}_i|{t}_i^{first},{t}_i^{last}\right)=\mathit{\log}\left(F\left({t}_i^{first}-{T}_i\right)-F\left({t}_i^{last}-{T}_i\right)\right) $$

where F is the cumulative distribution function of the lognormal distribution for the incubation period of COVID-19 as estimated in Lauer et al. [18]. For a detailed description of the procedure used to arrive at the onset times from the censored data and a list of the sources of uncertainty in our model, see Additional file 1: Section D.
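The censored log-likelihood above can be sketched in a few lines of Python. This is only an illustration, not the authors' R and Stan implementation: the function names and the example report days are hypothetical, and the lognormal parameters (log-scale mean 1.621 and standard deviation 0.418) are assumed values recalled from Lauer et al. that should be checked against that source before reuse.

```python
import math

# Assumed lognormal incubation-period parameters (log-scale mean and sd),
# recalled from Lauer et al.; verify against the source before reuse.
MEANLOG = 1.621
SDLOG = 0.418


def lognormal_cdf(t, meanlog=MEANLOG, sdlog=SDLOG):
    """CDF F(t) of a lognormal distribution; zero for non-positive t."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def infection_time_loglik(T_i, t_last, t_first):
    """Log-likelihood of infection time T_i when symptom onset is only
    known to lie between the last asymptomatic report (t_last) and the
    first symptomatic report (t_first)."""
    mass = lognormal_cdf(t_first - T_i) - lognormal_cdf(t_last - T_i)
    return math.log(mass) if mass > 0 else float("-inf")


# Hypothetical participant: last asymptomatic report on day 10,
# first symptomatic report on day 13.
for T in (0.0, 5.0, 8.0, 12.9):
    print(T, round(infection_time_loglik(T, 10.0, 13.0), 3))
```

Candidate infection times that place the typical incubation period inside the censoring interval receive the highest log-likelihood, while an infection time at or after the first symptomatic report receives zero probability mass.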

For a given inferred infection time for person i, the relationship between the time since infection and a positive PCR test on person i, \( {\mathrm{PCR}}_{n,i}^{+} \), administered at time \( {t}_{n,i} \), is given by a piecewise logistic regression model with a single breakpoint:

$$ {\mathrm{PCR}}_{n,i}^{+}\sim \mathrm{Bernoulli}\left({\mathrm{logit}}^{-1}\left({\beta}_1+{\beta}_2x+{\beta}_2{\beta}_3 xI(x)\right)\right), $$
$$ x:= {t}_{n,i}-{T}_i-C, $$

where C is the time of the breakpoint, x is the amount of time between infection and testing minus the value of the breakpoint, I(x) is a step function that equals 0 if x < 0 and 1 if x > 0 (its value at x = 0 does not affect the model because it is multiplied by x), and the β terms define the regression coefficients fit across all tests and people (see Table 1 for parameter details).
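To make the shape of this curve concrete, the following minimal Python sketch evaluates the piecewise logistic model for a given time since infection. The function names are hypothetical and the parameter values are purely illustrative; they are not the fitted posterior estimates reported in Table 1.

```python
import math


def inv_logit(z):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-z))


def p_positive(t_since_infection, beta1, beta2, beta3, C):
    """Probability of a positive test under the single-breakpoint model."""
    x = t_since_infection - C      # time relative to the breakpoint
    step = 1.0 if x > 0 else 0.0   # I(x); its value at x == 0 is irrelevant
    return inv_logit(beta1 + beta2 * x + beta2 * beta3 * x * step)


# Illustrative values only; NOT the fitted estimates from Table 1.
params = dict(beta1=1.5, beta2=2.0, beta3=-1.1, C=3.0)
for day in (0, 1, 3, 5, 10, 20, 30):
    print(day, round(p_positive(day, **params), 3))
```

On the logit scale the slope is β2 before the breakpoint and β2(1 + β3) after it, so a positive β2 combined with β3 < -1 produces a curve that rises to a peak at the breakpoint and then declines.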

Table 1 Summary of model parameters and the median and 95% credible interval from their fitted posterior distributions

To ensure biological plausibility, each individual was assumed to have a negative result at their precise time of infection to constrain the PCR positivity curve to have 0 probability of detection at 0 days since infection. We fitted the model using R 4.0.3 [19] and Stan 2.21.2 [20]; the data and the code required to reproduce the figures and results of this study can be found at the public GitHub repository: https://github.com/cmmid/pcr-profile. We ran four Markov chain Monte Carlo chains for 2000 iterations each, discarding the first 1000 iterations from each chain as warm-up. Convergence of the chains was assessed using the R-hat statistic, with \( \hat{R}<1.05 \) required for each model parameter.
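As a rough illustration of the convergence check, the sketch below computes a classic split R-hat for one parameter from several chains of post-warm-up draws. It is an assumption-laden stand-in: Stan reports its own R-hat, whose exact variant may differ from this textbook formula, and the simulated chains here are synthetic rather than draws from the fitted model.

```python
import random
import statistics


def split_rhat(chains):
    """Classic split R-hat for a list of equal-length chains of draws."""
    half = len(chains[0]) // 2
    # Split each chain in two so that within-chain drift is also detected.
    parts = [c[:half] for c in chains] + [c[half:2 * half] for c in chains]
    n = half
    means = [statistics.fmean(p) for p in parts]
    within = statistics.fmean([statistics.variance(p) for p in parts])
    between = n * statistics.variance(means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


random.seed(1)
# Four synthetic chains of 1000 post-warm-up draws from the same distribution.
good = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same again, but one chain is stuck around a different value.
bad = good[:3] + [[random.gauss(3.0, 1.0) for _ in range(1000)]]

print("well mixed:", split_rhat(good))
print("one chain shifted:", split_rhat(bad))
```

Chains that explore the same distribution give a value close to 1, whereas a chain centred elsewhere inflates the between-chain variance and pushes the statistic above the 1.05 threshold.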

We also performed a sensitivity analysis whereby the testing data for one HCW at a time was left out from the model fitting procedure to see if the PCR testing data for any individual HCW had an undue influence on the overall regression fit (results are shown in Additional file 1: Fig. S2).

We looked at two different ways of assessing the performance of different routine asymptomatic testing frequencies. Firstly, we calculated the probability that a symptomatic case would be detected before symptom onset; this demonstrates the ability of testing to catch infections before people eventually self-isolate due to symptoms (by which point they may already have infected someone). Secondly, we calculated the probability that an asymptomatic case is caught within 7 days of infection, estimating how frequent testing would need to be to detect asymptomatic infections in a timely manner. The mathematical equations used to calculate each of these probabilities are shown in Additional file 1: Section C.
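The second calculation can be approximated with a short Python sketch. The equations actually used are those in Additional file 1: Section C, which are not reproduced here, so everything below is an assumption made for illustration: tests are taken at a fixed interval, the first test falls at a uniformly random phase after infection, test results are independent given the time since infection, a detection only counts if the result is returned within the window, and the detection curve `toy_curve` uses made-up parameters rather than the fitted ones.

```python
import math


def toy_curve(t):
    """Illustrative piecewise logistic detection curve (made-up parameters)."""
    beta1, beta2, beta3, C = 1.5, 2.0, -1.1, 3.0
    x = t - C
    step = 1.0 if x > 0 else 0.0
    return 1.0 / (1.0 + math.exp(-(beta1 + beta2 * x + beta2 * beta3 * x * step)))


def prob_detect_within(p_curve, interval, delay, window=7.0, n_offsets=200):
    """Probability that at least one test taken every `interval` days is
    positive and reported (after `delay` days) within `window` days of
    infection, averaged over the unknown phase of the testing schedule."""
    total = 0.0
    for j in range(n_offsets):
        offset = (j + 0.5) / n_offsets * interval  # midpoint rule over phase
        p_all_negative = 1.0
        t = offset
        while t + delay <= window:
            p_all_negative *= 1.0 - p_curve(t)
            t += interval
        total += 1.0 - p_all_negative
    return total / n_offsets


for interval in (2, 4):
    for delay in (1, 2):
        p = prob_detect_within(toy_curve, interval, delay)
        print(f"every {interval} days, {delay}-day delay: {p:.2f}")
```

Under these assumptions, shortening the testing interval adds tests inside the window and lengthening the reporting delay removes them, which is the mechanism behind the comparison of testing frequencies and turnaround times.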

Results

The model found that the majority of individuals included in this analysis were infected around the beginning of the study period in late March (Fig. 2). This corresponds with a period of greatly increased hospitalisation in London, which could potentially mean much higher exposure to infectious COVID-19 patients. However, this analysis cannot say for certain where these HCWs were infected.

Fig. 2

The posterior of the infection time (\( {T}_i \)) of each participant. The posterior distribution of the infection time for each participant (purple) alongside the censored interval within which their symptom onset occurred (green dashed lines). The square points show the results of PCR tests on each individual; black points denote negative tests and red points denote positive tests

We estimated that the peak median posterior probability of a positive PCR test is 77% (54–88%) at 4 days after infection. The median posterior positivity curve is smoother than any individual posterior sample; this is why this peak does not match the median value for the breakpoint parameter, C, in Table 1 (see Additional file 1: Fig. S3 for examples of unsmoothed posterior positivity curve samples). The probability of a positive PCR test then decreases to 50% (38–65%) by 10 days after infection and reaches virtually 0% probability by 30 days after infection (Fig. 3a, b). Summary statistics for the posterior distributions of the piecewise logistic regression parameters are shown in Table 1. We compared our results for the probability of detection throughout infection to previous results in Additional file 1: Section A; we found a greater probability of detection 1 to 3 days after infection and a consistently lower probability of detection around 10 to 30 days after infection when compared with previous results.

Fig. 3

Estimation of positivity over time, and probability that different testing frequencies with PCR would detect infection. a Ct value data for the PCR tests in the SAFER study. This plot does not show data for every individual included in the analysis. The x-axis shows the time since infection using the median infection date inferred by the model. Points below the threshold of 37, indicating a positive result, are shown in red. Negative results above 37 are shown in black. All negative results for which there is no Ct value specified are given the value of 40. b Temporal variation in PCR-positivity based on time since infection. The grey interval and solid black line show the 95% uncertainty interval and the mean, respectively, for the empirical distribution calculated from the posterior samples of the times of infection (see Additional file 1: Section D for methodology). The blue interval and dashed black line show the 95% credible interval and median, respectively, of the logistic piecewise regression described above. c Probability of detecting virus before expected onset of symptoms, based on curve in b, assuming delay from test to results is either 1 or 2 days. Dashed black box shows a site of possible trade-off between testing frequency and results delay discussed in the text. d Probability of detecting an asymptomatic case within 7 days, based on curve in b, assuming delay from test to results is either 24 or 48 h

Our routine asymptomatic testing scenarios established that the higher the frequency of testing, the higher the probability that a symptomatic case will be detected before symptom onset (Fig. 3c) and the higher the probability that an asymptomatic case is detected within 7 days (Fig. 3d). If there is a 1-day delay from performing the test to delivering the result, then increasing the testing frequency from every 4 days to every 2 days increases the probability of detecting an asymptomatic infection within 7 days from 76% (59–87%) to 95% (86–98%). A 2-day delay between testing and notification compared to a 1-day delay led to reduced probability of timely detection in both testing scenarios (Fig. 3c, d). For example, when testing every 2 days, the probability of detecting a symptomatic infection before symptom onset is 58% (CI 40–74%) with a 1-day delay and 42% (CI 27–57%) with a 2-day delay. This is because a longer delay means that an infection must be caught earlier to allow for a longer period of time between a test being administered and the infected person being notified of the results. An increased delay from testing to notification caused a greater relative reduction in the probability of detecting an asymptomatic case within 7 days of infection when the testing frequency was lower (Fig. 3d). Considering a smaller window of detection for asymptomatic infections (i.e. within 5 days rather than 7 days) resulted in reduced probability of detecting asymptomatic infections within such a window (see Additional file 1: Fig. S4).

When considering what is an acceptable testing frequency for detecting a desired proportion of symptomatic cases prior to their symptom onset, there may be a trade-off between testing frequency and the delay from testing to notification. For example, the probability of detecting a symptomatic case prior to onset is very similar for a 2-day testing frequency with a 2-day notification delay (42%, 27–57%) compared to a 4-day testing frequency with a 1-day notification delay (40%, 27–53%). This trade-off is depicted graphically in the dashed black box in Fig. 3c.

During 2020, lateral flow tests (LFTs) with a turnaround time of roughly 30 min for the detection of SARS-CoV-2 have been developed and evaluated [21]. Such tests typically have a lower mean sensitivity than standard PCR tests. However, the faster turnaround time can aid the logistical challenge posed by rapid large-scale testing. Thus far in our analysis, a positive PCR test has been defined by a cycle threshold (Ct) value of less than or equal to 37. However, given that Ct values are also available for the tests in our dataset, we were able to redefine test outcomes using different Ct value thresholds that reflect the potential sensitivity of the more recent LFTs, which can generally detect infectiousness (when viral loads are high) but not always infection (when viral loads may be lower) [22].

The model was re-fitted using two potential LFT-like definitions of a positive test: a Ct value of less than or equal to 28, or less than or equal to 25. The newly defined test outcomes are shown in panel a of Figs. 4 and 5, along with the corresponding estimates of test sensitivity as a function of time since infection in panel b. We then used the sensitivity curves in the symptomatic and asymptomatic testing scenarios with frequent testing, assuming no delay between rapid test and result (reflecting the imagined use case of LFTs; results shown in panels c and d).
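The redefinition of test outcomes amounts to re-thresholding the recorded Ct values before refitting. A minimal Python sketch of that step is shown below; the Ct values are hypothetical, the function name is an assumption, and treating a missing Ct value as a negative result is a choice made here for illustration.

```python
def classify(ct_values, threshold):
    """Return True for tests whose Ct value is at or below the threshold.

    A missing Ct value (None) is treated as a negative result.
    """
    return [ct is not None and ct <= threshold for ct in ct_values]


# Hypothetical Ct values; None marks a negative test with no Ct recorded.
ct_values = [22.4, 27.9, 31.0, 36.5, None, 40.0]

for threshold in (37, 28, 25):
    outcomes = classify(ct_values, threshold)
    print(f"Ct <= {threshold}: {sum(outcomes)} of {len(outcomes)} tests positive")
```

Lowering the threshold can only turn positives into negatives, so the stricter definitions leave fewer positive tests for the model to fit.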

Fig. 4

A copy of Fig. 3 using a Ct value of 28 (instead of 37) to classify a test as positive or not. This is instructive of how a lateral flow test (LFT) might perform as they seem to be less sensitive to infections with lower viral loads than PCR tests. In c and d, the probabilities of detection are now considered with a 0-day delay since LFTs give results within minutes that can be passed on to the person being tested quickly

Fig. 5

A copy of Fig. 3 using a Ct value of 25 (instead of 37) to classify a test as positive or not. This is instructive of how a lateral flow test (LFT) might perform as they seem to be less sensitive to infections with lower viral loads than PCR tests. In c and d, the probabilities of detection are now considered with a 0-day delay since LFTs give results within minutes that can be passed on to the person being tested quickly

For the hypothetical LFT scenario compared to PCR, the peak probability of detection is lower: 64% (33–85%) at 4.3 days after infection and 42% (13–70%) at 3.8 days after infection for Ct thresholds of 28 and 25, respectively. The probability of detection by LFT also declines to negligible values far sooner after infection, by around 18 days, compared to around 30 days for PCR. However, the uncertainty in the probability of detection curve is wider for these hypothetical LFTs compared to PCR because there were fewer positive tests to fit to overall. The probability of detecting symptomatic cases before symptom onset, or asymptomatic cases within 7 days of infection, decreases when the Ct threshold for a positive test is lower (panels c and d of Figs. 4 and 5). When the Ct threshold is defined to be 25, even testing every 2 days yields a median probability of detecting symptomatic cases before onset below 50%.

Discussion

The ongoing COVID-19 pandemic has led to increasing focus on routine asymptomatic testing strategies that could prevent sustained transmission in hospitals and other defined settings with at-risk individuals such as care homes. Using data on repeated testing of healthcare workers, we estimated that peak positivity for PCR tests for SARS-CoV-2 infections occurs 4 days after infection, which is just before the average incubation duration, in agreement with other studies finding that viral load in the respiratory tract is highest at this point [23, 24]. We show the sensitivity of the results to the choice of incubation period distribution in Additional file 1: Fig. S5.

We found a substantially higher probability of detection by PCR between 1 and 3 days after infection than a previous study [25]. The low detection probabilities estimated in the previous study for the period 1 to 3 days after infection were fitted to very small amounts of data: one observed negative test on each of 1, 2, and 3 days after infection. Due to the fact that HCWs in the SAFER study were repeatedly tested even when asymptomatic, many of the tests took place close to the inferred infection times. This provided more test data for our model to fit to for the period just after infection. We provide a more rigorous exploration of the differences between our results and existing work in Additional file 1: Section A.

Our model also estimated much lower probabilities of detection between 7 and 30 days after infection compared to the models by Kucirka et al. and Hay and Kennedy-Schaffer et al. A plausible explanation for this difference is the sample collection method and disease severity of the people being tested, which lead to different observed viral load dynamics. The SAFER study data used here were collected from self-administered tests by HCWs and the symptoms recorded were those that were compatible with SARS-CoV-2 according to Public Health England, including a ‘new continuous cough or alteration in sense of taste or smell’ [16]. Conversely, the datasets used for fitting the Kucirka model consist mainly of HCW-administered tests on hospitalised patients who are likely to have more severe infections, a factor that has been associated with a longer duration of viral shedding [10] in some studies. As such, our curve for the probability of detection by PCR may constitute a closer approximation of PCR test sensitivity over time in individuals with mild symptomatic infections. This would make it particularly useful for estimating the effectiveness of routine asymptomatic testing strategies, which would seek to detect all infections, not just the most severe.

Incorporating our estimates of PCR detection probability into a model of routine asymptomatic testing strategies, we found that there is the potential for a trade-off between the turnaround time for test results and testing frequency (example in dashed black box, Fig. 3c). This could be particularly relevant for settings that do not have the resources or capacity for very high frequency testing but could ensure prompt results. Although our analysis focuses on the probability of testing positive, any potential testing and isolation strategy would also need to consider the potential for false positives, particularly at low prevalence [26].

The maximum probability of detection of 77% shown by the curve in Fig. 3b refers to the whole population and does not imply that an individual person’s peak probability of being detected by a PCR test is 77%. The curve is fitted to combined test results for many individuals, each of whom will have had variation in the timing of their particular peak probability of detection. This variation is smoothed out over all individuals to lead to the curve shown in Fig. 3b.

To explore the potential for rapid testing of individuals, we examined how the curve in Fig. 3b would change if the cycle threshold used to define a positive result was lowered, which mimics the detection capabilities of lateral flow tests that are less able to detect infections at higher Ct values [22, 27]. We estimated that the probability of detection post-infection still peaks around 4 days after infection, but that the peak probability of detection is lower and the probability of detection declines much faster after the peak. The reduced period of time after infection during which a case might be detected in our hypothetical LFT scenario compared to PCR may help to explain some of the low sensitivities for LFTs reported during the evaluation of LFT testing programmes such as in Liverpool, where LFTs detected only 48.89% of the infections that were later confirmed by PCR [28]. In general, our estimates correspond with previous observations that infections with lower viral loads (which are likely to be older infections and will have higher Ct values) are less likely to be detected by LFTs compared to PCR.

We assumed that symptoms reported during the study were due to clinical episodes of COVID-19 infection, and not due to other respiratory infections with similar symptoms. All individuals in the analysis seroconverted over the course of the study, suggesting that such symptoms were likely to be associated with SARS-CoV-2 infection.

Our analysis is also limited by excluding asymptomatic HCWs that seroconverted over the course of the study. Symptomatic infections may have higher viral loads and be more likely to be detected than asymptomatic infections; however, this has not been found to be the case elsewhere [14]. Our repeated testing model presents results for detecting asymptomatic infections that rely on the assumption that the probability of detection over time is the same for symptomatic and asymptomatic infections. If asymptomatic infections are instead less likely to be detected, then our estimate of the probability of detection within 7 days of infection will be an overestimate.

Conclusions

Routine asymptomatic testing is a crucial component of effective targeted control strategies for COVID-19, and our results suggest that frequent testing and fast turnaround times could yield high probabilities of detecting infections early, and hence prevent outbreaks, in at-risk settings.