Key Summary Points

SARS-CoV-2 real-time RT-PCR Ct values are associated with epidemiological trends.

A negative cross-correlation between Ct values and new daily cases was observed.

Ct values may be useful in predicting upcoming peaks in variant waves.

Predictive modeling using Ct values may enable assessment of epidemic trajectory.

Introduction

In a world of increasing travel and migration, the numbers of known infections and deaths from the coronavirus disease 2019 (COVID-19) global pandemic, caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), highlight the rapid spread of this pandemic and the importance of understanding infectious disease dynamics to be better prepared for a future pandemic. As of July 2022, there have been almost 548 million confirmed cases worldwide (229.6 million in Europe and 86.7 million cases in the USA), including more than 6.3 million deaths [1]. One important lesson from the COVID-19 pandemic is that correctly predicting epidemic waves and implementing timely appropriate preventative measures could potentially save millions of lives.

COVID-19 epidemic monitoring has focused on case counts, environmental levels from wastewater monitoring, test positivity rates, and reported deaths or hospitalizations. These parameters are used to estimate the growth rate of positive tests (the estimated effective reproduction number, Rt). These traditional estimates of the epidemic trajectory can result in a limited, biased, and delayed view of the epidemic. Viral loads of infected people are associated with infectivity and could be used to predict future case rates; a higher viral load in a patient suggests that more virus is being shed, which increases the chance that others are exposed to an infectious dose sufficient for acquiring disease. High viral loads in a community can indicate increasing prevalence of disease. Viral load measures can improve epidemic predictions, especially in low-surveillance settings where true case counts over time are not easily available [2].

Real-time reverse transcription polymerase chain reaction (RT–PCR) is regarded as the gold standard method for COVID-19 diagnosis. Although a qualitative result is usually obtained, certain instruments provide end users with access to cycle threshold (Ct) values. Real-time RT–PCR Ct values represent the number of amplification cycles required for the target gene to exceed a threshold level, with a low Ct value corresponding to a high pathogen load. Ct values are inversely related to viral load and can provide an indirect method for quantifying the copy number of sample viral ribonucleic acid (RNA) [3]. Several studies have demonstrated a link between lower Ct values and increased disease severity [4,5,6]. Ct values can support public health, infection control, and patient management decisions [7, 8].
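The inverse relationship between Ct and viral load can be illustrated with a log-linear standard curve. The following is a minimal sketch, not a method taken from any of the included studies: the slope of −3.32 corresponds to roughly 100% amplification efficiency (template doubling every cycle), while the intercept of 40 is a purely hypothetical value that in practice would have to be calibrated for each assay and instrument.

```python
def ct_to_copies(ct, slope=-3.32, intercept=40.0):
    """Estimate template copies from a Ct value using a standard curve.

    Assumes the curve Ct = slope * log10(copies) + intercept.
    The default slope and intercept are illustrative assumptions,
    not parameters of any real assay.
    """
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by a standard-curve slope (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


# A Ct that is about 3.32 cycles lower implies roughly ten times more template.
low_ct_copies = ct_to_copies(20.0)
high_ct_copies = ct_to_copies(23.32)
fold_difference = low_ct_copies / high_ct_copies
```

Because the curve parameters differ between assays, Ct values from different RT–PCR methods are not directly comparable without this kind of calibration.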

Symptomatic presentation of COVID-19 has been significantly associated with lower Ct values, meaning higher viral load and prolonged virus shedding, which may play a role in determining the transmissibility and contagiousness of disease [3]. Therefore, a possible link between Ct values and epidemiology trends needs to be explored. Using population-based variations in Ct values could improve Rt predictions of a 7-day period or longer.

The main aim of this systematic review is to identify the presence or absence of a correlation between Ct values of patients diagnosed with COVID-19 and population dynamics of the disease, and determine whether temporal trends in SARS-CoV-2 Ct levels are predictive of future population epidemiology trends.

Methods

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors. The review was undertaken according to the principles outlined in the Cochrane Handbook [9]. A comprehensive search of PubMed was conducted on August 22, 2022, to identify studies reporting on the association between real-time RT–PCR Ct values of SARS-CoV-2 and epidemiological trends. The search strategy involved the terms “Ct value” or “viral load,” “SARS-CoV-2,” and “epidemiological data,” along with relevant synonyms. The full search string is presented in the supplementary data (see Supplementary Table 1).

All studies conducted in humans diagnosed with COVID-19 and reporting on the presence or absence of an association between real-time RT–PCR Ct values, or viral load specifically determined via real-time RT–PCR Ct values, and epidemiological trends were included. Exclusion criteria were as follows: prereview/preprint articles, animal/nonhuman studies, studies on wastewater, review articles, manuscripts not in English, and studies not reporting SARS-CoV-2 RT–PCR Ct values, population epidemiological trends of SARS-CoV-2, or correlations between the two.

Titles and abstracts were screened for relevance by two independent reviewers, while a third reviewer resolved conflicts. The full texts of relevant studies were then assessed for inclusion by two independent reviewers, with a focus on the association between SARS-CoV-2 RT–PCR Ct values (or viral loads specifically determined via RT–PCR Ct values) and epidemiological trends. The process for study selection was reported using the flow diagram of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 [10]. Key data from all included studies were captured using a data extraction form. All extracted data were verified by an independent reviewer. Outcomes were reported in a narrative format; meta-analyses were deemed inappropriate owing to wide variation in reported outcomes.

Given the novelty of RT–PCR Ct values as an epidemiological tool, there was no precedent for risk-of-bias assessment that this study could reference. The methodological quality of the included studies was therefore assessed using an adaptation of the modified Newcastle–Ottawa Scale [11, 12].

Results

Included Studies

PubMed searches identified 1369 unique records for screening. The PRISMA flowchart of included studies is shown in Fig. 1. Studies that were excluded at the full-text stage are outlined (including the reason why they were excluded) in Supplementary Table 2. Data from the 16 studies that were relevant for inclusion, and their respective study characteristics, are outlined in Table 1. Study outcomes of the 16 included studies detailing correlation between SARS-CoV-2 real-time RT–PCR Ct values and epidemiological trends are summarized in Table 2, and as follows.

Fig. 1

PRISMA flow diagram. Ct cycle threshold, RT–PCR real-time reverse transcription polymerase chain reaction, SARS-CoV-2 severe acute respiratory syndrome coronavirus-2

Table 1 Study characteristics of included studies
Table 2 Summary of reported data relating to SARS-CoV-2 real-time RT–PCR Ct value correlation with epidemiological trends

The studies were conducted in the USA [2, 13,14,15,16,17], Europe [18,19,20,21], Africa [22], Australia [23], China [24], India [25], the Kingdom of Bahrain [3], and Lebanon [26]. RT–PCR Ct values were measured from public samples only (national or local), or from hospital and public samples in ten studies [3, 14, 18,19,20,21,22,23,24,25], and from hospital only or hospital and care home samples only in six studies [2, 13, 15,16,17, 26]. Eleven studies included nasopharyngeal swabs [2, 3, 13,14,15,16,17,18, 20,21,22], one study used nose and throat swabs [19], and four studies did not record the sample type [23,24,25,26]. At least 30 different polymerase chain reaction (PCR) assays were included in the studies, but not all were described in sufficient detail to give a precise count, with two studies not recording the PCR assay used [25, 26]. The real-time RT–PCR targets included the E gene, M gene, N gene, S gene, RdRp, and ORF1ab. Ten studies included more than one gene target. The time point of assessment after symptom onset was not recorded for most studies. The Rt number was used as the measure of the population/epidemic trend levels in five studies [2, 14, 18, 22, 24]. Seven of the studies used predictive models based on the inverse association between Ct values and epidemiological trends to prospectively predict the epidemiological trajectory [2, 13, 18, 20, 22, 24, 26].

Risk of Bias of Included Studies

A modified Newcastle–Ottawa Scale was used to assess risk of bias of the included studies. All the studies were classed as being of low quality (Supplementary Table 3). The reasons for low quality included a lack of comparability of the RT–PCR Ct values of the exposure samples between studies, a lack of comparability of the temporal cohorts at the population level within each study, and the absence of any way to quantify respondents and nonrespondents. In addition, the novelty of this review adds to the challenge of determining the risk of bias of the included studies.

Correlation between Ct Value and Community Prevalence of SARS-CoV-2

All the included studies examined the association between Ct values and epidemiological trends (Table 2). Many retrospectively examined the cross-correlation (lead–lag relationship) between Ct values and epidemiological trends. Half of the studies reported a time lag in the negative cross-correlation between Ct values and new daily cases, with seven studies reporting time lags between 1 and 3 weeks [2, 3, 13, 15, 18, 20, 26], and one study reporting a time lag of 33 days [14] (Table 2). Five studies looked at the correlation between Ct values and the stages of different epidemic waves of SARS-CoV-2 [15,16,17, 21, 23]. These studies found that as case numbers increased towards the peak of an epidemic wave, viral loads increased, as indicated by low Ct values; towards the end of epidemic waves, a higher proportion of patients had high Ct values, indicating a decrease in viral load (Table 2).
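A lagged cross-correlation of this kind can be sketched as follows. This is an illustrative example only: the function names and the synthetic series are assumptions made for the sketch and are not drawn from any included study. The idea is to correlate the mean Ct on day t with the case count on day t + lag and to look for the lag at which the correlation is most strongly negative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(ct, cases, lag):
    """Correlate Ct at time t with cases at time t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(ct, cases)
    return pearson(ct[:-lag], cases[lag:])


def most_negative_lag(ct, cases, max_lag):
    """Lag (0..max_lag) at which the Ct-to-cases correlation is lowest."""
    return min(range(max_lag + 1),
               key=lambda lag: lagged_correlation(ct, cases, lag))


# Synthetic data: cases are constructed to mirror Ct two time steps earlier.
mean_ct = [30, 28, 25, 22, 24, 27, 29, 30, 28, 26]
cases = [12, 15] + [100 - 3 * c for c in mean_ct[:-2]]
best_lag = most_negative_lag(mean_ct, cases, max_lag=4)
```

With real surveillance data the series would be noisier and shorter lags would not stand out as cleanly, so the selected lag should be read together with its correlation coefficient and sample size.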

Predictive Models of Future Epidemic Trends

To assess whether Ct values themselves predicted future SARS-CoV-2 prevalence, eight studies used the data from the retrospective analysis of the correlation between Ct values and community prevalence to develop models predicting future epidemic trends [2, 3, 13, 18, 20, 22, 24, 26], although one of these did not test their predictive model prospectively [3].

In one US study (Hay et al.), models that use information inherent in Ct values were developed to estimate the epidemic trajectory from a cross-section of positive samples (Table 1) [2]. The authors used three mathematical models to describe daily SARS-CoV-2 transmission: (1) the deterministic susceptible-exposed-infectious-recovered (SEIR) model; (2) the exponential growth model, which assumes that new infections arise under a constant exponential growth rate; and (3) the Gaussian process (GP) model, which describes the epidemic trajectory as a vector of daily infection probabilities, where the GP prior ensures that daily infection probabilities are correlated in time [2]. The model outcomes using Ct values were comparable to those of a modification of the SEIR model, the SEEIRR model (which included additional compartments for PCR-negative exposed individuals and PCR-positive individuals who are recovered), which used prevalence data to determine baseline estimates in a closed population. More complex epidemic trajectories were estimated using cross-sectional models, including the GP model, applied to Ct values from hospital-based surveillance at a local hospital and compared with daily confirmed case counts for the state (Table 2) [2]. The estimated epidemic trajectory using hospital Ct values correlated with community-level viral load changes obtained from wastewater monitoring [2].
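For readers unfamiliar with compartmental models, a generic deterministic SEIR model can be sketched as below. This is a textbook formulation integrated with simple Euler steps, not the implementation used by Hay et al.; the parameter values (transmission rate, incubation rate, recovery rate, population size) are hypothetical and chosen only to produce a visible epidemic wave.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma,
                  days, dt=0.1):
    """Deterministic SEIR model integrated with Euler steps.

    beta:  transmission rate per day (assumed value)
    sigma: rate of leaving the exposed state, 1 / incubation period
    gamma: rate of leaving the infectious state, 1 / infectious period
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    trajectory = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Hypothetical parameters: 3-day incubation, 7-day infectious period.
trajectory = simulate_seir(population=10_000, initial_infectious=10,
                           beta=0.5, sigma=1 / 3, gamma=1 / 7, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
```

In a Ct-based approach, a trajectory like this would be combined with a model of how viral load changes after infection, so that the observed Ct distribution on a given day informs whether the epidemic is growing or declining.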

Khalil et al. evaluated six data-driven models that utilized Ct and previous number of cases to predict the epidemic trajectory using mean square error (MSE). The ordinary least squares (OLS) polynomial regression and support vector machine regression (SVR) had the best performance during independent validation (MSE = 0.1596 and MSE = 0.16754, respectively). The OLS and SVR models also accurately predicted the COVID-19 incidence in an external institution (Table 2) [26].

Abdulrahman et al. carried out a pilot study to develop a predictive model based on linear regression analysis of the association between Ct values and new daily cases. There was a significant negative correlation between Ct values and new daily cases (Pearson’s correlation r = −0.06; 95% CI: −0.06, −0.05; P < 0.001). The average weekly Ct values negatively correlated with daily cases occurring 1 week after specimen collection (Table 2) [3].

The predictive model in Phillips et al. was created using linear regression analysis of all Ct values within a closed institute setting between October 2020 and March 2021. Average weekly Ct values negatively correlated with the number of cases occurring 2 weeks after specimen collection (Pearson’s correlation r = −0.75, P < 0.0001). Ct values for symptomatic patients correlated with tests performed the week of collection, whereas asymptomatic patient Ct values correlated best with tests performed 1 month later [13]. The model was validated by comparing Ct values collected in May–July 2021 with future cases; it correctly predicted increases or decreases 65% of the time for this surge (Table 2). The predictive model performed better when cases increased dramatically (greater than 30% increase), capturing 82% of these increases in cases at the individual hospital level (Table 2) [13].
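The general approach of regressing future case counts on current Ct values, and then scoring how often the direction of change is predicted correctly, can be sketched as follows. This is an assumption-laden illustration rather than the model of Phillips et al.: the helper names, the 2-week lag applied to the arrays, and the synthetic weekly data are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def directional_accuracy(predicted, observed):
    """Share of week-to-week changes whose direction is predicted correctly."""
    hits = 0
    for k in range(1, len(observed)):
        predicted_up = predicted[k] > predicted[k - 1]
        observed_up = observed[k] > observed[k - 1]
        hits += predicted_up == observed_up
    return hits / (len(observed) - 1)


# Synthetic weekly data in which cases follow Ct from two weeks earlier.
weekly_ct = [30, 28, 26, 27, 24, 22, 25, 29]
weekly_cases = [40, 45, 50, 60, 70, 65, 80, 90]
lag = 2
ct_now, cases_later = weekly_ct[:-lag], weekly_cases[lag:]

slope, intercept = fit_line(ct_now, cases_later)
predicted = [intercept + slope * c for c in ct_now]
accuracy = directional_accuracy(predicted, cases_later)
```

A negative fitted slope reflects the inverse association described above. In practice the model would be fitted on one period and scored on a later one, as in the validation reported by the authors, rather than scored on its own training data as in this sketch.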

Alizon et al. used Rt of the epidemic as a measure of prevalence for the predictive model. A time series analysis using autoregressive integrated moving average (ARIMA) predictive models was used to estimate whether Ct data improves short-term predictions of disease epidemiology. A time lag of 6–7 days appeared to be the most significant time lag for cross-correlation between Ct and Rt (tested between 0 and 20 days) (Table 2). Rt was calculated using hospital admissions data for COVID-19 in addition to RT–PCR screening data. The error in predicting Rt improved when Ct quartiles and Ct skewness were known. The prediction error of Rt using Ct values was lower than that using the ratio of positive tests [18].

Andriamandimby et al. compared the results of cross-sectional Ct distributions with Rt estimates derived from more traditional case count methods to estimate the epidemic growth rate (Table 1) [22]. A decline in population-level Ct was associated with the epidemic peak both regionally and nationally. A population-level SEIR model framework, and the flexible GP model developed by Hay et al. [2], correctly estimated epidemic growth rates from Ct distributions locally in the months preceding epidemic sub-peaks (Table 2). The epidemic growth estimates were largely congruent with those using Rt estimates from daily reported incidence. The authors noted that cross-sectional Ct distributions would have predicted the possibility of an epidemic resurgence in Madagascar that had been missed during declining surveillance [22].

Lin et al. fitted a log-linear regression of the daily Rt on the daily mean and skewness of Ct values at sampling during a training-period wave, to enable real-time assessment of transmission in the community using Ct values (Table 2). The fitted model from the training period was then used to successfully predict real-time estimates of Rt in a subsequent wave [24]. The temporal Ct distribution correlated with the incidence-based Rt over both epidemic waves. Higher values of incidence-based Rt were associated with decreasing average Ct values (Spearman’s correlation coefficient, ρ = −0.79, P < 0.001, and ρ = −0.52, P < 0.001, for the initial and subsequent wave, respectively).
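The two daily summary statistics used as predictors in this kind of model, the mean and the skewness of the Ct distribution, can be computed as in the sketch below. The function name and example values are hypothetical, and the skewness shown is the simple moment-based (population) estimator, which may differ from the estimator used by the authors.

```python
def ct_summary(ct_values):
    """Return (mean, skewness) of one day's Ct values.

    Skewness is the moment-based estimator m3 / m2 ** 1.5, where m2 and m3
    are the second and third central moments. A day with no spread in Ct
    values is assigned a skewness of 0.0.
    """
    n = len(ct_values)
    mean = sum(ct_values) / n
    m2 = sum((c - mean) ** 2 for c in ct_values) / n
    m3 = sum((c - mean) ** 3 for c in ct_values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, skewness


# Hypothetical day on which most samples have low Ct and one has high Ct.
daily_mean, daily_skewness = ct_summary([18, 19, 20, 35])
```

Computed per sampling day, these two series would form the predictor columns of the regression, with the logarithm of the incidence-based Rt as the response.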

Yin et al. found that each epidemic wave was preceded by a drastic decrease in Ct values several days earlier. There was a 17-day time lag in the negative correlation between the median Ct value and the mean daily positive tests. An extended SEIR-type model found that Ct values, both for all ages combined and by age class, were negatively correlated with the estimated daily prevalence in Belgium (Table 2) [20].

Discussion

To the best of our knowledge, this is the first review to systematically assess and consolidate available evidence on associations between SARS-CoV-2 Ct values and epidemiological trends. Previous systematic reviews have demonstrated Ct value correlations with disease severity and transmissibility [12, 27, 28]. Our review differs by highlighting the potential relevance for determining trends in COVID-19 epidemiology, such as predicting peaks in variant waves, which in turn may be useful for investigations into other circulating respiratory pathogens.

All 16 studies included in this review reported an inverse correlation between SARS-CoV-2 Ct values and epidemiological trends at a single unit, locally, or nationally. Eight of the studies observed the inverse correlation of Ct values and incidence rates over two waves of the epidemic [15, 19,20,21, 23,24,25,26]. Seven of the studies reported a prediction time in the negative cross-correlation between Ct values and new daily cases of around 1–3 weeks [2, 3, 13, 15, 18, 20, 26]. Mathematical models using Ct values predicted epidemic trajectories of a few weeks to longer term incidence curves [2, 3, 13, 18, 20, 22, 24, 26]. Many of these epidemic models accounted for complex populations, where interventions may be implemented and relaxed over time and new variants may arise.

Existing surveillance systems using case-counting methods and test positivity rates to estimate the epidemic trajectories can suffer from reliability issues. Predictive modeling using Ct values might provide a more reliable estimate, or useful adjunct to assessing the epidemic trajectory. These models could be extended to predict the epidemic trajectory of other contagious viral diseases diagnosed by RT–PCR after at least one wave of disease. Better estimates of epidemic trajectories can also allow for better epidemic planning and the implementation of more targeted epidemiological measures.

Similar to using patient viral loads as a proxy to population-level exposure to SARS-CoV-2, the association between exposure to high viral loads in the environment and epidemiological trends has also been investigated with wastewater viral loads. Several studies have examined how wastewater surveillance can complement clinical surveillance to infer COVID-19 prevalence. A systematic review by Shah et al. found an association between wastewater viral load and prevalence in the community in 53 studies [7]. In addition, wastewater sample positivity preceded confirmed cases in the community up to 63 days, with 13 out of 87 studies reporting wastewater sample positivity before the detection of cases in the community [7].

The relationship between Ct values and case numbers for the later identified variants of SARS-CoV-2 may be similar to that for the earlier variants. Stevens et al. [23] analyzed SARS-CoV-2 real-time PCR test Ct values in a population in Australia from the end of November 2021 to the end of March 2022, including ~26,388 PCR-positive samples. This period overlapped with the emergence of the Omicron variant, and there was indeed an inverse relationship between the Ct value and the number of cases of the Omicron variant in this study, as was seen for the Alpha and Delta variants in the other included studies.

This review has several limitations due to heterogeneity among the included studies. The number of PCR-positive samples included in the studies varied widely: not recorded in three studies [2, 23, 26], < 500 in two studies [16, 17], and 500 to ~800,000 in the remaining studies. There were several Ct value-related limitations, such as a broad range of RT–PCR assays and PCR target genes. As SARS-CoV-2 viral load varies over time from initial infection [29,30,31,32,33,34,35], the absence of recording of time since symptom onset in many studies was a limitation. Sample type can also affect Ct values [32], and several of the included studies used a sample type other than nasopharyngeal swabs or did not record the sample type. In addition, actual exposure was not measured; most studies looked at local populations for exposure (Ct values) and larger populations for outcome. However, one study validated a predictive model of SARS-CoV-2 prevalence trends from Ct values of samples from a closed system/single unit and then developed predictive models from Ct values of samples in a single unit to predict the regional epidemic trajectories [2]. Another study looked at temporal trends in Ct values in a single unit and how they correlated with future case numbers in that single unit [13].

The heterogeneity of the included studies arguably strengthens the findings of this review: a correlation between Ct values and future number of cases is demonstrable despite the differences in sample types, PCR assays, and variants. The studies revealed sufficient evidence that the number of COVID-19 positive cases in the community rises around 2 weeks after a drop in Ct values; thus, predicting a wave 1 month in advance may even be possible.

However, there are several considerations for future work. Establishing an aggregated Ct cutoff value that predicts new waves with greater accuracy would be useful. As different RT–PCR methods report different Ct values, standard curves should be used to accurately quantify the expected viral copy number [36]. This would translate into better comparability between cohorts and more accurate correlation and prediction times. In addition, as sample type is known to affect Ct values [32], it would be useful to determine the impact of sample type on the correlation and prediction time of epidemic spread for future studies. The effect of the length of time from symptom onset to sample collection and testing should also be determined to examine whether there is an optimal time from symptom onset for correlation with disease epidemiology. The minimum number of cases needed to correctly assess the crossing of the Ct cutoff should also be established, and predictions should be validated across different pathogens and their variants. Public health decision-makers could then use the aggregated Ct cutoff value to estimate the prediction time to an upcoming wave and implement appropriate epidemic planning more efficiently in low-surveillance settings [2].

Conclusions

Ct values of current positive COVID-19 cases can be used to predict community infection rates at least 2 weeks in advance, irrespective of the prevalent variant or sample type and RT–PCR method used for diagnosis.