## Abstract

Motivated by the current COVID-19 health crisis, we consider data analysis for quantitative polymerase chain-reaction (qPCR) measurements. We derive a theoretical result specifying the conditions under which all qPCR amplification curves (including their plateau phases) are identical up to an affine transformation, i.e. a multiplicative factor and horizontal shift. We use this result to develop a data analysis procedure for determining when an amplification curve exhibits characteristics of a true signal. The main idea behind this approach is to invoke a criterion based on constrained optimization that assesses when a measurement signal can be mapped to a *master* reference curve. We demonstrate that this approach: (i) can decrease the fluorescence detection threshold by up to a decade; and (ii) simultaneously improve confidence in interpretations of late-cycle amplification curves. Moreover, we demonstrate that the master curve is transferable reference data that can harmonize analyses between different labs and across several years. Application to reverse-transcriptase qPCR measurements of a SARS-CoV-2 RNA construct points to the usefulness of this approach for improving confidence and reducing limits of detection in diagnostic testing of emerging diseases.

## Introduction

Quantitative polymerase chain-reaction (qPCR) measurements are the mainstay tool for diagnosing COVID-19 [1], since they detect viral RNA up to a week before the formation of antibodies. However, preliminary studies indicate that the rate of false negatives may be as high as 30% for SARS-CoV-2 testing [2], driven in large part by asymptomatic patients and/or those in the earliest stages of the disease [3]. Methods that can increase the sensitivity of qPCR techniques, improve confidence in measurements, and harmonize results between laboratories are therefore critical. By providing a more accurate picture of infections, such methods can help control the outbreak.

The present manuscript addresses this problem by developing a mathematical procedure that enables more robust analysis and interpretation of qPCR measurements. We first derive a new theoretical result that, under general conditions, all qPCR amplification curves (including their plateau phases) are the same up to an affine transformation. Using this, we develop a data analysis approach employing constrained optimization to determine if an amplification curve exhibits characteristics that are representative of a true signal. This decision is made by projecting data onto a *master curve*, which leverages information about both the signal shape and noise-floor in a way that allows use of lower fluorescence thresholds. We illustrate the validity of this approach on experimental data and demonstrate how it can improve interpretation of late-cycle amplification curves corresponding to low initial DNA concentrations [4]. Moreover, we apply our analysis to qPCR measurements of a SARS-CoV-2 RNA construct to illustrate its potential benefits for testing of emerging diseases.

A key theme of this work is the idea that advanced uncertainty quantification (UQ) techniques are necessary to extract the full amount of information available in data.^{Footnote 1} For example, amplification curves with late-cycle growth may only have a few points above the baseline. Moreover, these data are often noisy, which confounds attempts to distinguish them from random effects not associated with DNA replication. In such cases, classification strategies based on subjective thresholds are prone to mistakes because they do not assess statistical significance of signal behavior. In contrast, the tools we develop address such issues, allowing one to more confidently use low-amplification data.

In a related vein, our underlying mathematical framework makes only a few very general assumptions about the operation of a qPCR instrument. This feature is critical to ensuring the *robustness* and validity of our results across the range of systems encountered in practical applications. Previous works (see, e.g. Refs. [5,6,7,8,9,10,11,12,13] and the references therein) have treated analysis of qPCR data as a task in mathematical modeling, in which parameterized equations are assumed to fully describe the signal shape. While such approaches can be powerful tools for elucidating the mechanisms driving PCR reactions, they invariably introduce *model-form errors*, i.e. errors arising from an inability of the model to fully describe the underlying physics [14]. From a metrological perspective, such effects introduce unwanted uncertainties that can decrease the sensitivity of a measurement. However, this uncertainty is entirely eliminated when data analysis can be performed without prescribing a detailed model, as is our goal.

While a key focus of this work is improving measurement sensitivity, we do not directly address issues associated with limits of detection (LOD). Formal definitions of this concept have been established by certain standards organizations [15, 16]. However, there is a lack of consensus as to which definition is most suitable for qPCR measurements; compare, e.g. Refs. [16,17,18] and the references therein. Moreover, LOD often depends on the specifics of an assay and, for reverse-transcriptase qPCR (RT-qPCR), the RNA extraction-kit; see, for example, the FDA *Emergency Use Authorizations* for SARS-CoV-2 testing [19]. This motivates us to restrict our discussion to those aspects of analysis that hold in general and are not chemistry-specific. Thus, we only consider, for example, the extent to which one can lower the fluorescence threshold used to detect positive signals. Nonetheless, we anticipate that such improvements will have positive impacts on LODs.

Finally, we note that our analysis cannot undo systematic errors due to improper sample collection and preparation, contamination, or non-optimal assay conditions. In some cases, a failure to achieve data collapse under the constrained optimization can help identify systematic assay issues, which adds an automated quality control to the measurement. However, false positives due to contamination may exhibit the same curve morphology and therefore not be detected by our analysis. Moreover, the affine transformations cannot amplify signals or otherwise improve signal quality when the target concentration is far below limits of detection. In such cases, refining experimental protocols and amplification kits may be the only routes to improving the quality of qPCR measurements.

The manuscript is organized as follows. “Universal Behavior of PCR Amplification Curves” derives the new, universal property of qPCR measurements (“Theoretical Derivation”) used in our analysis and validates it against experimental data (“Validation of Data Collapse”). “Decreasing Detection Thresholds” illustrates how this result and our analysis can be used to lower the fluorescence thresholds for qPCR. “Transferability of the Master Curve” explores the idea that a master curve is transferable reference data. “Application to SARS-CoV-2 RNA” applies our analysis to SARS-CoV-2 RNA constructs as proof-of-concept for improving detection of emerging diseases. “Discussion” discusses our work in the greater context of qPCR and points to open directions.

## Universal Behavior of PCR Amplification Curves

Our data analysis leverages a universal property of qPCR, which states that under very general conditions, all amplification curves are the same up to an affine transformation, i.e. a multiplicative factor and horizontal shift. While this observation bears similarities to work by Pfaffl [20], we emphasize that our result is more general and develops mathematical properties of qPCR measurements that have not yet been studied. We begin with a derivation and experimental validation of this result.

### Theoretical Derivation

The underlying conceptual framework is based on a generic formulation of a PCR measurement. We denote the number of DNA strands at the *n*th amplification cycle by *d*_{n}. In a noiseless environment, *d*_{n} is taken to be proportional to the fluorescence signal measured by the instrument. The outcome of a complete measurement is a vector of the form \(\mathbf d = (d_{1}, d_{2}, \ldots, d_{N})\), where *N* is the maximum cycle number. We also assume that \(\mathbf d = \mathbf d(x, y)\) is a function of the initial template copy number *x* and of the numbers of all other reagents, which we denote generically by *y*.

Within this framework, we require three assumptions.

First, we require that *y* be a scalar. Physically, this amounts to the assumption that there is a single experimental variable (besides initial DNA copies) that controls the progression of the reactions. In practice, this condition is satisfied if either (I) there is a single limiting reagent (e.g. primers), or (II) multiple limiting reagents have the same relative concentrations in all samples we wish to analyze. In the latter case, knowing the concentration of any limiting reactant determines them all, so that they are all specified by a single number. It was recently demonstrated that condition (I) may hold for a large class of commercial PCR kits in which the number of primers is the limiting reagent [21, 22],^{Footnote 2} whereas condition (II) is true for any PCR protocol that uses a master-mix.^{Footnote 3}

Second, we require that there be a *p* > 1 such that, to good approximation (e.g. better than 1 in 10 000), a *p*-fold increase in the initial template number shifts the PCR curve to the left by one cycle.^{Footnote 4} Within our analytical framework, this amounts to

\[ d_{n-q}(p^{q}, y) = d_{n}(1, y) + \mathcal O(p^{q}/y), \tag{1} \]

where the notation \(\mathcal O(p^{q}/y)\) indicates that *d*_{n−q}(*p*^{q},*y*) and *d*_{n}(1,*y*) are the same up to an error of the same order of magnitude as *p*^{q}/*y*. This error arises because \(\mathcal O(p^{q})\) primers will be consumed in the first *q* reactions. Thus, a system starting with one template copy will have \(\mathcal O(p^{q}/y)\) fewer relative primers by the time it reaches the same template number as a system initialized with *p*^{q} such copies. Because PCR is always run in a regime where *y* ≫ *x*, such errors should be negligible. We emphasize that Eq. 1 only requires the amplification efficiency to remain constant over the first \(q_{\max \limits }\) cycles, where \(q_{\max \limits }\) corresponds to the maximum initial template copy number expected in a given experimental system. For later convenience, we note that Eq. 1 implies the following relation, obtained by taking *x* = *p*^{q} and replacing *n* by *n* + *q*:

\[ d_{n}(x, y) = d_{n+\log_{p} x}(1, y), \tag{2} \]

where we have omitted the error term.

Our third and most important assumption is that signal generation be a *linear process*. By this, we mean two things. (i) Each sample (e.g. in a well-plate) can be thought of as composed of multiple sub-samples, defined so that the relative fractions of initial DNA and reagents are in proportion to their volumes. (ii) The total signal generated by a sample equals the sum of the signals these sub-samples would generate if they had been separated into different wells.^{Footnote 5} Because both the initial template copies and the reagents are partitioned into these sub-samples, the linearity assumption amounts to the mathematical statement that

\[ \mathbf d(\kappa x, \kappa y) = \kappa\, \mathbf d(x, y) \tag{3} \]

for any *κ* > 0. Physically we interpret Eq. 3 as the requirement that the processes driving replication only depend on *intensive variables*, i.e. ones that are independent of the absolute magnitude of the system size [23]. See, e.g. Ref. [21] for more discussion on related concepts.

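We now arrive at our key result. Consider two systems with initial values (*x*,*y*) and (*χ*,*γ*). Applying Eq. 3 with *κ* = *γ*/*y* and then Eq. 2 twice, it is straightforward to show that

\[
\begin{aligned}
d_{n}(\chi, \gamma) &= \frac{\gamma}{y}\, d_{n}\!\left(\frac{\chi y}{\gamma}, y\right) \\
&= \frac{\gamma}{y}\, d_{n+\log_{p}[\chi y/\gamma]}(1, y) \\
&= \frac{\gamma}{y}\, d_{n+\log_{p}[\chi y/(\gamma x)]}(x, y) = a\, d_{n+b}(x, y).
\end{aligned}
\tag{4}
\]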

Critically, Eq. 4 implies that, under the assumptions listed above, all PCR signals are the same up to a multiplicative factor *a* = *γ*/*y* and a horizontal shift \(b=\log _{p}[\chi y / (\gamma x)]\). This result is useful because the universal property holds without any knowledge of the actual shape of the amplification curve and requires only a few generic assumptions. It can therefore be used to support robust analysis of data. *Note that in the last line of Eq.* 4*, the amplification efficiency does not appear, highlighting that it does not play a role in our derivation.*
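The following Python sketch gives a quick numerical sanity check of Eq. 4. It is not part of the original analysis. It builds a hypothetical toy model, *d*_{n}(*x*,*y*) = *y* g(*n* + log_{p}(*x*/*y*)), which satisfies Eqs. 2 and 3 exactly (no error term) by construction. It then confirms that two wells with different (*x*,*y*) coincide after the affine map with *a* = *γ*/*y* and *b* = log_{p}[*χy*/(*γx*)]. The sigmoid `g`, the choice *p* = 2, and the copy and reagent numbers are illustrative assumptions.

```python
import math

P = 2.0  # hypothetical p: perfect doubling per cycle


def g(t):
    """Illustrative sigmoidal shape; Eq. 4 does not depend on this choice."""
    return 1.0 / (1.0 + math.exp(-t))


def toy_signal(n, x, y, p=P):
    """Toy d_n(x, y) = y * g(n + log_p(x / y)).

    Scaling (x, y) by kappa scales the output by kappa (Eq. 3), and a p-fold
    change in x shifts the curve by one cycle (Eq. 2, with no error term).
    """
    return y * g(n + math.log(x / y, p))


def affine_parameters(x, y, chi, gamma, p=P):
    """Scale a = gamma / y and shift b = log_p[chi * y / (gamma * x)] from Eq. 4."""
    return gamma / y, math.log(chi * y / (gamma * x), p)


x, y = 100.0, 1.0e9        # reference well: template copies, reagent number
chi, gamma = 3.0, 1.1e9    # second well: fewer copies, about 10% more reagent
a, b = affine_parameters(x, y, chi, gamma)
worst = max(
    abs(toy_signal(n, chi, gamma) - a * toy_signal(n + b, x, y)) / toy_signal(n, chi, gamma)
    for n in range(1, 46)
)
print(f"a = {a:.3f}, b = {b:.3f} cycles, worst relative mismatch = {worst:.1e}")
```

The mismatch should sit at floating-point round-off. Note that the shape `g` never enters the computation of *a* and *b*.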

### Validation of Data Collapse

#### Experimental Methods

To validate Eq. 4, we conducted a series of PCR measurements using the Quantifiler Trio (Thermo Fisher) commercial qPCR chemistry.^{Footnote 6} Extraction blanks (EBs) were created by extracting six individual sterile cotton swabs (Puritan) using the Qiagen EZ1 Advanced XL and DNA Investigator kit (Qiagen). 290 μL of G2 buffer and 10 μL of Proteinase K were added to the tube and incubated in a thermal mixer (Eppendorf) at 56 °C for 15 minutes before being loaded onto the purification robot. The “Trace Tip Dance” protocol was run on the EZ1 Advanced XL, with elution of the DNA into 50 μL of TE (Qiagen). After elution, all EBs were pooled into one tube for downstream analysis.

Human DNA Quantitation Standard (Standard Reference Material 2372a) [24] Component A and Component B were each diluted 10-fold. Component A was diluted from its certified concentration by adding 10 μL of DNA to 90 μL of TE^{−4}, pH 8.0 buffer: 10 mmol/L 2-amino-2-(hydroxymethyl)-1,3-propanediol hydrochloride (Tris HCl) and 0.1 mmol/L ethylenediaminetetraacetic acid disodium salt (disodium EDTA) in deionized water adjusted to pH 8.0. Component B was diluted by adding 8.65 μL of DNA to 91.35 μL of TE^{−4}. Starting from the initial 10-fold dilution, additional serial dilutions were performed down to 0.0024 pg, a regime that produces samples with high *C*_{q} values (> 35).

Quantifiler Trio was used for all qPCR reactions. Each reaction consisted of 10 μL qPCR Reaction mix, 8 μL Primer mix, and 2 μL of sample [i.e. DNA, non-template control (NTC), or extraction blank (EB)]. Reactions were set up in a 96-well optical qPCR plate (Phoenix) and sealed with optical adhesive film (VWR). After sealing, the plate was briefly centrifuged to eliminate bubbles in the wells. qPCR was performed on an Applied Biosystems 7500 HID instrument with the following 2-step thermal cycling protocol: 95 °C for 2 min, followed by 40 cycles of 95 °C for 9 sec and 60 °C for 30 sec. Data were collected in all wells during the 30 sec, 60 °C stage of each cycle. Upon completion of every run, data were analyzed in the HID Real-Time qPCR Analysis Software v1.2 (Thermo Fisher) with a fluorescence threshold of 0.2. Raw and multicomponent data were exported to Excel for further analysis.

#### Data Analysis

As an initial preconditioning step, all fluorescence signals were normalized so that the maximum fluorescence of any amplification curve is of order 1. Depending on the amplification chemistry, this was done in one of two ways: (i) by dividing the raw fluorescence signals by the signal of a passive reference dye (e.g. ROX); or, when no passive dye is present, (ii) by dividing all signals by the same (arbitrary) constant. In the latter case, the actual constant used is unimportant, provided the maximum signals are on the numerical scale of unity. EBs and NTCs were normalized in the same way. This preconditioning is important because subsequent analyses introduce dimensionless parameters and linear combinations of signals that are referenced to a dimensionless scale of order 1. It also stabilizes the optimization discussed below, since numerical tolerances are often specified relative to such a scale.
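The two normalization options can be sketched in a few lines of NumPy. `normalize` is a hypothetical helper. The default constant in option (ii), the largest raw value, is our assumption, since the text only requires a common constant that puts the plateaus near unity.

```python
import numpy as np


def normalize(raw, passive=None, scale=None):
    """Rescale fluorescence so that plateau values are of order 1.

    raw:     array (n_wells, n_cycles) of reporter fluorescence, with samples,
             EBs and NTCs stacked together so they share one normalization.
    passive: optional array of the same shape with the passive-reference-dye
             (e.g. ROX) signal; option (i) divides cycle by cycle.
    scale:   optional common constant for option (ii); if omitted, this sketch
             uses the largest raw value (a hypothetical default).
    """
    raw = np.asarray(raw, dtype=float)
    if passive is not None:
        return raw / np.asarray(passive, dtype=float)
    if scale is None:
        scale = raw.max()
    return raw / scale
```

Passing all wells in one call, including EBs and NTCs, ensures they share the same constant, as the text requires.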

Baseline subtraction was performed using an optimization procedure that leverages information obtained from NTCs and/or EBs. The main idea behind this approach is to postulate that the fluorescence signal can be expressed in the form

\[ d_{n} = s_{n} + \beta b_{n} + c + \eta_{n}, \tag{5} \]

where *s*_{n} is the “true,” noiseless signal and *b*_{n} is the average over EB signals (or over NTCs in the absence of EB measurements). *β* and *c* are unknown parameters quantifying the amount of systematic background and constant offset (e.g. due to photodetector dark currents) contributing to the measured signal. *η*_{n} is zero-mean, delta-correlated background noise; that is, the average over realizations of *η* satisfies \(\langle \eta _{n} \eta _{n^{\prime }}\rangle = \sigma ^{2} \delta _{n,n^{\prime }}\), where \(\delta _{n,n^{\prime }}\) is the Kronecker delta and *σ*^{2} is independent of *n*. Next, we minimize an objective function of the form

with respect to *β* and *c*, which determines optimal values of these parameters. In Eq. 6, *𝜖* is a regularization parameter satisfying 0 < *𝜖* ≪ 1 (we always set *𝜖* = 10^{−3}), and Δ*N* = *N*_{h} − *N*_{0}, where *N*_{0} and *N*_{h} are the lower and upper cycles between which *s*_{n} is expected to be zero. The parameter *N*_{0} is set to 5 to accommodate transient effects associated with the first few cycles. *N*_{h} is determined iteratively in three steps: (I) setting *N*_{h} = 15 and minimizing Eq. 6; (II) defining *C*_{q} as the (integer) cycle closest to a threshold of 0.1; and (III) defining *N*_{h} as the nearest integer to *C*_{q} − *z*. Here we set *z* = 6. Assuming perfect amplification, six doublings reduce the signal at the threshold by a factor of 2^{6} = 64, which places *N*_{h} within the cycles for which *d*_{n} is dominated by the noise *η*_{n}. In general, this value and the corresponding threshold can be changed as needed. The precise details are unimportant provided *z* is large enough to ensure that the above criterion is satisfied. Note that optimizing Eq. 6 amounts to calculating the amount of EB signal that, when subtracted from *d*_{n}, minimizes the mean square and variance of *s*_{n} in the region where it is expected to be dominated by the noise *η*. See also Ref. [25] for a thorough treatment of unconstrained optimization.
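Eq. 6 itself is not reproduced in this text, so the sketch below uses a stand-in: ordinary least squares for *β* and *c* over cycles *N*_{0} to *N*_{h}, with the *𝜖* regularization omitted. Because *c* is free, the least-squares residual has zero mean. Minimizing its mean square therefore also minimizes its variance, consistent with the verbal description above. The iteration follows steps (I) to (III) and then refits once with the updated *N*_{h}. The function names and the lower guard on *N*_{h} are our assumptions.

```python
import numpy as np


def fit_background(d, b, n0, nh):
    """Least-squares (beta, c) for d_n ~ beta * b_n + c over cycles n0..nh (1-based, inclusive).

    Stand-in for Eq. 6 with the epsilon term omitted: because c is free, the residual
    s_n = d_n - beta * b_n - c has zero mean at the optimum, so its mean square and
    its variance are minimized together.
    """
    idx = np.arange(n0 - 1, nh)                       # 1-based cycles -> 0-based indices
    A = np.column_stack([b[idx], np.ones(idx.size)])  # design matrix [b_n, 1]
    (beta, c), *_ = np.linalg.lstsq(A, d[idx], rcond=None)
    return float(beta), float(c)


def subtract_background(d, b, n0=5, nh_init=15, z=6, threshold=0.1):
    """Steps (I)-(III) from the text, followed by one refit with the updated N_h."""
    d = np.asarray(d, dtype=float)
    b = np.asarray(b, dtype=float)
    beta, c = fit_background(d, b, n0, nh_init)        # (I) provisional fit with N_h = 15
    s = d - beta * b - c
    cq = int(np.argmin(np.abs(s - threshold))) + 1     # (II) integer cycle closest to threshold
    nh = max(cq - z, n0 + 2)                           # (III); the lower guard is an assumption
    beta, c = fit_background(d, b, n0, nh)
    return d - beta * b - c, beta, c, nh
```

On a synthetic curve whose signal crosses 0.1 near cycle 23, the procedure moves *N*_{h} from 15 to 17 and recovers *β* and *c* up to the small leakage of signal into the fitting window.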

Next, we fixed a reference signal *δ* by fitting a cubic spline to the amplification curve with the smallest *C*_{q} value, as determined by thresholding the fluorescence at 0.1. Interpolation introduces some small uncertainty at non-integer cycle numbers, but we find that such effects are negligible in downstream computations. Moreover, comparing amplification curves whose initial template numbers do not differ by an integer power of *p* requires estimating *δ* at non-integer cycles, which necessitates some form of interpolation. Cubic splines are an attractive choice because they minimize curvature and exhibit non-oscillatory behavior for the data sets under consideration [26].
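The following is a minimal NumPy sketch of cubic-spline interpolation of the reference curve at non-integer cycles. The text does not specify the spline's end conditions. This sketch assumes natural end conditions (zero second derivative) and unit spacing between cycles. `natural_cubic_spline` is a hypothetical helper; `scipy.interpolate.CubicSpline` with `bc_type='natural'` would be a library alternative.

```python
import numpy as np


def natural_cubic_spline(values):
    """Return delta(t): a natural cubic spline through (n, values[n-1]) for cycles n = 1..N."""
    y = np.asarray(values, dtype=float)
    N = y.size
    M = np.zeros(N)  # second derivatives at the knots; natural ends: M[0] = M[-1] = 0
    if N > 2:
        # Tridiagonal system M[i-1] + 4 M[i] + M[i+1] = 6 (y[i+1] - 2 y[i] + y[i-1]) for unit spacing
        A = np.diag(np.full(N - 2, 4.0)) + np.diag(np.ones(N - 3), 1) + np.diag(np.ones(N - 3), -1)
        M[1:-1] = np.linalg.solve(A, 6.0 * (y[2:] - 2.0 * y[1:-1] + y[:-2]))

    def delta(t):
        t = np.asarray(t, dtype=float)
        i = np.clip(np.floor(t).astype(int) - 1, 0, N - 2)  # 0-based index of the left knot
        u = t - (i + 1)                                      # position within [knot i, knot i+1]
        w = 1.0 - u
        return ((M[i] * w**3 + M[i + 1] * u**3) / 6.0
                + (y[i] - M[i] / 6.0) * w + (y[i + 1] - M[i + 1] / 6.0) * u)

    return delta
```

The returned `delta` reproduces the data exactly at integer cycles and can be evaluated at any shifted, non-integer cycle, which is what the collapse step needs.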

To test for data collapse, we formulated an objective function of the form

where *a*, *c*, *β*, and *k* are unknown parameters, and \(N_{\min \limits }\) and \(N_{\max \limits }\) are indices characterizing the cycles for which *d*_{n} is above the noise floor. Minimizing \(\mathcal L\) with respect to its arguments yields the transformation that best maps *d*_{n} onto the reference curve *δ*(*n*). The background signal *b*_{n} is included in this optimization so that any over- or under-correction of the baseline relative to *δ* is undone.

The quantity \(N_{\min \limits }\) is taken to be the last cycle for which *d*_{n} < *μ* + 3*σ*, where

are estimates of the mean and variance associated with the noise *η*. In principle, *μ* should be zero, but in practice, background subtraction does not exactly enforce this criterion; thus, we choose to incorporate *μ* into our analysis. If \(N_{\min \limits }\) was less than or equal to 30, we set \(N_{\max \limits }=37\); otherwise we set \(N_{\max \limits }=40\). While it is generally possible to set \(N_{\max \limits }=40\) for all data sets, on rare occasions we find that an amplification curve with a lower nominal *C*_{q} may saturate faster than *δ*. In this case, it is necessary to decrease \(N_{\max \limits }\) so that the interval \([N_{\min \limits },N_{\max \limits }]\) falls entirely within the domain of cycles spanned by *δ*. In practice, we find that, except in the cases noted above, the solutions do not meaningfully change if we impose this restriction for all curves with nominal *C*_{q} values less than or equal to 30.
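Eq. 8 is not reproduced here. Assuming *μ* and *σ* are the sample mean and standard deviation of the background-subtracted signal over the noise window [*N*_{0}, *N*_{h}], the choice of fitting window described above can be sketched as follows. The helper names are hypothetical.

```python
import numpy as np


def noise_statistics(s, n0, nh):
    """Sample mean and standard deviation of the background-subtracted signal over
    cycles n0..nh (1-based, inclusive): a hypothetical stand-in for Eq. 8."""
    window = np.asarray(s, dtype=float)[n0 - 1:nh]
    return float(window.mean()), float(window.std(ddof=1))


def fit_window(s, mu, sigma, n_max_early=37, n_max_late=40, n_min_cut=30):
    """N_min = last cycle with s_n < mu + 3*sigma; N_max = 37 if N_min <= 30, else 40."""
    below = np.nonzero(np.asarray(s) < mu + 3.0 * sigma)[0]
    if below.size == 0:
        raise ValueError("no cycle lies within the noise floor")
    n_min = int(below[-1]) + 1              # convert 0-based index to 1-based cycle
    n_max = n_max_early if n_min <= n_min_cut else n_max_late
    return n_min, n_max
```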

The objective Eq. 7 is minimized subject to constraints that ensure the fidelity of the data collapse. In particular, we require that

Inequalities Eq. 9a–Eq. 9c require that the constant offset, noise correction, and linear combination thereof be within the 99% confidence interval of the noise floor plus any potential offset in the mean (which should be close to zero). Inequality Eq. 9d prevents the multiplicative scale factor from adopting extreme values that would make noise appear to be true exponential growth; unless otherwise stated, we take \(a_{\min \limits }=0.7\) and \(a_{\max \limits }=1.3\). Note that the range of admissible values of *a* corresponds to the maximum variability in the absolute number of reagents per well, which is partially controlled by pipetting errors. Inequality Eq. 9e restricts the horizontal offsets to a physically reasonable range. Inequality Eq. 9f requires that the last transformed data point, \(a d_{N_{\max \limits }}\), be above some threshold *τ*. Finally, inequality Eq. 9g requires that the absolute error between the reference and scaled curves be less than or equal to *ς*. In an idealized measurement, we would set *ς* = 3*σ*. In multichannel systems, however, imperfections in demultiplexing and/or inherent photodetector noise can introduce additional uncertainties that limit resolution. In the first example below, we take *ς* = 0.03, which corresponds to roughly 1% of the full scale of the measurement. Although we state Eq. 9g in terms of absolute values, it can be restated in a differentiable form as two separate inequalities. See also Ref. [27] for a related model.
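The exact forms of Eq. 7 and Eqs. 9a–9g are not reproduced here. The sketch below is therefore a simplified, hypothetical stand-in, not the authors' *fmincon* program. It drops the offset *c* and background weight *β* and keeps only the scale *a* and shift *k*. It compares *a s*_{n} with *δ*(*n* − *k*) on [*N*_{min}, *N*_{max}]; this sign convention is assumed. It enforces analogues of Eqs. 9d, 9f, and 9g, with 9e represented by the range of the shift grid. For fixed *k*, these constraints are linear in *a*, so the feasible values of *a* form an interval that can be computed exactly, while *k* is scanned over a grid. The function returns `None` when no grid point is feasible. This mirrors the "infeasible program" outcome that the next section uses to reject NTCs.

```python
import numpy as np


def feasible_scale_interval(seg, ref, varsigma, tau, a_min, a_max):
    """Interval of scale factors a with |a*seg_n - ref_n| <= varsigma for all n (cf. Eq. 9g),
    a*seg[-1] >= tau (cf. Eq. 9f) and a_min <= a <= a_max (cf. Eq. 9d); None if empty.
    Assumes tau > 0."""
    lo, hi = a_min, a_max
    for sn, rn in zip(seg, ref):
        if sn > 0:
            lo, hi = max(lo, (rn - varsigma) / sn), min(hi, (rn + varsigma) / sn)
        elif sn < 0:  # dividing by a negative number flips both inequalities
            lo, hi = max(lo, (rn + varsigma) / sn), min(hi, (rn - varsigma) / sn)
        elif abs(rn) > varsigma:
            return None
    if seg[-1] <= 0:
        return None   # a positive tau cannot be reached
    lo = max(lo, tau / seg[-1])
    return (lo, hi) if lo <= hi else None


def collapse(s, delta, n_min, n_max, k_grid, varsigma, tau, a_min=0.7, a_max=1.3):
    """Scan shifts k; for each, take the least-squares scale clamped to its feasible
    interval. Returns (squared error, a, k) for the best feasible pair, or None."""
    n = np.arange(n_min, n_max + 1)
    seg = np.asarray(s, dtype=float)[n_min - 1:n_max]   # cycles n_min..n_max (1-based)
    best = None
    for k in k_grid:
        ref = delta(n - k)                              # reference at shifted cycles
        interval = feasible_scale_interval(seg, ref, varsigma, tau, a_min, a_max)
        if interval is None:
            continue
        a_ls = float(seg @ ref / (seg @ seg))           # unconstrained least-squares scale
        a = min(max(a_ls, interval[0]), interval[1])    # error is convex in a, so clamp
        err = float(np.sum((a * seg - ref) ** 2))
        if best is None or err < best[0]:
            best = (err, a, float(k))
    return best
```

A true amplification curve, for example `delta(cycles - 7.3) / 1.1`, collapses with *k* = 7.3 and *a* = 1.1. An alternating NTC-like trace of amplitude 0.001 cannot satisfy the *τ* = 0.05 constraint for any admissible *a* and returns `None`.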

Having determined the optimal transformation parameters *a*_{⋆}, *c*_{⋆}, *k*_{⋆}, and *β*_{⋆}, the transformed signal is defined as

where *x* + *k*_{⋆} is required to be an integer in the interval \([N_{\min \limits },N_{\max \limits }]\). Figures 1 and 2 demonstrate the remarkable validity of Eq. 10 for a collection of datasets using a threshold *τ* = 0.05. In this and subsequent computations, we used the Matlab general nonlinear programming solver *fmincon* [28]. Note that the agreement is excellent down to fluorescence values of roughly 0.01, which is more than a decade below the threshold values typically used to compute *C*_{q} for this amplification chemistry. In the next section, we consider how affine transformations can be used to increase measurement sensitivity.

## Decreasing Detection Thresholds

A key strength of the constrained optimization problem specified in expressions Eq. 7–Eq. 9g is that it can determine when a data set gives rise to a consistent set of constraints [29]. In particular, inequality Eq. 9g requires that the transformed signal be within a noise threshold of the reference, so the data must track the exponential growth of the reference. In more mathematical language, a non-empty *feasible region* of the constraints provides a necessary and sufficient condition for determining which data sets have behavior that can be considered statistically meaningful. This condition can in turn be used to lower the fluorescence thresholds.

To demonstrate this, we consider the following empirical test. Taking the data sets used to generate Fig. 1, we remove all data points above a normalized fluorescence value of 0.05, which is a factor of two to four below the threshold values typically used for this system. We then repeat the affine transformation according to expressions Eq. 7–Eq. 9g, applied only to the last six data points. We also set *ς* = 3*σ* in inequality Eq. 9g and *τ* = *μ* + 6*σ* in inequality Eq. 9f. In this example, the value of *ς* is determined entirely by the noise floor, since we do not expect spectral overlap to be significant at such low fluorescence values.

Figure 3 shows the truncated data used in this test, while Fig. 4 shows the results of the affine transformation for the 43 datasets. Notably, the collapse is successfully achieved using the tightened uncertainty threshold given in terms of the noise-floor. The inset shows that the errors relative to the reference curve are less than 0.01 on the normalized fluorescence scale.

To demonstrate that optimization of the transformation parameters does not generate false positives, we performed the analysis described above on 17 non-template control (NTC) datasets that were baseline-corrected according to the same procedure used on the amplification curves. For background subtraction, we used *N*_{h} = 30. As before, the last six datapoints of the background-corrected NTCs were used for optimization of Eq. 7. Figure 5 illustrates the outcome of this exercise. When *τ* is too small, solving the optimization problem maps the NTCs into the background of the reference curve, illustrating the critical role of inequality Eq. 9f. When *τ* is large, the optimization programs are all infeasible; i.e. there is no transformation satisfying the constraints.

Figure 6 repeats this exercise for the 43 amplification curves and 17 NTCs for values of *τ* − *μ* ranging from 0 to 10*σ*. Unsurprisingly, for small values of *τ* − *μ*, a large fraction of the NTCs can be transformed into the noise of the reference curve and thereby yield feasible optimization programs. However, the number of such false positives drops precipitously between *τ* = *μ* + *σ* and *τ* = *μ* + 4*σ*. More work is needed to assess the universality of this result. Nonetheless, Fig. 6 suggests that there may be a window between *τ* = *μ* + 5*σ* and *τ* = *μ* + 8*σ* for which the affine transformation yields neither false positives nor false negatives.

## Transferability of the Master Curve

The generality of the assumptions underpinning Eq. 4 suggests that a master curve may be useful for characterizing qPCR data irrespective of when or where the master curve and the data were collected. Such universality would be a powerful property, because it would allow our analysis to be transferred between labs without the need to generate independent master curves. Instead, a master curve could be developed once, when an assay is created, and used as a type of standard reference data. Such approaches could further harmonize analyses across labs and thereby reduce uncertainty in qPCR testing.

A more in-depth study of such issues is beyond the scope of this manuscript, but we performed a preliminary analysis to test whether this universality is reasonable. In particular, we analyzed 223 datasets collected over roughly two and a half years, from November 2017 to April 2020. A single low-*C*_{q} curve measured in early 2018 was chosen at random from this set as a master curve. We then performed data collapse of the remaining 222 curves according to the optimization program given by expressions Eq. 7–Eq. 9g. In all cases, the DNA and amplification chemistry were the same as in our previous examples. Moreover, the majority (176) of these amplification curves were generated before this work was conceived.

Figure 7 illustrates the results of this exercise. Remarkably, we find that a single master curve can be used for accurate data collapse over the entire time frame of the measurements considered. Moreover, the fact that a reference from 2018 can characterize data from both 2017 and 2020 indicates backward and forward compatibility of our analysis. As before, the collapse is accurate to within about 1% of the full scale for nearly all of the measurements (here we set *ς* = 0.04). However, it was necessary to set the minimum and maximum values of *a* to 0.1 and 3 to account for large variations in peak fluorescence. Because these data were collected before we had developed our background subtraction algorithm, there were also fewer high-quality NTC measurements for use in Eq. 6. (No EB data from that time were available.) We therefore speculate that the variation in these data can likely be reduced in future studies through more careful characterization of control experiments.

## Application to SARS-CoV-2 RNA

### SARS-CoV-2 RNA Constructs

To test our analysis on RT-qPCR measurements of an emerging disease, we applied it to measurements of the N1 and N2 fragments of SARS-CoV-2 RNA. The underlying samples were derived from an in-house, in-vitro transcribed RNA fragment containing approximately 4000 bases of SARS-CoV-2 RNA sequence. This non-infectious fragment contains the complete N gene and E gene, as well as the intervening sequence. This material is intended to help researchers and laboratory professionals develop and benchmark assays, so discussion of its production and characterization is reserved for another manuscript.

Neat samples of this material were diluted 1:100, 1:500, 1:1000, and 1:1500 in RNA Storage Solution (Thermo Fisher) with 5 ng/μL Jurkat RNA (Thermo Fisher) prior to qPCR. qPCR measurements were performed using the 2019-nCoV CDC Assays (IDT), and the N1 and N2 targets on the N gene were measured [30]. Each reaction consisted of 8.5 μL water, 5 μL TaqPath RT-qPCR Master Mix, 1.5 μL of the IDT primer and probe mix for either N1 or N2, and 5 μL of sample. Reactions were set up in a 96-well optical qPCR plate (Phoenix) and sealed with optical adhesive film (VWR). After sealing, the plate was briefly centrifuged to eliminate bubbles in the wells. qPCR was performed on an Applied Biosystems 7500 HID instrument with the following thermal cycling protocol: 25 °C for 2 min, 50 °C for 15 min, and 95 °C for 2 min, followed by 45 cycles of 95 °C for 3 sec and 55 °C for 30 sec. Data were collected in all wells during the 30 sec, 55 °C stage of each cycle. Upon completion of every run, data were exported to Excel for further analysis in Matlab.

### Analysis of RT-qPCR Measurements

Data analysis proceeded using NTCs in lieu of EBs for the background signal *b*_{n}. Figure 8 shows the results of this analysis applied to the N1 fragment of a SARS-CoV-2 RNA construct. As before, the level of agreement between curves after data collapse confirms that these signals are virtually identical up to an affine transformation. We find analogous results for the N2 assay; see Fig. 9.

Figure 9 also illustrates an interesting aspect of our analysis. In the bottom plot, we attempt to transform the N2 amplification curves onto the N1 master curve. These transformations are not feasible, because the N1 master curve differs in shape from its N2 counterparts. This demonstrates that, while the master curve may be transferable across labs, it is still specific to the particular amplification chemistry and target under consideration.

## Discussion

### Relationship to Thresholding

While thresholding is the most common method for identifying exponential growth, there is no clear best practice on using this technique. In fact, the accepted guidance is sometimes to ignore a fixed rule and adjust the threshold by eye [19]. That being said, an often quoted (although in our experience, rarely followed) rule is to set the threshold ten standard deviations *σ* of the background above the noise floor. For reference, a 10*σ* event has a probability of roughly 1 × 10^{− 23} of being random if the underlying distribution is Gaussian, which should be a reasonable model of noise in the photodetectors of a PCR instrument.

While this probability appears absurdly small, it is worth considering why the 10*σ* criterion is reasonable. If we assume that the first fluorescence value having reasonable probability (e.g. 95%) of being non-random occurs at 2*σ* above the noise-floor, then the next three data points should occur at 4*σ*, 8*σ*, and 16*σ*, assuming doubling per cycle. Thus, the 10*σ* criterion practically amounts to the requirement that at least four data-points have a confidence of 95% or greater of being non-random. That being said, *thresholding neither requires that more than one point be statistically meaningful nor directly checks for exponential growth.*
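The arithmetic behind this discussion can be checked with a few lines of Python using the Gaussian tail function `math.erfc`. The one-sided probability of a 10*σ* excursion is about 7.6 × 10^{−24}, and the two-sided probability is about 1.5 × 10^{−23}. A signal that starts at 2*σ* and doubles each cycle (*p* = 2, the assumption used in the text) first reaches 10*σ* on its fourth point. The helper names are illustrative.

```python
import math


def gaussian_tail(k, two_sided=False):
    """Probability that zero-mean Gaussian noise exceeds k standard deviations."""
    one_sided = 0.5 * math.erfc(k / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


def levels_until_threshold(start_sigma=2.0, p=2.0, threshold_sigma=10.0):
    """Signal levels (in units of sigma) growing p-fold per cycle from start_sigma,
    up to and including the first level at or above threshold_sigma."""
    if p <= 1.0:
        raise ValueError("p must exceed 1 for the signal to grow")
    levels = [start_sigma]
    while levels[-1] < threshold_sigma:
        levels.append(levels[-1] * p)
    return levels


print(f"one-sided 10-sigma tail: {gaussian_tail(10.0):.1e}")    # 7.6e-24
levels = levels_until_threshold()
print(levels)                                                    # [2.0, 4.0, 8.0, 16.0]
print([1.0 - gaussian_tail(k) for k in levels])                  # one-sided confidence per point
```

With a lower efficiency, for example `p=1.8`, the sequence 2, 3.6, 6.48, 11.66 still needs four points to cross 10*σ*.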

While it could be argued that operators will detect signals that cross the threshold without exhibiting exponential growth, this becomes impossible with automated testing routines and/or without uniform training. Moreover, systematic effects associated with improper background subtraction can artificially raise the baseline at late cycles. Such effects can be difficult to distinguish from low-efficiency amplification, and they negate the usefulness of detection criteria based only on 10*σ* thresholds.

A constrained optimization approach as formulated in Eq. 6–Eq. 9g overcomes many of these obstacles by directly testing for exponential growth. Provided the master curve is of suitable quality, *the signal must increase p-fold every cycle to within noise.* As a result, systematic errors (e.g. due to improper baseline subtraction) can be detected on the fly. For values of \(\tau \gtrsim \mu + 8 \sigma \), this necessarily strengthens any conclusions inferred from the analysis, because multiple data points are required to lie above the 3*σ* (i.e. greater than 99%) confidence envelope around the baseline. For concreteness, setting *τ* = *μ* + 10*σ* and *ς* = 3*σ* entails that the first data point below the threshold will be at least 2*σ* and at most 8*σ* (i.e. 5*σ* ± 3*σ*) away from baseline essentially 100% of the time. There is less than a 2.5% chance that such a point could be attributed to baseline noise. When this point is considered together with the point to its right, which is above the threshold, the probability that the signal is noise drops to virtually zero.
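The window quoted for *τ* = *μ* + 10*σ* and *ς* = 3*σ* follows from simple arithmetic. The sketch below assumes ideal *p*-fold growth, so that the point preceding one at the threshold sits near *τ*/*p* above baseline, with deviations bounded by *ς*. Both helper names are hypothetical. It returns the window in units of *σ*, together with the one-sided Gaussian probability that noise alone reaches the window's lower edge.

```python
import math


def below_threshold_window(tau_sigma: float, varsigma_sigma: float, p: float = 2.0):
    """Range (in units of sigma above baseline) of the first point below threshold,
    assuming it lies near tau / p to within +/- varsigma."""
    center = tau_sigma / p
    return center - varsigma_sigma, center + varsigma_sigma


def gaussian_tail(k: float) -> float:
    """One-sided probability that zero-mean Gaussian noise exceeds k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


low, high = below_threshold_window(tau_sigma=10.0, varsigma_sigma=3.0)
print(low, high)                     # 2.0 8.0
print(f"{gaussian_tail(low):.3f}")   # about 0.023, i.e. below 2.5%
```

Calling `below_threshold_window(6.0, 3.0)` gives the window from 0 to 6*σ* that arises when *τ* = *μ* + 6*σ*. For that window the lower-edge probability is 50%.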

When \(\tau \lesssim \mu + 6\sigma \), the significance of any points below the threshold becomes questionable. For example, if *τ* = *μ* + 6*σ* and *ς* = 3*σ*, the first data point below the threshold can be anywhere from 0 to 6*σ* above the baseline. The corresponding probability that the data point could be due to the background noise *η* is approximately 50%. But when taken with the measurement above the threshold, the probability that both measurements are due to *η* is again virtually zero.
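Multiplying the single-point probabilities shows how the two measurements combine. This rests on an assumption not stated above: that the background noise is Gaussian and independent from cycle to cycle. The helper names are hypothetical.

```python
import math


def gaussian_tail(k: float) -> float:
    """One-sided probability that zero-mean Gaussian noise exceeds k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def joint_noise_probability(levels_sigma) -> float:
    """Probability that independent Gaussian noise reaches every listed level (in sigma)."""
    prob = 1.0
    for k in levels_sigma:
        prob *= gaussian_tail(k)
    return prob


# tau = mu + 6 sigma: the point below threshold may sit at baseline (0 sigma),
# while the point above threshold is at least 6 sigma above baseline
print(f"{joint_noise_probability([0.0, 6.0]):.1e}")
```

The result is below \(10^{-9}\). The lower point alone is uninformative, yet under these assumptions the pair is very unlikely to be noise.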

While this second scenario appears to revert to standard thresholding, the constrained optimization incorporates additional consistency checks beyond standard practice. Specifically, the optimization *requires* that the first point below the threshold be explainable as background noise, which excludes constant or slowly varying signals above the threshold. As shown in Fig. 5, the optimization attempts to raise such signals to the level of the threshold. In doing so, either (i) no point falls below the threshold, in which case inequality Eq. 9g is violated, or (ii) the signal must be raised too far, violating inequalities Eq. 9a–Eq. 9c. This explains the precipitous drop in false positives in Fig. 6. It also highlights the importance of including at least six data points in the optimization. Doing so activates inequality Eq. 9g in the event that inequalities Eq. 9a–Eq. 9c can be satisfied.

Finally, we note that the intermediate regime \(\mu + 6\sigma \le \tau \le \mu + 8\sigma\) represents a compromise. In this regime there are grounds to argue that the first point below the threshold is a meaningful (but noisy) characterization of the DNA number. As before, the consistency checks enforce that data further to the left fall within the noise. While it is beyond the scope of this manuscript to argue for a specific threshold, this intermediate regime may provide reasonable settings for which confidence in the data analysis is high, but not unreasonably so. Moreover, as the inset to Fig. 6 shows, the fluorescence threshold can (for this particular system) be reduced from 0.2 to 0.03 on average, with a spread from 0.01 to 0.05. Remarkably, this corresponds to a decrease by a factor of roughly 7 on average, and anywhere from a factor of 4 to a factor of 20 across the spread, using data analysis alone. If a given setting requires fewer false negatives, there may be grounds to consider levels as low as *τ* = *μ* + 5*σ*.
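The reduction factors follow directly from the thresholds quoted for the inset of Fig. 6. The snippet only performs that division, and the helper name is illustrative.

```python
def reduction_factor(conventional: float, reduced: float) -> float:
    """Factor by which a fluorescence threshold is lowered."""
    return conventional / reduced


conventional = 0.2
for label, reduced in (("mean", 0.03), ("upper end of spread", 0.05), ("lower end of spread", 0.01)):
    print(f"{label}: {reduction_factor(conventional, reduced):.1f}x lower")
```

This gives factors of about 6.7, 4 and 20, respectively.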

It is also important to note that inequalities Eq. 9a–Eq. 9d play a fundamental role insofar as they preclude non-physical affine transformations. In more detail, we interpret *β* and *c* as small systematic errors in the baseline, which should therefore be within a few *σ* of zero. Recalling that *a* is the ratio of limiting reactants [see Eq. 4], we see immediately that this parameter reflects the typical variability (across all sources) of reactant numbers (see Note 7). While variation of 20% to 30% may be expected, it is not reasonable for *a* to change by decades.
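The sketch below gives a minimal version of such a plausibility screen. Its bounds are illustrative assumptions drawn from the ranges discussed in this paragraph: *a* within about 30% of a nominal value of 1 (i.e. sample and master share the same reagent mix), and *β* and *c* within a few *σ* of zero. They are not the paper's Eq. 9a–Eq. 9d. The function `affine_parameters_plausible` and its tolerances `max_rel_scale` and `k_sigma` are hypothetical.

```python
def affine_parameters_plausible(a: float, beta: float, c: float, sigma: float,
                                max_rel_scale: float = 0.3, k_sigma: float = 3.0) -> bool:
    """Hypothetical screen for physically reasonable affine-transformation parameters.

    a       -- scale factor, assumed nominally 1 (same reagent mix as the master curve)
    beta, c -- baseline offsets, expected to be small systematic errors
    sigma   -- standard deviation of the background noise
    """
    # Reagent variability of tens of percent is plausible; changes by decades are not
    scale_ok = abs(a - 1.0) <= max_rel_scale
    # Baseline offsets should stay within a few sigma of zero
    offsets_ok = abs(beta) <= k_sigma * sigma and abs(c) <= k_sigma * sigma
    return scale_ok and offsets_ok


print(affine_parameters_plausible(a=1.15, beta=0.002, c=-0.001, sigma=0.001))  # True
print(affine_parameters_plausible(a=12.0, beta=0.5, c=0.3, sigma=0.001))       # False
```

In an optimization setting, the same bounds would more naturally enter as constraints on the parameters than as a check applied afterwards.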

To illustrate what happens without these constraints, we considered a situation in which systematic background effects can be transformed to yield exponential-like growth. The top plot of Fig. 10 shows a master curve alongside 17 NTC datasets with an added linear component. Eliminating the inequality constraints Eq. 9a–Eq. 9d leads to feasible transformations for 7 of these curves (bottom subplot), but the resulting constant offsets and multiplicative factors are unphysical; e.g. *c* and *β* are \(\mathcal O(1)\). Systematic baseline errors of this size would call into question the stability of the instrument electronics. In all cases, reintroducing the constraints renders the optimization problem infeasible, which illustrates the importance of inequalities Eq. 9a–Eq. 9d.

### Extensions to Quantitation

Conventional approaches to quantifying initial DNA copy numbers require the creation of a calibration curve that relates \(C_q\) values to samples with known initial DNA concentrations. Importantly, this approach can only be expected to interpolate \(C_q\) values within the range dictated by the calibration process. Measurements of \(C_q\) falling outside this range may have added uncertainty. Moreover, it is well known in the community that even within the domain of interpolation, concentration uncertainties are often as high as 20% to 30%.
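For comparison, the conventional calibration (standard-curve) approach can be sketched as follows. It fits an ordinary least-squares line of \(C_q\) against the base-10 logarithm of the known initial copy number. The fit is then inverted to estimate unknowns, with a flag for measurements that fall outside the calibrated range. The function names and the dilution series are hypothetical. The efficiency formula \(E = 10^{-1/\text{slope}} - 1\) is the standard one for such curves.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = intercept + slope * log10(copies)."""
    x = [math.log10(n) for n in copies]
    mx, my = sum(x) / len(x), sum(cq) / len(cq)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, cq)) / sxx
    return my - slope * mx, slope


def estimate_copies(cq_measured, intercept, slope, cq_range):
    """Invert the standard curve; also report whether this is an interpolation."""
    lo, hi = cq_range
    copies = 10 ** ((cq_measured - intercept) / slope)
    return copies, lo <= cq_measured <= hi


# Hypothetical ten-fold dilution series generated with ideal doubling per cycle
copies = [1e6, 1e5, 1e4, 1e3, 1e2]
cq = [40.0 - math.log2(n) for n in copies]
intercept, slope = fit_standard_curve(copies, cq)
efficiency = 10 ** (-1.0 / slope) - 1.0   # 1.0 (i.e. 100%) for ideal doubling
n_hat, interpolated = estimate_copies(33.0, intercept, slope, (min(cq), max(cq)))
print(round(slope, 3), round(efficiency, 3), round(n_hat), interpolated)
```

A \(C_q\) of 38 in this example would be flagged as an extrapolation, because it lies outside the range spanned by the calibration samples.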

Equation 3 and Eq. 4 are therefore powerful results insofar as they may allow for more accurate quantification of initial template copies. Equation 3 quantifies the extent to which changing the reagent concentration alters the amplification curve. In the case that *κ* < 1 (*κ* > 1) the entire curve will be shifted down (up), which, in the case of small changes, can be conflated with a shift to the right (left). Thus, Eq. 3 suggests that a significant portion of the uncertainty in quantitation measurements may be due to variation in the relative concentrations of reagents arising from pipetting errors.
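In the purely exponential phase the ambiguity can be made explicit. Assume, as an idealization introduced here for illustration, that the curve behaves as \(F(n) = F_0\, p^{n}\) at cycle \(n\). A change of scale by a factor \(\kappa\) is then indistinguishable from a horizontal shift by \(\Delta n\) cycles:

```latex
\kappa\, F_0\, p^{\,n} \;=\; F_0\, p^{\,n - \Delta n},
\qquad
\Delta n \;=\; -\log_p \kappa .
```

Thus \(\kappa < 1\) corresponds to \(\Delta n > 0\), a shift to the right. With \(p = 2\), a 20% reduction (\(\kappa = 0.8\)) mimics a delay of \(\log_2 1.25 \approx 0.32\) cycles. The equivalence is exact only in the exponential phase; near the plateau a change of scale is not a pure shift.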

Equation 4 provides a means of reducing this uncertainty insofar as it directly quantifies the effect of reagents through the scale parameter *a*. Moreover, the affine transformation approach does not rely on a calibration curve using multiple samples with known DNA concentrations. In effect, it provides a physics-based model which can be used for extrapolation. Our approach thereby allows one to use a single master curve with a large initial template copy number as a reference to which all other measurements are scaled. As shown in our examples above, the requirement for data collapse provides an additional consistency check that may be able to detect contamination and/or other deleterious processes affecting the data.
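The sketch below shows one way a single reference curve could be used. It relies on assumptions beyond what is stated here: the fitted horizontal offset between a sample and the master curve is expressed as a delay in cycles; both curves share the same per-cycle amplification factor *p*; and the scale parameter *a* has already absorbed reagent effects. The function is hypothetical and is not the procedure of Eq. 4.

```python
def copies_from_shift(master_copies: float, delay_cycles: float, p: float = 2.0) -> float:
    """Hypothetical estimate of initial copies from a cycle delay relative to the master curve.

    A sample whose curve lags the master by delay_cycles is assumed to have started
    with p ** delay_cycles times fewer templates, given identical per-cycle amplification p.
    """
    return master_copies / p ** delay_cycles


print(copies_from_shift(1e6, 10.0))  # 976.5625, i.e. 1e6 / 2**10
```

A negative delay (a curve to the left of the master) yields more copies than the reference, as expected.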

Ultimately, a detailed investigation is needed to establish the validity of Eq. 3 and Eq. 4 as tools for quantitation. As this will require development of uncertainty quantification methods for both conventional approaches and our own, we leave such tasks for future work.

### Limitations and Open Questions

A key requirement of our work is a master amplification curve to be used as a reference for all transformations. The quality of data collapse and subsequent improvements in fluorescence thresholds are therefore tied to the quality of this reference. Master curves that are excessively noisy and/or exhibit systematic deviations from exponential growth at fluorescence values a few *σ* above the baseline may lead to false negatives. Likewise, systematic effects present in late-cycle amplification data but not found in the master curve can lead to infeasible optimization problems. Robust background subtraction is therefore a critical element of our analysis.

Even with robust background subtraction, empirically measured master curves (as we have used here) may exhibit random fluctuations at low fluorescence values that cannot be entirely eliminated. While we find that it is often best to work directly with raw data, there may be circumstances in which it is desirable to smooth a master curve within a few *σ* of the noise floor, especially when physically informed models based on exponential growth can be leveraged.
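The following is a minimal sketch of such smoothing, under stated assumptions. The curve is baseline-subtracted. Points between 5*σ* and 50*σ* are taken to lie in the exponential phase, and a log-linear fit is made over that window. Earlier points within 3*σ* of baseline are then replaced by the fitted exponential. The window limits, the function name and its parameters are hypothetical choices, not values from this work. Noise spikes that happen to land inside the fitting window would bias the fit.

```python
import math


def smooth_master_curve(f, sigma, fit_lo=5.0, fit_hi=50.0, replace_below=3.0):
    """Hypothetical smoothing of a baseline-subtracted master curve near the noise floor.

    Fits log(f) against cycle index over points between fit_lo*sigma and fit_hi*sigma,
    then replaces earlier points lying within replace_below*sigma of baseline by the
    fitted exponential. Returns the smoothed curve and the fitted per-cycle growth factor.
    """
    idx = [i for i, v in enumerate(f) if fit_lo * sigma <= v <= fit_hi * sigma]
    if len(idx) < 2:
        raise ValueError("need at least two points in the fitting window")
    ys = [math.log(f[i]) for i in idx]
    mx, my = sum(idx) / len(idx), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(idx, ys))
             / sum((x - mx) ** 2 for x in idx))
    intercept = my - slope * mx
    smoothed = list(f)
    for i in range(idx[0]):  # only cycles before the fitting window
        if f[i] < replace_below * sigma:
            smoothed[i] = math.exp(intercept + slope * i)
    return smoothed, math.exp(slope)


# Synthetic noise-free curve growing 2-fold per cycle, with one noisy early point
curve = [0.01 * 2 ** n for n in range(20)]
curve[2] = -0.5
smoothed, p_hat = smooth_master_curve(curve, sigma=1.0)
print(round(p_hat, 6), round(smoothed[2], 6))  # 2.0 0.04
```

Points inside and after the fitting window are returned unchanged, so the plateau is left untouched.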

## Data Availability

Data and scripts are available upon reasonable request.

## Notes

- 1.
We use “uncertainty quantification” in a broad sense to mean the set of analyses that increase confidence in data and conclusions drawn from it.

- 2.
For reference, the systems used in this work have 250 *μ*M primer pair solutions. In a 20 *μ*L sample, there are roughly \(3 \times 10^{15}\) primer pairs. Amplification of one template would consume roughly \(10^{12}\) primer pairs over 40 cycles; 1000 initial templates would consume \(10^{15}\) pairs.

- 3.
The arguments that follow rely on conditions (I) and/or (II) to prove that amplification curves are similarity solutions of an underlying (unknown) model. That is, there is no inherent scale to the problem because it can be expressed as dimensionless ratios of concentrations; see, e.g. Ref. [22] and Eq. 4. This conclusion is false, however, if the limiting reactants are shared with internal process controls. Those reactions typically involve amplification with an initial DNA template copy that is constant, which can thereby introduce a fixed scale. Thus, it is essential, for example, that the nucleotides not be a limiting reagent.

- 4.
The parameter *p* corresponds to the amplification efficiency and is often assumed to be 2, although this requirement is unnecessary in our approach.

- 5.
Because the subsystems are all in the same well, the assumption that they operate independently is violated by interactions at their boundaries. However, such effects can be ignored because the ratio of surface area to volume is negligible for systems in the thermodynamic limit.

- 6.
Certain commercial equipment, instruments, software, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

- 7.
Note that in our formulation, variation over absolute number, not concentration, is what matters.
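A short sketch reproduces the primer-budget arithmetic in Note 2. It makes the same idealizations as the note: one primer pair is consumed per new copy, with ideal doubling over 40 cycles. It uses the exact SI value of the Avogadro constant, and the helper names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # per mole (exact SI value)


def primer_pairs_available(conc_molar: float, volume_litres: float) -> float:
    """Number of primer pairs in a given volume at a given molar concentration."""
    return conc_molar * volume_litres * AVOGADRO


def primer_pairs_consumed(initial_templates: float, cycles: int = 40, p: float = 2.0) -> float:
    """Primer pairs used to make new copies, assuming one pair per new copy."""
    return initial_templates * (p ** cycles - 1.0)


available = primer_pairs_available(250e-6, 20e-6)  # 250 uM, 20 uL
print(f"available: {available:.1e}")
for n0 in (1, 1000):
    print(f"{n0} initial template(s): {primer_pairs_consumed(n0):.1e} pairs consumed")
```

With 1000 initial templates, roughly a third of the available primer pairs would be consumed.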

## References

- 1.
Corman VM, Landt O, Kaiser M, Molenkamp R, Meijer A, Chu DKW, Bleicker T, Brünink S, Schneider J, Schmidt ML, Mulders DGJC, Haagmans BL, Van der Veer B, Van den Brink S, Wijsman L, Goderski G, Romette J-L, Ellis J, Zambon M, Peiris M, Goossens H, Reusken C, Koopmans MPG, Drosten C. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Euro Surveill 2020;25(3):2000045. https://doi.org/10.2807/1560-7917.ES.2020.25.3.2000045.

- 2.
Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 2020:200642. https://doi.org/10.1148/radiol.2020200642.

- 3.
Liu Y, Yan L-M, Wan L, Xiang T-X, Le A, Liu J-M, Peiris M, Poon LLM, Zhang W. Viral dynamics in mild and severe cases of COVID-19. Lancet Infect Dis 2020. https://doi.org/10.1016/S1473-3099(20)30232-2.

- 4.
Duewer DL, Kline MC, Romsos EL. Real-time cdPCR opens a window into events occurring in the first few PCR amplification cycles. Anal Bioanal Chem 2015;407(30):9061–9069. https://doi.org/10.1007/s00216-015-9073-8.

- 5.
Chen P, Huang X. Comparison of analytic methods for quantitative real-time polymerase chain reaction data. J Comput Biol 2015;22(11):988–996. https://doi.org/10.1089/cmb.2015.0023.

- 6.
Bar T, Ståhlberg A, Muszta A, Kubista M. Kinetic outlier detection (KOD) in real-time PCR. Nucleic Acids Res 2003;31(17):e105.

- 7.
Ruijter JM, Ramakers C, Hoogaars WMH, Karlen Y, Bakker O, Van den Hoff MJB, Moorman AFM. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res 2009;37(6):e45.

- 8.
Rebrikov DV, Trofimov DY. Real-time PCR: a review of approaches to data analysis. Appl Biochem Microbiol 2006;42(5):455–463.

- 9.
Rutledge RG. Sigmoidal curve-fitting redefines quantitative real-time PCR with the prospective of developing automated high-throughput applications. Nucleic Acids Res 2004;32(22):e178. https://doi.org/10.1093/nar/gnh177.

- 10.
Rutledge RG, Stewart D. A kinetic-based sigmoidal model for the polymerase chain reaction and its application to high-capacity absolute quantitative real-time PCR. BMC Biotechnol 2008;8(1):47. https://doi.org/10.1186/1472-6750-8-47.

- 11.
Lievens A, Van Aelst S, Van den Bulcke M, Goetghebeur E. Enhanced analysis of real-time PCR data by using a variable efficiency model: FPK-PCR. Nucleic Acids Res 2012;40(2):e10. https://doi.org/10.1093/nar/gkr775.

- 12.
Ruijter JM, Pfaffl MW, Zhao S, Spiess AN, Boggy G, Blom J, Rutledge RG, Sisti D, Lievens A, De Preter K, Derveaux S, Hellemans J, Vandesompele J. Evaluation of qPCR curve analysis methods for reliable biomarker discovery: bias, resolution, precision, and implications. Methods 2013;59(1):32–46. https://doi.org/10.1016/j.ymeth.2012.08.011.

- 13.
Spiess A-N, Feig C, Ritz C. Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry. BMC Bioinformatics 2008;9(1):221. https://doi.org/10.1186/1471-2105-9-221.

- 14.
Smith RC. Uncertainty quantification: theory, implementation, and applications. Computational Science and Engineering. SIAM; 2013.

- 15.
ASTM E2677-20. Standard test method for estimating limits of detection in trace detectors for explosives and drugs of interest. ASTM International; 2020.

- 16.
Tholen DW, Linnet K, Kondratovich MV, Armbruster DA, Garrett P, Jones RL, Kroll MH, Lequin RM, Pankratz T, Scassellati GA, Schimmel H, Tsai J. Protocols for determination of limits of detection and limits of quantitation; approved guidelines; 2004.

- 17.
Forootan A, Sjöback R, Björkman J, Sjögreen B, Linz L, Kubista M. Methods to determine limit of detection and limit of quantification in quantitative real-time PCR (qPCR). Biomol Detect Quantif 2017;12:1–6.

- 18.
Fonollosa J, Vergara A, Huerta R, Marco S. Estimation of the limit of detection using information theory measures. Anal Chim Acta 2014;810:1–9. https://doi.org/10.1016/j.aca.2013.10.030.

- 19.
Emergency use authorizations. U.S. Food and Drug Administration; 2020. https://www.fda.gov/medical-devices/emergency-situations-medical-devices/emergency-use-authorizations#covid19ivd.

- 20.
Pfaffl MW. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001;29(9):e45.

- 21.
Jansson L, Hedman J. Challenging the proposed causes of the PCR plateau phase. Biomol Detect Quantif 2019;17:100082. https://doi.org/10.1016/j.bdq.2019.100082.

- 22.
Barenblatt GI. Scaling, self-similarity, and intermediate asymptotics: dimensional analysis and intermediate asymptotics. Cambridge Texts in Applied Mathematics. Cambridge University Press; 1996.

- 23.
Pathria RK, Beale PD. Statistical mechanics. Elsevier Science; 1996.

- 24.
Romsos EL, Kline MC, Duewer DL, Toman B, Farkas N. Certification of Standard Reference Material 2372a human DNA quantitation standard. Natl Inst Stand Technol Spec Publ 260-189; 2018.

- 25.
Dennis JE, Schnabel RB. Numerical methods for unconstrained optimization and nonlinear equations. Society for Industrial and Applied Mathematics; 1996.

- 26.
Stoer J, Bulirsch R. Introduction to numerical analysis. New York: Springer; 2002.

- 27.
Patrone PN, Kearsley AJ, Majikes JM, Liddle JA. Analysis and uncertainty quantification of DNA fluorescence melt data: applications of affine transformations. Anal Biochem 2020;607:113773.

- 28.
MATLAB Optimization Toolbox. Natick, MA: The MathWorks; 2018.

- 29.
Nocedal J, Wright S. Numerical optimization. Springer Science & Business Media; 2006.

- 30.
CDC 2019-novel coronavirus (2019-nCoV) real-time RT-PCR diagnostic panel. 2020. https://www.cdc.gov/coronavirus/2019-ncov/lab/virus-requests.html.

## Acknowledgments

The authors thank Dr. Charles Romine for catalyzing a series of discussions that led to this work.

## Funding

This work is a contribution of the National Institute of Standards and Technology and is not subject to copyright in the United States.

## Ethics declarations

### Conflict of interests

The National Institute of Standards and Technology has submitted a provisional patent application covering the work described in this manuscript on behalf of authors P. Patrone, E. Romsos, P. Vallone, and A. Kearsley.

## Additional information

### Research involving Human Participants and/or Animals

Use of the Human DNA Quantitation Standard Reference Material (SRM 2372a) has been reviewed and approved by the NIST Research Protections Office.

## About this article

### Cite this article

Patrone, P.N., Romsos, E.L., Cleveland, M.H. *et al.* Affine analysis for quantitative PCR measurements. *Anal Bioanal Chem* **412**, 7977–7988 (2020). https://doi.org/10.1007/s00216-020-02930-z

### Keywords

- qPCR
- DNA detection
- Measurement sensitivity
- Data analysis
- SARS-CoV-2