What can be observed in real time PCR and when does it show?

Real time, or quantitative, PCR typically starts from a very low concentration of initial DNA strands. During iterations the numbers increase, first essentially by doubling, later predominantly in a linear way. Observation of the number of DNA molecules in the experiment becomes possible only when it is substantially larger than initial numbers, and then possibly affected by the randomness in individual replication. Can the initial copy number still be determined? This is a classical problem and, indeed, a concrete special case of the general problem of determining the number of ancestors, mutants or invaders, of a population observed only later. We approach it through a generalised version of the branching process model introduced in Jagers and Klebaner (J Theor Biol 224(3):299–304, 2003. doi:10.1016/S0022-5193(03)00166-8), and based on Michaelis–Menten type enzyme kinetical considerations from Schnell and Mendoza (J Theor Biol 184(4):433–440, 1997). A crucial role is played by the Michaelis–Menten constant being large, as compared to initial copy numbers. In a strange way, determination of the initial number turns out to be completely possible if the initial rate v is one, i.e all DNA strands replicate, but only partly so when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$v<1$$\end{document}v<1, and thus the initial rate or probability of succesful replication is lower than one. Then, the starting molecule number becomes hidden behind a “veil of uncertainty”. This is a special case, of a hitherto unobserved general phenomenon in population growth processes, which will be adressed elsewhere.


Introduction
In the polymerase chain reaction a molecule replicates with a probability p(z), which will be of the form under the asumption of Michaelis-Menten kinetics. Here, K is the Michaelis-Menten constant, large in terms of molecule numbers, z the number of DNA molecules at the actual round, and C a constant, which can be written as v K , where v is the maximal rate or speed of the reaction, corresponding to z = 0. Then, v = p(0) is the probability of successful replication under the most benign circumstances, and the decrease of p(z), as the number z of DNA strands present increases, mirrors that the latter are being synthesized from DNA building blocks, which disappear as the number of DNA molecules increases. As has been observed recently, though this is the general pattern, there are exceptions where the replication probability actually increases in the very first generation, due to impurities in templates (Ståhlberg et al. 2016).
In this paper we disregard this and rely upon the Michaelis-Menten based approach in Jagers and Klebaner (2003), where it was used to explain the first exponential but later linear growth of molecule numbers, see also Best et al. (2015), Lalam et al. (2004), Lievens et al. (2012). For a statistical analysis, where PCR is modeled by branching processes without environmental change due to growth but with random effects and starting numbers cf. Hanlon and Vidyashankar (2011).
Here we turn to the important task of determining the initial number, viewed as unknown but fixed, of molecules in a PCR amplification, i.e. classical quantitative PCR. In literature, it has been treated under the simplifying assumption of constant replication probabilities p(z), cf. Olofsson (2003), Vikalo et al. (2007). For an experimental approach based on differentiation see Swillens et al. (2004) and for a mathematical paper, focussing however on mutations in an abstract formulation see Piau (2005). Through the use of digital PCR (Vogelstein and Kinzler 1999) and barcoding (Best et al. 2015;Ståhlberg 2016, personal communication) new possibilities and techniques have been introduced. We hope to be able to treat such frameworks. The present work should be suitable for calibration and interpolation of density values in realtime PCR (Kubista 2016, personal communication) in the usual way. Observed values yield model parameter estimates. Thus specified, the model delivers predictions of missing values.
In our setup, the value of v turns out to be crucial, the cases 0 < v < 1 and v = 1 yielding quite different situations. If the starting efficiency v ∈ (0, 1), then individual molecules replicate randomly and essentially independently during an intitial phase. By branching process theory their number will therefore, to begin with, grow like the product of a random factor and the famous exponential population growth. Randomness is therefore an essential part of the initial conditions of later phases with more of interaction with the environment but also more of deterministic structure, due to law of large numbers effects. It is in this sense, the original starting number has been hidden by a 'veil of uncertainty'.
If, on the other hand, v = 1, the first observable process size can be inverted to yield the starting number.
This phenomenon is what we investigate, for PCR in the present paper and for populations in habitats with a finite carrying capacity in a companion paper (Chigansky et al. 2017), cf. also Barbour et al. (2015Barbour et al. ( , 2016. For somewhat related early examples from epidemic processes and a recent from population genetics, cf. Kendall (1956), Whittle (1955), Martin and Lambert (2015).

Mathematical setup
Denote the number of molecules in the n-th PCR cycle by Z n , n = 0, 1, 2, . . ., so that Z n can be viewed as generated by the recursion started at Z 0 , where the ξ n, j 's are Bernoulli random variables taking values 1 and 0 with complementary probabilities, and where F n−1 denotes the sigma-algebra of the events, observable before time n. Consider the process X n = Z n /K , which we shall call the density process. An important role in its behaviour is played by the function which is, indeed, the conditional expectation of X n given X n−1 = x, The following result is known, see Kurtz (1970), Klebaner (1993).
Theorem 1 Suppose that X 0 → x 0 , as K → ∞. Then, for any n, where f n denotes the n-th iterate of f .
If the PCR starts from a fixed number Z 0 of molecules, clearly Z 0 /K → 0. Since f (0) = 0, also f n (0) = 0, for any n, and it follows that lim K →∞ X n = 0, for any n. In other words, the limiting reaction is not observable at any fixed number of repetitions. The main result of this paper is that it becomes observable when the number of iterations is n To arrive at the result we make use of a linear replication process Y n , in which the probability of successful molecular replication is constant and equal to v. In each round each molecule is thus replaced by two with probability v, but remains there alone with probability 1 − v. The expected number of successors is thus 1 − Mathematically, this process is given recursively by [see e.g. Haccou et al. (2007), Harris (2002) or Jagers (1975)] where the η n, j are independent Bernoulli random variables with P(η n, j = 1) = v.
Since the Y n /b n constitute a uniformly integrable martingale, it has an a.s. limit If the process starts from Z 0 molecules, then in view of the branching property, the corresponding limit is where the W i are i.i.d. with the same continuous distribution as W . As is well known from branching process theory (see e.g. Theorem 8.2 in Harris (2002)), the moment generating function of the latter φ(s) = E[e −sW ], is unique among moment generating functions satisfying the functional equation The random variable W (Z 0 ) appears in the main result as an argument of the deterministic function H obtained as the limit Its existence and some properties are studied in the next section. Here we formulate the main result and an important corollary.

Theorem 2
Let v ∈ (0, 1] and start the PCR amplification from Z 0 molecules. Then along any subsequence, such that log b K are integers. Remark 1 With v = 1, the process Z n grows deterministically at the geometric rate b = 2 and in this case W (Z 0 ) = Z 0 . As will be increasingly clear, there are, however reasons to treat v = 1 separately.

Corollary 1
For v ∈ (0, 1] and any fixed n where f n denotes the n-th iterate of f and This assertion extends to weak convergence of the sequences regarded as random elements in R Z : Remark 2 The limits increase strictly with respect to n. If 0 < v < 1, their entries are continuous random variables with positive variance, whereas if v = 1 they are positive reals. If the limit in (6) is taken along an arbitrary subsequence K , then X [log b K ] is asymptotic to the same limit up to a deterministic correction, which emerges in the rounding:

Existence
Write the two expressions for f , (2) and where g(x) = vx 2 1+x . This expression is more suitable for analysis of iterates of f near zero.
It is easy to establish that f is increasing, which yields that all f n are increasing.
and the sequence f n (x/b n ) is monotone decreasing in n for any positive x. Therefore the following limit in (5) exists,

Continuity
We show next that the convergence in (5) is uniform on bounded intervals. First observe that It is now easy to see by induction, that for any n and x Next, by (7) the Taylor expansion reads for an appropriate θ n . Replace now x by x/b n+1 to have Hence we obtain where we have used that g( converges uniformly on compacts. As a consequence of uniform convergence, we have that H is continuous.

The functional equation
Further , by taking the limit as n → ∞, we obtain that H solves Schröder's functional equation However, since the zero function is a solution, we must show that H is not identically To show that H is positive, use (7) to obtain the following formula for the n-th iterate where, as usual, f 0 (x) = x. Replacing x with xb −n , we have Hence from (10), for any n which is strictly positive for 0 < x < 1. Therefore H (x) > 0 in this domain.

Monotonicity
We show next that H is increasing. Let H n (x) = f n (x/b n ). Then each H n (x) is increasing and thus H (x) = lim n→∞ H n (x) does not decrease. Further, recall that Taking the limit n → ∞, we see that H (x) is a strictly increasing function on an open vicinity of the origin.
Suppose now that H is constant on an interval [x 1 , x 2 ] with x 2 > x 1 . Then, by (9), H (x 1 /b k ) = H (x 2 /b k ) for any integer k ≥ 1 and, since H (x) does not decrease, it must be constant on all the intervals [x 1 /b k , x 2 /b k ]. In particular, H (x) cannot be strictly increasing on any open vicinity of the origin. The obtained contradiction shows that H is strictly increasing everywhere on R + .
Next, since we have shown that the H n converge uniformly, for any o n (1) → 0 as n → ∞. Thus we have the following corollary needed in the proofs to come.
Corollary 2 We shall also need the inverse G := H −1 . It is easy to see that it solves the functional equation

Proofs
Let us start with the fundamental recursive equation for the stochastic density process X n (cf. Klebaner 1993) Note that ε n is a martingale difference sequence E(ε n |F n−1 ) = 0 and The corresponding deterministic recursion, obtained by omitting the martingale difference term, is

Proof of Theorem 2
In what follows bar denotes the density processes, i.e.,Z n = Z n /K ,Ȳ n = Y n /K . Consider first the case v < 1. Define times where c ∈ ( 1 2 , 1) is an arbitrary fixed constant and K is such that both n 1 and n 2 are integers.
The crux of the proof is to approximate the density process X n =Z n := Z n /K in two steps. First, on the interval [0, n 1 ] by the linear processȲ , and then on the interval [n 1 , n 2 ] by the nonlinear deterministic recursion, however started from the random pointȲ n 1 , resulting from the first step.
Denote by φ k, (x) the flow, generated by the nonlinear deterministic recursion (13), i.e. its solution at time , when started from x at time k, x = φ k, (x k ) = f l−k (x k ).
Let us stress that all the random objects here are defined on the same probability space and by construction coupled as described at the beginning of the proof.
Subtracting the deterministic recursion (13) from the stochastic one (11) we have Thus the sequence δ n satisfies where we have used (12) to bound E|ε n |. Note that δ n 1 = 0, as both recursions start at the same point X n 1 at time n 1 . Therefore since c > 1 2 and (15) now follows. The proof of (16) is more delicate and is done by coupling. We construct the nonlinear and linear replication processes Z n and Y n on the same probability space as follows. Let U n, j n, j ∈ N be i.i.d. random variables with the uniform distribution on [0, 1]. Define ξ n, j = 1 U n, j ≤ v K K +Z n−1 and η n, j = 1 {U n, j ≤v} .
Then Z n and Y n are realized by the formulae (1) and (3) with ξ n, j and η n, j as above.
Since v K K +Z n−1 < v, we have ξ n, j ≤ η n, j for all n, j and therefore the linear process Y is always greater than the nonlinear process Z , Z n ≤ Y n , for all n.
Construct an auxilliary linear process V n , which bounds Z n from below until Z n gets larger than K γ for γ ∈ (0, 1). Actually we require that c < γ < 1. Let Then clearly, ζ n, j < ξ n, j as long as Z n−1 < K γ . Hence It is also clear that for all n, j, ζ n, j < η n, j hence V n ≤ Y n . Thus we obtain We show next that by using the inequality above. Since the moments of simple Galton-Watson processes are easily computed (Theorem 5.1 in Haccou et al. (2007), Harris (2002), or Jagers (1975) Since EY n 1 = b n 1 = K c also, the first term in (17) satisfies By the Cauchy-Schwartz inequality for the second term EV n 1 1 τ <n 1 ≤ EV 2 n 1 P(τ < n 1 ) Since Z n < Y n for all n, it takes longer for the former process to reach K γ than the corresponding time for the latter, where the last bound is Doob's inequality for the martingale Y n b −n . Taking into account that EV 2 n 1 ∼ K 2c , we obtain from the above estimates lim K →∞ K −c EV n 1 1 τ <n 1 = 0.
Recall that γ > c. It follows that the convergence to the limit in (18) holds in L 1 , and in probability. For the corresponding densities, we have by dividing through by K that Since φ n 1 ,n 2 (x) = f n 2 −n 1 (x) and the function f is concave ( f < 0), its derivative attains its maximum value at zero, f (0) = b and f n (x) ≤ b n for any x ≥ 0. Therefore | f n (x) − f n (y)| ≤ b n |x − y|. For y =Ȳ n 1 and x = X n 1 , this and (19) yields and the proof of case v < 1 is complete. Consider now the case v = 1. In this case, the probability of successful replication is and the function f is Here b = v + 1 = 2 and The proof is the same, except that the linear replication process Y n is in fact deterministic Y n = Z 0 2 n , if it starts with Z 0 molecules, because the probability of replication is 1, P(η n, j = 1) = v = 1. Hence the limit W = Y n /2 n = Z 0 . The theorem is proved.

Proof of Corollary 1
The result follows by induction on n from the fundamental representation (11). For n = 0 it is the statement of the main result. For n = 1 take limits as K → ∞ in (11), and note that the stochastic term vanishes. Similarly, having proved it for n, it follows for n + 1. The functional limit theorem follows from finite dimensional convergence implying convergence in the sequence space, cf. Billingsley (Billingsley 1999, p. 19).
where z is an unknown number and x = X 0 = z/K < ρ. Our aim is to determine z for K >> z. Mathematically, we shall interpret this as K → ∞. In PCR literature based on enzyme kinetic considerations, values of the Michaelis-Menten constant range at least from 10 6 (Lalam 2006) up to 10 15 (Gevertz et al. 2005), in terms of molecule numbers. There are then two cases, known or unknown rate v. In the latter situation, v will have to be estimated from the observed concentrations. Further, as pointed out, the cases v = 1 and v < 1 exhibit an intriguing disparity, viz. consider first v < 1. By Corollary 1 The limit process here has strictly increasing trajectories and its entries have continuous distributions, so with probability one none of them equals ρ. The first hitting time being a discontinuous functional with respect to the locally uniform metric on space of sequences, is however continuous almost surely under the limit law. Therefore τ K (ρ) := inf n ∈ Z; X log b K +n ≥ ρ converges weakly to τ (ρ) := inf{n ∈ Z; f n (H (W (z))) ≥ ρ} as K → ∞.
We disregard this nuisance and assume in both cases that we have observed concentration values strictly larger than ρ from log b K + τ K (ρ) ≈ log b K + τ (ρ) onwards: κ 0 = f τ (H (W (z))), κ 1 = f τ +1 (H (W (z)), κ 2 = f τ +2 (H (W (z)), . . ., and correspondingly (H (z)), . . . (to ease notation, we omit the dependence of τ upon ρ.) By (9) this simplifies to for v < 1 and otherwise. Note that typically, since the experimenter would like to catch the density as early as possible, κ 0 ≈ ρ, which for example could be of the order of 0.05. Since H (x) is fairly close to the diagonal H (x) = x for 0 ≤ x ≤ 0.5 (see Figure 1) and W (z) ≈ z, we can conclude that as a rule τ < 0. As well as assuming K and ρ known it is easy to think of situations where so is v. Then we can proceed directly to determining z. For v = 1 this is straightforward: More generally, If there is variation between the z-values thus obtained we can of course take arithmetic means of the right hand side for the different observed j. Now, if v < 1, we obtain in the sense that the right hand side is an observed value of the random variable W (z). The initial number z of DNA molecules has now been hidden from direct calculation. What can be done is to estimate z from data, e.g. maximise the density at the first point of observation, where * denotes convolution power, ψ is the density of W , which we know to have the moment generating function φ from Sect. 2, corresponding to v. In this, z is an unknown parameter and we obtain a maximum likelihood estimateẑ = argmax z ψ * z (t), where t = b −τ G(κ 0 ) and z ranges over natural numbers. Again we can also consider later κ-values and take averages, if this increases stability. Note that if z is large (but still much smaller than K ), then by the local central limit theorem the ML problem is roughly the same as finding z maximizing the normal density with mean z and variance z 1−v 1+v =: zσ 2 at the point t = b −τ G(κ 0 ), This yields the estimatê or rather one of its neighboring integers. Now, if entities cannot be deduced a priori the question arises to what extent they can be estimated from our sequence of observations. Clearly, in the limit the relation between an observation x and its successor in the next round will be that the latter converges to f (x), as K → ∞, by Corollary 1. Thus e.g., These problems are fairly standard in statistical literature but certainly deserve a special investigation in the present context, if possible together with an experimental study of replication of single or few molecules, in order to determine the initial efficiency, v.