Stochastic Modeling and Simulation of Viral Evolution

Bulletin of Mathematical Biology

Abstract

RNA viruses comprise vast populations of closely related, but highly genetically diverse, entities known as quasispecies. Understanding the mechanisms by which this extreme diversity is generated and maintained is fundamental to the study of viral persistence and pathobiology in infected hosts. In this paper, we approach quasispecies theory through a mathematical model based on the theory of multitype branching processes, in order to better understand the roles of the mechanisms that result in viral diversity, persistence and extinction. We do so by combining computational simulations with a theoretical analysis of the model. To perform the simulations, we implemented the mathematical model in a computational platform capable of running simulations and presenting the results graphically in real time. Among other things, we show that the establishment of a virus population may display four distinct regimes, from its introduction into a new host until it reaches equilibrium or undergoes extinction. We also simulated different fitness distributions representing distinct environments within a host, which could be either favorable or hostile to viral success. We addressed the two mechanisms most commonly used to explain the extinction of RNA virus populations, lethal mutagenesis and mutational meltdown, and demonstrated a correspondence between them, implying the existence of a unifying principle leading to the extinction of RNA viruses.

Figs. 1–9 (images not reproduced here)

Software Availability and Requirements

The ENVELOPE program was written in the C++ programming language using the Qt 4.8.6 framework and the Qwt 5.2.1 library. It runs on Linux and Mac OS X and requires at least 2 GB of RAM and 1.5 MB of disk space. It is distributed free of charge to all users under the LGPL license. Binary files for Linux and Mac OS X are available for download at: https://envelopeviral.000webhostapp.com.

References

  • Alberch P (1991) From genes to phenotype: dynamical systems and evolvability. Genetica 84(1):5–11

  • Antoneli F, Bosco FAR, Castro D, Janini LMR (2013) Viral evolution and adaptation as a multivariate branching process. In: BIOMAT 2012—proceedings of the international symposium on mathematical and computational biology, vol 13. World Scientific, pp 217–243. https://doi.org/10.1142/9789814520829_0013

  • Antoneli F, Bosco FAR, Castro D, Janini LMR (2013) Virus replication as a phenotypic version of polynucleotide evolution. Bull Math Biol 75(4):602–628. https://doi.org/10.1007/s11538-013-9822-9

  • Athreya KB, Ney PE (1972) Branching processes. Springer, Berlin

  • Bergstrom CT, McElhany P, Real LA (1999) Transmission bottlenecks as determinants of virulence in rapidly evolving pathogens. Proc Natl Acad Sci 96(9):5095–5100

  • Bradwell K, Combe M, Domingo-Calap P, Sanjuán R (2013) Correlation between mutation rate and genome size in riboviruses: mutation rate of bacteriophage Qβ. Genetics 195(1):243–251

  • Bull JJ, Sanjuán R, Wilke CO (2007) Theory of lethal mutagenesis for viruses. J Virol 81(6):2930–2939. https://doi.org/10.1128/JVI.01624-06

  • Bull JJ, Sanjuán R, Wilke CO (2008) Lethal mutagenesis. In: Domingo E, Parrish CR, Holland JJ (eds) Origin and evolution of viruses, 2nd edn. Academic Press, London, chap 9, pp 207–218. https://doi.org/10.1016/B978-0-12-374153-0.00009-6

  • Burch CL, Chao L (2004) Epistasis and its relationship to canalization in the RNA virus Φ6. Genetics 167(2):559–567

  • Burch CL, Guyader S, Samarov D, Shen H (2007) Experimental estimate of the abundance and effects of nearly neutral mutations in the RNA virus Φ6. Genetics 176(1):467–476

  • Campbell RB (2003) A logistic branching process for population genetics. J Theor Biol 225(2):195–203

  • Carrasco P, de la Iglesia F, Elena SF (2007) Distribution of fitness and virulence effects caused by single-nucleotide substitutions in Tobacco Etch virus. J Virol 81(23):12979–12984

  • Cerf R (2015) Critical population and error threshold on the sharp peak landscape for a Moran model. Mem Am Math Soc 233(1096):1–87

  • Cerf R (2015) Critical population and error threshold on the sharp peak landscape for the Wright–Fisher model. Ann Appl Probab 25(4):1936–1992

  • Cerf R, Dalmau J (2016) The distribution of the quasispecies for a Moran model on the sharp peak landscape. Stoch Processes Appl 126(6):1681–1709

  • Cuesta JA (2011) Huge progeny production during transient of a quasi-species model of viral infection, reproduction and mutation. Math Comput Model 54:1676–1681. https://doi.org/10.1016/j.mcm.2010.11.055

  • Cuevas JM, Duffy S, Sanjuán R (2009) Point mutation rate of bacteriophage ΦX174. Genetics 183:747–749

  • Dalmau J (2015) The distribution of the quasispecies for the Wright–Fisher model on the sharp peak landscape. Stoch Processes Appl 125(1):272–293

  • Dalmau J (2016) Distribution of the quasispecies for a Galton–Watson process on the sharp peak landscape. J Appl Probab 53(2):606–613

  • Demetrius L (1985) The units of selection and measures of fitness. Proc R Soc Lond B 225(1239):147–159

  • Demetrius L (1987) An extremal principle of macromolecular evolution. Phys Scr 36(4):693

  • Demetrius L (2013) Boltzmann, Darwin and directionality theory. Phys Rep 530(1):1–85

  • Demetrius L, Schuster P, Sigmund K (1985) Polynucleotide evolution and branching processes. Bull Math Biol 47(2):239–262

  • Devroye L (1986) Non-uniform random variate generation. Springer, Berlin

  • Di Mascio M, Markowitz M, Louie M, Hogan C, Hurley A, Chung C, Ho DD, Perelson AS (2003) Viral blip dynamics during highly active antiretroviral therapy. J Virol 77(22):12165–12172

  • Dietz K (2005) Darwinian fitness, evolutionary entropy and directionality theory. BioEssays 27:1097–1101

  • Domingo E, Holland JJ (1997) RNA virus mutations and fitness for survival. Ann Rev Microbiol 51(1):151–178

  • Domingo E, Martin V, Perales C, Grande-Perez A, Garcia-Arriaza J, Arias A (2006) Viruses as quasispecies: biological implications. In: Domingo E (ed) Quasispecies: concept and implications for virology. Springer, Berlin, pp 51–82

  • Domingo E, Martínez-Salas E, Sobrino F, de la Torre JC, Portela A, Ortín J, López-Galindez C, Pérez-Breña P, Villanueva N, Nájera R (1985) The quasispecies (extremely heterogeneous) nature of viral RNA genome populations: biological relevance—a review. Gene 40(1):1–8

  • Domingo E, Sabo D, Taniguchi T, Weissmann C (1978) Nucleotide sequence heterogeneity of an RNA phage population. Cell 13:735–744

  • Domingo-Calap P, Cuevas JM, Sanjuán R (2009) The fitness effects of random mutations in single-stranded DNA and RNA bacteriophages. PLoS Genet 5(11):e1000742

  • Drake JW (2012) A test of Kimura’s mutation-rate conjecture. In: Mothersill CE, Korogodina VL, Seymour CB (eds) Radiobiology and environmental security. Springer, Berlin, pp 13–18

  • Eigen M (1971) Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58:465–523

  • Eigen M (1993) Viral quasispecies. Sci Am 269:42–49

  • Eigen M, Schuster P (1979) The hypercycle. A principle of natural self-organization. Springer, Berlin

  • Feller W (1968) An introduction to probability theory and its applications, vol 1, 3rd edn. Wiley, New York

  • Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, Peddada L, Heldebrant C, Smith R, Conrad A, Kleinman SH, Busch MP (2003) Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. AIDS 17(13):1871–1879

  • Fortuna MA, Zaman L, Ofria C, Wagner A (2017) The genotype-phenotype map of an evolving digital organism. PLoS Comput Biol 13(2):e1005414

  • Furió V, Moya A, Sanjuán R (2005) The cost of replication fidelity in an RNA virus. Proc Natl Acad Sci U S A 102(29):10233–10237

  • Gallant JE (2007) Making sense of blips. J Infect Dis 196(12):1729–1731

  • Gupta V, Dixit NM (2015) Scaling law characterizing the dynamics of the transition of HIV-1 to error catastrophe. Phys Biol 12(5):054001

  • Harris TE (1963) The theory of branching processes. Springer, Berlin

  • Jagers P, Klebaner FC, Sagitov S (2007) On the path to extinction. Proc Natl Acad Sci U S A 104(15):6107–6111

  • Kesten H, Stigum BP (1966) Additional limit theorems for indecomposable multidimensional Galton–Watson processes. Ann Math Stat 37(6):1463–1481

  • Kesten H, Stigum BP (1966) A limit theorem for multidimensional Galton–Watson processes. Ann Math Stat 37(5):1211–1223

  • Kesten H, Stigum BP (1967) Limit theorems for decomposable multi-dimensional Galton–Watson processes. J Math Anal Appl 17:309–338

  • Kimmel M, Axelrod DE (2002) Branching processes in biology. Springer, New York

  • Kimura M, Maruyama T (1966) The mutational load with epistatic gene interactions in fitness. Genetics 54(6):1337

  • Kurtz TG, Lyons R, Pemantle R, Peres Y (1994) A conceptual proof of the Kesten–Stigum theorem for multi-type branching processes. In: Athreya KB, Jagers P (eds) Classical and modern branching processes. IMA Vol Math Appl, vol 84. Springer, New York, pp 181–185

  • Lambert A (2005) The branching process with logistic growth. Ann Appl Probab 15(2):1506–1535

  • Lee PK, Kieffer TL, Siliciano RF, Nettles RE (2006) HIV-1 viral load blips are of limited clinical significance. J Antimicrob Chemother 57(5):803–805

  • Loeb LA, Essigmann JM, Kazazi F, Zhang J, Rose KD, Mullins JI (1999) Lethal mutagenesis of HIV with mutagenic nucleoside analogs. Proc Natl Acad Sci U S A 96:1492–1497

  • Lotka AJ (1939) Théorie analytique des associations biologiques. Part II. Analyse démographique avec application particulière à l’espèce humaine. Actualités Scientifiques et Industrielles 780:123–136

  • Lynch M, Bürger R, Butcher D, Gabriel W (1993) The mutational meltdown in asexual populations. J Hered 84(5):339–344

  • Lynch M, Gabriel W (1990) Mutation load and the survival of small populations. Evolution 44:1725–1737

  • Manrubia SC, Lázaro E, Pérez-Mercader J, Escarmís C, Domingo E (2003) Fitness distributions in exponentially growing asexual populations. Phys Rev Lett 90(18):188102

  • Matuszewski S, Ormond L, Bank C, Jensen JD (2017) Two sides of the same coin: a population genetics perspective on lethal mutagenesis and mutational meltdown. Virus Evolut 3(1):vex004

  • McMichael AJ, Borrow P, Tomaras GD, Goonetilleke N, Haynes BF (2010) The immune response during acute HIV-1 infection: clues for vaccine development. Nat Rev Immunol 10(1):11–23

  • Mode CJ, Sleeman CK (2012) Stochastic processes in genetics and evolution: computer experiments in the quantification of mutation and selection. World Scientific, Singapore

  • Mode CJ, Sleeman CK, Raj T (2013) On the inclusion of self regulating branching processes in the working paradigm of evolutionary and population genetics. Front Genet 4:11

  • Nagaev AV (1967) On estimating the expected number of direct descendants of a particle in a branching process. Theory Probab Appl 12(2):314–320

  • Nettles RE, Kieffer TL (2006) Update on HIV-1 viral load blips. Curr Opin HIV AIDS 1(2):157–161

  • Nettles RE, Kieffer TL, Kwon P, Monie D, Han Y, Parsons T, Cofrancesco J, Gallant JE, Quinn TC, Jackson B (2005) Intermittent HIV-1 viremia (blips) and drug resistance in patients receiving HAART. JAMA 293(7):817–829

  • Peris JB, Davis P, Cuevas JM, Nebot MR, Sanjuán R (2010) Distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage F1. Genetics 185(2):603–609

  • Rong L, Perelson AS (2009) Asymmetric division of activated latently infected cells may explain the decay kinetics of the HIV-1 latent reservoir and intermittent viral blips. Math Biosci 217(1):77–87

  • Rong L, Perelson AS (2009) Modeling HIV persistence, the latent reservoir, and viral blips. J Theor Biol 260(2):308–331

  • Sanjuán R, Moya A, Elena SF (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci U S A 101:8396–8401

  • Schuster P, Swetina J (1988) Stationary mutant distributions and evolutionary optimization. Bull Math Biol 50(6):635–660

  • Servedio MR, Brandvain Y, Dhole S, Fitzpatrick CL, Goldberg EE, Stern CA, Cleve JV, Yeh DJ (2014) Not just a theory: the utility of mathematical models in evolutionary biology. PLoS Biol 12(12):e1002017. https://doi.org/10.1371/journal.pbio.1002017

  • Swetina J, Schuster P (1982) Self-replication with errors: a model for polynucleotide replication. Biophys Chem 16(4):329–345. https://doi.org/10.1016/0301-4622(82)87037-3

  • Takeuchi N, Hogeweg P (2007) Error-threshold exists in fitness landscapes with lethal mutants. BMC Evolut Biol 7(1):15

  • Tromas N, Elena SF (2010) The rate and spectrum of spontaneous mutations in a plant RNA virus. Genetics 185(3):983–989

  • Watson HW, Galton F (1874) On the probability of the extinction of families. J Anthropol Inst Great Br Irel 4:138–144

  • Wilke CO (2005) Quasispecies theory in the context of population genetics. BMC Evolut Biol 5(1):44

  • Zhu Y, Yongky A, Yin J (2009) Growth of an RNA virus in single cells reveals a broad fitness distribution. Virology 385(1):39–46. https://doi.org/10.1016/j.virol.2008.10.031


Acknowledgements

LG acknowledges the support of FAPESP through grant number 14/13382-1. BG and DC received financial support from CAPES.

Author information

Contributions

LG and DC contributed equally to this work. LMRJ and FA contributed equally to this work. Conceived the model and formulated the underlying theory: LMRJ and FA. Implemented the software: LG, DC and BG. Simulated the model and analyzed the output: LG and DC. Wrote the paper: LMRJ and FA.

Corresponding author

Correspondence to Fernando Antoneli.

Appendices

A Review of Multitype Branching Process Theory

A discrete-time multitype branching process with types or classes indexed by a nonnegative integer r ranging from 0 to R is described by a sequence of vector-valued random variables \(\varvec{Z}_n=(Z_n^0,\ldots ,Z_n^R)\) (\(n=0,1,\ldots \)), where \(Z_n^r\) is the number of particles of type or class r in the nth generation. The initial population is represented by a vector of nonnegative integers \(\varvec{Z}_0\) (also called a multi-index), which is nonzero and non-random. The time evolution of the population is determined by a vector-valued discrete probability distribution \(\varvec{\zeta }(\varvec{i})=\big (\zeta _r(\varvec{i})\big )\), defined on the set of multi-indices \(\varvec{i}=(i^0,\ldots ,i^R)\) and called the offspring distribution of the process: \(\zeta _r(\varvec{i})\) is the probability that a single particle of class r produces \(i^0\) offspring of class 0, \(i^1\) offspring of class 1, and so on up to \(i^R\) offspring of class R. The offspring distribution is usually encoded as the coefficients of a vector-valued multivariate power series \(\varvec{f}(\varvec{z})=\big (f_r(\varvec{z})\big )\), with \(f_r(\varvec{z})=\sum _{\varvec{i}}\zeta _r(\varvec{i})\,z_0^{i^0}\cdots z_R^{i^R}\), called the probability generating function (PGF).
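To make this encoding concrete, here is a minimal Python sketch (not the ENVELOPE implementation) that evaluates \(f_r(\varvec{z})\) from an offspring distribution stored as a dictionary mapping multi-index tuples \((i^0,\ldots ,i^R)\) to probabilities. The function name `pgf` and the two-class offspring law `EXAMPLE_OFFSPRING` are hypothetical and chosen only for illustration; the class labels here carry no phenotypic meaning.

```python
from math import prod


def pgf(offspring_r, z):
    """Evaluate f_r(z) = sum_i zeta_r(i) * z_0**i^0 * ... * z_R**i^R.

    offspring_r: dict mapping a multi-index tuple (i^0, ..., i^R) to zeta_r(i).
    z: sequence of R + 1 numbers.
    """
    return sum(p * prod(zk ** ik for zk, ik in zip(z, idx))
               for idx, p in offspring_r.items())


# Hypothetical two-class example (R = 1):
# a class-0 parent leaves (one class-0 and one class-1 child) or (one class-0 child),
# each with probability 1/2; a class-1 parent always leaves two class-0 children.
EXAMPLE_OFFSPRING = [
    {(1, 1): 0.5, (1, 0): 0.5},
    {(2, 0): 1.0},
]

# Here f_0(z) = 0.5*z0*z1 + 0.5*z0 and f_1(z) = z0**2; every PGF equals 1 at z = (1, ..., 1).
print([pgf(d, (1.0, 1.0)) for d in EXAMPLE_OFFSPRING])  # [1.0, 1.0]
print([pgf(d, (0.5, 0.5)) for d in EXAMPLE_OFFSPRING])  # [0.375, 0.25]
```

Storing only the multi-indices with nonzero probability keeps the sum finite for offspring laws of bounded size.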

The mean matrix, or matrix of first moments, \(\varvec{M}=\{M_{ij}\}\) of a multitype branching process describes how the average number of particles in each type or class evolves in time and is defined by \(M_{ij}=\mathbf {E}(Z_1^i|Z^j_0=1)\), where \(Z^j_0=1\) abbreviates \(\varvec{Z}_0=(0,\ldots ,0,1,0,\ldots ,0)\), with the single 1 in position j. In terms of the probability generating function \(\varvec{f}=(f_0,\ldots ,f_R)\) it is given by

$$\begin{aligned} M_{ij}=\dfrac{\partial f_j}{\partial z_i}(\varvec{s})\bigg |_{\varvec{s}=\varvec{1}} \end{aligned}$$
(3)

where \(\varvec{1}=(1,1,\ldots ,1)\). Since its entries are expected numbers of offspring, the mean matrix \(\varvec{M}\) is nonnegative, so by the Perron–Frobenius theorem it has a real nonnegative eigenvalue equal to its spectral radius. When this eigenvalue is positive it is called, following Kimmel and Axelrod (2002), the malthusian parameter \(\mu \).
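The mean matrix and the malthusian parameter can also be computed numerically. The sketch below assumes each class's offspring law is stored as a dictionary from multi-index tuples to probabilities, accumulates \(M_{ij}\) as expected offspring counts (which equals the derivative in Eq. (3)), and approximates \(\mu \) by power iteration, keeping the iterate normalized so that its entries sum to 1. Power iteration is a generic numerical method assumed here for illustration; it converges when the Perron root is simple and strictly dominant, for instance when \(\varvec{M}\) is primitive. The function names and `EXAMPLE_OFFSPRING` (a class-0 parent leaves one child of each class or a single class-0 child with probability 1/2 each; a class-1 parent leaves two class-0 children) are hypothetical.

```python
def mean_matrix(offspring):
    """M[i][j] = E(Z_1^i | Z_0 = e_j): mean number of class-i children of one class-j parent.

    This equals the partial derivative of f_j with respect to z_i at z = 1 (Eq. (3)).
    offspring[j] maps multi-index tuples (i^0, ..., i^R) to probabilities.
    """
    n = len(offspring)
    M = [[0.0] * n for _ in range(n)]
    for j, dist in enumerate(offspring):
        for idx, p in dist.items():
            for i in range(n):
                M[i][j] += p * idx[i]
    return M


def perron_root(M, tol=1e-12, max_iter=10_000):
    """Power iteration for the Perron root of a nonnegative matrix.

    Returns (mu, u) with u normalized so that sum(u) == 1, matching 1^t u = 1.
    """
    n = len(M)
    x = [1.0 / n] * n
    mu = 0.0
    for _ in range(max_iter):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        mu = sum(y)                      # equals 1^t M x because sum(x) == 1
        if mu == 0.0:
            return 0.0, x
        y = [v / mu for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return mu, y
        x = y
    return mu, x


EXAMPLE_OFFSPRING = [{(1, 1): 0.5, (1, 0): 0.5}, {(2, 0): 1.0}]  # hypothetical
M = mean_matrix(EXAMPLE_OFFSPRING)   # [[1.0, 2.0], [0.5, 0.0]]
mu, u = perron_root(M)               # mu approximates (1 + 5**0.5) / 2
```

For this example the characteristic polynomial is \(\lambda ^2-\lambda -1\), so the returned \(\mu \) should approximate the golden ratio, about 1.618.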

The vector of extinction probabilities of a multitype branching process, denoted by \(\varvec{\gamma }=(\gamma _0,\ldots ,\gamma _R)\), where \(0 \leqslant \gamma _r\leqslant 1\), is defined by the condition that \(\gamma _r\) is the probability that the process eventually becomes extinct, given that initially there was exactly one particle of class r.

The classification theorem of multitype branching processes states that there are only three possible regimes for a multitype branching process (Harris 1963; Athreya and Ney 1972; Kimmel and Axelrod 2002):

Super-critical:

If \(\mu >1\) then \(0\leqslant \gamma _r<1\) for all r and, with positive probability, the population survives indefinitely.

Sub-critical:

If \(\mu <1\) then \(\gamma _r=1\) for all r, and with probability 1 the population becomes extinct in finite time.

Critical:

If \(\mu =1\) then \(\gamma _r=1\) for all r and with probability 1 the population becomes extinct; however, the expected time to extinction is infinite.
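The extinction probabilities can be computed numerically as the smallest fixed point of \(\varvec{\gamma }=\varvec{f}(\varvec{\gamma })\) in \([0,1]^{R+1}\): iterating the PGF from \(\varvec{\gamma }=\varvec{0}\), the kth iterate is the probability of extinction by generation k. The sketch below uses a hypothetical single-type Poisson offspring law with mean m, for which \(\mu =m\) and \(f(z)=\mathrm {e}^{m(z-1)}\), to exhibit the super-critical and sub-critical regimes; in the critical case the iteration still converges, but only at a rate of about 1/k. Function names are illustrative.

```python
import math


def extinction_probabilities(f, n_types, tol=1e-12, max_iter=100_000):
    """Smallest fixed point of gamma = f(gamma) in [0, 1]^n_types.

    gamma_k = f(gamma_{k-1}), started from gamma_0 = 0, is the probability of
    extinction by generation k; it increases to the extinction probabilities.
    f maps a list z to the list (f_0(z), ..., f_R(z)).
    """
    gamma = [0.0] * n_types
    for _ in range(max_iter):
        new = f(gamma)
        if max(abs(a - b) for a, b in zip(new, gamma)) < tol:
            return new
        gamma = new
    return gamma


def poisson_pgf(m):
    """Hypothetical single-type PGF f(z) = exp(m (z - 1)), for which mu = m."""
    return lambda z: [math.exp(m * (z[0] - 1.0))]


print(extinction_probabilities(poisson_pgf(2.0), 1))  # super-critical: strictly below 1
print(extinction_probabilities(poisson_pgf(0.8), 1))  # sub-critical: essentially 1
```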

When a multitype branching process is super-critical, one expects, in accordance with the “Malthusian law of growth”, that it grows indefinitely at a geometric rate proportional to \(\mu ^n\), where \(\mu \) is the malthusian parameter; that is, \(\varvec{Z}_n \approx \mu ^n \,\varvec{W}_n\) as \(n \rightarrow \infty \) for some bounded random vector \(\varvec{W}_n\). This heuristic reasoning is formalized by the Kesten–Stigum limit theorem for super-critical multitype branching processes (see Kesten and Stigum 1966a, b, 1967). If \(\varvec{W}_n=\varvec{Z}_n/\mu ^n\), then there exists a nonnegative scalar random variable W, which is not identically zero provided that \(\mathbf {E}(Z_1^i\log Z_1^i\,|\,Z_0^j=1)<\infty \) for all i and j, such that, with probability one,

$$\begin{aligned} \lim _{n\rightarrow \infty } \varvec{W}_n = W \,\varvec{u} \end{aligned}$$
(4)

where \(\varvec{u}\) is the right eigenvector corresponding to the malthusian parameter \(\mu \) and

$$\begin{aligned} \mathbf {E}(W|\varvec{Z}_0)=\varvec{v}^{\mathrm {t}} \varvec{Z}_0 \end{aligned}$$
(5)

where \(\varvec{v}\) is the left eigenvector corresponding to the malthusian parameter \(\mu \). The vectors \(\varvec{u}\) and \(\varvec{v}\) may be normalized so that \(\varvec{v}^{\mathrm {t}}\varvec{u}=1\) and \(\varvec{1}^{\mathrm {t}}\varvec{u}=1\), where \({}^{\mathrm {t}}\) denotes the transpose of a vector. Moreover, since \(\varvec{M}\) is nonnegative [as is the case for the phenotypic model (18)], the right and left eigenvectors corresponding to the malthusian parameter can be chosen nonnegative.

The normalization of the right eigenvector \(\varvec{u}=(u_0,\ldots ,u_R)\) implies that \(\sum _r u_r=1\), and therefore one has the “law of convergence of types” (see Kurtz et al. 1994)

$$\begin{aligned} \lim _{n\rightarrow \infty } \dfrac{\varvec{Z}_n}{|\varvec{Z}_n|} = \varvec{u} \,, \end{aligned}$$
(6)

where \(|\varvec{Z}_n|=\sum _r Z_n^r\) is the total population at the nth generation and the limit holds almost surely on the event that the population survives. Equation (6) asserts that, on survival, the proportion of particles in replicative class r converges almost surely to the constant value \(u_r\).

In particular, Eq. (6) implies that the malthusian parameter is the asymptotic relative growth rate of the population

$$\begin{aligned} \mu = \lim _{n\rightarrow \infty } \dfrac{|\varvec{Z}_{n}|}{|\varvec{Z}_{n-1}|} = \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \,\sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,] \end{aligned}$$
(7)

since \(|\varvec{Z}_{n-1}|\) is the number of “parental particles” of the nth generation and \(|\varvec{Z}_{n}|\) is the sum of the “progeny sizes” \(\#[\,j\,]\) of these parental particles j.
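Equations (6) and (7) can be illustrated by Monte Carlo simulation. The following sketch is a hypothetical helper, not the ENVELOPE code. It simulates a hypothetical two-class process in which a class-0 parent leaves either one child of each class or a single class-0 child (probability 1/2 each), and a class-1 parent leaves two class-0 children. Every parent leaves at least one child, so the population never dies out. For this process \(\mu =(1+\sqrt{5})/2\approx 1.618\) and \(u_0=2\mu /(2\mu +1)\approx 0.764\), so after about 20 generations the observed class proportions and growth ratio should lie close to these values.

```python
import random


def simulate(offspring, z0, generations, seed=0):
    """Simulate a multitype Galton-Watson process; returns [Z_0, Z_1, ..., Z_n].

    offspring[j] maps multi-index tuples to probabilities. Each class-j parent
    independently draws the multi-index of its children from offspring[j].
    """
    rng = random.Random(seed)
    n = len(offspring)
    outcomes = [list(dist.keys()) for dist in offspring]
    weights = [list(dist.values()) for dist in offspring]
    z = list(z0)
    history = [z]
    for _ in range(generations):
        nxt = [0] * n
        for j in range(n):
            if z[j] == 0:
                continue
            # one independent draw per class-j parent
            for idx in rng.choices(outcomes[j], weights[j], k=z[j]):
                for i in range(n):
                    nxt[i] += idx[i]
        z = nxt
        history.append(z)
    return history


EXAMPLE_OFFSPRING = [{(1, 1): 0.5, (1, 0): 0.5}, {(2, 0): 1.0}]  # hypothetical
hist = simulate(EXAMPLE_OFFSPRING, [1, 0], generations=22, seed=1)
last, prev = hist[-1], hist[-2]
proportions = [x / sum(last) for x in last]   # Eq. (6): should be close to u
growth = sum(last) / sum(prev)                # Eq. (7): should be close to mu
```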

Now consider the quantitative random variable \(\rho \) defined on the set of classes \(\{0,\ldots ,R\}\) and having probability distribution \((u_0,\ldots ,u_R)\), called the asymptotic distribution of classes. When the classes are indexed by their expectation values (the mean progeny size of a particle in that class), the variable \(\rho \) assigns to a randomly chosen particle its class:

$$\begin{aligned} \mathbf {P}(\rho =r)=u_r \,. \end{aligned}$$

Therefore, one can define the average reproduction rate of the population as

$$\begin{aligned} \langle \rho \rangle = \sum _{r=0}^R r \, u_r \,. \end{aligned}$$
(8)

Using Eqs. (4), (5), (6) one can show that the average reproduction rate is equal to the malthusian parameter:

$$\begin{aligned} \langle \rho \rangle = \mu \,. \end{aligned}$$
(9)

The average population size at the nth generation is \(|\langle \varvec{Z}_n \rangle | = \sum _{r=0}^R \langle Z^r_n \rangle \). Then for \(n\rightarrow \infty \), Eq. (4) gives \(|\langle \varvec{Z}_n \rangle | \approx \mu ^n |\langle \varvec{W}_n \rangle | \approx \mu ^n \langle W \rangle \) and so

$$\begin{aligned} \mu = \lim _{n\rightarrow \infty }\dfrac{|\langle \varvec{Z}_{n} \rangle |}{|\langle \varvec{Z}_{n-1} \rangle |} \end{aligned}$$
(10)

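On the other hand, \(\langle \varvec{Z}_n \rangle =\varvec{M}\,\langle \varvec{Z}_{n-1} \rangle \) by the definition of the mean matrix, and by its form (18) the rth column of \(\varvec{M}\) sums to \(r(b+c+d)=r\); hence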

$$\begin{aligned} |\langle \varvec{Z}_n \rangle | = |\varvec{M}\,\langle \varvec{Z}_{n-1} \rangle | =\sum _{r=0}^R r\,\langle Z^r_{n-1} \rangle \,. \end{aligned}$$

Now dividing by \(|\langle \varvec{Z}_{n-1} \rangle |\) and taking the limit \(n\rightarrow \infty \) gives

$$\begin{aligned} \mu = \lim _{n\rightarrow \infty }\dfrac{|\langle \varvec{Z}_{n} \rangle |}{|\langle \varvec{Z}_{n-1} \rangle |} = \lim _{n\rightarrow \infty }\sum _{r=0}^R r \,\dfrac{\langle Z^r_{n-1}\rangle }{|\langle \varvec{Z}_{n-1} \rangle |} = \sum _{r=0}^R r \, u_r = \langle \rho \rangle \end{aligned}$$

where we used Eqs. (5) and (6) in the third equality from left to right.
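A quick numerical check of Eq. (9) for the phenotypic model, assuming the hypothetical parameters \(R=2\), \(d=0.2\), \(c=0.5\), \(b=0.3\) (so that \(b+c+d=1\)). The matrix is written out from Eq. (18); its columns sum to 0, 1 and 2, and power iteration (an assumed numerical method) should give \(\mu =1.7\) with \(\varvec{u}=(2/70,17/70,51/70)\), so that \(\sum _r r\,u_r=119/70=1.7\).

```python
# Mean matrix (18) for R = 2 with hypothetical values d = 0.2, c = 0.5, b = 0.3.
# Column j lists the expected offspring numbers, by class, of one class-j parent.
M = [
    [0.0, 0.2, 0.0],  # row 0: d (from class 1)
    [0.0, 0.5, 0.4],  # row 1: c (from class 1), 2d (from class 2)
    [0.0, 0.3, 1.6],  # row 2: b (from class 1), 2(c + b) (from class 2)
]

# Column r sums to r(b + c + d) = r, which is why |M x| = sum_r r * x_r.
col_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]


def perron_root(M, tol=1e-12, max_iter=10_000):
    """Power iteration; returns (mu, u) with sum(u) == 1."""
    n = len(M)
    x = [1.0 / n] * n
    mu = 0.0
    for _ in range(max_iter):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        mu = sum(y)
        y = [v / mu for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return mu, y
        x = y
    return mu, x


mu, u = perron_root(M)
mean_rho = sum(r * ur for r, ur in enumerate(u))   # <rho> from Eq. (8)
print(mu, mean_rho)                                # both close to 1.7
```

Because every column sum equals its class index, `sum(M x)` already equals \(\sum _r r\,x_r\) at each step, which is the discrete counterpart of the argument above.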

In analogy with the characterization of the malthusian parameter as given by Eq. (7), one may define the asymptotic populational variance

$$\begin{aligned} \sigma ^2 = \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \, \sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,]^2 - \mu ^2 \end{aligned}$$
(11)

and in analogy with the mean reproduction rate, one may define the (squared) phenotypic diversity as

$$\begin{aligned} \sigma _{\rho }^2 = \langle \rho ^2 \rangle - \langle \rho \rangle ^2 \end{aligned}$$
(12)

By decomposing the sum in Eq. (11) according to the classes r, one obtains

$$\begin{aligned} \sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,]^2 = \sum _{r=0}^R \sum _{j_r=1}^{Z_{n-1}^r} \#[\,j_r\,]^2 \end{aligned}$$

where \(j_r\) runs over the particles of class r for \(r=0,\ldots ,R\), and \(\#[\,j_r\,]\) are independent random variables taking nonnegative integer values with probability distribution \(t_r\), called the fitness distribution of class r.

Denoting the variance of the fitness distribution \(t_r\) by \(\sigma ^2_{r}\), one may write the limit in Eq. (11) as

$$\begin{aligned} \sigma ^2= & {} \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \, \sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,]^2-\mu ^2 \\= & {} \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \,\sum _{r=0}^R \left[ Z^r_{n-1} \left( \dfrac{1}{Z^r_{n-1}} \sum _{j_r=1}^{Z_{n-1}^r}\#[\,j_r\,]^2 - r^2 \right) +r^2\,Z^r_{n-1}\right] -\mu ^2 \\= & {} \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \,\sum _{r=0}^R (\sigma ^2_{r}+r^2)Z^r_{n-1}-\mu ^2 \end{aligned}$$

Then Eqs. (6), (9) and (12) give

$$\begin{aligned} \sigma ^2 = \sum _{r=0}^R (\sigma ^2_{r}+r^2 )\,u_r-\mu ^2 = \sum _{r=0}^R \sigma ^2_{r}\,u_r+\sigma ^2_{\rho } \end{aligned}$$
(13)

The difference between the asymptotic populational variance and the (squared) phenotypic diversity, called the normalized populational variance, is the weighted average of the variances of the fitness distributions

$$\begin{aligned} \phi = \sigma ^2 - \sigma ^2_{\rho } = \sum _{r=0}^R \sigma ^2_{r}\,u_r \,. \end{aligned}$$
(14)

In particular, when the family of fitness distributions is the deterministic family, the populational variance is exactly the phenotypic diversity (that is, \(\phi =0\)). This is expected, since the Delta distributions \(t_r(k)=\delta _{rk}\) have zero variance; hence the only source of fluctuation of the population size is its stratification into replicative classes, which is expressed by the phenotypic diversity.
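The decomposition in Eqs. (13) and (14) is a short computation once \(\varvec{u}\) is known. The sketch below takes \(\varvec{u}=(2/70,17/70,51/70)\), the asymptotic distribution of classes of the mean matrix (18) with the hypothetical values \(R=2\), \(d=0.2\), \(c=0.5\), \(b=0.3\). It compares the Delta family (\(\sigma ^2_r=0\)) with the Poisson family (\(\sigma ^2_r=r\)). Since the mean matrix depends on \(t_r\) only through the means r, \(\varvec{u}\) is the same for both families; for the Poisson family Eq. (14) gives \(\phi =\sum _r r\,u_r=\mu \). The function name is illustrative.

```python
def variance_decomposition(u, fitness_variances):
    """Return (mu, sigma2_rho, phi, sigma2) following Eqs. (8)-(9), (12), (14) and (13).

    u: asymptotic distribution of classes (u_0, ..., u_R), summing to 1.
    fitness_variances: sigma_r^2, the variance of the fitness distribution t_r.
    """
    mu = sum(r * ur for r, ur in enumerate(u))                        # <rho> = mu
    sigma2_rho = sum(r * r * ur for r, ur in enumerate(u)) - mu ** 2  # Eq. (12)
    phi = sum(s2 * ur for s2, ur in zip(fitness_variances, u))        # Eq. (14)
    return mu, sigma2_rho, phi, sigma2_rho + phi                      # Eq. (13)


# u for R = 2, d = 0.2, c = 0.5, b = 0.3 (hypothetical parameter values)
u = [2 / 70, 17 / 70, 51 / 70]
delta = variance_decomposition(u, [0.0, 0.0, 0.0])    # Delta family: sigma_r^2 = 0
poisson = variance_decomposition(u, [0.0, 1.0, 2.0])  # Poisson family: sigma_r^2 = r
print(delta)
print(poisson)
```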

B Mathematical Basis of the Phenotypic Model

Based on the general aspects of viral replication described before, it is natural to model it as a branching process. At each replicative cycle, every parental particle in replicative class r produces a random number of progeny particles, drawn independently from the corresponding fitness distribution.

A fitness distribution is a member of a location-scale family of discrete probability distributions \(t_r\), parameterized by the replicative classes (\(r=0,\ldots ,R\)), taking nonnegative integer values and normalized so that the expectation value of \(t_r\), defined as \(\sum _{k} k\,t_r(k)\), is exactly r, with \(t_0(k)=\delta _{k0}\). Here \(\delta _{kr}=1\) if \(k=r\) and \(\delta _{kr}=0\) if \(k \ne r\). Therefore, each particle in the viral population is characterized by the mean value of its fitness distribution, called its mean replicative capability. Viral particles with replicative capability equal to zero (0) do not generate progeny; viral particles with replicative capability one (1) generate one particle on average; viral particles with replicative capability two (2) generate two particles on average, and so on. Typical examples of location-scale families of discrete probability distributions that can be used as fitness distributions are:

  (a) The family of deterministic (Delta) distributions: \(t_r(k)=\delta _{kr}\).

  (b) The family of Poisson distributions: \(t_r(k)=\mathrm {e}^{-r}\tfrac{r^k}{k!}\).

Note that in the first example the replicative capability is completely concentrated on the mean value r, that is, the particles have deterministic fitness. In the second example, by contrast, the fitness is truly stochastic.
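Both families can be sampled in a few lines. This sketch uses Knuth's multiplication-of-uniforms method for Poisson variates, a standard method for small means (Python's standard `random` module has no Poisson sampler); the function names are hypothetical and this is not necessarily how ENVELOPE draws progeny sizes.

```python
import math
import random


def sample_delta(r, rng):
    """Deterministic (Delta) fitness t_r(k) = delta_{kr}: always r progeny."""
    return r


def sample_poisson(r, rng):
    """Poisson fitness t_r(k) = e^{-r} r^k / k!, by Knuth's multiplication method.

    Suitable for the small means used for replicative classes; the cost grows
    linearly with r, and exp(-r) underflows for very large r.
    """
    threshold = math.exp(-r)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(2024)
draws = [sample_poisson(3, rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)                          # close to r = 3
var = sum((k - mean) ** 2 for k in draws) / len(draws)  # also close to r = 3
```

For \(r=0\) the threshold is 1, so the Poisson sampler always returns 0, consistent with \(t_0(k)=\delta _{k0}\).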

During replication, each progeny particle undergoes exactly one of the following effects:

Deleterious effect:

the mean replicative capability of the respective progeny particle decreases by one. Note that a particle with replicative capability equal to 0 does not produce any progeny at all.

Beneficial effect:

the mean replicative capability of the respective progeny particle increases by one. If the mean replicative capability of the parental particle is already the maximum allowed (R), then the progeny particle keeps the same mean replicative capability as the parental particle.

Neutral effect:

the mean replicative capability of the respective progeny particle remains the same as that of the parental particle.

To determine which effect occurs during a replication event, the probabilities d, b and c are associated with the occurrence of deleterious, beneficial and neutral effects, respectively. The only constraints these numbers must satisfy are \(0\leqslant d,b,c\leqslant 1\) and \(b+c+d=1\). In the case of in vitro experiments with homogeneous cell populations, the probabilities c, d and b essentially reflect the occurrence of mutations.
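The replicative cycle described above can be sketched as follows. This is a hypothetical, illustrative implementation of the stated rules, not the ENVELOPE code: each class-r parent draws its progeny size from \(t_r\), and each progeny particle independently moves to class \(r-1\) (probability d), stays at r (probability \(c=1-d-b\)), or moves to \(r+1\) capped at R (probability b). The neutral probability is not passed explicitly, which avoids rounding problems when \(d+c\) is not exactly \(1-b\) in floating point.

```python
import random


def replicative_cycle(counts, d, b, sample_fitness, rng):
    """One replicative cycle of the phenotypic model.

    counts[r]: number of particles in replicative class r, r = 0..R.
    sample_fitness(r, rng): draws a progeny size from t_r.
    Each child suffers a deleterious (prob. d), neutral (prob. c = 1 - d - b)
    or beneficial (prob. b) effect, independently of the others.
    """
    R = len(counts) - 1
    nxt = [0] * (R + 1)
    for r in range(1, R + 1):                  # class 0 leaves no progeny
        for _ in range(counts[r]):
            for _ in range(sample_fitness(r, rng)):
                x = rng.random()
                if x < d:
                    nxt[r - 1] += 1            # deleterious: r -> r - 1
                elif x < 1.0 - b:
                    nxt[r] += 1                # neutral: stays at r
                else:
                    nxt[min(r + 1, R)] += 1    # beneficial, capped at R
    return nxt


def delta_fitness(r, rng):
    return r                                   # deterministic t_r(k) = delta_{kr}


rng = random.Random(7)
pop = [0, 0, 0, 10, 0]                         # R = 4, ten particles in class 3
for _ in range(5):
    pop = replicative_cycle(pop, d=0.3, b=0.1, sample_fitness=delta_fitness, rng=rng)
print(pop)
```

With deterministic fitness the total progeny in one cycle is exactly \(\sum _r r\,Z^r\); only its split among classes is random.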

The probability generating function (PGF) of the phenotypic model with \(b=0\) and \(t_r(k)=\delta _{kr}\) is (see Antoneli et al. 2013a, b for details):

$$\begin{aligned} f_0(z_0,z_1,\ldots ,z_R)&= 1 \nonumber \\ f_1(z_0,z_1,\ldots ,z_R)&= dz_0+cz_1 \nonumber \\ f_2(z_0,z_1,\ldots ,z_R)&= (dz_1+cz_2)^2 \nonumber \\&\vdots \nonumber \\ f_R(z_0,z_1,\ldots ,z_R)&= (dz_{R-1}+cz_R)^R \end{aligned}$$
(15)

Note that the functions \(f_r(z_0,z_1,\ldots ,z_R)\) are polynomials whose coefficients are exactly the probabilities of the binomial distribution \(\mathrm {binom}(k;r,1-d)\), since \(c=1-d\) when \(b=0\). The PGF in the case with general beneficial effects and a general family of fitness distributions (which reduces to the previous PGF when \(b=0\) and \(t_r(k)=\delta _{kr}\)) is given by

$$\begin{aligned} f_0(z_0,z_1,\ldots ,z_R)&= 1 \nonumber \\ f_1(z_0,z_1,\ldots ,z_R)&= \sum _{k=0}^\infty \,t_1(k)\, (dz_0+cz_1+bz_2)^k \nonumber \\ f_2(z_0,z_1,\ldots ,z_R)&= \sum _{k=0}^\infty \,t_2(k)\, (dz_1+cz_2+bz_3)^k \nonumber \\&\vdots \nonumber \\ f_R(z_0,z_1,\ldots ,z_R)&= \sum _{k=0}^\infty \,t_R(k)\, (dz_{R-1}+(c+b)z_R)^k \end{aligned}$$
(16)

Note that in the last equation the beneficial effect acts like the neutral effect. This is a kind of “consistency condition” ensuring that the populational replicative capability is, on average, bounded above by R. Even though a parental particle in replicative class R may have more than R progeny particles when \(t_R\) is not deterministic, its average progeny size is always R.
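Equation (16) can be evaluated numerically and iterated from zero to obtain the extinction probabilities as the smallest fixed point of \(\varvec{\gamma }=\varvec{f}(\varvec{\gamma })\). The sketch assumes fitness distributions given as truncated probability tables `t[r][k]` (a hypothetical representation; truncating the Poisson series at \(k=40\) is an approximation that is negligible for small r) and the hypothetical parameters \(R=2\), \(d=0.2\), \(c=0.5\), \(b=0.3\), for which the mean matrix (18) has \(\mu =1.7>1\). Function names are illustrative.

```python
import math


def poisson_table(R, kmax):
    """t[r][k] = e^{-r} r^k / k! for k <= kmax, via the recurrence t_k = t_{k-1} r / k."""
    table = []
    for r in range(R + 1):
        row = [math.exp(-r)]
        for k in range(1, kmax + 1):
            row.append(row[-1] * r / k)
        table.append(row)
    return table


def delta_table(R, kmax):
    """t[r][k] = 1 if k == r else 0 (deterministic fitness)."""
    return [[1.0 if k == r else 0.0 for k in range(kmax + 1)] for r in range(R + 1)]


def phenotypic_pgf(z, d, c, b, t):
    """Eq. (16), with the sum over k truncated at len(t[r]) - 1."""
    R = len(z) - 1
    f = [1.0]                                        # f_0 = 1: no progeny
    for r in range(1, R + 1):
        if r < R:
            s = d * z[r - 1] + c * z[r] + b * z[r + 1]
        else:
            s = d * z[R - 1] + (c + b) * z[R]        # beneficial acts as neutral
        f.append(sum(tk * s ** k for k, tk in enumerate(t[r])))
    return f


def extinction(d, c, b, t, tol=1e-12, max_iter=100_000):
    """Iterate gamma <- f(gamma) from gamma = 0 (smallest fixed point)."""
    g = [0.0] * len(t)
    for _ in range(max_iter):
        new = phenotypic_pgf(g, d, c, b, t)
        if max(abs(x - y) for x, y in zip(new, g)) < tol:
            return new
        g = new
    return g


gamma_delta = extinction(0.2, 0.5, 0.3, delta_table(2, 40))
gamma_poisson = extinction(0.2, 0.5, 0.3, poisson_table(2, 40))
print(gamma_delta, gamma_poisson)
```

Setting \(b=0\) and using `delta_table` makes `phenotypic_pgf` coincide with Eq. (15).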

Finally, it is easy to see that the PGF of the two-dimensional case (\(R=1\)) of the phenotypic model with \(b=0\) and \(z_0=1\) (ignoring \(f_0\)) reduces to

$$\begin{aligned} f(z)~=~\sum _{k=0}^\infty \,t(k)\, ((1-c)+cz)^k~=~\sum _{k=0}^\infty \,t(k)\, (1-c(1-z))^k \,. \end{aligned}$$
(17)

This is formally identical to the PGF of the single-type model proposed by [Demetrius et al. 1985, p. 255, eq. (49)] for the evolution of polynucleotides. In their formulation, \(c=p^\nu \) is the probability that a given copy of a polynucleotide is exact, where the polymer has a chain length of \(\nu \) nucleotides and p is the probability of copying a single nucleotide correctly. The replication distribution \(t(k)\) gives the number of copies a polynucleotide yields before it is degraded by hydrolysis.
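As a worked illustration of (17), the sketch below computes the extinction probability of the single-type process as the smallest fixed point of f. It obtains that fixed point by iterating \(q_n=f(q_{n-1})\) from \(q_0=0\). The Poisson choice for \(t(k)\) with mean \(\lambda \), under which (17) becomes \(f(z)=\mathrm {e}^{-\lambda c(1-z)}\), is our assumption and is not part of Demetrius et al. (1985).

```cpp
#include <cassert>
#include <cmath>

// Extinction probability for the single-type PGF (17),
//   f(z) = sum_k t(k) (1 - c(1 - z))^k,  with c = p^nu.
// Hypothetical choice: t = Poisson(lambda), so that f(z) = exp(-lambda c (1 - z)).
// Iterating q_n = f(q_{n-1}) from q_0 = 0 gives P(extinct by generation n),
// which increases to the smallest fixed point of f on [0, 1].
double extinction_probability(double p, int nu, double lambda) {
    assert(p > 0.0 && p <= 1.0 && nu >= 1 && lambda >= 0.0);
    const double c = std::pow(p, nu);
    double q = 0.0;
    for (int n = 0; n < 1000000; ++n) {
        const double next = std::exp(-lambda * c * (1.0 - q));
        if (std::fabs(next - q) < 1e-15) return next;
        q = next;
    }
    return q;  // slow convergence only in the critical case lambda * c = 1
}
```

With error-free copying (\(p=1\)) and \(\lambda =2\), the iteration converges to \(q\approx 0.2032\), the classical extinction probability of a Poisson(2) Galton–Watson process. Whenever \(\lambda c\leqslant 1\), it converges to 1.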

A remarkable property of the phenotypic model that was fully explored in Antoneli et al. (2013a, b) is the fact that when \(b=0\) the phenotypic model is “exactly solvable” in a very specific sense.

It follows directly from the generating function (16), using formula (3), that the mean matrix of the phenotypic model is given by

$$\begin{aligned} \varvec{M}=\begin{pmatrix} 0 & d & 0 & 0 & 0 & \cdots & 0 \\ 0 & c & 2d & 0 & 0 & \cdots & 0 \\ 0 & b & 2c & 3d & 0 & \cdots & 0 \\ 0 & 0 & 2b & 3c & 4d & \cdots & 0 \\ 0 & 0 & 0 & 3b & 4c & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & Rd \\ 0 & 0 & 0 & 0 & 0 & (R-1)b & R(c+b) \end{pmatrix} \,. \end{aligned}$$
(18)

Note that the mean matrix depends on the fitness distributions \(t_r\) only through their mean values. Since each \(t_r\) is normalized to have mean value r, \(\varvec{M}\) is the same for every family of fitness distributions.

Assume for a moment that \(b=0\) (hence \(c=1-d\)). Then the mean matrix is upper triangular, so its eigenvalues are the diagonal entries \(\lambda _r=r(1-d)\), and the malthusian parameter \(\mu \) is the largest of them, \(\lambda _R\):

$$\begin{aligned} \mu =R(1-d) \,. \end{aligned}$$
(19)
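The following C++ sketch builds the mean matrix (18) column by column and estimates its dominant eigenvalue by power iteration on the simplex. It can be used to check (19) numerically when \(b=0\). The names `mean_matrix` and `malthusian` are illustrative; this is not the ENVELOPE implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Mean matrix (18), indexed M[row][column]: column r (the parental class) holds
// r*d in row r-1, r*c in row r and r*b in row r+1; in column R the beneficial
// term is merged into the diagonal, giving R(c+b).
Matrix mean_matrix(int R, double d, double b) {
    assert(R >= 1 && d >= 0.0 && b >= 0.0 && d + b <= 1.0);
    const double c = 1.0 - d - b;
    Matrix M(R + 1, std::vector<double>(R + 1, 0.0));
    for (int r = 1; r <= R; ++r) {
        M[r - 1][r] += r * d;
        M[r][r] += r * c;
        M[r < R ? r + 1 : R][r] += r * b;
    }
    return M;
}

// Malthusian parameter (dominant eigenvalue) by power iteration on the simplex:
// for x with entries summing to 1, the sum of the entries of Mx converges to mu.
double malthusian(const Matrix& M, int iters = 20000) {
    const std::size_t n = M.size();
    std::vector<double> x(n, 1.0 / n), y(n);
    double mu = 0.0;
    for (int it = 0; it < iters; ++it) {
        mu = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            y[i] = 0.0;
            for (std::size_t j = 0; j < n; ++j) y[i] += M[i][j] * x[j];
            mu += y[i];
        }
        if (mu <= 0.0) return 0.0;  // nilpotent case, e.g. d = 1
        for (std::size_t i = 0; i < n; ++i) x[i] = y[i] / mu;
    }
    return mu;
}
```

For instance, `malthusian(mean_matrix(4, 0.3, 0.0))` should approach \(4(1-0.3)=2.8\). Every column r of (18) sums to \(r(d+c+b)=r\).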

Now suppose that \(b \ne 0\) is small compared to d and c (hence \(c=1-d-b\)). Then spectral perturbation theory allows one to write the malthusian parameter \(\mu \) as a power series

$$\begin{aligned} \mu = \mu _0 + \mu _1 b + \mu _2 b^2 + \cdots \end{aligned}$$

where \(\mu _0\) is the malthusian parameter for the case \(b=0\) and \(\mu _j\) are functions of the form \(R\,\tilde{m}_j(d,R)\). A lengthy calculation (see Antoneli et al. 2013b) gives the following result:

$$\begin{aligned} \mu = R \left( (1-d) + (R-1)\dfrac{d}{1-d}\,b + \varvec{O}(b^2)\right) \,. \end{aligned}$$
(20)
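For \(R=2\) the expansion (20) can be checked against a closed form. Because column 0 of (18) is zero, \(\mu \) is the largest eigenvalue of the \(2\times 2\) block on classes \(\{1,2\}\). The sketch below evaluates that eigenvalue from the block's trace and determinant. The difference from the first-order formula should then shrink like \(b^2\). The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// For R = 2, mu is the largest eigenvalue of the block of (18) on classes {1, 2}:
//   [ c   2d     ]
//   [ b   2(1-d) ]   with c = 1 - d - b (note c + b = 1 - d).
double mu_exact_R2(double d, double b) {
    assert(d >= 0.0 && b >= 0.0 && d + b <= 1.0);
    const double c = 1.0 - d - b;
    const double trace = c + 2.0 * (1.0 - d);
    const double det = c * 2.0 * (1.0 - d) - 2.0 * d * b;
    // The discriminant equals (c - 2(1-d))^2 + 8db >= 0, so the root is real.
    return 0.5 * (trace + std::sqrt(trace * trace - 4.0 * det));
}

// First-order approximation (20): mu ~ R((1-d) + (R-1) d b / (1-d)).
double mu_first_order(int R, double d, double b) {
    assert(d < 1.0);
    return R * ((1.0 - d) + (R - 1) * d * b / (1.0 - d));
}
```

At \(d=0.3\), the gap between the two functions is about \(2\times 10^{-8}\) for \(b=10^{-4}\) and about \(2\times 10^{-4}\) for \(b=10^{-2}\), consistent with an \(\varvec{O}(b^2)\) remainder.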

Let us return to the case \(b=0\) and consider the eigenvectors corresponding to the malthusian parameter \(\mu \). The right eigenvector \(\varvec{u}=(u_0,\ldots ,u_R)\) and the left eigenvector \(\varvec{v}=(v_0,\ldots ,v_R)\) may be normalized so that \(\varvec{v}^{\mathrm {t}}\varvec{u}=1\) and \(\varvec{1}^{\mathrm {t}}\varvec{u}=1\), where \({}^{\mathrm {t}}\) denotes the transpose of a vector. In Antoneli et al. (2013b), it is shown that the normalized right eigenvector \(\varvec{u}=(u_0,\ldots ,u_R)\) is given by

$$\begin{aligned} u_r=\left( {\begin{array}{c}R\\ r\end{array}}\right) \, (1-d)^r \, d^{R-r} \,. \end{aligned}$$
(21)

The fact that \(\varvec{u}\) is a binomial distribution is not accidental. Indeed, it can be shown that \(\varvec{u}\) is the probability distribution of a quantitative random variable \(\rho \) defined on the set of replicative classes \(\{0,\ldots ,R\}\), called the asymptotic distribution of classes, such that \(u_r=\mathrm {binom}(r;R,1-d)\) gives the limiting proportion of particles in the rth replicative class. Finally, when \(b \ne 0\) is small, spectral perturbation theory ensures that

$$\begin{aligned} u_r=\left( {\begin{array}{c}R\\ r\end{array}}\right) \, (1-d)^r \, d^{R-r} + \varvec{O}(b) \,. \end{aligned}$$
(22)
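The following provides a quick numerical sanity check of (21). With \(b=0\), multiplying the binomial vector by the mean matrix (18) should return \(R(1-d)\) times the same vector. The sketch computes the residual row by row from the nonzero entries of (18). It checks only the eigen-equation at \(b=0\), not the perturbative case (22).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Residual max_r |(M u)_r - R(1-d) u_r| for the binomial vector (21),
// with M the mean matrix (18) at b = 0 (so c = 1 - d).
double binomial_eigen_residual(int R, double d) {
    assert(R >= 1 && d >= 0.0 && d <= 1.0);
    std::vector<double> u(R + 1);
    for (int r = 0; r <= R; ++r) {
        // u_r = C(R, r) (1-d)^r d^(R-r); the coefficient is computed via lgamma
        const double logC = std::lgamma(R + 1.0) - std::lgamma(r + 1.0)
                          - std::lgamma(R - r + 1.0);
        u[r] = std::exp(logC) * std::pow(1.0 - d, r) * std::pow(d, R - r);
    }
    const double mu = R * (1.0 - d);
    double worst = 0.0;
    for (int i = 0; i <= R; ++i) {
        // Row i of (18) with b = 0: i(1-d) u_i + (i+1) d u_{i+1}
        double Mu = i * (1.0 - d) * u[i];
        if (i < R) Mu += (i + 1) * d * u[i + 1];
        worst = std::max(worst, std::fabs(Mu - mu * u[i]));
    }
    return worst;
}
```

Algebraically, the check reduces to the identity \((r+1)\binom{R}{r+1}+r\binom{R}{r}=R\binom{R}{r}\).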

The phenotypic model is completely specified by the choice of the two probabilities b and d (since \(c=1-b-d\)), the maximum replicative capability R and a location-scale family of fitness distributions. Independently of the choice of that family, the parameter space of the model is the set \(\triangle ^2 \times \{R\in \mathbb {N}:R \geqslant 1\}\), where \(\triangle ^2=\{(b,d)\in [0,1]^2: b+d \leqslant 1\}\) is the two-dimensional simplex (see Fig. 10).

Fig. 10

Parameter space of the phenotypic model. The blue line is the boundary \(b+d=1\). The red, green and magenta curves are the critical curves \(\mu (b,d,R)=1\) for \(R=2,3,4\), respectively

In this parameter space, one can consider the critical curves \(\mu (b,d,R)=1\), where \(\mu (b,d,R)\) is the malthusian parameter as a function of the parameters of the phenotypic model. For each fixed R, the corresponding critical curve is independent of the fitness distributions and represents the parameter values \((b,d)\) for which the branching process is critical. Moreover, each curve splits the simplex into two regions: the parameter values where the branching process is super-critical (above the curve) and those where it is sub-critical (below the curve).
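The critical curves of Fig. 10 can be traced numerically. The sketch below evaluates \(\mu (b,d,R)\) by power iteration on (18). For fixed b and \(R\geqslant 2\), it then bisects in d on \([0,1-b]\). The method assumes that \(\mu \) decreases in d along this segment. This is immediate for \(b=0\), where \(\mu =R(1-d)\), but is not proved here in general. The names `mu_bdR` and `critical_d` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// mu(b, d, R): dominant eigenvalue of the mean matrix (18), obtained by power
// iteration on the simplex; M is applied column by column without storing it.
double mu_bdR(double b, double d, int R, int iters = 5000) {
    assert(R >= 1 && b >= 0.0 && d >= 0.0 && b + d <= 1.0 + 1e-12);
    const double c = std::max(0.0, 1.0 - d - b);
    std::vector<double> x(R + 1, 1.0 / (R + 1)), y(R + 1);
    double mu = 0.0;
    for (int it = 0; it < iters; ++it) {
        std::fill(y.begin(), y.end(), 0.0);
        for (int r = 1; r <= R; ++r) {  // class 0 produces no progeny
            y[r - 1] += r * d * x[r];
            y[r] += r * c * x[r];
            y[std::min(r + 1, R)] += r * b * x[r];
        }
        mu = 0.0;
        for (double v : y) mu += v;
        if (mu <= 0.0) return 0.0;  // nilpotent case (b = 0, d = 1)
        for (int r = 0; r <= R; ++r) x[r] = y[r] / mu;
    }
    return mu;
}

// Point d*(b) on the critical curve mu(b, d, R) = 1, by bisection in d on
// [0, 1-b]. Assumes mu decreases in d there; at d = 0, mu = R > 1.
// Returns NaN when mu(b, 1-b, R) >= 1, i.e. the curve does not cross this segment.
double critical_d(double b, int R) {
    assert(R >= 2 && b >= 0.0 && b < 1.0);
    double lo = 0.0, hi = 1.0 - b;
    if (mu_bdR(b, hi, R) >= 1.0) return std::nan("");
    for (int k = 0; k < 60; ++k) {
        const double mid = 0.5 * (lo + hi);
        if (mu_bdR(b, mid, R) > 1.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

At \(b=0\) the routine should return \(1-1/R\). For \(R=2\), the curve also follows in closed form from the \(2\times 2\) block of (18) on classes \(\{1,2\}\): \(d^*(b)=(1+\sqrt{1+8b})/4\). This closed form is a convenient test value.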

One of the main results of Antoneli et al. (2013b) is a proof of the lethal mutagenesis criterion (Bull et al. 2007) for the phenotypic model, provided that all fitness effects are assumed to be purely mutational. Recall that Bull et al. (2007) assume that all mutations are either neutral or deleterious and consider the mutation rate \(U=U_d+U_c\). Here the component \(U_c\) comprises the purely neutral mutations, and the component \(U_d\) comprises the mutations with a deleterious fitness effect. Furthermore, \(R_{\mathrm {max}}\) denotes the maximum replicative capability among all particles in the viral population. The lethal mutagenesis criterion proposed by Bull et al. (2007) states that a sufficient condition for extinction is

$$\begin{aligned} R_{\mathrm {max}}\,\mathrm {e}^{-U_d} < 1 \,. \end{aligned}$$
(23)

According to Bull et al. (2007, 2008), \(\mathrm {e}^{-U_d}\) is both the mean fitness level and the fraction of offspring with no non-neutral mutations. Moreover, in the absence of beneficial mutations and epistasis (Kimura and Maruyama 1966), the only non-neutral mutations are deleterious. Therefore, in terms of fitness effects, the probability \(\mathrm {e}^{-U_d}\) corresponds to \(1-d=c\). Since the mean matrix depends only on the expected values of the fitness distributions \(t_r\), it follows that \(R_{\mathrm {max}}\) corresponds to R. That is, the lethal mutagenesis criterion (23) is formally equivalent to the extinction criterion

$$\begin{aligned} R(1-d) < 1 \end{aligned}$$
(24)

which is exactly the condition for the phenotypic model to be sub-critical. Formula (20) for the malthusian parameter yields a generalization of the extinction criterion (24) that does not assume all effects to be either neutral or deleterious: if \(b>0\) is sufficiently small (up to terms of order \(\varvec{O}(b^2)\)) and

$$\begin{aligned} R \left( (1-d) + (R-1)\dfrac{bd}{1-d}\right) < 1 \end{aligned}$$
(25)

then, with probability one, the population becomes extinct in finite time.
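The criteria (23) to (25) can be turned into explicit thresholds. With \(\mathrm {e}^{-U_d}=1-d\), (23) and (24) say that extinction occurs once \(U_d>\ln R\), that is, \(d>1-1/R\). For \(b>0\), the sketch below locates the smallest d satisfying (25) using an upward scan followed by bisection. The scan deliberately stops at the first crossing. Because (25) is only first order in b, its term \(d/(1-d)\) makes it unreliable as d approaches 1. The step size and function names are our choices.

```cpp
#include <cassert>
#include <cmath>

// Left-hand side of (25): R((1-d) + (R-1) b d / (1-d)); for b = 0 this is (24).
double lhs25(int R, double d, double b) {
    return R * ((1.0 - d) + (R - 1) * b * d / (1.0 - d));
}

// Smallest d in (0, 1-b) at which (25) holds: a coarse upward scan finds the
// first crossing, then bisection refines it. Returns NaN if no crossing is found.
double extinction_threshold_d(int R, double b) {
    assert(R >= 2 && b >= 0.0 && b < 1.0);
    const double step = 1e-3;
    for (int i = 1; i * step < 1.0 - b; ++i) {
        const double d = i * step;
        if (lhs25(R, d, b) < 1.0) {
            double lo = d - step, hi = d;  // (25) fails at lo and holds at hi
            for (int k = 0; k < 60; ++k) {
                const double mid = 0.5 * (lo + hi);
                if (lhs25(R, mid, b) < 1.0) hi = mid; else lo = mid;
            }
            return hi;
        }
    }
    return std::nan("");
}

// Lethal mutagenesis (23) with e^{-U_d} = 1 - d: R e^{-U_d} < 1 iff U_d > ln R.
double lethal_Ud_threshold(int R) { return std::log(static_cast<double>(R)); }
```

For example, with \(R=4\) and \(b=0\) the threshold is \(d=0.75\), or equivalently \(U_d=\ln 4\approx 1.386\).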

On the other hand, a deeper exploration of the implications of nonzero beneficial effects allowed for the discovery of a non-extinction criterion. If \(b>0\) is sufficiently small (up to order \(\varvec{O}(b^2)\)), R is sufficiently large (\(R \geqslant 10\) is enough) and

$$\begin{aligned} R^3 \, b > 1 \end{aligned}$$
(26)

then, asymptotically almost surely, the population cannot become extinct by increasing the deleterious probability d toward its maximum value \(1-b\) (see Antoneli et al. 2013b for details). In other words, a small increase in the beneficial probability may have a drastic effect on the extinction probabilities, possibly rendering the population impervious to extinction by lethal mutagenesis (i.e., by an increase in deleterious effects).

In the theory of multitype branching processes there are several variants, such as continuous-time, age-dependent and self-regulated processes (see Athreya and Ney 1972; Harris 1963; Kimmel and Axelrod 2002). A variant of the theory that incorporates the notions of evolutionary entropy and directionality theory (see Dietz 2005; Demetrius 2013) could be useful for studies of viral evolution. In this case, the malthusian parameter \(\mu \), which is the dominant eigenvalue of the mean matrix, could be expressed as the sum of two terms

$$\begin{aligned} \mu = H + \varPhi \,. \end{aligned}$$

The quantity H is called evolutionary entropy and \(\varPhi \) is called the reproductive potential (Demetrius 2013). An interesting direction to follow would be to develop an extinction criterion based on evolutionary entropy instead of the malthusian parameter.

C The Deterministic Selection Equation

Following Demetrius et al. (1985) and Demetrius (1985, 1987), one may associate with a multitype branching process a system of difference (or ordinary differential) equations, called selection equations, on the space of discrete probability distributions \(\triangle ^{R+1}=\{\varvec{p}\in \mathbb {R}^{R+1}:p_j\geqslant 0;\sum _{j}p_j=1\}\) over the finite state set \(\{0,\ldots ,R\}\). Given a discrete multitype branching process \(\varvec{Z}_n\), the expectation values satisfy \(\langle \varvec{Z}_n\rangle =\varvec{M}^n\varvec{Z}_0\), where \(\varvec{M}\) is the mean matrix of \(\varvec{Z}_n\). Hence \(\langle \varvec{Z}_n\rangle \) is obtained by iterating the difference equation \(\varvec{z}_n= \varvec{M}\varvec{z}_{n-1}\). Normalizing this difference equation yields the discrete-time selection equation

$$\begin{aligned} \varvec{x}_n = \dfrac{1}{\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}_{n-1}}\varvec{M}\varvec{x}_{n-1} \end{aligned}$$
(27)

where \(\varvec{1}=(1,\ldots ,1)\). Passing from (27) to continuous time, one obtains the continuous-time selection equation

$$\begin{aligned} \dot{\varvec{x}} = [\varvec{M}\varvec{x}-\varvec{x}(\varvec{1}^\mathrm {t}\varvec{M}\varvec{x})] \dfrac{1}{\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}} \,. \end{aligned}$$
(28)

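Multiplying the right-hand side of Eq. (28) by the factor \(\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}\) corresponds to a change of velocity, that is, a rescaling of time. This factor is strictly positive on \(\triangle ^{R+1}\) away from the vertex concentrated on class 0: for the phenotypic model (18), column r of \(\varvec{M}\) sums to r, so \(\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}=\sum _r r\,x_r\). Hence the trajectories of (28) coincide with those of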

$$\begin{aligned} \dot{\varvec{x}} = \varvec{M}\varvec{x}-\varvec{x}(\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}) \end{aligned}$$
(29)

It follows from general considerations (see Demetrius et al. 1985; Demetrius 1985, 1987) that Eq. (29) has a unique globally stable equilibrium on \(\triangle ^{R+1}\), given by the normalized right eigenvector \(\varvec{u}\) of \(\varvec{M}\) corresponding to its largest eigenvalue \(\mu \). In this sense, the deterministic selection equation describes the evolution of the normalized mean values of the corresponding stochastic model. It thus defines a mean-field (macroscopic) dynamics representing the infinite-population limit of the branching process.
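The selection equations can be iterated directly. The sketch applies (18) to a vector without storing the matrix. It then performs either one step of (27) or one explicit Euler step of (29). The Euler scheme and the step size h are our choices, not part of the cited works. Starting from all mass in class R, both iterations should approach the binomial equilibrium (21) when \(b=0\).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// y = M x for the mean matrix (18), applied column by column (class 0 is sterile).
std::vector<double> apply_M(const std::vector<double>& x, double d, double b) {
    const int R = static_cast<int>(x.size()) - 1;
    const double c = 1.0 - d - b;
    std::vector<double> y(R + 1, 0.0);
    for (int r = 1; r <= R; ++r) {
        y[r - 1] += r * d * x[r];
        y[r] += r * c * x[r];
        y[std::min(r + 1, R)] += r * b * x[r];
    }
    return y;
}

double total(const std::vector<double>& v) {
    double s = 0.0;
    for (double e : v) s += e;
    return s;
}

// One step of the discrete selection equation (27): x <- M x / (1^t M x).
std::vector<double> selection_step(const std::vector<double>& x, double d, double b) {
    std::vector<double> y = apply_M(x, d, b);
    const double m = total(y);
    assert(m > 0.0);  // 1^t M x vanishes only if all mass is in class 0
    for (double& e : y) e /= m;
    return y;
}

// One explicit Euler step of (29): x <- x + h (M x - x (1^t M x)).
// The components of the right-hand side sum to zero, so total(x) stays 1.
std::vector<double> euler_step(const std::vector<double>& x, double d, double b, double h) {
    const std::vector<double> y = apply_M(x, d, b);
    const double m = total(y);
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + h * (y[i] - x[i] * m);
    return out;
}
```

With \(R=4\), \(d=0.3\) and \(b=0\), both iterations settle at \(\mathrm {binom}(r;4,0.7)\). For example, \(x_4\rightarrow 0.7^4=0.2401\) and \(x_2\rightarrow 6(0.7)^2(0.3)^2=0.2646\).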

D The Power Law Distribution Family

It is customary to parameterize power law distributions by their exponent, which measures the “weight of the tail” of the distribution. However, we need a location-scale parameterized family in order to impose the same normalization as for the other types of distributions. Therefore, we define the power law distribution with mean value r by

$$\begin{aligned} \mathfrak {z}_r(k) = \frac{(k+1)^{-s(r)}}{\zeta (s(r))} \end{aligned}$$

for \(k=0,1,2,\ldots \) and \(r \geqslant 1\), where \(\zeta (s)\) is the Riemann zeta function, defined for \(s>1\) by

$$\begin{aligned} \zeta (s) = \sum _{n=1}^{\infty } \frac{1}{n^s} \end{aligned}$$

and the function s(r) is given by the inverse function of

$$\begin{aligned} r = \varphi (s) = \frac{\zeta (s-1)}{\zeta (s)}-1 \,. \end{aligned}$$

Namely, \(s=\varphi ^{-1}(r)\) for \(r\geqslant 1\); hence, when \(1 \leqslant r < \infty \), the exponent s satisfies \(2<s<3\). Moreover, the Laurent series expansion for \(r\rightarrow \infty \) (\(s \rightarrow 2\)) gives

$$\begin{aligned} s(r) \approx 2 + \frac{6}{\pi ^2(1+r-C)} \,. \end{aligned}$$
(30)

The constant C in the previous formula is given by \(C = [6 \gamma \pi ^2 - 36\,\zeta '(2)]/\pi ^4 \approx 0.6974\), where \(\gamma \) is Euler’s constant and \(\zeta '(2)\) is the derivative of \(\zeta (s)\) evaluated at 2. Observe that when the mean value \(r \geqslant 1\), the exponent \(s<3\), and so the variance of \(\mathfrak {z}_r(k)\) is infinite.
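The inversion \(s=\varphi ^{-1}(r)\) and the quality of (30) can be checked numerically. The sketch evaluates \(\zeta \) by Euler–Maclaurin summation, which is our choice; any accurate zeta routine would do. It then inverts \(\varphi \) by bisection on \((2,3]\) and compares the result with (30). The hard-coded values of \(\gamma \) and \(\zeta '(2)\) are standard constants, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Riemann zeta for real s > 1 by Euler-Maclaurin summation: the first N-1 terms,
// plus the integral tail, the half-term and the first Bernoulli correction.
double zeta(double s) {
    assert(s > 1.0);
    const int N = 1000;
    double sum = 0.0;
    for (int n = 1; n < N; ++n) sum += std::pow(n, -s);
    const double NN = N;
    return sum + std::pow(NN, 1.0 - s) / (s - 1.0) + 0.5 * std::pow(NN, -s)
               + s * std::pow(NN, -s - 1.0) / 12.0;
}

// Mean of the power law (k+1)^{-s}/zeta(s), k >= 0: phi(s) = zeta(s-1)/zeta(s) - 1.
double phi(double s) { return zeta(s - 1.0) / zeta(s) - 1.0; }

// s(r) = phi^{-1}(r) by bisection: phi decreases from +infinity (s -> 2+)
// and phi(3) = zeta(2)/zeta(3) - 1 < 1 <= r.
double s_exact(double r) {
    assert(r >= 1.0);
    double lo = 2.0 + 1e-12, hi = 3.0;
    for (int k = 0; k < 100; ++k) {
        const double mid = 0.5 * (lo + hi);
        if (phi(mid) > r) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Asymptotic formula (30) with C = (6 gamma pi^2 - 36 zeta'(2)) / pi^4 (about 0.6974).
double s_asymptotic(double r) {
    const double pi = 3.14159265358979323846;
    const double euler_gamma = 0.57721566490153286;
    const double zeta_prime_2 = -0.93754825431584375;
    const double C = (6.0 * euler_gamma * pi * pi - 36.0 * zeta_prime_2) / std::pow(pi, 4);
    return 2.0 + 6.0 / (pi * pi * (1.0 + r - C));
}
```

For large r the two values of s agree closely. For small r, (30) is only a rough guide, so an inversion such as `s_exact` is the safer choice there.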

The pseudo-random generation of samples from the distribution \(\mathfrak {z}_r(k)\) in the ENVELOPE program is based on the algorithm of Devroye (1986) for the Zipf distribution on the positive integers, using formula (30) to compute the exponent s from the mean value r. Pseudo-random generation for the remaining fitness distributions was implemented with the standard library of the C++ programming language (which requires C++11 or later).
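Below is a minimal C++ sketch of Devroye's (1986) rejection sampler for the Zipf law \(P(X=n)=n^{-a}/\zeta (a)\), \(n\geqslant 1\). It is written from the published algorithm, not from the ENVELOPE sources. Setting \(k=X-1\) gives samples of \(\mathfrak {z}_r(k)\), with \(a=s(r)\) taken from (30) and \(C\approx 0.6974\) hard-coded as given above. The helper `sample_power_law` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Devroye's (1986) rejection sampler for the Zipf law P(X = n) = n^{-a}/zeta(a),
// n >= 1, a > 1 (written from the published algorithm, not from ENVELOPE).
long long zipf_devroye(double a, std::mt19937_64& rng) {
    assert(a > 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double am1 = a - 1.0;
    const double b = std::pow(2.0, am1);
    while (true) {
        const double U = 1.0 - unif(rng);  // U in (0, 1], so the power below is defined
        const double V = unif(rng);
        const double X = std::floor(std::pow(U, -1.0 / am1));
        if (!(X >= 1.0) || X > 9.0e18) continue;  // guard against overflow
        const double T = std::pow(1.0 + 1.0 / X, am1);
        if (V * X * (T - 1.0) / (b - 1.0) <= T / b) return static_cast<long long>(X);
    }
}

// Hypothetical helper: one draw k >= 0 from the power law with mean r, using
// the exponent s from formula (30) and the shift k = X - 1.
long long sample_power_law(double r, std::mt19937_64& rng) {
    const double pi = 3.14159265358979323846;
    const double C = 0.6974;  // constant of formula (30)
    const double s = 2.0 + 6.0 / (pi * pi * (1.0 + r - C));
    return zipf_devroye(s, rng) - 1;
}
```

Because the variance of \(\mathfrak {z}_r\) is infinite, the sample mean of such draws converges to r only slowly. Checking the frequency of \(k=0\), which should be \(1/\zeta (s)\), is a more reliable test.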

E Main Routines of the ENVELOPE Program

Figures a–i: main routines of the ENVELOPE program.


Cite this article

Fabreti, L.G., Castro, D., Gorzoni, B. et al. Stochastic Modeling and Simulation of Viral Evolution. Bull Math Biol 81, 1031–1069 (2019). https://doi.org/10.1007/s11538-018-00550-4
