Stochastic Modeling and Simulation of Viral Evolution

Fabreti, Luiza Guimarães; Castro, Diogo; Gorzoni, Bruno; Janini, Luiz Mario Ramos; Antoneli, Fernando

doi:10.1007/s11538-018-00550-4

Published: 14 December 2018

Volume 81, pages 1031–1069, (2019)

Bulletin of Mathematical Biology

Luiza Guimarães Fabreti¹,
Diogo Castro¹,
Bruno Gorzoni¹,
Luiz Mario Ramos Janini² &
Fernando Antoneli ORCID: orcid.org/0000-0001-9179-4632³

Abstract

RNA viruses comprise vast populations of closely related, but highly genetically diverse, entities known as quasispecies. Understanding the mechanisms by which this extreme diversity is generated and maintained is fundamental when approaching viral persistence and pathobiology in infected hosts. In this paper, we access quasispecies theory through a mathematical model based on the theory of multitype branching processes, to better understand the roles of mechanisms resulting in viral diversity, persistence and extinction. We accomplish this understanding by a combination of computational simulations and the theoretical analysis of the model. In order to perform the simulations, we have implemented the mathematical model into a computational platform capable of running simulations and presenting the results in a graphical format in real time. Among other things, we show that the establishment of virus populations may display four distinct regimes from its introduction into new hosts until achieving equilibrium or undergoing extinction. Also, we were able to simulate different fitness distributions representing distinct environments within a host which could either be favorable or hostile to the viral success. We addressed the most used mechanisms for explaining the extinction of RNA virus populations called lethal mutagenesis and mutational meltdown. We were able to demonstrate a correspondence between these two mechanisms implying the existence of a unifying principle leading to the extinction of RNA viruses.

Software Availability and Requirements

The ENVELOPE program was written in the C++ programming language, using the Qt 4.8.6 framework and the Qwt 5.2.1 library. It runs on Linux and Mac OS X and requires at least 2 GB of RAM and 1.5 MB of disk space. It is distributed free of charge under the LGPL license. Binary files for Linux and Mac OS X are available for download at: https://envelopeviral.000webhostapp.com.

References

Alberch P (1991) From genes to phenotype: dynamical systems and evolvability. Genetica 84(1):5–11
Antoneli F, Bosco FAR, Castro D, Janini LMR (2013) Viral evolution and adaptation as a multivariate branching process. In: BIOMAT 2012—proceedings of the international symposium on mathematical and computational biology, vol 13. World Scientific, pp 217–243. https://doi.org/10.1142/9789814520829_0013
Antoneli F, Bosco FAR, Castro D, Janini LMR (2013) Virus replication as a phenotypic version of polynucleotide evolution. Bull Math Biol 75(4):602–628. https://doi.org/10.1007/s11538-013-9822-9
Athreya KB, Ney PE (1972) Branching processes. Springer, Berlin
Bergstrom CT, McElhany P, Real LA (1999) Transmission bottlenecks as determinants of virulence in rapidly evolving pathogens. Proc Natl Acad Sci 96(9):5095–5100
Bradwell K, Combe M, Domingo-Calap P, Sanjuán R (2013) Correlation between mutation rate and genome size in riboviruses: mutation rate of bacteriophage $\mathrm {Q}\beta $. Genetics 195(1):243–251
Bull JJ, Sanjuán R, Wilke CO (2007) Theory of lethal mutagenesis for viruses. J Virol 81(6):2930–2939. https://doi.org/10.1128/JVI.01624-06
Bull JJ, Sanjuán R, Wilke CO (2008) Lethal mutagenesis. In: Domingo E, Parrish CR, Holland JJ (eds) Origin and evolution of viruses, 2nd edn. Academic Press, London, chap. 9, pp 207–218. https://doi.org/10.1016/B978-0-12-374153-0.00009-6
Burch CL, Chao L (2004) Epistasis and its relationship to canalization in the RNA virus $\varPhi 6$. Genetics 167(2):559–567
Burch CL, Guyader S, Samarov D, Shen H (2007) Experimental estimate of the abundance and effects of nearly neutral mutations in the RNA virus $\varPhi 6$. Genetics 176(1):467–476
Campbell RB (2003) A logistic branching process for population genetics. J Theor Biol 225(2):195–203
Carrasco P, de la Iglesia F, Elena SF (2007) Distribution of fitness and virulence effects caused by single-nucleotide substitutions in Tobacco Etch virus. J Virol 81(23):12979–12984
Cerf R (2015) Critical population and error threshold on the sharp peak landscape for a Moran model. Mem Am Math Soc 233(1096):1–87
Cerf R (2015) Critical population and error threshold on the sharp peak landscape for the Wright–Fisher model. Ann Appl Probab 25(4):1936–1992
Cerf R, Dalmau J (2016) The distribution of the quasispecies for a Moran model on the sharp peak landscape. Stoch Processes Appl 126(6):1681–1709
Cuesta JA (2011) Huge progeny production during transient of a quasi-species model of viral infection, reproduction and mutation. Math Comp Model 54:1676–1681. https://doi.org/10.1016/j.mcm.2010.11.055
Cuevas JM, Duffy S, Sanjuán R (2009) Point mutation rate of bacteriophage $\varPhi \mathrm {X}174$. Genetics 183:747–749
Dalmau J (2015) The distribution of the quasispecies for the Wright–Fisher model on the sharp peak landscape. Stoch Processes Appl 125(1):272–293
Dalmau J (2016) Distribution of the quasispecies for a Galton–Watson process on the sharp peak landscape. J Appl Probab 53(02):606–613
Demetrius L (1985) The units of selection and measures of fitness. Proc R Soc Lond B 225(1239):147–159
Demetrius L (1987) An extremal principle of macromolecular evolution. Phys Scr 36(4):693
Demetrius L (2013) Boltzmann, Darwin and directionality theory. Phys Rep 530(1):1–85
Demetrius L, Schuster P, Sigmund K (1985) Polynucleotide evolution and branching processes. Bull Math Biol 47(2):239–262
Devroye L (1986) Non-uniform random variate generation. Springer, Berlin
Di Mascio M, Markowitz M, Louie M, Hogan C, Hurley A, Chung C, Ho DD, Perelson AS (2003) Viral blip dynamics during highly active antiretroviral therapy. J Virol 77(22):12165–12172
Dietz K (2005) Darwinian fitness, evolutionary entropy and directionality theory. BioEssays 27:1097–1101
Domingo E, Holland JJ (1997) RNA virus mutations and fitness for survival. Ann Rev Microbiol 51(1):151–178
Domingo E, Martin V, Perales C, Grande-Perez A, Garcia-Arriaza J, Arias A (2006) Viruses as quasispecies: biological implications. In: Domingo E (ed) Quasispecies: concept and implications for virology. Springer, Berlin, pp 51–82
Domingo E, Martínez-Salas E, Sobrino F, de la Torre JC, Portela A, Ortín J, López-Galindez C, Pérez-Breña P, Villanueva N, Nájera R (1985) The quasispecies (extremely heterogeneous) nature of viral RNA genome populations: biological relevance—a review. Gene 40(1):1–8
Domingo E, Sabo D, Taniguchi T, Weissmann G (1978) Nucleotide sequence heterogeneity of an RNA phage population. Cell 13:635–744
Domingo-Calap P, Cuevas JM, Sanjuán R (2009) The fitness effects of random mutations in single-stranded DNA and RNA bacteriophages. PLoS Genet 5(11):e1000742
Drake JW (2012) A test of Kimura’s mutation-rate conjecture. In: Mothersill CE, Korogodina VL, Seymour CB (eds) Radiobiology and environmental security. Springer, Berlin, pp 13–18
Eigen M (1971) Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58:465–523
Eigen M (1993) Viral quasispecies. Sci Am 269:42–49
Eigen M, Schuster P (1979) The hypercycle. A principle of natural self-organization. Springer, Berlin
Feller W (1968) An introduction to probability theory and its applications, vol 1, 3rd edn. Wiley, New York
Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, Peddada L, Heldebrant C, Smith R, Conrad A, Kleinman SH, Busch MP (2003) Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. Aids 17(13):1871–1879
Fortuna MA, Zaman L, Ofria C, Wagner A (2017) The genotype-phenotype map of an evolving digital organism. PLoS Comput Biol 13(2):e1005414
Furió V, Moya A, Sanjuán R (2005) The cost of replication fidelity in an RNA virus. Proc Natl Acad Sci U S A 102(29):10233–10237
Gallant JE (2007) Making sense of blips. J Infect Dis 196(12):1729–1731
Gupta V, Dixit NM (2015) Scaling law characterizing the dynamics of the transition of HIV-1 to error catastrophe. Phys Biol 12(5):054001
Harris TE (1963) The theory of branching processes. Springer, Berlin
Jagers P, Klebaner FC, Sagitov S (2007) On the path to extinction. Proc Natl Acad Sci U S A 104(15):6107–6111
Kesten H, Stigum BP (1966) Additional limit theorems for indecomposable multidimensional Galton–Watson processes. Ann Math Stat 37(6):1463–1481
Kesten H, Stigum BP (1966) A limit theorem for multidimensional Galton–Watson processes. Ann Math Stat 37(5):1211–1223
Kesten H, Stigum BP (1967) Limit theorems for decomposable multi-dimensional Galton–Watson processes. J Math Anal Appl 17:309–338
Kimmel M, Axelrod DE (2002) Branching processes in biology. Springer, New York
Kimura M, Maruyama T (1966) The mutational load with epistatic gene interactions in fitness. Genetics 54(6):1337
Kurtz TG, Lyons R, Pemantle R, Peres Y (1994) A conceptual proof of the Kesten–Stigum theorem for multi-type branching processes. In: Athreya K, Jagers P (eds) Classical and modern branching processes, IMA Vol. Math. Appl., vol 84. Springer, New York, pp 181–185
Lambert A (2005) The branching process with logistic growth. Ann Appl Probab 15(2):1506–1535
Lee PK, Kieffer TL, Siliciano RF, Nettles RE (2006) HIV-1 viral load blips are of limited clinical significance. J Antimicrob Chemother 57(5):803–805
Loeb LA, Essigmann JM, Kazazi F, Zhang J, Rose KD, Mullins JI (1999) Lethal mutagenesis of HIV with mutagenic nucleoside analogs. Proc Natl Acad Sci U S A 96:1492–1497
Lotka AJ (1939) Théorie analytique des associations biologiques. Part II. Analyse démographique avec application particulière à l’espèce humaine. Actualités Scientifiques et Industrielles 780:123–136
Lynch M, Bürger R, Butcher D, Gabriel W (1993) The mutational meltdown in asexual populations. J Hered 84(5):339–344
Lynch M, Gabriel W (1990) Mutation load and the survival of small populations. Evolution 44:1725–1737
Manrubia SC, Lázaro E, Pérez-Mercader J, Escarmís C, Domingo E (2003) Fitness distributions in exponentially growing asexual populations. Phys Rev Lett 90(18):188102
Matuszewski S, Ormond L, Bank C, Jensen JD (2017) Two sides of the same coin: a population genetics perspective on lethal mutagenesis and mutational meltdown. Virus Evolut 3(1):vex004
McMichael AJ, Borrow P, Tomaras GD, Goonetilleke N, Haynes BF (2010) The immune response during acute HIV-1 infection: clues for vaccine development. Nat Rev Immunol 10(1):11–23
Mode CJ, Sleeman CK (2012) Stochastic processes in genetics and evolution: computer experiments in the quantification of mutation and selection. World Scientific, Singapore
Mode CJ, Sleeman CK, Raj T (2013) On the inclusion of self regulating branching processes in the working paradigm of evolutionary and population genetics. Front Genet 4:11
Nagaev AV (1967) On estimating the expected number of direct descendants of a particle in a branching process. Theory Probab Appl 12(2):314–320
Nettles RE, Kieffer TL (2006) Update on HIV-1 viral load blips. Curr Opin HIV AIDS 1(2):157–161
Nettles RE, Kieffer TL, Kwon P, Monie D, Han Y, Parsons T, Cofrancesco J, Gallant JE, Quinn TC, Jackson B (2005) Intermittent HIV-1 viremia (blips) and drug resistance in patients receiving HAART. Jama 293(7):817–829
Peris JB, Davis P, Cuevas JM, Nebot MR, Sanjuán R (2010) Distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage F1. Genetics 185(2):603–609
Rong L, Perelson AS (2009) Asymmetric division of activated latently infected cells may explain the decay kinetics of the HIV-1 latent reservoir and intermittent viral blips. Math Biosci 217(1):77–87
Rong L, Perelson AS (2009) Modeling HIV persistence, the latent reservoir, and viral blips. J Theor Biol 260(2):308–331
Sanjuán R, Moya A, Elena SF (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci U S A 101:8396–8401
Schuster P, Swetina J (1988) Stationary mutant distributions and evolutionary optimization. Bull Math Biol 50(6):635–660
Servedio MR, Brandvain Y, Dhole S, Fitzpatrick CL, Goldberg EE, Stern CA, Cleve JV, Yeh DJ (2014) Not just a theory: the utility of mathematical models in evolutionary biology. PLoS Biol 12(12):e1002017. https://doi.org/10.1371/journal.pbio.1002017
Swetina J, Schuster P (1982) Self-replication with errors: a model for polynucleotide replication. Biophys Chem 16(4):329–345. https://doi.org/10.1016/0301-4622(82)87037-3
Takeuchi N, Hogeweg P (2007) Error-threshold exists in fitness landscapes with lethal mutants. BMC Evolut Biol 7(1):15
Tromas N, Elena SF (2010) The rate and spectrum of spontaneous mutations in a plant RNA virus. Genetics 185(3):983–989
Watson HW, Galton F (1874) On the probability of the extinction of families. J Anthropol Inst Great Br Irel 4:138–144
Wilke CO (2005) Quasispecies theory in the context of population genetics. BMC Evolut Biol 5(1):44
Zhu Y, Yongky A, Yin J (2009) Growth of an RNA virus in single cells reveals a broad fitness distribution. Virology 385(1):39–46. https://doi.org/10.1016/j.virol.2008.10.031

Acknowledgements

LG acknowledges the support of FAPESP through the Grant Number 14/13382-1. BG and DC received financial support from CAPES.

Author information

Authors and Affiliations

Programa de Pós-Graduação em Infectologia, Universidade Federal de São Paulo, São Paulo, SP, Brazil
Luiza Guimarães Fabreti, Diogo Castro & Bruno Gorzoni
Departamentos de Microbiologia, Imunologia, Parasitologia and Medicina, Laboratório de Retrovirologia, Universidade Federal de São Paulo, São Paulo, SP, Brazil
Luiz Mario Ramos Janini
Departamento de Informática em Saúde, Laboratório de Biocomplexidade e Genômica Evolutiva, Universidade Federal de São Paulo, São Paulo, SP, Brazil
Fernando Antoneli

Contributions

LG and DC contributed equally to this work. LMRJ and FA contributed equally to this work. Conceived the model and formulated the underlying theory: LMRJ and FA. Implemented the software: LG, DC and BG. Simulated the model and analyzed the output: LG and DC. Wrote the paper: LMRJ and FA.

Corresponding author

Correspondence to Fernando Antoneli.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Review of Multitype Branching Process Theory

A discrete-time multitype branching process with types or classes indexed by a nonnegative integer r ranging from 0 to R is described by a sequence of vector-valued random variables $\varvec{Z}_n=(Z_n^0,\ldots ,Z_n^R)$, ($n=0,1,\ldots $), where $Z_n^r$ is the number of particles of type or class r in the nth generation. The initial population is represented by a vector of nonnegative integers $\varvec{Z}_0$ (also called a multi-index) which is nonzero and non-random. The time evolution of the population is determined by a vector-valued discrete probability distribution $\varvec{\zeta }(\varvec{i})=\big (\zeta _r(\varvec{i})\big )$, defined on the set of multi-indices $\varvec{i}=(i^0,\ldots ,i^R)$, called the offspring distribution of the process, which is usually encoded as the coefficients of a vector-valued multivariate power series $\varvec{f}(\varvec{z})=\big (f_r(\varvec{z})\big )$, called probability generating function (PGF).

The mean matrix, or matrix of first moments, $\varvec{M}=\{M_{ij}\}$ of a multitype branching process describes how the average number of particles in each type or class evolves in time and is defined by $M_{ij}=\mathbf {E}(Z_1^i|Z^j_0=1)$, where $Z^j_0=1$ abbreviates $\varvec{Z}_0=(0,\ldots ,1,\ldots ,0)$, the vector with a single 1 in position j. In terms of the probability generating function $\varvec{f}=(f_0,\ldots ,f_R)$ it is given by

$$\begin{aligned} M_{ij}=\dfrac{\partial f_j}{\partial z_i}(\varvec{z})\bigg |_{\varvec{z}=\varvec{1}} \end{aligned}$$

(3)

where $\varvec{1}=(1,1,\ldots ,1)$. Typically, the mean matrix $\varvec{M}$ is nonnegative, and hence it has a largest nonnegative eigenvalue. When the largest eigenvalue is positive, it coincides with the spectral radius of $\varvec{M}$ and is called, following Kimmel and Axelrod (2002), the malthusian parameter $\mu $.

The vector of extinction probabilities of a multitype branching process, denoted by $\varvec{\gamma }=(\gamma _0,\ldots ,\gamma _R)$, where $0 \leqslant \gamma _r\leqslant 1$, is defined by the condition that $\gamma _r$ is the probability that the process eventually becomes extinct given that initially there was exactly one particle of class r.

The classification theorem of multitype branching processes states that there are only three possible regimes for a multitype branching process (Harris 1963; Athreya and Ney 1972; Kimmel and Axelrod 2002):

Super-critical: If $\mu >1$ then $0\leqslant \gamma _r<1$ for all r and, with positive probability, the population survives indefinitely.
Sub-critical: If $\mu <1$ then $\gamma _r=1$ for all r and, with probability 1, the population becomes extinct in finite time.
Critical: If $\mu =1$ then $\gamma _r=1$ for all r and, with probability 1, the population becomes extinct; however, the expected time to extinction is infinite.
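
The extinction probabilities can be approximated numerically. The sketch below is an illustration, not part of the original analysis: it uses the standard fact that $\varvec{\gamma }$ is the smallest fixed point of the PGF in the unit cube, and it assumes the deterministic PGF of the phenotypic model with $b=0$ from Eq. (15) of Appendix B. The function names `pgf` and `extinction_probabilities` are hypothetical.

```python
def pgf(z, d):
    """Deterministic phenotypic PGF with b = 0 (Eq. 15):
    f_0 = 1 and f_r = (d*z_{r-1} + c*z_r)**r, with c = 1 - d."""
    c = 1.0 - d
    return [1.0] + [(d * z[r - 1] + c * z[r]) ** r for r in range(1, len(z))]


def extinction_probabilities(R, d, n_iter=5000):
    """Iterate gamma <- f(gamma) starting from the zero vector.
    The iterates increase monotonically towards the smallest fixed
    point of f in [0, 1]^(R+1), which is the extinction vector."""
    gamma = [0.0] * (R + 1)
    for _ in range(n_iter):
        gamma = pgf(gamma, d)
    return gamma


# Example with R = 2: mu = 2*(1 - d), so d = 0.3 is super-critical
# and d = 0.6 is sub-critical.
gamma_super = extinction_probabilities(2, 0.3)
gamma_sub = extinction_probabilities(2, 0.6)
```

In this example with $b=0$ the process is decomposable, so a lineage started in class 1 can never reach class 2 and dies out with certainty even when $\mu >1$; only the top class has an extinction probability below one.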

When a multitype branching process is super-critical, it is expected, according to the “Malthusian Law of Growth”, to grow indefinitely at a geometric rate proportional to $\mu ^n$, where $\mu $ is the malthusian parameter: $\varvec{Z}_n \approx \mu ^n \,\varvec{W}_n$ for some bounded random vector $\varvec{W}_n$, as $n \rightarrow \infty $. This heuristic reasoning is formalized by the Kesten–Stigum limit theorem for super-critical multitype branching processes (see Kesten and Stigum 1966a, b, 1967). If $\varvec{W}_n=\varvec{Z}_n/\mu ^n$ then there exists a scalar random variable $W \ne 0$ such that, with probability one,

$$\begin{aligned} \lim _{n\rightarrow \infty } \varvec{W}_n = W \,\varvec{u} \end{aligned}$$

(4)

where $\varvec{u}$ is the right eigenvector corresponding to the malthusian parameter $\mu $ and

$$\begin{aligned} \mathbf {E}(W|\varvec{Z}_0)=\varvec{v}^{\mathrm {t}} \varvec{Z}_0 \end{aligned}$$

(5)

where $\varvec{v}$ is the left eigenvector corresponding to the malthusian parameter $\mu $. The vectors $\varvec{u}$ and $\varvec{v}$ may be normalized so that $\varvec{v}^{\mathrm {t}}\varvec{u}=1$ and $\varvec{1}^{\mathrm {t}}\varvec{u}=1$ where ${}^{\mathrm {t}}$ denotes the transpose of a vector. Moreover, under the assumption that $\varvec{M}$ is nonnegative [which is satisfied by the phenotypic model (18)], the right and left eigenvectors corresponding to the malthusian parameter are nonnegative.

The normalization of the right eigenvector $\varvec{u}=(u_0,\ldots ,u_R)$ implies that $\sum _r u_r=1$, and therefore one has the “law of convergence of types” (see Kurtz et al. 1994)

$$\begin{aligned} \lim _{n\rightarrow \infty } \dfrac{\varvec{Z}_n}{|\varvec{Z}_n|} = \varvec{u} \,, \end{aligned}$$

(6)

where $|\varvec{Z}_n|=\sum _r Z_n^r$ is the total population at the nth generation and the equality holds almost surely. Equation (6) asserts that the asymptotic proportion of a replicative class r converges almost surely to the constant value $u_r$.

In particular, Eq. (6) implies that the malthusian parameter is the asymptotic relative growth rate of the population

$$\begin{aligned} \mu = \lim _{n\rightarrow \infty } \dfrac{|\varvec{Z}_{n}|}{|\varvec{Z}_{n-1}|} = \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \,\sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,] \end{aligned}$$

(7)

since $|\varvec{Z}_{n-1}|$ is the number of “parental particles” of the particles in the nth generation and $|\varvec{Z}_{n}|$ is the sum of the “progeny sizes” $\#[\,j\,]$ of the “parental particles” j from the previous generation.

Now consider the quantitative random variable $\rho $ defined on the set of classes $\{0,\ldots ,R\}$ and having probability distribution $(u_0,\ldots ,u_R)$, called the asymptotic distribution of classes. When the classes are indexed by their expectation values, the variable $\rho $ associates to a random particle its expected class

$$\begin{aligned} \mathbf {P}(\rho =r)=u_r \,. \end{aligned}$$

Therefore, one can define the average reproduction rate of the population as

$$\begin{aligned} \langle \rho \rangle = \sum _{r=0}^R r \, u_r \,. \end{aligned}$$

(8)

Using Eqs. (4), (5), (6) one can show that the average reproduction rate is equal to the malthusian parameter:

$$\begin{aligned} \langle \rho \rangle = \mu \,. \end{aligned}$$

(9)

The average population size at the nth generation is $|\langle \varvec{Z}_n \rangle | = \sum _{r=0}^R \langle Z^r_n \rangle $. Then for $n\rightarrow \infty $, Eq. (4) gives $|\langle \varvec{Z}_n \rangle | \approx \mu ^n |\langle \varvec{W}_n \rangle | \approx \mu ^n \langle W \rangle $ and so

$$\begin{aligned} \mu = \lim _{n\rightarrow \infty }\dfrac{|\langle \varvec{Z}_{n} \rangle |}{|\langle \varvec{Z}_{n-1} \rangle |} \end{aligned}$$

(10)

On the other hand, from the definition of mean matrix and its form (18), one has

$$\begin{aligned} |\langle \varvec{Z}_n \rangle | = |\varvec{M}\,\langle \varvec{Z}_{n-1} \rangle | =\sum _{r=0}^R r\,\langle Z^r_{n-1} \rangle \,. \end{aligned}$$

Now dividing by $|\langle \varvec{Z}_{n-1} \rangle |$ and taking the limit $n\rightarrow \infty $ gives

$$\begin{aligned} \mu = \lim _{n\rightarrow \infty }\dfrac{|\langle \varvec{Z}_{n} \rangle |}{|\langle \varvec{Z}_{n-1} \rangle |} = \lim _{n\rightarrow \infty }\sum _{r=0}^R r \,\dfrac{\langle Z^r_{n-1}\rangle }{|\langle \varvec{Z}_{n-1} \rangle |} = \sum _{r=0}^R r \, u_r = \langle \rho \rangle \end{aligned}$$

where we used Eqs. (5) and (6) in the third equality from left to right.
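
The identity $\langle \rho \rangle =\mu $ can be checked numerically. The following is a minimal sketch, assuming the closed form of the mean matrix given in Eq. (18) of Appendix B and using plain power iteration as a stand-in for a proper eigenvalue solver. The helper names `mean_matrix` and `perron_pair` and the parameter values are illustrative choices, not part of the original text.

```python
def mean_matrix(R, d, c, b):
    """Mean matrix of Eq. (18): column r holds r*d, r*c and r*b in rows
    r-1, r and r+1; for r = R the beneficial term is added to the diagonal."""
    M = [[0.0] * (R + 1) for _ in range(R + 1)]
    for r in range(1, R + 1):
        M[r - 1][r] += r * d
        M[r][r] += r * c
        M[min(r + 1, R)][r] += r * b
    return M


def matvec(M, x):
    """Plain matrix-vector product M x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]


def perron_pair(M, n_iter=5000):
    """Power iteration. Returns (mu, u) with u normalized so that sum(u) == 1."""
    n = len(M)
    u = [1.0 / n] * n
    for _ in range(n_iter):
        v = matvec(M, u)
        total = sum(v)
        u = [x / total for x in v]
    mu = matvec(M, u)[-1] / u[-1]  # eigenvalue read off the top class R
    return mu, u


R = 5
mu, u = perron_pair(mean_matrix(R, d=0.3, c=0.6, b=0.1))
average_rate = sum(r * u[r] for r in range(R + 1))  # <rho> of Eq. (8)
```

Here `mu` is estimated from a single component of the eigenvector equation, so comparing it with `average_rate` is a genuine check of Eq. (9) rather than a restatement of the normalization.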

In analogy with the characterization of the malthusian parameter as given by Eq. (7), one may define the asymptotic populational variance

$$\begin{aligned} \sigma ^2 = \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \, \sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,]^2 - \mu ^2 \end{aligned}$$

(11)

and in analogy with the mean reproduction rate, one may define the (squared) phenotypic diversity as

$$\begin{aligned} \sigma _{\rho }^2 = \langle \rho ^2 \rangle - \langle \rho \rangle ^2 \end{aligned}$$

(12)

By decomposing the sum in Eq. (11) according to the classes r, one obtains

$$\begin{aligned} \sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,]^2 = \sum _{r=0}^R \sum _{j_r=1}^{Z_{n-1}^r} \#[\,j_r\,]^2 \end{aligned}$$

where $j_r$ runs over the particles of class r for $r=0,\ldots ,R$ and $\#[\,j_r\,]$ are independent random variables assuming nonnegative values with probability distribution $t_r$, called fitness distribution of class r.

Denoting the variance of the fitness distribution $t_r$ by $\sigma ^2_{r}$, one may write the limit in Eq. (11) as

$$\begin{aligned} \sigma ^2&= \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \, \sum _{j=1}^{|\varvec{Z}_{n-1}|} \#[\,j\,]^2-\mu ^2 \\&= \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \,\sum _{r=0}^R \left[ Z^r_{n-1} \left( \dfrac{1}{Z^r_{n-1}} \sum _{j_r=1}^{Z_{n-1}^r}\#[\,j_r\,]^2 - r^2 \right) +r^2\,Z^r_{n-1}\right] -\mu ^2 \\&= \lim _{n\rightarrow \infty } \dfrac{1}{|\varvec{Z}_{n-1}|} \,\sum _{r=0}^R (\sigma ^2_{r}+r^2)Z^r_{n-1}-\mu ^2 \end{aligned}$$

Then Eqs. (6), (9) and (12) give

$$\begin{aligned} \sigma ^2 = \sum _{r=0}^R (\sigma ^2_{r}+r^2 )\,u_r-\mu ^2 = \sum _{r=0}^R \sigma ^2_{r}\,u_r+\sigma ^2_{\rho } \end{aligned}$$

(13)

The difference between the asymptotic populational variance and the (squared) phenotypic diversity, called normalized populational variance, is the weighted average of the variances of the fitness distributions

$$\begin{aligned} \phi = \sigma ^2 - \sigma ^2_{\rho } = \sum _{r=0}^R \sigma ^2_{r}\,u_r \,. \end{aligned}$$

(14)

In particular, when the family of fitness distributions is the deterministic family, the populational variance is exactly the phenotypic diversity (that is, $\phi =0$). This is the expected result, since the Delta distributions $t_r(k)=\delta _{rk}$ have zero variance, and hence the only source of fluctuation of the population size is its stratification into replicative classes, which is expressed by the phenotypic diversity.
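
As a complementary worked example (added here for illustration), consider the Poisson family of fitness distributions described in Appendix B. A Poisson distribution with mean r also has variance r, so $\sigma ^2_r=r$, and Eqs. (14), (8) and (9) combine to give

$$\begin{aligned} \phi = \sum _{r=0}^R r\,u_r = \langle \rho \rangle = \mu \,, \qquad \sigma ^2 = \mu + \sigma ^2_{\rho } \,. \end{aligned}$$

In this case the normalized populational variance coincides with the malthusian parameter.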

B Mathematical Basis of the Phenotypic Model

Based on the general aspects of the phenomenon of viral replication described before, it is compelling to model it in terms of a branching process. At each replicative cycle, every parental particle in the replicative class r produces a random number of progeny particles that is independently drawn from the corresponding fitness distribution.

A fitness distribution is a member of a location-scale family of discrete probability distributions $t_r$ parameterized by the replicative classes ($r=0,\ldots ,R$) assuming nonnegative integer values and normalized so that the expectation value of $t_r$, defined as $\sum _{k} k\,t_r(k)$, is exactly r and $t_0(k)=\delta _{k0}$. Here $\delta _{kr}=1$ if $k=r$ and $\delta _{kr}=0$ if $k \ne r$. Therefore, each particle in the viral population is characterized by the mean value of its fitness distribution, called mean replicative capability. Viral particles with replicative capability equal to zero (0) do not generate progeny; viral particles with replicative capability one (1) generate one particle on average; viral particles with replicative capability two (2) generate two particles on average, and so on. Typical examples of location-scale families of discrete probability distributions that can be used as fitness distributions are:

(a) The family of Deterministic (Delta) distributions: $t_r(k)=\delta _{kr}$.
(b) The family of Poisson distributions: $t_r(k)=\mathrm {e}^{-r}\tfrac{r^k}{k!}$.

Note that in the first example, the replicative capability is completely concentrated on the mean value r – that is, the particles have deterministic fitness. On the other hand, in the second example the fitness is truly stochastic.

During the replication, each progeny particle always undergoes one of the following effects:

Deleterious effect: the mean replication capability of the respective progeny particle decreases by one. Note that when a particle has replication capability equal to 0, it will not produce any progeny at all.
Beneficial effect: the mean replication capability of the respective progeny particle increases by one. If the mean replication capability of the parental particle is already the maximum allowed, then the mean replication capability of the respective progeny particle is the same as that of the parental particle.
Neutral effect: the mean replication capability of the respective progeny particle remains the same as the mean replication capability of the parental particle.

To define which effect will occur during a replication event, probabilities d, b and c are associated, respectively, with the occurrence of deleterious, beneficial and neutral effects. The only constraints these numbers should satisfy are $0\leqslant d,b,c\leqslant 1$ and $b+c+d=1$. In the case of in vitro experiments with homogeneous cell populations, the probabilities c, d and b essentially refer to the occurrence of mutations.
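
The replication rule described above can be sketched as a single simulation step. This is a minimal illustration under stated assumptions and is not the ENVELOPE implementation: the function names `sample_progeny` and `next_generation` are hypothetical, the population is stored as a list of counts per class, and Poisson variates are drawn with Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random


def sample_progeny(r, rng, family="delta"):
    """Draw a progeny size from the fitness distribution t_r (mean r)."""
    if family == "delta":
        return r
    # Poisson(r) via Knuth's multiplication method (fine for small r).
    limit = math.exp(-r)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def next_generation(Z, d, c, b, rng, family="delta"):
    """One replicative cycle. Z[r] is the number of particles in class r.
    Each progeny particle suffers a deleterious, neutral or beneficial
    effect with probabilities d, c and b (assumed to sum to 1)."""
    R = len(Z) - 1
    new = [0] * (R + 1)
    for r, count in enumerate(Z):
        for _ in range(count):
            for _ in range(sample_progeny(r, rng, family)):
                x = rng.random()
                if x < d:
                    child = r - 1          # deleterious (r >= 1 here)
                elif x < d + c:
                    child = r              # neutral
                else:
                    child = min(r + 1, R)  # beneficial, capped at class R
                new[child] += 1
    return new


rng = random.Random(2018)  # arbitrary seed, for reproducibility only
Z1 = next_generation([0, 10, 10, 10], d=0.3, c=0.6, b=0.1, rng=rng)
```

With the deterministic family the total progeny size is fixed by the parental generation (here $10\cdot 1+10\cdot 2+10\cdot 3=60$), and only the distribution among classes is random; with `family="poisson"` the total fluctuates as well.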

The probability generating function (PGF) of the phenotypic model with $b=0$ and $t_r(k)=\delta _{kr}$ is (see Antoneli et al. (2013a, b) for details):

$$\begin{aligned} f_0(z_0,z_1,\ldots ,z_R)&= 1 \nonumber \\ f_1(z_0,z_1,\ldots ,z_R)&= dz_0+cz_1 \nonumber \\ f_2(z_0,z_1,\ldots ,z_R)&= (dz_1+cz_2)^2 \nonumber \\&\vdots \nonumber \\ f_R(z_0,z_1,\ldots ,z_R)&= (dz_{R-1}+cz_R)^R \end{aligned}$$

(15)

Note that the functions $f_r(z_0,z_1,\ldots ,z_R)$ are polynomials whose coefficients are exactly the probabilities of the binomial distribution $\mathrm {binom}(k;r,1-d)$. The PGF in the case with general beneficial effects and with a general family of fitness distributions (which reduces to the previous PGF when $b=0$ and $t_r(k)=\delta _{kr}$) is given by

$$\begin{aligned} f_0(z_0,z_1,\ldots ,z_R)&= 1 \nonumber \\ f_1(z_0,z_1,\ldots ,z_R)&= \sum _{k=0}^\infty \,t_1(k)\, (dz_0+cz_1+bz_2)^k \nonumber \\ f_2(z_0,z_1,\ldots ,z_R)&= \sum _{k=0}^\infty \,t_2(k)\, (dz_1+cz_2+bz_3)^k \nonumber \\&\vdots \nonumber \\ f_R(z_0,z_1,\ldots ,z_R)&= \sum _{k=0}^\infty \,t_R(k)\, (dz_{R-1}+(c+b)z_R)^k \end{aligned}$$

(16)

Note that in the last equation, the beneficial effect acts like the neutral effect. This is a kind of “consistency condition” ensuring that the populational replicative capability is, on average, bounded above by R. Even though a parental particle in the replicative class R may have more than R progeny particles when $t_r$ is not deterministic, the average progeny size is always R.

Finally, it is easy to see that the PGF of the two-dimensional case of the phenotypic model with $b=0$ and $z_0=1$ (and ignoring $f_0$) reduces to

$$\begin{aligned} f(z)~=~\sum _{k=0}^\infty \,t(k)\, ((1-c)+cz)^k~=~\sum _{k=0}^\infty \,t(k)\, (1-c(1-z))^k \,. \end{aligned}$$

(17)

This is formally identical to the PGF of the single-type model proposed by Demetrius et al. (1985, p. 255, Eq. (49)) for the evolution of polynucleotides. In their formulation, $c=p^\nu $ is the probability that a given copy of a polynucleotide is exact, where the polymer has chain length of $\nu $ nucleotides and p is the probability of copying a single nucleotide correctly. The replication distribution t(k) provides the number of copies a polynucleotide yields before it is degraded by hydrolysis.
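For a single-type branching process started from one particle, the extinction probability is the smallest fixed point of the PGF in $[0,1]$, and it can be approximated by iterating $q_{n+1}=f(q_n)$ from $q_0=0$. The Python sketch below applies this standard fact to (17) under the simplifying assumption $t(k)=\delta _{kR}$, so that $f(z)=((1-c)+cz)^R$; the function name and the parameter values are hypothetical.

```python
def extinction_probability(c, R, iterations=2000):
    """Smallest fixed point of f(z) = ((1 - c) + c*z)**R on [0, 1].

    Assumes the deterministic replication distribution t(k) = delta_{kR}.
    Convergence is slow when the mean R*c is close to 1.
    """
    q = 0.0
    for _ in range(iterations):
        q = (1.0 - c + c * q) ** R
    return q


# Super-critical example: mean progeny R*c = 1.5, so extinction is not certain.
q_super = extinction_probability(c=0.75, R=2)
# Sub-critical example: mean progeny R*c = 0.8, so extinction is certain.
q_sub = extinction_probability(c=0.4, R=2)
```

For $R=2$ and $c=0.75$ the fixed-point equation is a quadratic with roots $1/9$ and $1$, so the iteration approaches $1/9$; in the sub-critical example it approaches 1.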

A remarkable property of the phenotypic model that was fully explored in Antoneli et al. (2013a, b) is the fact that when $b=0$ the phenotypic model is “exactly solvable” in a very specific sense.

It is straightforward from the generating function (16), using formula (3), that the mean matrix of the phenotypic model is given by

$$\begin{aligned} \varvec{M}=\begin{pmatrix} 0 &{}\quad d &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \ldots &{}\quad 0 \\ 0 &{}\quad c &{}\quad 2d &{}\quad 0 &{}\quad 0 &{}\quad \ldots &{}\quad 0 \\ 0 &{}\quad b &{}\quad 2c &{}\quad 3d &{}\quad 0 &{}\quad \ldots &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 2b &{}\quad 3c &{}\quad 4d &{}\quad \ldots &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 3b &{}\quad 4c &{}\quad \ldots &{}\quad 0 \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad Rd \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad (R-1)b &{}\quad R(c+b) \end{pmatrix} \,. \end{aligned}$$

(18)

Note that the mean matrix depends on the fitness distributions $t_r$ only through their mean values, since each $t_r$ is normalized to have mean value r.

Assume for a moment that $b=0$ (hence $c=1-d$). Then the mean matrix becomes upper-triangular, and hence its eigenvalues are the diagonal entries $\lambda _r=r(1-d)$ and the malthusian parameter $\mu $ is the largest eigenvalue $\lambda _R$:

$$\begin{aligned} \mu =R(1-d) \,. \end{aligned}$$

(19)

Now suppose that $b \ne 0$ is small compared to d and c (hence $c=1-d-b$). Then spectral perturbation theory allows one to write the malthusian parameter $\mu $ as a power series

$$\begin{aligned} \mu = \mu _0 + \mu _1 b + \mu _2 b^2 + \cdots \end{aligned}$$

where $\mu _0$ is the malthusian parameter for the case $b=0$ and $\mu _j$ are functions of the form $R\,\tilde{m}_j(d,R)$. A lengthy calculation (see Antoneli et al. 2013b) gives the following result:

$$\begin{aligned} \mu = R \left( (1-d) + (R-1)\dfrac{d}{1-d}\,b + \varvec{O}(b^2)\right) \,. \end{aligned}$$

(20)
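Formulas (19) and (20) can be checked numerically against the mean matrix (18). The Python sketch below builds the matrix and estimates its dominant eigenvalue by normalized power iteration; this is one simple way to do so and is not taken from the original work. The function names and parameter values are hypothetical.

```python
def mean_matrix(b, d, R):
    """Mean matrix (18): column r holds the mean progeny of a class-r particle."""
    c = 1.0 - b - d
    M = [[0.0] * (R + 1) for _ in range(R + 1)]
    for r in range(1, R + 1):
        M[r - 1][r] = r * d                # deleterious: class r -> r - 1
        if r < R:
            M[r][r] = r * c                # neutral: class r -> r
            M[r + 1][r] = r * b            # beneficial: class r -> r + 1
        else:
            M[r][r] = r * (c + b)          # in class R, beneficial acts as neutral
    return M


def malthusian_parameter(M, iterations=5000):
    """Dominant eigenvalue of M by power iteration, normalizing to unit sum."""
    n = len(M)
    x = [1.0 / n] * n
    mu = 0.0
    for _ in range(iterations):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        mu = sum(y)                        # equals 1^t M x because sum(x) == 1
        x = [v / mu for v in y]
    return mu


def mu_first_order(b, d, R):
    """First-order approximation (20), dropping the O(b^2) term."""
    return R * ((1.0 - d) + (R - 1) * d / (1.0 - d) * b)


mu_exact = malthusian_parameter(mean_matrix(b=1e-4, d=0.4, R=5))
mu_approx = mu_first_order(b=1e-4, d=0.4, R=5)
```

With $b=0$ the numerical value should agree with $R(1-d)$, and for small b the two values `mu_exact` and `mu_approx` should differ only at order $b^2$.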

Let us return to the case $b=0$ and consider the eigenvectors corresponding to the malthusian parameter $\mu $. The right eigenvector $\varvec{u}=(u_0,\ldots ,u_R)$ and the left eigenvector $\varvec{v}=(v_0,\ldots ,v_R)$ may be normalized so that $\varvec{v}^{\mathrm {t}}\varvec{u}=1$ and $\varvec{1}^{\mathrm {t}}\varvec{u}=1$, where ${}^{\mathrm {t}}$ denotes the transpose of a vector. In Antoneli et al. (2013b), it is shown that the normalized right eigenvector $\varvec{u}=(u_0,\ldots ,u_R)$ is given by

$$\begin{aligned} u_r=\left( {\begin{array}{c}R\\ r\end{array}}\right) \, (1-d)^r \, d^{R-r} \,. \end{aligned}$$

(21)

The fact that $\varvec{u}$ is a binomial distribution is not accidental. Indeed, it can be shown that $\varvec{u}$ is the probability distribution of a quantitative random variable $\rho $ defined on the set of replicative classes $\{0,\ldots ,R\}$, called the asymptotic distribution of classes, such that $u_r=\mathrm {binom}(r;R,1-d)$ gives the limiting proportion of particles in the rth replicative class. Finally, when $b \ne 0$ is small, spectral perturbation theory ensures that

$$\begin{aligned} u_r=\left( {\begin{array}{c}R\\ r\end{array}}\right) \, (1-d)^r \, d^{R-r} + \varvec{O}(b) \,. \end{aligned}$$

(22)

The phenotypic model is completely specified by the choice of the two probabilities b and d (since $c=1-b-d$), the maximum replicative capability R and a choice of a location-scale family of fitness distributions. Independent of the choice of family of fitness distributions, the parameter space of the model is the set $\triangle ^2 \times \{R\in \mathbb {N}:R \geqslant 1\}$, where $\triangle ^2=\{(b,d)\in [0,1]^2: b+d \leqslant 1\}$ is the two-dimensional simplex (see Fig. 10).

In this parameter space, one can consider the critical curves $\mu (b,d,R)=1$, where $\mu (b,d,R)$ is the malthusian parameter as a function of the parameters of the phenotypic model. For each fixed R, the corresponding critical curve is independent of the fitness distributions and represents the parameter values (b, d) such that the branching process is critical. Moreover, each curve splits the simplex into two regions representing the parameter values where the branching process is super-critical (above the curve) and sub-critical (below the curve).

One of the main results of Antoneli et al. (2013b) is a proof of the lethal mutagenesis criterion (Bull et al. 2007) for the phenotypic model, provided one assumes that all fitness effects are of a purely mutational nature. Recall that Bull et al. (2007) assume that all mutations are either neutral or deleterious and consider the mutation rate $U=U_d+U_c$, where the component $U_c$ comprises the purely neutral mutations and the component $U_d$ comprises the mutations with a deleterious fitness effect. Furthermore, $R_{\mathrm {max}}$ denotes the maximum replicative capability among all particles in the viral population. The lethal mutagenesis criterion proposed by Bull et al. (2007) states that a sufficient condition for extinction is

$$\begin{aligned} R_{\mathrm {max}}\,\mathrm {e}^{-U_d} < 1 \,. \end{aligned}$$

(23)

According to Bull et al. (2007, 2008), $\mathrm {e}^{-U_d}$ is both the mean fitness level and the fraction of offspring with no non-neutral mutations. Moreover, in the absence of beneficial mutations and epistasis (Kimura and Maruyama 1966), the only non-neutral mutations are the deleterious ones. Therefore, in terms of fitness effects, the probability $\mathrm {e}^{-U_d}$ corresponds to $1-d=c$. Since the evolution of the mean matrix depends only on the expected values of the fitness distributions $t_r$, it follows that $R_{\mathrm {max}}$ corresponds to R. That is, the lethal mutagenesis criterion (23) is formally equivalent to the extinction criterion

$$\begin{aligned} R(1-d) < 1 \end{aligned}$$

(24)

which is exactly the condition for the phenotypic model to become sub-critical. Formula (20) for the malthusian parameter provides a generalization of the extinction criterion (24) without the assumption that all effects are either neutral or deleterious. If $b>0$ is sufficiently small (up to order $\varvec{O}(b^2)$) and

$$\begin{aligned} R \left( (1-d) + (R-1)\dfrac{bd}{1-d}\right) < 1 \end{aligned}$$

(25)

then, with probability one, the population becomes extinct in finite time.

On the other hand, a deeper exploration of the implications of nonzero beneficial effects allowed for the discovery of a non-extinction criterion. If $b>0$ is sufficiently small (up to order $\varvec{O}(b^2)$), R is sufficiently large ($R \geqslant 10$ is enough) and

$$\begin{aligned} R^3 \, b > 1 \end{aligned}$$

(26)

then, asymptotically almost surely, the population cannot become extinct by increasing the deleterious probability d toward its maximum value $1-b$ (see Antoneli et al. 2013b for details). In other words, a small increase in the beneficial probability may have a drastic effect on the extinction probabilities, possibly rendering the population impervious to extinction by lethal mutagenesis (i.e., by the increase in deleterious effects).
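As a quick arithmetic reading of criteria (24) and (26), take the illustrative value $R=10$ (chosen here only as an example; it is the smallest value for which (26) is stated to apply). Solving each inequality for the relevant probability gives

$$\begin{aligned} R(1-d)<1 \;\Longleftrightarrow \; d> 1-\frac{1}{R} = 0.9 \,, \qquad R^3\,b>1 \;\Longleftrightarrow \; b > \frac{1}{R^3} = 10^{-3} \,. \end{aligned}$$

Under these assumptions, with $b=0$ the model is sub-critical only when more than 90% of the replication events are deleterious, whereas a beneficial probability just above one in a thousand already satisfies the non-extinction condition (26).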

In the theory of multitype branching processes, there are several variations: continuous time, age dependent, self-regulated, etc. (see Athreya and Ney 1972; Harris 1963; Kimmel and Axelrod 2002). The implementation of a variation of the theory of multitype branching processes accounting for the notions of evolutionary entropy and directionality theory (see Dietz 2005; Demetrius 2013) could be useful for studies on viral evolution. In this case, the malthusian parameter $\mu $, which is the dominant eigenvalue of the mean matrix, could be expressed as the sum of two terms

$$\begin{aligned} \mu = H + \varPhi \,. \end{aligned}$$

The quantity H is called evolutionary entropy and $\varPhi $ is called the reproductive potential (Demetrius 2013). An interesting direction to follow would be to develop an extinction criterion based on evolutionary entropy instead of the malthusian parameter.

C The Deterministic Selection Equation

According to Demetrius et al. (1985) and Demetrius (1985, 1987), one may associate to a multitype branching process a system of difference (or ordinary differential) equations, called selection equations, on the space of discrete probability distributions $\triangle ^{R+1}=\{\varvec{p}\in \mathbb {R}^{R+1}:p_j\geqslant 0;\sum _{j}p_j=1\}$ over the finite state set $\{0,\ldots ,R\}$. Given a discrete multitype branching process $\varvec{Z}_n$, the expectation values $\langle \varvec{Z}_n\rangle $ satisfy $\langle \varvec{Z}_n\rangle =\varvec{M}^n\varvec{Z}_0$, with $\varvec{M}$ being the mean matrix of $\varvec{Z}_n$. Hence $\langle \varvec{Z}_n\rangle $ is given by iteration of the difference equation $\varvec{z}_n= \varvec{M}\varvec{z}_{n-1}$. This yields a discrete-time selection equation by normalizing the difference equation, thereby obtaining

$$\begin{aligned} \varvec{x}_n = \dfrac{1}{\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}_{n-1}}\varvec{M}\varvec{x}_{n-1} \end{aligned}$$

(27)

where $\varvec{1}=(1,\ldots ,1)$. Then, passing (27) to continuous time one obtains a continuous-time selection equation

$$\begin{aligned} \dot{\varvec{x}} = [\varvec{M}\varvec{x}-\varvec{x}(\varvec{1}^\mathrm {t}\varvec{M}\varvec{x})] \dfrac{1}{\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}} \,. \end{aligned}$$

(28)

Multiplying the right hand side of Eq. (28) with the factor $\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}$, which is always strictly positive on $\triangle ^{R+1}$, corresponds to a change in velocity (re-scaling time) and so, the solutions of (28) are the same as the solutions of

$$\begin{aligned} \dot{\varvec{x}} = \varvec{M}\varvec{x}-\varvec{x}(\varvec{1}^\mathrm {t}\varvec{M}\varvec{x}) \end{aligned}$$

(29)

It follows from general considerations (see Demetrius et al. 1985; Demetrius 1985, 1987) that Eq. (29) has a unique globally stable equilibrium on $\triangle ^{R+1}$ given by the normalized right eigenvector $\varvec{u}$ of $\varvec{M}$ corresponding to its largest eigenvalue $\mu $. In this sense, the deterministic selection equation yields a description of the evolution of the normalized mean values of the corresponding stochastic model, thus defining a mean field (macroscopic) dynamics representing the infinite population limit of the branching process.
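The convergence of the selection equation (29) to the binomial profile (21) can be illustrated numerically in the case $b=0$. The Python sketch below integrates (29) with an explicit Euler scheme, which is a choice made here for simplicity and not a method taken from the original work; the step size, the number of steps, the uniform initial condition and the function names are all assumptions.

```python
from math import comb


def selection_equilibrium(d, R, dt=0.01, steps=5000):
    """Integrate x' = M x - x (1^t M x) by explicit Euler, for b = 0."""
    n = R + 1
    M = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        M[r][r] = r * (1.0 - d)        # neutral effect: class r -> r
        M[r - 1][r] = r * d            # deleterious effect: class r -> r - 1
    x = [1.0 / n] * n                  # start from the uniform distribution
    for _ in range(steps):
        Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        growth = sum(Mx)               # the scalar 1^t M x
        x = [x[i] + dt * (Mx[i] - x[i] * growth) for i in range(n)]
    return x


def binomial_profile(d, R):
    """Right eigenvector (21): u_r = binom(r; R, 1 - d)."""
    return [comb(R, r) * (1.0 - d) ** r * d ** (R - r) for r in range(R + 1)]


x_limit = selection_equilibrium(d=0.5, R=4)
u = binomial_profile(d=0.5, R=4)
gap = max(abs(a - b) for a, b in zip(x_limit, u))
```

The Euler step preserves the constraint $\sum _j x_j=1$ up to rounding, and `gap` measures how far the integrated state is from the binomial distribution after the chosen integration time.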

D The Power Law Distribution Family

It is typical to parameterize power law distributions by the exponent s, which measures the “weight of the tail” of the distribution. However, we need to have a location-scale parameterized family in order to impose the same normalization as we have done for the other types of distributions. Therefore, we define the power law distribution with mean value r by

$$\begin{aligned} \mathfrak {z}_r(k) = \frac{(k+1)^{-s(r)}}{\zeta (s(r))} \end{aligned}$$

for $k=0,1,\ldots ,\infty $ and $r \geqslant 1$, where $\zeta (s)$ is the Riemann zeta function, defined for $s>1$, by

$$\begin{aligned} \zeta (s) = \sum _{n=1}^{\infty } \frac{1}{n^s} \end{aligned}$$

and the function s(r) is given by the inverse function of

$$\begin{aligned} r = \varphi (s) = \frac{\zeta (s-1)}{\zeta (s)}-1 \,. \end{aligned}$$

Namely, $s=\varphi ^{-1}(r)$ for $r\geqslant 1$ and hence when $1 \leqslant r < \infty $ the exponent s satisfies $2<s<3$. Moreover, the Laurent series expansion for $r\rightarrow \infty $ ($s \rightarrow 2$) is given by:

$$\begin{aligned} s(r) \approx 2 + \frac{6}{\pi ^2(1+r-C)} \,. \end{aligned}$$

(30)

The constant C in the previous formula is given by $C = [6 \gamma \pi ^2 - 36\,\zeta '(2)]/\pi ^4 \approx 0.6974$, where $\gamma $ is Euler’s constant and $\zeta '(2)$ is the derivative of $\zeta (s)$ evaluated at 2. Observe that when the mean value $r \geqslant 1$, the exponent $s<3$, and so the variance of $\mathfrak {z}_r(k)$ is infinite.
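The relation between the mean value r and the exponent s can be explored numerically. The Python sketch below evaluates $\zeta (s)$ by a truncated sum with an Euler–Maclaurin tail correction, inverts $\varphi $ by bisection on $(2,3)$ (assuming $\varphi $ is decreasing there), and compares the result with the approximation (30). The function names, the truncation point and the bracketing interval are assumptions made for this illustration only.

```python
import math


def zeta(s, n=1000):
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(k ** -s for k in range(1, n))
    tail = n ** (1.0 - s) / (s - 1.0) + 0.5 * n ** -s + s * n ** (-s - 1.0) / 12.0
    return head + tail


def mean_from_exponent(s):
    """r = phi(s) = zeta(s - 1) / zeta(s) - 1, valid for s > 2."""
    return zeta(s - 1.0) / zeta(s) - 1.0


def exponent_from_mean(r, lo=2.0 + 1e-9, hi=3.0, iterations=100):
    """Invert phi by bisection, assuming phi is decreasing on (2, 3)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_from_exponent(mid) > r:
            lo = mid                   # mean too large: exponent must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exponent_asymptotic(r, C=0.6974):
    """Approximation (30) for large mean value r."""
    return 2.0 + 6.0 / (math.pi ** 2 * (1.0 + r - C))


s_exact = exponent_from_mean(100.0)
s_approx = exponent_asymptotic(100.0)
```

For large r the two values `s_exact` and `s_approx` should be close, while for r near 1 the approximation (30) is expected to be cruder.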

The implementation of the pseudo-random generation of samples from the distribution $\mathfrak {z}_r(k)$ in the ENVELOPE program is based on the algorithm of Devroye (1986) for the Zipf distribution on the positive integers, using formula (30) for the computation of the exponent s given the mean value r. Pseudo-random generation for the remaining fitness distributions was implemented using the standard library of the C++ programming language (this library requires C++11 or later).
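For orientation, the rejection method of Devroye (1986) for the Zipf distribution can be sketched in a few lines. The Python version below is an illustration only and is not the ENVELOPE code (which is written in C++): it assumes that the support is shifted by one so that it starts at $k=0$, consistent with the mean value $\zeta (s-1)/\zeta (s)-1$, and it adds an overflow guard that is not part of the published algorithm. The function name is hypothetical.

```python
import math
import random


def sample_zipf_shifted(s, rng):
    """Draw k in {0, 1, 2, ...} with P(k) proportional to (k + 1)**(-s), s > 1.

    Rejection sampler for the Zipf law on the positive integers,
    shifted by one (the shift is an assumption of this sketch).
    """
    am1 = s - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()         # uniform on (0, 1], avoids log(0)
        v = rng.random()
        log_x = -math.log(u) / am1
        if log_x > 700.0:
            continue                   # overflow guard (added assumption)
        x = math.floor(math.exp(log_x))          # candidate on {1, 2, ...}
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x - 1               # shift the support to start at 0


rng = random.Random(12345)
draws = [sample_zipf_shifted(2.5, rng) for _ in range(20000)]
```

Because the exponents of interest satisfy $s<3$, the variance is infinite, so sample means of `draws` converge slowly; class frequencies such as the fraction of zeros are a more stable check of a sampler like this one.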

E Main Routines of the ENVELOPE Program

About this article

Cite this article

Fabreti, L.G., Castro, D., Gorzoni, B. et al. Stochastic Modeling and Simulation of Viral Evolution. Bull Math Biol 81, 1031–1069 (2019). https://doi.org/10.1007/s11538-018-00550-4

Received: 22 December 2017
Accepted: 03 December 2018
Published: 14 December 2018
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s11538-018-00550-4
