Abstract
Reaction–diffusion models describing the movement, reproduction and death of individuals within a population are key mathematical modelling tools with widespread applications in mathematical biology. A diverse range of such continuum models has been applied in various biological contexts by choosing different flux and source terms in the reaction–diffusion framework. For example, to describe the collective spreading of cell populations, the flux term may be chosen to reflect various movement mechanisms, such as random motion (diffusion), adhesion, haptotaxis, chemokinesis and chemotaxis. The choice of flux terms in specific applications, such as wound healing, is usually made heuristically, and is rarely tested quantitatively against detailed cell density data. More generally, in mathematical biology, the questions of model validation and model selection have not received the same attention as the questions of model development and model analysis. Many studies do not consider model validation or model selection, and those that do often base the selection of the model on residual error criteria after model calibration is performed using nonlinear regression techniques. In this work, we present a model selection case study, in the context of cell invasion, with a very detailed experimental data set. Using Bayesian analysis and information criteria, we demonstrate that model selection and model validation should account for both residual errors and model complexity. These considerations are often overlooked in the mathematical biology literature. The results we present here provide a straightforward methodology that can be used to guide model selection across a range of applications. Furthermore, the case study we present provides a clear example where neglecting the role of model complexity can give rise to misleading outcomes.
Notes
Code available from GitHub https://github.com/ProfMJSimpson/Warne2019_BulletinofMathematicalBiology.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723. https://doi.org/10.1109/TAC.1974.1100705
Armstrong NJ, Painter KJ, Sherratt JA (2009) Adding adhesion to a chemical signaling model for somite formation. Bull Math Biol 71:1–24. https://doi.org/10.1007/s11538-008-9350-1
Barenblatt GI (2003) Scaling. Cambridge University Press, Cambridge
Berger J (2006) The case for objective Bayesian analysis. Bayesian Anal 1:385–402. https://doi.org/10.1214/06-BA115
Bianchi A, Painter KJ, Sherratt JA (2016) Spatio-temporal models of lymphangiogenesis in wound healing. Bull Math Biol 78:1904–1941. https://doi.org/10.1007/s11538-016-0205-x
Box GEP (1976) Science and statistics. J Am Stat Assoc 71:791–799. https://doi.org/10.1080/01621459.1976.10480949
Browning AP, McCue SW, Simpson MJ (2017) A Bayesian computational approach to explore the optimal duration of a cell proliferation assay. Bull Math Biol 79:1888–1906. https://doi.org/10.1007/s11538-017-0311-4
Browning AP, Haridas P, Simpson MJ (2018) A Bayesian sequential learning framework to parameterise continuum models of melanoma invasion into human skin. Bull Math Biol. https://doi.org/10.1007/s11538-018-0532-1
Cai AQ, Landman KA, Hughes BD (2007) Multi-scale modeling of a wound-healing cell migration assay. J Theor Biol 245:576–594. https://doi.org/10.1016/j.jtbi.2006.10.024
Clyde M, George EI (2004) Model uncertainty. Stat Sci 19:81–94. https://doi.org/10.1214/088342304000000035
Cohen Y, Galiano G (2013) Evolutionary distributions and competition by way of reaction–diffusion and by way of convolution. Bull Math Biol 75:2305–2323. https://doi.org/10.1007/s11538-013-9890-x
Consonni G, Fouskakis D, Liseo B, Ntzoufras I (2018) Prior distributions for objective Bayesian analysis. Bayesian Anal 13:627–679. https://doi.org/10.1214/18-BA1103
Crank J (1975) The mathematics of diffusion. Oxford University Press, Oxford
Drovandi CC, Pettitt AN (2013) Bayesian experimental design for models with intractable likelihoods. Biometrics 69:937–948. https://doi.org/10.1111/biom.12081
Edelstein-Keshet L (2005) Mathematical models in biology, 6th edn. SIAM, Philadelphia
Efron B (1986) Why isn’t everyone a Bayesian? Am Stat 40:1–5. https://doi.org/10.1080/00031305.1986.10475342
Flegg JA, McElwain DLS, Byrne HM, Turner IW (2009) A three species model to simulate application of hyperbaric oxygen therapy to chronic wounds. PLOS Comput Biol 5:e1000451. https://doi.org/10.1371/journal.pcbi.1000451
Flegg JA, Byrne HM, McElwain DLS (2010) Mathematical model of hyperbaric oxygen therapy applied to chronic diabetic wounds. Bull Math Biol 72:1867–1891. https://doi.org/10.1007/s11538-010-9514-7
Fortelius M, Geritz S, Gyllenberg M, Toivonen J (2015) Adaptive dynamics on an environmental gradient that changes over a geological time-scale. J Theor Biol 376:91–104. https://doi.org/10.1016/j.jtbi.2015.03.036
Gelman A (2008a) Objections to Bayesian statistics. Bayesian Anal 3:445–450. https://doi.org/10.1214/08-BA318
Gelman A (2008b) Rejoinder. Bayesian Anal 3:467–478. https://doi.org/10.1214/08-BA318REJ
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman & Hall, London
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014) Bayesian data analysis, 3rd edn. Chapman & Hall, London
Gerlee P (2013) The model muddle: in search of tumor growth laws. Cancer Res 73:2407–2411. https://doi.org/10.1158/0008-5472.CAN-12-4355
Gurney W, Nisbet R (1975) The regulation of inhomogeneous populations. J Theor Biol 52:441–457. https://doi.org/10.1016/0022-5193(75)90011-9
Haridas P, McGovern JA, McElwain DLS, Simpson MJ (2017) Quantitative comparison of the spreading and invasion of radial growth phase and metastatic melanoma cells in a three-dimensional human skin equivalent model. PeerJ 5:e3754. https://doi.org/10.7717/peerj.3754
Harris S (2004) Fisher equation with density-dependent diffusion: special solutions. J Phys A Math Gen 37:6267. https://doi.org/10.1088/0305-4470/37/24/005
Jackson PR, Juliano J, Hawkins-Daarud A, Rockne RC, Swanson KR (2015) Patient-specific mathematical neuro-oncology: using a simple proliferation and invasion tumor model to inform clinical practice. Bull Math Biol 77:846–856. https://doi.org/10.1007/s11538-015-0067-7
Jin W, Penington CJ, McCue SW, Simpson MJ (2016a) Stochastic simulation tools and continuum models for describing two-dimensional collective cell spreading with universal growth functions. Phys Biol 13:056003. https://doi.org/10.1088/1478-3975/13/5/056003
Jin W, Shah ET, Penington CJ, McCue SW, Chopin LK, Simpson MJ (2016b) Reproducibility of scratch assays is affected by the initial degree of confluence: experiments, modelling and model selection. J Theor Biol 390:136–145. https://doi.org/10.1016/j.jtbi.2015.10.040
Jin W, Shah ET, Penington CJ, McCue SW, Maini PK, Simpson MJ (2017) Logistic proliferation of cells in scratch assays is delayed. Bull Math Biol 79:1028–1050. https://doi.org/10.1007/s11538-017-0267-4
Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends Ecol Evol 19:101–108. https://doi.org/10.1016/j.tree.2003.10.013
Johnston ST, Shah ET, Chopin LK, McElwain DLS, Simpson MJ (2015) Estimating cell diffusivity and cell proliferation rate by interpreting IncuCyte ZOOM™ assay data using the Fisher–Kolmogorov model. BMC Sys Biol 9:38. https://doi.org/10.1186/s12918-015-0182-y
Johnston ST, Ross JV, Binder BJ, McElwain DLS, Haridas P, Simpson MJ (2016) Quantifying the effect of experimental design choices for in vitro scratch assays. J Theor Biol 400:19–31. https://doi.org/10.1016/j.jtbi.2016.04.012
Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91:1343–1370. https://doi.org/10.2307/2291752
King JR, McCabe PM (2003) On the Fisher–KPP equation with fast nonlinear diffusion. P R Soc Lond A Mat 459:2529–2546. https://doi.org/10.1098/rspa.2003.1134
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86. https://doi.org/10.1214/aoms/1177729694
Lambert B (2018) A student’s guide to Bayesian statistics, 1st edn. Sage Publications, Thousand Oaks
Lambert B, MacLean AL, Fletcher AG, Combes AN, Little MH, Byrne HM (2018) Bayesian inference of agent-based models: a tool for studying kidney branching morphogenesis. J Math Biol 76:1673–1697. https://doi.org/10.1007/s00285-018-1208-z
Liang CC, Park A, Guan JL (2007) In vitro scratch assay: a convenient and inexpensive method for analysis of cell migration in vitro. Nat Protoc 2:329–333. https://doi.org/10.1038/nprot.2007.30
Liepe J, Filippi S, Komorowski M, Stumpf MPH (2013) Maximizing the information content of experiments in systems biology. PLOS Comput Biol 9:e1002888. https://doi.org/10.1371/journal.pcbi.1002888
Maini P, McElwain DS, Leavesley D (2004a) Travelling waves in a wound healing assay. Appl Math Lett 17:575–580. https://doi.org/10.1016/S0893-9659(04)90128-0
Maini P, McElwain DS, Leavesley D (2004b) Traveling wave model to interpret a wound-healing cell migration assay for human peritoneal mesothelial cells. Tissue Eng 10:475–482. https://doi.org/10.1089/107632704323061834
Marchant BP, Norbury J, Sherratt JA (2001) Travelling wave solutions to a haptotaxis-dominated model of malignant invasion. Nonlinearity 14:1653–1671. https://doi.org/10.1088/0951-7715/14/6/313
Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328. https://doi.org/10.1073/pnas.0306899100
Matsiaka OM, Baker RE, Shah ET, Simpson MJ (2018) Mechanistic and experimental models of cell migration reveal the importance of intercellular interactions in cell invasion. bioRxiv preprint https://doi.org/10.1101/391557
Murray JD (2002) Mathematical biology: I. An introduction. Springer, New York
Nardini JT, Chapnick DA, Liu X, Bortz DM (2016) Modeling keratinocyte wound healing dynamics: cell–cell adhesion promotes sustained collective migration. J Theor Biol 400:103–117. https://doi.org/10.1016/j.jtbi.2016.04.015
Parker A, Simpson MJ, Baker RE (2018) The impact of experimental design choices on parameter inference for models of growing cell colonies. R Soc Open Sci 5:180384. https://doi.org/10.1098/rsos.180384
Pooley CM, Marion G (2018) Bayesian model evidence as a practical alternative to deviance information criterion. R Soc Open Sci 5:171519. https://doi.org/10.1098/rsos.171519
Ryan EG, Drovandi CC, McGree JM, Pettitt AN (2016) A review of modern computational algorithms for Bayesian optimal design. Int Stat Rev 84:128–154. https://doi.org/10.1111/insr.12107
Sarapata EA, de Pillis LG (2014) A comparison and catalog of intrinsic tumor growth models. Bull Math Biol 76:2010–2024. https://doi.org/10.1007/s11538-014-9986-y
Savla U, Olson LE, Waters CM (2004) Mathematical modeling of airway epithelial wound closure during cyclic mechanical strain. J Appl Physiol 96:566–574. https://doi.org/10.1152/japplphysiol.00510.2003
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464. https://doi.org/10.1214/aos/1176344136
Sengers BG, Please CP, Oreffo RO (2007) Experimental characterization and computational modelling of two-dimensional cell spreading for skeletal regeneration. J R Soc Interface 4:1107–1117. https://doi.org/10.1098/rsif.2007.0233
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Sherratt JA (2015) Using wavelength and slope to infer the historical origin of semiarid vegetation bands. Proc Natl Acad Sci USA 112:4202–4207. https://doi.org/10.1073/pnas.1420171112
Sherratt JA (2016) When does colonisation of a semi-arid hillslope generate vegetation patterns? J Math Biol 73:199–226. https://doi.org/10.1007/s00285-015-0942-8
Sherratt JA, Murray JD (1990) Models of epidermal wound healing. P R Soc Lond B Bio 241:29–36. https://doi.org/10.1098/rspb.1990.0061
Silk D, Kirk PDW, Barnes CP, Toni T, Stumpf MPH (2014) Model selection in systems biology depends on experimental design. PLOS Comput Biol 10:e1003650. https://doi.org/10.1371/journal.pcbi.1003650
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman & Hall, London
Simpson MJ, Landman KA, Hughes BD, Newgreen DF (2006) Looking inside an invasion wave of cells using continuum models: proliferation is the key. J Theor Biol 243:343–360. https://doi.org/10.1016/j.jtbi.2006.06.021
Simpson MJ, Zhang DC, Mariani M, Landman KA, Newgreen DF (2007) Cell proliferation drives neural crest cell invasion of the intestine. Dev Biol 302:553–568. https://doi.org/10.1016/j.ydbio.2006.10.017
Simpson MJ, Baker RE, McCue SW (2011) Models of collective cell spreading with variable cell aspect ratio: a motivation for degenerate diffusion models. Phys Rev E 83:021901. https://doi.org/10.1103/PhysRevE.83.021901
Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104:1760–1765. https://doi.org/10.1073/pnas.0607208104
Sisson SA, Fan Y, Beaumont M (2018) Handbook of approximate Bayesian computation, 1st edn. Chapman & Hall, London
Skellam JG (1951) Random dispersal in theoretical populations. Biometrika 38:196–218. https://doi.org/10.2307/2332328
Slezak DF, Suárez C, Cecchi GA, Marshall G, Stolovitzky G (2010) When the optimal is not the best: parameter estimation in complex biological models. PLOS ONE 5:e13283. https://doi.org/10.1371/journal.pone.0013283
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc B 64:583–639. https://doi.org/10.1111/1467-9868.00353
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2014) The deviance information criterion: 12 years on. J R Stat Soc B 76:485–493. https://doi.org/10.1111/rssb.12062
Stoica P, Selen Y (2004) Model-order selection: a review of information criterion rules. IEEE Signal Proc Mag 21:36–47. https://doi.org/10.1109/MSP.2004.1311138
Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C (2013) Approximate Bayesian computation. PLOS Comput Biol 9:e1002803. https://doi.org/10.1371/journal.pcbi.1002803
Swanson KR, Alvord EC, Murray JD (2002) Virtual brain tumours (gliomas) enhance the reality of medical imaging and highlight inadequacies of current therapy. Br J Cancer 86:14–18. https://doi.org/10.1038/sj.bjc.6600021
Swanson KR, Bridge C, Murray JD, Alvord EC (2003) Virtual and real brain tumors: Using mathematical modeling to quantify glioma growth and invasion. J Neurol Sci 216:1–10. https://doi.org/10.1016/j.jns.2003.06.001
Tsoularis A, Wallace J (2002) Analysis of logistic growth models. Math Biosci 179:21–55. https://doi.org/10.1016/S0025-5564(02)00096-2
Vanlier J, Tiemann CA, Hilbers PAJ, van Riel NAW (2012) A Bayesian approach to targeted experiment design. Bioinformatics 28:1136–1142. https://doi.org/10.1093/bioinformatics/bts092
Vittadello ST, McCue SW, Gunasingh G, Haass NK, Simpson MJ (2018) Mathematical models for cell migration with real-time cell cycle dynamics. Biophys J 114:1241–1253. https://doi.org/10.1016/j.bpj.2017.12.041
Warne DJ, Baker RE, Simpson MJ (2017) Optimal quantification of contact inhibition in cell populations. Biophys J 113:1920–1924. https://doi.org/10.1016/j.bpj.2017.09.016
Warne DJ, Baker RE, Simpson MJ (2018) Multilevel rejection sampling for approximate Bayesian computation. Comput Stat Data Anal 124:71–86. https://doi.org/10.1016/j.csda.2018.02.009
Warne DJ, Baker RE, Simpson MJ (2019) Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art. J R Soc Interface. https://doi.org/10.1098/rsif.2018.0943
Wilkinson RD (2013) Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat Appl Genet Mol 12:129–141. https://doi.org/10.1515/sagmb-2013-0010
Witelski TP (1995) Merging traveling waves for the Porous-Fisher’s equation. Appl Math Lett 8:57–62. https://doi.org/10.1016/0893-9659(95)00047-T
Yang Y (2005) Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92:937–950. https://doi.org/10.2307/20441246
Acknowledgements
This work is supported by the Australian Research Council (DP170100474). Ruth E. Baker is a Royal Society Wolfson Research Merit Award holder and a Leverhulme Research Fellow, and also acknowledges the Biotechnology and Biological Sciences Research Council for funding via Grant No. BB/R000816/1. Computational resources were provided by the eResearch Office, Queensland University of Technology. We thank the three referees for their insightful comments and suggestions.
Appendices
Appendix A: Analysis of Wave Front Concavity
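The Generalised Porous Fisher model (Eq. (4) in the main text) with \(\lambda = 0\), in one dimension, is
\[\frac{\partial C}{\partial t} = D_0 \frac{\partial }{\partial x}\left[ \left( \frac{C}{K}\right) ^r \frac{\partial C}{\partial x}\right] , \qquad (16)\]
where \(D_0\) is the free diffusivity and K the cell carrying capacity density. For the initial condition \(C(x,0) = C_0\delta (x)\), Eq. (16) has an exact solution,
\[C(x,t) = \frac{K}{h(t)}\left[ 1 - \left( \frac{x}{d_0 h(t)}\right) ^2\right] ^{1/r} \quad \text {for } |x| \le d_0 h(t), \qquad C(x,t) = 0 \quad \text {otherwise},\]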
where \(d_0 = C_0 \varGamma (1/r + 3/2)/(K\sqrt{\pi }\varGamma (1/r + 1))\), \(t_0 = d_0^2 r /(2D_0(r+2))\), \(h(t) = (t/t_0)^{1/(r + 2)}\) and \(\varGamma (x)\) is the Gamma function. This solution, often called the source solution for the porous media equation, has compact support, \(x \in [-d_0 h(t),d_0h(t)]\). Here, \(|x| = d_0 h(t)\) are the contact points. This solution is very different to the source solution for the linear diffusion equation, \(r=0\), which is a Gaussian function without compact support (Barenblatt 2003; Crank 1975).
Without loss of generality, we now only consider the positive real line \(x \ge 0\). The cell density is always decreasing as we approach the contact point, that is, \(\partial C /\partial x < 0\) for \(0< x < d_0 h(t)\). Specifically, we have
\[\frac{\partial C}{\partial x} = -\frac{2Kx}{r d_0^2 h(t)^3}\left[ 1 - \left( \frac{x}{d_0 h(t)}\right) ^2\right] ^{1/r - 1}. \qquad (17)\]
From Eq. (17), three different front properties are possible. As \(x \rightarrow d_0 h(t)\), we observe: (i) a front whose gradient vanishes, \(\partial C /\partial x \rightarrow 0\), for \(0< r < 1\), as in Fig. 2e–h; (ii) a sharp front with finite negative slope, for \(r = 1\), as in Fig. 2i–l, with \(\partial C /\partial x \rightarrow -2K/(d_0 h(t)^2)\); and (iii) a sharp front with unbounded negative slope, for \(r > 1\), as in Fig. 2m–p, with \(\partial C /\partial x \rightarrow -\infty \).
To explore the concavity of the density profile, C(x, t), at the contact point, it is sufficient to show how the sign of \(\partial ^2 C/\partial x^2\) at the contact point depends on r. The second derivative with respect to x, for \(r > 0\), is
\[\frac{\partial ^2 C}{\partial x^2} = -\frac{2K}{r d_0^2 h(t)^3}\left[ 1 - \left( \frac{x}{d_0 h(t)}\right) ^2\right] ^{1/r - 2}\left[ 1 - \left( \frac{2}{r} - 1\right) \left( \frac{x}{d_0 h(t)}\right) ^2\right] .\]
We have, for \(0< r < 1\), that \(\partial ^2 C/\partial x^2 > 0\) as \(x \rightarrow d_0 h(t)\). For \(r \ge 1\), \(\partial ^2C/\partial x^2 < 0\) as \(x \rightarrow d_0 h(t)\). Hence, at the contact point, \(x = d_0 h(t)\), the solution is concave down for \(r \ge 1\) and concave up otherwise.
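The front behaviour described above can be checked numerically. The sketch below assumes the source solution has the standard form \(C = (K/h)[1 - (x/(d_0 h))^2]^{1/r}\) inside the support and zero outside, with \(d_0\), \(t_0\) and \(h(t)\) as defined in this appendix. The function names and the parameter values in the usage loop are illustrative and not taken from the paper.

```python
import math
import numpy as np

def source_solution(x, t, r, D0, K, C0):
    """Evaluate the assumed source solution; returns (C, contact point d0*h)."""
    d0 = C0 * math.gamma(1 / r + 1.5) / (K * math.sqrt(math.pi) * math.gamma(1 / r + 1))
    t0 = d0**2 * r / (2 * D0 * (r + 2))
    h = (t / t0) ** (1 / (r + 2))
    # 1 - (x / (d0 h))^2, truncated at zero outside the compact support
    g = np.clip(1.0 - (np.asarray(x, dtype=float) / (d0 * h)) ** 2, 0.0, None)
    return (K / h) * g ** (1 / r), d0 * h

def second_difference_near_front(t, r, D0, K, C0, delta=1e-3):
    """Second finite difference just inside the contact point.

    A positive value indicates a concave-up profile, a negative value concave down.
    """
    _, front = source_solution(0.0, t, r, D0, K, C0)
    xs = front * (1.0 - delta * np.array([3.0, 2.0, 1.0]))
    C, _ = source_solution(xs, t, r, D0, K, C0)
    return C[0] - 2.0 * C[1] + C[2]

if __name__ == "__main__":
    # Illustrative parameter values only
    for r in (0.5, 1.0, 2.0):
        print(r, second_difference_near_front(2.0, r, D0=1.0, K=1.0, C0=1.0))
```

Under these assumptions the second difference should be positive for \(r = 0.5\) and negative for \(r = 1\) and \(r = 2\), and the profile should integrate to \(C_0\) over the support.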
Appendix B: Numerical Scheme
Here we describe our numerical scheme for the computational solution to the following reaction–diffusion equation:
with the initial condition,
and boundary conditions,
Consider N points in space, \(\{x_i\}_{i=1}^N\), with \(x_1 = 0\), \(x_N = L\) and \(\varDelta x = x_{i+1} - x_{i}\) for all \(i = 1,2,\ldots , N-1\). Similarly, define M temporal points, \(\{t_j\}_{j=1}^M\), with \(t_1 = 0\), \(t_M = T\) and \(\varDelta t = t_{j+1} - t_{j}\) for all \(j = 1,2, \ldots , M-1\). Next, define the notation, \(C_{i+k} = C(x_i + k\varDelta x,t)\), and \(C_{i+k}^{j+s}~=~C(x_i + k \varDelta x, t_j + s \varDelta t)\).
Let \(J(C) = - D(C)\partial C/ \partial x\) and substitute into Eq. (18) to yield
At the ith point, apply a first-order central difference to \(\partial J / \partial x\) with step \(\varDelta x/2\). The result is the system of ODEs
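Similarly, applying a first-order central difference to \(J(C_{i+1/2})\) and \(J(C_{i-1/2})\) with step \(\varDelta x/2\) yields
\[J(C_{i+1/2}) = -D(C_{i+1/2})\frac{C_{i+1} - C_{i}}{\varDelta x}, \qquad (20)\]
\[J(C_{i-1/2}) = -D(C_{i-1/2})\frac{C_{i} - C_{i-1}}{\varDelta x}. \qquad (21)\]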
It is important to note that we will only obtain a solution for \(C_{i+k}\) at integer values of k; therefore, the evaluation of the diffusion terms in Eqs. (20) and (21) cannot be directly computed since \(k = \pm 1/2\). We thus approximate with an averaging scheme,
After substitution of Eqs. (20), (21), (22) and (23) into Eq. (19), we have the coupled system of nonlinear ODEs defined in terms of our spatial discretisation,
The no-flux boundaries are enforced using first-order forward differences
where \(C_0\) and \(C_{N+1}\) represent the solution at “ghost nodes” that are not a part of the domain.
The ODEs are discretised in time using a first-order backward difference method leading to the backward-time, centred-space (BTCS) scheme,
While this scheme is first order in time and space, it has the advantage of unconditional stability.
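As a point of reference, one way to write a scheme of the kind described above is sketched below. The source term symbol \(S(C)\) and the arithmetic averaging of the diffusivity at the half nodes are assumptions made for illustration; the text only states that an averaging scheme is used, so the authors' exact expressions may differ.

```latex
% Semi-discrete system (assumed source term S(C), assumed arithmetic averaging of D)
\frac{\mathrm{d} C_i}{\mathrm{d} t}
  = \frac{1}{2\Delta x^2}\Big[ \big(D(C_{i+1}) + D(C_i)\big)\big(C_{i+1} - C_i\big)
  - \big(D(C_i) + D(C_{i-1})\big)\big(C_i - C_{i-1}\big) \Big] + S(C_i)

% Backward-time, centred-space (BTCS) update: all terms on the right at time level j+1
\frac{C_i^{j+1} - C_i^{j}}{\Delta t}
  = \frac{1}{2\Delta x^2}\Big[ \big(D(C_{i+1}^{j+1}) + D(C_i^{j+1})\big)\big(C_{i+1}^{j+1} - C_i^{j+1}\big)
  - \big(D(C_i^{j+1}) + D(C_{i-1}^{j+1})\big)\big(C_i^{j+1} - C_{i-1}^{j+1}\big) \Big] + S(C_i^{j+1})

% No-flux boundaries through ghost nodes
C_0^{j+1} = C_1^{j+1}, \qquad C_{N+1}^{j+1} = C_N^{j+1}
```

The first line follows from \(\partial C/\partial t = -\partial J/\partial x + S(C)\) with \(J = -D(C)\,\partial C/\partial x\), by differencing the flux over the half nodes \(i \pm 1/2\).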
Since the scheme is implicit, a nonlinear root-finding solver is required to compute the solution at \(t_{j+1}\) given a previously computed solution at time \(t_j\). To achieve this, we apply fixed-point iteration. We re-arrange the system to be of the form \(\mathbf {C}^{j+1} = \mathbf {G}(\mathbf {C}^{j+1})\) where \(\mathbf {C}^{j+1} = \left[ C_0^{j+1}, C_1^{j+1},\ldots ,C_{N+1}^{j+1}\right] ^\text {T}\). That is,
where
We then define the sequence \(\{\mathbf {X}^k\}_{k\ge 0}\), generated through the nonlinear recurrence relation \(\mathbf {X}^{k+1} = \mathbf {G}(\mathbf {X}^{k})\) with \(\mathbf {X}^{0} = \mathbf {C}^j\). This sequence is iterated until \(\Vert \mathbf {X}^{k+1} - \mathbf {X}^k\Vert _2 < \tau \), where \(\tau \) is the error tolerance and \(\Vert \cdot \Vert _2\) is the Euclidean vector norm. Once the sequence has converged, we set \(\mathbf {C}^{j+1} = \mathbf {X}^{k+1}\) and continue to solve for the next time step.
For a given set of model parameters, the spatial and temporal step sizes, \(\varDelta x\) and \(\varDelta t\), need to be selected. In particular, the following condition must hold to ensure accuracy, \(\max _{C \in [0,K]} D(C) < \varDelta x ^2 / \varDelta t\). We then refine \(\varDelta x\) and \(\varDelta t\) together to ensure solutions are independent of the discretisation. Note that as r increases, higher values of D(C) become valid; therefore, particular attention is required to generate Fig. 5 in the main text. The values of \(\varDelta x\), \(\varDelta t\) and \(\tau \) used for the simulations in this work are shown in Table 2. Note that in all cases the discretisation is more refined than required to solve the given problem accurately.
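A minimal sketch of one implicit time step solved by fixed-point iteration is given below. It is not the authors' implementation (their code is in the GitHub repository listed under Notes). The Jacobi-style rearrangement used for \(\mathbf {G}\), the arithmetic averaging of \(D\) at half nodes, and the names `btcs_step`, `D` and `S` are all assumptions made for illustration.

```python
import numpy as np

def btcs_step(C_old, dt, dx, D, S, tol=1e-8, max_iter=10000):
    """Advance one backward-Euler step by iterating X <- G(X) from X = C_old.

    D and S are callables giving the diffusivity and source term (assumed names).
    """
    X = C_old.copy()
    a = dt / dx**2
    for _ in range(max_iter):
        # Ghost nodes copy the boundary values, which enforces no-flux boundaries.
        Xg = np.concatenate(([X[0]], X, [X[-1]]))
        Dg = D(Xg)
        D_plus = 0.5 * (Dg[2:] + Dg[1:-1])    # diffusivity at i + 1/2 (assumed average)
        D_minus = 0.5 * (Dg[1:-1] + Dg[:-2])  # diffusivity at i - 1/2
        # Assumed rearrangement: solve each BTCS equation for the diagonal unknown.
        numerator = C_old + a * (D_plus * Xg[2:] + D_minus * Xg[:-2]) + dt * S(X)
        X_new = numerator / (1.0 + a * (D_plus + D_minus))
        if np.linalg.norm(X_new - X) < tol:   # Euclidean norm, as in the text
            return X_new
        X = X_new
    raise RuntimeError("fixed-point iteration did not converge")

if __name__ == "__main__":
    # Illustrative Porous Fisher-type example with made-up parameter values
    K, D0, lam = 1.0, 1e-3, 1.0
    x = np.linspace(0.0, 1.0, 51)
    C = np.where(x < 0.3, 0.8, 0.0)
    for _ in range(100):
        C = btcs_step(C, 0.01, x[1] - x[0],
                      D=lambda c: D0 * (c / K),
                      S=lambda c: lam * c * (1.0 - c / K))
    print(C.round(3))
```

With a zero source term this rearrangement conserves the discrete total mass up to the iteration tolerance, which gives a simple consistency check when refining \(\varDelta x\) and \(\varDelta t\).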
Appendix C: Computational Inference
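The Bayesian inference problems described in the main text all require the computation of the posterior PDF. Up to a normalisation constant, the posterior PDF is given by
\[p(\varvec{\theta } \mid \mathscr {D}) \propto \mathscr {L}(\varvec{\theta } ; \mathscr {D})\, p(\varvec{\theta }), \qquad (26)\]
where \(p(\varvec{\theta })\) is the prior PDF.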
If the posterior distribution can be sampled, the posterior PDF may be determined by using Monte Carlo integration. Thus, the main requirement is a method of generating N independent, identically distributed (i.i.d.) samples from the posterior distribution.
For many applications of practical interest, Eq. (26) cannot be used directly to generate the samples required since the likelihood is often intractable. Approximate Bayesian computation (ABC) techniques resolve this complexity through the approximation (Sunnåker et al. 2013)
\[p(\varvec{\theta } \mid \mathscr {D}) \approx p(\varvec{\theta } \mid d( \mathscr {D}, \mathscr {D}_s) < \epsilon ), \qquad (27)\]
where \(d( \mathscr {D}, \mathscr {D}_s)\) is a discrepancy metric between the true data, \(\mathscr {D}\), and simulated data, \(\mathscr {D}_s\sim \mathscr {L}(\varvec{\theta } ; \mathscr {D}_s)\) and \(\epsilon \) is the discrepancy threshold. ABC methods have the property that \(p(\varvec{\theta } \mid d( \mathscr {D}, \mathscr {D}_s) < \epsilon ) \rightarrow p(\varvec{\theta } \mid \mathscr {D})\) as \(\epsilon \rightarrow 0\). This leads directly to the ABC rejection sampling algorithm (Algorithm C.1). For deterministic models, under the assumption of Gaussian observation errors, \(\epsilon /\sigma \ll 1\), and \(d( \mathscr {D}, \mathscr {D}_s)\) taken as the sum of the squared errors, it can be shown that ABC methods are equivalent to exact posterior sampling (Wilkinson 2013).
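Algorithm C.1 is not reproduced in this text, so the following is a generic sketch of ABC rejection sampling rather than the authors' listing. The toy exponential-decay model, the uniform prior, the noise level and the threshold are hypothetical choices made only so the example runs quickly; the threshold is deliberately loose and does not satisfy \(\epsilon /\sigma \ll 1\).

```python
import numpy as np

def abc_rejection(data, simulate, sample_prior, discrepancy, eps, n_samples, rng):
    """Keep prior draws whose simulated data fall within eps of the observed data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior(rng)                       # theta ~ prior
        if discrepancy(data, simulate(theta, rng)) < eps:
            accepted.append(theta)                      # accept, otherwise discard
    return np.array(accepted)

# --- Hypothetical toy problem (not from the paper) ---
t = np.linspace(0.0, 1.0, 10)
sigma = 0.05

def simulate(theta, rng):
    """Deterministic model exp(-theta t) plus Gaussian observation noise."""
    return np.exp(-theta * t) + sigma * rng.standard_normal(t.size)

def sample_prior(rng):
    return rng.uniform(0.0, 5.0)

def sse(a, b):
    """Sum of squared errors, used as the discrepancy metric."""
    return float(np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
data = simulate(2.0, rng)                               # synthetic "observations"
samples = abc_rejection(data, simulate, sample_prior, sse,
                        eps=0.1, n_samples=200, rng=rng)
```

Reducing `eps` tightens the approximation to the posterior but lowers the acceptance rate, which is the cost that motivates the MCMC variant discussed next.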
In some cases, the acceptance probability in Algorithm C.1 is computationally prohibitive for small \(\epsilon \). In such situations, an ABC extension to Markov Chain Monte Carlo sampling may be applied (Marjoram et al. 2003). The resulting ABC MCMC sampling method (Algorithm C.2), under reasonable conditions on the proposal kernel \(K(\varvec{\theta }^{i} \mid \varvec{\theta }^{i-1})\), simulates a Markov chain with \(p(\varvec{\theta } \mid d( \mathscr {D}, \mathscr {D}_s) < \epsilon )\) (Eq. (27)) as its stationary distribution. It is essential to simulate the Markov chain for a sufficiently long time such that the \(N_T\) dependent samples are effectively equivalent to the required N i.i.d. samples.
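Algorithm C.2 is likewise not reproduced here. The sketch below follows the general likelihood-free MCMC idea of Marjoram et al. (2003) with a symmetric Gaussian random-walk proposal, so the kernel ratio cancels in the acceptance probability. The proposal choice, the function names and the toy model are assumptions for illustration, and the chain is assumed to start inside the prior support.

```python
import numpy as np

def abc_mcmc(data, simulate, log_prior, discrepancy, eps, theta0, step, n_steps, rng):
    """ABC MCMC with a symmetric Gaussian random-walk proposal (assumed kernel)."""
    theta0 = np.atleast_1d(np.asarray(theta0, dtype=float))
    chain = np.empty((n_steps + 1, theta0.size))
    chain[0] = theta0
    for i in range(1, n_steps + 1):
        current = chain[i - 1]
        proposal = current + step * rng.standard_normal(current.size)
        chain[i] = current                                # default: stay put
        # Prior ratio only; the symmetric proposal kernel cancels.
        log_ratio = log_prior(proposal) - log_prior(current)
        if rng.uniform() < np.exp(min(0.0, log_ratio)):
            # Move only if the simulated data are close enough to the observations.
            if discrepancy(data, simulate(proposal, rng)) < eps:
                chain[i] = proposal
    return chain

# --- Hypothetical toy problem (not from the paper) ---
t = np.linspace(0.0, 1.0, 10)
sigma = 0.05

def simulate(theta, rng):
    return np.exp(-theta[0] * t) + sigma * rng.standard_normal(t.size)

def log_prior(theta):
    """Uniform prior on [0, 5]; -inf outside the support."""
    return 0.0 if 0.0 <= theta[0] <= 5.0 else -np.inf

def sse(a, b):
    return float(np.sum((a - b) ** 2))

rng = np.random.default_rng(1)
data = simulate(np.array([2.0]), rng)
chain = abc_mcmc(data, simulate, log_prior, sse, eps=0.1,
                 theta0=[2.0], step=0.2, n_steps=2000, rng=rng)
```

Successive chain states are correlated, so in practice the chain would be run for much longer and checked for mixing before the samples are treated as effectively independent.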
Using either ABC rejection sampling (Algorithm C.1) or ABC MCMC sampling (Algorithm C.2), we can apply Monte Carlo integration to compute the posterior PDF as given in Eq. (27). For simplicity, we focus on the approximation of the jth marginal posterior PDF (Silverman 1986),
\[p(\theta _j \mid \mathscr {D}) \approx \frac{1}{Nb}\sum _{i=1}^{N} K\left( \frac{\theta _j - \theta _j^{(i)}}{b}\right) ,\]
where \(\theta _j\) is the jth element of \(\varvec{\theta }\), \(\theta _j^{(i)}\) are the jth elements of \(\varvec{\theta }^{(i)} \overset{\text {i.i.d}}{\sim } p(\varvec{\theta } \mid \mathscr {D})\), b is the smoothing parameter and K(x) is the smoothing kernel with property \(\displaystyle {\int _{-\infty }^{\infty } K(x) \, \text {d}x = 1}\).
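The kernel estimate of a marginal posterior PDF can be sketched as follows, assuming a Gaussian smoothing kernel. The kernel choice, the function name and the bandwidth value in the usage lines are illustrative; the text does not state which kernel or bandwidth the authors used.

```python
import numpy as np

def kde_marginal(samples_j, grid, b):
    """Kernel density estimate of a marginal PDF on `grid`.

    samples_j : 1-D array of posterior samples of the jth parameter
    b         : smoothing parameter (bandwidth)
    Uses a Gaussian kernel K(x) = exp(-x^2 / 2) / sqrt(2 pi), which integrates to one.
    """
    z = (grid[:, None] - samples_j[None, :]) / b
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples_j.size * b)

# Illustrative usage with synthetic samples
rng = np.random.default_rng(2)
samples_j = rng.standard_normal(500)
grid = np.linspace(-8.0, 8.0, 1601)
pdf = kde_marginal(samples_j, grid, b=0.3)
```

A larger `b` gives a smoother but more biased estimate, while a smaller `b` tracks the samples more closely at the cost of a noisier curve.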
Appendix D: Additional Results
In this appendix, we present extended results that are excluded from the main text for brevity. We provide more detailed information on the Bayesian analysis presented in Sects. 4.2 and 4.3. Furthermore, we extend the Bayesian inference problem, as provided in Sect. 4.2, to account for the treatment of uncertainty in the initial condition.
D.1 Joint Posterior Features
Here we report various descriptive statistics for the joint posterior PDFs computed in Sect. 4. For each posterior distribution, we report the posterior mode, the posterior mean, the variance/covariance matrix and the correlation coefficient matrix.
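Given cell density data, \(\mathscr {D}\), a set of continuum model parameters, \(\varvec{\theta }\), in parameter space \(\varvec{\varTheta }\subseteq \mathbb {R}^k\) with \(k > 0\), and a model implied through a likelihood function, \(\mathscr {L}(\varvec{\theta } ; \mathscr {D})\), summary statistics can be computed from the joint posterior, \(p(\varvec{\theta } \mid \mathscr {D})\), to obtain estimates and uncertainties on the true parameters. The maximum a posteriori (MAP) parameter estimate is the parameter set with the greatest posterior probability density as given by the posterior mode,
\[\hat{\varvec{\theta }}_{\text {MAP}} = \mathop {\text {arg\,max}}_{\varvec{\theta } \in \varvec{\varTheta }}\, p(\varvec{\theta } \mid \mathscr {D}).\]
The posterior mean is the central tendency of the parameters,
\[\mathbb {E}\left[ \varvec{\theta }\right] = \int _{\varvec{\varTheta }} \varvec{\theta }\, p(\varvec{\theta } \mid \mathscr {D}) \, \text {d}\varvec{\theta }.\]
The variance/covariance matrix, \(\varSigma \in \mathbb {R}^{k \times k}\), provides information on the multivariate uncertainties, that is, the spread of parameters. The (i, j)th element of \(\varSigma \), denoted by \(\sigma _{i,j}\), is given by
\[\sigma _{i,j} = \mathbb {C}\left[ \theta _i,\theta _j\right] = \mathbb {E}\left[ \left( \theta _i - \mathbb {E}\left[ \theta _i\right] \right) \left( \theta _j - \mathbb {E}\left[ \theta _j\right] \right) \right] ,\]
where \(\theta _i\) and \(\theta _j\) are the ith and jth elements of \(\varvec{\theta }\). Note that \(\mathbb {C}\left[ \theta _i,\theta _i\right] = \mathbb {V}\left[ \theta _i\right] \) and \(\sigma _{i,j} = \sigma _{j,i}\). Lastly, the correlation coefficient matrix \(R \in \mathbb {R}^{k \times k}\) measures the linear dependence between parameter pairs. The (i, j)th element of R, denoted by \(\rho _{i,j}\), is given by
\[\rho _{i,j} = \frac{\mathbb {C}\left[ \theta _i,\theta _j\right] }{\sqrt{\mathbb {V}\left[ \theta _i\right] \mathbb {V}\left[ \theta _j\right] }}.\]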
Note \(\rho _{i,i} = 1\) for all \(i \in [1,k]\), and \(\rho _{i,j} = \rho _{j,i}\). The results of all these statistics, for the inference problems considered in the main text, are presented in Tables 3, 4, 5, and 6.
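Given posterior samples, the mean, variance/covariance matrix and correlation coefficient matrix can be estimated directly, as in the sketch below. The function name and the synthetic samples are illustrative. The posterior mode is omitted because it requires a density estimate rather than simple sample moments.

```python
import numpy as np

def posterior_summaries(samples):
    """Sample-based estimates of the posterior mean, covariance and correlation.

    samples : array of shape (N, k), one posterior sample per row, with k >= 2
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)   # unbiased sample covariance, k x k
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)         # rho_ij = sigma_ij / sqrt(sigma_ii sigma_jj)
    return mean, cov, corr

# Illustrative usage with synthetic, correlated samples
rng = np.random.default_rng(3)
samples = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.6], [0.6, 2.0]], size=5000)
mean, cov, corr = posterior_summaries(samples)
```

By construction the correlation matrix returned here is symmetric with unit diagonal, matching the properties noted above.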
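D.2 Bivariate Marginal Posterior PDFs
In the main text, we computed only univariate marginal posterior PDFs; here we extend this analysis by providing bivariate marginal PDFs. For the Fisher–KPP and Porous Fisher models, with parameters \(D_0\), \(\lambda \) and K, we have three bivariate marginal posterior PDFs,
\[p(D_0,\lambda \mid \mathscr {D}) = \int p(D_0,\lambda ,K \mid \mathscr {D}) \, \text {d}K, \quad p(D_0,K \mid \mathscr {D}) = \int p(D_0,\lambda ,K \mid \mathscr {D}) \, \text {d}\lambda , \quad p(\lambda ,K \mid \mathscr {D}) = \int p(D_0,\lambda ,K \mid \mathscr {D}) \, \text {d}D_0.\]
Similarly, for the Generalised Porous Fisher model, with the additional parameter r, we have six bivariate marginal posterior PDFs,
\[p(D_0,\lambda \mid \mathscr {D}) = \iint p(D_0,\lambda ,K,r \mid \mathscr {D}) \, \text {d}K \, \text {d}r, \quad p(D_0,K \mid \mathscr {D}) = \iint p(D_0,\lambda ,K,r \mid \mathscr {D}) \, \text {d}\lambda \, \text {d}r, \quad p(D_0,r \mid \mathscr {D}) = \iint p(D_0,\lambda ,K,r \mid \mathscr {D}) \, \text {d}\lambda \, \text {d}K,\]
\[p(\lambda ,K \mid \mathscr {D}) = \iint p(D_0,\lambda ,K,r \mid \mathscr {D}) \, \text {d}D_0 \, \text {d}r, \quad p(\lambda ,r \mid \mathscr {D}) = \iint p(D_0,\lambda ,K,r \mid \mathscr {D}) \, \text {d}D_0 \, \text {d}K, \quad p(K,r \mid \mathscr {D}) = \iint p(D_0,\lambda ,K,r \mid \mathscr {D}) \, \text {d}D_0 \, \text {d}\lambda .\]
The resulting PDFs using the three initial density conditions are shown for: the Fisher–KPP model (Fig. 6); the Porous Fisher model (Fig. 7); and the Generalised Porous Fisher model (Fig. 8).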
D.3 Uncertainty in Initial Condition
In the main text, the assumption was made that \(C_{\text {obs}}(x,0) = C(x,0 ; \varvec{\theta })\). That is, we use initial observations as the initial density profile to simulate the model given parameters \(\varvec{\theta }\). Since the model is deterministic, the final form of the likelihood is a multivariate Gaussian distribution, which simplifies calculations considerably. Both Jin et al. (2016b) and Warne et al. (2017) indicate that such an assumption could result in underestimation of the uncertainties in parameter estimates.
Following from Warne et al. (2017), we take \(C_{\text {obs}}(x,0) = C(x,0 ; \varvec{\theta }) + \eta _0\), where \(\eta _0\) is a Gaussian random variable with mean zero and variance \(\sigma _0^2\). Note that we do not require \(\sigma _0 = \sigma \); in fact, there are reasons to consider \(\sigma _0 > \sigma \). For example, experimental protocols for seeding cell culture plates can be an additional source of variation in initial cell densities (Jin et al. 2016b; Warne et al. 2017). Since \(C_{\text {obs}}(x,0) \sim \mathscr {N}(C(x,0 ; \varvec{\theta }),\sigma _0^2)\), it is also true that \(C(x,0 ; \varvec{\theta }) \sim \mathscr {N}(C_{\text {obs}}(x,0),\sigma _0^2)\). Therefore, our models are to be treated as random PDEs with deterministic dynamics but random initial conditions.
Since the initial conditions are random, the initial condition is a latent variable that must be integrated out. Thus, the likelihood becomes
where \(\sigma _0\) is assumed to be known and \(p(C(x_i,0 ; \varvec{\theta }) \mid \sigma _0)\) is a Gaussian PDF with mean \(C_{\text {obs}}(x_i,0)\) and variance \(\sigma _0^2\). This likelihood integral must be computed using Monte Carlo methods. Computationally, we directly apply the ABC MCMC method given in Algorithm C.2. The only algorithmic difference is that the simulated data, \(\mathscr {D}_s\), are generated by solving the model PDE after a realisation of the initial density profile has been generated. Overall, this leads to slower convergence in the Markov chain and hence longer computation times.
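The modified simulation step can be sketched as below: a realisation of the initial profile is drawn around the observed initial densities and then passed to the PDE solver. The function name, the `solve_pde` callable and the addition of Gaussian observation noise to the model output are assumptions for illustration. No truncation of negative initial densities is applied here, which a practical implementation may need to handle.

```python
import numpy as np

def simulate_with_random_ic(theta, c_obs0, sigma0, sigma, solve_pde, rng):
    """Generate simulated data with a random initial density profile.

    c_obs0    : observed initial densities C_obs(x_i, 0)
    sigma0    : standard deviation of the initial-condition uncertainty
    sigma     : standard deviation of the observation error (assumed Gaussian)
    solve_pde : hypothetical callable (theta, c0) -> model densities
    """
    # Realisation of the latent initial condition, C(x_i, 0) ~ N(C_obs(x_i, 0), sigma0^2)
    c0 = c_obs0 + sigma0 * rng.standard_normal(c_obs0.shape)
    model = solve_pde(theta, c0)
    # Observation noise added to the deterministic model output (assumption)
    return model + sigma * rng.standard_normal(model.shape)
```

Passing this function as the simulator to an ABC MCMC routine leaves the rest of the sampler unchanged, consistent with the description above.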
The inference problem using random initial density profiles was solved using ABC MCMC under the Fisher–KPP model and the Porous Fisher model for initial densities based on 16,000 initial cells only. We take \(\sigma _0 = 2\sigma \). Univariate and bivariate marginal posterior PDFs are shown in Figs. 9 and 10. In the Fisher–KPP model, the additional uncertainty seems to have a significant effect on the uncertainty in the carrying capacity, K, in agreement with Warne et al. (2017). However, the diffusion coefficient, \(D_0\), and proliferation rate, \(\lambda \), are not affected as significantly.
For the Porous Fisher model, both \(D_0\) and K are greatly affected. This is not surprising, since motility is density dependent for the Porous Fisher model. By contrast, the marginal posterior PDF of \(D_0\) under the Fisher–KPP model is almost unaffected, since motility in that model is independent of the initial cell density.
Cite this article
Warne, D.J., Baker, R.E. & Simpson, M.J. Using Experimental Data and Information Criteria to Guide Model Selection for Reaction–Diffusion Problems in Mathematical Biology. Bull Math Biol 81, 1760–1804 (2019). https://doi.org/10.1007/s11538-019-00589-x
DOI: https://doi.org/10.1007/s11538-019-00589-x