A Continuation Technique for Maximum Likelihood Estimators in Biological Models

Estimating model parameters is a crucial step in mathematical modelling and typically involves minimizing the disagreement between model predictions and experimental data. This calibration data can change throughout a study, particularly if modelling is performed simultaneously with the calibration experiments, or during an on-going public health crisis as in the case of the COVID-19 pandemic. Consequently, the optimal parameter set, or maximal likelihood estimator (MLE), is a function of the experimental data set. Here, we develop a numerical technique to predict the evolution of the MLE as a function of the experimental data. We show that, when considering perturbations from an initial data set, our approach is significantly more computationally efficient that re-fitting model parameters while producing acceptable model fits to the updated data. We use the continuation technique to develop an explicit functional relationship between fit model parameters and experimental data that can be used to measure the sensitivity of the MLE to experimental data. We then leverage this technique to select between model fits with similar information criteria, a priori determine the experimental measurements to which the MLE is most sensitive, and suggest additional experiment measurements that can resolve parameter uncertainty.


Introduction
As quantitative modeling becomes more prevalent across biology and medicine [Altrock et al., 2015;Perelson, 2002;Sanche et al., 2020], mathematical models are increasingly being developed during the experimental data collection that will inform model parameters.This cooperation facilitates the use of mathematical modelling to inform experimental design and suggest potential intervention strategies [Cárdenas et al., 2022;Luo et al., 2022;Sanche et al., 2020;Zhang et al., 2022].The COVID-19 pandemic is a striking example of the resulting feedback loop, where mathematical models suggest intervention strategies that influence the evolving public health crisis before being re-calibrated to new data.[Davies et al., 2020;Holmdahl and Buckee, 2020;Thompson, 2020].
Each updated data set requires re-calibration of the model typically through computationally expensive optimization techniques.To reduce this computational cost of the re-calibration step, it is common to use the existing parameters as a starting point when performing parameter fitting to incoming experimental data sets.This approach recycles optimization work but does not utilize leverage the relationship between the initial and updated experimental data set.Here, we present a computational method to incorporate information about evolving data sets during the model validation and parameter estimation steps.
Specifically, for given model parameters and an initial experimental data set, we develop a method to predict the best-fit parameter set to an updated experimental data set.Our approach can be viewed as a numerical continuation technique [De Souza and Humphries, 2019;Dhooge et al., 2008].However, rather than studying the dynamical properties of the mathematical model as a function of model parameters, we consider the evolution of best-fit model parameters as a function of the experimental data.We use the necessary condition for a local optima to write the best-fit parameters as an implicit function of the experimental data.Thus, we predict best-fit parameter sets for evolving experimental data without performing any optimization.Avoiding optimization leads to significant computational savings and we demonstrate these gains via two examples.In both these examples, our prediction method produces comparable model fits to randomly perturbed data sets to optimization techniques without the computational cost of solving the inverse optimization problem.
While our approach does lead to increased computational efficiency, the more immediate application of our work may be in experimental design.Specifically, we identify an explicit relationship between individual best-fit parameter values and individual experimental data points through our continuation approach.We can therefore quantify which experimental measurements are the most informative for determining bestfit parameters and measure the sensitivity of parameter estimates to perturbations in data.The role of experimental design in model selection and parameterization has been extensively studied [Cárdenas et al., 2022;Li andVu, 2013, 2015;Silk et al., 2014].In particular, Li and Vu [2015] studied how correlations between best-fit model parameters can impact practical and structural identifiability of model parameters while Cárdenas et al. [2022]; Silk et al. [2014] explored how experimental design impacts model selection from a class of possible mathematical models.Conversely, our contribution explicitly relates individual experimental measurements with individual best-fit parameter estimates.We explicitly link our continuation technique to the Fisher information matrix commonly used in optimal experimental design [Braniff et al., 2019b;Kreutz and Timmer, 2009].Taken together, our approach allows the increased confidence in model parametrization from optimal experimental design to be mapped directly to individual model parameters.Accordingly, we can therefore design experiments to address specific uncertainties in parameter estimates.Furthermore, our work offers a distinct step towards understanding how robust parameter estimates are to evolving data.Many existing computational methods quantify confidence in parameterization; formal parameter sensitivity analyses [Maiwald et al., 2016;Marino et al., 2008;Zi, 2011], virtual population approaches [Allen et al., 2016;Cassidy and Craig, 2019;Jenner et al., 2021], or parameter identifiability analysis [Castro and de Boer, 2020], often via profile likelihood computation [Kreutz et al., 2012;Raue et al., 2014Raue et al., , 2009]], quantify how robust model predictions are to parameter variation.In particular, these techniques view the experimental data as fixed up to experimental noise and focus on the relationship between model parameters and model predictions.We offer a complementary approach to existing sensitivity analysis by explicitly studying how the best-fit parameters vary due to changes in calibration data.As we will see, our approach encodes information from local sensitivity analysis when calculating the functional relationship between the best-fit parameters and the calibration data.Consequently, while classical sensitivity analysis quantifies variability in model output due to change in model parameters, our approach considers changes in model parameters, and thus model predictions, as a function of the calibration data.We demonstrate this mapping of experimental data to best-fit parameter via an example drawn from mathematical oncology [Cassidy et al., 2021].These results, when combined with existing information criteria like the AIC or BIC [Kass and Raftery, 1995], allow for modellers to quantify the robustness of best-fit parameter estimates when comparing different model fits to experimental data.
The remainder of the article is structured as follows.We begin by defining the optimization problem in Section 2.1.We develop the continuation method in Section 2.2, discuss our numerical implementation in 2.3, and explore the connection between our continuation approach and classical profile likelihood in 3.1.We then turn to two examples from mathematical biology to illustrate the utility of our technique in Section 3.2 before finishing with a brief discussion.

Formulation of the optimization problem
Here, we introduce the framework of the underlying optimization problem.We focus on ordinary differential equation (ODE) models representing biological processes, as these models are common throughout mathematical biology.However, our approach extends to partial differential equation or delay differential equation models directly.We consider a generic ODE based model throughout the remainder of this work.
Let the model states be given by x(t) ∈ R n with model parameters denoted by θ ∈ Ω ⊂ R p where Ω is a subset of biologically plausible parameter values.We explicitly allow the initial condition x(0) to depend explicitly on the model parameters θ .Taken together, we consider the differential equation model where f is continuously differentiable in x and θ .
We consider calibration data {φ i } d×m i=1 representing m measurements each taken at d time points {t i } d i=1 .It is possible that model species are not directly comparable against the calibration data so we define the m model observables by In what follows, we consider m = 1 for notational simplicity although the analysis extends for m 2.

Likelihood function and objective function
Remark 2.1.The methods that follow do not assume a specific objective function.However, we do assume that the objective function is twice continuously differentiable as is commonly the case.For simplicity, we present the remainder of our results using the common log-likelihood formulation [Maiwald et al., 2016;Stapor et al., 2018].
The likelihood describes the probability of observing experimental data φ as a function of θ and is given by The experimental error at each measurement point, σ i , can be estimated as an additional model parameter or fixed to a known value.Here, we follow Sharp et al. [2022] and take σ i fixed at a known constant value, although it is possible to include σ i in the vector of unknown parameters θ .The maximum likelihood estimator (MLE) θ * , and thus best-fit model parameters for the given experimental data φ , is defined by the solution of the inverse problem As the differential equations defining y(x(t, θ )) rarely have explicit solutions, the likelihood (2) is difficult to evaluate analytically.It is therefore standard to minimize the negative log-likelihood (3) Under the assumption that σ i = σ is fixed, the error term log √ 2πσ 2 and denominator of G(θ , φ ) are constant and do not influence the solution of the optimization problem.The maximum likelihood estimator θ * is the parameter set that minimizes G(θ , φ * ).A number of computational techniques exist to minimize G(θ , φ ) and thus calculate θ * .These optimization techniques typically require simulating the mathematical model (1) at each optimization step.Further complicating the optimization, G(θ , φ * ) is often non-convex with multiple local minima.

Continuation of maximal likelihood estimator
In (3), we explicitly write the objective function G as a function of the model parameters θ and the experimental data φ .Accordingly, the MLE θ * is an implicit function of the experimental data φ defined as the solution of the optimization problem Model fitting is increasingly performed concurrently with experiments [Luo et al., 2022] or obtained from an evolving real-world scenario, as in epidemic modelling [Sanche et al., 2020].In both of these cases, the experimental data is evolving and should not be considered as known and constant.Accordingly, we are interested in the MLE as a function of the experimental data φ .Most existing optimization techniques consider the experimental data fixed and omit this dependence.Here, we develop a continuation type technique to compute the evolution of θ * numerically as a function of φ from an initial solution of the optimization problem.Ultimately, we calculate the evolution of θ * (φ ) as the calibration data varies to generate a curve of potential MLEs in (φ , θ * ) space using a numerical continuation technique.
Numerical continuation methods compute branches of implicitly defined curves.A standard application of these continuation type techniques in mathematical biology is numerical bifurcation analysis [Dhooge et al., 2008;Sanche et al., 2022].In their most common form, numerical bifurcation techniques compute equilibrium systems of a non-linear dynamical system as a function of model parameters but can be used to detect much richer dynamical behaviour [De Souza and Humphries, 2019].Often, these continuation techniques leverage "predictor-corrector" algorithms.Predictor-corrector approaches use the implicit function theorem to predict the solution to the corresponding non-linear system of equations.Then, the predicted solution is used as a starting value to explicitly calculate the solution of the system of equations during the corrector step.Here, we develop a similar "prediction-correction" strategy to predict the behaviour of the solution θ * (φ ) of the inverse problem (4) as a function of the data φ .We focus on the "predictor" step, as the corrector step, if necessary, can utilize existing numerical optimization techniques to calculate the MLE.
As the log-likelihood (3) is continuously differentiable, local optimal must satisfy D θ G(θ * , φ ) = 0, (5) so we necessarily have However, unlike the implicit equation used to determine equilibria of a dynamical system and used in continuation techniques for numerical bifurcation analysis, the optimality condition ( 5) is a necessary, but not sufficient, condition for θ * to be a MLE.Models that are not structurally identifiable [Raue et al., 2014] have manifolds in parameter space on which this optimality constraint holds but are not necessarily MLEs.
We discuss the relationship between our approach and profile likelihood classifications of structural identifiability in Section 3.1.Now, let θ * 0 be the MLE for calibration data φ 0 .Further, let the Hessian D 2 θ G(θ , φ ) be invertible at Then, the implicit function theorem ensures the existence of a function Ψ(φ ) such that The implicit function theorem ensures that Ψ exists but computing Ψ(φ ) analytically is functionally impossible.However, the implicit function Ψ(φ ) is continuously differentiable and we expand Ψ as a function of the calibration data φ using Taylor series where φ + ∆φ is the updated calibration data.Then, to predict Ψ starting from a known solution Ψ(φ ) = θ * we calculate DΨ(φ ).The implicit function theorem implies that We thus use DΨ to evaluate (6) and thus perform the continuation step.

Numerical Implementation
We now show how to use the objective function (3) to calculate finite difference approximations to the derivatives included in (6).As before, we assume that we are given a point For θ n denoting the n-th parameter, we calculate The derivatives ∂ θ n y i (θ ) can be calculated through finite difference schemes [Zi, 2011] where ∆θ n is a small perturbation in only the n-th parameter.In practice, it is standard to take ∆θ n to be some small percentage of the initial parameter θ n [Li et al., 2011].In this case, computing D 2 θ ,φ G(Ψ(φ ), φ ) requires 2p model simulations where p is the number of model parameters.We note that ∂ θ n y i (θ ) is commonly used to perform local sensitivity analysis and that more accurate finite difference approximations, such as centered differences, can be used to calculate D 2 θ ,φ G(Ψ(φ ), φ ).Calculating the Hessian D 2 θ G(θ , φ ) via finite differences is simple to implement but computationally expensive due to the number of objective function evaluations.However, the Hessian, or the observed Fisher Information, is commonly used throughout parameter optimization algorithms and other techniques such as profile likelihood calculations, estimates of the likelihood function, and classical sensitivity anaylsis, which has led to recent advances in the development of computationally efficient techniques to calculate D 2 θ G(θ , φ ) [Stapor et al., 2018] and the ability to recycle these calculations to avoid computational cost.
In the following examples, we use a finite difference scheme to calculate D 2 θ G(θ , φ ).We calculate the diagonal elements of D 2 θ G(θ , φ ) using forward second order differences and the off-diagonal terms by Thus, our computation of the Hessian requires 2p(p + 1) objective function evaluations, although, as mentioned, more efficient implementations are available.In fact, many gradient-based optimization techniques approximate the Hessian D 2 θ ,θ G(θ , φ ) at each iteration [MATLAB, 2017].For example, both fmincon and fminunc in [MATLAB, 2017] calculate D 2 θ ,θ G(θ , φ ) at each step and print the pre-computed Hessian as an output of the optimizer.It is therefore possible, and efficient, to recycle this calculation when calculating an update to θ * 0 using (5).All told, this numerical implementation requires 2p(p + 2) model simulations to evaluate (5).This computational cost is certainly not optimal but does benefit from re-using calculations performed in local sensitivity analysis and the optimization step.Finally, while we have written (5) with the inverse of D 2 θ G(θ , φ ), it is computationally more appropriate to solve the linear system of equations for the unknown DΨ.

Relationship with existing techniques
There are a number of existing techniques to study the relationship between model parameters and data.While our continuation technique focuses on the relationship between the MLE and the calibration data, it has many ties to these existing techniques.We therefore discuss how this continuation method relates to parameter identifiability as assessed by the profile likelihood; local sensitivity analysis; and experimental design, with a focus on using the explicit relationship between data and the MLE to suggest additional experimental measurements.

Parameter identifiability
Thus far, we have explicitly written the MLE estimator as a function of the experimental data used to fit a model.Our approach is intrinsically related to parameter identifability analysis.Identifiability analysis attempts to determine if available experimental observations are capable to uniquely determine model parameters.Accordingly, the practical identifiability of a mathematical model depends on available experimental data.The profile likelihood, given by ), and introduced by Raue et al. [2009], is a projection of the likelihood function onto the model parameter θ i = c.The profile likelihood illustrates the behaviour of the likelihood function as the parameter θ i is fixed away from the optimal value θ * i .The shape of PLE θ i (c) illustrates the confidence interval of the parameter estimate θ * i for given experimental data.Formally, Raue et al. [2009] define these confidence intervals by where ∆ α = χ 2 (α, d f ) is the χ 2 distribution at significance level α and d f degrees of freedom [Raue et al., 2009].A parameter is practically identifiable in the sense of Raue et al. [2009] with confidence level α if C.I.(θ i , α) is bounded in parameter space for given experimental data.Conversely, a non-identifiable parameter has a profile likelihood that does not increase past the threshold ∆ α .
The profile likelihood is intrinsically linked to the available experimental data φ i .We view the PLE as a function of both the parameter θ i and the experimental data φ For practically unidentifiable models, it is natural to ask what perturbations to the experimental data could render the model practically identifiable.Raue et al. [2009] use the profile likelihood of a model parameter to suggest additional experiments to resolve practical non-identifiability.They simulate the model for parameter values along PLE θ i to suggest additional experimental measurements at times t s,i , where t s,i represents the i−th simulated measurement time.In our framework, we define We note that the definition of θ * | θ i =c (φ ) is precisely that of θ * (φ ) with the added constraint that θ i = c.We can calculate D φ θ * | θ i =c as a function of the experimental data φ in precisely the same manner as described previously.Consequently, our continuation approach can complement the experimental design approach suggested by Raue et al. [2009] by incorporating the sensitivity of the MLE to perturbations in the (simulated or experimental) calibration data.

Sensitivity analysis
Local sensitivity analysis quantifies how small perturbations of the best-fit parameters impact model output [Zi, 2011].A standard approach to local sensitivity analysis is using the finite difference approximation of to identify which parameter values strongly impact model projections.When |S n | is small, the model output is considered to be insensitive to θ n .The n-th row of D 2 θ ,φ G(Ψ(φ ), φ ) is precisely S n (t i ) for t i corresponding to calibration data measurements.When implementing (5), the magnitude of the continuation step DΨ(φ )∆φ in the direction of θ n is scaled by S n .This scaling encodes the local sensitivity of model predictions to variations in parameters in the prediction of Ψ(φ ).Consequently, our continuation method naturally includes the information gained from local sensitivity analysis.

Experimental design
In our derivation of DΨ, we assumed that the Hessian matrix D 2 θ G(θ , φ ) was invertible.The Hessian gives the curvature of the loglikelihood and is known as the observed Fisher information matrix I obs .The observed Fisher information is a local measurement in data space.Conversely, the expected Fisher information considers the entirety of data space for fixed model parameters θ .The expected Fisher information is obtained by taking the expectation of D 2 θ G(θ , φ ) over all possible experimental measurements φ and is defined via Many existing experimental design methods leverage the expected Fisher information matrix to minimize the covariance in model parameter estimates via the Cramér-Rao inequality.These experimental design techniques typically maximize some aspect, often the determinant, of the Fisher information matrix as a function of possible data to select the most informative calibration data set [Kreutz and Timmer, 2009].From a geometric perspective, maximizing the determinant of the Fisher information matrix corresponds to minimizing the volume of the confidence ellipsoid engendered from the covariance matrix [Braniff et al., 2019b].
In particular, Braniff et al. [2019a] considered the case of bistable gene regulatory networks where the fold bifurcation and unstable manifold between stable equilibria complicates experimental design and parameter estimation.Sharp et al. [2022] considered an information-geometry perspective to propose the expected Fisher information matrix and resulting Riemannian manifold as a guide for data collection.As is often the case, both Sharp et al. [2022] and Braniff et al. [2019a] used the expected Fisher information, which considers all possible calibration data via the expectation over φ .Here, we show how our approach complements the classical Fisher information approach to experimental design, albeit through a local measurement, in (θ , φ ) space.We recall that were the identity, then DΨ would correspond to the Fisher information approach to measuring uncertainty in MLE.
In the calculation of DΨ∆φ , the matrix D 2 θ ,φ G(Ψ(φ ), φ ) maps perturbations in the calibration data ∆φ through the curvature of the loglikelihood to changes in the MLE.Consequently, D 2 θ ,φ G(Ψ(φ ), φ ) acts as a change of basis matrix from the space of calibration data to parameter space.Simply, D 2 θ ,φ G(Ψ(φ ), φ )∆φ scales changes in the calibration data to the confidence ellipsoid in parameter space obtained from [I obs ] −1 .Geometrically, if D 2 θ G has eigenvalues λ i with corresponding eigenvectors ν i , then choosing ∆φ such that θ G∆φ translates perturbations in calibration data to the corresponding eigenspace of the covariance matrix.
For example, the i−th column of DΨ maps perturbations of the i−th data point to changes in the MLE.Specifically, the sum where informative is understood as the data point inducing the largest sensitivity in the MLE.As an extreme example, if then perturbations in φ n do not impact the MLE estimate, which implies complete insensitivity of the model fit to φ k .This example corresponds to ∆φ belonging to the kernel of the matrix D 2 θ ,φ G since we have assumed that D 2 θ G is invertible.We can therefore utilize our analysis to identify which additional experimental measurements could increase confidence in model parameterization.Consider k additional measurements {φ s,i = y s,i (θ * )} k i=1 taken directly from the model simulation at times {t s,i } k i=1 where the subscript s indicates simulated data.Including {φ s,i } in the objective function (3) does not change the MLE or objective value function as these simulated data exactly match the model values.However, DΨ(φ + ∆φ s,i ) quantifies the sensitivity of the MLE to variability in the k simulated measurements.Accordingly, the measurement that maximizes DΨ(φ + ∆φ s,i ) for a fixed perturbation size ∆ is a good candidate for an additional experimental measurement to decrease parameter uncertainty.

Examples
The continuation framework derived earlier is applicable to a large variety of models throughout in the mathematical biology literature.To demonstrate the utility of the continuation method, we consider two examples from distinct fields and model formulations.First, we consider a mathematical model of phenotypic heterogeneity in non-small cell lung cancer (NSCLC) [Cassidy et al., 2021].This model is given by a system of two non-local, structured PDEs representing the density of drug-sensitive and drug-tolerant NSCLC cells.The PDE model is equivalent to a system of integral equations following the introduction of two auxiliary variables which can be further reduced to a system of ODEs (see [Cassidy et al., 2021] for details).The parameters of the ODE model were fit to in vitro NSCLC data taken from growth experiments in treated and untreated media [Cassidy et al., 2021].
We also consider a classical model of HIV-1 viral dynamics.This model has been used extensively to understand viral dynamics data [Perelson, 2002] and the identifiability of model parameters was considered by Wu et al. [2008].In that work, Wu et al. [2008] used simulated data to validate their identifiability results; we follow Wu et al. [2008] and use simulated data to illustrate our approach.

A PDE model of phenotypic switching in mathematical oncology
Non-genetic phenotypic heterogeneity has been increasingly studied as a driver of treatment resistance in solid cancers [Goldman et al., 2015].A number of mathematical models have been derived to study the emergence of phenotypic plasticity in cancer cell lines [Craig et al., 2019;Gunnarsson et al., 2020;Jolly et al., 2018;Sahoo et al., 2021].We consider the Cassidy et al. [2021] model that tracks the density of NSCLC cells with a drug-sensitive (A(t, a)) or drug-tolerant (B(t, a)) phenotype at time t and age a.The total number of cells of each phenotype is given by The total number of NSCLC cells is given by N(t) = Ā(t) + B(t).Cassidy et al. [2021] considered logistic growth with an Allee effect, wherein cooperation between cells of the same phenotype can lead to increased growth rates, given by where r A and r B are phenotype specific growth rates, the carrying capacity is K, and the strength of the Allee effect is Finally, drug-tolerant and drug-sensitive cells have phenotype-specific death rates d B and A(t, a) and B(t, a) satisfy the age structured PDEs with boundary conditions corresponding to cellular reproduction given by The functions β i j represent the probability of a reproducing mother cell with age a and phenotype i giving birth to a daughter cell with phenotype j.The probability of phenotypic inheritance is given by where σ i represents the decay rate of intracellular signalling factors that modulate how ageing impacts the probability of daughter cells retaining the mother cells phenotype, and Further details, including a derivation of the initial conditions of ( 10), model analysis, and reduction of the phenotype switching mode (10) to a system of ODEs can be found in Cassidy et al. [2021].
The model ( 10) was fit to in vitro experimental data corresponding to NSCLC cell population growth in untreated and treated environments where treatment is applied from day 3 onwards.The calibration data is 4 data points {φ i } 4 i=1 collected at time t i = 0, 2, 4, 6 days in the control experiment, and two additional data points {φ i } 6 i=5 collected on days t i = 4, 6 days during the treated experiment.As anti-cancer treatment is applied from day 3 on-wards and decreases the cancer cell population, we necessarily have φ 5 φ 3 and φ 6 φ 4 .We denote the experimental data used to parametrize the model by {φ 0 i } 6 i=1 .The model output corresponding to the experimental measurements is thus and the objective function is the standard sum of squares error given by
We perturbed the experimental data collected by Craig et al. [2019] with increasing amounts of Gaussian noise.We created 10 perturbed data sets {φ j i } 6 i=1 where the index j = 1, 2, ..., 10, denotes the j-th perturbed data set and the normally distributed noise with µ = 0, σ 2 = 1, and scaled such that log 10 (φ j i ) − log 10 (φ * i ) = (0.05 + jh step × (0.75 − 2 × 0.05) 2 10(11) log 10 (φ 0 i ) where h step = 0.65/55 was chosen such that log 10 (φ 10 i ) − log 10 (φ 0 i ) = 0.75 log 10 (φ 0 i ) .We enforce that this randomly perturbed data satisfies φ 5 φ 3 and φ 6 φ 4 .For each perturbed data set {φ j i }, we used the continuation method described in Section 2.2 to calculate The naive approach to calculate the MLE θ * (φ j ) for updated data φ j would be to use the MLE from the previous data, θ * (φ j−1 ), as an initial starting guess for the parameter fitting step.Hence, to illustrate the utility of our continuation technique, we calculated Ψ(φ j ) using ( 12) and then calculated G pheno (Ψ(φ j ), φ j ).We also calculated the true MLE θ * (φ j ) using the Matlab algorithm fmincon from the starting guesses Ψ(φ j ) and θ * (φ j−1 ).In Figure 1 A), we show the objective function value evaluated at the updated data φ j and three parameter sets : the naive starting point, θ * (φ j−1 ); the predicted MLE, Ψ(φ j ); and the true MLE, θ * (φ j ).We note that the non-monotonic profile of the objective function G pheno in Figure 1 A) is to be expected as we are adding noise to experimental data.This noise may perturb the existing data away from dynamics that can be well-described by the mathematical model.Accordingly, the important information from Figure 1 A) is the comparison which demonstrates the accuracy of the continuation step (5) in driving a relative decrease in G pheno .
Further, in Figure 1 B), we show the cumulative number of objective function evaluations when calculating θ * (φ j ) for j = 1, 2, ..., 10 when starting the optimization from θ * (φ j−1 ) and Ψ(φ j ).The total number of function evaluations used is lower when starting the optimization from the predicted MLE Ψ(φ j ) than when starting from θ * (φ j−1 .More strikingly, the predicted MLE G(Ψ(φ j ), φ j ) is comparable against G(θ * (φ j , φ j ) in Figure 1 A) and there is computational benefit to only calculating the predicted MLE Ψ(φ j ) rather than re-fitting the parameters.Taken together, the results shown in Figure 1 demonstrate the accuracy and computation efficiency gained by calculating Ψ(φ j ).We now demonstrate how to utilize the continuation frame work to identify additional time points to increase confidence in model parameters.We focus on the treated environment and consider additional time points t s,i = 3.1, 3.2, 3.3, 3.4, 3.5, 5, 7 days with corresponding simulated measurements {φ i,s } 7 i=1 = N(t s,i ).We perturb each of these simulated measurements by a fixed amount, ∆φ = ±0.3N(3.1), to give 14 additional, perturbed measurements.We appended each of these 14 measurements to the experimental data and predicted the MLE to these appended data sets.

Iterations Function evaluations Objective function
We calculated the relative change in the MLE for each model parameter and each of the 14 appended data sets.We note that each of the simulated data point occurs following the beginning of therapy.The immediate decrease observed in N(t) following the beginning of treatment is due to the death of sensitive cells following treatment administration and controlled by the parameter d max a .From the biological interpretation of the parameters, we expect d max a to be highly sensitive to perturbations in these data points.
As expected, d max a was the most sensitive model parameter to perturbations of the simulated data and we show the percent relative change in d max a from the unperturbed data in Figure 2 B).As expected, the maximal death rate of sensitive cells increased when the simulated data point was decreased from the true value and decreased when the simulated data point was increased.
The treatment sensitive population rapidly shrinks during therapy.The stabilization and rebound of the population during therapy is due to the expansion of the drug resistant population.This stabilization occurs once the drug sensitive population has been maximally suppressed which due to the drug effect.The most informative simulated data point, as measured by the magnitude of the relative change in the parameter d max a , was at time t i,s = 3.4.At t = 3.4, drug sensitive cells are no longer dominant due to drug pressure.The depth of the population response to treatment, as measured by N(3.4), is thus highly sensitive to death rate of these drug sensitive cells under treatment.In Figure 2 A), we show the simulated experimental measurements and predicted model dynamics for the most informative time point.The predicted model simulations capture the perturbed data point while retaining good fits to the true experimental data.

Parameter continuation in a viral dynamics model
The standard viral dynamics model has been extensively used to understand the dynamics of viral infection in HIV-1 [Perelson, 2002].The model tracks the concentration of uninfected target cells, T (t), infected cells I(t), and free infectious virus V (t).Here, we follow Wu et al. [2008] and consider a model of HIV-1 dynamics where the target cells are CD4 + T-cells.These cells are produced at a constant rate λ and cleared linearly at rate d.Infection occurs at a rate β following contact between a target cell and infectious viral particle and infected cells are cleared at rate δ .Upon lysis, infected cells release N viral particles into the circulation and free virus is cleared at a constant rate c.The viral dynamics model is given by to the calibration measurements is where using log 10 measurements of viral load is standard in HIV studies.
During antiretroviral therapy (ART), the viral load may fall below the limit of detection of standard assays.While there are a number of techniques to account for this censored data, we do not consider data collected during ART, so the objective function is given by the sum of squares error (14) Wu et al. [2008] characterized the identifiability of this model using a higher order derivative method.They found that, if the initial conditions of the model T 0 , I 0 , and V 0 are known, then all six model parameters θ = {β , d, δ , c, N, λ , } are identifiable.To illustrate their results, they fixed θ = {(2×10 −5 , 0.15, 0.55, 5.5, 900, 80} and simulated the ODE model ( 13).They sampled the simulated viral load at 37 distinct time points and added noise ε i sampled from a Gaussian distribution with µ = 0 and σ 2 = 1 [Wu et al., 2008].
In Section 3.2, we demonstrated the effectiveness of our continuation technique by focusing on objective value function and computational efficiency in calculating the MLE.Here, we illustrate how model dynamics evolve during MLE continuation.We follow Wu et al. [2008] but consider a smaller subset of calibration data collected at time t i = {0.4, 1, 8, 14, 20, 36, 46, 58}.We add noise ε 0 i sampled from a Gaussian distribution with µ = 0 and σ 2 = 0.15 so the initial calibration data is We first fit the model to the simulated data φ 0 i to obtain an initial MLE.We then generate 4 additional viral load time courses {φ j i } 10 j=1 by for ε j i sampled from a Gaussian distribution with µ = 0 and σ 2 = 1 and h step = ±0.1,±0.2.This collection of 4 data sets could feasibly represent experimental data measured from an increasingly large sample drawn from a population of HIV-1 positive individuals with population viral dynamic parameters given by θ = {(2 × 10 −5 , 0.15, 0.55, 5.5, 900, 80}.Here, we test the ability of our continuation technique to predict reasonable viral dynamic curves without refitting the data. In Figure 3 A), we compute the predicted Ψ(φ j ) and plot the predicted model dynamics obtained from Ψ(φ j ) against the perturbed data φ j .In Figure 3 B), we show the fit model predictions to the perturbed data.In each case, the viral dynamics show comparable model predictions for the fit and predicted model parameters demonstrating that our continuation method can successfully predict reasonable model simulations.In fact, the Bayesian Information Criteria [Kass and Raftery, 1995] indicates no significant differences between the predicted and true MLE for all 4 data sets.However, Figure 3   It is common to find numerous local minima of ( 14) when fitting (13) to simulated data.As measured by the value of the log-likelihood function or information criteria, these local minima can produce comparable fits to a given data set despite different dynamics.We perturbed the initial data set φ 0 by log(φ 1 i ) = log(φ 0 i ) + 0.8ε i for ε i sampled from a Gaussian distribution with µ = 0 and σ 2 = 1.We fit this perturbed data from 10 distinct initial guesses using fmincon [MATLAB, 2017].These 10 starting initial guesses converged to two local minima.We denote the corresponding parameter estimates by θ1 and θ2 and plot the resulting model trajectories in Fig 4 .These fits are indistinguishable by BIC and both appear to accurately describe the viral load data.Consequently, it is not obvious which of θ1 and θ2 best describe the data.
However, it is reasonable to expect that the MLE should be robust to small perturbations of the calibration data.We measure the robustness of each of these minima by calculating DΨ(φ 1 ) at θ1 and θ2 .A smaller norm DΨ(φ 1 ) implies less sensitivity of the MLE to perturbations of the calibration data.For the example shown in Fig 4, there is a 16 fold difference in sensitivity to calibration data.In this way, DΨ can be used to distinguish between otherwise similar fits.We suggest that, when choosing between multiple fits with similar BIC values, the parameter estimate with the smaller sensitivity to the data is a more robust, and thus preferential, fit.

Discussion
Parameter fitting is crucial step when using mathematical models to predict novel treatment strategies, extrapolate from clinical trials, identify new drug targets or schedules, or propose non-pharmaceutical interventions [Brady and Enderling, 2019;Cassidy and Craig, 2019;Cassidy et al., 2020].However, parameter fitting can be difficult and computationally expensive.A large variety of fitting techniques have therefore been developed to calibrate model predictions against data [Horbelt et al., 2002;Kreutz et al., 2013;Lauß et al., 2018;Toni et al., 2009].Moreover, mathematical modeling is increasingly applied to understand emerging data and make real-time predictions.In this case, as new data emerges, the model parameters must be refit with potential computational cost.Here, we developed a continuation type technique to quantify how updates to experimental data will impact the MLE and predict the evolution of the MLE as a function of the experimental data used to calibrate the model.We used the implicit function theorem to calculate the trajectory of the MLE through parameter space.
As the implicit function theorem only guarantees the existence of a differentiable trajectory Ψ through calibration data-parameter space, we utilized the first order Taylor expansion Ψ to extrapolate the evolution of the MLE due to changes in experimental data.We showed how this calculation is intrinsically linked to local sensitivity analysis and the curvature of the objective function.In two examples drawn from mathematical biology, we showed how this continuation technique can predict acceptable model fits to experimental data while significantly reducing computational overhead.In fact, in most applications, our continuation technique requires no dedicated computational overhead as the Hessian of the objective function is calculated at each step when using common optimization algorithms, such as fmincon [MATLAB, 2017], and local sensitivity analysis is a standard step in model fitting.
Perhaps more importantly that gains in computational efficiency, our approach explicitly identifies relationships between individual experimental measurements and parameter estimates.Our approach addresses similar questions to local sensitivity analysis from a distinct perspective.Rather than using simulations to understand how small perturbations in model parameters from the best-fit parameters change model outputs as in standard sensitivity analysis, we quantify how changes in the training data impact the best-fit parameters and measure the sensitivity of the best-fit parameters to variations in this calibration data.As we showed in Section 3.2, this perspective can be used to suggest additional experimental measurements to increase confidence in model parameterization.Further, we showed how to use DΨ to understand which experimental measurements are most informative for model parameterizations and identify redundant measurements that do not provide additional information for parameter estimation.
Our technique is a type of local analysis that explores the functional dependence of the MLE on experimental data starting from a pre-identified MLE.Specifically, we assume that the Hessian of the objective function is invertible at the MLE and our results are necessarily local in parameter space as we are extrapolating from a pre-identified MLE.Nevertheless, our examples show the utility of our continuation approach for even large perturbations of the experimental data.
Despite these limitations, we developed a continuation-type technique to predict the functional dependence of a MLE on the experimental data used to train a mathematical model.While we have focused on applications in mathematical biology, our approach is immediately portable to other domains.As our method is independent of the number of data points, our approach could be particularly useful in big-data applications.Ultimately, our results offer a unified approach to quantify the relationship between training data and best-fit model parameters and to leverage this understanding to suggest additional experiments to increase confidence in model parameterization.

Figure 1 :
Figure 1: Comparison between MLE estimates obtained using the naive and continuation approaches.Panel A shows a comparison of the objective function value for the naive and continuation guesses as well as the true minimal objective function value as a function of the perturbation of the experimental data from the initial data.Panel B shows a comparison of the number of objective value evaluations required to obtain the minimal value when starting from the naive or predicted MLE with the number of function evaluations required to calculate Ψ(θ i ).

Figure 2 :
Figure 2: Evaluating additional time points to identify d max a in an in vitro model of NSCLC.Panel A shows the a selection of predicted model dynamics when fit to experimental data with a single additional time point φ * i,s that is perturbed by a ∆φ from the true simulated value.For figure clarity, model trajectories corresponding to the perturbation of {φ 4,s } is shown.Panel B shows a tornado plot of the predicted relative change in the best-fit parameter d max a for each additional simulated data point {φ i,s } 7 i=1 .The left side of the tornado plot, in blue, shows the relative change when the perturbed value φ i,s = φ * i,s + ∆φ is larger than the simulated value φ * i,s .The right-hand side, in orange, shows the relative change in d max a when φ i,s = φ * i,s + ∆φ is smaller than the simulated value φ * i,s .
C) shows the significant computational improvement obtained by only calculating the continuation step rather than fitting all model parameters at each step.The predicted model dynamics track the true viral load trajectory.

Figure 3 :
Figure 3: Comparison of predicted model fits to randomly perturbed data.Panels A and B show model trajectories obtained using predicted model parameters to the simulated experimental data perturbed by φ j i = φ 0 i + h step |ε j i |.Panel A shows the predicted model fits to the experimental data while B shows the model fits to data resulting from the true MLE.Panel C shows the number of objective value evaluations required to predict the MLE using this continuation technique or fit the model parameters to the perturbed data using the known parameters as a starting guess.

Figure 4 :
Figure 4: Comparison of two potential fits to randomly perturbed viral dynamics models.Model trajectories obtained from two local minima from fitting 10 initial guesses to viral load data shown in black.Both trajectories accurately describe the viral load dynamics as evidenced by a small difference in BIC.However, the parameter estimate corresponding to the oscillatory trajectory is much more robust as measured by DΨ(φ 1 ) .