INTRODUCTION

Nonlinear mixed effects (NLME) models are used to describe populations of individuals that behave qualitatively similarly, but where every individual has its own quantitative characteristics. These models have found wide application in pharmacokinetics (PK) and pharmacodynamics (PD) (1,2,3,4). Physiological systems are often modeled with a continuous-time deterministic model and noisy observations that are discrete in time. However, since the mechanisms of the physiological systems of interest are often not completely understood, a natural extension is to account for the uncertainty in the dynamics. This is the motivation behind extending NLME models with a stochastic part (5).

Stochastic differential equations (SDEs) greatly extend the descriptive power of the ordinary differential equations (ODEs) that are usually used to encode dynamical system models in pharmacometrics. An ODE model can be turned into an SDE model by the addition of possibly nonlinearly scaled noise terms. These additional terms can be thought of as a type of slack variable introduced to pick up potential discrepancies between the mechanistically modeled dynamics and unknown but indirectly observed dynamical effects. In other words, the noise terms in the dynamical system equations represent the lumped effect of everything in the system that is not explicitly modeled mechanistically. The approach also allows model parameters to be turned into state variables with random dynamics, which for instance can be used to model interoccasion variability (6) or time-dependent changes in parameters whose trend is unknown a priori, and for the estimation of an unknown input signal (7).

The application of SDEs to model uncertainty in dynamics has long been used in other fields, such as finance and control theory, and various parameter estimation methods have been developed (8). However, the sparsity and irregular sampling of PK/PD data has made it difficult to directly apply these parameter estimation methods. Nevertheless, some attempts to develop methods for parameter estimation in NLME models with stochastic dynamics have been successful, using for example (i) Bayesian inference (9,10), (ii) expectation maximization (EM) methods (11,12,13), and (iii) by expanding the traditional gradient-based estimation methods using Kalman filters (14,15). These methods have been used for several PK/PD applications (16,17,18,19). This paper focuses on gradient-based methods. A general overview of some of the related method and application papers is shown in Table I.

Table I Timeline showing the main contributions to nonlinear mixed effects models with stochastic differential equations and to the S-FOCE method

ODE models are a subset of SDE models, because SDEs reduce to ODEs when the magnitude of the system noise is decreased towards zero. Ideally, the dynamical system under study is well characterized and the mechanistic terms present in the model equations give an accurate description of the system's dynamical behavior; ODEs are then a suitable model description. To account for incorrect model definitions, Kristensen et al. (20) introduced a method for evaluating the choice of a deterministic population model described with ODEs and iteratively improving it, using an SDE model to pinpoint where in the dynamical model equations misspecification may be present. A similar idea had previously been used for gray box models (21).

Instead of opting to find a deterministic model, Tornøe et al. (14) and Overgaard et al. (15) captured the additional uncertainty due to model misspecification, oversimplifications, and approximations by using SDEs in NLME models. The well-known parameter estimation method first-order conditional estimation (FOCE) (22) was expanded to account for the new mathematical framework. This involved state variable estimation using extended Kalman filters (EKFs) (23). The algorithm was initially implemented in NONMEM (14) and later implemented in both MATLAB (24) and R (25).

Optimizing the FOCE approximation of the log likelihood using gradient-based methods, such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (26), requires sensitivities of the state variables of the underlying model. However, NLME models defined by differential equations often lack an analytic solution. Hence, the traditional way to compute the gradient is to use a finite difference approximation. An alternative approach is to obtain the gradients utilizing the sensitivity equations for the original set of differential equations. The algorithm originally proposed by Almquist et al. (27) is an extension of the FOCE algorithm that uses these sensitivity-based gradients for population NLME models based on ODEs. This algorithm is here referred to as S-FOCE. The sensitivity approach was also applied by Leander et al. (28) for single-individual models using SDEs. For the single individual, the likelihood was approximated using the EKF, which in turn requires a first-order differentiation of the EKF. These ideas have been combined for parameter estimation of NLME models using SDEs (29). In that study, the FOCE-EKF likelihood was optimized with exact gradients instead of the traditional finite differences, using the mixed symbolic and numeric algebra capabilities of Mathematica (30). However, the details of the algorithm have until now not been presented.

This paper extends the S-FOCE algorithm to a general NLME model based on SDEs using the EKF approach. This involves both the first- and the second-order sensitivities of the EKF. An earlier stage of this work was presented in (31). The method is evaluated on synthetically generated datasets from common PK/PD models with uncertainty in the dynamics, in addition to the traditional observation uncertainty and interindividual variability. The relative computational time of the sensitivity-based gradients is compared to that of the finite difference approach. Moreover, the precision and the accuracy of both gradient approaches are analyzed.

The paper is structured as follows. First, the necessary theory for the proposed extension is introduced, including the approximate population likelihood and the EKF, followed by the proposed algorithm. Second, the methods for evaluating and comparing the algorithm are described. Third, the results of the comparison are presented, showing a speedup in estimation time as well as increased precision and accuracy of the gradients for the proposed algorithm. Finally, the results are discussed along with an outlook towards implementation of the algorithm in an existing parameter estimation framework.

THEORETICAL BACKGROUND

This section introduces the theory needed for extending the S-FOCE method to NLME models with stochastic dynamics (SDE-NLME).

Nonlinear Mixed Effects Models Based on Stochastic Differential Equations

NLME models can be defined using SDEs to describe the dynamics of the continuous-time state variables x. The continuous-time, stochastic dynamics are modeled as

$$ {\displaystyle \begin{array}{ll}d{\boldsymbol{x}}_i& =\boldsymbol{f}\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i,t,\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right) dt+\boldsymbol{G}\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i,t,\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)d{\boldsymbol{W}}_i\\ {}{\boldsymbol{x}}_i(0)& ={\boldsymbol{x}}_0\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)\end{array}} $$
(1)

with discrete observations modeled using an observation function h and Gaussian noise \( {\boldsymbol{e}}_{ij}\sim N\left(\mathbf{0},\boldsymbol{\Sigma} \left({\boldsymbol{x}}_{ij},{\boldsymbol{u}}_i,{t}_{i_j},\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)\right) \),

$$ {\boldsymbol{y}}_{ij}=\boldsymbol{h}\left({\boldsymbol{x}}_{ij},{\boldsymbol{u}}_i,{t}_{i_j},\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)+{\boldsymbol{e}}_{ij}. $$
(2)

The function f(xi, ui, t, θ, ηi), describing the deterministic part of the dynamics, is called the drift function, and the part introducing randomness to the state variables, G(xi, ui, t, θ, ηi)dWi, is called the system noise. Here, Wi is a Wiener process, with dWi ∼ N(0, dtI). The indices i and j correspond to individual i and its j-th observation at time \( {t}_{i_j} \). Here, θ, ηi, and ui are, respectively, the fixed effects, random effects, and known inputs, such as covariates including dosage. The random effects ηi are chosen to be normally distributed with mean zero and covariance Ω(θ). This way of formulating the NLME model gives three sources of variability in the response (29), namely (i) observation noise, eij; (ii) system noise, G(xi, ui, t, θ, ηi)dWi; and (iii) parameter variability, ηi.
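
To illustrate how the three sources of variability enter a model of this form, the sketch below simulates one individual from a hypothetical one-state model using the Euler–Maruyama scheme. The model, function name, and parameter values are illustrative assumptions and do not correspond to the benchmark models of this study.

```python
import numpy as np

def simulate_individual(theta, eta, sigma_w, sigma_e, t_obs, x0, dt=0.01, rng=None):
    """Euler-Maruyama simulation of a hypothetical one-state SDE-NLME individual:
    dx = -theta*exp(eta)*x dt + sigma_w dW,  y_j = x(t_j) + e_j."""
    rng = np.random.default_rng() if rng is None else rng
    ke = theta * np.exp(eta)          # individual elimination rate (random effect eta)
    t, x = 0.0, x0
    y = []
    for t_j in t_obs:
        while t < t_j:                # integrate the SDE up to the next sampling time
            h = min(dt, t_j - t)
            x += -ke * x * h + sigma_w * np.sqrt(h) * rng.normal()
            t += h
        y.append(x + sigma_e * rng.normal())   # additive Gaussian observation noise
    return np.array(y)
```

Setting sigma_w = sigma_e = 0 recovers the deterministic ODE solution, which is one way to see that ODE models are the zero-system-noise special case of SDE models.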

The Approximate Population Likelihood

Since xi are stochastic processes, the distribution of yij changes if conditioned on previous observations. Let \( {\boldsymbol{Y}}_{i\left(j-1\right)}=\left\{{\boldsymbol{y}}_{i1},{\boldsymbol{y}}_{i2},\dots, {\boldsymbol{y}}_{i\left(j-1\right)}\right\} \) denote those observations. The residuals εij are defined as

$$ {\boldsymbol{\varepsilon}}_{ij}={\boldsymbol{y}}_{ij}-{\hat{\boldsymbol{y}}}_{ij} $$
(3)

with the expected observation value \( {\hat{\boldsymbol{y}}}_{ij} \) and covariance Rij conditioned on Υi(j − 1) and θ defined as

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{y}}}_{ij}& =\mathrm{E}\left[{\boldsymbol{y}}_{ij}|{\boldsymbol{Y}}_{i\left(j-1\right)},\boldsymbol{\theta} \right]\\ {}{\boldsymbol{R}}_{ij}& =\mathrm{Cov}\left[{\boldsymbol{y}}_{ij}|{\boldsymbol{Y}}_{i\left(j-1\right)},\boldsymbol{\theta} \right].\end{array}} $$
(4)

As shown in, e.g., (14,29), the combined likelihood L for all individuals simplifies to

$$ L\left(\boldsymbol{\theta} \right)=\prod \limits_{i=1}^N\int \exp \left({\boldsymbol{l}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)\right)d{\boldsymbol{\eta}}_i, $$
(5)

where

$$ {\displaystyle \begin{array}{rl}-2{\boldsymbol{l}}_i=& \sum \limits_{j=1}^{n_i}\left({\boldsymbol{\varepsilon}}_{ij}^T{\boldsymbol{R}}_{ij}^{-1}{\boldsymbol{\varepsilon}}_{ij}+\mathrm{logdet}\left(2\pi {\boldsymbol{R}}_{ij}\right)\right)\\ {}& +{\boldsymbol{\eta}}_i^T{\boldsymbol{\Omega}}^{-1}{\boldsymbol{\eta}}_i+\mathrm{logdet}\left(2\pi \boldsymbol{\Omega} \right),\end{array}} $$
(6)

and where the indirect dependence on θ and ηi is suppressed for simplicity.

The FOCE approximation of the population likelihood becomes

$$ \log {L}_F\left(\boldsymbol{\theta} \right)=\sum \limits_{i=1}^N\left({\boldsymbol{l}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)-\frac{1}{2}\mathrm{logdet}\left(\frac{-{\boldsymbol{H}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)}{2\pi}\right)\right). $$
(7)

Here, \( {\boldsymbol{\eta}}_i^{\ast } \) maximizes the individual likelihood li for a given θ, and Hi is a first-order approximation of the Hessian of li with respect to ηi. Further details on this approximation can be found in, e.g., (27).
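
To make Eqs. (6) and (7) concrete, the following sketch evaluates the individual log-likelihood and the FOCE approximation of the population log-likelihood from precomputed quantities (residuals εij, covariances Rij, and the Hessian approximations Hi). The function names are illustrative, and the assumption that these inputs are already available from the EKF is an assumption of the sketch.

```python
import numpy as np

def individual_loglik(eps, R, eta, Omega):
    """l_i from Eq. (6), given residuals eps[j] and covariances R[j] for one individual."""
    m2l = eta @ np.linalg.solve(Omega, eta) + np.log(np.linalg.det(2 * np.pi * Omega))
    for e_j, R_j in zip(eps, R):
        # weighted squared residual plus normalizing term, summed over observations
        m2l += e_j @ np.linalg.solve(R_j, e_j) + np.log(np.linalg.det(2 * np.pi * R_j))
    return -0.5 * m2l

def foce_apl(l_star, H_star):
    """log L_F from Eq. (7), given l_i(theta, eta_i*) and the Hessian
    approximations H_i for each of the N individuals."""
    return sum(l - 0.5 * np.log(np.linalg.det(-H / (2 * np.pi)))
               for l, H in zip(l_star, H_star))
```

In the actual algorithm, εij and Rij come from the EKF recursion described in the next section, and \( {\boldsymbol{\eta}}_i^{\ast } \) is found by an inner gradient-based optimization of li.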

The Extended Kalman Filter

The SDEs introduce uncertainty to the time evolution of the state variables of the system. The EKF can be used to estimate the state variables, as suggested in (14,15,29). The continuous-discrete EKF is a state estimator for continuous-discrete state-space models of the form introduced in Eq. (1), with the exception that the state variables must be independent of both the observation noise and the system noise (32). The EKF estimates the conditional expectation of the state \( {\hat{\boldsymbol{x}}}_{i\left(j|j\right)}=\mathrm{E}\left[{\boldsymbol{x}}_{i{t}_j}|{\boldsymbol{Y}}_j,{\boldsymbol{\phi}}_i\right] \) and its covariance \( {\boldsymbol{P}}_{i\left(j|j\right)}=\mathrm{Cov}\left[{\boldsymbol{x}}_{i{t}_j}|{\boldsymbol{Y}}_j,{\boldsymbol{\phi}}_i\right] \). Here, ϕi = (θ, ηi, ui) is a vector containing all parameters corresponding to individual i.

In the rest of this section, the notation is simplified by omitting the individual index i. The drift function f and the observation function h from Eqs. (1) and (2) are linearized by introducing

$$ {\displaystyle \begin{array}{lll}{\boldsymbol{A}}_t& ={\left.\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}\right|}_{\boldsymbol{x}={\hat{\boldsymbol{x}}}_{t\mid j-1}},& t\in \left[{t}_{j-1},{t}_j\right]\\ {}{\boldsymbol{C}}_j& ={\left.\frac{\partial \boldsymbol{h}}{\partial \boldsymbol{x}}\right|}_{\boldsymbol{x}={\hat{\boldsymbol{x}}}_{j\mid j-1}}.& \end{array}} $$
(8)

The EKF has two main steps, a time-update step and a measurement-update step (33). In the time-update step, the state variables and covariance are predicted using all previous observations. This is done by solving the differential equations

$$ {\displaystyle \begin{array}{lll}\frac{d{\hat{\boldsymbol{x}}}_{t\mid j-1}}{dt}& =\boldsymbol{f}\left({\hat{\boldsymbol{x}}}_{t\mid j-1},{\boldsymbol{u}}_t,t,\boldsymbol{\phi} \right),& t\in \left[{t}_{j-1},{t}_j\right]\\ {}\frac{d{\boldsymbol{P}}_{t\mid j-1}}{dt}& ={\boldsymbol{A}}_t{\boldsymbol{P}}_{t\mid j-1}+{\boldsymbol{P}}_{t\mid j-1}{\boldsymbol{A}}_t^T+{\boldsymbol{G}}_t{\boldsymbol{G}}_t^T,& t\in \left[{t}_{j-1},{t}_j\right].\end{array}} $$
(9)

The above prediction gives

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{y}}}_j& =\boldsymbol{h}\left({\hat{\boldsymbol{x}}}_{j\mid j-1},{\boldsymbol{u}}_j,{t}_j,\boldsymbol{\phi} \right)\\ {}{\boldsymbol{R}}_j& ={\boldsymbol{C}}_j{\boldsymbol{P}}_{j\mid j-1}{\boldsymbol{C}}_j^T+{\boldsymbol{\Sigma}}_j.\end{array}} $$
(10)

In the measurement-update step, the prediction is used to compute the Kalman gain

$$ {\boldsymbol{K}}_j={\boldsymbol{P}}_{j\mid j-1}{\boldsymbol{C}}_j^T{\boldsymbol{R}}_j^{-1} $$
(11)

which is used to update the state and its covariance from the current observation

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{x}}}_{j\mid j}& ={\hat{\boldsymbol{x}}}_{j\mid j-1}+{\boldsymbol{K}}_j{\boldsymbol{\varepsilon}}_j\\ {}{\boldsymbol{P}}_{j\mid j}& ={\boldsymbol{P}}_{j\mid j-1}-{\boldsymbol{K}}_j{\boldsymbol{R}}_j{\boldsymbol{K}}_j^T.\end{array}} $$
(12)

The filter is initialized with

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{x}}}_{1\mid 0}& ={\boldsymbol{x}}_0\\ {}{\boldsymbol{P}}_{1\mid 0}& ={\boldsymbol{P}}_0.\end{array}} $$
(13)
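
A minimal sketch of one EKF cycle (Eqs. 9–12) may clarify the recursion. The time-update ODEs are integrated here with a fixed-step Euler scheme purely for simplicity; in practice an adaptive ODE solver would be used, and the callable arguments (f, A, G, h_fun, C) are illustrative.

```python
import numpy as np

def ekf_step(x, P, f, A, G, h_fun, C, Sigma, y, t0, t1, dt=1e-3):
    """One cycle of the continuous-discrete EKF: time update over [t0, t1]
    followed by a measurement update against the observation y at t1."""
    t = t0
    while t < t1:                              # time update, Eq. (9)
        s = min(dt, t1 - t)
        At, Gt = A(x, t), G(x, t)              # linearization along the prediction
        x = x + s * f(x, t)
        P = P + s * (At @ P + P @ At.T + Gt @ Gt.T)
        t += s
    Cj = C(x)
    yhat = h_fun(x)                            # predicted observation, Eq. (10)
    R = Cj @ P @ Cj.T + Sigma
    K = P @ Cj.T @ np.linalg.inv(R)            # Kalman gain, Eq. (11)
    eps = y - yhat
    x = x + K @ eps                            # measurement update, Eq. (12)
    P = P - K @ R @ K.T
    return x, P, eps, R
```

Running this function over all sampling times of an individual yields exactly the residuals εj and covariances Rj that enter the individual likelihood in Eq. (6).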

PROPOSED ALGORITHM

This section presents the extension of the S-FOCE method, originally proposed by Almquist et al. (27) for parameter estimation of general NLME models based on ODEs, to the use of SDEs. The method utilizes sensitivity equations to calculate exact gradients for a gradient-based optimization of the model parameters. The sensitivities are the derivatives of a function with respect to its parameters. The sensitivities needed for the gradient-based optimization, in addition to those already present in the S-FOCE algorithm, are presented below. A detailed derivation of the terms needed for the sensitivity equations can be found in the Appendix.
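
As a toy illustration of the sensitivity-equation idea (here for a plain ODE; the full algorithm applies the same idea to the EKF equations), consider dx/dt = −θx. The sensitivity s = dx/dθ obeys its own ODE, ds/dt = (∂f/∂x)s + ∂f/∂θ = −θs − x, which can be integrated alongside the state. The model and the fixed-step scheme are illustrative assumptions.

```python
import numpy as np

def x_and_sensitivity(theta, x0, T, dt=1e-4):
    """Integrate dx/dt = -theta*x together with its sensitivity s = dx/dtheta,
    which satisfies ds/dt = -theta*s - x, using forward Euler."""
    x, s = x0, 0.0          # the sensitivity starts at zero: x(0) does not depend on theta
    for _ in range(int(round(T / dt))):
        x, s = x + dt * (-theta * x), s + dt * (-theta * s - x)
    return x, s
```

For this example the sensitivity is available analytically, dx/dθ = −T x0 e^{−θT}, which the numerical solution reproduces. In the proposed algorithm, analogous equations for the EKF quantities replace finite differences over θ and η.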

The fixed effects parameters θ can be estimated by maximizing the approximate population likelihood (APL) in Eq. (7) using gradient-based methods. This means a need for computing (27)

$$ \frac{d\log {L}_F\left(\boldsymbol{\theta} \right)}{d{\boldsymbol{\theta}}_m}=\sum \limits_{i=1}^N\left(\frac{d{l}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)}{d{\boldsymbol{\theta}}_m}-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{H}}_i^{-1}\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)\frac{d{\boldsymbol{H}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)}{d{\boldsymbol{\theta}}_m}\right]\right). $$
(14)

Unlike the ODE case, the APL is formulated in terms of conditional expected values of the state variables and their covariances, as given by the EKF. Hence, sensitivities of the quantities computed by the EKF equations are required. These are the first-order sensitivities

$$ {\displaystyle \begin{array}{r}\frac{d{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\eta}}_{ik}},\kern1em \frac{d{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\theta}}_n},\kern1em \frac{d{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\eta}}_{ik}},\kern1em \frac{d{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\theta}}_n}\\ {}\frac{d{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\eta}}_{ik}},\kern1em \frac{d{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\theta}}_n}\end{array}} $$
(15)

and the second-order sensitivities

$$ {\displaystyle \begin{array}{r}\frac{d^2{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\theta}}_n},\kern1em \frac{d^2{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\eta}}_{il}},\kern1em \frac{d^2{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\theta}}_n},\kern1em \frac{d^2{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\eta}}_{il}}\\ {}\frac{d^2{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\theta}}_n},\kern1em \frac{d^2{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\eta}}_{il}}.\end{array}} $$
(16)

In addition, the sensitivities of the state variable covariance P are needed. The derivations of the results can be found in the Appendix. The resulting SDE-EKF extension of S-FOCE is outlined in Algorithm 1.


METHODS

Initializing the EKF

The use of the EKF requires initial values of the state variable, x0, and of the state variable covariance, P0. The state variable is set to the initial value of the system equations. To estimate the initial covariance of the state, different methods can be used. Setting P0 = I is a common but arbitrary way of initializing the EKF. Tornøe et al. (14) set P0 to the integral of the Wiener processes and system dynamics between the first two observations to get a representative value. However, in the case of PK/PD models, one often starts in a steady state and can then solve for the initial covariance analytically from Eq. (9).
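
Under the steady-state assumption, the covariance ODE in Eq. (9) reduces to the algebraic Lyapunov equation 0 = A P0 + P0 Aᵀ + G Gᵀ, with A and G evaluated at the steady state. A sketch of solving it by vectorization with Kronecker products is shown below; a library routine such as scipy.linalg.solve_continuous_lyapunov could be used instead.

```python
import numpy as np

def steady_state_P0(A, G):
    """Solve the Lyapunov equation 0 = A P + P A^T + G G^T for the
    steady-state initial covariance P0, via column-stacking vectorization:
    (I kron A + A kron I) vec(P) = -vec(G G^T)."""
    n = A.shape[0]
    Q = G @ G.T
    L = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    P = np.linalg.solve(L, -Q.flatten(order="F")).reshape(n, n, order="F")
    return 0.5 * (P + P.T)     # symmetrize against round-off
```

(For the benchmark models in this study, this approach is trivial and gives P0 = 0, as noted in the Discussion.)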

Models and Data

To evaluate the proposed method, a simulation-estimation study was set up. Data were simulated from three common PK and PK/PD models: a two-compartmental PK model (M1), a PK/PD model consisting of a two-compartmental PK together with a direct PD response model (M2), and a PK/PD model consisting of a two-compartmental PK with an indirect PD response model (M3). All model dynamics are described using SDEs, with a one-dimensional Wiener process describing the uncertainty in the absorption process. The number of state variables and parameters is shown in Table II. Each experimental setup included 50 individuals divided into five dose groups. Representative values were chosen for the simulation in compatible units. The sampling times were chosen to take the characteristics of the PK and PD responses into account. A full description of the models and the experimental design can be found in the Appendix.

Table II Overview of the benchmark models

Comparison

Algorithm 1 consists of two levels of optimization, an inner level to find the empirical Bayes estimates \( {\boldsymbol{\eta}}_i^{\ast } \) given θ, and an outer level to find the optimal fixed effects θ. Both levels of optimization use gradient-based methods. Three different ways of computing gradients are considered. One uses sensitivities in both levels of optimization (S − S), another uses sensitivities in the inner level and finite differences in the outer level (S − F), and the last uses finite differences in both levels of optimization (F − F). Furthermore, the finite differences are either computed according to forward differences

$$ \frac{f\left({\boldsymbol{\theta}}_k\left(1+{10}^{-q}\right)\right)-f\left({\boldsymbol{\theta}}_k\right)}{{\boldsymbol{\theta}}_k{10}^{-q}}, $$
(17)

or central differences

$$ \frac{f\left({\boldsymbol{\theta}}_k\left(1+{10}^{-q}\right)\right)-f\left({\boldsymbol{\theta}}_k\left(1-{10}^{-q}\right)\right)}{2{\boldsymbol{\theta}}_k{10}^{-q}}, $$
(18)

where q controls the relative step length and f denotes the function to be differentiated. The proposed algorithm is evaluated in terms of runtime and in terms of precision and accuracy of the gradient. The initial values of the optimization are chosen within 15% of the true values, drawn from a uniform distribution. The algorithm is implemented and run using Mathematica 11.0 (30). Calculations are performed on a workstation with a 4.00-GHz Intel Core i7-6700K CPU and 32.0 GB of RAM. A running version is available upon request.
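
For reference, Eqs. (17) and (18) correspond to the following sketch, using the same relative-step convention. In the actual algorithm, f would be the approximate population likelihood; here it is left as an arbitrary callable.

```python
import numpy as np

def forward_diff(f, theta, k, q):
    """Forward difference, Eq. (17), with relative step 10^-q in component k."""
    h = theta[k] * 10.0 ** (-q)
    tp = theta.copy()
    tp[k] += h
    return (f(tp) - f(theta)) / h

def central_diff(f, theta, k, q):
    """Central difference, Eq. (18): roughly one extra function evaluation
    per component compared to the forward difference."""
    h = theta[k] * 10.0 ** (-q)
    tp, tm = theta.copy(), theta.copy()
    tp[k] += h
    tm[k] -= h
    return (f(tp) - f(tm)) / (2.0 * h)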

Timing Comparison

The timing comparison is done in a similar way as the previously performed comparison for S-FOCE with ODEs (27). For a given model and a relevant simulated dataset, the total evaluation time is calculated using each of the above approaches to obtain the gradient. To isolate the effect of computing the different gradients, they were calculated for the same set of points. The points were chosen as the points from the gradient-based optimization using the S − S gradient approach. This ensures that equally many evaluations were performed for each gradient. For the finite differences, the step length is kept constant by letting q = 4, corresponding to a relative step length of 0.0001.

Precision Comparison

Precision is compared only for finite differences in the outer level of optimization. For the inner level, both methods use sensitivity-based gradients. The gradients are computed at the found optimum, θ, using the S − F and S − S gradients. The step length varies, with q ranging from 3.2 to 6.4. To capture the effect of new realizations on the numerical precision in log LF(θ), the gradient is computed 300 times for each q using different randomized starting values of η for the inner level of optimization. The starting values were drawn from a uniform distribution between −0.3 and 0.3.

RESULTS

Faster Estimation

The proposed algorithm was used to estimate the parameter values for models M1–M3. The resulting parameter estimates using the S − S gradients, along with the relative standard errors, can be found in Table III. For each of the finite difference methods and models (M1–M3), the runtime of the parameter estimation was measured for all three previously mentioned optimization schemes. The speedup in each case is the relative gain when going from an F − F scheme to either the S − F or the S − S scheme. This is shown in Fig. 1. As the models become more complex in terms of the number of fixed effects and the number of state variables, the relative benefit of using sensitivities instead of finite differences increases. This holds both for forward differences and for central differences. However, for central differences, the advantage is more pronounced, since the central difference gradient requires approximately twice as many calls to the likelihood function as forward differences.

Table III Parameter values used for simulating data and parameter estimates for the benchmark models. Relative standard errors in percent are shown in parentheses. Parameters that are not part of the model are denoted with a dash (-)
Fig. 1

Comparison of speedup from one estimation scheme (F–F) to another (S–F and S–S). The dotted line represents the speedup baseline. The gain increases with the increased complexity of the model

Improved Precision and Accuracy

The gradient error was computed as the difference between the obtained value and the mean of the values obtained using the sensitivity-based gradient method. The sensitivity-based (exact) method does not involve a finite difference step and is therefore independent of q.

As shown in Fig. 2, the sensitivity-based gradient is more precise than either of the other two methods. For large step sizes (small q), the gradients computed using either forward or central differences are biased but relatively precise. However, as shown in Fig. 3, the precision of the finite difference methods never exceeds that of the sensitivity-based method. For small step sizes (large q), the accuracy increases, but both finite difference methods suffer from a significant loss of precision relative to the sensitivity-based method.

Fig. 2

The 10th and 90th percentiles of the sample-based gradient errors of the exact method (blue), or using forward (red) or central (yellow) finite differences, plotted as a function of the finite difference step size, quantified as 10^-q. Errors of both signs are displayed using a double logarithmic y-axis originating from ±10^-7

Fig. 3

Sample standard deviation of the gradient using the exact method (blue), forward differences (red), or central differences (yellow), plotted as a function of the finite difference step size, quantified as 10^-q

Figure 3 shows the standard deviation of the gradient error as a function of the step size. The linear dependency of the logarithm of the standard deviation on q can be reasoned directly from Eqs. (17) and (18), by assuming a numerical error term, independent of the step size, in each calculation of the function value. The forward finite difference gradient Δf/Δθk can be written as

$$ \frac{\Delta f}{\Delta {\boldsymbol{\theta}}_k}=\frac{\left(f\left({\boldsymbol{\theta}}_k\left(1+{10}^{-q}\right)\right)+{\epsilon}_1\right)-\left(f\left({\boldsymbol{\theta}}_k\right)+{\epsilon}_2\right)}{{\boldsymbol{\theta}}_k{10}^{-q}}, $$
(19)

where ϵ1, ϵ2 are independent, identically distributed numerical error terms with mean zero and variance σ2. The total variance of the forward finite difference gradient due to this numerical error is therefore

$$ \mathrm{Var}\left(\frac{\Delta f}{\Delta {\boldsymbol{\theta}}_k}\right)=\frac{2{\sigma}^2}{{\boldsymbol{\theta}}_k^2{10}^{-2q}}. $$
(20)

Taking the logarithm of this gradient’s standard deviation yields

$$ {\log}_{10}\left(\sqrt{\mathrm{Var}\left(\frac{\Delta f}{\Delta {\boldsymbol{\theta}}_k}\right)}\right)=q+{\log}_{10}\left[\sigma \right]+\frac{1}{2}{\log}_{10}\left[2\right]-{\log}_{10}\left[{\boldsymbol{\theta}}_k\right], $$
(21)

which is linear in q with unit slope. Similar results can be derived for central difference gradients.
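
The unit-slope relationship in Eq. (21) can be checked numerically. The sketch below adds iid noise of standard deviation σ to each evaluation of a trivial test function (the identity, so the true gradient is 1) and estimates the standard deviation of the resulting forward-difference gradient for a given q; all names and values are illustrative.

```python
import numpy as np

def noisy_grad_std(q, sigma=1e-8, theta=1.0, reps=2000, rng=None):
    """Sample std of a forward-difference gradient of f(x) = x when each
    function evaluation carries iid numerical noise of std sigma (Eqs. 19-21)."""
    rng = np.random.default_rng(0) if rng is None else rng
    h = theta * 10.0 ** (-q)
    e1 = rng.normal(0.0, sigma, reps)          # noise on f(theta*(1 + 10^-q))
    e2 = rng.normal(0.0, sigma, reps)          # noise on f(theta)
    g = ((theta + h + e1) - (theta + e2)) / h  # Eq. (19) with f the identity
    return g.std()
```

Increasing q by one increases the sampled standard deviation by about one decade, consistent with the unit slope derived in Eq. (21).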

DISCUSSION

Misspecification of the dynamics in population modeling can be identified and quantified through an SDE-NLME approach. This is accomplished by estimating the magnitude of the system noise terms of the SDEs. These terms serve as a source of uncertainty that affects the time evolution of the model state variables, and they are used in parallel with the classical stochastic model components for the residual error and the between-subject variability.

In this work, a novel method for parameter estimation in SDE-NLME models based on exact gradients is presented. It is evaluated by performing parameter estimation for PK/PD models of different complexity, and comparing the computational time, precision, and accuracy to a standard approach based on a finite difference gradient. The method performs well and shows potential for fast and robust analysis of PK/PD data.

Parameter estimation for NLME PK/PD models can be a time-consuming task, especially when the model of the physiological system is defined by ODEs. The computational burden increases even more when the mathematical framework is expanded to an SDE-NLME setting. By computing the exact gradient of the FOCE likelihood, the need for repeatedly solving the inner optimization problem for each perturbation of the fixed effects parameters is avoided, significantly decreasing the computational time. As seen in Fig. 1, the larger and more complex models of this study benefit more from exact gradients. This suggests that the speedup provided by the exact gradient computations will in general be largest for complex models, where this time-saving matters the most. Similar to the S-FOCE method proposed by Almquist et al. (27), the algorithm presented in this paper can easily be parallelized over individuals, further increasing its feasibility.

The exact gradients are computed from numerical solutions of ODEs and may therefore contain numerical errors. However, they are still considered exact since the numerical integration does not introduce any bias, and since the numerical error can be made arbitrarily small by controlling the tolerance of the integrator. The quality of the gradients computed with finite differences was evaluated by comparison to the mean of the exactly computed gradient (Fig. 2). For large finite difference step sizes, the relative precision is high but the gradient is biased. Although the accuracy of the finite difference gradient increased with smaller step size, the sample standard deviation clearly shows a simultaneous loss of precision. This demonstrates a fundamental problem with the finite difference method: it is hard to obtain both good precision and accuracy at the same time. Central differences may improve issues with bias, but still suffer from precision issues to a similar extent as forward differences. Exact gradients, on the other hand, elegantly circumvent the step length conundrum inherent to the finite difference approximation. They are unbiased per construction and the absolute precision is always comparatively high. Although exact gradients outperform the finite difference gradients when compared as such, it is unknown whether exact gradients improve robustness in the parameter estimation step, or in the covariance step, which ultimately is what matters. It may appear plausible that this indeed is the case, but future research is required to prove it.

The EKF has been the standard way of handling stochastic dynamics in NLME models. To fulfill the prerequisites of the EKF, the model state variables must be independent of both the observation noise and the system noise (32). However, in PK/PD models, it is often more realistic to assume proportional noise (1,34). A pragmatic way around this issue is to replace the state variables with their conditional expectation in the model definition. For some models, one could also consider the Lamperti transformation (35) for eliminating the state variable dependency of the system noise. Another filter that might be applicable is the unscented Kalman filter (UKF), introduced by (36), which, instead of linearizing the nonlinear system, uses a so-called unscented transformation. The UKF has been shown to perform very well for a variety of nonlinear models (37) and should be considered as an alternative for state variable estimation in SDE-NLME models.

The EKF was initialized by assuming steady state and solving Eq. (9). In the models chosen for the simulation-estimation study, the steady-state approach is trivial, with P0 = 0. This way of initializing the EKF uses information about the particular model and is superior to using simulations of the initial behavior (14) or simply choosing an arbitrary initialization.

To gain impact in the pharmacometric community, two important steps remain. First, additional applications of SDE-NLME models are needed to further demonstrate the benefits of PK/PD models with uncertain dynamics. Some applications have already been mentioned in this work (6,16,17,19). Second, for novel methods to be widely used, they must be implemented in software with large user bases, such as NONMEM, Phoenix, and Monolix (22,38,39), or open-source initiatives such as nlmixr (40). Tornøe et al. (14) have previously reported on the implementation of SDE-NLME in the industry-standard software NONMEM. The prediction differential equations, Eq. (9), were then introduced as a system of ODEs, and event tags were introduced to account for the time-update and measurement-update steps. In version 7.4, NONMEM introduced a FAST option to perform the optimization using the exact gradient approach introduced for ODE models by Almquist et al. (27). To our understanding, the combination of this recent implementation of the FAST method in NONMEM and the filter implementation of Tornøe et al. (14) should be enough to provide this functionality in NONMEM. However, it requires the user to explicitly state the update equations of the Kalman filter. It is only through such wide availability that SDE-NLME methods will gain acceptance and popularity among PK/PD modelers.

CONCLUSION

A feasible method for parameter estimation of SDE-NLME models that extends S-FOCE has been proposed. It provides shorter computational time compared to previously used finite difference gradient-based methods. The gradients computed during optimization of the APL are both more precise and more accurate than those computed using finite differences.