INTRODUCTION

Nonlinear mixed effects (NLME) models are used to describe populations of individuals that behave qualitatively similarly, but where every individual has its own quantitative characteristics. These models have found wide application in pharmacokinetics (PK) and pharmacodynamics (PD) (1,2,3,4). Physiological systems are often modeled with a continuous-time deterministic model and noisy observations that are discrete in time. However, since the mechanisms of the physiological systems of interest are often not completely understood, a natural extension is to account for the uncertainty in the dynamics. This is the motivation behind extending NLME models with a stochastic part (5).

Stochastic differential equations (SDEs) greatly extend the descriptive power of the ordinary differential equations (ODEs) that are usually used to encode dynamical system models in pharmacometrics. An ODE model can be turned into an SDE model by the addition of possibly nonlinearly scaled noise terms. These additional terms can be thought of as a type of slack variable introduced to pick up potential discrepancies between the mechanistically modeled dynamics and unknown but indirectly observed dynamical effects. In other words, the noise terms in the dynamical system equations represent the lumped effect of everything in the system that is not explicitly modeled mechanistically. The approach also allows model parameters to be turned into state variables with random dynamics, which for instance can be used to model interoccasion variability (6) or time-dependent changes in parameters whose trend is unknown a priori, and for the estimation of an unknown input signal (7).

The application of SDEs to model uncertainty in dynamics has long been used in other fields, such as finance and control theory, and various parameter estimation methods have been developed (8). However, the sparsity and irregular sampling of PK/PD data has made it difficult to directly apply these parameter estimation methods. Nevertheless, some attempts to develop methods for parameter estimation in NLME models with stochastic dynamics have been successful, using for example (i) Bayesian inference (9,10), (ii) expectation maximization (EM) methods (11,12,13), and (iii) by expanding the traditional gradient-based estimation methods using Kalman filters (14,15). These methods have been used for several PK/PD applications (16,17,18,19). This paper focuses on gradient-based methods. A general overview of some of the related method and application papers is shown in Table I.

Table I Timeline showing the main contributions to nonlinear mixed effects models with stochastic differential equations and to the S-FOCE method

ODE models are a subset of SDE models, because SDEs reduce to ODEs when the magnitude of the system noise is decreased towards zero. Ideally, the dynamical system under study is well characterized and the mechanistic terms present in the model equations give an accurate description of the system's dynamical behavior; ODEs are then a suitable model description. To account for incorrect model definitions, Kristensen et al. (20) introduced a method for evaluating the choice of a deterministic population model described with ODEs and iteratively improving it, using an SDE model to pinpoint where in the dynamical model equations misspecification may be present. A similar idea had previously been used for gray box models (21).

Instead of opting to find a deterministic model, Tornøe et al. (14) and Overgaard et al. (15) captured the additional uncertainty due to model misspecification, oversimplifications, and approximations by using SDEs in NLME models. The well-known parameter estimation method first-order conditional estimation (FOCE) (22) was expanded to account for the new mathematical framework. This involved state variable estimation using extended Kalman filters (EKFs) (23). The algorithm was initially implemented in NONMEM (14) and later implemented in both MATLAB (24) and R (25).

Optimizing the FOCE approximation of the log likelihood using gradient-based methods, such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (26), requires sensitivities of the state variables of the underlying model. However, NLME models defined by differential equations often lack an analytic solution. Hence, the traditional way to compute the gradient is to use a finite difference approximation. An alternative approach is to obtain the gradients utilizing the sensitivity equations for the original set of differential equations. The algorithm originally proposed by Almquist et al. (27) is an extension of the FOCE algorithm that uses these sensitivity-based gradients for population NLME models based on ODEs. This algorithm is here referred to as S-FOCE. The sensitivity approach was also applied by Leander et al. (28) for single-individual models using SDEs. For the single individual, the likelihood was approximated using the EKF, which in turn requires a first-order differentiation of the EKF. These ideas have been combined for parameter estimation of NLME models using SDEs (29). In that study, the FOCE-EKF likelihood was optimized with exact gradients instead of the traditional finite differences, using the mixed symbolic and numeric algebra capabilities of Mathematica (30). However, the details of the algorithm have until now not been presented.

This paper extends the S-FOCE algorithm to a general NLME model based on SDEs using the EKF approach. This involves both the first- and the second-order sensitivities of the EKF. An earlier stage of this work was presented in (31). The method is evaluated on synthetically generated datasets from common PK/PD models with uncertainty in the dynamics, in addition to the traditional observation uncertainty and interindividual variability. The relative computational time of the sensitivity-based gradients is compared to that of the finite difference approach. Moreover, the precision and the accuracy of both gradient approaches are analyzed.

The paper is structured as follows. First, the necessary theory for the proposed extension is introduced, including the approximate population likelihood and the EKF, followed by the proposed algorithm. Second, the methods for evaluating and comparing the algorithm are described. Third, the results of the comparison are presented, showing a speedup in estimation time as well as increased precision and accuracy of the gradients for the proposed algorithm. Finally, the results are discussed along with an outlook towards implementation of the algorithm in an existing parameter estimation framework.

THEORETICAL BACKGROUND

This section introduces the theory needed for extending the S-FOCE method to NLME models with stochastic dynamics (SDE-NLME).

Nonlinear Mixed Effects Models Based on Stochastic Differential Equations

NLME models can be defined using SDEs to describe the dynamics of the continuous-time state variables x. The continuous-time, stochastic dynamics are modeled as

$$ {\displaystyle \begin{array}{ll}d{\boldsymbol{x}}_i& =\boldsymbol{f}\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i,t,\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right) dt+\boldsymbol{G}\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i,t,\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)d{\boldsymbol{W}}_i\\ {}{\boldsymbol{x}}_i(0)& ={\boldsymbol{x}}_0\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)\end{array}} $$
(1)

with discrete observations modeled using an observation function h and Gaussian noise \( {\boldsymbol{e}}_{ij}\sim N\left(\mathbf{0},\boldsymbol{\Sigma} \left({\boldsymbol{x}}_{ij},{\boldsymbol{u}}_i,{t}_{i_j},\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)\right) \),

$$ {\boldsymbol{y}}_{ij}=\boldsymbol{h}\left({\boldsymbol{x}}_{ij},{\boldsymbol{u}}_i,{t}_{i_j},\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)+{\boldsymbol{e}}_{ij}. $$
(2)

The function f(xi, ui, t, θ, ηi), describing the deterministic part of the dynamics, is called the drift function, and the part introducing randomness to the state variables, G(xi, ui, t, θ, ηi)dWi, is called the system noise. Here, Wi is a Wiener process, with dWi ∼ N(0, dtI). The indices i and j correspond to individual i and its j-th observation at time \( {t}_{i_j} \). Here, θ, ηi, and ui are, respectively, the fixed effects, random effects, and known inputs, such as covariates including dosage. The random effects ηi are chosen to be normally distributed with mean zero and covariance Ω(θ). This way of formulating the NLME model gives three sources of variability in the response (29), namely (i) observation noise, eij; (ii) system noise, G(xi, ui, t, θ, ηi)dWi; and (iii) parameter variability, ηi.
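
To illustrate how the three sources of variability enter a model of this form, the sketch below simulates one individual from a hypothetical one-state model using the Euler–Maruyama scheme. The model, function name, and parameter values are illustrative assumptions and do not correspond to the benchmark models of this study.

```python
import numpy as np

def simulate_individual(theta, eta, sigma_w, sigma_e, t_obs, x0, dt=0.01, rng=None):
    """Euler-Maruyama simulation of a hypothetical one-state SDE-NLME individual:
    dx = -theta*exp(eta)*x dt + sigma_w dW,  y_j = x(t_j) + e_j."""
    rng = np.random.default_rng() if rng is None else rng
    ke = theta * np.exp(eta)          # individual elimination rate (random effect eta)
    t, x = 0.0, x0
    y = []
    for t_j in t_obs:
        while t < t_j:                # integrate the SDE up to the next sampling time
            h = min(dt, t_j - t)
            x += -ke * x * h + sigma_w * np.sqrt(h) * rng.normal()
            t += h
        y.append(x + sigma_e * rng.normal())   # additive Gaussian observation noise
    return np.array(y)
```

Setting sigma_w = sigma_e = 0 recovers the deterministic ODE solution, which is one way to see that ODE models are the zero-system-noise special case of SDE models.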

The Approximate Population Likelihood

Since xi are stochastic processes, the distribution of yij changes if conditioned on previous observations. Let \( {\boldsymbol{Y}}_{i\left(j-1\right)}=\left\{{\boldsymbol{y}}_{i1},{\boldsymbol{y}}_{i2},\dots, {\boldsymbol{y}}_{i\left(j-1\right)}\right\} \) denote those observations. The residuals εij are defined as

$$ {\boldsymbol{\varepsilon}}_{ij}={\boldsymbol{y}}_{ij}-{\hat{\boldsymbol{y}}}_{ij} $$
(3)

with the expected observation value \( {\hat{\boldsymbol{y}}}_{ij} \) and covariance Rij conditioned on Υi(j − 1) and θ defined as

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{y}}}_{ij}& =\mathrm{E}\left[{\boldsymbol{y}}_{ij}|{\boldsymbol{Y}}_{i\left(j-1\right)},\boldsymbol{\theta} \right]\\ {}{\boldsymbol{R}}_{ij}& =\mathrm{Cov}\left[{\boldsymbol{y}}_{ij}|{\boldsymbol{Y}}_{i\left(j-1\right)},\boldsymbol{\theta} \right].\end{array}} $$
(4)

As shown in, e.g., (14,29), the combined likelihood L for all individuals simplifies to

$$ L\left(\boldsymbol{\theta} \right)=\prod \limits_{i=1}^N\int \exp \left({\boldsymbol{l}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i\right)\right)d{\boldsymbol{\eta}}_i, $$
(5)

where

$$ {\displaystyle \begin{array}{rl}-2{\boldsymbol{l}}_i=& \sum \limits_{j=1}^{n_i}\left({\boldsymbol{\varepsilon}}_{ij}^T{\boldsymbol{R}}_{ij}^{-1}{\boldsymbol{\varepsilon}}_{ij}+\mathrm{logdet}\left(2\pi {\boldsymbol{R}}_{ij}\right)\right)\\ {}& +{\boldsymbol{\eta}}_i^T{\boldsymbol{\Omega}}^{-1}{\boldsymbol{\eta}}_i+\mathrm{logdet}\left(2\pi \boldsymbol{\Omega} \right),\end{array}} $$
(6)

and where the indirect dependence on θ and ηi is suppressed for simplicity.

The FOCE approximation of the population likelihood becomes

$$ \log {L}_F\left(\boldsymbol{\theta} \right)=\sum \limits_{i=1}^N\left({\boldsymbol{l}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)-\frac{1}{2}\mathrm{logdet}\left(\frac{-{\boldsymbol{H}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)}{2\pi}\right)\right). $$
(7)

Here, \( {\boldsymbol{\eta}}_i^{\ast } \) maximizes the individual likelihood li for a given θ, and Hi is a first-order approximation of the Hessian of li with respect to ηi. Further details on this approximation can be found in, e.g., (27).
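
To make Eqs. (6) and (7) concrete, the following sketch evaluates the individual log-likelihood and the FOCE approximation of the population log-likelihood from precomputed quantities (residuals εij, covariances Rij, and the Hessian approximations Hi). The function names are illustrative, and the assumption that these inputs are already available from the EKF is an assumption of the sketch.

```python
import numpy as np

def individual_loglik(eps, R, eta, Omega):
    """l_i from Eq. (6), given residuals eps[j] and covariances R[j] for one individual."""
    m2l = eta @ np.linalg.solve(Omega, eta) + np.log(np.linalg.det(2 * np.pi * Omega))
    for e_j, R_j in zip(eps, R):
        # weighted squared residual plus normalizing term, summed over observations
        m2l += e_j @ np.linalg.solve(R_j, e_j) + np.log(np.linalg.det(2 * np.pi * R_j))
    return -0.5 * m2l

def foce_apl(l_star, H_star):
    """log L_F from Eq. (7), given l_i(theta, eta_i*) and the Hessian
    approximations H_i for each of the N individuals."""
    return sum(l - 0.5 * np.log(np.linalg.det(-H / (2 * np.pi)))
               for l, H in zip(l_star, H_star))
```

In the actual algorithm, εij and Rij come from the EKF recursion described in the next section, and \( {\boldsymbol{\eta}}_i^{\ast } \) is found by an inner gradient-based optimization of li.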

The Extended Kalman Filter

The SDEs introduce uncertainty to the time evolution of the state variables of the system. The EKF can be used to estimate the state variables, as suggested in (14,15,29). The continuous-discrete EKF is a state estimator for continuous-discrete state-space models of the form introduced in Eq. (1), with the exception that the state variables must be independent of both the observation noise and the system noise (32). The EKF estimates the conditional expectation of the state \( {\hat{\boldsymbol{x}}}_{i\left(j|j\right)}=\mathrm{E}\left[{\boldsymbol{x}}_{i{t}_j}|{\boldsymbol{Y}}_j,{\boldsymbol{\phi}}_i\right] \) and its covariance \( {\boldsymbol{P}}_{i\left(j|j\right)}=\mathrm{Cov}\left[{\boldsymbol{x}}_{i{t}_j}|{\boldsymbol{Y}}_j,{\boldsymbol{\phi}}_i\right] \). Here, ϕi = (θ, ηi, ui) is a vector containing all parameters corresponding to individual i.

In the rest of this section, the notation is simplified by omitting the individual index i. The drift function f and the observation function h from Eqs. (1) and (2) are linearized by introducing

$$ {\displaystyle \begin{array}{lll}{\boldsymbol{A}}_t& ={\left.\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}\right|}_{\boldsymbol{x}={\hat{\boldsymbol{x}}}_{t\mid j-1}},& t\in \left[{t}_{j-1},{t}_j\right]\\ {}{\boldsymbol{C}}_j& ={\left.\frac{\partial \boldsymbol{h}}{\partial \boldsymbol{x}}\right|}_{\boldsymbol{x}={\hat{\boldsymbol{x}}}_{j\mid j-1}}.& \end{array}} $$
(8)

The EKF has two main steps, a time-update step and a measurement-update step (33). In the time-update step, the state variables and covariance are predicted using all previous observations. This is done by solving the differential equations

$$ {\displaystyle \begin{array}{lll}\frac{d{\hat{\boldsymbol{x}}}_{t\mid j-1}}{dt}& =\boldsymbol{f}\left({\hat{\boldsymbol{x}}}_{t\mid j-1},{\boldsymbol{u}}_t,t,\boldsymbol{\phi} \right),& t\in \left[{t}_{j-1},{t}_j\right]\\ {}\frac{d{\boldsymbol{P}}_{t\mid j-1}}{dt}& ={\boldsymbol{A}}_t{\boldsymbol{P}}_{t\mid j-1}+{\boldsymbol{P}}_{t\mid j-1}{\boldsymbol{A}}_t^T+{\boldsymbol{G}}_t{\boldsymbol{G}}_t^T,& t\in \left[{t}_{j-1},{t}_j\right].\end{array}} $$
(9)

The above prediction gives

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{y}}}_j& =\boldsymbol{h}\left({\hat{\boldsymbol{x}}}_{j\mid j-1},{\boldsymbol{u}}_j,{t}_j,\boldsymbol{\phi} \right)\\ {}{\boldsymbol{R}}_j& ={\boldsymbol{C}}_j{\boldsymbol{P}}_{j\mid j-1}{\boldsymbol{C}}_j^T+{\boldsymbol{\Sigma}}_j.\end{array}} $$
(10)

In the measurement-update step, the prediction is used to compute the Kalman gain

$$ {\boldsymbol{K}}_j={\boldsymbol{P}}_{j\mid j-1}{\boldsymbol{C}}_j^T{\boldsymbol{R}}_j^{-1} $$
(11)

which is used to update the state and its covariance from the current observation

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{x}}}_{j\mid j}& ={\hat{\boldsymbol{x}}}_{j\mid j-1}+{\boldsymbol{K}}_j{\boldsymbol{\varepsilon}}_j\\ {}{\boldsymbol{P}}_{j\mid j}& ={\boldsymbol{P}}_{j\mid j-1}-{\boldsymbol{K}}_j{\boldsymbol{R}}_j{\boldsymbol{K}}_j^T.\end{array}} $$
(12)

The filter is initialized with

$$ {\displaystyle \begin{array}{ll}{\hat{\boldsymbol{x}}}_{1\mid 0}& ={\boldsymbol{x}}_0\\ {}{\boldsymbol{P}}_{1\mid 0}& ={\boldsymbol{P}}_0.\end{array}} $$
(13)
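
A minimal sketch of one EKF cycle (Eqs. 9–12) may clarify the recursion. The time-update ODEs are integrated here with a fixed-step Euler scheme purely for simplicity; in practice an adaptive ODE solver would be used, and the callable arguments (f, A, G, h_fun, C) are illustrative.

```python
import numpy as np

def ekf_step(x, P, f, A, G, h_fun, C, Sigma, y, t0, t1, dt=1e-3):
    """One cycle of the continuous-discrete EKF: time update over [t0, t1]
    followed by a measurement update against the observation y at t1."""
    t = t0
    while t < t1:                              # time update, Eq. (9)
        s = min(dt, t1 - t)
        At, Gt = A(x, t), G(x, t)              # linearization along the prediction
        x = x + s * f(x, t)
        P = P + s * (At @ P + P @ At.T + Gt @ Gt.T)
        t += s
    Cj = C(x)
    yhat = h_fun(x)                            # predicted observation, Eq. (10)
    R = Cj @ P @ Cj.T + Sigma
    K = P @ Cj.T @ np.linalg.inv(R)            # Kalman gain, Eq. (11)
    eps = y - yhat
    x = x + K @ eps                            # measurement update, Eq. (12)
    P = P - K @ R @ K.T
    return x, P, eps, R
```

Running this function over all sampling times of an individual yields exactly the residuals εj and covariances Rj that enter the individual likelihood in Eq. (6).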

PROPOSED ALGORITHM

This section presents the extension of the S-FOCE method, originally proposed by Almquist et al. (27) for parameter estimation of general NLME models based on ODEs, to the use of SDEs. The method utilizes sensitivity equations to calculate exact gradients for a gradient-based optimization of the model parameters. The sensitivities are the derivatives of a function with respect to its parameters. The sensitivities needed for the gradient-based optimization, in addition to those already present in the S-FOCE algorithm, are presented below. A detailed derivation of the terms needed for the sensitivity equations can be found in the Appendix.
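
As a toy illustration of the sensitivity-equation idea (here for a plain ODE; the full algorithm applies the same idea to the EKF equations), consider dx/dt = −θx. The sensitivity s = dx/dθ obeys its own ODE, ds/dt = (∂f/∂x)s + ∂f/∂θ = −θs − x, which can be integrated alongside the state. The model and the fixed-step scheme are illustrative assumptions.

```python
import numpy as np

def x_and_sensitivity(theta, x0, T, dt=1e-4):
    """Integrate dx/dt = -theta*x together with its sensitivity s = dx/dtheta,
    which satisfies ds/dt = -theta*s - x, using forward Euler."""
    x, s = x0, 0.0          # the sensitivity starts at zero: x(0) does not depend on theta
    for _ in range(int(round(T / dt))):
        x, s = x + dt * (-theta * x), s + dt * (-theta * s - x)
    return x, s
```

For this example the sensitivity is available analytically, dx/dθ = −T x0 e^{−θT}, which the numerical solution reproduces. In the proposed algorithm, analogous equations for the EKF quantities replace finite differences over θ and η.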

The fixed effects parameters θ can be estimated by maximizing the approximate population likelihood (APL) in Eq. (7) using gradient-based methods. This means a need for computing (27)

$$ \frac{d\log {L}_F\left(\boldsymbol{\theta} \right)}{d{\boldsymbol{\theta}}_m}=\sum \limits_{i=1}^N\left(\frac{d{l}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)}{d{\boldsymbol{\theta}}_m}-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{H}}_i^{-1}\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)\frac{d{\boldsymbol{H}}_i\left(\boldsymbol{\theta}, {\boldsymbol{\eta}}_i^{\ast}\right)}{d{\boldsymbol{\theta}}_m}\right]\right). $$
(14)

Unlike the ODE case, the APL is formulated in terms of conditional expected values of the state variables and their covariances, as given by the EKF. Hence, sensitivities of the quantities computed by the EKF equations are required. These are the first-order sensitivities

$$ {\displaystyle \begin{array}{r}\frac{d{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\eta}}_{ik}},\kern1em \frac{d{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\theta}}_n},\kern1em \frac{d{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\eta}}_{ik}},\kern1em \frac{d{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\theta}}_n}\\ {}\frac{d{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\eta}}_{ik}},\kern1em \frac{d{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\theta}}_n}\end{array}} $$
(15)

and the second-order sensitivities

$$ {\displaystyle \begin{array}{r}\frac{d^2{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\theta}}_n},\kern1em \frac{d^2{\boldsymbol{\varepsilon}}_j}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\eta}}_{il}},\kern1em \frac{d^2{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\theta}}_n},\kern1em \frac{d^2{\boldsymbol{R}}_{ij}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\eta}}_{il}}\\ {}\frac{d^2{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\theta}}_n},\kern1em \frac{d^2{\hat{\boldsymbol{x}}}_{i\left(j|j-1\right)}}{d{\boldsymbol{\eta}}_{ik}d{\boldsymbol{\eta}}_{il}}.\end{array}} $$
(16)

In addition, the sensitivities of the state variable covariance P are needed. The derivations of the results can be found in the Appendix. The resulting SDE-EKF extension of S-FOCE is outlined in Algorithm 1.


METHODS

Initializing the EKF

The use of the EKF requires initial values of the state variable, x0, and of the state variable covariance, P0. The state variable is set to the initial value of the system equations. To estimate the initial covariance of the state, different methods can be used. Setting P0 = I is a common but arbitrary way of initializing the EKF. Tornøe et al. (14) set P0 to the integral of the Wiener processes and system dynamics between the first two observations to get a representative value. However, in the case of PK/PD models, one often starts in a steady state and can then solve for the initial covariance analytically from Eq. (9).
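
Under the steady-state assumption, the covariance ODE in Eq. (9) reduces to the algebraic Lyapunov equation 0 = A P0 + P0 Aᵀ + G Gᵀ, with A and G evaluated at the steady state. A sketch of solving it by vectorization with Kronecker products is shown below; a library routine such as scipy.linalg.solve_continuous_lyapunov could be used instead.

```python
import numpy as np

def steady_state_P0(A, G):
    """Solve the Lyapunov equation 0 = A P + P A^T + G G^T for the
    steady-state initial covariance P0, via column-stacking vectorization:
    (I kron A + A kron I) vec(P) = -vec(G G^T)."""
    n = A.shape[0]
    Q = G @ G.T
    L = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    P = np.linalg.solve(L, -Q.flatten(order="F")).reshape(n, n, order="F")
    return 0.5 * (P + P.T)     # symmetrize against round-off
```

(For the benchmark models in this study, this approach is trivial and gives P0 = 0, as noted in the Discussion.)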

Models and Data

To evaluate the proposed method, a simulation-estimation study was set up. Data were simulated from three common PK and PK/PD models: a two-compartmental PK model (M1), a PK/PD model consisting of a two-compartmental PK together with a direct PD response model (M2), and a PK/PD model consisting of a two-compartmental PK with an indirect PD response model (M3). All model dynamics are described using SDEs, with a one-dimensional Wiener process describing the uncertainty in the absorption process. The number of state variables and parameters is shown in Table II. Each experimental setup included 50 individuals divided into five dose groups. Representative values were chosen for the simulation in compatible units. The sampling times were chosen to take the characteristics of the PK and PD responses into account. A full description of the models and the experimental design can be found in the Appendix.

Table II Overview of the benchmark models

Comparison

Algorithm 1 consists of two levels of optimization, an inner level to find the empirical Bayes estimates \( {\boldsymbol{\eta}}_i^{\ast } \) given θ, and an outer level to find the optimal fixed effects θ. Both levels of optimization use gradient-based methods. Three different ways of computing gradients are considered. One uses sensitivities in both levels of optimization (S − S), another uses sensitivities in the inner level and finite differences in the outer level (S − F), and the last uses finite differences in both levels of optimization (F − F). Furthermore, the finite differences are either computed according to forward differences

$$ \frac{f\left({\boldsymbol{\theta}}_k\left(1+{10}^{-q}\right)\right)-f\left({\boldsymbol{\theta}}_k\right)}{{\boldsymbol{\theta}}_k{10}^{-q}}, $$
(17)

or central differences

$$ \frac{f\left({\boldsymbol{\theta}}_k\left(1+{10}^{-q}\right)\right)-f\left({\boldsymbol{\theta}}_k\left(1-{10}^{-q}\right)\right)}{2{\boldsymbol{\theta}}_k{10}^{-q}}, $$
(18)

where q controls the relative step length and f denotes the function to be differentiated. The proposed algorithm is evaluated in terms of runtime and in terms of precision and accuracy of the gradient. The initial values of the optimization are chosen within 15% of the true values, drawn from a uniform distribution. The algorithm is implemented and run using Mathematica 11.0 (30). Calculations are performed on a workstation with a 4.00-GHz Intel Core i7-6700K CPU and 32.0 GB of RAM. A running version is available upon request.
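
For reference, Eqs. (17) and (18) correspond to the following sketch, using the same relative-step convention. In the actual algorithm, f would be the approximate population likelihood; here it is left as an arbitrary callable.

```python
import numpy as np

def forward_diff(f, theta, k, q):
    """Forward difference, Eq. (17), with relative step 10^-q in component k."""
    h = theta[k] * 10.0 ** (-q)
    tp = theta.copy()
    tp[k] += h
    return (f(tp) - f(theta)) / h

def central_diff(f, theta, k, q):
    """Central difference, Eq. (18): roughly one extra function evaluation
    per component compared to the forward difference."""
    h = theta[k] * 10.0 ** (-q)
    tp, tm = theta.copy(), theta.copy()
    tp[k] += h
    tm[k] -= h
    return (f(tp) - f(tm)) / (2.0 * h)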

Timing Comparison

The timing comparison is done in a similar way as the previously performed comparison for S-FOCE with ODEs (27). For a given model and a relevant simulated dataset, the total evaluation time is calculated using each of the above approaches to obtain the gradient. To isolate the effect of computing the different gradients, they were calculated for the same set of points. The points were chosen as the points from the gradient-based optimization using the S − S gradient approach. This ensures that equally many evaluations were performed for each gradient. For the finite differences, the step length is kept constant by letting q = 4, corresponding to a relative step length of 0.0001.

Precision Comparison

Precision is compared only for finite differences in the outer level of optimization. For the inner level, both methods use sensitivity-based gradients. The gradients are computed at the found optimum, θ, using the S − F and S − S gradients. The step length varies, with q ranging from 3.2 to 6.4. To capture the effect of new realizations on the numerical precision in log LF(θ), the gradient is computed 300 times for each q using different randomized starting values of η for the inner level of optimization. The starting values were drawn from a uniform distribution between −0.3 and 0.3.

RESULTS

Faster Estimation

The proposed algorithm was used to estimate the parameter values for models M1–M3. The resulting parameter estimates using the S − S gradients, along with the relative standard errors, can be found in Table III. For each of the finite difference methods and models (M1–M3), the runtime of the parameter estimation was measured for all three previously mentioned optimization schemes. The speedup in each case is the relative gain when going from an F − F scheme to either the S − F or the S − S scheme. This is shown in Fig. 1. As the models become more complex in terms of the number of fixed effects and the number of state variables, the relative benefit of using sensitivities instead of finite differences increases. This holds both for forward differences and for central differences. However, for central differences, the advantage is more pronounced, since the central difference gradient requires approximately twice as many calls to the likelihood function as forward differences.

Table III Parameter values used for simulating data and parameter estimates for the benchmark models. Relative standard errors in percent are shown in parentheses. Parameters that are not part of the model are denoted with a dash (-)
Fig. 1

Comparison of speedup from one estimation scheme (F–F) to another (S–F and S–S). The dotted line represents the speedup baseline. The gain increases with the increased complexity of the model

Improved Precision and Accuracy

The gradient error was computed as the difference between the obtained value and the mean of the values obtained using the sensitivity-based gradient method. The sensitivity-based (exact) method does not involve a finite difference step and is therefore independent of q.

As shown in Fig. 2, the sensitivity-based gradient is more precise than either of the other two methods. For large step sizes (small q), the gradients computed using either forward or central differences are biased but relatively precise. However, as shown in Fig. 3, the precision of the finite difference methods never exceeds that of the sensitivity-based method. For small step sizes (large q), the accuracy increases, but both finite difference methods suffer from a significant loss of precision relative to the sensitivity-based method.

Fig. 2

The 10th and 90th percentiles of the sample-based gradient errors of the exact method (blue), or using forward (red) or central (yellow) finite differences, plotted as a function of the finite difference step size, quantified as 10^-q. Errors of both signs are displayed using a double logarithmic y-axis originating from ±10^-7

Fig. 3

Sample standard deviation of the gradient using the exact method (blue), forward differences (red), or central differences (yellow), plotted as a function of the finite difference step size, quantified as 10^-q

Figure 3 shows the standard deviation of the gradient error as a function of the step size. The linear dependency of the logarithm of the standard deviation on q can be reasoned directly from Eqs. (17) and (18), by assuming a numerical error term, independent of the step size, in each calculation of the function value. The forward finite difference gradient Δf/Δθk can be written as

$$ \frac{\Delta f}{\Delta {\boldsymbol{\theta}}_k}=\frac{\left(f\left({\boldsymbol{\theta}}_k\left(1+{10}^{-q}\right)\right)+{\epsilon}_1\right)-\left(f\left({\boldsymbol{\theta}}_k\right)+{\epsilon}_2\right)}{{\boldsymbol{\theta}}_k{10}^{-q}}, $$
(19)

where ϵ1, ϵ2 are independent, identically distributed numerical error terms with mean zero and variance σ2. The total variance of the forward finite difference gradient due to this numerical error is therefore

$$ \mathrm{Var}\left(\frac{\Delta f}{\Delta {\boldsymbol{\theta}}_k}\right)=\frac{2{\sigma}^2}{{\boldsymbol{\theta}}_k^2{10}^{-2q}}. $$
(20)

Taking the logarithm of this gradient’s standard deviation yields

$$ {\log}_{10}\left(\sqrt{\mathrm{Var}\left(\frac{\Delta f}{\Delta {\boldsymbol{\theta}}_k}\right)}\right)=q+{\log}_{10}\left[\sigma \right]+\frac{1}{2}{\log}_{10}\left[2\right]-{\log}_{10}\left[{\boldsymbol{\theta}}_k\right], $$
(21)

which is linear in q with unit slope. Similar results can be derived for central difference gradients.
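
The unit-slope relationship in Eq. (21) can be checked numerically. The sketch below adds iid noise of standard deviation σ to each evaluation of a trivial test function (the identity, so the true gradient is 1) and estimates the standard deviation of the resulting forward-difference gradient for a given q; all names and values are illustrative.

```python
import numpy as np

def noisy_grad_std(q, sigma=1e-8, theta=1.0, reps=2000, rng=None):
    """Sample std of a forward-difference gradient of f(x) = x when each
    function evaluation carries iid numerical noise of std sigma (Eqs. 19-21)."""
    rng = np.random.default_rng(0) if rng is None else rng
    h = theta * 10.0 ** (-q)
    e1 = rng.normal(0.0, sigma, reps)          # noise on f(theta*(1 + 10^-q))
    e2 = rng.normal(0.0, sigma, reps)          # noise on f(theta)
    g = ((theta + h + e1) - (theta + e2)) / h  # Eq. (19) with f the identity
    return g.std()
```

Increasing q by one increases the sampled standard deviation by about one decade, consistent with the unit slope derived in Eq. (21).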

DISCUSSION

Misspecification of the dynamics in population modeling can be identified and quantified through an SDE-NLME approach. This is accomplished by estimating the magnitude of the system noise terms of the SDEs. These terms serve as a source of uncertainty that affects the time evolution of the model state variables, and they are used in parallel with the classical stochastic model components for the residual error and the between-subject variability.

In this work, a novel method for parameter estimation in SDE-NLME models based on exact gradients is presented. It is evaluated by performing parameter estimation for PK/PD models of different complexity, and comparing the computational time, precision, and accuracy to a standard approach based on a finite difference gradient. The method performs well and shows potential for fast and robust analysis of PK/PD data.

Parameter estimation for NLME PK/PD models can be a time-consuming task, especially when the model of the physiological system is defined by ODEs. The computational burden increases even more when the mathematical framework is expanded to an SDE-NLME setting. By computing the exact gradient of the FOCE likelihood, the need for repeatedly solving the inner optimization problem for each perturbation of the fixed effects parameters is avoided, significantly decreasing the computational time. As seen in Fig. 1, the larger and more complex models of this study benefit more from exact gradients. This suggests that the speedup provided by the exact gradient computations will in general be largest for complex models, where this time-saving matters the most. Similar to the S-FOCE method proposed by Almquist et al. (27), the algorithm presented in this paper can easily be parallelized over individuals, further increasing its feasibility.

The exact gradients are computed from numerical solutions of ODEs and may therefore contain numerical errors. However, they are still considered exact since the numerical integration does not introduce any bias, and since the numerical error can be made arbitrarily small by controlling the tolerance of the integrator. The quality of the gradients computed with finite differences was evaluated by comparison to the mean of the exactly computed gradient (Fig. 2). For large finite difference step sizes, the relative precision is high but the gradient is biased. Although the accuracy of the finite difference gradient increased with smaller step size, the sample standard deviation clearly shows a simultaneous loss of precision. This demonstrates a fundamental problem with the finite difference method: it is hard to obtain both good precision and accuracy at the same time. Central differences may improve issues with bias, but still suffer from precision issues to a similar extent as forward differences. Exact gradients, on the other hand, elegantly circumvent the step length conundrum inherent to the finite difference approximation. They are unbiased per construction and the absolute precision is always comparatively high. Although exact gradients outperform the finite difference gradients when compared as such, it is unknown whether exact gradients improve robustness in the parameter estimation step, or in the covariance step, which ultimately is what matters. It may appear plausible that this indeed is the case, but future research is required to prove it.

The EKF has been the standard way of handling stochastic dynamics in NLME models. To fulfill the prerequisites of the EKF, the model state variables must be independent of both the observation noise and the system noise (32). However, in PK/PD models, it is often more realistic to assume proportional noise (1,34). A pragmatic way around this issue is to replace the state variables with their conditional expectation in the model definition. For some models, one could also consider the Lamperti transformation (35) for eliminating the state variable dependency of the system noise. Another filter that might be applicable is the unscented Kalman filter (UKF), introduced by (36), which, instead of linearizing the nonlinear system, uses a so-called unscented transformation. The UKF has been shown to perform very well for a variety of nonlinear models (37) and should be considered as an alternative for state variable estimation in SDE-NLME models.

The EKF was initialized by assuming steady state and solving Eq. (9). In the models chosen for the simulation-estimation study, the steady-state approach is trivial, with P0 = 0. This way of initializing the EKF uses information about the particular model and is superior to using simulations of the initial behavior (14) or simply choosing an arbitrary initialization.

To gain impact in the pharmacometric community, two important steps remain. First, additional applications of SDE-NLME models are needed to further demonstrate the benefits of PK/PD models with uncertain dynamics. Some applications have already been mentioned in this work (6,16,17,19). Second, for novel methods to be widely used, they must be implemented in software with large user bases, such as NONMEM, Phoenix, and Monolix (22,38,39), or open-source initiatives such as nlmixr (40). Tornøe et al. (14) have previously reported on the implementation of SDE-NLME in the industry-standard software NONMEM. The prediction differential equations, Eq. (9), were then introduced as a system of ODEs, and event tags were introduced to account for the time-update and measurement-update steps. In version 7.4, NONMEM introduced a FAST option to perform the optimization using the exact gradient approach introduced for ODE models by Almquist et al. (27). To our understanding, the combination of this recent implementation of the FAST method in NONMEM and the filter implementation of Tornøe et al. (14) should be enough to provide this functionality in NONMEM. However, it requires the user to explicitly state the update equations of the Kalman filter. It is only through such wide availability that SDE-NLME methods will gain acceptance and popularity among PK/PD modelers.

CONCLUSION

A feasible method for parameter estimation of SDE-NLME models that extends S-FOCE has been proposed. It provides shorter computational time compared to previously used finite difference gradient-based methods. The gradients computed during optimization of the APL are both more precise and more accurate than those computed using finite differences.