Key Points

This study presents a novel methodology for evaluating pharmacokinetic models during their construction, or for selecting consistent models for therapeutic drug monitoring.

The main advantage of the method is its ability to account for the dependence between concentrations measured in the same individual.

The performance of the method was evaluated in two simulation examples: detecting a misspecification in the structural model and identifying outliers in a set of kinetics.

1 Introduction

Population pharmacokinetic (PK) models are commonly used in pharmacology. Building a model involves several steps, including an evaluation step that is crucial for ensuring the model’s predictive performance. The criteria used to evaluate a model fall into three families.

The first family of criteria comprises graphs that rely on estimating the typical PK parameters, which in turn enable the estimation of typical concentrations. These graphs are used to assess the suitability of the structural model and provide visual insight into inter-individual and intra-individual variability. The second family of criteria groups together graphs based on predicting the individual PK parameters, such as Individual Weighted Residuals (IWRES) versus time or observed concentrations versus individual predictions. These graphs make it possible to check the appropriateness of the distributional assumptions. However, due to shrinkage [1, 2], the prediction of individual PK parameters can be poor, especially when only a small number of concentrations is available per individual. While some corrections for shrinkage can be considered [3], the average distance (more precisely, the rate of convergence) to the actual value of these parameters cannot be smaller than a value inversely proportional to the number of subjects.

The third family of criteria aims to evaluate whether the concentrations used to build the model reasonably belong to the concentration distribution imposed by the model, which is obtained through simulation [4]. The gold standards among these methods are the Visual Predictive Check (VPC) and the Normalised Prediction Distribution Errors (NPDE) [1]. The VPC is a graph that displays both the prediction interval provided by the model and the observed concentrations at each time point [5,6,7]. Although several corrected versions of this tool have been proposed to improve its performance, it does not account for the dependence between concentrations measured in the same individual.

The second method involves calculating the NPDE, a tool derived from the Prediction Discrepancy (PD) [8]. The PD of an observation corresponds to its percentile in its own predictive distribution. However, since the concentrations observed in the ith individual are not independent, the distribution of PD cannot be evaluated as soon as there is more than one observation per subject in the study. Brendel et al. therefore proposed decorrelating the PD to make them independent, resulting in the NPDE [9, 10]. However, as noted by the authors themselves, decorrelation implies independence only when the vector of concentrations of each individual follows a Gaussian distribution [11]. A basic theorem of statistics states that the only functions of independent Gaussian random variables that preserve normality are linear (more precisely, affine) functions. As the nonlinear mixed effects (NLME) model is a nonlinear function of Gaussian random variables, the resulting vector of concentrations cannot be Gaussian. Consequently, decorrelation does not imply independence.

In summary, PD and NPDE are very useful for detecting significant deviations from the model when a single concentration, or several concentrations spaced in time, are available for each individual. In all other cases, some concentrations can be expected to be incorrectly classified as outliers, while others will be erroneously considered compatible with the studied model.

In this article, we propose a method for evaluating a model that accounts for the dependence between concentrations measured in the same individual. This method is a free adaptation to our context of the ideas presented in the first chapter of the book by Chacón and Duong [12].

2 Materials and Methods

This section comprises four subsections. The first subsection presents the model and notation that will be used throughout the paper. In the second subsection, we explain the theoretical basis of the proposed method. The third subsection shows the implementation of the method in the field of pharmacology, and the fourth subsection is a simulation study that (1) illustrates the method’s application and (2) evaluates its performance in two aspects: detecting misspecifications in the structural model and identifying individuals with outlier kinetic profiles.

2.1 Model and Notation

Let us consider the following population PK model

$$\left\{ \begin{array}{l} Y_{t} = m\left( t, \varphi, C \right) + g\left( t, \varphi, C, \sigma \right)\varepsilon_{t}, \quad \text{where } \varepsilon_{t} \sim_{\text{iid}} \mathbb{N}\left( 0, 1 \right) \\ \varphi = h\left( \theta, C, \eta \right), \quad \text{where } \eta \sim \mathbb{N}\left( 0, \Omega \right) \end{array} \right.$$
(1)

This model describes the evolution of the concentrations \(\left( {Y_{t} } \right)_{t}\) over time \(t\) in an individual as a function of the individual’s characteristics \(C\) (covariates) and of the known population PK parameters \(\left( {\theta , {\Omega }, \sigma } \right)\).

\(\varphi\) represents the vector of individual PK parameters.

\(m\) represents the function describing the structural model, \(g\) represents the function describing the error model and \(h\) represents the function describing the relation between the individual PK parameters \(\varphi\), the covariates \(C\), the population parameters \(\theta\) and the individual random effects \(\eta\).

Assume now that \(n_{i}\) concentrations \((Y_{i1} , \ldots ,Y_{{i n_{i} }} )\) have been observed at times \((t_{i1} , \ldots ,t_{{i n_{i} }} )\) in the ith individual of a sample of N individuals. The covariate values \((C_{i} )\) of all individuals are also assumed to be known.

2.2 The Theoretical Basis

Consider a random variable \(X \in {\mathcal{R}}^{p}\) with probability density function (pdf) \(f\). For technical reasons that can easily be relaxed, we assume that \(f\) is not constant on any subset of \({\mathcal{R}}^{p}\) of positive volume on which it is positive. For all \(\lambda \ge 0,\) we consider the (upper) level set of all \(x\) such that \(f(x)\) is greater than or equal to \(\lambda\):

$$A_{\lambda } = \left\{ {x \in {\mathcal{R}}^{p} , f\left( x \right) \ge \lambda } \right\}$$

Notice that the family of sets \(\left( {A_{\lambda } } \right)_{ \lambda \ge 0 }\) is decreasing with respect to inclusion; in other words, for all \(0 \le \lambda_{1} \le \lambda_{2}\), \(A_{{\lambda_{2} }} \subseteq A_{{\lambda_{1} }}\).

Let us now consider the function

$$F\left( \lambda \right) = 1 - \int {f\left( x \right)1_{{A_{\lambda } }} \left( x \right){\text{d}}x} = \int {f\left( x \right)1_{{A_{\lambda }^{c} }} \left( x \right){\text{d}}x}$$

where \(A_{\lambda }^{c} = \left\{ {x \in {\mathcal{R}}^{p} , f\left( x \right) < \lambda } \right\}\) and \(1_{{A_{\lambda }^{c} }} \left( x \right) = 1\) when \(x \in A_{\lambda }^{c}\) and \(1_{{A_{\lambda }^{c} }} \left( x \right) = 0\) otherwise.

Using these sets, we can see that for all \(\lambda \ge 0,\)

$$1 - F\left( \lambda \right) = P\left( {X \in A_{\lambda } } \right) = P\left( {f\left( X \right) \ge \lambda } \right)$$
(2)

The sets \(A_{\lambda }\) are, therefore, prediction regions of the random vector \(X.\) By construction, they are even the smallest (in volume) prediction regions of \(X\) that can be built for a given probability. This idea for constructing small prediction regions has already been proposed for multidimensional discrete random variables [13].

Now, let us invert the process of Eq. (2): we fix \(X\) and look for the \(\lambda\) such that \(X \in A_{\lambda } .\) We thus consider the random variable

$$\lambda \left( X \right) = {\text{sup }}\left\{ {\lambda \ge 0 \;{\text{so }}\;{\text{that}}\;{ }f\left( X \right) \ge \lambda } \right\}$$
(3)

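For any \(\lambda_{1} \ge 0,\) note that \(\lambda \left( X \right) = f\left( X \right)\) and that, under the assumption on \(f\), \(P\left( {f\left( X \right) = \lambda_{1} } \right) = 0\). Hence \(P\left( {\lambda \left( X \right) > \lambda_{1} } \right) = P\left( {f\left( X \right) > \lambda_{1} } \right) = P\left( {f\left( X \right) \ge \lambda_{1} } \right) = 1 - F\left( {\lambda_{1} } \right),\) so the cumulative distribution function of the univariate random variable \(\lambda \left( X \right)\) is \(F.\) Since \(F\) is continuous, the probability integral transform yields the following property: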

Property. Let X be a multidimensional random variable whose pdf \(f\) satisfies the assumption above, and let \(F\left( \lambda \right)\) and \(\lambda \left( X \right)\) be defined as in Eqs. (2) and (3), respectively. Then \(P\left( X \right) \triangleq F\left( {\lambda \left( X \right)} \right)\) is uniformly distributed on ]0,1[.

A notable feature of \(P\left( X \right) = F\left( {\lambda \left( X \right)} \right)\) is that its distribution does not depend on the dimension of \(X\).
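As a quick numerical sanity check of the property, the sketch below (Python with NumPy and SciPy, our choice; the authors worked in R) draws X from Gaussian distributions in dimensions 1, 3 and 6, where \(f\) is known exactly. It estimates \(F(\lambda(x))\) by the share of independent reference draws whose density does not exceed \(f(x)\) and checks uniformity with a Kolmogorov-Smirnov test. For a Gaussian, \(f\) is a decreasing function of the squared Mahalanobis distance \(d^2(x)\), so \(P(x) = P(\chi^2_p \ge d^2(x))\), which the sketch uses as a cross-check. The covariances, sample sizes, seed and the name `med_known_pdf` are arbitrary illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

def med_known_pdf(dist, x, n_ref=20_000):
    """Monte Carlo estimate of P(x) = F(lambda(x)), with lambda(x) = f(x).

    F is estimated by the empirical distribution of log f over n_ref
    independent reference draws of X.
    """
    ref = np.sort(dist.logpdf(dist.rvs(size=n_ref, random_state=rng)))
    # share of reference draws whose density is <= f(x)
    return np.searchsorted(ref, dist.logpdf(x), side="right") / n_ref

ks_pvalues, max_abs_err = {}, {}
for p in (1, 3, 6):
    a = rng.normal(size=(p, p))
    cov = a @ a.T + p * np.eye(p)                 # arbitrary positive-definite covariance
    dist = stats.multivariate_normal(mean=np.zeros(p), cov=cov)
    x = np.asarray(dist.rvs(size=500, random_state=rng)).reshape(500, p)
    p_x = med_known_pdf(dist, x)
    # Gaussian cross-check: P(x) = P(chi2_p >= squared Mahalanobis distance of x)
    d2 = np.einsum("ij,ij->i", x, np.linalg.solve(cov, x.T).T)
    max_abs_err[p] = float(np.max(np.abs(p_x - stats.chi2.sf(d2, df=p))))
    ks_pvalues[p] = stats.kstest(p_x, "uniform").pvalue
    print(f"dim {p}: KS p-value {ks_pvalues[p]:.3f}, max |MC - exact| {max_abs_err[p]:.4f}")
```

The same uniform behaviour is expected in every dimension, which is the point of the remark above.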

2.3 Application to Population Pharmacokinetic Model

This theoretical property can be applied to PK models. Returning to our notations, we consider N vectors, \(y_{1} , \ldots ,y_{N}\), containing, respectively, the concentrations observed in N independent individuals. An unknown model, denoted \(M_{0}\), generated the observed concentrations.

We want to test whether the observed concentrations can be considered to come from the population PK model described by Eq. (1). Because this model depends on the unknown parameter \(\tau = \left( {\theta ,{\Omega },\sigma } \right)\), we denote it \(M_{\tau } .\) Let \(\hat{\tau }\) denote the estimate of \(\tau\) obtained from the observed concentrations \(y_{1} , \ldots ,y_{N} .\) Conditionally on \(\hat{\tau }\), i.e. assuming that the actual value of \(\tau\) is \(\hat{\tau }\), our initial question, “is the model \(M_{\tau }\) valid?”, translates into “is \(M_{{\hat{\tau }}}\) equal to \(M_{0}\)?”.

To answer this question, we first compute, for each individual, the probability \(P\left( {y_{i} } \right)\) associated with the observed concentration vector \(y_{i}\).

For the ith individual, using the population model of Eq. (1), we simulate vectors of concentrations \((Y_{{it_{1} }} , \ldots ,Y_{{it_{n} }} )\) at the time points at which the individual was sampled. We denote by \(f\) the pdf of the joint distribution of this concentration vector.

We need to compute:

  1. the function \(F\left( \lambda \right) = P\left( {f\left( {Y_{{it_{1} }} , \ldots ,Y_{{it_{n} }} } \right) \le \lambda } \right)\);

  2. \(\lambda \left( {y_{i} } \right)\), where \(y_{i} = \left( {y_{{i1}} , \ldots ,y_{{in}} } \right)\) is the vector of concentrations observed in the ith individual; \(y_{i}\) should not be confused with the theoretical concentrations \((Y_{{it_{1} }} , \ldots ,Y_{{it_{n} }} )\) simulated with the model (Eq. 1). A large value of \(\lambda \left( {y_{i} } \right)\) indicates that the observed kinetic profile lies in the “heart” of the distribution of kinetic profiles given by the model. Conversely, a low value of \(\lambda \left( {y_{i} } \right)\) indicates that the observed kinetic profile is far from those expected under the model;

  3. the multivariate exact discrepancy (MED), defined as \(P\left( {y_{i} } \right) = F\left( {\lambda \left( {y_{i} } \right)} \right)\), which, when multiplied by 100, gives the percentage of kinetic profiles generated by the model that are more “extreme” than the observed profile. For example, if \(F\left( {\lambda \left( {y_{i} } \right)} \right) = 1 \times 10^{ - 6}\), only one individual in a million has a kinetic profile under the model that is more extreme than the one observed in our individual; in other words, the model is inadequate for describing the kinetics of this individual.

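Because \(f\) is the pdf of a marginal distribution (obtained by integrating the distribution of the concentrations over the random effects \(\eta\)), it generally has no closed form and is difficult to compute directly. Instead, we propose to approximate \(f\) through simulations as follows: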

  (a) Simulate K vectors \(\left( {Y_{{it_{1} }}^{\left( k \right)} , \ldots ,Y_{{it_{n} }}^{\left( k \right)} } \right)_{k = 1, \ldots ,K}\) using the model of Eq. (1).

  (b) Compute an approximation \(\tilde{f}\) of \(f\) using a kernel estimator based on the K vectors simulated at step (a). We computed \(\tilde{f}\) with the function kde of the R package ks.

  (c) For a given \(\lambda\), an approximation \(\tilde{F}\left( \lambda \right)\) of \(F\left( \lambda \right)\) is given by the proportion of values \(\tilde{f}\left( {Y_{{it_{1} }}^{\left( k \right)} , \ldots ,Y_{{it_{n} }}^{\left( k \right)} } \right)\) smaller than or equal to \(\lambda\); in other words,

    $$\tilde{F}\left( \lambda \right) = \frac{1}{K}\mathop \sum \limits_{k = 1}^{K} 1_{{\left[ {{ }\tilde{f}\left( {Y_{{it_{1} }}^{\left( k \right)} , \ldots ,Y_{{it_{n} }}^{\left( k \right)} } \right) \le \lambda } \right]}}$$
    (4)
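Steps (a)-(c) can be sketched as follows. This is a hedged illustration, not the authors' implementation: it uses `scipy.stats.gaussian_kde` (Scott's rule bandwidth) in place of `ks::kde`, whose default bandwidth selector differs. It also departs from Eq. (4) in one respect: \(\tilde{F}\) is estimated on a second, independent set of simulated vectors rather than on the vectors used to build \(\tilde{f}\), so that each vector's own kernel does not inflate its estimated density. The function name `med_kde` is ours. The toy check uses a bivariate Gaussian "model", for which the exact MED of a point at squared Mahalanobis distance 1 is \(e^{-1/2} \approx 0.61\).

```python
import numpy as np
from scipy import stats

def med_kde(y_obs, sims_fit, sims_eval):
    """Simulation-based MED, following steps (a)-(c) with a held-out set for F~.

    y_obs     : (n,) observed concentration vector of one individual
    sims_fit  : (K, n) vectors simulated with the model, used to build f~
    sims_eval : (K2, n) independent simulated vectors, used to estimate F~
    """
    f_tilde = stats.gaussian_kde(np.asarray(sims_fit, float).T)     # expects shape (n, K)
    lam_obs = f_tilde(np.asarray(y_obs, float).reshape(-1, 1))[0]   # lambda(y) = f~(y)
    lam_sim = f_tilde(np.asarray(sims_eval, float).T)
    return float(np.mean(lam_sim <= lam_obs))                       # F~(lambda(y))

# Toy check with a correlated bivariate Gaussian standing in for the PK model
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
chol = np.linalg.cholesky(cov)
sims_fit = rng.multivariate_normal([0.0, 0.0], cov, size=20_000)
sims_eval = rng.multivariate_normal([0.0, 0.0], cov, size=5_000)

y_mid = chol @ np.array([1.0, 0.0])   # squared Mahalanobis distance 1
y_far = chol @ np.array([8.0, 0.0])   # squared Mahalanobis distance 64
med_mid = med_kde(y_mid, sims_fit, sims_eval)
med_far = med_kde(y_far, sims_fit, sims_eval)
print(med_mid, np.exp(-0.5), med_far)
```

The estimate for `y_mid` should be close to 0.61 (up to kernel and Monte Carlo noise), while `y_far` lies beyond every simulated vector and gets an estimated MED of 0.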

The quality of the approximation of \(f\) and \(F\left(\lambda \right)\) depends solely on the number of simulated vectors K: the greater the number of concentrations per individual, the larger K must be. We currently have no guidelines for choosing K.

A summary of the MED calculation for one patient is shown in Fig. 1. For ease of understanding, Fig. 1 shows the MED calculation when only one concentration is available for the patient.

Fig. 1

Schematic representation of the multivariate exact discrepancy (MED) computation in one dimension. Panel a (left) shows the notation used to compute the MED. The x-axis represents the concentrations that can be observed with the tested model at a given time with fixed values of the covariates. The y-axis represents the probability density function f(x) of these concentrations. The red crosses on the x-axis are concentrations simulated with the model. A large value of f(x) indicates a high density of (simulated) concentrations. For a given value of \(\lambda\), we can compute \({A}_{\lambda }\), the set of concentrations whose density is higher than \(\lambda\). From \({A}_{\lambda }\), we can calculate \(1-F\left(\lambda \right)=P\left(X\in {A}_{\lambda }\right)\), which can be seen as the percentage of simulated concentrations with a density higher than \(\lambda\), and \(F\left(\lambda \right)\), the percentage of simulated concentrations with a density lower than \(\lambda\). These sets are then used to define the MED. Panel b (right) illustrates how to compute the MED for a patient for whom one concentration (\({C}_{\mathrm{obs}}\)), represented on the x-axis, has been measured. The y-axis still represents the pdf f(x). We can calculate the set \({A}_{{\lambda }_{{C}_{\mathrm{obs}}}}\) for which \({C}_{\mathrm{obs}}\) lies on the boundary, and the corresponding \(\lambda\), denoted \({\lambda }_{{C}_{\mathrm{obs}}}\). The MED is defined as the probability that a concentration does not belong to \({A}_{{\lambda }_{{C}_{\mathrm{obs}}}}\), which can be seen as the percentage of simulated concentrations lying in a region where \(f\left(x\right)\le {\lambda }_{{C}_{\mathrm{obs}}}\). If the MED is large, the observed concentration lies in a region of high density and is consistent with the model. Conversely, if the MED is low, the probability of observing this concentration under the tested model is low.

Repeating this procedure for each individual yields one probability per individual, \(P\left( {y_{1} } \right), \ldots ,P\left( {y_{N} } \right)\).

Returning to the question of whether \(M_{{\hat{\tau }}}\) describes the data well: if \(M_{{\hat{\tau }}}\) = \({\text{M}}_{0}\), the \(P\left( {y_{i} } \right)\) values should be uniformly distributed on [0, 1]. Therefore, if the distribution of the \(P\left( {y_{i} } \right)\) values is not uniform, \(M_{{\hat{\tau }}} \ne {\text{M}}_{0}\) and the model \(M_{{\hat{\tau }}}\) is invalidated.

It remains to check whether the distribution of the \(P\left( {y_{i} } \right)\) values is uniform on [0, 1]. Any test comparing a sample distribution to a reference distribution (here, the uniform distribution on ]0, 1[) can be used. Tests such as the Kolmogorov-Smirnov test, the Anderson-Darling test or chi-square tests are appropriate, each with its own advantages and drawbacks. In the remainder of this article, we use the Kolmogorov-Smirnov test without discussing this choice.
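For illustration, the uniformity check can be run with SciPy's `kstest`; `cramervonmises` is shown alongside only as one example of an alternative goodness-of-fit test (it is not used in the article). The two MED samples below are synthetic stand-ins (uniform draws, and Beta(0.5, 1) draws that over-represent small MEDs), not results from a PK model, and the function name is ours.

```python
import numpy as np
from scipy import stats

def uniformity_pvalues(meds):
    """p-values of two goodness-of-fit tests of H0: MED ~ U(0, 1)."""
    meds = np.asarray(meds, float)
    return {
        "kolmogorov_smirnov": stats.kstest(meds, "uniform").pvalue,
        "cramer_von_mises": stats.cramervonmises(meds, "uniform").pvalue,
    }

rng = np.random.default_rng(1)
meds_ok = rng.uniform(size=200)           # what MEDs look like under the true model
meds_bad = rng.beta(0.5, 1.0, size=200)   # too many small MEDs: model should be rejected
p_ok = uniformity_pvalues(meds_ok)
p_bad = uniformity_pvalues(meds_bad)
print(p_ok)
print(p_bad)
```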

To summarise the interpretation of the MED: MED values computed under the true model are expected to follow a uniform distribution on [0, 1]. Hence, for a profile generated by the true model, the probability of observing a MED below α (e.g., α = 5%) is exactly α. In this respect, the MED can be interpreted as a p-value.

A major advantage of this method is its ability to identify outliers. Indeed, an individual’s probability indicates whether or not the individual’s kinetics are consistent with the model. To use it for this purpose, a probability threshold must be defined: if an individual’s probability falls below this threshold, the individual is identified as an outlier. When an individual is considered an outlier, their entire PK profile is rejected, not just certain concentrations. Detecting outliers can be useful both when building a model and when performing therapeutic drug monitoring.

In statistics, an outlier is usually defined as a value that has a small probability of coming from the studied model. Similarly, we classify an individual as an outlier if their kinetic profile has a small probability of being generated by the population PK model under investigation. The search for outlier kinetics is quite distinct from the search for outlier concentrations, which involves comparing each concentration to a chosen model without considering the other concentrations obtained in the same individual.

Consider N vectors, \(y_{1} , \ldots ,y_{N}\), containing the concentrations observed in N independent individuals. These vectors do not necessarily have the same size, because the number of concentrations per individual may vary, and the concentrations may not have been obtained at the same times post-administration. Our goal is to identify individuals whose kinetic profile can be considered an outlier with respect to the model described by Eq. (1). This model involves the population parameters \(\hat{\tau }\) estimated from the observed concentrations \(y_{1} , \ldots ,y_{N} .\) We assume that \(\hat{\tau }\) is the true value of \(\tau\) and is known without any imprecision. For all individuals, we can compute \(P\left( {y_{1} } \right), \ldots ,P\left( {y_{N} } \right)\) using the method described above. When N is large, very low and very large values of \(P\left( {y_{i} } \right)\) can be expected even if none of the kinetics are outliers. The threshold below which the \(P\left( {y_{i} } \right)\) values can be considered abnormal can therefore be derived from the distribution of their minimum.

Let us consider \(LL_{\alpha }\) the limit defined by

$$P\left( {\min \left\{ {P\left( {y_{1} } \right), \ldots ,P\left( {y_{N} } \right)} \right\} \le LL_{\alpha } } \right) = \alpha$$
(5a)

This limit has the following meaning: when all N kinetics \(y_{i}\) come from the model (Eq. 1), only 100 \(\times \alpha\)% of samples of N individuals should contain a kinetic profile with \(P\left( {y_{i} } \right) \le LL_{\alpha }\).

In other words, if \(\alpha\) is chosen small enough (for example, \(\alpha = 0.05\)), a kinetic profile \(y_{i}\) with \(P\left( {y_{i} } \right) \le LL_{\alpha }\) should be considered an outlier.

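From the property, we know that when \(y_{1} , \ldots ,y_{N}\) are drawn from the model described by Eq. (1), then \(P\left( {y_{i} } \right)\sim_{iid} U\left( {\left[ {0,1} \right]} \right)\). Hence, for \(x \in \left[ {0,1} \right]\), \(P\left( {\min \left\{ {P\left( {y_{1} } \right), \ldots ,P\left( {y_{N} } \right)} \right\} \le x} \right) = 1 - \left( {1 - x} \right)^{N}\), and solving Eq. (5a) gives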

$$LL_{\alpha } = 1 - \left( {1 - \alpha } \right)^{1/N}$$
(5b)
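Eq. (5b) is easy to evaluate and to check by simulation. In the sketch below (the function name `ll_alpha` is ours), the Monte Carlo part draws minima of N = 200 independent U(0, 1) variables, which is what the MEDs of N non-outlying individuals reduce to under the property; the share of minima below \(LL_{\alpha}\) should be close to α. For small α, \(LL_{\alpha}\) is close to the Bonferroni-like value α/N, e.g. about 5.1 × 10⁻⁴ for N = 100 and α = 0.05.

```python
import numpy as np

def ll_alpha(n_individuals, alpha=0.05):
    """Eq. (5b): LL such that P(min of N iid U(0, 1) <= LL) = alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_individuals)

for n in (100, 200, 10_000):
    print(n, ll_alpha(n), 0.05 / n)   # exact threshold vs alpha / N

# Monte Carlo check of Eq. (5a) with N = 200 individuals drawn from the model
rng = np.random.default_rng(7)
minima = rng.uniform(size=(20_000, 200)).min(axis=1)
share_below = np.mean(minima <= ll_alpha(200))
print(share_below)   # close to alpha = 0.05
```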

2.4 Simulation Study

The objective of the simulation section is (1) to illustrate how the method works using a very simple example and (2) to evaluate the performance of the method in detecting a problem in the model through two examples.

All the population PK models used in this article have arbitrarily chosen population parameters and variances, selected to provide a clear illustration of the method.

2.4.1 Illustrative Example

This simplified example demonstrates the practical application of the method in the context of therapeutic drug monitoring (TDM). In this scenario, two concentrations have been measured in a patient, and our objective is to identify compatible models.

Assume that a dose D = 100 was administered intravenously to an individual at time 0, and that two concentrations \(\left( {y_{1} , y_{2} } \right) = \left( {1.22, 0.011} \right)\) were observed in this individual at times \(t_{1} = 39\) and \(t_{2} = 99\), respectively.

We would like to determine whether the following published model adequately describes the individual’s kinetics.

$$\left\{ {\begin{array}{*{20}c} {\begin{array}{*{20}c} { Y_{t} } & = & {\frac{D}{V}e^{{ - \frac{cl}{V}t}} \left( {1 + 0.1\varepsilon_{t} } \right){ }\;{\text{where}} \;\varepsilon_{t} \sim_{{{\text{iid}}}} {\mathbb{N}}\left( {0,1} \right)} \\ {\left( {\begin{array}{*{20}c} {cl} \\ V \\ \end{array} } \right)} & = & {\left( {\begin{array}{*{20}c} {0.1{\text{exp}}\left( {\eta_{cl} } \right)} \\ {1{\text{exp}}\left( {\eta_{V} } \right)} \\ \end{array} } \right), \;{\text{where}}\;\left( {\begin{array}{*{20}c} {\eta_{cl} } \\ {\eta_{V} } \\ \end{array} } \right)\sim {\mathbb{N}}\left( {0,\left( {\begin{array}{*{20}c} {0.2^{2} } & 0 \\ 0 & {0.1^{2} } \\ \end{array} } \right)} \right)} \\ \end{array} } \\ \end{array} } \right.$$
(6)

The model described in Eq. (6) has the same structure as that of Eq. (1), with individual PK parameters \(\varphi = \left( {\begin{array}{*{20}c} {cl} \\ V \\ \end{array} } \right)\) and population parameters \(\theta = \left( {\begin{array}{*{20}c} {0.1} \\ 1 \\ \end{array} } \right)\). There is no covariate, so no covariate matrix \(C\) is needed; \(m\left( {t, \varphi } \right) = \frac{D}{V}e^{{ - \frac{cl}{V}t}}\), \(g\left( {t, \varphi ,\sigma } \right) = 0.1 \times \frac{D}{V}e^{{ - \frac{cl}{V}t}}\) (a proportional error with \(\sigma = 0.1\)), \(h\left( {\theta , \eta } \right) = \left( {\begin{array}{*{20}c} {0.1{\text{exp}}\left( {\eta_{cl} } \right)} \\ {{\text{exp}}\left( {\eta_{V} } \right)} \\ \end{array} } \right)\) and \({\Omega } = \left( {\begin{array}{*{20}c} {0.2^{2} } & 0 \\ 0 & {0.1^{2} } \\ \end{array} } \right).\)

We simulated K = 100,000 vectors \((Y_{39} , Y_{99} )\) using the model of Eq. (6).

The kernel density estimate was computed from these simulations with the R package ks.

The individual’s probability was calculated using Eq. (4). For comparison, we analysed the results provided by the VPC and the MED graphically.
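The illustrative example can be reproduced approximately with the sketch below (Python, NumPy and SciPy; not the authors' R code). Following Fig. 4, it works with log-concentrations. To keep the run short it uses 5,000 simulated vectors to build \(\tilde{f}\) and another 5,000 independent vectors to estimate \(\tilde{F}\), instead of the K = 100,000 used in the article, and `gaussian_kde` stands in for `ks::kde`; both are assumptions of the sketch. The helper names `simulate_eq6` and `med` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
D, TIMES = 100.0, np.array([39.0, 99.0])

def simulate_eq6(k):
    """Draw k concentration vectors (Y_39, Y_99) from the model of Eq. (6)."""
    eta = rng.normal(0.0, [0.2, 0.1], size=(k, 2))            # eta_cl, eta_V
    cl, v = 0.1 * np.exp(eta[:, [0]]), 1.0 * np.exp(eta[:, [1]])
    pred = D / v * np.exp(-cl / v * TIMES)
    # 1 + 0.1*eps <= 0 requires eps <= -10, which is negligible here
    return pred * (1.0 + 0.1 * rng.normal(size=pred.shape))

# Work on the log scale, as in Fig. 4
f_tilde = stats.gaussian_kde(np.log(simulate_eq6(5_000)).T)    # step (b)
lam_eval = f_tilde(np.log(simulate_eq6(5_000)).T)              # independent draws for F~

def med(y_obs):
    """Estimated MED: share of simulated vectors with f~ <= f~(log y_obs)."""
    lam_obs = f_tilde(np.log(np.asarray(y_obs, float)).reshape(-1, 1))[0]
    return float(np.mean(lam_eval <= lam_obs))

med_obs = med([1.22, 0.011])                 # observed individual
med_typ = med(D * np.exp(-0.1 * TIMES))      # typical profile (eta = 0, eps = 0)
print(med_obs, med_typ)
```

With these settings, the observed profile (1.22, 0.011) should receive an estimated MED of 0, since no simulated vector has a lower estimated density, whereas the typical profile receives a large MED.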

2.4.2 Evaluation of a Population Pharmacokinetic Model

This first example evaluated the method’s ability to detect a misspecification in the structural model. In some published population PK studies, the model used to describe individual kinetics is a one-compartment model, and the residual error can be large: for instance, some models with a combined or proportional residual error model show a coefficient of variation of the residual error much greater than that of the analytical method used.

The following simulations evaluate the performance of the proposed method in detecting a model that is too simplistic for the data. The data are assumed to come from a two-compartment model (described by Eq. 7 below) but are analysed using a one-compartment model (Eq. 8).

The following 2-compartment model was used to generate L = 200 sets of N = 200 kinetics, each of these including 2 time points.

$$\left\{ {\begin{array}{*{20}l} {y_{ij}^{\left( l \right)} = \left( {A_{i}^{\left( l \right)} e^{{ - \alpha_{i}^{\left( l \right)} t_{ij}^{\left( l \right)} }} + B_{i}^{\left( l \right)} e^{{ - \beta_{i}^{\left( l \right)} t_{ij}^{\left( l \right)} }} } \right) \times \left( {1 + 0.2\varepsilon_{ij}^{\left( l \right)} } \right){ }\;{\text{where}}\;{ }\varepsilon_{ij}^{\left( l \right)} \sim_{{{\text{iid}}}} {\mathbb{N}}\left( {0,1} \right)} \hfill \\ {\log \left( {A_{i}^{\left( l \right)} ,\alpha_{i}^{\left( l \right)} ,B_{i}^{\left( l \right)} ,\beta_{i}^{\left( l \right)} } \right)\sim_{{{\text{iid}}}} {\mathbb{N}}\left( {\left( {\ln \left( {100} \right),\ln \left( {0.1} \right),\ln \left( {10} \right),\ln \left( \beta \right)} \right),0.1^{2} I_{4} } \right),\,\,l = 1, \ldots ,L,\;i = 1, \ldots ,N,\;j = 1,2} \hfill \\ \end{array} } \right.$$
(7)

where iid means independent and identically distributed and \(I_{4}\) denotes the 4 × 4 identity matrix.

For each value of \(\beta\) in the interval [0.01, 0.1], in increments of 0.01, 200 sets of 200 kinetics were simulated. The resulting kinetic profiles are shown in Fig. 2.

Fig. 2

This figure represents the 5%, 50% (median) and 95% percentiles of concentrations (on a logarithmic scale) simulated with a 2-compartment model (Eq. 7) for several values of \(\beta\) (slow or terminal decay slope) ranging from 0.01 to 0.1. The curves obtained with \(\beta =0.1\) correspond to a one-compartment model (straight line). The curves obtained for \(\beta =0.06\) to \(\beta =0.09\) are almost indistinguishable from those obtained with \(\beta =0.1\).

For all simulated individuals, the first sampling time \({t}_{i1}^{(l)}\) was drawn from a uniform distribution on [0, 125] and the second sampling time was set to \({t}_{i1}^{(l)}+125.\) This choice of sampling times ensures that the two time points of a kinetic profile are sufficiently spaced. Next, for each value of \(\beta\), the closest one-compartment model was sought by fitting the following model to several datasets generated with Eq. (7):
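The data-generating step of Eq. (7), including the sampling design just described, might be sketched as below. `simulate_eq7` is a hypothetical helper; the log-normal parameters, the 20% proportional error and the 125-unit spacing come from the text, and the function returns the sampling times and concentrations of one dataset of N profiles. Fitting Eq. (8) in Monolix is not reproduced.

```python
import numpy as np

def simulate_eq7(beta, n_ind=200, rng=None):
    """One dataset of Eq. (7): n_ind two-point profiles from a 2-compartment model."""
    rng = np.random.default_rng() if rng is None else rng
    log_typical = np.log([100.0, 0.1, 10.0, beta])            # (A, alpha, B, beta)
    a, alpha, b, bet = np.exp(log_typical + 0.1 * rng.normal(size=(n_ind, 4))).T
    t1 = rng.uniform(0.0, 125.0, size=n_ind)                  # first sampling time
    t = np.column_stack([t1, t1 + 125.0])                     # second time, 125 later
    pred = (a[:, None] * np.exp(-alpha[:, None] * t)
            + b[:, None] * np.exp(-bet[:, None] * t))
    y = pred * (1.0 + 0.2 * rng.normal(size=t.shape))         # proportional error
    return t, y

betas = np.round(0.01 * np.arange(1, 11), 2)   # 0.01, 0.02, ..., 0.10
rng = np.random.default_rng(3)
t, y = simulate_eq7(0.05, rng=rng)
print(betas, t.shape, y.shape)
```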

$$\left\{ {\begin{array}{*{20}c} {\begin{array}{*{20}l} {Y_{ij} = A_{i} e^{{ - \alpha_{i} u_{ij} }} \times \left( {1 + \sigma \varepsilon_{ij} } \right){ }\;{\text{where }}\;\varepsilon_{ij} \sim_{{{\text{iid}}}} {\mathbb{N}}\left( {0,1} \right)} \hfill \\ \hfill \\ {\log \left( {A_{i} ,\alpha_{i} } \right)\sim_{{{\text{iid}}}} {\mathbb{N}}\left( {\left( {\ln \left( {\theta_{A} } \right),\ln \left( {\theta_{\alpha } } \right)} \right),\left( {\begin{array}{*{20}c} {\omega_{A}^{2} } & 0 \\ 0 & {\omega_{\alpha }^{2} } \\ \end{array} } \right)} \right)} \hfill \\ \end{array} } \\ \end{array} } \right.$$
(8)

We fitted the models of Eq. (8) using Monolix. To define the best one-compartment models (i.e., the models closest to the model of Eq. 7), we used 1500 individuals with 10 observations per individual. The times \(\left( {u_{ij} } \right)\) of all individuals were drawn uniformly from the interval [0, 250].

The model described by Eq. (8) was used to simulate K = 10,000 vectors at the sampling times \(\left( {t_{i1}^{\left( l \right)} ,t_{i2}^{\left( l \right)} } \right)\) of each individual to compute \(P_{i}^{\left( l \right)} = \tilde{F}_{i}^{\left( l \right)} \left( {\lambda \left( {y_{i1}^{\left( l \right)} ,y_{i2}^{\left( l \right)} } \right)} \right)\) as described by Eq. (4). For each set (each \(l = 1, \ldots ,L\)), the distribution of the \(\left( {P_{i}^{\left( l \right)} } \right)_{i = 1, \ldots ,N }\) was compared to a uniform distribution on [0, 1] using a Kolmogorov-Smirnov test.

The 200 sets built with \(\beta\) = 0.1 (i.e., the true model) allow us to evaluate the type I error, as the percentage of the 200 sets for which the hypothesis of a uniform distribution of the \(\left( {P_{i}^{\left( l \right)} } \right)_{i = 1, \ldots ,N }\) was rejected. Similarly, the power to detect the wrong model (\(\beta \ne 0.1\)) was evaluated as the percentage of Kolmogorov-Smirnov p-values below 0.05, indicating rejection of the uniformity hypothesis on the distribution of the \(\left( {P_{i}^{\left( l \right)} } \right)_{i = 1, \ldots ,N }\).
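Given the MEDs of each of the L datasets, the type I error and the power are both rejection rates of the uniformity test. A minimal sketch follows; the function name is ours and the MED sets are synthetic stand-ins (uniform draws mimic a correct model, Beta(0.5, 1) draws a misspecified one), not outputs of the models of Eqs. (7) and (8).

```python
import numpy as np
from scipy import stats

def rejection_rate(med_sets, level=0.05):
    """Share of datasets whose MEDs fail a KS test of uniformity on [0, 1].

    Under the tested (true) model this estimates the type I error;
    under a misspecified model it estimates the power.
    """
    pvalues = np.array([stats.kstest(meds, "uniform").pvalue for meds in med_sets])
    return float(np.mean(pvalues < level))

rng = np.random.default_rng(11)
L, N = 200, 200
null_sets = rng.uniform(size=(L, N))          # stand-in: MEDs from the true model
alt_sets = rng.beta(0.5, 1.0, size=(L, N))    # stand-in: MEDs from a wrong model
type_i_error = rejection_rate(null_sets)
power = rejection_rate(alt_sets)
print(type_i_error, power)
```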

We compared the results obtained with MED to those obtained with NPDE, used as the reference method. However, as we were only interested in comparing the power of the tests, we used the Prediction Distribution Errors (PDE), a non-standardised variable, instead of the NPDE. PDE were constructed in the same way as the MED (10,000 concentration vectors simulated for each individual) and tested for uniformity using a Kolmogorov-Smirnov test. As for the MED, we evaluated the type I error and the power of the PDE.

The powers of the tests based on our method and on PDE were compared graphically.

2.4.3 Detection of Individuals with Outlier Kinetics

The second example evaluates the performance of the method in detecting individuals with outlier kinetics. The aim of this example is to evaluate the ability of the method to reject a model based on the percentage of outliers.

Let us consider the following population PK model

$$\left\{ {\begin{array}{*{20}l} {Y_{ij} \; = \;\frac{{100Ka_{i} }}{{V_{i} \left( {Ka_{i} - Cl_{i} /V_{i} } \right)}}\left( {e^{{ - \frac{{Cl_{i} }}{{V_{i} }}t_{j} }} - e^{{ - Ka_{i} t_{j} }} } \right) \times \left( {1 + 0.1 \times \varepsilon_{ij} } \right)\;\;{\text{ where}} \;\;\varepsilon_{ij} \sim_{{{\text{iid}}}} {\mathbb{N}}\left( {0,1} \right)} \hfill \\ {\left( {\begin{array}{*{20}c} {Ka_{i} } \\ {V_{i} } \\ {Cl_{i} } \\ \end{array} } \right)\; = \;\left( {\begin{array}{*{20}c} {{\text{exp}}\left( {\eta_{i}^{Ka} } \right)} \\ {10{\text{ exp}}\left( {\eta_{i}^{V} } \right)} \\ {\theta_{Cl} {\text{exp}}\left( {\eta_{i}^{Cl} } \right)} \\ \end{array} } \right),\;\; {\text{where}}\left( {\begin{array}{*{20}c} {\eta_{i}^{Ka} } \\ {\eta_{i}^{V} } \\ {\eta_{i}^{Cl} } \\ \end{array} } \right)\sim_{{{\text{iid}}}} {\mathbb{N}}\left( {0,\left( {\begin{array}{*{20}c} {0.1^{2} } & {} & {} \\ 0 & {0.1^{2} } & {} \\ 0 & 0 & {0.1^{2} } \\ \end{array} } \right)} \right)} \hfill \\ \end{array} } \right.$$
(9)

This model, with \(\theta_{{{\text{cl}}}} = 2\) L/h, serves as the reference model. Therefore, kinetics simulated with this model (\(\theta_{{{\text{cl}}}} = 2\) L/h) were used to compute \(\tilde{f}\) as previously described.

Next, values of \(\theta_{cl}\) ranging over \(\left[ {1;3} \right]\) in steps of 0.1 were used to generate outlier kinetics. Three sets of arbitrarily fixed observation times were used: the first contained 2 times (\(t_{1} = 3\) and \(t_{2} = 33\)), the second 3 times (\(t_{1} = 3\), \(t_{2} = 16\) and \(t_{3} = 33\)), and the third 4 times (\(t_{1} = 3\), \(t_{2} = 10\), \(t_{3} = 20\) and \(t_{4} = 33\)).

For each simulated outlier kinetic profile \(y\), we computed the corresponding \(P\left( y \right)\). Profiles \(y\) for which \(P\left( y \right) \le LL_{0.05}\) were identified as outliers. We computed the percentage of detected outlier kinetics (also called power) for N = 100, N = 200 and N = 10,000.
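The outlier-detection experiment might be sketched as follows for the four-time design, with N = 100. This is a hedged, reduced-size illustration (4,000 simulated profiles for \(\tilde{f}\) and an independent 4,000 for \(\tilde{F}\), 200 test profiles per \(\theta_{cl}\), log-concentrations, `gaussian_kde` instead of `ks::kde`); `simulate_eq9` and `detection_rate` are hypothetical names, and the printed rates are not the article's results. Profiles simulated with \(\theta_{cl} = 2\) (the reference model) should be flagged only rarely, at a rate of the order of \(LL_{0.05}\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
TIMES = np.array([3.0, 10.0, 20.0, 33.0])   # third sampling design of the text

def simulate_eq9(k, theta_cl):
    """k concentration profiles at TIMES from Eq. (9) (dose 100, first-order absorption)."""
    eta = rng.normal(0.0, 0.1, size=(k, 3))                   # eta_Ka, eta_V, eta_Cl
    ka = np.exp(eta[:, [0]])
    v = 10.0 * np.exp(eta[:, [1]])
    ke = theta_cl * np.exp(eta[:, [2]]) / v                   # Cl / V
    pred = 100.0 * ka / (v * (ka - ke)) * (np.exp(-ke * TIMES) - np.exp(-ka * TIMES))
    return pred * (1.0 + 0.1 * rng.normal(size=pred.shape))   # proportional error

# Reference model (theta_Cl = 2): one sample builds f~, an independent one estimates F~
f_tilde = stats.gaussian_kde(np.log(simulate_eq9(4_000, 2.0)).T)
lam_ref = np.sort(f_tilde(np.log(simulate_eq9(4_000, 2.0)).T))

def detection_rate(theta_cl, n_individuals=100, n_profiles=200, alpha=0.05):
    """Share of profiles simulated with theta_cl whose MED is <= LL_alpha (Eq. 5b)."""
    ll = 1.0 - (1.0 - alpha) ** (1.0 / n_individuals)
    lam = f_tilde(np.log(simulate_eq9(n_profiles, theta_cl)).T)
    med = np.searchsorted(lam_ref, lam, side="right") / lam_ref.size   # F~(lambda(y))
    return float(np.mean(med <= ll))

for theta in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(theta, detection_rate(theta))
```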

As no other method is available to detect outlier kinetic profiles (as opposed to outlier concentrations), we did not compare our tool with any other method.

3 Results

3.1 Illustrative Example

The percentiles given by the model described by Eq. (6) are plotted in Fig. 3, together with the two observed concentrations (in red). At first sight, this plot suggests that these two concentrations plausibly come from the model described in Eq. (6).

Fig. 3

The blue lines represent the 5%, 50% (median) and 95% percentiles of concentrations (on a logarithmic scale) simulated with a 1-compartment model (Eq. 6). The two red points represent the concentrations observed in a patient and the black points the variability expected around these points. Because both points fall within the 90% prediction interval, the usual analysis of this figure is misleading: it suggests that the two concentrations observed in the patient are consistent with the model. However, taken jointly, the observed concentrations form a kinetic profile that is not compatible with the 1-compartment model described in Eq. (6).

However, in a logarithmic scale, the log-concentrations of a given kinetics are grouped around a straight line, and the plot of percentiles roughly shows the limits of the straight lines that can be encountered. It is also clear that the “beginning” of the kinetics, represented by a dotted line, cannot belong to the 90% prediction interval.

Figure 4 represents \(\ln(Y_{99})\) as a function of \(\ln(Y_{39})\), with different coloured areas representing the sets \(A_{\lambda}\) for various values of \(\lambda\). The darker areas were obtained with the largest values of \(\lambda\) and correspond to the highest density of simulated points; conversely, the light-coloured areas correspond to a low density of simulated points. The red point \(\left( \ln 1.22, \ln 0.011 \right)\) represents the concentrations observed in the individual. The value of \(\lambda\) whose level set of the pdf passes through \(\left( \ln 1.22, \ln 0.011 \right)\) is close to zero: more precisely, \(\lambda\left( \ln 1.22, \ln 0.011 \right) \simeq 7 \times 10^{-10}\). Finally, \(\tilde{F}\left( \lambda\left( \ln 1.22, \ln 0.011 \right) \right) = 0\): none of the 100,000 vectors simulated with the model of Eq. (6) has a lower pdf value, which shows that this probability is below the resolution of the simulation, i.e., \(\le 1/100{,}000\).

Fig. 4

The x and y axes of this figure represent, respectively, the concentrations (in log-scale) simulated with the 1-compartment model (Eq. 6) at times 39 and 99 (\(C_{39}\) and \(C_{99}\)). The shapes represented (close to ellipses) are the sets \(A_{\lambda}\) for various values of \(\lambda\). The sets shown in red were obtained with the largest values of \(\lambda\) and correspond to the area where most of the concentrations simulated with the 1-compartment model (Eq. 6) are found. Conversely, the sets shown in light yellow correspond to the area where only a few simulated concentrations are observed. The white area (the rest of the figure) corresponds to the area where none of the 100,000 concentrations simulated with the 1-compartment model (Eq. 6) were observed; there, \(\lambda\) is close to 0. Finally, the red point corresponds to the two concentrations observed in the individual. These two concentrations fall in the white area, indicating that the 1-compartment model (Eq. 6) cannot be used to describe the individual’s kinetics.
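The computation of \(\lambda(y)\) and \(\tilde{F}(\lambda(y))\) in this example can be illustrated with a minimal Python sketch. It assumes a hypothetical 1-compartment IV bolus model sampled at t = 39 and t = 99 (not the parameter values of Eq. (6)), uses SciPy's `gaussian_kde` in place of the R package ks, and reads \(\tilde{F}(\lambda(y))\) as the fraction of simulated vectors whose estimated pdf does not exceed \(\lambda(y)\), which is our reading of the text. Fitting \(\tilde{f}\) and evaluating it on two independent simulated samples is a choice of this sketch, not necessarily the authors'.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2024)
TIMES = np.array([39.0, 99.0])  # the two sampling times of the example


def simulate_log_conc(n):
    """Simulate log-concentrations at TIMES with a 1-compartment IV bolus model.

    Dose, typical values and variabilities are HYPOTHETICAL, not those of Eq. (6).
    """
    cl = 2.0 * np.exp(rng.normal(0.0, 0.2, n))  # clearance (L/h)
    v = 20.0 * np.exp(rng.normal(0.0, 0.1, n))  # volume of distribution (L)
    log_c = np.log(100.0 / v)[:, None] - (cl / v)[:, None] * TIMES
    return log_c + rng.normal(0.0, 0.1, log_c.shape)  # residual error (log scale)


# f~ is fitted on one simulated sample; the reference distribution of f~(Y)
# is taken from a second, independent sample.
fit_sample = simulate_log_conc(4000)
kde = gaussian_kde(fit_sample.T)  # SciPy expects an array of shape (d, n)
ref_density = kde(simulate_log_conc(4000).T)


def level_and_discrepancy(y):
    """Return lambda(y) = f~(y) and F~(lambda(y)), the fraction of simulated
    vectors whose estimated pdf does not exceed lambda(y)."""
    lam = kde(np.asarray(y, dtype=float).reshape(-1, 1))[0]
    return lam, float(np.mean(ref_density <= lam))


lo, hi = np.percentile(fit_sample, [5, 95], axis=0)
typical = np.median(fit_sample, axis=0)
atypical = typical + np.array([-0.5, 1.0])  # lower than usual at t=39, higher at t=99

print("inside both marginal 90% intervals:", bool(np.all((atypical > lo) & (atypical < hi))))
print("typical  :", level_and_discrepancy(typical))
print("atypical :", level_and_discrepancy(atypical))
```

As in Figs. 3 and 4, the atypical point lies inside both marginal 90% intervals, yet its estimated pdf is lower than that of every simulated vector, because a low value at t = 39 and a high value at t = 99 contradict the strong positive dependence between the two log-concentrations.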

In conclusion, the probability that these concentrations come from the model of Eq. (6) is \(\le 10^{-5}\).
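The near-elliptical shape of the sets \(A_{\lambda}\) in Fig. 4 can be understood in an idealised case. Assume, for illustration only, that the vector \(z = \ln y\) of log-concentrations at the \(d\) sampling times is exactly Gaussian, \(\mathcal{N}(\mu, \Sigma)\), and take \(A_{\lambda}\) to be the region where the pdf is at least \(\lambda\). Because the pdf decreases with the squared Mahalanobis distance \(m^{2}(z)\), the sets \(A_{\lambda}\) are ellipsoids and the discrepancy has a closed form:

\[
f(z) = (2\pi)^{-d/2} \lvert \Sigma \rvert^{-1/2} \exp\!\left( -\tfrac{1}{2} m^{2}(z) \right), \qquad m^{2}(z) = (z - \mu)^{\top} \Sigma^{-1} (z - \mu),
\]
\[
A_{\lambda} = \{ z : f(z) \ge \lambda \} = \left\{ z : m^{2}(z) \le -2 \ln\!\left( \lambda \, (2\pi)^{d/2} \lvert \Sigma \rvert^{1/2} \right) \right\},
\]
\[
P(z) = \Pr\!\left( f(Z) \le f(z) \right) = \Pr\!\left( m^{2}(Z) \ge m^{2}(z) \right) = 1 - F_{\chi^{2}_{d}}\!\left( m^{2}(z) \right),
\]

since \(m^{2}(Z) \sim \chi^{2}_{d}\) when \(Z \sim \mathcal{N}(\mu, \Sigma)\). For two sampling times (\(d = 2\)), \(1 - F_{\chi^{2}_{2}}(x) = e^{-x/2}\), so \(P(z) = \exp\left( -m^{2}(z)/2 \right)\). With random clearance and volume, log-concentrations are only approximately Gaussian, which may explain why the sets in Fig. 4 are close to, but not exactly, ellipses.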

3.2 Evaluation of a Population Pharmacokinetic Model

In Table 1, we present the parameter values estimated by fitting a one-compartment model to the data simulated with the two-compartment model, for each value of \(\beta\) indicated in the first row. As the kinetics become more bi-compartmental (i.e., as \(\beta\) decreases), the one-compartment model fits the data less well, which results in an increase in the estimated residual variability.

Table 1 Mono-compartmental population pharmacokinetic model based on \(\beta\)

Figure 5 shows, for both our method and the PDE, the type I error rate (\(\beta\) = 0.1, i.e., a one-compartment model) and the power to detect the wrong model for the other values of \(\beta\).

Fig. 5

This figure represents the probability of detecting that a 2-point kinetics simulated with the 2-compartment model (Eq. 7) does not come from the 1-compartment model (Eq. 8), for 2 methods: the method proposed in this paper and the Prediction Distribution Error (PDE) (reference method). Several values of \(\beta\) were used to simulate the 2-point kinetics.

With our method, kinetics simulated with a value of \(\beta\) lower than 0.05 are always identified as coming from a model that is not a one-compartment model: the power is 100%. This probability decreases as \(\beta\) gets closer to 0.1, which corresponds to a one-compartment model. The advantage of the proposed method is that it distinguishes a two-compartment model from a one-compartment model even for \(\beta\) = 0.06 or 0.07, whereas visual examination of the data cannot (as shown in Fig. 2). The “power” obtained with \(\beta\) = 0.1 is the type I error rate and corresponds to the level of the statistical test, 0.05.

With the PDE method, only the kinetics simulated with \(\beta\) = 0.01 are always identified as coming from a model that is not a one-compartment model. For values of \(\beta\) between 0.07 and 0.09, the PDE method always fails to distinguish a two-compartment model from a one-compartment model.
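Why the “power” at \(\beta\) = 0.1 matches the level of the test can be checked numerically. If the model is correct and its pdf is known exactly, \(P(y)\) is uniformly distributed (probability integral transform), so \(P(y) \le 0.05\) occurs 5% of the time. The Python sketch below uses an idealised Gaussian model for a 2-point log-kinetics, for which \(P\) is the \(\chi^{2}_{2}\) survival function of the squared Mahalanobis distance; the mean vector, covariance matrix and shifted alternative are hypothetical and do not correspond to Eqs. (7) or (8).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)


def gaussian_discrepancy(z, mu, sigma):
    """P(z) = 1 - F_chi2_d(m^2(z)), exact when log-concentrations are N(mu, sigma)."""
    diff = np.atleast_2d(z) - mu
    m2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)  # squared Mahalanobis
    return chi2.sf(m2, df=len(mu))


def rejection_rate(z_samples, mu, sigma, level=0.05):
    """Fraction of kinetics declared incompatible with the model (P <= level)."""
    return float(np.mean(gaussian_discrepancy(z_samples, mu, sigma) <= level))


# HYPOTHETICAL reference model for a 2-point log-kinetics (strongly correlated).
mu = np.array([-2.3, -8.3])
sigma = np.array([[0.8, 1.9], [1.9, 4.8]])

n = 20_000
z_null = rng.multivariate_normal(mu, sigma, size=n)  # kinetics from the model itself
z_alt = rng.multivariate_normal(mu + np.array([-0.5, 1.0]), sigma, size=n)  # misspecified

type_1_error = rejection_rate(z_null, mu, sigma)  # expected to be close to 0.05
power = rejection_rate(z_alt, mu, sigma)
print(f"type I error ~ {type_1_error:.3f}, power ~ {power:.3f}")
```

Each coordinate of the hypothetical shift is less than 0.6 marginal standard deviations, yet the rejection rate is far above 0.05 because the shift runs against the strong correlation between the two log-concentrations; a check that treats the concentrations one at a time cannot exploit this dependence.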

3.3 Detecting Individuals with Outlier PK

The detection of outliers is presented for various numbers of available subjects, denoted N. Varying N shows how the method’s power to detect outliers depends on the number of available kinetics; with N = 10,000, the proportion of detected outliers is expected to be close to its true (limiting) value. Figure 6 presents the plot of percentiles of the true model (\(\theta_{\mathrm{cl}}\) = 2 L/h) along with examples of kinetic profiles simulated with \(\theta_{\mathrm{cl}}\) = 1 L/h, 2 L/h and 3 L/h.

Fig. 6

This figure displays, in black, the 5th, 50th (median) and 95th percentiles of concentrations (in log-scale) simulated with a 1-compartment model and an extravascular administration (Eq. 9) for the reference clearance (2 L/h). Examples of kinetic profiles obtained with different values of \({\theta }_{\mathrm{cl}}\) are superimposed: profiles with \({\theta }_{\mathrm{cl}}\) equal to 1 L/h, 2 L/h and 3 L/h are presented in purple, grey and red, respectively.

The results obtained with 2 and 3 observation times are presented in Fig. 7. The curves obtained with 4 observation times were superimposable on those obtained with 3 observation times.

Fig. 7

The left (resp. right) panel shows the percentage of 2-point (resp. 3-point) kinetic profiles simulated with the 1-compartment model and an extravascular administration (Eq. 9), with \({\theta }_{\mathrm{cl}}\) ranging from 1 to 3 L/h, that were identified as coming from this model with \({\theta }_{\mathrm{cl}}\ne 2.0\) L/h. The decision rule used to declare a kinetic profile not compatible with \({\theta }_{\mathrm{cl}}=2.0\) L/h is described by Eqs. (5a) and (5b).

The decision rule used to identify a kinetic profile as an outlier was designed to identify the most extreme kinetics (i.e., those with the smallest \(P(y)\)) among N kinetics. This decision rule therefore depends on the number of kinetics among which outliers are sought. Figure 7 shows that the further \({\theta }_{\mathrm{cl}}\) deviates from 2 L/h, the easier it is for the proposed method and decision rule to identify kinetics simulated with \({\theta }_{\mathrm{cl}} \ne 2\) L/h and analysed with \({\theta }_{\mathrm{cl}}\) = 2 L/h. However, kinetics simulated with \({\theta }_{\mathrm{cl}}\) between 1.5 and 2.5 L/h are not identified as outliers, regardless of the number of observation times used.
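The dependence on N can be made concrete with one possible version of such a threshold. Assuming that, for kinetics truly generated by the model, the \(P(y)\) values are independent and uniform on [0, 1], the smallest of N values satisfies \(\Pr(\min \le x) = 1 - (1 - x)^{N}\), whose 5% quantile is \(1 - 0.95^{1/N}\). The Python sketch below computes it and checks it by Monte Carlo; it illustrates the idea under those assumptions and is not a reproduction of Eqs. (5a) and (5b).

```python
import numpy as np


def minimum_p_threshold(n_kinetics, alpha=0.05):
    """alpha-quantile of min(P_1, ..., P_N) for N i.i.d. Uniform(0, 1) values.

    Pr(min <= x) = 1 - (1 - x)**N = alpha  =>  x = 1 - (1 - alpha)**(1 / N),
    computed with log1p/expm1 for accuracy when N is large.
    Uniformity and independence of the P(y) values are ASSUMPTIONS of this sketch.
    """
    return -np.expm1(np.log1p(-alpha) / n_kinetics)


for n in (100, 200, 10_000):
    print(n, minimum_p_threshold(n))

# Monte Carlo check: when all N = 100 kinetics come from the model, the smallest
# P(y) falls below the threshold in about 5% of replicate data sets.
rng = np.random.default_rng(3)
n, reps = 100, 20_000
smallest = rng.uniform(size=(reps, n)).min(axis=1)
print(np.mean(smallest <= minimum_p_threshold(n)))
```

Under these assumptions the threshold is about \(5.1 \times 10^{-4}\) for N = 100 and about \(5.1 \times 10^{-6}\) for N = 10,000, so a kinetics must be far more extreme to be flagged when more kinetics are screened, which is consistent with moderately deviant clearances going undetected.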

4 Discussion

In this article, we propose a new tool for evaluating population PK models. Our method computes, for each individual, a probability that their concentration profile comes from the model. It is implemented in three steps: (a) simulating kinetics with the tested model; (b) estimating the pdf from the simulated kinetics; and (c) computing, with this pdf, the multivariate exact discrepancy \(P(y)\) of each observed kinetics. The proposed method can be used whatever the individual sampling times and covariate values.
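The three steps can be sketched for a single individual in Python. The sketch assumes a user-supplied simulator that returns log-concentrations at that individual's own sampling times and covariates, which is what allows heterogeneous designs; the one-compartment IV bolus model with an allometric weight effect is hypothetical, SciPy's `gaussian_kde` stands in for the R package ks, and fitting and evaluating the pdf on two independent simulated samples is a choice of this sketch rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde


def multivariate_exact_discrepancy(simulate, y_obs, n_fit=4000, n_ref=4000, seed=0):
    """Steps (a)-(c) for ONE individual (a sketch, not the authors' code).

    simulate(n, rng) must return an (n, d) array of log-concentrations simulated
    with the tested model at this individual's sampling times and covariates;
    y_obs holds the d observed concentrations (original scale).
    """
    rng = np.random.default_rng(seed)
    kde = gaussian_kde(simulate(n_fit, rng).T)  # (a) simulate, (b) estimate f~
    ref_density = kde(simulate(n_ref, rng).T)  # f~ on an independent sample
    lam = kde(np.log(np.asarray(y_obs, dtype=float)).reshape(-1, 1))[0]  # lambda(y)
    return float(np.mean(ref_density <= lam))  # (c) P(y)


def make_iv_simulator(times, dose, weight):
    """HYPOTHETICAL 1-compartment IV bolus model with allometric weight scaling."""
    times = np.asarray(times, dtype=float)

    def simulate(n, rng):
        cl = 2.0 * (weight / 70.0) ** 0.75 * np.exp(rng.normal(0.0, 0.2, n))
        v = 20.0 * (weight / 70.0) * np.exp(rng.normal(0.0, 0.1, n))
        log_c = np.log(dose / v)[:, None] - (cl / v)[:, None] * times
        return log_c + rng.normal(0.0, 0.1, log_c.shape)

    return simulate


# Two individuals with different sampling times, doses and body weights.
patients = [
    {"times": [2.0, 12.0], "dose": 100.0, "weight": 60.0, "y_obs": [4.0, 1.5]},
    {"times": [1.0, 6.0, 24.0], "dose": 200.0, "weight": 90.0, "y_obs": [6.0, 0.5, 8.0]},
]
p_values = []
for p in patients:
    simulate = make_iv_simulator(p["times"], p["dose"], p["weight"])
    p_values.append(multivariate_exact_discrepancy(simulate, p["y_obs"]))
    print(p["times"], p_values[-1])
```

The second (hypothetical) individual has a concentration that rises between 6 and 24 h, which a one-compartment IV bolus model cannot produce, so its \(P(y)\) is zero; the first individual's profile is typical of the model.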

We demonstrated the performance of our method through two simulation examples. The first example evaluated its ability to detect a misspecification in the structural model: our method showed good statistical properties and outperformed the PDE in detecting the incorrect structural model. One possible explanation is that our method fully exploits the dependence between the concentrations measured in the same individual.

The second example evaluated the ability of the method to detect outlier kinetics. This novel application is of particular interest since, to our knowledge, there is currently no other method to identify outlier kinetics. The rationale behind this application is that, during the model-building process, if too many observed kinetics are not compatible with the model, the model should be changed. However, we must recognise that the identification of outlier kinetics based on the law of extremes (the minimum of \(P(y)\)) did not give the expected results. Indeed, we were unable to identify as outliers the simulated kinetics whose mean clearance values were close to the reference clearance value. Further work on this subject, including exploring other decision rules, is necessary to better identify kinetics that deviate significantly from the model.

Another direct application of this method is the selection of a population PK model in the context of therapeutic drug monitoring (TDM). Currently, in hospital practice, model evaluation remains difficult due to the lack of guidelines [14]. However, selecting the correct model is a crucial step to ensure that the predictions made are accurate and that the resulting dosage regimen is appropriate. The main difficulty is the lack of an appropriate tool, i.e., a method to select a model at the individual level [15]. The proposed method shows great promise for this application as it provides, for each individual, one probability that their kinetics belong to a model. This probability could therefore be used to select a model consistent with the observed concentrations at the individual level rather than at the population level. It can thus be used to flag individuals with atypical kinetics; this information can be used to indicate uncertainty in dose recommendations and to request additional blood samples to better characterise their PK profile. However, because the method evaluates an entire kinetic profile, an erroneous observation (e.g., due to a wrong dosage or sampling time) could lead to an individual being wrongly identified as an outlier. This is a limitation of the current version of the method.

A much more detailed simulation analysis should be carried out to confirm (or refute) these preliminary findings. We still need to test the method’s ability to detect various other specification errors encountered in the model-building process (e.g., evaluation of a covariate model…) as well as its ability to perform external validation for TDM purposes.

Moreover, while the results obtained seem encouraging, it is important to note that, like all other model evaluation methods, we reason conditionally on the population parameters. In other words, the imprecision of the population parameter estimates is not accounted for. This is a limitation of the method that we do not know how to overcome, and it requires us to work with the best available estimators of the population parameters, i.e., maximum likelihood estimates.

Finally, there is still important work to be done on the estimation of the pdf \(\tilde{f}\).

When there are few observations per individual (fewer than 4), the R package ks quickly computes the MED. However, when the number of observations per individual is greater than 5, ks takes a long time to estimate \(f\). This is probably because ks estimates the optimal bandwidth (smoothing parameter), which can be time-consuming. It is likely, but remains to be verified, that crude (non-optimal) bandwidths would allow model evaluation and outlier detection questions to be answered efficiently while being computed quickly.
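The bandwidth question can be explored cheaply. SciPy's `gaussian_kde` accepts rule-of-thumb bandwidths (`bw_method="scott"` or `"silverman"`) or a fixed scalar factor, none of which involves an optimisation; the Python sketch below compares the resulting \(P(y)\) for a 6-point kinetics and times each computation. The model, sampling times and the scalar factor 0.5 are hypothetical, and whether such crude bandwidths give adequate decisions is precisely what remains to be verified.

```python
import time

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
TIMES = np.array([1.0, 2.0, 4.0, 8.0, 12.0, 24.0])  # HYPOTHETICAL 6-sample design


def simulate_log_conc(n):
    """HYPOTHETICAL 1-compartment IV bolus model, log-concentrations at TIMES."""
    cl = 2.0 * np.exp(rng.normal(0.0, 0.2, n))
    v = 20.0 * np.exp(rng.normal(0.0, 0.1, n))
    log_c = np.log(100.0 / v)[:, None] - (cl / v)[:, None] * TIMES
    return log_c + rng.normal(0.0, 0.1, log_c.shape)


fit_sample, ref_sample = simulate_log_conc(4000), simulate_log_conc(4000)
y_typical = np.median(fit_sample, axis=0)


def discrepancy(y, bw_method):
    """P(y) with a rule-of-thumb or fixed bandwidth factor (no bandwidth optimisation)."""
    kde = gaussian_kde(fit_sample.T, bw_method=bw_method)
    lam = kde(np.asarray(y, dtype=float).reshape(-1, 1))[0]
    return float(np.mean(kde(ref_sample.T) <= lam))


results = {}
for bw in ("scott", "silverman", 0.5):
    start = time.perf_counter()
    results[bw] = discrepancy(y_typical, bw)
    print(f"bw_method={bw!r}: P = {results[bw]:.3f} ({time.perf_counter() - start:.2f} s)")
```

A fuller comparison would repeat this over many simulated and outlier kinetics and compare the resulting decisions (not only the \(P(y)\) values) with those obtained using an optimised bandwidth.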

In conclusion, the method proposed in this article shows great promise for both model evaluation and model selection for TDM. However, it needs to be tested on a large scale to fully assess its power, especially in the model-building process.