We propose a Bayesian hierarchical mixture PK model to accommodate the artifactual outliers observed in the motivating PK data. Bayesian hierarchical models are well suited to population PK modeling [28, 41–43], with an additional level defining the priors. We use the same model for inter-individual variation as in the conventional population PK analysis. The key component of the new model is the model for intra-individual variation at the first stage: we propose a finite mixture as the residual error model to accommodate the outliers. Such a mixture assumes that the contaminants are generated from a population (high concentrations of the infused drug) different from the main population (valid plasma drug concentrations).
At the first stage, we specify the following nonlinear regression model for \(y_{ij}\) with a mixture error distribution:
$$ \begin{aligned} y_{ij} &= f(\boldsymbol{\theta}_i, x_{ij}) + e_{ij} \\ e_{ij} \mid \boldsymbol{\psi} &\sim \sum_{q=1}^Q w_{q}d_{q}(\boldsymbol{\psi}_q), \end{aligned} $$
where \(d_q\) is a distribution with parameters \(\boldsymbol{\psi}_q\) and \(\boldsymbol{w} = (w_{1},\ldots, w_{Q})\) is an unknown vector of mixing weights. We assume that the number of components \(Q\) is fixed. We have found that setting \(Q = 3\) is adequate to accommodate the outliers in our motivating data.
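As a concrete illustration, this first-stage error distribution can be sampled by first drawing a component label with probabilities \(w_q\) and then drawing from that component. The sketch below (Python, with purely illustrative parameter values rather than estimates from this analysis) uses \(Q = 3\) components of the kind introduced later in this section:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture_residuals(n, weights, samplers, rng):
    """Draw n residuals e_ij from sum_q w_q * d_q by first drawing
    a component label, then sampling from that component."""
    weights = np.asarray(weights, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)
    return np.array([samplers[q](rng) for q in labels]), labels

# Hypothetical Q = 3 components (parameter values are illustrative only):
samplers = [
    lambda rng: rng.normal(0.0, 1.0),                          # d_1: N(0, tau)
    lambda rng: rng.normal(0.0, 1.0) + rng.gamma(2.0, 2.0),    # d_2: N * Ga
    lambda rng: rng.normal(0.0, 1.0) - rng.gamma(2.0, 2.0),    # d_3: N * NGa
]
e, labels = sample_mixture_residuals(10_000, [0.90, 0.06, 0.04], samplers, rng)
```

With a dominant first weight, most draws are plain normal noise, while a small fraction receives a positive or negative gamma shift.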
We performed a residual analysis that can help choose the number of mixture components and the candidate component distributions. In Fig. 3, the distribution of residuals from the conventional population PK analysis is overlaid with a normal density curve (blue dashed line). The figure clearly shows that the residual error distribution is heavy-tailed on both sides, and more skewed to the right (due to the high proportion of large outliers), suggesting that a three-component mixture is a good choice. The three-component mixture described below is shown in the right panel (black solid line), and its components are presented separately in the left panel for the fixed set of parameters given in the figure. Each component appropriately models the heavy residuals in one tail while the mixture still adequately models the bulk of the data; that is, the three-component mixture fits the full set of residuals reasonably well. Although not necessary for similar data in general, this empirical approach is especially useful when a dataset does not carry enough information about some model parameters, for example due to a limited population size, so that those parameters must be fixed.
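The tail diagnostics underlying this residual analysis can be sketched numerically: sample skewness and excess kurtosis far from their normal-theory values of 0 signal the heavy, asymmetric tails described above. The residuals below are synthetic stand-ins (real residuals would come from the conventional population PK fit):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic stand-in residuals with heavy, asymmetric tails:
res = np.concatenate([
    rng.normal(0, 1, 900),       # bulk: approximately normal residuals
    rng.gamma(2, 2, 60),         # right-tail contaminants
    -rng.gamma(2, 2, 40),        # left-tail contaminants
])

skew = stats.skew(res)              # > 0: right tail heavier than left
excess_kurt = stats.kurtosis(res)   # > 0: heavier tails than a normal
```

Positive skewness together with large excess kurtosis is exactly the pattern that motivates one gamma-shifted component per tail.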
For the DEX data, we consider the following: \(d_1\) is a normal (N) with mean 0 and variance \(\tau_1\), N(0, \(\tau_1\)); \(d_2\) is the convolution of a N(0, \(\tau_2\)) and a gamma (Ga) with shape \(a\) and rate \(b\); and \(d_3\) is the convolution of a N(0, \(\tau_3\)) and the distribution of the negative of a gamma random variable (henceforth referred to as the negative gamma, NGa) with shape \(c\) and rate \(d\). To elaborate, if \(X \sim \hbox{Ga}(c, d)\) then \(-X \sim \hbox{NGa}(c, d)\), and our residual error distribution is
$$ e_{ij} \sim w_1 \hbox{N}(0, \tau_1) + w_2 \hbox{N}(0, \tau_2)*\hbox{Ga}(a, b) + w_3 \hbox{N}(0, \tau_3)*\hbox{NGa}(c, d), $$
where \(*\) denotes distributional convolution. Instead of Ga and NGa, a lognormal and the distribution of the negative of a lognormal random variable may be used as alternatives when the distribution of artifactual concentrations is expected to be less skewed.
The rationale behind this distribution is as follows. We presume that \(w_1\) is large; hence most of the data are assumed to have normally distributed errors (e.g. the normal density represented by the blue dashed line in Fig. 3). The second component, the convolution of a N(0, \(\tau_2\)) and Ga(\(a\), \(b\)), accommodates large contaminant values close to the boundary of the valid concentrations as well as more obviously large outliers (e.g. the density represented by the red dashed line). The third component, the convolution of a N(0, \(\tau_3\)) and NGa(\(c\), \(d\)), is for small artifactual concentrations (hence large negative residuals), likely due to dilution of the sample with saline flush or unrecorded discontinuation of the infusion. This component likewise accommodates both outliers close to the boundary and the more obviously small ones (e.g. the density represented by the green dashed line).
Although we presented the full model above, a useful submodel fits the data well and provides a parsimonious, interpretable subclass. Specifically, consider the case \(\tau_1 = \tau_2 = \tau_3 = \tau\). Then the model is specified as
$$ \begin{aligned} e_{ij} &\sim w_1 \hbox{N}(0,\tau) + w_2 \hbox{N}(0,\tau)*\hbox{Ga}(a,b) + w_3 \hbox{N}(0,\tau)*\hbox{NGa}(c, d) \\ &\sim w_1 \hbox{N}(0,\tau) * \delta(0)+ w_2 \hbox{N}(0,\tau)*\hbox{Ga}(a,b) + w_3 \hbox{N}(0,\tau)*\hbox{NGa}(c, d), \end{aligned} $$
where δ(0) is the distribution degenerate at 0. It is informative to note that this distribution arises from the equation \(e_{ij} = \epsilon_{ij} + \xi_{ij}\), where \(\epsilon_{ij}\) is a N(0, τ) random variable and \(\xi_{ij}\) is a mixture of a distribution degenerate at 0, a Ga(\(a\), \(b\)), and a NGa(\(c\), \(d\)) with probabilities \(w_1\), \(w_2\), and \(w_3\), respectively.
Represented in this form, the error model is uniquely interpretable. Specifically, \(\epsilon_{ij}\) can be thought of as representing natural residual error, while \(\xi_{ij}\) represents the outlier shift. Here, the outlier shift process \(\xi_{ij}\) is degenerate at 0 with probability \(w_1\), positive with probability \(w_2\), and negative with probability \(w_3\). To reiterate, the model is:
$$ \begin{aligned} y_{ij} &= f(\boldsymbol{\theta}_i, x_{ij}) + \epsilon_{ij} + \xi_{ij} \\ \epsilon_{ij} &\sim \hbox{N} (0,\tau) \\ \xi_{ij} &\sim \left\{ \begin{array}{ll} \hbox{Degenerate\,at\,0} & \hbox{w.p.}\, w_1\\ \hbox{Ga}(a, b) & \hbox{w.p.}\, w_2 \\ \hbox{NGa}(c, d) & \hbox{w.p.}\, w_3, \end{array} \right. \end{aligned} $$
where w.p. stands for ‘with probability’. This parsimonious approach worked well as discussed in section “Model checking and inferences” and was implemented using Markov chain Monte Carlo (MCMC) methods in WinBUGS [44] and PKBugs [45, 46].
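This generative form is straightforward to simulate. The sketch below uses a hypothetical one-compartment constant-rate-infusion model in place of \(f\) (the actual \(f\) here is the DEX population PK model) and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical one-compartment model during a constant-rate infusion
# (a stand-in for the paper's DEX population PK model f):
def f(theta, t, rate=10.0):
    CL, V = theta                                  # clearance, volume
    k = CL / V
    return (rate / CL) * (1.0 - np.exp(-k * t))    # concentration at time t

# Illustrative parameter values (not fitted estimates):
tau, a, b, c, d = 0.04, 2.0, 0.5, 2.0, 0.5
w = np.array([0.90, 0.06, 0.04])

t = np.linspace(0.5, 8.0, 16)                      # sampling times (h)
theta_i = (5.0, 40.0)                              # (CL, V) for one subject
eps = rng.normal(0.0, np.sqrt(tau), size=t.size)   # natural residual error
comp = rng.choice(3, size=t.size, p=w)             # latent outlier indicator
xi = np.where(comp == 1, rng.gamma(a, 1/b, t.size), 0.0)   # positive shift
xi = np.where(comp == 2, -rng.gamma(c, 1/d, t.size), xi)   # negative shift
y = f(theta_i, t) + eps + xi                       # observed concentrations
```

Points with `comp == 0` are valid observations up to normal noise; the others carry the positive or negative outlier shift.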
Notice that our model specification naturally produces an ordering constraint on the component means, which avoids label switching, a well-known problem in mixture modeling [47, 48]. The problem arises from the invariance of the likelihood under relabeling of the mixture components (the likelihood is the same for every permutation of the component indices). A commonly used solution is to impose an identifiability constraint by ordering the component means or the mixing weights [47]; here the constraint is built in, since the mean shifts of the three components are zero, positive, and negative, respectively.
In addition, by taking advantage of the predictive ability of the PK model, the model lets each point's probability of being valid or an outlier depend on how far the point lies from its expected range. Although we do not expect the amount of shift from the PK model to be systematic, the direction of an outlier can be anticipated: it is either much larger or much smaller than the prediction at that point. The greater the deviation from the prediction, the higher the chance of contamination; this is incorporated through the probability weights. Although covariates could help model the contaminants better, they were not collected in this study; indeed, collecting them may not even be feasible in an ICU setting.
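Given the component densities, the posterior probability that a residual of a given size belongs to each component follows from Bayes' rule, so larger deviations from the prediction yield higher outlier probabilities. A sketch with illustrative parameter values, computing the normal-gamma convolution densities by numerical integration:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Illustrative parameter values (not estimates from this analysis):
tau, a, b, c, d = 1.0, 2.0, 0.5, 2.0, 0.5
w = np.array([0.90, 0.06, 0.04])

def dens_conv_gamma(e, shape, rate, sign=1.0):
    """Density at e of N(0, tau) convolved with sign * Ga(shape, rate)."""
    integrand = lambda g: (stats.norm.pdf(e - sign * g, 0.0, np.sqrt(tau))
                           * stats.gamma.pdf(g, shape, scale=1/rate))
    val, _ = quad(integrand, 0.0, np.inf)
    return val

def outlier_probability(e):
    """Posterior probabilities that residual e came from each component."""
    p = np.array([
        w[0] * stats.norm.pdf(e, 0.0, np.sqrt(tau)),   # valid
        w[1] * dens_conv_gamma(e, a, b, +1.0),         # positive outlier
        w[2] * dens_conv_gamma(e, c, d, -1.0),         # negative outlier
    ])
    return p / p.sum()
```

A residual near zero is classified as valid with high probability, while residuals far out in either tail are assigned almost entirely to the corresponding outlier component.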
As a note, a normal proportional error model would not be a good choice for the valid population, since it would compete with the gamma components in modeling the outliers, making MCMC sampling of the posterior distributions difficult. Even if the valid population follows a normal proportional error model, using a normal additive error model would introduce little bias in the PK parameter estimates: skewed valid concentrations near the boundary with the invalid ones would still be classified as valid in many MCMC iterations (a high, though not unit, probability of being valid), allowing them to contribute to estimating the PK parameters most of the time. Our simulation results below also support this.
At the third stage, we define the priors as follows. We use vague priors for the fixed effects \(\boldsymbol{\theta}\): normal distributions with zero means and very small precision \(10^{-4}\) (i.e. very large variance). We use the conjugate prior for the inverse of the normal variance, τ ∼ Ga(0.1, 0.1) (i.e. a prior mean of 1 and a prior variance of 10). The inverse of \(\Upomega\) is assigned the conjugate prior, a Wishart prior W(R, ρ) with a 2 × 2 scale matrix R whose diagonal elements are set to about 0.18 and off-diagonals to 0 (the mean of the Wishart distribution is \(\rho R^{-1}\), so the precision of each element is approximately 11 ≈ 1/0.3²), corresponding to a 30% CV for the inter-individual variability of the PK parameters. The degrees of freedom ρ is set to 2, which gives the least informative proper Wishart prior for the inverse of \(\Upomega\) (ρ must be at least 2 for the prior to be proper, and larger ρ represents stronger belief in the prior guess R). For the prior probabilities of group membership, we use the conjugate Dirichlet distribution for the mixing weights, w ∼ Dirichlet(α1, α2, α3), with the common default for a vague prior, α1 = α2 = α3 = 1, which assigns equal prior mass to each group membership and is equivalent to a prior sample size of 3. Finally, the priors for a and c are taken to be uniform (Unif) with boundaries 1 and 10, Unif(1, 10), and those for b and d to be Unif(0, 10), based on the shape of the residual distribution and a plausible range for the parameters.