Combining MCMC with ‘sequential’ PKPD modelling
Abstract
We introduce a method for preventing unwanted feedback in Bayesian PKPD link models. We illustrate the approach using a simple example on a single individual, and subsequently demonstrate the ease with which it can be applied to more general settings. In particular, we look at the three ‘sequential’ population PKPD models examined by Zhang et al. (J Pharmacokinet Pharmacodyn 30:387–404, 2003; J Pharmacokinet Pharmacodyn 30:405–416, 2003), and provide graphical representations of these models to elucidate their structure. An important feature of our approach is that it allows uncertainty regarding the PK parameters to propagate through to inferences on the PD parameters. This is in contrast to standard two-stage approaches whereby ‘plug-in’ point estimates for either the population or the individual-specific PK parameters are required.
Keywords
Feedback · Graphical models · Markov chain Monte Carlo · Sequential PKPD · WinBUGS

Introduction
Part of the problem here is that there exists some inconsistency between the PK and PD models. However, it is typically impracticable to attempt to eradicate all such inconsistencies, especially in a population data set. Another aspect of the problem is that the two models are not being weighted according to their plausibility. In fact, in the above example the PK data are being swamped by the PD data, and the PK fit becomes unimportant as the PD fit is ‘tweaked’. However, we typically have more confidence in the PK model and may feel it appropriate to weight the two data sets accordingly. In this paper we consider the approach of discounting the likelihood contribution of the PD data to the estimation of the PK parameters, which has the flavour of a two-stage, or ‘sequential’, analysis.
1. Basing estimates \(\hat{\Uptheta}\) on the PK data alone, and then estimating the θ_{i}s using the PD data alone—termed the Population PK Parameters (PPP) approach by ZBS;
2. Basing estimates \(\hat{\Uptheta}\) on the PK data alone, and then estimating the θ_{i}s using both the PK and PD data—termed the Population PK Parameters and Data (PPP&D) approach by ZBS;
3. Basing estimates \(\hat{\theta_i}\) on the PK data alone—termed the Individual PK Parameters (IPP) approach by ZBS.
In a simulation study in which the assumed models were correct, ZBS (paper I) showed that all three sequential methods were much faster to compute than the simultaneous procedure, and that the PPP&D sequential method in particular had similar performance to the simultaneous approach. However, if the assumed models are not correct, ZBS (paper II) showed that the sequential methods are far more robust; in particular, the fitted PK model under the simultaneous approach can be very sensitive to misspecification of the PD model. This behaviour is to be expected given the type of feedback from the PD model into the fitted PK model exemplified in Figs. 1 and 2.
1. From the PK analysis, estimate N plausible sets of possible values for the concentrations. This would generally be achieved through simulation;
2. Carry out N separate PD analyses, one for each of the estimated sets of concentrations;
3. Average over the N PD analyses, with appropriate adjustment for estimates and intervals [8].
There has been a similar debate over simultaneous and sequential approaches in the multiple imputation literature: Little [9] reviews approaches to missing covariates in regression of a response y on covariates X and points out that multiple imputation may condition only on X or on both X and y, while Meng [10] and Little and Rubin [8, p. 217] discuss the potential advantages of adopting different models for imputation and analysis.
Little and Rubin [8] emphasise that multiple imputation is best considered as a Bayesian predictive procedure. Indeed, a Bayesian approach implemented within the MCMC framework [11, 12, 13, 14] has already proved to be of considerable value in population PKPD [15, 16, 17, 18, 19, 20, 21]. In the present context, it is therefore natural to consider a multiple imputation approach to sequential PKPD analysis. In this paper we demonstrate how such a multiple imputation approach can be implemented within a full MCMC analysis via a simple adjustment to the sampling algorithm. This is achieved through the use of a “cut” function in the model description within WinBUGS [5], although the idea is easily transferable to any MCMC program. Section “Methods” describes the method and provides further insight into the modelling assumptions underlying each of ZBS’s sequential methods, while sect. “Results” illustrates use of the method (and its impact) in a population PKPD setting. A concluding discussion is given in sect. “Discussion”.
Methods
Graphical models
To help clarify the ideas discussed in this paper, it is convenient to start with a graphical representation of the structural assumptions relating the quantities in the PKPD model. Graphical models have become increasingly popular as ‘building blocks’ for constructing complex statistical models of biological and other phenomena [22]. These graphs consist of nodes representing the variables in the model, linked by directed or undirected ‘edges’ representing the dependence relationships between the variables. Here we focus on graphs where all the edges are directed and where there are no loops (i.e. it is not possible to follow a path of arrows and return to the starting node). Such graphs are known as Directed Acyclic Graphs (DAGs) and have been extensively used in modelling situations where the relationships between the variables are asymmetric, for example from cause to effect.
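The acyclicity condition is straightforward to verify computationally. The sketch below is purely illustrative: it encodes a toy three-node graph (a parameter x with data children y and z, in the spirit of the simple example discussed later) as a parent map and checks, by depth-first search, that no path of arrows returns to its starting node:

```python
# A DAG as a parent map: each node lists its parents (edges run parent -> child).
dag = {
    "x": [],        # e.g. a PK parameter
    "y": ["x"],     # 'PK data': a child of x
    "z": ["x"],     # 'PD data': also a child of x
}

def is_acyclic(parents):
    """Return True if no directed cycle exists (depth-first search:
    revisiting a node that is still 'visiting' reveals a back edge)."""
    state = {}  # node -> "visiting" | "done"

    def visit(n):
        if state.get(n) == "visiting":
            return False          # found a cycle
        if state.get(n) == "done":
            return True
        state[n] = "visiting"
        ok = all(visit(p) for p in parents[n])
        state[n] = "done"
        return ok

    return all(visit(n) for n in parents)

print(is_acyclic(dag))                            # a valid DAG
print(is_acyclic({"a": ["b"], "b": ["a"]}))       # a directed loop, not a DAG
```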
Cutting the influence of children on their parents
If we wanted to avoid the influence of z on the estimation of x, we would need to prevent or cut the feedback from z to x, allowing x to be estimated purely on the basis of y. This leads to the graphical model shown in Fig. 3b, which treats x as a parent of z but does not consider z to be a child of x. We denote this one-way flow of information by the ‘valve’ notation shown in the figure. When performing MCMC, all we have to do to prevent feedback is avoid including a likelihood term for z when sampling x. For example, if using Gibbs sampling, the conditional distribution used to generate values of x will not include any terms involving z.
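The effect of omitting the likelihood term for z can be seen in a minimal conjugate-normal sketch. The specific model below (a normal mean x with a single 'PK-like' datum y and many deliberately inconsistent 'PD-like' data z) is our own illustrative assumption, chosen so that each full conditional is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x ~ N(0, 1);  y | x ~ N(x, 1) is one 'PK-like' datum;
# z | x ~ N(x, 0.1) are 100 'PD-like' data centred on a *different*
# value, mimicking an inconsistent response model.
var_z = 0.1
y = 1.0
z = rng.normal(3.0, var_z ** 0.5, size=100)

def sample_x(use_z, n_iter=5000):
    """Draws of x from its full conditional. With use_z=False the
    likelihood term for z is simply omitted (the 'cut')."""
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Conjugate normal update: precision-weighted combination of
        # the prior and whichever data terms are included.
        prec = 1.0 + 1.0                  # prior precision + y term
        mean_num = 0.0 + y
        if use_z:
            prec += len(z) / var_z
            mean_num += z.sum() / var_z
        draws[t] = rng.normal(mean_num / prec, prec ** -0.5)
    return draws

full = sample_x(use_z=True)    # feedback: x is dragged towards the z data
cut = sample_x(use_z=False)    # no feedback: x is estimated from y alone
print(full.mean(), cut.mean())
```

With feedback the posterior for x sits near 3, swamped by the plentiful z data; with the cut it sits near 0.5, as dictated by the prior and y alone.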
We emphasise that our conclusions based on this approach no longer arise from a full probability model, and as such disobey basic rules for both likelihood and Bayesian inference. However, as discussed further in sect. “Discussion”, we may perhaps view this procedure as allowing a more ‘robust’ estimate of x that is not influenced by (possibly changing) assumptions about the form of the response model.
Population PKPD modelling
Following the notation in ZBS, we let y_{i} and z_{i} denote the vectors of observed concentrations and observed effects for individual \(i =1,\ldots,N\), with y and z denoting the collections of all PK and PD data across subjects. The (typically vector-valued) PK and PD parameters of the ith individual are denoted θ_{i} and ϕ_{i}, respectively, with θ and ϕ denoting the sets of all PK and PD parameters. Finally the PK and PD population parameters are denoted Θ and Φ respectively. Typically, the inter-individual distributions of the PK and PD parameters are assumed to be independently multivariate normal, so that \(\Uptheta=(\mu_{\theta}, \Upsigma_{\theta})\) and \(\Upphi=(\mu_{\phi}, \Upsigma_{\phi})\) are the population means and covariances of the individual-level PK and PD parameters respectively, although the following discussion is general and applies to all distributional assumptions.
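To make the notational layers concrete, the following sketch simulates data from such a hierarchy. The one-compartment PK form, the Emax PD form, and every numerical value are illustrative assumptions of ours, not the models analysed in this paper:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10                                   # individuals
t = np.array([0.5, 1.0, 2.0, 4.0, 8.0]) # sampling times

# Population parameters on the log scale:
# Theta = (mu_theta, Sigma_theta), Phi = (mu_phi, Sigma_phi).
mu_theta = np.log([0.2, 5.0])            # (CL, V) for a 1-compartment model
Sigma_theta = np.diag([0.04, 0.04])
mu_phi = np.log([50.0, 0.5])             # (Emax, EC50) for an Emax model
Sigma_phi = np.diag([0.04, 0.09])

dose = 100.0
y, z = [], []
for i in range(N):
    # Individual-level parameters theta_i and phi_i, independently MVN.
    CL, V = np.exp(rng.multivariate_normal(mu_theta, Sigma_theta))
    Emax, EC50 = np.exp(rng.multivariate_normal(mu_phi, Sigma_phi))
    conc = (dose / V) * np.exp(-(CL / V) * t)
    y.append(conc * np.exp(rng.normal(0.0, 0.1, t.size)))  # PK data y_i
    eff = Emax * conc / (EC50 + conc)
    z.append(eff + rng.normal(0.0, 1.0, t.size))           # PD data z_i

y, z = np.array(y), np.array(z)
print(y.shape, z.shape)
```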
It is important to note that we cannot write down the resulting joint ‘posterior’ here. Our approach allows us to sample from it, but we do not know its mathematical form (which is not to say that such a form does not exist). Hence we cannot compare it analytically with the correct posterior.
Four different population PKPD models and the data on which their parameters depend

| Parameter | SIM | PPP | PPP&D | IPP |
|---|---|---|---|---|
| θ | z, y | z, p(Θ\|y) | z, y, p(Θ\|y) | y |
| θ^{PK} | – | y | y | – |
| Θ | z, y | y | y | y |
| ϕ | z, y | z, p(Θ\|y) | z, y, p(Θ\|y) | z, p(θ\|y) |
| Φ | z, y | z, p(Θ\|y) | z, y, p(Θ\|y) | z, p(θ\|y) |
Implementing cuts in the BUGS language
- IPP model: We simply create a copy of the `theta[]` variable using the `cut(.)` function and use this copy, `theta.cut[]`, in the PD model (line 18) instead of `theta[]`. The following BUGS code, inserted between lines 8 and 9, say, creates the new variable:

```
for (j in 1:4) {theta.cut[i,j] <- cut(theta[i,j])}
```

- PPP model: We first create copies of the population PK parameters. We then change the name of the subject-specific PK parameters linked to the PK data (as in the graph of Fig. 6): we replace `theta` on lines 6 and 8 with `theta.PK`. Finally, we specify a distributional assumption for the subject-specific PK parameters linked to the PD data (`theta[]`). We assume that they arise from a population distribution parameterised by the ‘copied’ population parameters, which is equivalent to assuming a prior equal to the posterior predictive distribution from the PK-data-only analysis. This is achieved by inserting the following, say, between lines 8 and 9:

```
for (i in 1:4) {
  mu.PK.cut[i] <- cut(mu.PK[i])
  for (j in 1:4) {Sigma.PK.Inv.cut[i,j] <- cut(Sigma.PK.Inv[i,j])}
}
theta[i,1:4] ~ dmnorm(mu.PK.cut[], Sigma.PK.Inv.cut[,])
```

- PPP&D model: We simply extend the PPP model by linking the subject-specific PK parameters used in the PD model (`theta[]`) to an exact copy of the PK data, where `y[]` is duplicated in the data set to form `y.copy[]`. For example, we could insert the following between lines 6 and 7:

```
y.copy[i,j] ~ dnorm(log.Cb.copy[i,j], tau.PK.copy)
log.Cb.copy[i,j] <- log(pkIVinf2(theta[i,], PK.t[i,j], Dose[i], TI[i]))
```

Results
For each model, two Markov chains with widely differing starting values were generated using WinBUGS. Seventy thousand iterations were performed and values from iterations 20001–70000 were retained for inference (giving a total sample size of 100000 for each parameter of interest). Note that this is a very conservative analysis, with run-lengths one tenth to one fifth as long typically being quite sufficient in practice. Run-times on a 1.2 GHz laptop machine were 172, 172, 184 and 95 min for the SIM, PPP, PPP&D and IPP models, respectively.
Posterior mean deviances (as a measure of model fit) from each population PKPD model for each source of data

| Data source | SIM | PPP | PPP&D | IPP |
|---|---|---|---|---|
| PK (y) | −154.8 | −162.9 | −162.8 | −162.9 |
| PD (z) | 6459 | 6392 | 6462 | 6491 |
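For readers unfamiliar with the measure, posterior mean deviance is simply −2 × log-likelihood averaged over posterior draws. A minimal sketch for a normal mean with known unit variance (all values below are hypothetical, not taken from the PKPD analysis):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data, and approximate posterior draws for a normal mean
# (flat prior, known sd = 1, so the posterior is N(xbar, 1/n)).
data = rng.normal(0.3, 1.0, size=50)
post_draws = rng.normal(data.mean(), 1.0 / np.sqrt(len(data)), size=2000)

def deviance(mu):
    # -2 * log-likelihood of the data under N(mu, 1).
    return float(np.sum((data - mu) ** 2 + np.log(2 * np.pi)))

# Posterior mean deviance: average the deviance over the posterior draws.
dbar = np.mean([deviance(mu) for mu in post_draws])
print(round(dbar, 1))
```

Averaging over draws (rather than plugging in a point estimate) penalises posterior uncertainty, which is why the measure is comparable across the SIM and cut-based models.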
Posterior median point-estimates for population PKPD parameters from each of the four models; 95% credible intervals are given in parentheses

| Parameter | SIM | PPP | PPP&D | IPP |
|---|---|---|---|---|
| log CL | −4.05 (−4.14, −3.95) | −4.05 (−4.15, −3.96) | −4.05 (−4.15, −3.96) | −4.05 (−4.15, −3.96) |
| log Q | −3.83 (−3.95, −3.72) | −3.77 (−3.89, −3.65) | −3.77 (−3.89, −3.65) | −3.77 (−3.89, −3.65) |
| log V1 | −2.65 (−2.81, −2.49) | −2.70 (−2.86, −2.54) | −2.70 (−2.86, −2.54) | −2.70 (−2.86, −2.54) |
| log V2 | −1.03 (−1.13, −0.923) | −1.03 (−1.13, −0.919) | −1.02 (−1.13, −0.918) | −1.03 (−1.13, −0.919) |
| log E_0 | 4.23 (4.11, 4.34) | 4.19 (4.09, 4.29) | 4.23 (4.11, 4.34) | 4.23 (4.11, 4.35) |
| log E_max | 4.35 (4.28, 4.42) | 4.41 (4.32, 4.49) | 4.35 (4.28, 4.42) | 4.35 (4.27, 4.42) |
| log EC_50 | −3.13 (−3.33, −2.93) | −2.82 (−3.21, −2.46) | −3.13 (−3.34, −2.93) | −3.15 (−3.38, −2.91) |
| Var[log CL] | 0.039 (0.021, 0.082) | 0.040 (0.021, 0.084) | 0.040 (0.022, 0.084) | 0.040 (0.022, 0.084) |
| Var[log Q] | 0.014 (0.004, 0.051) | 0.014 (0.005, 0.049) | 0.014 (0.005, 0.049) | 0.014 (0.005, 0.047) |
| Var[log V1] | 0.056 (0.013, 0.176) | 0.045 (0.010, 0.154) | 0.045 (0.010, 0.153) | 0.045 (0.010, 0.151) |
| Var[log V2] | 0.037 (0.016, 0.086) | 0.040 (0.018, 0.092) | 0.040 (0.018, 0.092) | 0.040 (0.018, 0.091) |
| Var[log E_0] | 0.052 (0.027, 0.112) | 0.037 (0.018, 0.081) | 0.053 (0.027, 0.113) | 0.056 (0.028, 0.121) |
| Var[log E_max] | 0.018 (0.008, 0.041) | 0.028 (0.014, 0.062) | 0.018 (0.008, 0.042) | 0.019 (0.009, 0.044) |
| Var[log EC_50] | 0.114 (0.020, 0.340) | 0.411 (0.092, 1.113) | 0.117 (0.021, 0.348) | 0.173 (0.059, 0.447) |
| σ_PK | 0.180 (0.164, 0.200) | 0.178 (0.162, 0.197) | 0.178 (0.162, 0.196) | 0.178 (0.162, 0.196) |
| σ_PD | 9.11 (8.68, 9.57) | 8.77 (8.35, 9.22) | 9.13 (8.70, 9.60) | 9.25 (8.81, 9.73) |
Discussion
We have constructed a general framework for preventing unwanted feedback in the simultaneous analysis of linked models. This has the flavour of a two-stage approach but unlike standard two-stage approaches our method allows uncertainty from the first stage to propagate fully through to the second stage. Thus the approach can be thought of as a form of multiple imputation. We have illustrated the use of our method for several PKPD models and shown how a graphical modelling perspective can elucidate the assumptions underlying established two-stage methods.
The four models considered can be thought of as representing varying degrees of confidence in the PK model relative to the PD model. With the full probability model (SIM), we are assuming equal confidence (in fact, total belief) in both models. If both PK and PD model specifications are optimal in some sense (we refrain from using the term ‘correct’ since all models are simplifications of reality) then the PK parameter values supported by the PK model and data should be consistent with those supported by the PD model and data, and so allowing full feedback (i.e. borrowing strength) between the models using SIM is desirable. However, if the two parts of the PK-PD model specification lead to inconsistencies between the parameter values supported by the different component models, the SIM approach can lead to fitting problems, especially if the PD data are more substantial than the PK. As discussed earlier, it is often realistic to have more confidence in the PK model specification than in the PD model. Hence, at the other extreme to SIM is the IPP model, which represents an uncompromising belief in the PK model, in the sense that the individual-specific parameters obtained from the PK model cannot be modified in any way to accommodate a better fitting PD model—they are input as ‘distributional constants’. PPP and PPP&D lie somewhere in between these two extremes, with both preventing modification of the population PK parameters by the PD model, but allowing the subject-specific PK parameters to adapt to the PD model. This adaptation is stronger for PPP than for PPP&D since in the latter case it is tempered by the direct influence of the ‘cloned’ PK data.
In light of these observations we can see that the three ‘sequential’ methods can be ranked in terms of their expected ability, in general, to fit the PD data. (Recall that all three methods should fit the PK data equally well, since the PK model is identical in each case and no feedback from the PD model is permitted.) Fitting performance is (mostly) governed by the flexibility of the PK parameters used as input to the PD model, i.e. the extent to which they can be modified from the values that would be suggested by the PK data alone. The more flexible the input parameters, the better the PD fit achievable. As PPP, PPP&D and IPP represent increasing confidence in the PK parameters that would be obtained from the PK data alone, we would expect PPP, in general, to offer the best fitting performance and IPP the worst, with PPP&D lying somewhere in between (although they may, of course, all perform equally well). Hence if model fit is the main criterion for selecting between ‘sequential’ models, we might recommend PPP as the method of choice. However, perhaps it is preferable, in practice, to base such a decision on a careful consideration of one’s relative confidence in the PK model instead. One might argue that PPP is particularly attractive because of the relatively weak assumption regarding the PK inputs to the PD model. However, this same assumption potentially increases the disparity between the PK parameters that best fit the PK data and those that are used as input to the PD model. We might question how meaningful a model that leads to two different, perhaps contradictory, sets of PK parameters may be—such an apparent internal inconsistency may, for some, be grounds for avoiding such an approach altogether. 
PPP&D also suffers from this problem, but to a lesser extent, since it allows less adaptation of the PK parameters to fit the PD model than does PPP; again, this may be viewed by some as a conceptual flaw in the approach, but others might think that allowing a degree of inconsistency is preferable to having to adopt an uncompromising belief in the PK model (by constraining the individual PK parameters to be distributional constants), as in IPP. Of the three sequential methods, we would expect the PD model fit for PPP&D to be most similar to that for SIM, since both allow the PK data to directly influence estimation of the PD model. However, unlike SIM, PPP&D yields a PK model fit that is not detrimentally influenced by feedback from the PD model in cases where relative confidence in the PD versus PK model specification is low.
We must emphasise, again, that models containing cuts do not correspond to an underlying full probability model (Bayesian or otherwise), just as sequential analyses do not correspond to a joint model. Hence, the ‘joint distribution’ from which we sample during our MCMC scheme is not a formal posterior; indeed it is possible that a joint distribution with the simulated properties does not even exist. This does not invalidate the approach, however; we simply think of cuts as representing the specification of ‘distributional constants’: an intuitive means of acknowledging a fixed degree of uncertainty regarding otherwise fixed input parameters, which is a natural way of robustifying one’s inferences in many contexts. Even without the acknowledgement of uncertainty, cuts/sequential analyses can afford robustness by providing a mechanism whereby estimates relating to the measurement error model are not influenced by (possibly changing) assumptions about the response model. This is particularly important when there may be model misspecification. For example, we may have a well established (population) PK model, for which there is considerable biological rationale, and it is undesirable for our inferences regarding this model to change as we explore the PKPD relationship; the cut also ensures that the same inputs are used throughout the exploration process. ZBS have examined the performance of sequential methods under various model misspecification scenarios; while beyond the scope of the current paper, this is an area deserving of further investigation.
Acknowledgements
DL and DS are funded by the UK Medical Research Council (grant code U.1052.00.005). We are grateful to Martyn Plummer for several helpful discussions.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
References
1. Beal SL, Sheiner LB (1992) NONMEM user’s guide, parts I–VII. NONMEM Project Group, San Francisco
2. Wakefield JC, Aarons L, Racine-Poon A (1999) The Bayesian approach to population pharmacokinetic/pharmacodynamic modelling. In: Gatsonis C, Kass RE, Carlin B, Carriquiry A, Gelman A, Verdinelli I, West M (eds) Case studies in Bayesian statistics. Springer-Verlag, New York, pp 205–265
3. Lunn DJ (2005) Bayesian analysis of population pharmacokinetic/pharmacodynamic models. In: Husmeier D, Dybowski R, Roberts S (eds) Probabilistic modeling in bioinformatics and medical informatics. Springer-Verlag, London, pp 351–370
4. Aarons L, Mandema JW, Danhof M (1991) A population analysis of the pharmacokinetics and pharmacodynamics of midazolam in the rat. J Pharmacokinet Biopharm 19:485–496
5. Spiegelhalter D, Thomas A, Best N, Lunn D (2003) WinBUGS user manual, version 1.4. Medical Research Council Biostatistics Unit, Cambridge
6. Zhang L, Beal SL, Sheiner LB (2003) Simultaneous vs. sequential analysis for population PK/PD data I: best-case performance. J Pharmacokinet Pharmacodyn 30:387–404
7. Zhang L, Beal SL, Sheiner LB (2003) Simultaneous vs. sequential analysis for population PK/PD data II: robustness of methods. J Pharmacokinet Pharmacodyn 30:405–416
8. Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. John Wiley & Sons, New York
9. Little RJA (1992) Regression with missing X’s: a review. J Am Stat Assoc 87:1227–1237
10. Meng XL (1994) Multiple-imputation inferences with uncongenial sources of input. Stat Sci 9:538–558
11. Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409
12. Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
13. Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109
14. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1091
15. Gueorguieva I, Aarons L, Rowland M (2006) Diazepam pharmacokinetics from preclinical to phase I using a Bayesian population physiologically based pharmacokinetic model with informative prior distributions in WinBUGS. J Pharmacokinet Pharmacodyn 33:571–594
16. Mu S, Ludden TM (2003) Estimation of population pharmacokinetic parameters in the presence of non-compliance. J Pharmacokinet Pharmacodyn 30:53–81
17. Graham G, Gupta S, Aarons L (2002) Determination of an optimal dosage regimen using a Bayesian decision analysis of efficacy and adverse effect data. J Pharmacokinet Pharmacodyn 29:67–88
18. Lunn DJ, Aarons L (1998) The pharmacokinetics of saquinavir: a Markov chain Monte Carlo population analysis. J Pharmacokinet Biopharm 26:47–74
19. Lunn DJ, Aarons LJ (1997) Markov chain Monte Carlo techniques for studying interoccasion and intersubject variability: application to pharmacokinetic data. Appl Stat 46:73–91
20. Best NG, Tan KKC, Gilks WR, Spiegelhalter DJ (1995) Estimation of population pharmacokinetics using the Gibbs sampler. J Pharmacokinet Biopharm 23:407–435
21. Wakefield JC (1994) An expected loss approach to the design of dosage regimens via sampling-based methods. The Statistician 43:13–29
22. Spiegelhalter DJ (1998) Bayesian graphical modelling: a case-study in monitoring health outcomes. Appl Stat 47:115–133
23. Lauritzen SL, Dawid AP, Larsen BN, Leimer HG (1990) Independence properties of directed Markov fields. Networks 20:491–505
24. Dempster AP (1997) The direct use of likelihood for significance testing. Stat Comput 7:247–252
25. Lunn DJ, Best N, Thomas A, Wakefield J, Spiegelhalter D (2002) Bayesian analysis of population PK/PD models: general concepts and software. J Pharmacokinet Pharmacodyn 29:271–307