Journal of Pharmacokinetics and Pharmacodynamics, Volume 38, Issue 2, pp 237–260

Informative dropout modeling of longitudinal ordered categorical data and model validation: application to exposure–response modeling of physician’s global assessment score for ustekinumab in patients with psoriasis

Authors

  • Chuanpu Hu
    • Pharmacokinetics and Pharmacometrics, Biologics Clinical Pharmacology, Centocor Research and Development, Inc.
  • Philippe O. Szapary
    • Clinical Immunology, Centocor Research and Development, Inc.
  • Newman Yeilding
    • Clinical Immunology, Centocor Research and Development, Inc.
  • Honghui Zhou
    • Pharmacokinetics and Pharmacometrics, Biologics Clinical Pharmacology, Centocor Research and Development, Inc.
Article

DOI: 10.1007/s10928-011-9191-7

Cite this article as:
Hu, C., Szapary, P.O., Yeilding, N. et al. J Pharmacokinet Pharmacodyn (2011) 38: 237. doi:10.1007/s10928-011-9191-7

Abstract

The physician’s global assessment (PGA) score is a 6-point measure of psoriasis severity that is widely used in clinical trials to assess response to psoriasis treatment. The objective of this study was to perform exposure–response modeling using the PGA score as a pharmacodynamic endpoint following treatment with ustekinumab in patients with moderate-to-severe psoriasis who participated in two Phase 3 studies (PHOENIX 1 and PHOENIX 2). Patients were randomly assigned to receive ustekinumab 45 or 90 mg or placebo, followed by active treatment or placebo crossover to ustekinumab, dose intensification or randomized withdrawal and long-term extension periods. A novel joint longitudinal-dropout model was developed from serum ustekinumab concentrations, PGA scores, and patient dropout information. The exposure–response component employed a semi-mechanistic drug model, integrated with disease progression and placebo effect under the mixed-effect logistic regression framework. This allowed potential tolerance to be investigated with a mechanistic approach. The dropout component of the joint model allowed the examination of its potential influence on the exposure–response relationship. The flexible Weibull dropout hazard function was used. Visual predictive check of the joint longitudinal-dropout model required special handling, and a conditional approach was proposed. The conditional approach was extended to external model validation. Finally, appropriate interpretation of model validation is discussed. This longitudinal-dropout model can serve as a basis to support future alternative dosing regimens for ustekinumab in patients with moderate-to-severe plaque psoriasis.

Keywords

Monoclonal antibody · Informative censoring · Nonlinear mixed effect modeling · Visual predictive check · NONMEM

Introduction

Psoriasis is a chronic immune-mediated skin disorder that affects approximately 2–3% of the population worldwide [1–3]. Interleukin (IL)-12 and IL-23 have been implicated in the pathogenesis of psoriasis [4–6], and anti-IL-12/23 agents have shown promise for treating psoriasis and psoriatic arthritis [7]. Ustekinumab is a human monoclonal antibody that binds with high specificity and affinity to the shared p40 subunit of IL-12 and IL-23 and blocks interaction with the IL-12Rβ1 cell surface receptor. Population pharmacokinetics (PK) of ustekinumab were recently reported [8, 9]. In addition, a population mechanistic PK/pharmacodynamic (PD) model was developed using Psoriasis Area and Severity Index (PASI) scores as a nearly continuous measure of improvement in disease conditions [10].

The physician’s global assessment (PGA) score, an alternative to the PASI score, is a 6-point scale that measures disease severity (0 = cleared, 1 = minimal, 2 = mild, 3 = moderate, 4 = marked, and 5 = severe) and is commonly used as a primary or secondary endpoint in psoriasis clinical trials. It is a composite index describing the severity of the three main characteristics of psoriatic plaques: erythema, scaling, and thickness [11]. The proportions of patients achieving a PGA score ≤1 or a PGA score ≤2 are of particular clinical interest and are used as clinical trial endpoints. To our knowledge, no previous PK/PD models have been established using PGA scores in patients with psoriasis.

Logistic regression, which uses the transformation logit(x) = log[x/(1 − x)] to link the predictors to the probability of the endpoint of interest, is the standard statistical tool to model categorical endpoints. The typical model takes the following form [12, 13]:
$$ {\text{logit}}\left( {{\text{Prob}}\left[ {{\text{z}} = 1} \right]} \right) = \gamma + {\text{f}}_{\text{p}} \left( {\text{t}} \right) + {\text{f}}_{\text{d}} \left( {\text{t}} \right) + \eta $$
(1)
where Prob[z = 1] is the probability of the event of interest, γ determines the baseline probability, fp(t) represents the time effect, fd(t) represents the drug or concentration effect, and η is a normally distributed random variable with a mean of 0 that represents the between-subject variability. In this formulation, the sum γ + fp(t) + η may be interpreted as the placebo effect. Ideally, the form of fd(t) would reflect the underlying pharmacology. This approach was recently applied in rheumatoid arthritis [12–14]. Hutmacher et al. [12] were the first to apply a latent variable model that used an indirect response model to characterize an underlying disease variable, which requires therapeutic-area knowledge of the drug and disease. Lacroix et al. [13] used a Markov transition approach along with Eq. 1 and argued that it is superior to the latent variable approach previously proposed [12] in accommodating within-individual correlations. Most recently, Hu et al. [14] further developed the latent variable approach with a feature that is more advantageous in modeling change-from-baseline variables. Hu et al. [14] argued that, while a latent-indirect model did not model within-individual correlations, it had three advantages over the Markov approach: (1) it is more likely to be predictive, (2) it relates more easily to the main clinical trial endpoint of interest, and (3) it is easier to interpret.
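As a concrete illustration of Eq. 1, the minimal sketch below computes the response probability for one subject on the logit scale. The specific placebo and drug-effect forms (an exponential-onset placebo term and an Emax concentration term) and all parameter values are hypothetical choices for illustration only, not the forms or estimates used later in this analysis.

```python
import numpy as np

def inv_logit(x):
    """Inverse of logit(x) = log[x / (1 - x)]."""
    return 1.0 / (1.0 + np.exp(-x))

def prob_response(t, conc, gamma, eta, plb_max, r_p, emax, ec50):
    """Eq. 1 with illustrative choices: f_p(t) = Plb_max*(1 - exp(-R_p*t))
    and f_d = Emax*C/(EC50 + C); all parameter values are hypothetical."""
    f_p = plb_max * (1.0 - np.exp(-r_p * t))   # placebo/time effect
    f_d = emax * conc / (ec50 + conc)          # drug (concentration) effect
    return inv_logit(gamma + f_p + f_d + eta)

# Example: response probability over time for one subject (toy values)
t = np.arange(0.0, 84.0, 7.0)                  # days
conc = 2.0 * np.exp(-0.02 * t)                 # toy concentration profile
p = prob_response(t, conc, gamma=-2.0, eta=0.5,
                  plb_max=1.0, r_p=0.05, emax=3.0, ec50=1.0)
```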

Refinements to Eq. 1 are possible, although to our knowledge they have not been applied in the setting of categorical analysis. Holford and Nutt [15] argued for the importance of incorporating disease progression in PK/PD modeling, even though few patients remain untreated, which prevents direct modeling of disease progression. At times, tolerance to the drug may develop, which can be modeled empirically or mechanistically. A model consistent with the pharmacology of the drug would likely predict better when extrapolated to new dosing regimens and time periods. Within the widely used class of indirect response models, a precursor-dependent indirect model has been developed to model drug tolerance [16, 17].

Patient dropout occurs regularly during clinical trials and, if ignored, may bias model estimation and predictions. Informative dropout (ID) is an area of interest in the statistical literature, although the vast majority of that work focuses on linear longitudinal models. Hu and Sale [18] generalized the methodology to nonlinear models suitable for PK/PD modeling. Briefly, the dropout time T is modeled by its hazard function, defined as
$$ h(t) = \mathrm{prob}\left(T < t + \Delta t \mid T > t\right)/\Delta t $$
taking the limit Δt → 0. The survival probability that a patient remains in the study until time t is computed as
$$ S(t) = \exp \left( { - \int\limits_{0}^{t} {h(u)du} } \right) $$
from which the probability that T falls in a time interval (ti−1, ti) is calculated as S(ti−1) − S(ti). Hu and Sale [18] used the constant baseline hazard h(t) = λ. Conceivably, the hazard could increase or decrease with time, as patients might tend to drop out more or less, respectively, as treatment continues. The Weibull hazard function h(t) = aλta−1, where a and λ > 0, is more flexible [19]. It reduces to a constant hazard when a = 1 and increases or decreases with time depending on whether a > 1 or a < 1, respectively.
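The following minimal sketch, with hypothetical parameter values, illustrates these quantities for the Weibull hazard: the closed-form survival function S(t) = exp(−λt^a) and the interval dropout probability S(ti−1) − S(ti).

```python
import numpy as np

def weibull_hazard(t, a, lam):
    """h(t) = a * lambda * t**(a - 1); constant hazard when a = 1."""
    return a * lam * t ** (a - 1.0)

def survival(t, a, lam):
    """S(t) = exp(-integral_0^t h(u) du) = exp(-lambda * t**a) for this hazard."""
    return np.exp(-lam * t ** a)

def interval_dropout_prob(t_prev, t_next, a, lam):
    """Probability that the dropout time T falls in (t_prev, t_next)."""
    return survival(t_prev, a, lam) - survival(t_next, a, lam)

# Example with hypothetical parameters: a < 1 gives a hazard that decreases in time
p_drop = interval_dropout_prob(28.0, 56.0, a=0.7, lam=0.008)
```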
If the hazard function h(t) is independent of both observed and unobserved disease status, dependent on the observed disease status (i.e., PGA) but not on the unobserved disease status, or dependent on the unobserved disease status, the dropout process is classified as completely random dropout (CRD), random dropout (RD), or ID, respectively. Following the classification in Hu and Sale [18], assume that the true disease progress is given by Y = Y(θ, η, t), where θ are the fixed-effect parameters, η are the inter-individual random effects, and t is time. Let (t1, t2, …, tn) be the observation times by trial design. For a subject, let YO = (Y1, Y2, …, YT−1) be the observed disease progress and T be the dropout time, such that the subject has not yet dropped out at time ti−1. The classification is based on the random indicator variable Ti, which takes the value 0 or 1 and indicates whether the subject drops out at time ti. Also let YU = {Y(θ, η, t); ti−1 < t < ti} be the unobserved disease progress between the previous observation time ti−1 and the current time ti. Then CRD, RD, and ID are classified as follows:
  (a) completely random (CRD), if Ti is independent of η, and therefore of (YO, YU);

  (b) random (RD), if Ti (given YO) is independent of YU, but may depend on YO; in addition, any dependence of Ti on η is only through YO;

  (c) informative (ID), if Ti (given YO) depends on YU, or explicitly depends on η other than through YO.

In the case of ID, the longitudinal model parameters must be simultaneously estimated with those of the dropout model. Whether the data support CRD could be graphically explored by plotting average observed scores of patients who dropped out in the next visit versus those who did not drop out. A special case of ID is the restrictive informative dropout (RID) model, where h(t) depends on the unobserved disease status, but not the observed.

Two approaches exist in ID modeling. The above formulation is based on the selection-model approach, which is suitable when the longitudinal component is the main focus. Formally, this involves factorizing the joint likelihood of the observed longitudinal data (YO) and the dropout (T) as in Eq. 1 in Hu and Sale [18]
$$ P(Y_{O} ,T) = \int {P(Y_{O} ,T|\eta )P(\eta )d\eta = \int {P(T|Y_{O} ,\eta )P(Y_{O} |\eta )P(\eta )d\eta } = \int {P(T|Y_{O} ,Y_{U} ,\eta )P(Y_{O} |\eta )P(\eta )d\eta } } $$
where P(η) represents inter-individual variability, P(YO|η) is the usual PD model expressing YO as a function of the fixed and random effects, and P(T|YO, YU, η) is the dropout model specifying the potential dependence of dropout time T on both observed and unobserved data, as well as the inter-individual random effects. Another possible approach, often used in statistical literature, is the pattern-mixture model approach using a different factorization of the joint likelihood as
$$ P(Y_{O} ,T) = \int {P(Y_{O} |T)P(T)dT} $$

This approach specifies the dropout model first and then the longitudinal model conditional on the dropout outcome.

Sheiner [20] described nonrandom censoring in a joint analysis with ordered categorical data. To our knowledge, ID modeling has not yet been applied to exposure–response (i.e., PK/PD) modeling of categorical data. In addition, when dropout is present, with the exception of simple dosing regimen situations, ordinary implementation of a visual predictive check (VPC) is inappropriate because it does not account for dropout. To emphasize the contrast with dropout, the term “longitudinal” is used herein to indicate the population PK/PD (i.e., exposure–response) component of the model.

Model validation is an important and often integral component of PK and PD modeling. However, confusion on this topic remains; Hu et al. [14] pointed out the misleading claim of describing a model as “validated” after conducting bootstrap analysis. In principle, model validation refers to the predictive ability of the model [21] and should therefore be a continuous measure instead of a simple yes/no.

The objective of the current analysis was to develop a semi-mechanistic population PK/PD model that allows investigation of potential tolerance and dropout influence. VPC implementations for longitudinal-dropout modeling are also discussed, and a method suitable under flexible dosing regimens is proposed. Some misconceptions of model validation are discussed, and aspects of appropriate uses are illustrated. Data from two large Phase 3 trials of ustekinumab in patients with moderate-to-severe psoriasis (PHOENIX 1 [22] and PHOENIX 2 [23]) were used. These are the same studies used in an earlier analysis by Zhou et al. [10]; however, the current analysis includes additional long-term data (randomized withdrawal and long-term extension periods).

Materials and methods

Study design

The patient populations and study designs for PHOENIX 1 [22] and PHOENIX 2 [23] have been previously described. Both were multicenter, randomized, double-blind, placebo-controlled, parallel design studies of patients with moderate-to-severe plaque psoriasis. The entry criteria identified patients with moderate-to-severe psoriasis based on PASI score and involved body surface area (BSA). The PASI measures disease severity based on plaque characteristics and the proportion of BSA involved with psoriasis. The PGA measures disease severity based on plaque characteristics only, without regard to affected BSA.

The study designs were complex, and consisted of placebo-controlled, placebo crossover, dose optimization, randomized withdrawal, and long-term extension periods. Briefly, in PHOENIX 1, a total of 766 patients were assigned to receive subcutaneous (SC) injections of ustekinumab 45 mg, ustekinumab 90 mg, or placebo at weeks 0 and 4, followed by active treatment every 12 weeks or placebo crossover to ustekinumab 45 or 90 mg starting at week 12 (weeks 12–40), followed by randomized withdrawal (weeks 40–76), and a long-term extension (week 76 onward) [22]. In PHOENIX 2, a total of 1,230 patients were assigned to the same treatment groups and design until week 28, followed by dose schedule optimization (weeks 28–52), and a long-term extension (week 52 onward) [23]. The current analysis used all data obtained up to week 100 in both trials. For PHOENIX 1, most patients had data up to week 136, with 157 patients having data at or beyond week 152.

Serum ustekinumab measurement

Blood samples for the measurement of serum ustekinumab concentrations were collected at weeks 0, 4, 12, 16, 24, 28, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, and 88 (and every 24 weeks thereafter, if available) in PHOENIX 1 and at weeks 0, 4, 12, 16, 20, 24, 28, 40, 52, and 88 (and every 24 weeks thereafter, if available) in PHOENIX 2. At visits when patients received the study agent, blood samples were collected prior to study agent administration. A validated electrochemiluminescent immunoassay method, with a lower limit of quantification (LLOQ) of 0.17 μg/mL at a minimum required 1:10 dilution, was used to measure serum ustekinumab concentrations [8]. The PK data used in this analysis included longer term data than previously reported [8, 9].

Physician’s global assessment score measurement

PGA scores were collected at weeks 0, 2, 4, and every 4 weeks thereafter up to week 88 or study unblinding, then at week 100 and every 12 weeks thereafter in PHOENIX 1 and at weeks 0, 2, 4, and every 4 weeks thereafter up to week 52 or study unblinding, and every 12 weeks after week 52 in PHOENIX 2. The numbers of patients with baseline PGA scores of 2, 3, 4, and 5 were 42, 388, 298, and 37, respectively, for PHOENIX 1 and 105, 642, 419, and 64, respectively, for PHOENIX 2.

Dataset for pharmacokinetic/pharmacodynamic-dropout modeling and validation

Only patients with available PK data were included in the dataset. Data from PHOENIX 2 were used for model building, and data from PHOENIX 1 were reserved for model validation. All PK and PGA data through week 100 were included in the current analysis. For PHOENIX 2, there were 9,723 PK observations and 21,711 PGA scores from a total of 1,230 patients. For PHOENIX 1, there were 9,617 PK observations and 19,957 PGA scores from a total of 765 patients. In the PHOENIX 2 study, 211 (17.2%) patients discontinued the study at various times before week 100. The exact discontinuation times were available for 68 (5.5%) patients. In PHOENIX 1, 162 (21.2%) patients discontinued the study before week 152, and the discontinuation date was known for 48 (6.3%) patients.

Software

NONMEM Version 7 [24] was used for model development and VPC simulations. The F_FLAG method was used with the Laplacian-Interaction option to accommodate the LLOQ values in PK modeling [25]. For the PK/PD-dropout model, the Laplacian option was used.

Population pharmacokinetic/pharmacodynamic model development

Pharmacokinetic model

A previously established confirmatory population PK model [9], a one-compartment model with first-order absorption, was used to obtain empirical Bayesian estimates of the individual PK parameters. The η-shrinkage for the apparent clearance and volume and the ε-shrinkage were 8.3, 28.6, and 12.7%, respectively, for PHOENIX 2 and 9.0, 31.0, and 11.5%, respectively, for the combined dataset of PHOENIX 1 and 2. The confirmatory model was used because it was considered more robust than the exploratory model developed in Zhu et al. [8].

Pharmacokinetic/pharmacodynamic model

To accommodate the nature of the PGA score being an ordered categorical endpoint and to account for the potential influence of disease progression, a more general model than Eq. 1 was proposed below:
$$ {\text{Logit}}\left[ {{\text{prob}}\left( {{\text{PGA}} \le {\text{k}}} \right)} \right] = \alpha_{\text{k}} + {\text{f}}_{\text{z}} \left( {\text{t}} \right) + {\text{f}}_{\text{p}} \left( {\text{t}} \right) + {\text{f}}_{\text{d}} \left( {\text{t}} \right) + \eta $$
(2)
where k = 0, 1, 2, 3 are PGA scores, αk are model parameters that are monotonically increasing in k and represent the baseline probability of PGA distribution, and fz(t) represents disease progression. Specifically, disease progression was modeled as
$$ {\text{f}}_{\text{z}} \left( {\text{t}} \right) = \beta {\text{t}} $$
(3)
where β represents the rate constant of decline.
The placebo effect was modeled empirically as
$$ {\text{f}}_{\text{p}} \left( {\text{t}} \right) = {\text{Plb}}_{ \max } \left[ { 1-{ \exp }\left( { - {\text{R}}_{\text{p}} {\text{t}}} \right)} \right] $$
(4)
where Plbmax is the maximum that may be reached by fp(t), and Rp is the rate constant of onset. The model does not allow for a decline in the placebo response.
Note that, in the absence of untreated patients, disease progression and placebo effects are not separately identifiable, i.e., it can only be known that fz(t) + fp(t) represents the sum of disease progression and placebo effect. The interpretation of fz(t) as disease progression and fp(t) as the placebo effect is based on a reasonable assumption that the placebo effect plateaus in the short term and that the disease progression effect continues in the long term. The drug effect was modeled using a latent variable approach. Because ustekinumab blocks IL-12/23, which in turn affects the disease progression, this suggests that a Type I indirect model could be suitable [10]. Similar to an earlier approach [14], it was assumed that the drug effect is driven by a latent variable R(t), governed by
$$ \frac{dR(t)}{dt} = k_{in}\left(1 - \frac{C_{p}}{IC_{50} + C_{p}}\right) - k_{out}R(t) $$
(5)
where Cp is the ustekinumab concentration, and kin, IC50, and kout are parameters in a Type I indirect model. This implicitly assumes that the maximum effect, Emax, is 100%. As a latent variable, the scale of R(t) needs to be set, and it is further assumed that R = 1 at baseline, i.e., R(0) = 1, leading to kin = kout. The reduction of R(t) was assumed to drive the drug effect through
$$ {\text{f}}_{\text{d}} \left( {\text{t}} \right) = {\text{DE}}\left( { 1-{\text{R}}\left( {\text{t}} \right)} \right) $$
(6)
where DE determines the magnitude of drug influence. Equations 2–6 establish the link between ustekinumab concentrations and PGA scores in patients with psoriasis.
A diagram of the integrated exposure–response model linking the serum ustekinumab concentrations (Cp) to the reductions in PGA scores is presented in Fig. 1, where kin was fixed to kout R(0) = kout to preserve mass balance.
Fig. 1

Diagram of the semi-mechanistic pharmacokinetic/pharmacodynamic model of ustekinumab in treating psoriasis. The dashed line indicates an effect to be assessed below. ka, absorption rate; ke, elimination rate; kO, precursor formation rate; kin, disease formation rate; kout, disease amelioration rate; PGA physician’s global assessment
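As a sketch of how Eqs. 2–6 fit together, the code below integrates the latent variable R(t) of Eq. 5 (with kin = kout and R(0) = 1) for a supplied individual concentration profile and returns the cumulative probabilities P(PGA ≤ k) of Eq. 2. The function signature and all parameter values are illustrative assumptions, not the estimates reported in this analysis.

```python
import numpy as np
from scipy.integrate import solve_ivp

def latent_pd_probabilities(times, conc_fn, alphas, beta, plb_max, r_p,
                            de, k_out, ic50, eta):
    """Sketch of Eqs. 2-6: P(PGA <= k) over time for one subject.
    conc_fn(t) returns the individual ustekinumab concentration at time t;
    alphas = (alpha_0, ..., alpha_3) are the monotone intercepts of Eq. 2."""
    times = np.asarray(times, dtype=float)

    def dRdt(t, R):
        cp = conc_fn(t)
        # Eq. 5 with k_in = k_out, so that R(0) = 1 at baseline
        return [k_out * (1.0 - cp / (ic50 + cp)) - k_out * R[0]]

    sol = solve_ivp(dRdt, (times[0], times[-1]), [1.0], t_eval=times)
    R = sol.y[0]
    f_z = beta * times                            # disease progression (Eq. 3)
    f_p = plb_max * (1.0 - np.exp(-r_p * times))  # placebo effect (Eq. 4)
    f_d = de * (1.0 - R)                          # drug effect (Eq. 6)
    logits = np.array([a_k + f_z + f_p + f_d + eta for a_k in alphas])
    return 1.0 / (1.0 + np.exp(-logits))          # row k: P(PGA <= k), Eq. 2
```

The individual probabilities prob(PGA = k) follow by differencing adjacent rows of the returned cumulative probabilities.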

Tolerance assessment

A precursor-dependent indirect response model [16, 17] was used to assess the possibility of drug tolerance. This required including the additional precursor compartment in Fig. 1, with kO as the input and kin as the output; kO was an additional parameter to be estimated.

Dropout model development

For the patients with an unknown dropout time, it is reasonable to assume that dropout occurred between the time when the last observation was available and the next scheduled visit time, which is known as interval censoring in survival analysis [18]. Hu and Sale [18] did not specify how to model exact dropout data; according to standard statistical theory, the likelihood of those data is given by h(t)·S(t), i.e., the density f(t) of the probability distribution determined by the hazard function h(t).

The CRD and RD models from Hu and Sale [18] were adapted straightforwardly as follows:
$$ {\text{CRD: }}{\text{h}}\left( {\text{t}} \right) = {\text{a}}\lambda {\text{t}}^{{{\text{a}} - 1}} $$
(7)
$$ {\text{RD: }}{\text{ h}}\left( {\text{t}} \right) = {\text{a}}\lambda {\text{t}}^{{{\text{a}} - 1}} *{ \exp }\left( { - \beta_{\text{O}} {\text{Y}}_{\text{O}} } \right) $$
(8)
where YO is the previously observed PGA score, and βO is its effect on dropout, to be estimated. The ID model, however, could not be reasonably adapted, because PGA scores are not available to patients continuously between study visits. In addition, for a categorical endpoint and its given distribution, the best predictor may not be directly clear, nor is it likely informative. Alternatively, it is reasonable to assume that the overall disease status change, i.e., fz(t) + fp(t) + fd(t) + η in Eq. 2, contributes to how a patient perceives treatment success and thus to the likelihood of dropout. This leads to the model
$$ {\text{ID: }}{\text{h}}\left( {\text{t}} \right) = {\text{a}}\lambda {\text{t}}^{{{\text{a}} - 1}} *{ \exp }\left( { - \beta_{\text{O}} {\text{Y}}_{\text{O}} - \beta_{ 1} \left( {{\text{f}}_{\text{z}} \left( {\text{t}} \right) + {\text{f}}_{\text{p}} \left( {\text{t}} \right) + {\text{f}}_{\text{d}} \left( {\text{t}} \right) + \eta } \right)} \right), $$
(9)
and
$$ {\text{RID: }}{\text{h}}\left( {\text{t}} \right) = {\text{a}}\lambda {\text{t}}^{{{\text{a}} - 1}} *{ \exp }\left( { - \beta_{ 1} \left( {{\text{f}}_{\text{z}} \left( {\text{t}} \right) + {\text{f}}_{\text{p}} \left( {\text{t}} \right) + {\text{f}}_{\text{d}} \left( {\text{t}} \right) + \eta } \right)} \right), $$
(10)
where β1 is the informative dropout effect to be estimated.

A longitudinal-dropout model thus consists of Eqs. 2–6 plus one of Eqs. 7–10.
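For clarity, the sketch below translates the four candidate hazard functions (Eqs. 7–10) into a single helper. The argument names are illustrative: y_obs stands for the previously observed PGA score YO, and disease_change stands for fz(t) + fp(t) + fd(t) + η from Eq. 2.

```python
import numpy as np

def weibull_base(t, a, lam):
    """Baseline Weibull hazard a * lambda * t**(a - 1)."""
    return a * lam * t ** (a - 1.0)

def dropout_hazard(t, model, a, lam, beta_o=0.0, beta_1=0.0,
                   y_obs=0.0, disease_change=0.0):
    """Hazard functions of Eqs. 7-10 for one subject at time t."""
    base = weibull_base(t, a, lam)
    if model == "CRD":   # Eq. 7: completely random dropout
        return base
    if model == "RD":    # Eq. 8: depends on the observed score only
        return base * np.exp(-beta_o * y_obs)
    if model == "ID":    # Eq. 9: observed score plus unobserved disease change
        return base * np.exp(-beta_o * y_obs - beta_1 * disease_change)
    if model == "RID":   # Eq. 10: unobserved disease change only
        return base * np.exp(-beta_1 * disease_change)
    raise ValueError("unknown model: " + model)
```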

The graphical assessments in Hu and Sale [18] were adopted to assess whether CRD is reasonable, along with the modified Cox-Snell residual plots to assess the dropout fit.

A sequential PK/PD approach was used. Limited simultaneous fitting [26] was also explored.

Visual predictive check of joint longitudinal-dropout model

Typically, a VPC simulates replicates of the original datasets from the model, forms a predictive distribution of the data time trend, and compares this trend with the observed data. Ordinary implementations of VPC do not account for dropout. In theory [18], this is acceptable under CRD, but would be biased under RD and ID models. To formulate the theoretical framework of VPC when dropout is present, assume that Di is the dosing history for subject i = 1, …, n, ηi is the subject-specific model parameter, Yi is the observed longitudinal data (PGA) vector, and Ti is the observed dropout time. Note that Di is only available prior to the dropout time Ti. The intended future dosing schedule is denoted as DiF. For ease of formulation, the general continuous variable case, i.e., PGA treated as a continuous variable even though that is not the modeling approach in this manuscript, is presented first. The typical objective of a VPC is to compare an observed data quantile, e.g., the 50% quantile (median),
$$ {\text{Y}}_{\text{med}} \left( {\text{t}} \right) = {\text{median}}\left[ {{\text{Y}}_{\text{i}} \left( {\text{t}} \right)} \right] $$
with the corresponding model predictive distribution obtained via simulations. The joint dropout model requires that, for each replicate dataset, the data for subject i to be simulated from the joint model-derived distribution
$$ {\text{P}}\left( {{\text{Y}}_{\text{i}} \left( {\text{t}} \right),{\text{T}}_{\text{i}} |{\text{D}}_{\text{i}} ,{\text{D}}^{\text{F}}_{\text{i}} ,\eta_{\text{i}} } \right), $$
where Yi(t) is available only if t < Ti. This shows that, in principle, the probability distribution of Yi depends on the dropout time Ti. Under the selection-model formulation, obtaining the probability distribution of Ti requires knowing the future dosing DiF reliably.

Sometimes a nominal dosing regimen may be assumed, e.g., when predicting the outcomes of fixed dosing regimens [18]. At other times, the actual dosing regimens are likely to vary significantly from the intended regimen, which in part depends on the clinical trial conduct. Another instance is when dose titration is present, in which case predicting future dosing regimens relies on the current model being accurate, which introduces additional uncertainty. Both of these occurred in PHOENIX 1 and 2. It may be tempting to still conduct the VPC by assuming a known future dosing regimen, such as the protocol-specified dosing regimens, last dose carried forward, or the average dose in the study arm. However, simple imputations of these types are unlikely to be accurate. For example, during the titration phase, a patient who dropped out due to (unobserved) lack of efficacy would have had a higher future dose; thus, for this patient, the last dose carried forward would be too low. Likewise, imputing the future dose with the average dose in the study arm would produce dosing regimens with falsely reduced variability. It is clear from the joint model that errors in dosing regimens affect the longitudinal data as well as the dropout predictions. Thus, VPCs that falsely treat unknown future dosing as known will introduce biases in the predictive distributions, and the severity of the bias will depend on how much the assumed future dosing regimen deviates from the unknown truth.

However, the dependence on future unknown dosing can be circumvented by avoiding the full joint distribution P(Yi(t), Ti|Di, DiF, ηi), and instead working with the conditional distribution
$$ {\text{P}}\left( {{\text{Y}}_{\text{i}} \left( {\text{t}} \right)|{\text{T}}_{\text{i}} ,{\text{D}}_{\text{i}} ,{\text{D}}^{\text{F}}_{\text{i}} ,\eta_{\text{i}} } \right) = {\text{P}}\left( {{\text{Y}}_{\text{i}} \left( {\text{t}} \right)|{\text{T}}_{\text{i}} ,{\text{D}}_{\text{i}} ,\eta_{\text{i}} } \right) $$
That is, while a lack of future dosing information prevents the simulation of dropout, a conditional VPC of the joint model can still be conducted on its longitudinal subcomponent. It may be motivated by the pattern-mixture formulation of the joint model of informative dropout, and in essence checks the predictions of P(YO|T) by interpreting the observed longitudinal data as conditional on the observed dropouts. To illustrate its implementation, for given dropout times (Ti), the observed data trend is denoted as
$$ {\text{Y}}_{\text{med}} \left( {\text{t}} \right) = {\text{median}}\left[ {{\text{Y}}_{\text{i}} \left( {\text{t}} \right)|{\text{D}}_{\text{i}} ,\eta_{\text{i}} ,{\text{T}} = {\text{T}}_{\text{i}} } \right] $$

Conducting a VPC on this requires simulating the distribution of model predictions from P(Y|Di, ηi, T = Ti), which is not explicitly available. However, it can be obtained via conditional simulation, by simulating a large number of (Y*i, T*i) from the joint model and then selecting those Y*i with T*i = Ti. For example, in the case of interval dropout where Ti = (ti−1, ti), one may obtain a 90% predictive interval from the collection of {Y*i|Di, ηi, ti−1 < T*i < ti}. A difficulty, however, is that the number of simulations can become prohibitively large with narrow intervals, and the approach is theoretically impossible when Ti is known exactly. Therefore, the larger set {Y*i|Di, ηi, ti−1 < T*i} is proposed as an approximation. That is, the conditioning is on the event that the dropout occurred after time ti−1 instead of within the interval (ti−1, ti). For RD and ID models, this approximation provides a lower bound of the dropout effect on the VPC result. In particular, under the assumption that patients with worse disease status would be more likely to drop out, the larger set {Y*i|Di, ηi, ti−1 < T*i} is likely to reflect better disease status than the intended set {Y*i|Di, ηi, ti−1 < T*i < ti}. The influence should, however, be small when the percentage of subjects who drop out is relatively small, as the set for completers (ti = ∞), {Y*i|Di, ηi, ti−1 < T*i}, is simulated exactly.
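A minimal sketch of the selection step of this conditional simulation is given below. Here, simulate_joint is a hypothetical stand-in assumed to draw one (longitudinal trajectory, dropout time) pair for a given subject from the fitted joint model under the subject's observed dosing history; only the conditioning logic is shown.

```python
import numpy as np

def conditional_vpc_replicates(simulate_joint, n_sim, t_last_obs):
    """Selection step of the conditional VPC (sketch). simulate_joint() is
    assumed to return (y_traj, t_drop) for one replicate of a given subject,
    with y_traj evaluated on a common time grid. A replicate is kept only if
    its simulated dropout time falls after the subject's last observation
    time, i.e., the proposed approximation T* > t_{i-1} rather than
    t_{i-1} < T* < t_i."""
    kept = []
    for _ in range(n_sim):
        y_traj, t_drop = simulate_joint()
        if t_drop > t_last_obs:
            kept.append(y_traj)
    return np.asarray(kept)

# Percentiles across the kept replicates give, e.g., a 90% prediction interval:
# kept = conditional_vpc_replicates(my_simulator, n_sim=1000, t_last_obs=280.0)
# lo, hi = np.percentile(kept, [5, 95], axis=0)
```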

In the case of Yi(t) being ordered categorical data, simply replace Ymed(t) = median[Yi(t)] in the above with the vector of proportions of {Yi(t) ≤ k} at time t, where k = 0, 1, 2, … are the score levels.

The ideal VPC compares the distributions of both observed and predicted data. However, in the case of categorical data, the cumulative probabilities are calculated from the observed scores; i.e., the distribution of the cumulative probabilities is not directly observable and is therefore ignored.

Note that the conditioning step of the conditional VPC approach in effect limits the between-subject variability of the subjects simulated. Therefore, prediction intervals generated by the conditional VPC approach may be expected to have reduced variability in comparison to VPCs conducted with the dropout ignored. This makes it easier to detect potential model misfits, and is the correct behavior under the joint longitudinal-dropout model.

Model selection

Candidate models were compared based on the NONMEM minimum objective function (MOF) value and VPCs. A decrease in MOF (ΔMOF) of at least 10.83, corresponding to a nominal p-value of 0.001, was judged as significant evidence to include an additional parameter.
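The 10.83 cutoff is the 0.001 upper-tail quantile of a chi-square distribution with one degree of freedom (one additional parameter), as the short check below illustrates.

```python
from scipy.stats import chi2

# Delta-MOF is compared against the chi-square quantile for one extra
# parameter at a nominal significance level of 0.001.
threshold = chi2.ppf(1 - 0.001, df=1)
print(round(threshold, 2))  # 10.83
```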

External validation of the joint longitudinal-dropout model

The model developed from PHOENIX 2 data was validated using PHOENIX 1 data. Again, model validation should evaluate the predictive ability of the model in its intended use. The joint model consists of two components: PK/PD and dropout.

Pharmacokinetic/pharmacodynamic model validation

The endpoints of clinical interest are the probabilities of achieving a PGA score ≤1 and achieving a PGA score ≤2, indicative of disease severity being minimal or mild, respectively. Depending on the potential use of the model, several components would be of interest to validate; these may include the placebo response, treatment effect onset, and the maintenance and long-term extension phases. For all treatment periods, the discrepancies between model-predicted and observed frequencies were calculated. The portion of the long-term extension period past week 100 is of separate interest because the model-building dataset did not contain data from this period. Therefore, the validation result on this period reflects the model's extrapolation performance. Because dropout is an integral component of the entire model, it needs to be accounted for in calculating the model predictions. As noted previously, this requires using the conditional VPC approach, which validates the conditional longitudinal model P(YO|T).

Dropout model validation

If the intended use of the dropout model is to predict the dropout rate at specific times (e.g., when treatment comparisons on the primary endpoint of the clinical trial will be made), the observed and model-predicted dropout rates may be compared at these time points. Without knowing such specifics, a general measure may be considered. In addition, it may be desirable to validate specific sub-components of the joint longitudinal-dropout model that relate to dropout. One option (1) is to validate the marginal dropout model,
$$ P(T) = \int {P\left( {Y_{O} ,T|\eta } \right)P\left( {Y_{O} } \right)d} Y_{O} $$
which is not analytically available. Another option (2) is to validate the conditional dropout model,
$$ P\left( {T|\eta = E_{M} \left( {\eta |Y = Y_{O} } \right)} \right) $$
where EM denotes the expectation under the exposure–response model without the dropout component. In this formulation, η should be interpreted as individual model parameters instead of random effects and EM as empirical Bayesian estimates. A variation of option (2) is to also include dropout in the empirical Bayesian estimation as follows:
$$ E_{M}\left(\eta \mid Y = Y_{O}, T = T_{O}\right) $$

This would be more accurate, but requires using the observed dropout twice, which may be less appealing. However, it illustrates that validation results from option (2) could be conservative. To relate these options to the model equations, assuming the RID model (Eq. 10) is adopted, the entire model consists of Eqs. 2–6 and 10. Option (1) evaluates the ability of the entire joint model, represented by Eqs. 2–6 and 10, to predict the observed dropout. Option (2) requires using Eqs. 2–6 and the new PK/PD data to obtain individual empirical Bayesian PK/PD parameters, fixing them in Eq. 10, and then validating only Eq. 10. Option (1) is more interesting but, as noted above, would require knowing the future dosing and is therefore difficult. Because there was no specific time for which the dropout rate was deemed important, option (2) was used with the modified Cox-Snell residual plot to provide a general sense of agreement between the model prediction and the new data.
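Under option (2), the dropout component (Eq. 10) is evaluated with the individual PK/PD parameters fixed at their empirical Bayesian estimates; a Cox-Snell-type residual is then the cumulative hazard evaluated at each subject's dropout (or censoring) time, which should behave approximately like unit-exponential data under an adequate model. The sketch below computes this cumulative hazard numerically for the RID hazard. The "modified" residuals of Hu and Sale [18], which accommodate interval-censored dropout times, are not reproduced here, and disease_change_fn is an assumed per-subject function returning fz(u) + fp(u) + fd(u) + η.

```python
import numpy as np

def cumulative_hazard_rid(t, a, lam, beta_1, disease_change_fn, n_grid=200):
    """Cumulative hazard H(t) = integral_0^t h(u) du for the RID model
    (Eq. 10), evaluated numerically with the individual's empirical Bayesian
    PK/PD parameters fixed (validation option 2). disease_change_fn(u) is an
    assumed per-subject function accepting a vector of times u."""
    u = np.linspace(1e-6, t, n_grid)
    h = a * lam * u ** (a - 1.0) * np.exp(-beta_1 * disease_change_fn(u))
    # trapezoidal rule over the time grid
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(u)))

# A Cox-Snell-type residual is H(T_i) at the subject's dropout (or censoring)
# time; under an adequate model such residuals behave approximately like a
# unit-exponential sample.
```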

Final joint model

After model validation, the model developed from PHOENIX 2 was re-estimated using data from both PHOENIX 1 and 2. Technically speaking, the model validation was applicable only to the initial model. As the final model was based on more data, it should be considered more accurate. The earlier model validation results should be viewed as a conservative estimate of the predictive ability of the final model.

Results

Baseline demographic and disease characteristics for patients included in the current analysis were similar for PHOENIX 1 and PHOENIX 2, as shown in Zhou et al. [10]. Few observed PGA scores had a value of 5 (n = 149; 0.7%); therefore, these data were merged with those for a PGA score of 4 for parsimony.

Pharmacokinetic model

Approximately 20% of the data were below the LLOQ. PK model parameter estimates were relatively similar to previous results [8, 9], and thus not reported here.

Pharmacokinetic/pharmacodynamic model

To implement the constraints of αk being monotonically increasing in k for k = 0, 1, 2, and 3 in Eq. 2, they were re-parameterized as α2, d0, d1, and d3, respectively, where d0, d1, and d3 > 0, such that α1 = α2 − d1, α0 = α1 − d0, and α3 = α2 + d3. At an early exploratory model development stage that did not include disease progression, the tolerance component was significant (ΔMOF = 43). However, with the inclusion of the disease progression component, the tolerance component became non-significant (ΔMOF = 0), and the disease progression model fit much better than the tolerance model (ΔMOF > 200). Thus, the model without the tolerance component was used. Model parameter estimates and standard errors are given in Table 1. The fit was stable, and standard errors were relatively small.
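A minimal sketch of this monotone re-parameterization follows; with the final joint model estimates in Table 1, for example, it yields (α0, α1, α2, α3) = (−7.68, −5.40, −2.94, 0.18).

```python
import numpy as np

def alphas_from_reparam(alpha2, d0, d1, d3):
    """Recover the monotone intercepts of Eq. 2 from the re-parameterization:
    alpha1 = alpha2 - d1, alpha0 = alpha1 - d0, alpha3 = alpha2 + d3,
    with d0, d1, d3 constrained to be positive."""
    alpha1 = alpha2 - d1
    alpha0 = alpha1 - d0
    alpha3 = alpha2 + d3
    return np.array([alpha0, alpha1, alpha2, alpha3])  # increasing in k

# Example with the final joint model estimates from Table 1
print(alphas_from_reparam(-2.94, 2.28, 2.46, 3.12))  # [-7.68 -5.4 -2.94 0.18]
```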
Table 1

Model parameter estimates

Parameter       Initial PK/PD model    Initial joint model    Final joint model
                Estimate (SE)          Estimate (SE)          Estimate (SE)
α2              −2.95 (0.0925)         −3.07 (0.0981)         −2.94 (0.075)
d0              2.39 (0.0485)          2.39 (0.049)           2.28 (0.0357)
d1              2.58 (0.0508)          2.58 (0.0513)          2.46 (0.0377)
d3              3.36 (0.0849)          3.36 (0.0849)          3.12 (0.0599)
Plbmax          1.37 (0.173)           1.35 (0.152)           1.92 (0.137)
Rp (day−1)      0.0388 (0.0069)        0.0464 (0.008)         0.023 (0.00227)
kout (day−1)    0.0243 (0.00113)       0.0242 (0.00125)       0.025 (0.00111)
DE              6.73 (0.226)           6.84 (0.269)           5.84 (0.166)
β (day−1)       0.00183 (0.000174)     0.00187 (0.00019)      0.00143 (0.0001)
λ (day−1)       –                      0.00784 (0.000858)     0.00926 (0.0008)
a               –                      0.691 (0.0196)         0.638 (0.0141)
β1              –                      0.34 (0.0392)          0.322 (0.0175)
Var(η)          4.01 (0.202)           3.99 (0.201)           3.92 (0.144)

α2, d0, d1, d3, intercept parameters; Plbmax, maximum placebo effect; Rp, rate of placebo effect onset; kout, disease amelioration rate; DE, drug effect; β, rate of disease progression; λ, baseline dropout rate; a, dropout shape parameter; β1, informative dropout effect; Var(η), variance of η; SE, standard error

VPC results of this model are shown in Fig. 2. For data analysts, it could be informative to plot the results in terms of the individual probabilities prob(PGA = k). However, to be consistent with the model formulation (Eq. 2) and, more importantly, to show the results in terms of practical interest, the observed cumulative frequencies and the corresponding 90% prediction intervals of prob(PGA ≤ k) for k = 0, 1, and 2 are plotted for each trial arm. Because virtually all patients achieved PGA scores ≤3 by week 28, prob(PGA ≤ 3) = 1 for the observed data as well as the model prediction; hence, the plot of prob(PGA ≤ 3) is not shown. For the clinically important endpoints prob(PGA ≤ 1) and prob(PGA ≤ 2), the model predictions were reasonably close to the observed data. Some discrepancies appeared between observed data and model predictions for prob(PGA = 0). Overall, the model predicted the PGA frequencies reasonably well.
Fig. 2

Visual predictive check of the initial pharmacokinetic/pharmacodynamic model built with PHOENIX 2 data. Observed, predicted, and 90% Prediction intervals of probabilities of achieving physician’s global assessment (PGA) scores of 0, 1, and 2 versus visit time are plotted by treatment (TRT) groups (1 and 2 = placebo until week 12, followed by ustekinumab 45 or 90 mg; 3 = ustekinumab 45 mg; 4 = ustekinumab 90 mg). PGA physician’s global assessment; PI prediction interval

Several alternative models were also attempted, and none resulted in a meaningful improvement of the fit. The lack of improvement with the mechanistic tolerance model [16, 17] is of notable interest because, for the observed data, treatments 1 and 2 (placebo until week 12 followed by ustekinumab 45 or 90 mg, respectively) appeared to reach higher efficacy rates than treatments 3 and 4 (ustekinumab 45 and 90 mg) after week 28. The numbers of patients who dropped out in the first 12 weeks for treatments 1–4 were 4, 3, 4, and 6, respectively, which is not enough to notably contribute to the treatment efficacy differences. Because inclusion of the tolerance terms did not result in any improvement of the fit, the apparent difference between placebo and active treatment arms may be due to noise. However, in principle, the lack of improvement of the specific tolerance model could be due to an inadequate representation of tolerance and does not necessarily prove the absence of tolerance.

Dropout model

Figure 3 plots the average observed PGA scores of patients who dropped out at the next visit versus those of patients who did not, as mentioned earlier. It suggests that dropout is highly correlated with observed PGA scores and thus that CRD is unlikely. To verify the necessity of the Weibull hazard model, the simplified constant hazard model with a = 1 was also explored. Model development started with the CRD model with a constant hazard function, and the RID Weibull model (Eq. 10) gave the best fit, improving on the other models with NONMEM objective function differences ranging from 47 to 450. The shape parameter a < 1 indicated that the dropout hazard decreased over time, which is sensible. No other covariate effects on dropout were explored.
Fig. 3

Average physician’s global assessment (PGA) scores of patients who dropped out at the next visit versus those of patients who did not drop out in PHOENIX 2

The modified Cox-Snell residuals [18] were used to assess dropout model fits, and several results are shown in Fig. 4. Figure 4a shows that the RID constant hazard model had a consistent bias for later dropouts; Fig. 4b shows that the RD Weibull hazard model could also be improved. Figure 4c shows that the RID Weibull hazard model fit the best.
Fig. 4

Dropout goodness of fits for models of (a) restrictive informative dropout constant hazard, (b) random dropout Weibull hazard, and (c) restrictive informative dropout Weibull hazard for PHOENIX 2 data

Model parameter estimates of the RID Weibull hazard model are shown in Table 1. Compared to the initial estimates, these estimates show that informative dropout had a minor influence on model parameters. All parameters were well estimated with relatively small standard errors.

The conditional VPC results of the RID Weibull hazard model are shown in Fig. 5. Compared with Fig. 2, the fit did not appear to be very different from the initial PK/PD model. However, the conditional VPC has narrower prediction intervals, which may be expected.
Fig. 5

Conditional visual predictive check for the pharmacokinetic/pharmacodynamic component of the initial longitudinal-dropout model built with PHOENIX 2 data. PI prediction interval, PGA physician’s global assessment, TRT treatment group

The simultaneous estimation of the initial model took nearly 2 days on an IBM IntelliStation Z Pro workstation, and simultaneous estimation was therefore additionally applied only to the final model. For both models, results similar to those of the sequential estimation method were obtained.

Model validation

Validation results for the longitudinal and dropout models are described separately below.

Pharmacokinetic/pharmacodynamic model validation

The conditional VPC method above was used to account for the effect of dropout. For an overview, the observed data frequencies and model predictions for PHOENIX 1 are given in Fig. 6.
Fig. 6

Conditional external-validation visual predictive check using PHOENIX 1 data for the pharmacokinetic/pharmacodynamic component of the initial longitudinal-dropout model. PI prediction interval, PGA physician’s global assessment, TRT treatment group

The mean absolute differences in model-predicted and observed frequencies for different treatment periods are shown in Table 2. These results indicate that when the model is used to predict the effect in the placebo-crossover period, the prediction errors were relatively large and on average could exceed 8%. For the treatment optimization period the prediction errors were relatively small, with an average absolute error of <3%. The prediction errors were about 4 and 6% for the long-term extension period and the extrapolation period, respectively.
Table 2

Conditional validation of the PK/PD component of the initial joint PK/PD-dropout model: mean absolute differences between model-predicted and observed frequencies (in % units) for different treatment periods

Treatment period                                              Prob(PGA ≤ 1)   Prob(PGA ≤ 2)
Week 0–12, placebo                                            2.2             6.6
Week 0–12, ustekinumab 45 or 90 mg                            6.7             4.4
Week 12–28, placebo cross-over                                8.1             8.5
Week 12–28, ustekinumab 45 or 90 mg                           2.5             2.3
Week 28–52, ustekinumab 45 or 90 mg                           5.0             3.6
Week 52–100, long-term extension (ustekinumab 45 or 90 mg)    3.8             6.6
Week >100, long-term extension (ustekinumab 45 or 90 mg)      6.1             5.3

PK, pharmacokinetic; PD, pharmacodynamic; PGA, physician’s global assessment

The discrepancy in the initial period of treatment 2 appeared to be primarily due to the larger placebo response in PHOENIX 1 when compared with PHOENIX 2. This was attributed in part to the discrepancies in predicting the treatment onset period. In addition, the treatment onset, during which the treatment effect was increasing, can also be expected to be more difficult to model than the treatment optimization period, where the effect was more stable.

Dropout model validation

To validate the conditional dropout model (Eq. 10), modified Cox-Snell residuals of the PHOENIX 1 dropouts were calculated, conditional on the empirical Bayesian parameters based on the initial PK/PD-dropout model. The resulting plot (Fig. 7) suggests that the dropout rates in PHOENIX 1 and 2 differed notably.
Fig. 7

Conditional external-validation of the dropout component of the initial longitudinal-dropout model, using PHOENIX 1 data and modified Cox-Snell residuals. PGA physician’s global assessment, TRT treatment group

Final model

The PK/PD-dropout model was re-estimated with combined data from PHOENIX 1 and PHOENIX 2, and the parameter estimates are shown in Table 1. The baseline intercept and ID parameter estimates changed little from the initial model. Estimates of the placebo effect changed, with a higher maximum effect but slower rate of onset. This can be expected from the discrepancy between the placebo effects in the two studies. This may, in part, contribute to some changes in drug effect estimates. Additionally, the baseline dropout hazard rate estimate decreased. The changes could be expected from the validation results. Standard errors of parameter estimates also decreased, as the result of including more data.

Conditional VPC of the final model is shown in Fig. 8. There were some unexpected differences between observed frequencies in different treatments. For example, even with the use of placebo before week 12, treatment 2 still had higher observed frequencies than treatment 4 around week 30. In light of these, the model predictions may be viewed as reasonable.
Fig. 8

Conditional visual predictive check of the pharmacokinetic/pharmacodynamic component of the final joint model, using combined data from PHOENIX 1 and 2. Probabilities are plotted by treatment (TRT) groups (1 and 2 = placebo until week 12, followed by ustekinumab 45 or 90 mg; 3 = ustekinumab 45 mg; 4 = ustekinumab 90 mg). PI prediction interval

Dropout goodness of fit of the final model appeared reasonable (Fig. 9). This implies that the notable predictive bias shown in Fig. 7 may be due more to small dropout numbers in the model building dataset than to inter-study variability.
Fig. 9

Dropout goodness of fit for the final joint model, using combined data from PHOENIX 1 and 2

Discussion

Ustekinumab, a first-in-class anti-IL-12/23p40 human monoclonal antibody, is highly effective in treating moderate-to-severe plaque psoriasis [22, 23]. The PK analysis conducted here included additional long-term data and also data below the LLOQ, with results consistent with those from earlier analyses [8, 9], leading to increased confidence in understanding the PK of ustekinumab. The semi-mechanistic model described the PGA score well. As a categorical endpoint, the PGA score is less informative than continuous endpoints and did not allow assessment of between-subject variability for most model parameters of interest, which limits our ability to discern covariate influences on model parameters. Nevertheless, the model can still serve to guide future selection of dosing regimens. In addition, this establishes a needed modeling framework to describe PGA scores. Taken together, these results are novel and important, especially because the PGA score is gaining prominence as a primary endpoint in psoriasis clinical trials.

An alternative approach to modeling the placebo and disease progression effects is to have them influence an indirect model parameter [10, 14, 27]. In the absence of a mechanistic rationale, the current approach (Fig. 1) enables direct observation of these effects, and thus estimation may be easier. To our knowledge, the interpretation of a disease progression effect has not previously been applied to exposure–response modeling of categorical data. The disease progression effect may be especially noticeable with long-term data of over 1 year. It was shown that an apparent tolerance may be confounded with disease progression, and that the current data did not appear to support the presence of tolerance. This further supported the sustained efficacy of ustekinumab for the treatment of moderate-to-severe psoriasis.

Our previous study identified a PK/PD relationship between ustekinumab levels and PASI response [10]. PASI is a composite endpoint that scores the proportion of BSA involved with psoriasis as well as plaque features of induration, scaling, and erythema, while the PGA scores only induration, scaling, and erythema. Therefore, these results demonstrate that there is a PK/PD relationship between ustekinumab levels and plaque induration, scaling, and erythema; however, these results do not address whether there is a PK/PD relationship between ustekinumab levels and involved BSA, and it remains possible that the association between the PK properties of ustekinumab and PASI response may result from the effects of ustekinumab on induration, scaling, and erythema. Additional work is warranted to evaluate the association between ustekinumab levels and improvements in involved BSA.

Appropriately accounting for the dropout effect has been shown to be important in longitudinal data modeling. This is the first application of the ID methodology to exposure–response modeling of ordered categorical data, with practically implementable model evaluation methods developed. This is also the first use of the flexible Weibull hazard function in the framework of informative exposure-response dropout modeling. Hu and Sale [18] stated that ID is more likely to be significant for longer term data. The current analysis used data that were collected over years instead of weeks as in Hu and Sale [18]. Thus, accounting for ID can be highly important. Hu and Sale [18] have also pointed out that, from a methodological standpoint, ID cannot be proven from observed data but is related to the assumptions of the longitudinal data component. These assumptions thus dictate whether ID results in better estimation of the longitudinal data model.

VPC of an ID model can be challenging, particularly when the future dosing regimen is uncertain, which in principle makes a VPC of dropout infeasible. However, the longitudinal and dropout components of the joint model can be checked separately, using the conditional VPC approach developed here. The approach, however, is more computationally intensive, and approximations may be required when the dataset is large. The extent of dropout influence on longitudinal data will be situation specific and will remain unknown until an appropriate analysis has been conducted. The conditional VPC approach can be used to investigate this, by comparing its result with that of an ordinary VPC.

It will be of interest to develop approaches for dropout VPC that are easier to evaluate, such as those of the Kaplan–Meier type. These will certainly be more computationally intensive compared with the modified Cox-Snell residuals used here, especially for RD and RID models due to the lack of analytical availability of the assumed dropout distributions.

Model validation estimates the predictive capability of the model. Ordinarily conducted VPCs and bootstraps are actually model evaluation techniques. Even when an external dataset is used, comparing observations to individual Bayesian (posthoc) estimates uses observed data twice and therefore tends to provide overly optimistic results. Appropriate model validation should focus on likely future applications instead of the current data, report a quantitative measure, and provide conservative results (if need be) instead of optimistic ones.

In conclusion, a novel semi-mechanistic joint longitudinal-dropout model was developed to link the ustekinumab dosing regimen to PGA scores. Use of the appropriate technique and interpretation of model validation was shown to be important to characterize the predictive ability of the model.

Acknowledgements

This study was funded by Centocor Research and Development, Inc. The authors are indebted to Dr. Kathleen Seitz and Ms. Alice Zong of Centocor Research & Development, Inc. for their programming support in preparing the analysis datasets. In addition, the authors would like to thank the two reviewers for their insightful comments, and also Dr. Rebecca E. Clemente and Mr. Robert Achenbach of Centocor Ortho Biotech Services, LLC. for their excellent assistance in preparing the manuscript.

Copyright information

© Springer Science+Business Media, LLC 2011