# Informative dropout modeling of longitudinal ordered categorical data and model validation: application to exposure–response modeling of physician’s global assessment score for ustekinumab in patients with psoriasis

## Authors

DOI: 10.1007/s10928-011-9191-7

- Cite this article as:
- Hu, C., Szapary, P.O., Yeilding, N. et al. J Pharmacokinet Pharmacodyn (2011) 38: 237. doi:10.1007/s10928-011-9191-7


## Abstract

The physician’s global assessment (PGA) score is a 6-point measure of psoriasis severity that is widely used in clinical trials to assess response to psoriasis treatment. The objective of this study was to perform exposure–response modeling using the PGA score as a pharmacodynamic endpoint following treatment with ustekinumab in patients with moderate-to-severe psoriasis who participated in two Phase 3 studies (PHOENIX 1 and PHOENIX 2). Patients were randomly assigned to receive ustekinumab 45 or 90 mg or placebo, followed by active treatment or placebo crossover to ustekinumab, dose intensification or randomized withdrawal and long-term extension periods. A novel joint longitudinal-dropout model was developed from serum ustekinumab concentrations, PGA scores, and patient dropout information. The exposure–response component employed a semi-mechanistic drug model, integrated with disease progression and placebo effect under the mixed-effect logistic regression framework. This allowed potential tolerance to be investigated with a mechanistic approach. The dropout component of the joint model allowed the examination of its potential influence on the exposure–response relationship. The flexible Weibull dropout hazard function was used. Visual predictive check of the joint longitudinal-dropout model required special handling, and a conditional approach was proposed. The conditional approach was extended to external model validation. Finally, appropriate interpretation of model validation is discussed. This longitudinal-dropout model can serve as a basis to support future alternative dosing regimens for ustekinumab in patients with moderate-to-severe plaque psoriasis.

### Keywords

- Monoclonal antibody
- Informative censoring
- Nonlinear mixed effect modeling
- Visual predictive check
- NONMEM

## Introduction

Psoriasis is a chronic immune-mediated skin disorder that affects approximately 2–3% of the population worldwide [1–3]. Interleukin (IL)-12 and IL-23 have been implicated in the pathogenesis of psoriasis [4–6], and anti-IL-12/23 agents have shown promise for treating psoriasis and psoriatic arthritis [7]. Ustekinumab is a human monoclonal antibody that binds with high specificity and affinity to the shared p40 subunit of IL-12 and IL-23 and blocks interaction with the IL-12Rβ1 cell surface receptor. Population pharmacokinetics (PK) of ustekinumab were recently reported [8, 9]. In addition, a population mechanistic PK/pharmacodynamic (PD) model was developed using Psoriasis Area and Severity Index (PASI) scores as a nearly continuous measure of improvement in disease conditions [10].

The physician’s global assessment (PGA) score, an alternative to the PASI score, is a 6-point scale that measures disease severity (0 = cleared, 1 = minimal, 2 = mild, 3 = moderate, 4 = marked, and 5 = severe) and is commonly used as a primary or secondary endpoint in psoriasis clinical trials. It is a composite index describing the severity of the three main characteristics of psoriatic plaques: erythema, scaling, and thickness [11]. The proportions of patients achieving a PGA score ≤1 or a PGA score ≤2 are of particular clinical interest and are used as clinical trial endpoints. To our knowledge, no previous PK/PD models have been established using PGA scores in patients with psoriasis.

In Eq. 1, f_{p}(t) represents the time effect, f_{d}(t) represents the drug or concentration effect, and η is a normally distributed random variable with a mean of 0 that represents the between-subject variability. In this formulation, the sum γ + f_{p}(t) + η may be interpreted as the placebo effect. Ideally, the form of f_{d}(t) would reflect the underlying pharmacology. This approach was recently applied in rheumatoid arthritis [12–14]. Hutmacher et al. [12] were the first to apply a latent variable model that used an indirect response model to characterize an underlying disease variable, which requires therapeutic area knowledge of the drug and disease. Lacroix et al. [13] used a Markov transition approach along with Eq. 1 and argued that it is superior to the latent variable approach previously proposed [12] in accommodating the within-individual correlations. Most recently, Hu et al. [14] further developed the latent variable approach with a feature that is more advantageous in modeling change-from-baseline type of variables. Hu et al. [14] argued that, while a latent-indirect model did not model within-individual correlations, it had three advantages over the Markov approach: (1) it is more likely to be predictive, (2) it relates more easily to the main clinical trial endpoint of interest, and (3) it is easier to interpret.

Although not applied to the setting of categorical analysis to our knowledge, refinements to Eq. 1 are possible. Holford and Nutt [15] argued for the importance of incorporating disease progression in PK/PD modeling, even though few patients are untreated, which prevents the direct modeling of disease progression. At times, tolerance of the drug may develop, which can be modeled empirically or mechanistically. A model consistent with the pharmacology of the drug would likely predict better when extrapolated to new dosing regimens and time periods. In the widely used class of indirect models, a precursor-dependent indirect model has been developed to model drug tolerance [16, 17].

The probability of dropout in the interval (t_{i−1}, t_{i}) is calculated as S(t_{i−1}) − S(t_{i}). Hu and Sale [18] used the constant baseline hazard h(t) = λ. Conceivably, the hazard could increase or decrease with time, as patients might tend to drop out more or less, respectively, as treatment continues. The Weibull hazard function h(t) = aλt^{a−1}, where a > 0 and λ > 0, is more flexible [19]. It reduces to a constant hazard when a = 1 and will increase or decrease with time depending on whether a > 1 or a < 1, respectively.
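As a minimal sketch of the Weibull hazard discussed above, the following Python snippet evaluates h(t), the survival function it implies, S(t) = exp(−λt^{a}), and the interval dropout probability S(t_{i−1}) − S(t_{i}). The function names and the parameter values are illustrative assumptions and are not estimates from this analysis.

```python
import math

def weibull_hazard(t, a, lam):
    """Weibull hazard h(t) = a * lam * t**(a - 1), for t > 0."""
    return a * lam * t ** (a - 1)

def weibull_survival(t, a, lam):
    """Survival S(t) = exp(-H(t)), with cumulative hazard H(t) = lam * t**a."""
    return math.exp(-lam * t ** a)

def interval_dropout_prob(t_prev, t_next, a, lam):
    """Probability that dropout falls between two visits: S(t_prev) - S(t_next)."""
    return weibull_survival(t_prev, a, lam) - weibull_survival(t_next, a, lam)

# Hypothetical values: a < 1 gives a hazard that decreases with time.
a, lam = 0.7, 0.01
for day in (28.0, 84.0, 364.0):
    print(day, weibull_hazard(day, a, lam), weibull_survival(day, a, lam))
print(interval_dropout_prob(28.0, 56.0, a, lam))
```

Setting a = 1 in this sketch recovers the constant hazard h(t) = λ.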

Let Y_{O} = (Y_{1}, Y_{2}, …, Y_{T−1}) be the observed disease progress and T be the dropout time such that the subject has not dropped out yet at time t_{i−1}. The classification is based on the random indicator variables T_{i}, which take the value of 0 or 1 and indicate whether the subject will drop out at time t_{i}. Also let Y_{U} = {Y(θ, η, t); t_{i−1} < t < t_{i}} be the process of unobserved disease progress between the previous observation t_{i−1} and the current time t_{i}. Then CRD, RD, and ID are classified as follows:

- (a) completely random (CRD), if T_{i} is independent of η, and therefore of (Y_{O}, Y_{U}).
- (b) random (RD), if T_{i} (given Y_{O}) is independent of Y_{U}, but may depend on Y_{O}. In addition, any dependence of T_{i} on η is only through Y_{O}.
- (c) informative (ID), if T_{i} (given Y_{O}) depends on Y_{U}, or explicitly depends on η other than through Y_{O}.

In the case of ID, the longitudinal model parameters must be simultaneously estimated with those of the dropout model. Whether the data support CRD could be graphically explored by plotting the average observed scores of patients who dropped out at the next visit versus those of patients who did not. A special case of ID is the restrictive informative dropout (RID) model, where h(t) depends on the unobserved disease status, but not the observed.
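The graphical exploration of CRD mentioned above can be sketched as a per-visit comparison of mean scores. The data layout below (one list of scores per patient, with a shorter list meaning the patient dropped out after the last listed visit) is a hypothetical simplification that assumes no intermittent missing visits; it is not the dataset structure used in this analysis.

```python
def _mean(values):
    """Arithmetic mean, or None for an empty group."""
    return sum(values) / len(values) if values else None

def mean_score_by_next_visit_status(scores_by_patient, n_visits):
    """For each visit j, return (mean score of patients whose last observed
    visit is j, mean score of patients who are still observed at visit j + 1)."""
    summary = {}
    for j in range(n_visits - 1):
        dropping, staying = [], []
        for scores in scores_by_patient:
            if len(scores) == j + 1:      # last observation is at visit j
                dropping.append(scores[j])
            elif len(scores) > j + 1:     # observed again at visit j + 1
                staying.append(scores[j])
        summary[j] = (_mean(dropping), _mean(staying))
    return summary

# Hypothetical PGA-like scores for four patients over four scheduled visits.
example = [[3, 3, 2, 1], [4, 4], [3, 2, 2, 2], [4]]
print(mean_score_by_next_visit_status(example, n_visits=4))
```

Plotting the two means against visit number gives the kind of display described above; systematically higher scores among patients about to drop out would argue against CRD.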

The joint likelihood of the observed data (Y_{O}) and the dropout (T) is factorized as in Eq. 1 in Hu and Sale [18], where P(Y_{O}|η) is the usual PD model expressing Y_{O} as a function of the fixed and random effects, and P(T|Y_{O}, Y_{U}, η) is the dropout model specifying the potential dependence of dropout time T on both observed and unobserved data, as well as the inter-individual random effects. Another possible approach, often used in statistical literature, is the pattern-mixture model approach, which uses a different factorization of the joint likelihood.

This approach specifies the dropout model first and then the longitudinal model conditional on the dropout outcome.

Sheiner [20] described nonrandom censoring in a joint analysis with ordered categorical data. To our knowledge, ID modeling has not yet been applied to exposure–response (i.e., PK/PD) modeling of categorical data. In addition, when dropout is present, with the exception of simple dosing regimen situations, ordinary implementation of a visual predictive check (VPC) is inappropriate because it does not account for dropout. To emphasize the contrast with dropout, the term “longitudinal” is used herein to indicate the population PK/PD (i.e., exposure–response) component of the model.

Model validation is an important and often integral component of PK and PD modeling. However, confusion on this topic remains; Hu et al. [14] pointed out the misleading claim of describing a model as “validated” after conducting bootstrap analysis. In principle, model validation refers to the predictive ability of the model [21] and should therefore be a continuous measure instead of a simple yes/no.

The objective of the current analysis was to develop a semi-mechanistic population PK/PD model that allows investigation of potential tolerance and dropout influence. VPC implementations for longitudinal-dropout modeling are also discussed, and a method suitable under flexible dosing regimens is proposed. Some misconceptions of model validation are discussed, and aspects of appropriate uses are illustrated. Data from two large Phase 3 trials of ustekinumab in patients with moderate-to-severe psoriasis (PHOENIX 1 [22] and PHOENIX 2 [23]) were used. These are the same studies used in an earlier analysis by Zhou et al. [10]; however, the current analysis includes additional long-term data (randomized withdrawal and long-term extension periods).

## Materials and methods

### Study design

The patient populations and study designs for PHOENIX 1 [22] and PHOENIX 2 [23] have been previously described. Both were multicenter, randomized, double-blind, placebo-controlled, parallel design studies of patients with moderate-to-severe plaque psoriasis. The entry criteria identified patients with moderate-to-severe psoriasis based on PASI score and involved body surface area (BSA). The PASI measures disease severity based on plaque characteristics and the proportion of BSA involved with psoriasis. The PGA measures disease severity based on plaque characteristics only, without regard to affected BSA.

The study designs were complex, and consisted of placebo-controlled, placebo crossover, dose optimization, randomized withdrawal, and long-term extension periods. Briefly, in PHOENIX 1, a total of 766 patients were assigned to receive subcutaneous (SC) injections of ustekinumab 45 mg, ustekinumab 90 mg, or placebo at weeks 0 and 4, followed by active treatment every 12 weeks or placebo crossover to ustekinumab 45 or 90 mg starting at week 12 (weeks 12–40), followed by randomized withdrawal (weeks 40–76), and a long-term extension (week 76 onward) [22]. In PHOENIX 2, a total of 1,230 patients were assigned to the same treatment groups and design until week 28, followed by dose schedule optimization (weeks 28–52), and a long-term extension (week 52 onward) [23]. The current analysis used all data obtained up to week 100 in both trials. For PHOENIX 1, most patients had data up to week 136, with 157 patients having data at or beyond week 152.

### Serum ustekinumab measurement

Blood samples for the measurement of serum ustekinumab concentrations were collected at weeks 0, 4, 12, 16, 24, 28, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, and 88 (and every 24 weeks thereafter, if available) in PHOENIX 1 and at weeks 0, 4, 12, 16, 20, 24, 28, 40, 52, and 88 (and every 24 weeks thereafter, if available) in PHOENIX 2. At visits when patients received the study agent, blood samples were collected prior to study agent administration. A validated electrochemiluminescent immunoassay method, with a lower limit of quantification (LLOQ) of 0.17 μg/mL at a minimum required 1:10 dilution, was used to measure serum ustekinumab concentrations [8]. The PK data used in this analysis included longer term data than previously reported [8, 9].

### Physician’s global assessment score measurement

PGA scores were collected at weeks 0, 2, 4, and every 4 weeks thereafter up to week 88 or study unblinding, then at week 100 and every 12 weeks thereafter in PHOENIX 1 and at weeks 0, 2, 4, and every 4 weeks thereafter up to week 52 or study unblinding, and every 12 weeks after week 52 in PHOENIX 2. The numbers of patients with baseline PGA scores of 2, 3, 4, and 5 were 42, 388, 298, and 37, respectively, for PHOENIX 1 and 105, 642, 419, and 64, respectively, for PHOENIX 2.

### Dataset for pharmacokinetic/pharmacodynamic-dropout modeling and validation

Only patients with available PK data were included in the dataset. Data from PHOENIX 2 were used for model building, and data from PHOENIX 1 were reserved for model validation. All PK and PGA data through week 100 were included in the current analysis. For PHOENIX 2, there were 9,723 PK observations and 21,711 PGA scores from a total of 1,230 patients. For PHOENIX 1, there were 9,617 PK observations and 19,957 PGA scores from a total of 765 patients. In the PHOENIX 2 study, 211 (17.2%) patients discontinued the study at various times before week 100. The exact discontinuation times were available for 68 (5.5%) patients. In PHOENIX 1, 162 (21.2%) patients discontinued the study before week 152, and the discontinuation date was known for 48 (6.3%) patients.

### Software

### Population pharmacokinetic/pharmacodynamic model development

#### Pharmacokinetic model

The previously established confirmatory population PK model [9], a one-compartment model with first-order absorption, was used to obtain the empirical Bayesian estimates of individual PK parameters. The η-shrinkage for the apparent clearance and volume and ε-shrinkage were 8.3, 28.6, and 12.7%, respectively, for PHOENIX 2 and 9.0, 31.0, and 11.5%, respectively, for the combined dataset of PHOENIX 1 and 2. The confirmatory model was used because it was considered more robust than the exploratory model developed in Zhu et al. [8].

#### Pharmacokinetic/pharmacodynamic model

In Eq. 2, α_{k} are model parameters that are monotonically increasing in k and represent the baseline probability of PGA distribution, and f_{z}(t) represents disease progression, for which a specific functional form was used. For the placebo effect f_{p}(t), plb_{max} is the maximum that may be reached by f_{p}(t), and R_{p} is the rate constant of onset. The model does not allow for a decline in the placebo response.

The term f_{z}(t) + f_{p}(t) represents the sum of disease progression and placebo effect. The interpretation of f_{z}(t) as disease progression and f_{p}(t) as the placebo effect is based on a reasonable assumption that the placebo effect plateaus in the short term and that the disease progression effect continues in the long term. The drug effect was modeled using a latent variable approach. Because ustekinumab blocks IL-12/23, which in turn affects the disease progression, a Type I indirect model could be suitable [10]. Similar to an earlier approach [14], it was assumed that the drug effect is driven by a latent variable R(t), governed by a Type I indirect model in which C_{p} is the ustekinumab concentration, and k_{in}, IC_{50}, and k_{out} are the model parameters. This implicitly assumes that the maximum effect, E_{max}, is 100%. As a latent variable, the scale of R(t) needs to be set, and it is further assumed that R = 1 at baseline, i.e., R(0) = 1, leading to k_{in} = k_{out}. The reduction of R(t) was assumed to drive the drug effect f_{d}(t). The model relating ustekinumab concentration (C_{p}) to the reductions in PGA scores is presented in Fig. 1, where k_{in} was fixed to k_{out}R(0) = k_{out} to preserve mass balance.
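To make the latent variable dynamics concrete, the sketch below integrates the standard textbook form of a Type I (inhibition of production) indirect response model with 100% maximum inhibition, dR/dt = k_{in}[1 − C_{p}/(IC_{50} + C_{p})] − k_{out}R, with k_{in} = k_{out} and R(0) = 1. This form is an assumption based on the parameter definitions above, since the exact equation is not reproduced in this text; the Euler scheme, function name, and parameter values are likewise illustrative only.

```python
def simulate_latent_r(conc_at, k_out, ic50, t_end, dt=0.1):
    """Euler integration of dR/dt = k_in * (1 - C/(IC50 + C)) - k_out * R,
    with k_in = k_out so that R(0) = 1 is the drug-free steady state."""
    k_in = k_out
    r = 1.0
    times, values = [0.0], [r]
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        c = conc_at(step * dt)  # concentration at the start of the step
        drdt = k_in * (1.0 - c / (ic50 + c)) - k_out * r
        r += dt * drdt
        times.append((step + 1) * dt)
        values.append(r)
    return times, values

# Hypothetical constant concentration of 1.0 with IC50 = 0.25 (same units).
times, values = simulate_latent_r(lambda t: 1.0, k_out=0.025, ic50=0.25, t_end=1000.0)
print(values[-1])  # approaches the steady state IC50 / (IC50 + C)
```

Under a constant concentration C, this form settles at R = IC_{50}/(IC_{50} + C), so the reduction 1 − R grows with exposure and saturates at 1.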

#### Tolerance assessment

### Dropout model development

For the patients with an unknown dropout time, it is reasonable to assume that dropout occurred between the time when the last observation was available and the next scheduled visit time, which is known as interval censoring in survival analysis [18]. Hu and Sale [18] did not specify how to model the exact dropout data; however, according to standard statistical theory, the likelihood of those data is given by the density f(t) = h(t)S(t) of the probability distribution determined by the hazard function h(t).
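The likelihood contributions described above can be sketched for a Weibull hazard h(t) = aλt^{a−1}. An exactly known dropout time contributes the density h(t)S(t), an interval-censored dropout contributes S(t_{i−1}) − S(t_{i}), and, following standard survival analysis, a patient who completes follow-up contributes S(t_end). The function names and parameter values are hypothetical, and the hazard here has no covariate or disease-status dependence.

```python
import math

def log_lik_exact(t, a, lam):
    """Exact dropout time: log[h(t) * S(t)] with S(t) = exp(-lam * t**a)."""
    return math.log(a * lam) + (a - 1.0) * math.log(t) - lam * t ** a

def log_lik_interval(t_prev, t_next, a, lam):
    """Interval-censored dropout: log[S(t_prev) - S(t_next)]."""
    return math.log(math.exp(-lam * t_prev ** a) - math.exp(-lam * t_next ** a))

def log_lik_completer(t_end, a, lam):
    """No dropout through the end of follow-up: log S(t_end)."""
    return -lam * t_end ** a

# Hypothetical parameters and times in days.
a, lam = 0.7, 0.01
total = (log_lik_exact(45.0, a, lam)
         + log_lik_interval(28.0, 56.0, a, lam)
         + log_lik_completer(700.0, a, lam))
print(total)
```

Summing such terms over patients gives the dropout part of the log-likelihood for this simplified hazard.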

In the RD model, Y_{O} is the previously observed PGA score, and β_{O} is the effect on dropout to be estimated. The ID model, however, could not be reasonably adapted because PGA scores cannot be assumed to be available to patients continuously between study visits. In addition, for a categorical endpoint and its given distribution, the best predictor may not be directly clear, nor is it likely informative. Alternatively, it is reasonable to assume that the overall disease status change, i.e., f_{z}(t) + f_{p}(t) + f_{d}(t) + η in Eq. 2, contributes to how a patient perceives treatment success and thereby to dropout likelihood. This leads to a model in which β_{1} is the informative dropout effect to be estimated.

A longitudinal-dropout model thus consists of Eqs. 2–6 plus one from Eqs. 7–10.

The graphical assessments in Hu and Sale [18] were adopted to assess whether CRD is reasonable, along with the modified Cox-Snell residual plots to assess the dropout fit.

A sequential PK/PD approach was used. Limited simultaneous fitting [26] was also explored.

### Visual predictive check of joint longitudinal-dropout model

Let D_{i} denote the dosing history for subject i = 1, …, n, η_{i} the subject-specific model parameter, Y_{i} the observed longitudinal data (PGA) vector, and T_{i} the observed dropout time. Note that D_{i} is only available prior to the dropout time T_{i}. The intended future dosing schedule is denoted as D_{i}^{F}. For ease of formulation, the general continuous variable case, i.e., PGA treated as a continuous variable even though it is not the modeling approach in this manuscript, is presented first. The typical objective of a VPC is to compare an observed data quantile, e.g., the 50% quantile (median) Y_{med}(t) = median[Y_{i}(t)], with the distribution of its model predictions. However, Y_{i}(t) is available only if t < T_{i}. This shows that, in principle, the probability distribution of Y_{i} depends on the dropout time T_{i}. Under the selection-model formulation, obtaining the probability distribution of T_{i} requires knowing the future dosing D_{i}^{F} reliably.

Sometimes a nominal dosing regimen may be assumed, e.g., when predicting the outcomes of fixed dosing regimens [18]. At other times, the actual dosing regimens are likely to vary significantly from the intended regimen, which in part depends on the clinical trial conduct. Another instance is when dose titration is present, in which case, predicting future dosing regimens relies on the current model being accurate, which interjects additional uncertainty. Both of these occurred in PHOENIX 1 and 2. It may be tempting to still conduct the VPC by assuming a known future dosing regimen, such as the protocol-specified dosing regimens, last dose carried forward, or average dose in study arm. However, simple imputations of these types are unlikely to be accurate. For example, during the titration phase, a patient who dropped out due to (unobserved) efficacy would have had a higher future dose, thus for this patient the last dose carried forward would be too low. Likewise, imputing the future dose with the average dose in the study arm would produce dosing regimens with falsely reduced variability. It is clear from the joint model that errors in dosing regimens affect the longitudinal data as well as the dropout predictions. Thus, VPCs falsely assuming unknown future dosing as known will introduce biases in the predictive distributions, and the severity of the bias will depend on how likely the assumed future dosing regimen will deviate from the unknown truth.

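The conditional approach proposed here avoids the joint distribution P(Y_{i}(t), T_{i}|D_{i}, D_{i}^{F}, η_{i}), instead working with the conditional distribution P(Y_{i}(t)|D_{i}, η_{i}, T_{i}). This amounts to evaluating the conditional longitudinal model P(Y_{O}|T), by interpreting the observed longitudinal data as being conditional on the observed dropouts. To illustrate its implementation, for given dropout times (T_{i}), the observed data trend is the quantile computed from the data available before dropout, i.e., from Y_{i}(t) with t < T_{i}.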

Conducting a VPC on this requires simulating the distribution of model predictions from P(Y|D_{i}, η_{i}, T = T_{i}), which is not explicitly available. However, it can be obtained via a conditional simulation, by simulating a large number of (Y*_{i}, T*_{i}) from the joint model and then selecting those Y*_{i} with T*_{i} = T_{i}. For example, in the case of interval dropout where T_{i} = (t_{i−1}, t_{i}), one may obtain a 90% predictive interval from the collection of {Y*_{i}|D_{i}, η_{i}, t_{i−1} < T*_{i} < t_{i}}. A difficulty, however, is that the number of simulations can become prohibitively large with narrow intervals and theoretically impossible when T_{i} is known exactly. Therefore, the larger set {Y*_{i}|D_{i}, η_{i}, t_{i−1} < T*_{i}} is proposed as an approximation. That is, the conditioning is on the event that the dropout occurred after time t_{i−1} instead of within the interval of (t_{i−1}, t_{i}). For RD and ID models, this approximation provides a lower bound of the dropout effect on the VPC result. In particular, under the assumption that patients with worse disease status would be more likely to drop out, the larger set {Y*_{i}|D_{i}, η_{i}, t_{i−1} < T*_{i}} is likely to be better than the intended set {Y*_{i}|D_{i}, η_{i}, t_{i−1} < T*_{i} < t_{i}}. The influence should, however, be small when the percentage of subjects who drop out is relatively small, as the completer set (t_{i} = ∞): {Y*_{i}|D_{i}, η_{i}, t_{i−1} < T*_{i}} are simulated exactly.
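The conditional simulation step can be sketched with a deliberately simple toy joint model, treating the score as continuous as in the formulation above. Everything in the block is an illustrative assumption rather than the model of this analysis: the subject effect, the residual noise, the per-interval logistic dropout probability, and all names and values. Simulated trajectories are kept only if the simulated dropout occurs after the subject's last observed visit, which corresponds to the larger approximating set.

```python
import math
import random

def simulate_subject(rng, n_visits, beta=1.0):
    """Simulate one (Y*, T*) pair from a toy joint model. Returns the full
    trajectory and n_seen, the number of visits attended before dropout
    (n_seen == n_visits for a completer)."""
    eta = rng.gauss(0.0, 1.0)  # subject effect; higher means worse disease
    scores = [eta + rng.gauss(0.0, 0.5) for _ in range(n_visits)]
    n_seen = n_visits
    for j in range(1, n_visits):
        # Informative dropout: per-interval probability rises with eta.
        p_drop = 1.0 / (1.0 + math.exp(-(-2.0 + beta * eta)))
        if rng.random() < p_drop:
            n_seen = j  # dropout falls between visit j - 1 and visit j
            break
    return scores, n_seen

def conditional_replicates(rng, n_observed, n_visits, n_rep):
    """Rejection step: keep trajectories whose simulated dropout occurs after
    the last observed visit, truncated to the visits actually observed."""
    kept = []
    while len(kept) < n_rep:
        scores, n_seen = simulate_subject(rng, n_visits)
        if n_seen >= n_observed:
            kept.append(scores[:n_observed])
    return kept

def percentile(values, q):
    """Nearest-rank style percentile, q in [0, 100]."""
    ordered = sorted(values)
    idx = int(round(q / 100.0 * (len(ordered) - 1)))
    return ordered[idx]

rng = random.Random(2011)
reps = conditional_replicates(rng, n_observed=4, n_visits=6, n_rep=2000)
last_visit = [r[-1] for r in reps]
interval_90 = (percentile(last_visit, 5), percentile(last_visit, 95))
print(interval_90)
```

In this toy setting, subjects with worse status drop out more often, so the retained trajectories are shifted toward better status relative to an unconditional simulation, in line with the reduced between-subject variability noted for the conditional approach.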

In the case of Y_{i}(t) being ordered categorical data, simply replace Y_{med} (t) = median[Y_{i} (t)] in the above with the vector of proportions of {Y_{i}(t) ≤ k} at time t, where \( {\text{k}} = 0,{ 1},{ 2}, \ldots , \) are the score levels.
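For ordered categorical data, the summary statistic at each time point is the vector of proportions of {Y_{i}(t) ≤ k}. A minimal sketch, with a hypothetical function name and made-up scores, is:

```python
def cumulative_proportions(scores, levels=(0, 1, 2, 3)):
    """Proportion of scores at or below each level k at one time point."""
    n = len(scores)
    return {k: sum(1 for y in scores if y <= k) / n for k in levels}

# Hypothetical scores observed at one visit (5 already merged into 4).
print(cumulative_proportions([0, 1, 1, 2, 3, 4, 4, 2]))
```

The same function would be applied to the observed scores and to each simulated replicate, so that observed and predicted proportions are compared level by level.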

The ideal VPC compares the distributions of both observed and predicted data. However, in the case of categorical data, the cumulative probabilities are calculated from observed scores; i.e., the distribution of the cumulative probabilities is not directly observable and thus shall be ignored.

Note that the conditioning step of the conditional VPC approach in effect limits the between-subject variability of the subjects simulated. Therefore, prediction intervals generated by the conditional VPC approach may be expected to have reduced variability in comparison to VPCs conducted with the dropout ignored. This makes it easier to detect potential model misfits, and is the correct behavior under the joint longitudinal-dropout model.

#### Model selection

Candidate models were compared based on the NONMEM minimum objective function (MOF) values and a VPC. A decrease in MOF (ΔMOF) of at least 10.83, corresponding to a nominal p-value of 0.001 for one additional parameter (chi-square distribution with 1 degree of freedom), was judged as significant evidence to include that parameter.
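The correspondence between the ΔMOF threshold and the nominal p-value can be checked with the chi-square survival function for 1 degree of freedom, which reduces to a complementary error function. The helper names below are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def is_significant(delta_mof, threshold=10.83):
    """Apply the inclusion criterion for one additional parameter."""
    return delta_mof >= threshold

print(chi2_sf_1df(10.83))    # close to 0.001
print(is_significant(43.0))  # True
```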

### External validation of the joint longitudinal-dropout model

The model developed from PHOENIX 2 data was validated using PHOENIX 1 data. Again, model validation should evaluate the predictive ability of the model in its intended use. The joint model consists of two components: PK/PD and dropout.

#### Pharmacokinetic/pharmacodynamic model validation

The endpoints of clinical interest are the probabilities of achieving a PGA score ≤1 and achieving a PGA score ≤2, indicative of disease severity being minimal or mild, respectively. Depending on the potential use of the model, several components would be of interest to validate; these may include placebo response, treatment effect onset, and the maintenance and long-term extension phases. For all treatment periods, the discrepancies between model-predicted and observed frequencies were calculated. The portion of the long-term extension period past week 100 is of separate interest because model building does not contain data from this period in PHOENIX 1. Therefore, the validation result on this period reflects the model’s extrapolation performance. Because dropout is an integral component of the entire model, it needs to be accounted for in calculating the model predictions. As noted previously, this requires using the conditional VPC approach, which validates the conditional longitudinal model P(Y_{O}|T).

#### Dropout model validation

Here, E_{M} denotes the expectation under the exposure–response model without the dropout component. In this formulation, η should be interpreted as individual model parameters instead of random effects and E_{M} as empirical Bayesian estimates. A variation of option (2) is to also include dropout in the empirical Bayesian estimation.

This would be more accurate, but requires using the observed dropout twice, which may be less appealing. However, this illustrates that validation results from option (2) could be conservative. To relate these options to the model equations, assuming the RID model (Eq. 10) is adopted, the entire model consists of Eqs. 2–6 and 10. Option (1) evaluates the ability of predicting the observed dropout of the entire joint model represented by Eqs. 2–6 and 10. Option (2) requires using Eqs. 2–6 and the new PK/PD data to obtain individual empirical Bayesian PK/PD parameters, fixing them in Eq. 10, and then validating only Eq. 10. Option (1) is more interesting, but as noted above, would require knowing the future dosing and is therefore difficult. Because there was no specific time for which the dropout rate is important, option (2) was used with the modified Cox-Snell residual plot to provide a general sense of agreement between the model prediction and the new data.
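As background for the residual plot, the sketch below computes ordinary Cox-Snell residuals, i.e., the cumulative hazard evaluated at each dropout time, for a plain Weibull hazard. If the hazard model is correct, these residuals behave like a unit exponential sample. This is not the modified version from Hu and Sale [18], whose details are not reproduced here, and it ignores censoring and any dependence on disease status; names and parameter values are hypothetical.

```python
import math
import random

def cox_snell_residual(t, a, lam):
    """Cumulative Weibull hazard H(t) = lam * t**a at the dropout time t."""
    return lam * t ** a

def simulate_dropout_time(rng, a, lam):
    """Inverse-transform draw: if E ~ Exp(1), then T = (E / lam)**(1 / a)."""
    return (rng.expovariate(1.0) / lam) ** (1.0 / a)

rng = random.Random(0)
a, lam = 0.7, 0.01  # hypothetical Weibull parameters
times = [simulate_dropout_time(rng, a, lam) for _ in range(20000)]
residuals = [cox_snell_residual(t, a, lam) for t in times]
mean_residual = sum(residuals) / len(residuals)
frac_above_one = sum(r > 1.0 for r in residuals) / len(residuals)
print(mean_residual, frac_above_one)
```

With data simulated from the same model, the mean residual is near 1 and the fraction above 1 is near exp(−1); marked departures on new data would indicate disagreement between the dropout model and the observations.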

### Final joint model

After model validation, the model developed from PHOENIX 2 was re-estimated using data from both PHOENIX 1 and 2. Technically speaking, the model validation was applicable only to the initial model. As the final model was based on more data, it should be considered more accurate. The earlier model validation results should be viewed as a conservative estimate of the predictive ability of the final model.

## Results

Baseline demographic and disease characteristics for patients included in the current analysis were similar for PHOENIX 1 and PHOENIX 2 as shown in Zhou et al. [10]. Few observed PGA scores had a value of 5 (n = 149; 0.7%); therefore, these data were merged with those for a PGA score of 4 in order to be parsimonious.

### Pharmacokinetic model

### Pharmacokinetic/pharmacodynamic model

To ensure that α_{k} are monotonically increasing in k for k = 0, 1, 2, and 3 in Eq. 2, they were re-parameterized as α_{2}, d_{0}, d_{1}, and d_{3}, where d_{0}, d_{1}, and d_{3} > 0, such that α_{1} = α_{2} − d_{1}, α_{0} = α_{1} − d_{0}, and α_{3} = α_{2} + d_{3}. At an early exploratory model development stage that did not include disease progression, the tolerance component was significant (ΔMOF = 43). However, with the inclusion of the disease progression component, the tolerance component became non-significant (ΔMOF = 0), and the disease progression model fit much better than the tolerance model (ΔMOF > 200). Thus, the model without the tolerance component was used. Model parameter estimates and standard errors are given in Table 1. The fit was stable, and standard errors were relatively small.
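The re-parameterization of the intercepts can be sketched as follows. The mapping from (α_{2}, d_{0}, d_{1}, d_{3}) to α_{0}, …, α_{3} follows the relations stated above; the conversion to cumulative probabilities assumes that Eq. 2 has the usual cumulative-logit form, logit[P(Y ≤ k)] = α_{k} plus the effect terms, which is an assumption since the equation is not reproduced in this text. The numerical values are illustrative and are not taken from Table 1.

```python
import math

def cutpoints(alpha2, d0, d1, d3):
    """Return [alpha0, alpha1, alpha2, alpha3], increasing by construction."""
    if min(d0, d1, d3) <= 0:
        raise ValueError("d0, d1, and d3 must be positive")
    alpha1 = alpha2 - d1
    alpha0 = alpha1 - d0
    alpha3 = alpha2 + d3
    return [alpha0, alpha1, alpha2, alpha3]

def cumulative_probs(alphas, shift=0.0):
    """P(Y <= k) under an assumed cumulative-logit link; shift stands in for
    the sum of time, drug, and subject effects."""
    return [1.0 / (1.0 + math.exp(-(alpha + shift))) for alpha in alphas]

alphas = cutpoints(alpha2=-3.0, d0=2.4, d1=2.6, d3=3.4)  # illustrative values
print(alphas)
print(cumulative_probs(alphas))             # baseline
print(cumulative_probs(alphas, shift=4.0))  # after a positive total effect
```

Because the d parameters are constrained to be positive, any estimate automatically yields ordered intercepts and therefore valid, non-decreasing cumulative probabilities.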

Table 1. Model parameter estimates

| Parameter | Initial PK/PD model, estimate (SE) | Initial joint model, estimate (SE) | Final joint model, estimate (SE) |
|---|---|---|---|
| α | −2.95 (0.0925) | −3.07 (0.0981) | −2.94 (0.075) |
| d | 2.39 (0.0485) | 2.39 (0.049) | 2.28 (0.0357) |
| d | 2.58 (0.0508) | 2.58 (0.0513) | 2.46 (0.0377) |
| d | 3.36 (0.0849) | 3.36 (0.0849) | 3.12 (0.0599) |
| plb | 1.37 (0.173) | 1.35 (0.152) | 1.92 (0.137) |
| R | 0.0388 (0.0069) | 0.0464 (0.008) | 0.023 (0.00227) |
| k | 0.0243 (0.00113) | 0.0242 (0.00125) | 0.025 (0.00111) |
| DE | 6.73 (0.226) | 6.84 (0.269) | 5.84 (0.166) |
| β (day | 0.00183 (0.000174) | 0.00187 (0.00019) | 0.00143 (0.0001) |
| λ (day | – | 0.00784 (0.000858) | 0.00926 (0.0008) |
| a | – | 0.691 (0.0196) | 0.638 (0.0141) |
| β | – | 0.34 (0.0392) | 0.322 (0.0175) |
| Var(η) | 4.01 (0.202) | 3.99 (0.201) | 3.92 (0.144) |

Several alternative models were also attempted and none resulted in a meaningful improvement of the fit. The lack of improvement in the mechanistic tolerance model [16, 17] is of notable interest because, for the observed data, treatments 1 and 2 (placebo until week 12 followed by ustekinumab 45 or 90 mg, respectively) appeared to reach higher efficacy rates than treatments 3 and 4 (ustekinumab 45 and 90 mg) after week 28. The numbers of patients who dropped out in the first 12 weeks for treatments 1–4 were 4, 3, 4, and 6, respectively, not enough to notably contribute to the treatment efficacy differences. Because inclusion of the tolerance terms did not result in any improvement of the fit, the apparent difference between placebo and active treatment arms may be due to noise. However, in principle, the lack of improvement of the specific tolerance model could be due to an inadequate representation of tolerance and does not necessarily prove absence of tolerance.

### Dropout model

Model parameter estimates of the RID Weibull hazard model are shown in Table 1. Compared to the initial estimates, these estimates show that informative dropout had a minor influence on model parameters. All parameters were well estimated with relatively small standard errors.

The simultaneous estimation of the initial model lasted nearly 2 days on an IBM IntelliStation Z Pro workstation, and thus was applied only in addition to the final model. For both models, similar results were reached as with the sequential estimation method.

### Model validation

Validation results for the longitudinal and dropout models are described separately below.

#### Pharmacokinetic/pharmacodynamic model validation

Conditional validation of PK/PD component of initial joint PK/PD-dropout model: mean absolute differences in model predicted and observed frequencies for different treatment periods

| Discrepancies in % unit | Week 0–12: placebo | Week 0–12: ustekinumab 45 or 90 mg | Week 12–28: placebo cross-over | Week 12–28: ustekinumab 45 or 90 mg | Week 28–52: ustekinumab 45 or 90 mg | Week 52–100: ustekinumab 45 or 90 mg | Week >100: long-term extension (ustekinumab 45 or 90 mg) |
|---|---|---|---|---|---|---|---|
| Prob(PGA ≤ 1) | 2.2 | 6.7 | 8.1 | 2.5 | 5.0 | 3.8 | 6.1 |
| Prob(PGA ≤ 2) | 6.6 | 4.4 | 8.5 | 2.3 | 3.6 | 6.6 | 5.3 |

The discrepancy in the initial period of treatment 2 appeared to be primarily due to the larger placebo response in PHOENIX 1 when compared with PHOENIX 2. This was attributed in part to the discrepancies in predicting the treatment onset period. In addition, the treatment onset, during which the treatment effect was increasing, can also be expected to be more difficult to model than the treatment optimization period, where the effect was more stable.

#### Dropout model validation

### Final model

The PK/PD-dropout model was re-estimated with combined data from PHOENIX 1 and PHOENIX 2, and the parameter estimates are shown in Table 1. The baseline intercept and ID parameter estimates changed little from the initial model. Estimates of the placebo effect changed, with a higher maximum effect but slower rate of onset. This can be expected from the discrepancy between the placebo effects in the two studies. This may, in part, contribute to some changes in drug effect estimates. Additionally, the baseline dropout hazard rate estimate decreased. The changes could be expected from the validation results. Standard errors of parameter estimates also decreased, as the result of including more data.

## Discussion

Ustekinumab, a first-in-class anti-IL12/23p40 human monoclonal antibody, is highly effective in treating moderate-to-severe plaque psoriasis [22, 23]. The current PK analysis included additional long-term data as well as data below the LLOQ, and its results were consistent with those from earlier analyses [8, 9], which increases confidence in the understanding of the PK of ustekinumab. The semi-mechanistic model described the PGA score well. As a categorical endpoint, the PGA score is less informative than continuous endpoints and did not allow assessment of between-subject variability for most model parameters of interest, which limits our ability to discern covariate influences on model parameters. Nevertheless, the model can still serve to guide future selection of dosing regimens. In addition, this establishes a needed modeling framework to describe PGA scores. When combined, these results are novel and important, especially because the PGA score is gaining prominence as a primary endpoint in psoriasis clinical trials.

An alternative approach to modeling the placebo and disease progression effects is to have them influencing an indirect model parameter [10, 14, 27]. In the absence of a mechanistic rationale, the current approach (Fig. 1) enables direct observation of these effects, thus estimation may be easier. To our knowledge, the interpretation of a disease progression effect has not been applied to exposure–response modeling of categorical data. The disease progression effect may be especially noticeable with long-term data over 1 year. It was shown that an apparent tolerance may be confounded with disease progression, and that the current data did not appear to support the presence of tolerance. This further supported the sustainable efficacy of ustekinumab for the treatment of moderate-to-severe psoriasis.
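The idea of letting placebo and disease progression effects act directly on the response scale can be sketched with a cumulative-logit model for the 6-point PGA score. This is a minimal illustration under stated assumptions, not the model of Fig. 1: the exponential-onset placebo term, the linear disease progression term, the sign convention (positive effects raise the probability of lower scores), and all parameter values are hypothetical.

```python
import numpy as np

def placebo_effect(t, p_max, k_p):
    """Assumed placebo effect: rises to a maximum p_max at rate k_p."""
    return p_max * (1.0 - np.exp(-k_p * t))

def progression_effect(t, slope):
    """Assumed disease progression: linear drift on the logit scale."""
    return slope * t

def cumulative_probs(alphas, placebo, progression, drug, eta=0.0):
    """P(PGA <= k) for each cutoff k; all effects add directly on the logit scale.

    alphas: increasing intercepts, one per cutoff (5 cutoffs for scores 0-5)
    eta:    subject-level random effect
    """
    logits = np.asarray(alphas) + placebo + progression + drug + eta
    return 1.0 / (1.0 + np.exp(-logits))

def category_probs(cum):
    """Convert cumulative probabilities into per-category probabilities."""
    return np.diff(np.concatenate([[0.0], cum, [1.0]]))

# Hypothetical parameter values, for illustration only.
alphas = np.array([-6.0, -4.0, -1.5, 1.0, 3.5])
t = 12.0  # weeks
plc = placebo_effect(t, p_max=0.8, k_p=0.2)
dp = progression_effect(t, slope=-0.005)
cum_drug = cumulative_probs(alphas, plc, dp, drug=3.0)
cum_no_drug = cumulative_probs(alphas, plc, dp, drug=0.0)
probs = category_probs(cum_drug)
```

Because each effect is an additive term on the logit, its contribution can be read off directly, which is the practical appeal described above. In an indirect response formulation the same effects would instead modify a rate parameter of a latent variable.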

Our previous study identified a PK/PD relationship between ustekinumab levels and PASI response [10]. PASI is a composite endpoint that scores the proportion of BSA involved with psoriasis as well as the plaque features of induration, scaling, and erythema, whereas the PGA scores only induration, scaling, and erythema. The current results therefore demonstrate a PK/PD relationship between ustekinumab levels and plaque induration, scaling, and erythema. They do not, however, address whether there is a PK/PD relationship between ustekinumab levels and involved BSA, and it remains possible that the association between the PK properties of ustekinumab and PASI response results from the effects of ustekinumab on induration, scaling, and erythema. Additional work is warranted to evaluate the association between ustekinumab levels and improvements in involved BSA.

Appropriately accounting for the dropout effect has been shown to be important in longitudinal data modeling. This is the first application of the ID methodology to exposure–response modeling of ordered categorical data, with practically implementable model evaluation methods developed. This is also the first use of the flexible Weibull hazard function in the framework of informative exposure–response dropout modeling. Hu and Sale [18] stated that ID is more likely to be significant for longer-term data. The current analysis used data that were collected over years instead of weeks as in Hu and Sale [18]. Thus, accounting for ID can be highly important. Hu and Sale [18] have also pointed out that, from a methodological standpoint, ID cannot be proven from observed data but is related to the assumptions of the longitudinal data component. These assumptions thus dictate whether ID results in better estimation of the longitudinal data model.
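The flexibility of the Weibull hazard and the way an informative term can enter it are sketched below. The parameterization h(t) = λ·α·t^(α−1)·exp(β·η) is an assumption chosen for illustration, with η standing in for a subject-level quantity shared with the longitudinal model. The actual link between dropout and the longitudinal component in this analysis is not reproduced here, and all parameter values are hypothetical.

```python
import numpy as np

def weibull_hazard(t, lam, alpha, beta=0.0, eta=0.0):
    """Dropout hazard at time t > 0; beta = 0 removes the informative term."""
    return lam * alpha * t ** (alpha - 1.0) * np.exp(beta * eta)

def weibull_cum_hazard(t, lam, alpha, beta=0.0, eta=0.0):
    """Cumulative hazard, i.e. the integral of the hazard from 0 to t."""
    return lam * t ** alpha * np.exp(beta * eta)

def retention_prob(t, lam, alpha, beta=0.0, eta=0.0):
    """Probability of remaining in the study beyond time t."""
    return np.exp(-weibull_cum_hazard(t, lam, alpha, beta, eta))

# Hypothetical values: alpha < 1 gives a hazard that decreases over time,
# alpha = 1 a constant hazard, alpha > 1 an increasing hazard.
times = np.array([4.0, 12.0, 28.0, 52.0, 100.0])  # weeks
h_decreasing = weibull_hazard(times, lam=0.01, alpha=0.7)
h_constant = weibull_hazard(times, lam=0.01, alpha=1.0)
s_typical = retention_prob(52.0, lam=0.01, alpha=0.7, beta=0.5, eta=0.0)
s_high_eta = retention_prob(52.0, lam=0.01, alpha=0.7, beta=0.5, eta=1.0)
```

With a positive β in this sketch, subjects with a larger η leave the study sooner, so the subjects who remain are no longer a random sample. That selection is what a joint longitudinal-dropout model is meant to account for.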

VPC of an ID model can be challenging, particularly when the future dosing regimen is uncertain, which in principle makes a VPC of dropout infeasible. However, the longitudinal and dropout components of the joint model can be checked separately, using the conditional VPC approach developed here. The approach, however, is more computationally intensive, and approximations may be required when the dataset is large. The extent of dropout influence on longitudinal data will be situation specific and will remain unknown until an appropriate analysis has been conducted. The conditional VPC approach can be used to investigate this, by comparing its result with that of an ordinary VPC.

It will be of interest to develop approaches for dropout VPC that are easier to evaluate, such as those of the Kaplan–Meier type. These will certainly be more computationally intensive compared with the modified Cox-Snell residuals used here, especially for RD and RID models due to the lack of analytical availability of the assumed dropout distributions.
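For readers unfamiliar with Cox–Snell residuals, the sketch below shows the common textbook construction for a plain Weibull dropout model: the residual is the fitted cumulative hazard at the observed time, and a widely used modification adds 1 to the residuals of censored subjects. The specific modification used in this analysis is not reproduced here. The simulated data, parameter values, and censoring time are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Weibull dropout model with cumulative hazard H(t) = lam * t**alpha.
lam, alpha = 0.01, 0.7
n = 20000
censor_time = 100.0  # assumed administrative end of follow-up (weeks)

# Simulate dropout times by inverting the cumulative hazard: H(T) ~ Exp(1).
unit_exp = rng.exponential(1.0, size=n)
true_time = (unit_exp / lam) ** (1.0 / alpha)

observed_time = np.minimum(true_time, censor_time)
dropped = true_time <= censor_time  # True = dropout observed, False = censored

# Cox-Snell residual: cumulative hazard evaluated at the observed time.
cox_snell = lam * observed_time ** alpha

# Modified residual: add 1 for censored subjects, the expected remaining
# amount of a unit exponential beyond the censoring point.
modified = cox_snell + (~dropped).astype(float)

mean_modified = modified.mean()
```

If the dropout model is adequate, the modified residuals should have a mean close to 1, and a plot of their estimated cumulative hazard against the residuals should follow the identity line. For models with random effects in the hazard, the cumulative hazard generally has no closed form and must be obtained by numerical integration, which is the source of the extra computational cost noted above.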

Model validation estimates the predictive capability of the model. Ordinarily conducted VPCs and bootstraps are actually model evaluation techniques. Even when an external dataset is used, comparing observations to individual Bayesian (posthoc) estimates uses observed data twice and therefore tends to provide overly optimistic results. Appropriate model validation should focus on likely future applications instead of the current data, report a quantitative measure, and provide conservative results (if need be) instead of optimistic ones.

In conclusion, a novel semi-mechanistic joint longitudinal-dropout model was developed to link the ustekinumab dosing regimen to PGA scores. Use of the appropriate technique and interpretation of model validation was shown to be important to characterize the predictive ability of the model.

## Acknowledgements

This study was funded by Centocor Research and Development, Inc. The authors are indebted to Dr. Kathleen Seitz and Ms. Alice Zong of Centocor Research & Development, Inc. for their programming support in preparing the analysis datasets. In addition, the authors would like to thank the two reviewers for their insightful comments, and Dr. Rebecca E. Clemente and Mr. Robert Achenbach of Centocor Ortho Biotech Services, LLC, for their excellent assistance in preparing the manuscript.