Introduction

Treatment for cancer can affect patient quality of life [1, 2]. Therefore, health-related quality of life (HRQOL) measures are frequently included in randomized clinical trials in oncology. The interpretation of results for HRQOL within these randomized trials can be seriously hampered by drop-out due to relapse and death, especially because these events do affect HRQOL.

In longitudinal studies, observations of patients can be missed at a certain time point because they miss visits or do not fill in certain questionnaires, although they still participate in the study. Consequently, these patients have responses on a subset of outcome measures at different time points. Alternatively patients can drop-out, i.e., not participating in the study after a certain moment. These patients are lost to follow-up, and no information is available. In our study, we define drop-out as not filling in the HRQOL questionnaire after a certain time point, while there could be information about the survival status after that time.

There exist different reasons for drop-out. Except for drop-out caused by relapse or death, drop-out occurs due to censoring, administrative, or person-related reasons. The reasons for drop-out are often referred to as the drop-out mechanism [3]. It is important to distinguish different types of drop-out mechanisms, because each drop-out mechanism requires different analyses to get valid results. A common classification of drop-out mechanisms has been introduced by Rubin and can be summarized as follows: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR) [4].

Drop-out can be considered MCAR if the probability that a patient drops out is independent of the observed (past) response measurements and of current and future response measurements. Or, to put in other words: patients drop-out due to completely unrelated reasons. For example, when HRQOL data are lost due to administrative failure, the drop-out can be considered MCAR: the drop-out is unrelated to HRQOL scores, and therefore does not depend on the observed nor on unobserved HRQOL scores. Under the assumption of MCAR, complete case analysis (including only those patients with complete data) will give valid results.

Drop-out can be considered MAR, if the probability of drop-out depends on previously observed response measures, but not on the current and future measures. However, given the observed response measures, drop-out is assumed to be independent of the unobserved (current and future) response measures. For example, when a patient drops out because the HRQOL score is lower at earlier measurements, the drop-out can be considered MAR. In this situation, one is able to predict the measurement after drop-out based on past measurements, and random effects models (mixed models) would provide unbiased results if the model is correctly specified. In many situations, this drop-out mechanism will be present in the data.

Drop-out is missing not at random (MNAR) if the probability of drop-out is also dependent on the unobserved (current and future) response measures. In this situation, the missing measurement itself is informative. For example, when a patient has a sudden decrease in HRQOL and decides not to fill in the questionnaire, the drop-out is MNAR. In this situation, we can not predict the missing HRQOL of this patient without making further (untestable) assumptions. The reason of a sudden decrease could be the occurrence of tumor relapse, or the fact that a patient is dying. So, drop-out due to tumor relapse and death might be informative.

In trials, it is quite common to analyze longitudinal data with random effects models. This type of models can handle missing data and give valid results if MAR can be assumed. In these models, the patients’ courses over time are specified using regression equations per patient, taking the correlation between the different measurements within patients into account. Therefore, a random effects model helps understanding about how individual patients change across time [5]. However, without modeling, the drop-out mechanism informative drop-out is not taken into account. In the analysis of a trial, this might lead to serious bias, and therefore to wrong conclusions about HRQOL. To understand this, suppose that in a clinical trial, the experimental arm reduces or postpones an event as tumor relapse. Suppose, additionally, that the experimental arm has a larger (negative) impact on HRQOL than the other arm. If patients are less inclined to fill in HRQOL questionnaires after a relapse, then patients in the experimental arm are more likely to fill in the HRQOL questionnaires than patients in the control arm. So, more patients with lower HRQOL will remain in the experimental arm compared to the control arm. In this case, complete case analysis will yield biased results in favor of the standard arm, because patients with missing observations after relapse are not taken into account. A similar argument holds when using last observation carried forward, since the high HRQOL values before relapse are imputed for the missing HRQOL values after death and relapse. It is therefore important to model implicitly or explicitly the drop-out mechanism.

There are several approaches for the analysis of informative drop-out. They are rather complicated. Therefore, their complexity may be the reason why these approaches are not routinely employed in the analysis of randomized clinical trials.

Historically, there are two different likelihood-based approaches for analyses of informative drop-out that differ with respect to the way the joint distribution of responses and drop-out process is factorized (Curran et al. [6]; Michiels et al. [7]). These approaches are the pattern mixture model approach (Hedeker and Gibbons [8], Pauler et al. [9]) and the selection model approach (Diggle and Kenward [10], Little [11], and Curran et al. [12]). In the present paper, we will focus on the pattern mixture approach, because this is a relatively simple extension of the commonly used random effects model assuming MAR.

Another class of models is based on the joint modeling of response measures (e.g., HRQOL) and times to events (e.g., relapse and death). These models estimate the HRQOL measures given that a person is in a certain state at a certain time point. Examples are the models of Kurland and Heagerty [13], and of Diggle et al. [14]. These models take the ordering of event times explicitly into account and regard drop-out as a consequence of events in the past. In this sense, these models can be used for prediction. This is in contrast to the pattern mixture approach where the drop-out process is reflected in patterns and where these patterns are determined retrospectively. This approach is therefore primary appropriate for adjusting the estimates for the missing data mechanism.

In this paper, we will use data of a Dutch multi-center randomized clinical trial in patients with breast cancer as illustration for the use of the pattern mixture approach. In this trial, the long-term impact of two different chemotherapy schedules (high dose vs. standard dose) on HRQOL was compared. The results of the study with respect to the primary outcomes, relapse-free survival, and overall survival have been published by Rodenhuis et al. [15, 16]. Buijs et al. [17] analyzed the HRQOL data comparing both treatment arms assuming that patients remained disease free.

We aimed in this paper to correct the HRQOL estimation for informative drop-out in order to get a valid comparison between schedules, taking events as death and relapse into account. Since we expect that drop-out due to relapse and death are the two main sources of informative drop-out, we propose a pattern mixture approach in which patterns are defined based on the different disease states a person can have at the end of the study period. The different states we use for the patterns are quality of life related such as ‘death’, ‘alive with tumor relapse’, and ‘disease free’. We start considering the ‘usual’ random effects model analysis assuming MAR. Then, we will analyse the data with the pattern mixture approach with HRQOL-related patterns. Thereafter, outcomes of both approaches will be compared.

Methods

Dutch multi-center randomized clinical breast cancer trial

From August 1993 to July 1999, 804 patients from 10 Dutch centers were randomized in the HRQL part of the trial comparing two different chemotherapy schedules. The standard chemotherapy arm consisted of five cycles of 5-fluorouracil (500 mg/m2), epirubicin (90 mg/m2), and cyclophosphamide (500 mg/m2) (FEC). The high-dose chemotherapy arm consisted of 4 identical cycles of FEC followed by one cycle of high-dose chemotherapy comprising cyclophosphamide (6 g/m2), thiotepa (480 mg/m2), and carboplatin (1600 mg/m2) administered over 4 days followed by peripheral blood progenitor cell reinfusion on day 7. There were 885 patients randomized; 804 of them participated in the HRQOL part of the trial of which 405 patients were randomly assigned to the standard therapy and 399 patients to the experimental (high dose) therapy. The main study was a survival study. The primary outcomes were the overall survival and the disease-free survival. Secondary outcome measures were health-related quality of life measures. The Medical Ethical Committees of all participating centers approved the study. All patients gave informed consent before study entry.

Subgroup of patients

At the time of the analysis in 2003, there appeared to be no significant differences following intention to treat analyses with respect to overall and disease-free survival (Rodenhuis et al. [15]). However, a subgroup of patients with normal HER2 expression did benefit of high-dose chemotherapy as far as disease free (P = 0.002) and overall survival (P = 0.02) were concerned (Rodenhuis et al. [16]). In this paper, we focus therefore on this subgroup. Only patients participating in the HRQOL part of the trial are considered, of which 273 patients were randomized to standard dose and 288 to high-dose chemotherapy.

Quality of life assessment

The pattern mixture model is illustrated for one of the subscales of the SF-36 namely the physical functioning (PF) subscale, consisting of items which indicate the ability to perform physical activities as walking, carrying shopping, and climbing stairs. The scores range from 0 to 100, with higher scores representing higher level of functioning. The choice of this subscale was mainly based on illustrative reasons: preliminary analyses on this subscale demonstrated significant and relative large differences in favor of the standard arm. Other subscales demonstrated smaller (and/or not significant) differences.

The questionnaires were sent by mail before randomization (baseline), after chemotherapy (about 3 months), after radiotherapy (about 6 months), and thereafter every 6 months.

Due to the variation in entry date, not all patients could be studied for the maximum period of 5 years and 12 measurements points. Each patient had a study period follow-up of at least 3 years and eight measurement points. We defined patients who were alive but did not have the maximum study period follow-up as censored.

Statistical analysis

The analyses were performed with SPSS (version 16), MLWin (version 2.02), and R (version 2.6.2).

Descriptive

Analyses were based on intention to treat. A cross-sectional comparison of PF between the two doses was performed at each measurement point, presenting the observed mean and standard error per measurement point. In this analysis, differences in PF per time point between both doses were based on different sets of patients. Visual inspection of the course of the means over time gave suggestions for modeling the course over time in the random effect model and pattern mixture model.

Drop-out

A large part of drop-out was due to death and censoring (e.g., patients who did not have the maximum study period follow-up). Patients able to fill in the questionnaires at a certain time point (the patients at risk) are the patients being alive and being not censored at that time point. Percentages of patients with observed HRQOL measures (i.e., responding patients) are based on these patients at risk, and are called response rates. They were calculated for each time point and for each treatment.

A distinction in drop-out due to relapse and due to death must be made. In the case of relapse, we can assume that PF exists and is affected by a relapse, but that it is not measured. The definition of PF after death is not directly obvious. Therefore, three patterns were defined: A pattern consisting of patients who died during the study period (deceased patients), a pattern with patients with relapse who were alive at the end of the study period, and a pattern with patients who remained disease free. A cross-sectional comparison of the PF assessments of these patterns was performed to analyze descriptively whether the course over time was different for the patterns or not, and whether both doses had different responses over time per pattern.

Random effects model (assuming MAR)

A random effects model was fitted on the data with ‘course over time’, ‘treatment’ and ‘interaction course over time by treatment’ as fixed explanatory variables for the PF assuming MAR. In the following, this is called the “final model”. Since we assumed that each patient had her own course over time, time effects were also considered to be random. The ‘course over time’ was specified based on the findings of the visual inspection of the descriptive analysis. P-values smaller than 5% were considered to be significant.

We started with fitting the empty model (Snijders and Bosker) [18], using maximum likelihood. This was a model without explanatory variables; only a fixed intercept was included (parameter for the overall mean) with two random effects. This model estimated the overall mean PF of all patients and time points and estimates the variability of the PF within patients and between patients. Then, we fitted the time model using maximum likelihood. This model included only the fixed and random intercept and fixed and random time effects. This model estimated the mean PF of all patients over time. The difference in deviance (i.e., -2 times the value of the likelihood) between the empty model and time model is an indication for the effect of time on PF, and the deviances can be compared using the likelihood ratio test. Then, the final model ‘interaction course over time by treatment’ was fitted by maximum likelihood. This model estimated the mean PF over time for the different treatment arms. The performance of the final model was evaluated by the deviance compared to the deviance of the empty model and to the deviance of the time model. The difference between the final model and the time model is an indication of the effect of treatment over time on PF. The comparisons of different deviances between different nested models can be regarded as a relative for the fit of the model for the data.

These models assumed only MAR implying that drop-out dependent on the missing PF score itself (for example because of an unpredictable decrease caused by relapse) was not taken into account. The empty model, the time model, and final model are explained and specified in the appendix.

Pattern mixture approach (correction for informative drop-out)

In the pattern mixture approach, the drop-out process and the outcome measures conditional on the drop-out pattern are jointly modeled. The drop-out process was modeled by the probability to belong to a specific drop-out pattern for each dose separately, and estimated by the sample proportion. Patterns were based on the relapse and survival state of the patient at the end of the study period and defined as before: ‘deceased’, ‘alive with relapse’, and ‘disease free’. The outcome conditional on the drop-out pattern was modeled by the random effects model with additional explanatory variables indicating different follow-up patterns of patients. In the following, we call this model, the pattern mixture model. We implicitly assumed that each pattern has its own missing data process and own course over time with respect to PF. Moreover, we assumed that within each pattern, the drop-out is MAR. This means that conditional on being in a certain pattern, the missing PF can be predicted based on observed measurements of patients within this pattern. The pattern mixture model was specified with ‘course over time’, ‘treatment’, and ‘interaction course over time by treatment’ as explanatory variables (as before), and also with the variables ‘pattern’, and all possible interactions with pattern. This model was fitted using maximum likelihood. See the Appendix for the explanation and specification of the model.

Based on the difference in deviances (likelihood ratio test), it was decided which additional explanatory terms could be excluded from the model. This difference can be regarded as a relative measure for the fit of the model on the data.

For each pattern, the course over time per treatment was estimated. In order to compare both treatments for all patterns together, the effects of both doses were estimated by a weighted average over all patterns weighted by their sample proportions. An illustration of this is given in the appendix. With the delta-method (Bisshop et al. [19]), standard errors of the differences between the two treatments were estimated to determine the significance of the effects.

Comparison of the results of the random effects model and pattern mixture approach

The fit of the random effects model and the random effects part of the pattern mixture approach (i.e., the pattern mixture model) were compared based on the difference in deviance (likelihood ratio test). The results of the pattern mixture approach (weighting over all patterns) in terms of estimated differences in PF scores between both treatments arms per time point were compared to those of the random effects model assuming MAR.

Sensitivity analysis

Since in our data, not all patients had the maximum study period of 5 years, the patterns ‘alive with relapse’ and ‘disease free’ consisted of patients for which the disease state after 5 years was not known yet. Therefore, we implicitly assumed that censored patients and patients with maximum study period could be combined within each pattern and that this did not have consequences for the results. To check whether the definition of patterns for all patients (with complete and incomplete follow-up) leads to different results than the definition of patterns for only patients with maximum study period follow-up, we performed a sensitivity analysis by also modeling the data using only patients with complete 5-year follow-up and comparing the results.

Results

Descriptive

In Fig. 1, the observed means and standard errors of PF per arm and per measurement point are presented. The number of patients that contributes to each mean is in Table 1, in the column named ‘response’. For the high-dose arm, there was a large drop in the observed mean after 3 months (just after chemotherapy). There was also a decrease in observed mean for the standard dose although much smaller. After 6 months, the means were increasing again in both treatments. After 1 year, the means in both arms were similar, and remain rather constant, and they were higher than at baseline.

Fig. 1
figure 1

Observed mean and standard error per treatment and per time point for all available PF measures

Table 1 The response rates per follow-up time and per treatment arm, by number at risk, number deceased, and censored patients

Drop-out

In Table 1, the number of patients at risk (i.e., patients being alive and being not censored), the number of deceased, and the number of censored patients (with incomplete study period follow-up) are presented for each time point per chemotherapy, with also the number and percentages of observed PF scores among all patients at risk (response rates). For both schedules, the percentages were comparable for each measurement point.

In Table 2, the numbers of patients for the different patterns are presented. The majority of the patients belonged to pattern 3: ‘disease free’. About 20% died within 5 years. In Fig. 2a, the different observed courses of PF over time in terms of observed means and standard errors are given per pattern. There were small differences between the patterns during the first year, but there was an obvious difference in the observed course over time after 1 year: the observed mean of patients in the pattern ‘deceased patients’ decreased, the mean PF of patients in the pattern ‘alive with relapse’ also decreased but less steep, and the mean PF in the pattern relapse free was rather constant. In Fig. 2b–d, the observed differences between mean PF over time between the doses are presented for each pattern. During the first year, the observed course over time was similar for all patterns: a large drop at 3 months for the high dose, and a somewhat smaller drop for the standard dose. However, after 1 year, the differences between doses were not similar for the patterns. In the patterns ‘deceased’ and ‘alive with relapse’, the observed course over time suggests that patients who get recurrence or die within the study period benefitted somewhat from the high dose in the long run. In the ‘relapse free’ pattern, the observed means of both doses were similar after 1 year with a negligible difference in favor of the standard dose.

Table 2 Number of patients per arm by survival and relapse state
Fig. 2
figure 2

a Observed means and standard errors per time point per pattern. b Observed means and standard errors per arm and per time point for deceased patients. c Observed means and standard errors per arm and per time point for patients alive with relapse. d Observed means and standard errors per arm and per time point for relapse-free patients

Random effects model (assuming MAR)

Based on the findings of the descriptive analysis, the course over time was modeled by four time variables, namely three dummy time variables t0, t1, t2 on the first three time points indicating the measurements at baseline, after 3 months, after 6 months, and one variable tc indicating the time continuously after 1 year. At each measurement point, there could be differences between both doses. Therefore, the main effect ‘dose’ was included in the model, together with the interaction terms of dose with the four time variables. The intercept can be considered as the estimated PF at 1 year for the standard dose. The main effect ‘dose’ can be considered as the estimated difference between both doses at 1 year. The sum of the main dose effect and the interaction effect per time dummy is the difference between doses for each time point (baseline, after 3 months, and after 6 months). The interaction effect with the time variable indicating the time continuously after 1 year indicates the difference between doses in course over time after 1 year. (For the specification of the model, see Appendix). In Table 3, the fitted final model assuming MAR is presented, together with the empty model and the time model. The deviance of the final random effects model (i.e., 39,869) was much smaller than the deviance of the empty model (i.e., 42,022) and the time model (i.e., 40,047), indicating that the dose had a significant effect over time on the PF under the assumption of MAR (P < 0.001). In Fig. 3, the estimated mean PF scores for both doses based on this model are presented. In Table 4, the estimated differences between doses per time point (at baseline, after 3 months, and after 6 months) and the estimated difference in slope after 1 year are presented, together with their standard errors and P-values. Just after chemotherapy, after 3 months up to 1 year, there was a significant difference for PF between both arms in favor of the standard dose. After 1 year, the slope for the high-dose arm was nearly significant larger (P = 0.055) than the standard-dose arm.

Table 3 The fitted empty model, time model, and random effects model; model estimates and standard errors
Fig. 3
figure 3

The estimated course of PF over time per treatment based on the final random effects model assuming MAR

Table 4 The estimated difference between arms for the first time points t0 to t3 and for the slope after 1 year based on the random effects model

Pattern mixture approach (correction for informative drop-out)

The fit of the pattern mixture model

In Table 5, the fitted pattern mixture model is presented. This model consisted of all explanatory variables of the final random effects model assuming MAR together with all possible interactions with ‘pattern’. Since ‘pattern’ has three categories, two dummy variables were defined indicating the pattern “relapse” and the pattern “relapse free”, respectively; the reference category is ‘deceased’. Based on the small difference in deviance (P = 0.48), the main effect ‘pattern’ was excluded from the model.

Table 5 The fitted pattern mixture model; model estimates and standard errors

The estimates of the interactions between the time variables during the first year and the patterns are negative, suggesting that for the standard dose, the pattern “relapse” and the pattern “relapse free” are worse in terms of PF during the first year than the pattern “deceased”. However, these estimates are not significant.

Courses per pattern

In Fig. 4a–c, the estimated course over time for both treatments is given per pattern. It is evident that the results confirmed the findings of the descriptive analysis: the courses over time were very different for the patterns. The differences between treatments within each pattern were also in line with the descriptive analysis: during the first year, the estimated PF scores for the standard dose were larger than those for the high dose. After 1 year, the difference in slope between the high-dose arm and standard-dose arm for the patterns ‘deceased’ and ‘alive with relapse’ was estimated in favor of the high dose. For the pattern ‘relapse free’, the estimated slopes were similar for both patterns, indicating that the small difference in favor of the standard dose remained after 1 year.

Fig. 4
figure 4

a The estimated course of PF over time per treatment for pattern ‘deceased’. b The estimated course of PF over time per treatment for pattern ‘alive with relapse’. c The estimated course of PF over time per treatment for pattern ‘disease free’

All patterns together

To compare the schedules for all patterns together (weighted over all patterns), the estimated PF for both doses is shown in Fig. 5. In Table 6, the differences per time point (at baseline, after 3 months, and after 6 months) and the difference in slope after 1 year together with their standard errors are presented. During the first year, there was a significant difference in favor of the standard-dose arm. After 1 year, the slopes differed significantly in favor of the high dose (P = 0.001). However, in terms of absolute differences in PF between both doses, the differences are very small (see Fig. 5).

Fig. 5
figure 5

The estimated course of PF over time per treatment based on the pattern mixture model

Table 6 The estimated difference between arms for the first time points t0 to t3 and for the slope after 1 year based on the pattern mixture model

Comparison results of both approaches

Comparing the deviance of the pattern mixture model to the final random effects model assuming MAR, we see that the fit was significantly better since the difference in deviance is large (39,626 vs. 39,869); P < 0.0001). This indicates that under the assumption of MAR within each pattern, the pattern mixture model has a better fit than the random effects model.

Comparing the results of the random effects model assuming MAR to the results of pattern mixture approach (weighting over all patterns), see Tables 4 and 6, we see that the estimated differences between both doses were similar for both approaches. However, the estimated difference in slope after 1 year in the pattern mixture approach (0.21) was twice the estimated difference in slope of the random effects model (0.10) and was significant. However, in both approaches, the absolute differences in PF after 1 year were small, as can be concluded comparing Figs. 3, 4, and 5.

Sensitivity analysis

The results of the analysis for only patients for which the disease state was known after 5 years (patients with maximum study period) revealed no differences compared to the results of the analysis for all patients. Therefore, we do not present the results for this restricted number of patients. In our data, the consequences of defining patterns based on patients with different length of study period were negligible.

Discussion

In this paper, the pattern mixture approach as a method for correction for informative drop-out was studied. We illustrated this on data from a large multi-center randomized trial comparing the long-term impact on HRQOL of two different doses chemotherapy for patients with breast cancer. We only considered one dimension of the HRQOL, namely PF. We focused on a subgroup of patients with normal HER2 expression in their tumor for whom it was shown that the (disease free) survival was better for the high dose than for the standard-dose arm (Rodenhuis et al. [16]). We compared the results of the pattern mixture approach to the results of the commonly used random effects model for the subscale PF.

In this particular example, the pattern mixture approach leads to differences with respect to the estimation and significance of the difference in decline of PF measures over time between both arms after 1 year: The estimated difference in decline is doubled from 0.10 (P = .055) in the random effects model to 0.21 (P = 0.001) in the pattern mixture approach. However, the absolute estimated differences in PF between both chemotherapy doses after 1 year are similar in both approaches, and therefore the pattern mixture approach does not lead to other conclusions than the commonly used random effects model.

In studies with a lot of missing values due to drop-out for different reasons, it is important to explore whether the drop-out is informative or not. It is possible that for some outcomes the drop-out is informative, while for other outcomes this is not. An example for this is the case where the mean blood pressure is the main outcome in a study concerning a clinical patient population with high blood pressure, and where patients are repeatedly measured over time until their blood pressure is sufficiently reduced. So, patients are not measured any more (drop-out of the study) when they have “good” blood pressure values. In this case, the drop-out is caused by the previous measurement, and therefore not informative. So if one is planning a study with different sort of outcome measures, one must consider whether the drop-out is informative for each different outcome measure. In our data, we only considered drop-out by death or relapse as the two main sources of informative drop-out.

The crucial (and maybe the hardest) point in applying the pattern mixture approach is the choice of the patterns. The validity of the model estimates is determined by the implicit assumption that within each pattern, the missing data process is MAR. This is an untestable assumption; testing MCAR vs. MAR is possible, but testing MAR vs. MNAR is not (Molenberghs et al. [20]). The choice of the patterns must be made in such a way that this assumption is plausible. The number of patterns could be extended, based on the exact survival time or time of relapse, for example to account for the fact that shortly before death, the PF often drops and the amount of missing responses increases. In our data, we defined the patterns based on HRQOL-related events as death and relapse.

In our model, we had some censored patients, and we used their last observed event state for grouping them into patterns. Compared to the patients with maximum study period, relapse-free patients who are censored could still die or get tumor relapse within 5 years, implying that they belong to the pattern ‘deceased’ or ‘alive with relapse’. Analogously, patients with tumor relapse could still die within 5 years and therefore belongs to the pattern ‘deceased’. The implication of the different length of study period follow-up might be that the assumption of MAR within each pattern is violated. To check this, we also modeled the data for only patients with complete follow-up. Comparing these results with the results of the analysis of all patients, we found similar results with somewhat larger standard errors. This means that for our data, there is no large problem here on this point. But for the application on other data with patterns based on disease states for patients with different lengths of follow-up, one must carefully consider the point whether the data are MAR within the patterns, and check this by sensitivity analysis, if possible.

The patterns chosen here do have clinical relevance. It is clinically interesting to estimate the courses over time for relapse-free patients. This could be interesting in terms of prediction: What is the HRQOL during the next 5 years, assuming that patients remain disease free? However, our model is less appropriate for the prediction for patients alive with relapse, because the pattern is based on the state at 5 year and the estimated PF measures over time are based on PF measures for patients with relapse and patients who did not yet have a relapse at that time point. The pattern “deceased” needs also special attention. Death is different from other reasons of missing, where we can assume that the quality of life exists but is not measured. If and how quality of life after death is defined is a topic of much discussion. The random effects models used in this paper implicitly impute missing PF scores, also after a patient has died. This is done by extrapolating the patient’s individual trajectory before death. One could question this approach. Therefore, one might want to limit the analysis for patients being alive. Pauler et al. [9] proposed to use the pattern mixture model estimates conditional on not being in pattern “deceased”. Diggle et al. [14] discussed extensively the validity of different models accounting for informative drop-out, including the relatively new approaches where measures (as HRQOL) and times to events (as relapse and death) are jointly modeled. Applying the latter models to these particularly data will be a topic for new research.

In general, it is recommended in data with informative drop-out to do sensitivity analysis by using different approaches. The pattern mixture approach we propose here is rather a simple extension of the commonly used random effects models in longitudinal data, and might therefore be routinely employed in the analysis of randomized clinical trials.