Commentary to the paper by Walter Dempsey and Peter McCullagh


Introduction
The authors are to be congratulated for a very interesting paper. They are also to be thanked for recommending chapter 8 of our book on dynamic prediction (van Houwelingen and Putter 2012). The data in that chapter are extensively analyzed in the unpublished PhD thesis of Mark de Bruijne. One chapter is published as de Bruijne et al. (2001a). Unfortunately the chapter using the revival process never got published. A preprint (de Bruijne and van Houwelingen 2001b) is available.
I am pleased by the introduction of the concept of "stale measurement". It is related to the concept of "ageing covariate" in section 5.3 of van Houwelingen and Putter (2012). In de Bruijne et al. (2001a) the concept "TEL(t) = time elapsed since last observation" is introduced as a tool to adjust for the staleness of observations. It is a nice feature of the revival approach that TEL(t) is inherently taken into account.
My main interest is the predictive use of the revival process. My comments arise from this preoccupation with prognosis. My plea for robustness in van Houwelingen (2014) arises from the need to validate prognostic models in new data or by cross-validation. Robustness is needed to ensure that the models are validation-proof. In the paper, the robustness of the revival model is explored in the supplementary material, but no attempt is made to check the robustness of the prediction model. Robustness of the implied prediction model is also an important issue in Rizopoulos et al. (2014) and Rizopoulos et al. (2017).
In this commentary I will focus on four issues: visualization of the data, more insight into the information carried by the revival process, validation of the implied prediction model, and an alternative for the P(T < ∞) = 1 assumption. To clarify my comments I will use the CSL1 data and the standard model of section 6 (exponential marginal survival with λ_0 = 0.164 and a revival model based on the uncensored observations).

Visualization
The two graphs in Fig. 1 help to give more insight into the data structure. The left panel shows the Kaplan-Meier estimates for the censoring function, the survival function with its exponential fit, and the fraction still at risk. The high rate of early censoring is a bit unexpected, but its discussion is beyond the scope of this commentary. The interesting point for me is that t = 9 appears to be the observation limit in these data. Only 43 patients carry information about what happens after t = 9, and most of them (36) are censored. Anything said about what happens after t = 9 is very speculative.
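The curves in the left panel can be reproduced in outline with a product-limit estimator. The following is a minimal sketch on made-up toy data, not the CSL1 data; the function name `kaplan_meier` and the variable names are hypothetical, and only the value 0.164 for the exponential rate is taken from the standard model. The censoring function is obtained by the usual device of swapping the roles of death and censoring.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # number of events at time t
        m = 0  # number of subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            m += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= m
    return curve

# Toy follow-up times and death indicators (assumed values, for illustration only).
times = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
died = [1, 1, 0, 1, 0, 1]

survival = kaplan_meier(times, died)                    # survival function
censoring = kaplan_meier(times, [1 - e for e in died])  # censoring function

# Exponential marginal survival exp(-lambda0 * t), evaluated at the event times.
lambda0 = 0.164
exp_fit = [(t, math.exp(-lambda0 * t)) for t, _ in survival]
```

Plotting `survival` against `exp_fit` gives the kind of visual check of the exponential fit that the left panel provides.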
The right panel is an attempt to visualize how long patients are still followed up for survival after the last measurement. For each patient the difference between the observed survival/censoring time and the time of the last observation can be read off as the horizontal distance between the isolated dots and the dots on the 45° line. One might wonder what happened to the patients with a wide gap between the last measurement and the survival time, but that issue is also beyond the scope of this commentary.
Figure 2 shows the expected value μ(s) of the revival process for T = 1, ..., 9, presented in follow-up time t. The solid graphs show the curves for "Null Treatment", which can be seen as the expected value corrected for the additive treatment effect. The steep decrease near t = T seems promising for the use of the revival process in dynamic prediction of survival. However, there is substantial variation in the data. The total variance computed from the three variance components in the model is 625, giving a standard deviation sd = 25. The tolerance regions μ ± 2 · sd are shown by the dotted lines. The large variation suggests that it would not be easy to infer the future T from the data available at time t.

More insight in the information carried by the revival model
If we ignore the uncertainty in the regression parameters, we have a model f(observations | T). Given the observation history, inference on T can be made in a very classical way by computing the log-likelihood of the data. In the main paper such a log-likelihood is shown in the left panel of Figure 5 for one specific case. To get more insight, a kind of landmark analysis was carried out in which all 278 patients still at risk at t = 2 are considered. The number of preceding observations varied from 2 to 9, with mode 4. For each individual the log-likelihood ll(T) of the standard model is computed for survival times 2 ≤ T ≤ 9. For each individual the location T_max of the maximum is obtained, together with a quasi χ² = 2 · (ll_max − ll_min). If this χ² < 3.84 and T_max is not on the boundary of the interval, the 95% confidence region for T contains the whole interval [0,9]. A summary of the results is given in Table 1. Figure 2 helps to understand what is going on. If the last observation is quite low, the best fitting curve would be obtained by T_max < 2. However, the patient is still alive at t = 2, which moves T_max to 2. If some observations are quite high, that would be an indication for survival beyond T = 9, and T_max will end up at the right boundary. The situation is more subtle because of the random patient effect, but it is clear that it will not be easy to predict T at t = 2.

Validation of the implied prediction model
[...] patients still at risk, obtain the predictive distribution from the standard model using the observations available at t_LM, and compare that with the actual survival data. For the sake of robustness a horizon t_hor is fixed, and it is investigated how well the (conditional) survival up to t_hor can be predicted. Table 2 shows the results for t_LM = 1, 2, ..., 7 and t_hor = t_LM + 2, where n is the number at risk.
First a simple calibration of the marginal survival is obtained through the model λ_cal = c · λ_0 applied to the patients at risk at t_LM and administratively censored at t_hor. The table shows the estimate ĉ and its standard error. The apparent need for this correction can already be seen from Fig. 1.
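As an illustration of this calibration step, here is a minimal sketch on made-up toy data, not the CSL1 data. It assumes that "at risk at the landmark" means an observed time strictly larger than the landmark time, and it uses the standard constant-hazard maximum likelihood estimate (deaths divided by person-time) with the large-sample standard error ĉ/√d, where d is the number of deaths in the window. The function names `landmark_set` and `calibration_factor` are hypothetical.

```python
import math

def landmark_set(times, events, t_lm, t_hor):
    """Patients at risk at t_lm, administratively censored at t_hor.

    Returns (exposure, event) pairs, with exposure measured from t_lm.
    Assumption: a patient is at risk at t_lm if the observed time exceeds t_lm.
    """
    rows = []
    for t, e in zip(times, events):
        if t <= t_lm:
            continue  # died or was censored before the landmark
        if t > t_hor:
            rows.append((t_hor - t_lm, 0))  # administrative censoring at the horizon
        else:
            rows.append((t - t_lm, e))
    return rows

def calibration_factor(rows, lambda0):
    """Estimate c in lambda_cal = c * lambda0 under a constant hazard."""
    deaths = sum(e for _, e in rows)
    exposure = sum(x for x, _ in rows)
    c_hat = deaths / (lambda0 * exposure)
    # Large-sample standard error from the observed information of the rate.
    se = c_hat / math.sqrt(deaths) if deaths > 0 else float("nan")
    return c_hat, se

# Toy data (assumed values, for illustration only).
times = [1.0, 2.5, 3.0, 3.5, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 0]

rows = landmark_set(times, events, t_lm=2.0, t_hor=4.0)
c_hat, se = calibration_factor(rows, lambda0=0.164)
```

A value of ĉ well away from 1, relative to its standard error, would point to miscalibration of the exponential marginal survival in that landmark window.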
Next we consider the cumulative hazard H_pred from t_LM up to t_hor as obtained from the standard model for each patient. This can be seen as a summary of the prognosis. The modeled conditional survival is exp(−H_pred). The standard deviation of H_pred gives insight into the variation in prediction between the patients in the landmark data set. The performance of the model can be checked through the exponential calibration model ln(λ(t | H_pred)) = α_c + β_c · ln(H_pred).
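To see which values of the calibration coefficients correspond to a perfectly calibrated prediction, it helps to write out the relation between a constant hazard and the cumulative hazard over the prediction window. The short derivation below assumes that the exponential calibration model treats the hazard as constant between the landmark and the horizon, and uses the window width of 2 from the landmark set-up.

```latex
\begin{align*}
H_{\mathrm{pred}}
  &= \int_{t_{LM}}^{t_{\mathrm{hor}}} \lambda \, du
   = \lambda \,(t_{\mathrm{hor}} - t_{LM})
   = 2\lambda , \\
\ln \lambda
  &= \ln(0.5) + 1 \cdot \ln H_{\mathrm{pred}} .
\end{align*}
```

Matching this with the calibration model gives the reference values for the intercept and the slope against which the fitted coefficients can be judged.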
The calibration of the conditional survival is perfect if β_c = 1 and α_c = ln(0.5) = −0.69. My cautious conclusion is that the standard model is not well calibrated, but the predicted cumulative hazard might be a useful tool in landmark type models because of its significance as shown in the last column. The Weibull model might be better calibrated, but I did not check that.

An alternative for the P(T < ∞) = 1 assumption
[...] make sense. The advantage of the semi-parametric Cox model is that it does not make any statement about what happens after the last observation. My suggestion for an alternative approach is to define an observation limit t_lim, to censor all patients at this limit, to make a revival model for t < t_lim using the uncensored data and for t ≤ t_lim using all patients that survive up to t_lim, and to estimate the marginal survival by Kaplan-Meier. This approach does not need any imputation. Moreover, calibration can now be based on the Cox model as well.
The coefficients in the revival model are shown in Table 3. Table 4 shows the findings of the landmark analysis for the alternative approach. The first observation is that the marginal Kaplan-Meier does not need any calibration because it is model free. The deviations from c = 1 are due to the discrete nature of the Kaplan-Meier. The second observation is that the "prediction tool" H_pred shows more variation within and between landmark sets than in Table 2. Next, we see that the calibration through the exponential model is much better: the estimates of β_c are much closer to one than in the standard model. Finally, we see that the calibration Cox model gives virtually the same β as the exponential model, with standard errors that are marginally smaller.

Conclusion
The revival approach can be a very useful tool for taking account of the observation time in prediction models. The comparison of different approaches to obtain prediction models using the calibration in the full data set as shown above might be optimistically biased. To avoid this optimism bias some form of cross-validation is needed, but that is beyond the scope of this commentary.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.