A hybrid landmark Aalen-Johansen estimator for transition probabilities in partially non-Markov multi-state models

Multi-state models are increasingly being used to model complex epidemiological and clinical outcomes over time. It is common to assume that the models are Markov, but the assumption can often be unrealistic. The Markov assumption is seldomly checked and violations can lead to biased estimation of many parameters of interest. This is a well known problem for the standard Aalen-Johansen estimator of transition probabilities and several alternative estimators, not relying on the Markov assumption, have been suggested. A particularly simple approach known as landmarking have resulted in the Landmark-Aalen-Johansen estimator. Since landmarking is a stratification method a disadvantage of landmarking is data reduction, leading to a loss of power. This is problematic for “less traveled” transitions, and undesirable when such transitions indeed exhibit Markov behaviour. Introducing the concept of partially non-Markov multi-state models, we suggest a hybrid landmark Aalen-Johansen estimator for transition probabilities. We also show how non-Markov transitions can be identified using a testing procedure. The proposed estimator is a compromise between regular Aalen-Johansen and landmark estimation, using transition specific landmarking, and can drastically improve statistical power. We show that the proposed estimator is consistent, but that the traditional variance estimator can underestimate the variance of both the hybrid and landmark estimator. Bootstrapping is therefore recommended. The methods are compared in a simulation study and in a real data application using registry data to model individual transitions for a birth cohort of 184 951 Norwegian men between states of sick leave, disability, education, work and unemployment. Supplementary Information The online version contains supplementary material available at 10.1007/s10985-021-09534-4.


Online Resource A
Supplementary material for the simulation study in Section 4 of the main document A 1 Testing the Markov assumption A 1.1 Experiment 1 From Figure A 1 we see that the larger the frailty variance described in Section 4 of the main document is (corresponding to a more right-skewed frailty distribution), the larger is the rejection rates of the point tests. We also see that the point tests for transitions without frailties are close to the 5 percent level. The shaded areas in the test plots are 95% pointwise Agresti-Coull confidence intervals (see e.g. Agresti and Coull ((1998))). These confidence intervals have shown to exhibit robust small sample properties and endpoint properties (i.e. with success rates close to zero or one). These are favourable features for the point tests since the sample size of the tests shrinks with increasingly late landmark time points. Furthermore, the true rejection rates of Markov transitions should be close to 0.05 (the α level) and endpoint robustness is therefore also desirable. The same type of confidence intervals are produced for the grid test. Some of the upper confidence limits of the grid test are above 1. This is due to the asymptotic formula for the intervals but of course practically impossible to achieve.  , 9, 12, 14, 17, 20, 22, 25, 28, 30} The grid tests in Table A 1 tell more or less the same story as the point tests in Figure A 1, but are slightly more conservative. This behaviour is expected because of the construction being a supremum of many point tests. Judging from the test statistics we see a clear indication of non-Markov behaviour in transition 2 → 1, but the main question is of course; is this behaviour sufficiently non-Markov to suggest a change of estimator? To answer this question we can consider the MRSE plots in the main document. When comparing the rejection rates of Figure A 2 (point test) or Table A 2 (grid test) with the MRSE plots for the transition probabilities in the main text, an interesting observation stands out. Looking at transitions 1 → 2 and 2 → 3, we see that differences in hazard estimates (suggested by the test statistics) for a particular transition, does not necessarily imply a difference between the corresponding transition probability estimates. One reason for this is that the transition probability really depends on the relation between multiple hazards rather than one particular hazard function and the net effect on the transition probability of heterogeneity in multiple transitions is not clear. Furthermore, from the model specification, the test procedure has some natural constraints in terms of which groups it is able to compare based on landmark time and state. In other words the groups we are able to compare based on landmarking are not necessarily those, which reveal the underlying heterogeneous effects. Therefore we can have situations where the test will not be able to detect significant differences although they exist.

A 2 Coverage and confidence intervals
To study the coverage and behaviour of Greenwood based confidence intervals we illustrate these for simulation experiment 1 and σ 2 = 1.2. See Figure

Online Resource B
Supplementary material for the application in Section 5 of the main document B 1 Testing the Markov assumption A formal test for identifying Markov and non-Markov transitions can be based on the logrank test as described in Section 3.2 of the main document, testing for differences in transition intensities in the landmark sample and the rest of the data (full dataset minus the landmark subset). Results for the two examples in Section 5 of the main article follows in Table B 1 and  Table B 2. Contrary to the simulation experiment, the p-values are here based on standard log-rank asymptotics; i.e. based on the chi-square distribution and not on wild bootstrapping.

Online Resource C On the identification of testable transitions
When constructing the HAJ estimator using two sample tests one needs to decide which transitions are amenable for testing. We will divide the complete set of transitions in the multi-state model into three disjoint sets E = M ∪ N ∪ O. Here M is the set of Markov transitions (i.e., the set of transitions for which Markovianity is assumed without testing), N is the set of non-Markov transitions (for these landmark estimates are used in the HAJ), and O is the set of transitions that are open for testing. Obviously, different choices of M, N and O may lead to different estimators. Choosing N = O = ∅ leads to the AJ estimator, while the choice of M = O = ∅ leads to the LMAJ estimator. An interesting observation is that many models will have untestable transitions. Take e.g. the transition from healthy state to death in an illness death model without recovery. In fact, if we by design cannot test a transition it is because the transition hazard estimate will be the same regardless of whether we landmark or not. Thus non-Markov behaviour only matters in so far as it is detectable in testable transitions. If we let G := (G, E) be the (bi)-directed graph representing transitions of X. A necessary and sufficient criteria for testing the transition i → j is the existence of subsets U, V in E satisfying: U ∩ V = ∅ and there exists paths p U , p V with root notes in U, respectively V such that p U ∩ p V = (i → j).
(C 1) From the criteria (C 1) we also see that if there exists a loop from U to U , the transitions out of U can be tested, at least in principle.