Appendix
Theoretical details on bivariate normality of \((\hat{\beta }, \hat{\gamma })\)
According to Overgaard et al. (2017), the pseudo-observation approach of this paper produces consistent and asymptotically normal parameter estimates under essentially two conditions. The first condition is that the estimate \(\hat{\theta }\) of \(\theta = E(f(W))\) can be seen as a functional, \(\phi \), of the empirical distribution, \(F_n\), in a Banach space setting, such that \(\phi \) is twice (Fréchet) differentiable with a Lipschitz continuous second order derivative and such that \(\Vert F_n\Vert \) converges at a certain rate. This condition ensures that the pseudo-observations admit the close approximation \(\hat{\theta }_i = \theta + \dot{\theta }(X_i) + \frac{1}{n-1}\sum _{j \ne i} \ddot{\theta }(X_i, X_j) + o_P(n^{-\frac{1}{2}})\), uniformly in i, in terms of the estimator’s first and second order influence functions, \(\dot{\theta }\) and \(\ddot{\theta }\). This, in turn, implies the coarser approximation \(\hat{\theta }_i = \theta + \dot{\theta }(X_i) + o_P(1)\). Since the pseudo-observations thus behave like \(\theta + \dot{\theta }(X_i)\), the second condition is that \(E(\dot{\theta }(X) \mid Z) = E(f(W) \mid Z) - \theta \), which means that the pseudo-observations carry the right information and ensures that the estimating equation is unbiased under the model. The result of Overgaard et al. (2017) is formulated for one-dimensional pseudo-observations but generalizes to multi-dimensional outcomes, in which case the requirements need to hold for each outcome separately.
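To see why the condition on \(E(\dot{\theta }(X) \mid Z)\) gives an unbiased estimating equation, the following short calculation may help. It assumes, as is not restated in this appendix, that the model specifies \(m(\xi ; Z) = E(f(W) \mid Z)\) and that \(\frac{\partial m}{\partial \xi }\) and \(V\) depend on the data only through \(Z\). Under these assumptions,
$$\begin{aligned} E\left( \theta + \dot{\theta }(X) - m(\xi ; Z) \mid Z\right) = \theta + E(f(W) \mid Z) - \theta - m(\xi ; Z) = 0, \end{aligned}$$
so that, by iterated expectations,
$$\begin{aligned} E\left( \left( \frac{\partial m}{\partial \xi }\right) ^T V^{-1} (\theta + \dot{\theta }(X) - m(\xi ; Z)) \right) = 0. \end{aligned}$$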
For pseudo-observations of the Kaplan–Meier estimate \(\hat{S}(t_l)\), the conditions above hold under the assumptions of positivity, i.e. \(P(C> t_l) > 0\), and completely independent censoring, i.e. that C is independent of \((D^*, Z)\), as described by Overgaard et al. (2017) based on the work of Graw et al. (2009) and Jacobsen and Martinussen (2016). For pseudo-observations of \(\hat{\mu }(t_l)\), the conditions were established by Overgaard (2019), see Example 8, under similar assumptions: positivity, completely independent censoring, here meaning that C is independent of \((N^*, D^*, Z)\), and additionally that \(N^*(t_l)\) has a little more than a finite fourth moment.
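As an illustration of the objects these conditions refer to, the following is a minimal sketch of pseudo-observations for the Kaplan–Meier estimate at a single time point. It assumes the usual leave-one-out (jackknife) definition \(\hat{\theta }_i = n\hat{\theta } - (n-1)\hat{\theta }^{(-i)}\), which is not restated in this appendix. The function names `kaplan_meier` and `km_pseudo_observations` are hypothetical and not taken from the paper or any package.

```python
import numpy as np


def kaplan_meier(time, event, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data.

    time  : observed times (event or censoring)
    event : 1/True if the observed time is an event, 0/False if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Loop over the distinct event times up to and including t.
    for s in np.unique(time[event & (time <= t)]):
        at_risk = np.sum(time >= s)
        n_events = np.sum((time == s) & event)
        surv *= 1.0 - n_events / at_risk
    return surv


def km_pseudo_observations(time, event, t):
    """Leave-one-out pseudo-observations n*S_hat(t) - (n-1)*S_hat^{(-i)}(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = kaplan_meier(time, event, t)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i
        loo = kaplan_meier(time[keep], event[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
    return pseudo
```

Without censoring, the Kaplan–Meier estimate reduces to the empirical survival function, and the pseudo-observations reduce to the indicators \(1(T_i > t)\). This gives a simple sanity check of the sketch.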
The result of Overgaard et al. (2017) is that, under regularity conditions, estimates, \(\hat{\xi } = \hat{\xi }_n\), exist that solve (4) with high probability for large n such that
$$\begin{aligned} \sqrt{n}(\hat{\xi }_n - \xi ) \end{aligned}$$
is asymptotically normal with mean 0 and variance
$$\begin{aligned} M^{-1} \Psi M^{-1}, \end{aligned}$$
where
$$\begin{aligned} M = E\left( \left( \frac{\partial m_i}{\partial \xi }\right) ^T V_i^{-1} \frac{\partial m_i}{\partial \xi } \right) \end{aligned}$$
and
$$\begin{aligned} \Psi = {\text {Var}}\left( \left( \frac{\partial m_i}{\partial \xi }\right) ^T V_i^{-1} (\theta + \dot{\theta }(X_i) - m(\xi ; Z_i)) + h(X_i)\right) \end{aligned}$$
with
$$\begin{aligned} h(x) = E\left( \left( \frac{\partial m_i}{\partial \xi }\right) ^T V_i^{-1} \ddot{\theta }(x, X_i) \right) . \end{aligned}$$
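A plug-in version of the sandwich form \(M^{-1} \Psi M^{-1}\) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the residuals are taken to be pseudo-observations minus fitted means, each \(V_i\) is assumed symmetric, and the contribution \(h(X_i)\) is omitted because it is not directly observable. The function name `sandwich_variance` and the array layout are hypothetical.

```python
import numpy as np


def sandwich_variance(D, V, resid):
    """Plug-in sandwich estimate of the variance of xi_hat (h term omitted).

    D     : (n, q, p) array, D[i] = d m_i / d xi evaluated at xi_hat
    V     : (n, q, q) array of symmetric working covariance matrices
    resid : (n, q) array, pseudo-observation minus fitted mean m(xi_hat; Z_i)
    """
    D = np.asarray(D, dtype=float)
    V = np.asarray(V, dtype=float)
    resid = np.asarray(resid, dtype=float)
    n = D.shape[0]

    # W[i] = D[i]^T V[i]^{-1}, using symmetry of V[i]; shape (n, p, q).
    W = np.transpose(np.linalg.solve(V, D), (0, 2, 1))

    # M_hat: average of D_i^T V_i^{-1} D_i.
    M_hat = (W @ D).mean(axis=0)

    # U[i] = D_i^T V_i^{-1} resid_i; Psi_hat is the average outer product.
    U = np.einsum("ipq,iq->ip", W, resid)
    Psi_hat = U.T @ U / n

    M_inv = np.linalg.inv(M_hat)
    # Divide by n to move from the variance of sqrt(n)(xi_hat - xi)
    # to the variance of xi_hat itself.
    return M_inv @ Psi_hat @ M_inv.T / n
```

In the simplest case of a scalar outcome with an intercept-only model and unit working variance, this reduces to the empirical variance of the residuals divided by n.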
In summary, the suggested pseudo-observation approach produces consistent and asymptotically normal parameter estimates under the assumptions
1. positivity, \(P(C> t_k) > 0\),
2. completely independent censoring, i.e. C is independent of \((N^*, D^*, Z)\),
3. a little more than finite fourth moment of \(N^*(t_k)\).
It is worth noting that the suggested estimate of \(\Psi \) can be expected to consistently estimate \({\text {Var}}\Big (\big (\frac{\partial m_i}{\partial \xi }\big )^T V_i^{-1} (\theta + \dot{\theta }(X_i) - m(\xi ; Z_i))\Big )\) but not \({\text {Var}}\Big (\big (\frac{\partial m_i}{\partial \xi }\big )^T V_i^{-1} (\theta + \dot{\theta }(X_i) - m(\xi ; Z_i)) + h(X_i)\Big )\). In other words, the contribution of the second order term h is not included, and so the estimate of \(\Psi \), and thereby the standard errors from the sandwich variance estimator, can be expected to be biased.
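The size of the omitted contribution can be made explicit by expanding the variance. Writing \(A_i = \big (\frac{\partial m_i}{\partial \xi }\big )^T V_i^{-1} (\theta + \dot{\theta }(X_i) - m(\xi ; Z_i))\), a shorthand introduced here for illustration only, the standard variance-of-a-sum identity gives
$$\begin{aligned} \Psi = {\text {Var}}(A_i + h(X_i)) = {\text {Var}}(A_i) + {\text {Cov}}(A_i, h(X_i)) + {\text {Cov}}(h(X_i), A_i) + {\text {Var}}(h(X_i)). \end{aligned}$$
The suggested estimate targets only the first term, so the asymptotic discrepancy is the sum of the two covariance terms and \({\text {Var}}(h(X_i))\). Since the covariance terms need not be positive semi-definite, this identity alone does not determine the direction of the bias.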
Plots from simulation of bivariate normality of \((\hat{\beta }, \hat{\gamma })\)
This appendix displays additional plots visualizing the bivariate normal distribution of \((\hat{\beta }, \hat{\gamma })\) for different parameter settings and k.
\((n,\lambda _0^D, \beta ,\gamma _D, \rho ) = (100, 0.25, 0.5, 0.2, 1)\) and \(t=2\)
See Appendix Fig. 8.
\((n,\lambda _0^D, \beta ,\gamma _D, \rho ) = (100, 0.25, 0.5, -0.2, 1)\) and \(t=2\)
See Appendix Fig. 9.
\((n,\lambda _0^D, \beta ,\gamma _D, \rho ) = (100, 0.25, 0.5, 0.2, 0.75)\) and \(t=(1,2,3)\)
See Appendix Fig. 10.
\((n,\lambda _0^D, \beta ,\gamma _D, \rho ) = (100, 0.25, 0.5, 0.2, 1)\) and \(t=(1,2,3)\)
See Appendix Fig. 11.
\((n,\lambda _0^D, \beta ,\gamma _D, \rho ) = (100, 0.25, 0.5, -0.2, 1)\) and \(t=(1,2,3)\)
See Appendix Fig. 12.