Appendix
Because \(M_i(t)\) and \(M_i^*(t)\) are orthogonal, the predictable variation process of the score process can be expressed as
$$\begin{aligned}&\langle U \rangle (t; \theta ) \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\partial }{\partial \theta } \left\{ \log \lambda _i(u; \theta ) - \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] \right\} \right) ^{\otimes 2}Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u \\&\qquad + \sum _{i=1}^n \int _0^{t} \left( \frac{\partial }{\partial \theta } \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] \right) ^{\otimes 2} Y_i(u)\rho _i(u) \,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta ) + \rho _i(u)}\right) ^{\otimes 2}Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u\\&\qquad + \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta ) + \rho _i(u)}\right) ^{\otimes 2} Y_i(u)\rho _i(u) \,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right. - \frac{2 \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] } +\left. \frac{\lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2} \right) \\&\qquad \times Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u + \sum _{i=1}^n \int _0^{t} \frac{\lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2} Y_i(u)\rho _i(u)\,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta ) + \rho _i(u)}\right) Y_i(u)\,{\text {d}} u. \end{aligned}$$
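After factoring out the common \(\lambda _i'(u; \theta )^{\otimes 2}\) term, the simplification above reduces to a scalar identity in \(\lambda \) and \(\rho \). As a quick numerical sanity check (a sketch only; the values of \(\lambda \) and \(\rho \) below are arbitrary illustrative positives, not taken from any model in the text):

```python
import random

# Check the scalar identity underlying the last equality above:
#   (1/lam - 1/(lam + rho))^2 * lam + rho / (lam + rho)^2
#     == 1/lam - 1/(lam + rho),
# where lam plays the role of lambda_i(u; theta) > 0 and
# rho plays the role of rho_i(u) >= 0 at a fixed time u.
random.seed(1)
for _ in range(1000):
    lam = random.uniform(0.01, 10.0)  # illustrative lambda_i(u; theta)
    rho = random.uniform(0.0, 10.0)   # illustrative rho_i(u)
    lhs = (1.0 / lam - 1.0 / (lam + rho)) ** 2 * lam + rho / (lam + rho) ** 2
    rhs = 1.0 / lam - 1.0 / (lam + rho)
    assert abs(lhs - rhs) < 1e-9 * max(1.0, abs(rhs))
print("integrand identity verified")
```

The same check applies coordinatewise for vector-valued \(\theta \), since \(\lambda _i'(u; \theta )^{\otimes 2}\) factors out of every term of the integrand.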
The observed information process is given by
$$\begin{aligned} J(t; \theta )&= -\sum _{i=1}^n \int _0^{t} \frac{\partial ^2}{\partial \theta \partial \theta ^\top }\log \lambda _i(u; \theta ) {\text {d}}N_i(u) \\&\quad + \sum _{i=1}^n \int _0^{t} \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] {\text {d}}Q_i(u) \\&= -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) {\text {d}}N_i(u) \\&\quad +\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] - \lambda _i'(u; \theta )^{\otimes 2}}{\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] ^2}\right) {\text {d}}Q_i(u), \end{aligned}$$
where we denoted \(\lambda _i''(u; \theta ) \equiv \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \lambda _i(u; \theta )\).
Using the decompositions (4) and (5), the observed information process can be further written as
$$\begin{aligned}&J(t; \theta ) \\&\quad = -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u\\&\qquad +\,\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )[\lambda _i(u; \theta ) + \rho _i(u)] {-} \lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2}\right) Y_i(u)[\lambda _i(u; \theta ) + \rho _i(u)]\,{\text {d}} u \\&\qquad +\,{\mathcal {E}}(t) \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta ) + \rho _i(u)}\right) Y_i(u) \,{\text {d}} u + {\mathcal {E}}(t), \end{aligned}$$
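The key point in the last equality is that the second-derivative terms \(\lambda _i''(u; \theta )\) cancel between the two drift integrals. This cancellation is again a pointwise algebraic identity, which can be verified numerically (a sketch with illustrative values only; `lam`, `dlam`, `ddlam`, `rho` stand in for \(\lambda _i\), \(\lambda _i'\), \(\lambda _i''\), \(\rho _i\) at a fixed \(u\), with scalar \(\theta \)):

```python
import random

# Check that the lambda'' terms cancel in the drift of J(t; theta):
#   -(ddlam*lam - dlam**2)/lam + (ddlam*(lam + rho) - dlam**2)/(lam + rho)
#     == dlam**2/lam - dlam**2/(lam + rho)
# after multiplying each integrand by its intensity factor.
random.seed(2)
for _ in range(1000):
    lam = random.uniform(0.1, 5.0)     # illustrative lambda_i(u; theta)
    rho = random.uniform(0.0, 5.0)     # illustrative rho_i(u)
    dlam = random.uniform(-3.0, 3.0)   # illustrative lambda_i'(u; theta)
    ddlam = random.uniform(-3.0, 3.0)  # illustrative lambda_i''(u; theta)
    lhs = (-(ddlam * lam - dlam**2) / lam
           + (ddlam * (lam + rho) - dlam**2) / (lam + rho))
    rhs = dlam**2 / lam - dlam**2 / (lam + rho)
    assert abs(lhs - rhs) < 1e-9
print("second-derivative terms cancel as claimed")
```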
where we denoted
$$\begin{aligned} {\mathcal {E}}(t)\equiv & {} -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) {\text {d}} M_i(u)\\&\quad +\,\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] {-} \lambda _i'(u; \theta )^{\otimes 2}}{\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] ^2}\right) \left[ {\text {d}} M_i(u) + {\text {d}} M_i^*(u)\right] . \end{aligned}$$
Since \({\mathcal {E}}(t)\) is a sum of stochastic integrals with respect to the zero-mean martingales \(M_i(t)\) and \(M_i^*(t)\), it has expectation zero, and therefore \(E[\langle U \rangle (t; \theta _0)] = E[J(t; \theta _0)]\). With these results, the asymptotic normality of the maximum partial likelihood estimator \(\hat{\theta }\) can be motivated similarly to the argument for parametric survival models (e.g. Kalbfleisch and Prentice 2002, p. 180). Briefly, assume a scalar \(\theta \) for notational simplicity, and denote \(U(\theta ) \equiv U(\tau ; \theta )\) and \(J(\theta ) \equiv J(\tau ; \theta )\). From the martingale central limit theorem, it follows under the standard regularity conditions that
$$\begin{aligned} \frac{1}{\sqrt{n}} U(\theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)\right) , \end{aligned}$$
where \(\varSigma (\theta _0)\) is such that \(\frac{1}{n} \langle U \rangle (\tau ; \theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)\). The Taylor expansion
$$\begin{aligned} U(\hat{\theta }) = U(\theta _0) - J(\theta _0)\left( \hat{\theta }- \theta _0\right) + \frac{1}{2} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3} \left( \hat{\theta }- \theta _0\right) ^2 \end{aligned}$$
can be used to motivate both the consistency and asymptotic normality of \(\hat{\theta }\), assuming that \(\frac{1}{n} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3}\) is bounded in probability, so that the third term on the right-hand side is asymptotically negligible. In particular, we get
$$\begin{aligned} \sqrt{n} (\hat{\theta }- \theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)^{-1}\right) , \end{aligned}$$
where \(\varSigma (\theta _0)\) is in practice estimated by the average observed information \(\frac{1}{n} J(\hat{\theta })\), evaluated at the maximum likelihood point.
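To spell out the step from the Taylor expansion to this limit (a sketch, under the stated boundedness assumption): since \(U(\hat{\theta }) = 0\) at the maximum, rearranging the expansion and scaling by \(\sqrt{n}\) gives
$$\begin{aligned} \sqrt{n}\left( \hat{\theta }- \theta _0\right) = \left( \frac{1}{n} J(\theta _0)\right) ^{-1} \frac{1}{\sqrt{n}} U(\theta _0) + o_p(1) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)^{-1} \varSigma (\theta _0)\, \varSigma (\theta _0)^{-1}\right) , \end{aligned}$$
where \(\frac{1}{n} J(\theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)\) follows from \(E[\langle U \rangle (t; \theta _0)] = E[J(t; \theta _0)]\) together with the convergence of \(\frac{1}{n} \langle U \rangle (\tau ; \theta _0)\), and the limiting variance simplifies to \(\varSigma (\theta _0)^{-1}\).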