1 Introduction

Alzheimer’s disease (AD) is an irreversible and progressive neurodegenerative disorder that slowly destroys memory and thinking skills. There are three general clinical stages based on severity of symptoms: normal cognition (NC), mild cognitive impairment (MCI), and AD diagnosis [1]. Identifying prognostic risk factors for AD progression is important for both clinical research and clinical practice. For cognitively normal people, identified prognostic risk factors can raise awareness of early prevention and improve diagnostic accuracy. For MCI patients, understanding the risk of progression to AD will aid clinical decision-making on targeted therapies, improve adherence to medical treatment, and support planning for long-term care. In most AD clinical studies, participants undergo multiple visits and thus provide longitudinal information over their entire follow-up. Nevertheless, many AD studies have used only baseline information to investigate prognostic risk factors for AD progression, although many risk factors change over time and are often collected longitudinally. For example, one study used a longitudinal dataset from the National Alzheimer’s Coordinating Center (NACC) to assess cognitive complaint in relation to AD progression, but used only baseline cognitive complaint although it was collected longitudinally [2]. Existing evidence suggests that trajectory patterns of AD biomarkers, such as neuroimaging biomarkers, cerebrospinal fluid biomarkers, and neuropsychological performance (e.g., cognitive complaint), depend on the course of AD progression, implying the importance of incorporating time-dependent risk factor information when assessing prognosis [3, 4]. This motivates the present research to model longitudinal ordinal outcomes representing disease progression at time t with time-dependent prognostic risk factors measured at time s while allowing time-varying effects.

The predictive partly conditional model (PPCM) [5], extended from the partly conditional regression model [6], is a well developed approach for analyzing longitudinal outcomes in the presence of time-varying prognostic risk factors as well as time-varying effects. Variants of the PPCM used for predicting time-to-event outcomes using longitudinal data include the partly conditional survival model [7] and the landmark analysis model [8]. Prediction models through this class of approaches have a dynamic feature that prediction of future outcomes can be updated dynamically at varying prediction time points. Other methods for dynamic prediction for time-to-event outcomes were proposed recently. For example, [9] used a joint modeling approach relating the underlying latent process of longitudinal markers with survival outcomes. [10] proposed a two-stage approach with shifts in the hazard function.

This paper investigates the PPCM that models a future ordinal longitudinal outcome Y(t) at time t given patient information \(\varvec{X}(s)\) at time \(s < t\), allowing the effect of \(\varvec{X}(s)\) to depend on its measurement time s. There are a few appealing advantages of the PPCM. First, the parameter estimation of the PPCM is based on a generalized estimating equation (GEE) method with an independent working covariance structure, leading to unbiased estimates. Second, different from other methods [9, 10] which calculate the predicted risk of a time-to-event outcome through a transformed hazard function, the PPCM directly models the predictive mean of longitudinal outcomes. Lastly, the PPCM can incorporate longitudinal risk factors as opposed to baseline information only. Most existing research for the PPCM and its variants is focused on time-to-event data analysis, but little is applied to longitudinal data. Although [5, 6] gave detailed illustrations for the applications of partly conditional models, neither provided a comprehensive simulation study. Moreover, unbiased estimation for the covariance matrix of parameters in the PPCM needs to be considered under a bivariate domain \(\mathcal {D}=\{(s, t): s < t\}\), and requires some additional conditions. These conditions have not been explored in the literature yet, and will be elaborated explicitly in a subsequent section.

The structure of this paper is as follows. Section 2 introduces the theoretical framework of the PPCM, presents the GEE framework for fitting the model, and provides regularity conditions for unbiased estimation of both the parameters and their covariance structure. An extensive simulation study is conducted in Section 3 to assess small-sample properties of the PPCM and to compare it with a traditional approach that ignores time-varying risk factor information, under both true and misspecified models. In Section 4, the PPCM is applied to predict the risk of AD development using data from the National Alzheimer’s Coordinating Center (NACC). Finally, Section 5 summarizes the PPCM approach and discusses potential extensions.

2 Method

2.1 Predictive Partly Conditional Model (PPCM)

Let \(Y(t)=0, 1, \ldots , K\) denote the longitudinal disease stage at time t with \(K + 1\) ordinal levels, and let \(\varvec{X}^*(s)\) be the matrix of p time-dependent covariates at time s. It is unnecessary to adopt a separate notation for time-independent covariates, as they can be viewed as a special case of \(\varvec{X}^*(s)\) with constant values over time. In addition, let \(\mathcal {H}(t)=\{Y(s): s < t, \varvec{X}^*(s): s \le t\}\) denote the disease history up to time t, including \(\varvec{X}^*(t)\) but excluding Y(t).

First, it is important to understand the difference between the fully conditional model, the fully marginal model, and the partly conditional model for Y(t) under a non-predictive setting (i.e., using covariates information up to time t). The fully conditional model refers to \(\textrm{P}\{Y(t) \mid \mathcal {H}(t)\}\), where the distribution of Y(t) is conditional on the entire disease history up to t [11]. This approach accounts for the evolving nature of Y(t) and \(\varvec{X}^*(t)\), but is technically challenging to implement and difficult to draw statistical inference as well. The fully marginal model refers to \(\textrm{P}\{Y(t) \mid \varvec{X}^*(t)\}\), where the disease history up to t (excluding t) is marginalized out of the fully conditional distribution [12]. It has a relatively easy inference procedure, but does not explicitly model the longitudinal process, only providing a cross-sectional interpretation. The partly conditional model [6] is a compromise between the two models and refers to \(\textrm{P}\{Y(t) \mid \varvec{X}(t), M(t)\}\), where \(\varvec{X}(t)\) is a function of \(\mathcal {H}(t)\) involving the entire or partial history of the outcome process, information on death or censoring, time-dependent or time-independent covariates, among others, and M(t) is an implicitly defined variable for selecting subjects at time t in the analysis. For example, if the interest is given to modeling the risk of AD progression for NC subjects, then set \(M(t) = 1\) only for the patients with NC at time t. This is analogous to the at-risk set indicator variable in survival analysis.

The PPCM considered in this manuscript refers to \(\textrm{P}\{Y(t) \mid \varvec{X}(s), M(s)=1\}\), where \(\varvec{X}(s)\) is the variable of interest measured at or up to time s. The PPCM inherits the partly conditional model format, but the focus is switched to estimating the trajectories of predictive distributions with varying s and t [5]. More specifically, the interest is in the predictive information in \(\varvec{X}(s)\) at different \(s < t\) for predicting Y(t) as well as how predictive \(\varvec{X}(s)\) is for predicting Y(t) at different t.

As described in [5], the PPCM can be applied to both binary and continuous outcomes. It can also be easily extended to ordinal outcomes. Since our work is motivated by scientific questions in modeling Alzheimer’s disease progression with an ordinal clinical diagnosis, i.e., NC, MCI, and AD, we present our work in the context of a longitudinal ordinal outcome. It should be noted that our findings are not specific to ordinal outcomes; they are applicable to other outcomes under the PPCM setting.

For an ordinal longitudinal outcome Y(t) with levels \(\{0, 1, \ldots , K\}\), a proportional odds model is defined as

$$\begin{aligned} \textrm{logit}\left[ \textrm{P}\{Y(t) \ge k \mid \varvec{X}(s), M(s)=1\}\right] = \alpha _{k} + \varvec{X}(s)^T \varvec{\beta }(s, t), \end{aligned}$$
(1)

where \(k = 1, \ldots , K\) and \(\varvec{\beta }(s, t)\) is a p-dimensional vector of time-dependent effects. For Y(t) representing an AD clinical stage outcome, the levels are 0 for NC, 1 for MCI, and 2 for AD, so that \(K = 2\). As discussed in [5], \(\varvec{\beta }(s, t)\) can be written as bivariate regression splines in a domain \(\mathcal {D}=\{(s, t): s<t\}\) as \(\varvec{\beta }_q(s, t) = \sum _{b = 1}^{B_q} \beta _q^b C_q^b(s, t)\) for \(q = 1, \ldots , p\), where \(C_q^b(s, t)\) are basis functions. The basis functions should be chosen so that the regression parameter functions \(\varvec{\beta }_q(s, t)\) are reasonably flexible and estimation is numerically stable.
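To make model (1) concrete, the following minimal sketch converts the cumulative logits into per-category probabilities for a single pair (s, t). The function names and all numeric values are hypothetical and chosen only for illustration; the coefficient vector is assumed to be \(\varvec{\beta }(s, t)\) already evaluated at the pair of interest.

```python
import math


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def category_probabilities(alphas, x, beta):
    """Return [P{Y(t) = 0}, ..., P{Y(t) = K}] under model (1).

    alphas: intercepts alpha_1..alpha_K (non-increasing in k).
    x:      covariate vector X(s).
    beta:   beta(s, t) evaluated at the pair (s, t).
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    # Cumulative probabilities P{Y >= k}; P{Y >= 0} = 1 and P{Y >= K + 1} = 0.
    cum = [1.0] + [expit(a + eta) for a in alphas] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(alphas) + 1)]


# Hypothetical parameter values, for illustration only.
probs = category_probabilities(alphas=[-1.0, -3.0], x=[1.0, 0.5], beta=[0.4, 0.2])
```

With \(K = 2\), the three returned values correspond to NC, MCI, and AD, and they sum to one by construction.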

2.2 Estimation

Let \(Y_i(t)\) denote the longitudinal ordinal outcome for the i-th subject with \(n_i\) being the number of visits. For ease of derivation and programming, it is convenient to represent \(Y_i(t)\) by a vector of indicators \(\varvec{Y}_i(t) = \{Y_{i0}(t), Y_{i1}(t), \ldots , Y_{iK}(t)\}\), where \(Y_{ik}(t) = 1\) if \(Y_i(t) = k\) and 0 otherwise, for \(k = 0, 1, \ldots , K\). For instance, if patient i has an NC diagnosis at time t (under the diagnosis scale of NC, MCI, and AD), then \(\varvec{Y}_i(t)=\{1, 0, 0\}\). Moreover, let \(\varvec{\mu }_i(s, t)=\{\varvec{\mu }_{i0}(s, t), \varvec{\mu }_{i1}(s, t), \ldots , \varvec{\mu }_{iK}(s, t)\}\) denote the predictive partly conditional mean vector corresponding to \(\varvec{Y}_i(t)\) given \(\varvec{X}(s)\), i.e.,

$$\begin{aligned} \varvec{\mu }_{ik}(s,t) = \textrm{E}\left[ Y_{ik}(t) \mid \varvec{X}_i(s), M(s) = 1\right] = \textrm{P}\left[ Y_{ik}(t) = 1 \mid \varvec{X}_i(s), M(s) = 1\right] , \end{aligned}$$

for \(k = 0, 1, \ldots , K\). Define an overall mean vector \(\varvec{\mu }_i=\{\varvec{\mu }_i(t_1, t_2), \ldots , \varvec{\mu }_i(t_1, t_{n_i}), \ldots , \varvec{\mu }_i(t_{n_i-1}, t_{n_i})\}\) within \(\mathcal {D}\). The corresponding collection of outcomes becomes \(\varvec{Y}_i=\{\varvec{Y}_i(t_2), \ldots , \varvec{Y}_i(t_{n_{i}}), \ldots , \varvec{Y}_i(t_{n_i - 1}), \varvec{Y}_i(t_{n_i}), \varvec{Y}_i(t_{n_i})\}\). These two one-to-one matched vectors allow all possible predictive paired observations \(\{\varvec{Y}_i(t), \varvec{X}_i(s)\}\) to be incorporated in the PPCM estimation. Overall, there is a total of \((n_i - 1)n_i/2\) elements in the paired \(\varvec{\mu }_i\) and \(\varvec{Y}_i\) when there is no restriction on \(\mathcal {D}\) other than \(s < t\). Lastly, let \(\varvec{\theta }=\{\alpha _k, k=1, \ldots , K; \beta _q^b, q=1, \ldots , p, b=1, \ldots , B_q\}\) denote the vector of all parameters of length \(L = K + \sum _{q=1}^p B_q\).
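The pairing of outcomes with earlier covariate times can be sketched as follows. This is an illustrative enumeration only; the helper names, visit times, and outcome values are hypothetical.

```python
def predictive_pairs(visit_times):
    """All (s, t) pairs with s < t, ordered as in mu_i: by s, then by t."""
    times = sorted(visit_times)
    return [(s, t) for i, s in enumerate(times) for t in times[i + 1:]]


def indicator_vector(y, K):
    """Encode an ordinal outcome y in {0, ..., K} as (Y_0, ..., Y_K)."""
    return [1 if y == k else 0 for k in range(K + 1)]


visits = [1, 2, 3, 4]                # hypothetical visit times t_1..t_4
outcomes = {1: 0, 2: 0, 3: 1, 4: 2}  # hypothetical disease stage at each visit
pairs = predictive_pairs(visits)
# Outcome indicator vectors matched one-to-one with the pairs (s, t).
Y_i = [indicator_vector(outcomes[t], K=2) for _, t in pairs]
```

For four visits this yields \((4 - 1) \times 4 / 2 = 6\) pairs, and the last three outcome entries are taken at \(t_3\), \(t_4\), and \(t_4\), matching the ordering of \(\varvec{Y}_i\) described above.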

Under the PPCM framework, model (1) can be fitted using the GEE approach [13] with the estimating equation specified as

$$\begin{aligned} \sum _{i = 1}^n \varvec{S}_i(\varvec{\theta }) = \sum _{i = 1}^n \varvec{Q}_i^T \varvec{V}_i^{-1} (\varvec{Y}_i - \varvec{\mu }_i)=0, \end{aligned}$$
(2)

where \(\varvec{Q}_i\) is defined as \(\partial \varvec{\mu }_i / \partial \varvec{\theta }\) and \(\varvec{V}_i\) is the working covariance matrix of \(\varvec{Y}_i\).

2.3 Asymptotic Properties

2.3.1 Consistency of the GEE Estimator

The solution of (2) is a consistent estimator for \(\varvec{\theta }\) if \(\textrm{E}[\varvec{S}_i(\varvec{\theta }_0)]=0\). Under the partly conditional model [6], it has been shown that \(\textrm{E}[\varvec{S}_i(\varvec{\theta }_0)]=0\) when an independent working covariance matrix is used, i.e., when \(\varvec{V}_i\) is diagonal. It is also stated in [5] that the same condition is required under the PPCM, although the rationale is not detailed there. The derivation of this condition is provided below.

The most general estimation setup allows both s and t to vary. The condition \(\textrm{E}[\varvec{S}_i(\varvec{\theta }_0)]=0\) can be rewritten as

$$\begin{aligned} \sum _{s = 1}^{n_i - 1} \sum _{t \in \mathcal {D}_t(s)} \sum _{w = 1}^{n_i - 1} \sum _{r \in \mathcal {D}_r(w)} \textrm{E}\left[ q_{il}^{(s,t)}v_i^{(s, t; w, r)}\{\varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\}\right] = 0 \end{aligned}$$
(3)

for each \(l = 1, 2, \ldots , L\), where \(\mathcal {D}_t(s)=\{t: (s, t) \in \mathcal {D}\}\) is the domain of future outcome times t for a given s such that (s, t) is in the domain of interest, and \(\mathcal {D}_r(w)\) can be viewed as an independent copy. Here \(q_{il}^{(s, t)} = \partial \varvec{\mu }_i(s, t)/\partial \theta _l\), and \(v_i^{(s, t; w, r)}\) is the element of \(\varvec{V}_i^{-1}\) corresponding to predictive paired observations at times (s, t) and (w, r).

Since data are included in the analysis only when they are available and relevant, it is convenient to introduce an indicator variable \(I_i(s,t)\) which equals 1 if \(M(s)=1\) and \(\{\varvec{Y}_i(t), \varvec{X}_i(s)\}\) are observed for subject i, and 0 otherwise. Thus, (3) can be further rewritten as

$$\begin{aligned} \sum _{s = 1}^{n_i - 1} \sum _{t \in \mathcal {D}_t(s)} \sum _{w = 1}^{n_i - 1} \sum _{r \in \mathcal {D}_r(w)} \textrm{E}\left[ I_i(s,t) q_{il}^{(s,t)} I_i(w,r) v_i^{(s, t; w, r)} \{\varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\}\right] = 0. \end{aligned}$$
(4)

When \(s = w\) and \(t = r\), contribution from the paired observation \(\{\varvec{Y}_i(t), \varvec{X}_i(s)\}\) and \(\{\varvec{Y}_i(r), \varvec{X}_i(w)\}\) in (4) can be reduced to \(\textrm{E}\left[ I_i(s,t) q_{il}^{(s,t)}v_i^{(s, t; s, t)} \{\varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\}\right] .\) Then

$$\begin{aligned}&\textrm{E}\left[ I_i(s,t) q_{il}^{(s,t)}v_i^{(s, t; s, t)} \{\varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\}\right] \\&= \textrm{E}\left\{ I_i(s,t) q_{il}^{(s,t)}v_i^{(s, t; s, t)} \textrm{E}\left[ \varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\mid \varvec{X}_i(s), I_i(s, t)\right] \right\} \\&= \textrm{E}\left\{ I_i(s,t) q_{il}^{(s,t)}v_i^{(s, t; s, t)} \textrm{E}\left[ \varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\mid \varvec{X}_i(s), M_i(s)\right] \right\} \\&=0 \end{aligned}$$

if we assume \(\textrm{E}\left[ \varvec{Y}_i(t) \mid \varvec{X}_i(s), I_i(s, t) = 1\right] = \textrm{E}\left[ \varvec{Y}_i(t)\mid \varvec{X}_i(s), M_i(s) = 1\right] .\) This assumption states that, given the observed covariates \(\varvec{X}_i(s)\), \(Y_i(t)\) does not depend on whether or not it is observed. This is the non-informative observation process assumption.

When \(s \ne w\) or \(t \ne r\), all off-diagonal elements of \(\varvec{V}_i\) must be equal to 0 in order for (4) to hold. In other words, \(\varvec{V}_i\) needs to be a diagonal matrix such that

$$\begin{aligned} \textrm{E}\left[ I_i(s,t) q_{il}^{(s,t)}I_i(w,r)v_i^{(s, t; w, r)} \{\varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\}\right] = 0. \end{aligned}$$
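As a brief illustrative restatement (not an additional result), with a diagonal \(\varvec{V}_i\) the cross terms in (4) drop out because \(v_i^{(s, t; w, r)} = 0\) whenever \((s, t) \ne (w, r)\), so the condition reduces to a single sum over predictive pairs:

$$\begin{aligned} \sum _{s = 1}^{n_i - 1} \sum _{t \in \mathcal {D}_t(s)} \textrm{E}\left[ I_i(s,t) q_{il}^{(s,t)} v_i^{(s, t; s, t)} \{\varvec{Y}_i(t) - \varvec{\mu }_i(s, t)\}\right] = 0, \end{aligned}$$

where \(I_i(s,t)^2 = I_i(s,t)\) has been used and each summand is zero under the non-informative observation process assumption stated above.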

2.3.2 Consistency of the Covariance Estimation of the GEE Estimator

Asymptotic properties of the parameters estimated under GEE were developed by [14] with the asymptotic covariance matrix of \(\widehat{\varvec{\theta }}\), the estimator for \(\varvec{\theta }\), given as

$$\begin{aligned} \varvec{\Sigma }&=\lim _{n \rightarrow \infty } \left( \sum _{i=1}^n \varvec{Q}_i^T \varvec{V}_i^{-1} \varvec{Q}_i \right) ^{-1} \left\{ \sum _{i=1}^n \varvec{Q}_i^T \varvec{V}_i^{-1} \textrm{Cov}(\varvec{Y}_i) \varvec{V}_i^{-1}\varvec{Q}_i \right\} \\ {}&\qquad {}\times \left( \sum _{i=1}^n \varvec{Q}_i^T \varvec{V}_i^{-1} \varvec{Q}_i \right) ^{-1}. \end{aligned}$$

In general, an estimate of \(\varvec{\Sigma }\) can be obtained by replacing \(\textrm{Cov}(\varvec{Y}_i)=\textrm{E}\left[ \{\varvec{Y}_i-\textrm{E}(\varvec{Y}_i)\}\{\varvec{Y}_i-\textrm{E}(\varvec{Y}_i)\}^T\right]\) with \((\varvec{Y}_i - \varvec{\mu }_i)(\varvec{Y}_i - \varvec{\mu }_i)^T\). However, this cannot be applied directly to the PPCM. The conditional mean of \(\varvec{Y}_i(t)\) given \(\varvec{X}_i(s)\) varies with the measurement time s. Therefore, if there are any paired observations \(\{\varvec{Y}_i(t), \varvec{X}_i(s)\}\) and \(\{\varvec{Y}_i(r), \varvec{X}_i(w)\}\) with \(t = r\), \(\varvec{\mu }_i\) will include two different elements \(\varvec{\mu }_i(s, t)\) and \(\varvec{\mu }_i(w, t)\) corresponding to the same outcome \(\varvec{Y}_i(t)\), and thus the estimation of \(\varvec{\Sigma }\) will be biased. To circumvent this issue, consider a simplified version of the predictive framework. When the time window \(u=t-s\) is fixed to a constant value \(u_0\), each s corresponds to a distinct t, which prevents the inclusion of predictive paired observations with duplicated outcome measurement times.
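The duplication of outcome times, and how a fixed window removes it, can be checked with a small enumeration. This sketch assumes integer visit times and an exact window; the function names and the six-visit subject are hypothetical.

```python
from collections import Counter


def pairs_in_domain(visit_times, u0=None):
    """Pairs (s, t) with s < t; if u0 is given, keep only t - s == u0."""
    times = sorted(visit_times)
    return [(s, t) for s in times for t in times
            if s < t and (u0 is None or t - s == u0)]


def max_outcome_multiplicity(pairs):
    """Largest number of pairs that share the same outcome time t."""
    return max(Counter(t for _, t in pairs).values())


visits = range(1, 7)  # hypothetical subject with six annual visits
unrestricted = pairs_in_domain(visits)        # domain {(s, t): s < t}
fixed_window = pairs_in_domain(visits, u0=3)  # domain {(s, t): t - s = 3}
```

Without the restriction, the outcome at the last visit appears in five pairs; with \(u_0 = 3\), the retained pairs are (1, 4), (2, 5), and (3, 6), so every outcome time appears at most once.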

In summary, under the PPCM, consistent estimation of the parameters requires an independent working covariance matrix, and unbiased estimation of the covariance matrix requires non-duplicated outcome measurement times in the paired observations. Therefore, we suggest using a fixed prediction window \(u_0\), which implies that the domain of interest is \(\mathcal {D}=\{(s, t): t-s=u_0\}\). Under this condition, model (1) can be simplified to

$$\begin{aligned} \textrm{logit}\left[ \textrm{P}\{Y(t) \ge k \mid \varvec{X}(s), M(s)=1\}\right] = \alpha _{k} + \varvec{X}(s)^T \varvec{\beta }(s), k=1, \ldots , K, \end{aligned}$$
(5)

where the interpretation of the parameters is specific to \(u_0\)-year prediction.

2.4 PPCM Dataset Construction and Implementation

The above PPCM estimation can be implemented using existing statistical software by constructing a PPCM dataset from the original dataset. Specifically, we expand the original longitudinal dataset by pairing observations from visit time s and visit time t that satisfy the fixed time window criterion \(t-s=u_0\). Table 1 shows an example patient record in the original longitudinal dataset and how the patient record is expanded to construct the PPCM dataset. For most studies, the time window \(u = t - s\) for many paired observations \(\{Y_i(t), \varvec{X}_i(s)\}\) might be close, but not exactly equal, to \(u_0\), even for studies requiring routine clinical visits. Therefore, we add a bandwidth \(\delta\) to the time window and update the domain of interest to be \(\mathcal {D}=\{(s, t): u_0- \delta \le \mid t-s \mid \le u_0+\delta \}\). The choice of \(\delta\) should depend on \(u_0\) such that the interpretation of \(\varvec{\beta }(s)\), which is specific to a \(u_0\)-year time period, remains reasonably valid for time windows ranging from \(u_0 - \delta\) to \(u_0 + \delta\). For example, for a study with annual visits, \(\delta\) can be 0.5 year.

Table 1 Example patient records in an original longitudinal data and the constructed PPCM data
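The expansion into a PPCM dataset can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (keys 'id', 'time', 'y', 'x'), the function name, and the example values are all hypothetical, and the `at_risk` argument stands in for the selection variable M(s).

```python
def build_ppcm_rows(records, u0, delta, at_risk=lambda rec: True):
    """Pair each visit at time s with later visits at time t such that
    u0 - delta <= t - s <= u0 + delta.

    records: list of dicts with keys 'id', 'time', 'y', 'x' (one per visit).
    at_risk: plays the role of M(s); a row is kept only if it is True at s.
    """
    by_subject = {}
    for rec in records:
        by_subject.setdefault(rec["id"], []).append(rec)

    rows = []
    for subject_id, visits in by_subject.items():
        visits.sort(key=lambda r: r["time"])
        for i, early in enumerate(visits):
            if not at_risk(early):
                continue
            for late in visits[i + 1:]:
                gap = late["time"] - early["time"]
                if u0 - delta <= gap <= u0 + delta:
                    rows.append({
                        "id": subject_id,
                        "s": early["time"],
                        "t": late["time"],
                        "x_s": early["x"],   # covariates measured at s
                        "y_t": late["y"],    # outcome measured at t
                        "time_diff": gap,
                    })
    return rows


# Hypothetical subject with roughly annual visits (0 = NC, 1 = MCI, 2 = AD).
records = [
    {"id": 1, "time": 0.0, "y": 0, "x": 29},
    {"id": 1, "time": 1.1, "y": 0, "x": 28},
    {"id": 1, "time": 2.0, "y": 0, "x": 28},
    {"id": 1, "time": 3.2, "y": 1, "x": 27},
    {"id": 1, "time": 4.1, "y": 1, "x": 26},
]
rows = build_ppcm_rows(records, u0=3, delta=0.5,
                       at_risk=lambda rec: rec["y"] == 0)
```

Here only the pairs (0.0, 3.2) and (1.1, 4.1) fall inside the window. Note that the sketch keeps every match, so a bandwidth that is wide relative to the visit spacing could pair one s with more than one t; how to handle that case is left open here.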

Once the PPCM dataset is constructed, standard statistical software can be used for model estimation. With an independent working covariance matrix, the maximum likelihood estimator is equivalent to the GEE estimator. Therefore, any statistical software that can fit a proportional odds model can be used to obtain the estimates. In our study, we used the lrm function in the rms R package [15]. To obtain the sandwich estimator for the covariance matrix of \(\varvec{\theta }\), one may use the robcov function, which accounts for correlated records from the same subject in the PPCM dataset. An R package ‘ppcm‘ for the application and extension of the proposed approach is under development and can be found at https://github.com/liud4/ppcm.

3 Simulation Studies

3.1 Simulation Setup

To evaluate the performance of PPCM estimation, several simulation studies are conducted, each with 1000 simulated datasets. Each subject in the simulated datasets has a maximum of 20 visit times \(t = 1, \ldots , 20\). We consider two fixed prediction windows \(u_0=3, 6\). The number of observations \(n_i\) is determined by the censoring time \(T_i = \max (c=u_0+1, T^{*}_i)\) where \(T_i^{*} \sim \textrm{Exp}(\lambda )\) and with \(n_i = \sum _{t=1}^{20} I(t < T_i)\). The constant \(c = u_0 + 1\) is chosen to allow all subjects to be included in the PPCM model pertaining to the restriction of \(u_0\) in \(\mathcal {D}\). The censoring time distribution parameter \(\lambda = 0.2\) for \(u_0 = 3\) and \(\lambda = 0.1\) for \(u_0 = 6\).
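The visit-count mechanism can be sketched as follows. The sketch assumes that \(\textrm{Exp}(\lambda )\) is parameterized by its rate, which the text does not state; the function name and seed are hypothetical.

```python
import random


def simulate_visit_count(u0, rate, rng, max_visits=20):
    """Draw n_i = #{t in 1..max_visits : t < T_i} with T_i = max(u0 + 1, T_i*)."""
    t_star = rng.expovariate(rate)   # T_i* ~ Exp(rate); rate form is an assumption
    t_cens = max(u0 + 1, t_star)     # enforce the lower bound c = u0 + 1
    return sum(1 for t in range(1, max_visits + 1) if t < t_cens)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
counts = [simulate_visit_count(u0=3, rate=0.2, rng=rng) for _ in range(1000)]
```

Under this reading, every simulated subject has between \(u_0\) and 20 visits.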

Next, we simulate the time-dependent variable of interest \(Z_i(t)\) and then the longitudinal ordinal outcome Y(t) such that Y(t) follows a proportional odds model. To do so, we first simulate a continuous latent outcome process \(Y_i^{*}(t) = \gamma _1 X_{i1} + \gamma _2 X_{i2} + \gamma _3 Z_{i0} + f(t) + \epsilon _i\) for \(t = 1, 2, \ldots , n_i\), where \(\gamma _1 = 1\), \(\gamma _2 = 0.1\), \(\gamma _3 = 1\), \(X_{i1} \sim \textrm{Bernoulli}(p = 0.5)\), \(X_{i2} \sim \mathcal {N}(0, 1)\), \(f(t)=3t\) representing the temporal effect and ensuring that \(Y_i(t)\) is monotonically increasing over time, and \(\epsilon _i\) following a logistic distribution with location 0 and scale 1. The variable \(Z_{i0}\) is simulated from a series of Bridge distributions [16], where \(Z_{i0}= b_i + \sum _{t = 1}^{n_i} Z^{*}_{it}/\gamma _3\) with \(b_i \sim \mathcal {N}(0, 0.25)\) and \(Z^{*}_{it} \sim \textrm{Bridge}(\phi =0.5)\). The Bridge distribution has the property that when a logistic regression model includes a random effect following the Bridge distribution, the corresponding marginal model integrating out the random effect keeps the logit form. The time-dependent covariate is defined as \(Z_{i}(t) = Z_{i0} - Z^{*}_{it}/\gamma _3\). Defining the longitudinal ordinal outcome Y(t) as

$$\begin{aligned} Y_i(t) = {\left\{ \begin{array}{ll} 0, \, Y_i^*(t) \le \alpha ^{*}_1; \\ 1, \, \alpha ^{*}_1< Y_i^*(t) \le \alpha ^{*}_2; \\ 2, \, \alpha ^{*}_2 < Y_i^*(t),\\ \end{array}\right. } \end{aligned}$$

where \(\alpha ^{*}_1=8\) and \(\alpha ^{*}_2=23\), we show in the Appendix that for \(k=1,2\)

$$\begin{aligned}&\textrm{logit}\left[ \textrm{P}\left\{ Y_i(t) \ge k \mid X_i, Z_i(s), s < t \right\} \right] \\ \qquad {}&= \phi \{- \alpha ^{*}_k + \gamma _1 X_{i1} + \gamma _2 X_{i2} + \gamma _3 Z_i(s) + f(t)\} \\ \qquad {}&= \alpha _k + \beta _1 X_{i1} + \beta _2 X_{i2} + \beta _3 Z_i(s) + \phi f(t) \end{aligned}$$

where \(\alpha _k = -\phi \alpha ^{*}_k\), \(\beta _1 = \phi \gamma _1\), \(\beta _2 = \phi \gamma _2\) and \(\beta _3 = \phi \gamma _3\).
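As a worked check of these relations, the sketch below discretizes a latent value with the stated cut-points and evaluates the marginal parameters for \(\phi = 0.5\). The function names are hypothetical; the numeric inputs are the values given in the simulation setup.

```python
def discretize(y_star, cut1=8.0, cut2=23.0):
    """Map the latent outcome Y*(t) to the ordinal outcome Y(t) in {0, 1, 2}."""
    if y_star <= cut1:
        return 0
    if y_star <= cut2:
        return 1
    return 2


def marginal_parameters(phi, gammas, cuts):
    """alpha_k = -phi * alpha*_k and beta_j = phi * gamma_j."""
    alphas = [-phi * c for c in cuts]
    betas = [phi * g for g in gammas]
    return alphas, betas


alphas, betas = marginal_parameters(phi=0.5, gammas=[1.0, 0.1, 1.0],
                                    cuts=[8.0, 23.0])
time_slope = 0.5 * 3  # coefficient of t in phi * f(t) with f(t) = 3t
```

This gives \(\alpha _1 = -4\), \(\alpha _2 = -11.5\), \(\beta _1 = 0.5\), \(\beta _2 = 0.05\), \(\beta _3 = 0.5\), and a slope of 1.5 per unit of t on the logit scale.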

To simulate PPCM not following logit form, we simulate \(Z^{*}_{it} \sim \mathcal {N}(0, 1)\), and calculate \(Z_i(t)\), \(Y_i^{*}(t)\), and \(Y_i(t)\) as above. We then fit PPCM using proportional odds model as a working model and assess its performance against working true values of parameters. The working true value for \(\varvec{\beta }_1\), \(\varvec{\beta }_2\), and \(\varvec{\beta }_3\) under such misspecified model setting are obtained by fitting a working proportional odds PPCM to a simulated dataset with 100,000 subjects.

For both the correct and the misspecified model, we examine the scenarios with \(n = 200\), 500, and 1000. For each scenario, in addition to fitting a PPCM with a fixed prediction window \(u_0\), a PPCM without this restriction is fitted to evaluate the impact of not satisfying the consistency condition of the covariance matrix. The latter approach is termed as PPCM-u. To compare with a commonly-used approach ignoring time-dependent covariates after baseline, a longitudinal proportional odds (LPO) model only using baseline covariate information \(\textrm{P}\left\{ Y_i(t) \ge k \mid X_i, Z_i(0) \right\}\) is adopted. The predicted risk for a selected patient profile is compared: \(X_{i1}=0\), \(X_{i2}=0\) and \(Z_i(s)=6\), with s ranging from 0 to 3 for the PPCM model. For each simulation, the following statistics are reported: bias, mean-squared error (MSE), asymptotic standard error (ASE), empirical standard deviation (ESD), and coverage probability (CP) at the 95% nominal level.
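The reported summary statistics can be computed along the following lines. The definitions used here are assumptions, since the text does not spell them out: ASE is taken as the average of the estimated standard errors across replicates, ESD as the sample standard deviation of the estimates, and CP as the coverage of Wald-type 95% intervals. The function name and example values are hypothetical.

```python
import statistics


def summarize(estimates, std_errors, truth, z=1.959964):
    """Bias, MSE, ASE, ESD, and coverage across simulation replicates."""
    n = len(estimates)
    bias = statistics.fmean(estimates) - truth
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    ase = statistics.fmean(std_errors)   # average estimated standard error
    esd = statistics.stdev(estimates)    # empirical SD of the estimates
    covered = sum(1 for e, se in zip(estimates, std_errors)
                  if abs(e - truth) <= z * se)
    return {"bias": bias, "mse": mse, "ase": ase, "esd": esd, "cp": covered / n}


# Tiny hypothetical example with four replicates and true value 1.0.
summary = summarize(estimates=[0.9, 1.1, 1.0, 1.2],
                    std_errors=[0.1, 0.1, 0.1, 0.05],
                    truth=1.0)
```

A CP well below 0.95 together with an ASE smaller than the ESD is the pattern that signals an underestimated covariance matrix.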

3.2 Simulation Results

Table 2 shows the simulation results under the correct model specification for both PPCM and PPCM-u approaches. Both PPCM and PPCM-u have unbiased estimates as expected. The ASE and ESD from the PPCM approach are similar to each other, with CP close to the nominal 95% level, whereas ASE is much smaller than ESD under the PPCM-u approach, with CP on average 20% lower than the nominal level. The discrepancy between ASE and ESD under PPCM-u confirms our findings that the estimated covariance matrix of the parameter estimates is biased if the prediction window is not fixed. Overall, CPs are improved with increased sample size for the PPCM, but not for the PPCM-u.

Table 2 Simulation results under the correct model specification

Results for the simulation study of the misspecified model are displayed in Table 3. For the PPCM, the estimates are close to the working values with little bias. ASE is either similar to or slightly lower than ESD in all scenarios. The CP ranges from 88.1% to 91.5% for \(u_0=3\) and from 92.5% to 93.9% for \(u_0=6\). This suggests that the PPCM works well even under a misspecified model. On the other hand, estimates from the PPCM-u approach are generally unbiased, but ASE is much lower than ESD, resulting in low CP for all scenarios. This observation is consistent with the results from the correct model specification.

Table 3 Simulation results under the misspecified model

The comparison between the PPCM and the LPO is based on the predicted probabilities for a patient progressing to disease stage \(k\ge 2\) with \(X_1 = 0\), \(X_2 = 0\), and \(Z = 6\). Figure 1 presents the mean predicted probabilities and empirical 95% confidence intervals (CIs), calculated as the 2.5% and 97.5% quantiles of the predicted risks from 1000 simulated datasets, for \(u_0 = 3\) and 6 under both the correct and misspecified model specifications with \(n = 500\) subjects. For the PPCM, estimated predicted risks are unbiased under both the correct and misspecified model specifications for \(u_0 = 3\) and \(u_0 = 6\): the true predicted probability curve (solid line) almost perfectly overlaps with the mean predicted probabilities (dashed line). In addition, the 95% CIs from the correct model are narrower than those from the misspecified model. The estimated predicted probabilities under the LPO are constant over time, and thus biased, because the model involves only baseline covariates.

Fig. 1 Simulation results comparing the predictive partly conditional model (PPCM) and the longitudinal proportional odds (LPO) model with \(n=500\). Solid lines are for the true model (TM). Dotted lines are from the LPO model. Dashed lines are from the PPCM. Shaded areas are empirical 95% confidence intervals. Mean predicted probabilities and the corresponding empirical 95% confidence intervals are presented for a subject with \(X_1 = 0\), \(X_2 = 0\) and \(Z_0 = 6\) for (a) \(u_0 = 3\) under the correct model specification, (b) \(u_0 = 3\) under the misspecified model, (c) \(u_0 = 6\) under the correct model specification, and (d) \(u_0 = 6\) under the misspecified model

4 Application

The National Alzheimer’s Coordinating Center maintains data contributed by approximately thirty-nine past and present Alzheimer’s Disease Research Centers (ADRCs) supported by the National Institute on Aging. In 2005, NACC implemented the Uniform Data Set (UDS) to collect clinical demographic information, medical history, neurological examination, and neuropsychological evaluation from all ADRCs using a standard evaluation protocol [17]. A previous cross-sectional analysis examined the source of cognitive complaints at baseline in predicting diagnosis conversion using the NACC UDS dataset [2]. In this motivating study, we are interested in exploring the performance of the time-dependent source of cognitive complaint among NC participants in predicting AD clinical diagnosis. Using the same inclusion/exclusion criteria as [2], a total of 3023 participants from 31 ADRCs, evaluated between 9/01/2005 and 12/31/2014 as part of the UDS and with a diagnosis of NC at the first visit, were included.

We fit the following PPCM with prediction window \(u_0 = 3\) years and bandwidth \(\delta = 1\) year, with ordinal outcome \(Y(t) = 0, 1, 2\) representing the three diagnostic stages (NC, MCI, and AD), respectively. A time-varying effect \(\varvec{\beta }(s)\) was considered by including interactions between a function of s and all covariates \(\varvec{X}(s)\). Only significant interactions (\(p < 0.05\)) were kept in the final model. One can replace s in the model below with a restricted spline of s to allow nonlinear time-dependent effects. Since the nonlinear effect was not significant, we used a linear term for s in this application. The source of cognitive complaint is the time-dependent covariate of interest, with four mutually exclusive levels: (1) no cognitive complaint, (2) self cognitive complaint only, (3) informant cognitive complaint only, or (4) both self and informant cognitive complaint. Other time-dependent covariates are the measurement time s, the modified Framingham Stroke Risk Profile (mFSRP) score [18] excluding the age component, and global cognitive functioning assessed by the Mini-Mental State Examination (MMSE). Time-independent covariates included in the models are age at baseline, gender, education, race/ethnicity, and APOE4 carrier status. The time difference between paired observations was also adjusted for. The model with a linear term for s in \(\varvec{\beta }(s)\) has the following form:

$$\begin{aligned} \textrm{logit}\left[ \textrm{P}\{\textrm{Diagnosis}(t) \ge k\}\right]&= \alpha _{k} + \beta _{1} \times \textrm{Baseline Age} + \beta _{2} \times \textrm{Female} \\&\quad + \beta _{3} \times \textrm{White} + \beta _{4} \times \textrm{Years of Education} + \beta _{5} \times \textrm{APOE4+} \\&\quad + \beta _{6} \times \textrm{MMSE}(s) + \beta _{7} \times \textrm{mFSRP}(s) + \beta _{8} \times \textrm{Source of Cognitive Complaint}(s) \\&\quad + \beta _{9} \times \textrm{Time Difference} + \beta _{10} \times s \\&\quad + \beta _{11} \times s \times \textrm{Female} + \beta _{12} \times s \times \textrm{Years of Education} + \ldots , \end{aligned}$$

where \(\ldots\) denotes the remaining two-way interaction terms. We did not use age as a time-dependent covariate since we included s, which is equivalent to age at time s minus baseline age. We fitted a model specific to the AD diagnosis at time s because the effects might differ vastly depending on the current disease stage. In other words, we let M(s) be 1 if \(Y(s)=0\) and 0 otherwise.

Table 4 NACC application: Results from the PPCM with time-dependent covariates

Table 4 shows the results of the PPCM application to NACC data. All covariates measured at time s except for the main effect of age are significantly predictive for 3-year AD clinical stages. NC participants who are older (OR \(= 1.08\), P-value \(< 0.001\)), White (OR \(= 1.95\), P-value \(< 0.001\)), APOE-\(\varepsilon 4\) carriers (OR \(= 2.06\), P-value \(< 0.001\)), with lower MMSE (OR \(= 0.77\), P-value \(< 0.001\)) and higher mFSRP (OR \(= 1.03\), P-value \(= 0.02\)) are at greater risk of AD progression. All three sources of cognitive complaint relative to no complaint increase the risk of AD progression with the odds ratio (OR) \(= 2.41\) (P-value \(< 0.001\)) for self complaint only, OR \(= 3.13\) (P-value \(<0.001\)) for informant complaint only and OR \(= 4.82\) (P-value \(< 0.001\)) for both complaints. These effects do not vary with measurement time s.

Since we let the time-dependent effect \(\varvec{\beta }(s)\) be a linear function of s, the interactions between s and other terms assess the significance of the time-dependent effects. The significant interaction between measurement time s and gender (OR \(= 0.82\), P-value \(< 0.001\)) suggests the effect of gender varies with measurement time s as the time-dependent effect \(\beta (s)\) for age is . Similarly, the effect of education also varies with measurement time (OR \(= 0.98\), P-value \(= 0.028\)). To further illustrate the time-varying effects of gender and education, we compared 3-year predicted probabilities of AD progression for females versus males and for different education levels at varying measurement times s in Figure 2 for a particular NC participant profile (initial age of 65, white, APOE-\(\varepsilon\)4 positive, MMSE score of 26, mFSRP of 16 and self-complaint only). The relationship between measurement time s and the probability of developing AD varies with sex. As measurement time increases, the predicted risk of AD progression decreases for females, with a greater decline than for males. The predicted probabilities also diverge by education level as measurement time increases: a higher education level is associated with a slower increase in the predicted risk of AD progression over time among NC participants.
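To illustrate how a linear-in-s interaction translates into a time-varying odds ratio, the sketch below computes the factor by which a covariate's odds ratio at measurement time s differs from its value at \(s = 0\). It assumes that the reported interaction OR is the exponentiated interaction coefficient and that s is measured in years; the function name is hypothetical, and the main-effect odds ratios are not needed for this relative comparison.

```python
def relative_odds_ratio(interaction_or, s):
    """Factor by which a covariate's odds ratio at time s differs from s = 0,
    when the interaction with s is linear on the logit scale."""
    return interaction_or ** s


# Interaction ORs reported in the text: 0.82 (s x gender), 0.98 (s x education).
gender_factor = [round(relative_odds_ratio(0.82, s), 3) for s in range(4)]
education_factor = [round(relative_odds_ratio(0.98, s), 3) for s in range(4)]
```

Under these assumptions, the female-versus-male odds ratio at \(s = 3\) is about 0.55 times its value at \(s = 0\), whereas the per-year-of-education odds ratio changes only slightly over the same period (a factor of about 0.94).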

Fig. 2
NACC application: Predicted risk of 3-year AD progression and the corresponding empirical 95% confidence intervals for a specific participant profile by (a) sex (female vs male) and (b) education (12, 15 and 18 years). The selected participant profile, which uses the mean for continuous variables and the most prevalent level for categorical variables, includes an initial age of 65, male (for comparison by education), White, APOE-\(\varepsilon\)4 positive, 16 years of education (for comparison by sex), MMSE score of 26, mFSRP of 16 and self-complaint only

5 Discussion

In this paper, we propose a predictive partly conditional model for longitudinal ordinal outcomes that can accommodate both time-varying effects and time-dependent covariates. The proposed PPCM is applicable to modeling progressive disease severity with time-dependent risk factors or biomarkers. We identified the conditions required for consistent estimation of both the parameters and the covariance matrix under the GEE approach, namely an independent working covariance matrix and a fixed prediction time window. Simulation studies show that the proposed PPCM approach works well under both correctly specified and misspecified models. Application of the PPCM to NACC data demonstrates the importance of including time-dependent information from important risk factors.

Our proposed PPCM is novel. Existing work on partly conditional models for longitudinal outcomes is limited, with most partly conditional model approaches focused on survival outcomes with time-dependent covariates. Previous uses of the partly conditional model have been restricted either to studying the association between \(\varvec{X}(t)\) and \(\varvec{Y}(t)\) [6] or to exploring the predictiveness of \(\varvec{X}(s)\) for \(\varvec{Y}(t)\) when either s or t is fixed [5]. In addition, we did not find any papers on partly conditional models for longitudinal outcomes that conducted simulation studies to rigorously assess the statistical inference. Having shown that the predictive paired observation times (s, t) and (w, r) must not share the same future outcome time \(t=r\) if the covariance matrix is to be estimated consistently, we provided a solution: using a fixed prediction window to obtain unbiased results. A further advantage of the PPCM approach is its ease of implementation. It can be applied by creating an expanded PPCM dataset that includes the paired observations satisfying the pre-specified domain of interest, and then using existing statistical software with functions to fit a proportional odds model and calculate robust sandwich standard errors. Lastly, although this paper only considers longitudinal ordinal outcomes, the proposed PPCM concept could be used for any type of longitudinal outcome.
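The dataset expansion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (`id`, `time`, `stage`, `x`), the function name `expand_ppcm` and the tolerance argument are all hypothetical. Each visit at time s is paired with the same participant's visit at \(t = s + \text{window}\), so a given future outcome time is used at most once per participant.

```python
from collections import defaultdict


def expand_ppcm(visits, window, tol=0.0):
    """Build paired (s, t) rows with a fixed prediction window.

    visits: list of dicts with hypothetical keys
        "id" (participant), "time" (visit time), "stage" (ordinal outcome),
        "x" (covariates measured at that visit).
    window: fixed gap between measurement time s and outcome time t.
    tol: allowed deviation from the window when visit times are irregular.
    """
    by_subject = defaultdict(list)
    for visit in visits:
        by_subject[visit["id"]].append(visit)

    rows = []
    for subject_id, subject_visits in by_subject.items():
        ordered = sorted(subject_visits, key=lambda v: v["time"])
        for predictor in ordered:
            for outcome in ordered:
                gap = outcome["time"] - predictor["time"]
                if abs(gap - window) <= tol:
                    rows.append({
                        "id": subject_id,
                        "s": predictor["time"],
                        "t": outcome["time"],
                        "x_s": predictor["x"],     # covariates at time s
                        "y_t": outcome["stage"],   # ordinal outcome at time t
                    })
    return rows


# Toy data: participant 1 has yearly visits 0..4, participant 2 has visits 0 and 2.
toy_visits = [
    {"id": 1, "time": t, "stage": stage, "x": {"mmse": mmse}}
    for t, stage, mmse in [(0, 0, 29), (1, 0, 28), (2, 0, 28), (3, 1, 26), (4, 1, 25)]
] + [
    {"id": 2, "time": 0, "stage": 0, "x": {"mmse": 30}},
    {"id": 2, "time": 2, "stage": 0, "x": {"mmse": 29}},
]

expanded = expand_ppcm(toy_visits, window=3)
for row in expanded:
    print(row)
```

The expanded rows would then be passed to whichever proportional odds routine is available, with sandwich standard errors clustered on `id`; that fitting step is left out here because it depends on the software used.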

While the PPCM is quite flexible and can be applied to a variety of inferential questions, it has some limitations. As described in Section 2.2, the consistency of the PPCM requires a non-informative observation process, in which the value of a future outcome does not depend on its availability given current covariate measurements. If this assumption is violated, PPCM estimates may be biased. Relaxing this assumption is a natural direction for our future work.