Background

Cardiovascular disease (CVD) is a leading cause of morbidity and mortality worldwide, accounting for 47% and 39% of deaths in females and males, respectively, in European Society of Cardiology member states [1]. Risk prediction models inform the understanding and management of CVD and have become an important part of clinical decision making. Many risk prediction models for CVD use one data point per patient (usually at baseline), such as the widely used Framingham Risk Score, which predicts the risk of coronary heart disease, [2] or QRISK3, which predicts the risk of CVD in a subset of the UK population and is widely used for CVD risk stratification in the UK [3]. These models use many variables measured at baseline, including systolic blood pressure (SBP), total cholesterol, high-density lipoprotein cholesterol and smoking status. As such, many cardiovascular risk prediction models do not account for measurement error or changes in risk factors over time, [4, 5] which can lead to biased estimation. For example, SBP generally increases as people age, while diastolic blood pressure initially rises but starts decreasing after the age of 60 [6]. Further, as people age, they accumulate more risk factors. These complex and dynamic changes over time must be accounted for when modelling CVD risk to achieve the most robust possible risk prediction.

In risk prediction, longitudinal data permit the study of change in risk factors over time, account for within-person variance and usually increase power while reducing the number of patients needed [7]. However, analysis of longitudinal data adds complexity, such as dependence between observations, informatively censored or incomplete data and non-linear trajectories of longitudinal risk factors over time. Addressing these issues can add considerable complexity and computational burden to the analysis.

The association between longitudinal measurements of blood pressure and the risk of CVD has been studied using summaries such as time-averaged and cumulative values, [8] trajectory patterns [9] and variability [10, 11]. However, less effort has been invested in modelling the complete record of longitudinal measurements, e.g. as time-varying covariates. Using summary measures in risk prediction models could be ineffective because of possible heterogeneity of variance for the summary measure. A review of risk prediction models covering the period 2009–2016 found that 46/117 (39.3%) studies considered longitudinal data, and only 9/117 (7.7%) included longitudinal data as time-varying covariates [12]. A more recent review of methods for harnessing longitudinal data in clinical risk prediction showed a further increase in the development of risk prediction models over the period 2009–2018 and identified seven different methodological frameworks [13].

The aim of this review was to conduct a comprehensive methodological evaluation of the estimation of risk for developing CVD in the general population, specifically targeting studies with a longitudinal design with three or more time-points, to allow for the trajectory of the longitudinal variable(s) to be modelled in predicting CVD risk.

Material and methods

Selection criteria

This review focused on risk prediction for CVD. Studies were included if they had a longitudinal design with data analyzed over at least three time points, where the outcome was a clinical diagnosis of a cardiovascular disease(s) or mortality. Cross-sectional, animal, and paediatric studies were excluded.

Search strategy

MEDLINE-Ovid was searched from inception until 3 June 2020 with no language restrictions. Search terms for data type and modelling type were “longitudinal, repeat* measure*, hierarchical, multilevel model*” and “change, slope, trajector*, profile, growth curve”, respectively, in all text. For disease area, the following search terms were used: “cardiovasc*, cerebrovasc*, atrial fibrillation, coronary (and artery or disease), stroke” in the title, “cardiovascular disease, brain ischemia, heart diseases” in MeSH with subheadings, or “myocardial infarction, coronary disease, stroke, intracranial hemorrhages (without intracranial hemorrhage, traumatic)” in MeSH without subheadings. The search approach, search terms and standardized search filter are listed in Fig. 1 and Supplementary Table 1. Studies needed at least one term for data type, modelling type and disease area. Further, the reference lists of included studies were reviewed to identify any additional relevant articles.

Fig. 1
figure 1

Summary of search strategy

Consideration of studies for inclusion followed a three-step process. First, titles were considered. Second, abstracts of potentially eligible studies were considered. Third, after abstract screening, the full-text articles were retrieved and assessed for eligibility. The first author (DS) completed the screening of studies and other authors were consulted to resolve any queries. Reasons for exclusion were recorded.

Data extraction

The following information was extracted from each study: first author, year of publication, model type, dataset region, time period for data collection, age range, proportion of males, length of follow-up, number of patients, number of longitudinal time points, longitudinal and survival outcome data types, covariates adjusted for in the longitudinal and survival models, survival and longitudinal outcomes, and characteristics of the statistical and modelling approaches used, including assumptions, handling of missing data, model selection and software used. Data extraction was conducted by the first author (DS), with other authors consulted to resolve any queries.

Results

The searches returned 2601 studies, including 12 duplicates (Fig. 2). Based on screening of titles and abstracts, 2150 studies were excluded. Full texts were assessed for 439 articles, and a further 34 were excluded for one or more of the following reasons: data not longitudinal, review article, data were summary measures rather than individual patient data, or a non-CVD/mortality outcome. The number of repeated measures was assessed for 405 studies, and a further 325 were excluded because fewer than three repeated measures were reported. Eighty studies were included in the review (Fig. 2) [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93].

Fig. 2
figure 2

Flow chart of study selection

General characteristics

Characteristics of the included studies are summarized in Table 1. Sixty (75%) studies reported analyses on large sample sizes (≥1000 patients). Exactly three longitudinal measurements were available in 27 (33.8%) studies, while 47 (58.8%) reported a variable number of measurements per patient (at least three), described as a median, mean or maximum number of longitudinal observations; however, many studies did not utilize all available measurements. Follow-up lengths varied widely from 31 days [48] to 35 years, [50] with 29 (36.2%) reporting follow-up of 10–20 years. Patients were often followed up for survival after the last repeated measure, with 47 (58.8%) studies reporting a total follow-up of ≥10 years, while 31 (38.8%) reported a longitudinal outcome follow-up of ≥10 years. Most studies (n = 65, 81.3%) were published after 2010, while 15 (18.8%) were published before 2010. Data collection for many longitudinal datasets (n = 20, 25.0%) began in the 1980s; only 13 (16.2%) studies were from the 1990s, and about one-third (n = 26, 32.5%) were completed in the 2000s.

Table 1 General characteristics of studies and outcomes included in the review

Outcome data

Most (n = 63, 78.8%) studies reported disease outcomes as time-to-event or survival outcomes. Fewer studies examined disease outcomes as binary (n = 5, 6.2%), continuous (n = 8, 10.0%) or rates (n = 4, 5.0%). Most (n = 69, 86.2%) longitudinal outcomes were continuous; other longitudinal outcome types were binary (n = 3, 3.8%), categorical (n = 5, 6.2%), or ordinal (n = 3, 3.8%).

Adjusting for covariates

Sixty-one studies (76.2%) adjusted for age and 45 (56.2%) adjusted for sex as covariates in their survival analysis, while four (5.0%) stratified by age and three (3.8%) by sex. Nine (11.2%) studies analyzed data separately for each sex. Seventeen (21.2%) longitudinal analyses were adjusted for age, while 30 (37.5%) were not. Sex was adjusted for as a covariate in 9 (11.2%) longitudinal analyses. Four (5.0%) studies analyzed longitudinal data separately by sex, and 28 (35.0%) did not adjust for sex.

Statistical analysis

This review identified a variety of statistical methods used to analyze time-to-event and longitudinal outcome data. Three (3.8%) studies used a simple statistical test [14,15,16]. For example, Albani et al. [16] used the Wilcoxon signed rank test to compare two risk scores (the Framingham Risk Score and an atherosclerotic cardiovascular disease risk score) before treatment with pasireotide and 6 and 12 months after treatment. The other statistical approaches for modelling CVD risk using longitudinal data can be divided into three categories: 1) single-stage approaches including basic summary measures, 40 (50.0%), [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56] 2) two-stage approaches using an estimated longitudinal parameter as a covariate in a survival outcome model, 29 (36.3%), [57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85] and 3) joint models fitting longitudinal and survival data simultaneously, 8 (10.0%) [86,87,88,89,90,91,92,93].

Characteristics of included studies

The characteristics of the included studies by modelling approach are shown in Table 2. Joint models were fitted on smaller datasets, with only one study using a joint model on a dataset of over 10,000 patients [87]. A larger proportion of two-stage or joint models included patients with a variable number of time points compared with single-stage approaches (24/37 (64.9%) vs. 23/40 (57.5%), respectively). Five (6.3%) studies did not report the number of time points used in their analyses. Two-stage approaches were used on 10/16 (62.5%) datasets collected in Asia but on only 2/22 (9.1%) European datasets. The longitudinal analysis in two-stage approaches rarely adjusted for age or sex, with adjustments made in 6/29 (20.7%) and 7/29 (24.1%) studies, respectively. The frequency of studies using each model type over time is shown in Fig. 3. A substantial increase in the number of papers using two-stage approaches was observed after 2010, with 26/65 (40.0%) using them after 2010 vs. 3/15 (20.0%) before. Use of joint models also commenced later in that decade, with only one study before 2015.

Table 2 Summary of characteristics of studies included in the review by model type
Fig. 3
figure 3

Stacked bar chart showing the frequency of the statistical model types by year

A complete case analysis was used in 65/80 (81.3%) studies. It was more common in smaller (< 1000 patients, 16/18, 88.9%) and very large (> 10,000, 18/21, 85.7%) cohorts than in medium-sized studies (1000–9999, 29/39, 74.4%), and in studies with a variable number of time-points (39/48, 81.3%) compared with exactly three time points (21/27, 77.8%). Complete case analysis was used in 19/33 (57.6%) of studies with shorter follow-up (< 10 years). The methods used for handling missing data included multiple imputation (n = 6), single imputation (n = 3), last observation carried forward (n = 2) and indicators for missing variables (n = 2).

Single-stage approaches

A single-stage approach was used in 40 (50%) studies [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56] (Table 3). The most common risk prediction model among single-stage approaches was the Cox proportional hazards (PH) model (n = 25, 62.5%) [94]. The model assumes that covariates have a proportional effect on the hazard over time; this PH assumption should be checked, either by including time-varying coefficients or by graphical methods such as Schoenfeld residual plots and log-log plots. Only 9/25 (36.0%) of articles utilizing Cox PH models as a single-stage approach stated that the PH assumption was checked [95].
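As an illustration (not taken from any specific included study), the following minimal sketch in R uses the survival package to fit a Cox PH model on baseline values and to check the PH assumption with scaled Schoenfeld residuals; the variable names (time, status, sbp, age, sex) are hypothetical.

```r
library(survival)

# Cox PH model using baseline covariate values only (one row per patient)
fit <- coxph(Surv(time, status) ~ sbp + age + sex, data = dat)
summary(fit)               # hazard ratios with 95% confidence intervals

# Check the proportional hazards assumption via scaled Schoenfeld residuals
ph_check <- cox.zph(fit)   # per-covariate and global tests
ph_check
plot(ph_check)             # residuals vs. time; a trend over time suggests non-proportionality
```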

Table 3 Summary of single-stage models used to incorporate longitudinal data in survival models

The simplest use of the Cox PH model included only the value of the longitudinal outcome at baseline (Time 0) as a covariate (n = 7, 17.5%) [18, 21, 24, 43, 50, 53, 54]. For example, Tanne et al. used baseline values of SBP to predict ischemic stroke mortality [53]. This model is easily interpretable clinically; however, it uses data from only a single time-point per patient and does not take all available data into account. Clustering and meta-analysis techniques were also incorporated through the Cox PH model. A study using impaired sleep as a CVD risk factor included patients in two separate baseline waves. Patients could appear in both waves, and this clustering was accounted for when fitting the Cox PH model [24]. A study examining the association between cholesterol and cardiovascular mortality fitted Cox PH models for each year of follow-up and combined the coefficients from these models using meta-analysis techniques [50].

Three (7.5%) studies included the difference between the longitudinal predictor at baseline and a previous value as a covariate in the Cox model; [28, 35, 38] for example, risk of coronary heart disease was predicted using the difference between a patient’s current Framingham Risk Score and their score 3 or 6 years earlier [35]. This is a simple measure; however, it assumes that change is linear between the two time-points. A further three (7.5%) studies used a slope to predict CVD, calculated manually by dividing the difference between two measurements by the time between them [25, 29, 32]. Other summaries included as covariates in Cox PH models were the mean, [36] mean change, [36] standard deviation, [19] and summaries of changes between categories [20, 34, 41] and stability within categories [20, 34, 41].
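These difference- and slope-based summaries are straightforward to construct; a brief sketch in R (with hypothetical columns sbp_now, sbp_prev and gap_years) illustrates their use as Cox PH covariates.

```r
library(survival)

# Change between the current and an earlier measurement, and a crude slope
dat$sbp_change <- dat$sbp_now - dat$sbp_prev       # assumes linear change between the two time-points
dat$sbp_slope  <- dat$sbp_change / dat$gap_years   # manually calculated slope (per year)

fit <- coxph(Surv(futime, status) ~ sbp_now + sbp_change + age, data = dat)
```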

Six studies (15.0%) included longitudinal predictors as time-dependent covariates in the Cox model [39, 42, 45, 47, 49, 51, 55] by splitting the timescale at each time point at which predictors were updated. Reinikainen et al. included time-dependent summary measures as time-dependent covariates: updated mean values and the change between the current and previous time-points for SBP, total cholesterol and current smoking status [39].
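This timescale splitting corresponds to the counting-process (start, stop] data format. A minimal sketch in R using survival::tmerge is shown below; the data frames and column names (baseline with one row per patient, long with one row per measurement) are hypothetical.

```r
library(survival)

# baseline: id, futime, status, age (one row per patient)
# long:     id, meas_time, sbp     (one row per measurement)
cp <- tmerge(baseline, baseline, id = id, cvd = event(futime, status))
cp <- tmerge(cp, long, id = id, sbp = tdc(meas_time, sbp))  # split follow-up at each measurement

# Cox model with SBP carried forward as a time-dependent covariate
fit <- coxph(Surv(tstart, tstop, cvd) ~ sbp + age, data = cp)
```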

Three studies (7.5%) used logistic regression to model a binary disease outcome [30, 31, 48]. One included the predictor at baseline, [31] another compared the predictive power of measurements at multiple time points for the risk of myocardial infarction by including them in separate models, [30] while the third used summary measures (mean (SD), mean change from baseline, range and average daily risk range) of blood glucose to predict mortality in myocardial infarction patients [48].

Four (10.0%) studies used generalized estimating equations (GEE) to model a disease outcome. Two had binary outcomes, [17, 27] while two others modelled rates [22, 37]. Of the four studies, two used a logit link [17, 27] and two used a log link [22, 37]. All four included data from multiple time points. One of the studies used summaries of changes in socioeconomic status and lifestyle habit variables between categories such as stable, increasing (in the second or third time point), decreasing or unstable, to predict the Framingham Risk Score [22].
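A minimal sketch of a GEE for a repeatedly measured binary outcome in R, using the geepack package with an exchangeable working correlation; the variable names are hypothetical, and a log link (e.g. family = poisson) could be substituted to model rates.

```r
library(geepack)

# long: one row per patient-visit, sorted by id and visit
fit <- geeglm(cvd_event ~ sbp + age + sex + visit,
              id = id, family = binomial(link = "logit"),
              corstr = "exchangeable", data = long)
summary(fit)   # coefficients with robust (sandwich) standard errors
```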

Two studies included baseline values of the longitudinal predictor in a Poisson regression model, [23, 26] a form of Generalized Linear Model (GLM) that can be used as a fully parametric alternative to the Cox PH model. Poisson regression for survival analysis involves splitting the follow-up time into intervals and assuming a constant baseline hazard in each interval [97].
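A minimal sketch of this piecewise-exponential approach in R: follow-up is split at illustrative cut-points with survival::survSplit, and a Poisson GLM with a log person-time offset assumes a constant baseline hazard within each interval. Variable names and cut-points are hypothetical.

```r
library(survival)

cuts <- c(5, 10, 15)   # interval boundaries in years (illustrative)
pw   <- survSplit(Surv(futime, status) ~ ., data = dat,
                  cut = cuts, episode = "interval", start = "tstart")
pw$exposure <- pw$futime - pw$tstart   # person-time at risk within each interval

fit <- glm(status ~ factor(interval) + sbp0 + age + offset(log(exposure)),
           family = poisson, data = pw)
summary(fit)   # exp(coef) approximates hazard ratios under the piecewise-constant hazard
```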

Four (10.0%) studies modelled changes in risk scores over time using linear mixed effects (LME) models, [33, 44, 46] for example, predicting the trajectory of the Framingham Risk Score over four time-points [44]. Fixed effects linear regression was used by one study [52] to examine how change in body mass index (BMI) was correlated with the Framingham Risk Score.
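A minimal sketch of such an LME model in R using lme4, with a random intercept and slope per patient; the column names (frs, years, age0, sex, id) are hypothetical.

```r
library(lme4)

# long: one row per patient-visit with the Framingham Risk Score (frs) and time since baseline (years)
fit <- lmer(frs ~ years + age0 + sex + (1 + years | id), data = long)
summary(fit)   # the fixed effect for 'years' estimates the average annual change in the risk score
```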

Two-stage models

A two-stage modelling approach was used in 29 (36.3%) studies (Table 4) [57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85]. In a two-stage approach, the longitudinal data are first summarized with one or more longitudinal models; parameters and/or estimates from these models are then included as covariates in a survival model. The Cox PH model was used in most studies (n = 26, 89.7%) [57, 58, 60,61,62,63, 65,66,67,68,69,70,71,72,73, 75,76,77,78, 80,81,82,83,84,85, 88]. A weakness of the two-stage approach is that the uncertainty in the longitudinal summaries produced in the first stage is ignored in the second stage.

Table 4 Summary of two stage approaches used to incorporate longitudinal data in survival models

Two methods were commonly used to generate summaries from the longitudinal data for inclusion as covariates in a Cox PH model. The simplest, used in nine studies (31.0%), calculated summary measures such as a slope or the coefficient of variation (equivalent to residual variance) from a linear regression model fitted to each patient’s data [57, 62, 63, 71, 78, 80, 82, 83, 85]. Gao et al. used linear regression to estimate the intercept, slope, square of the slope and coefficient of variation of blood pressure, which were then included in a Cox PH model to assess how variation and changes in blood pressure were associated with mortality [63].
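A minimal two-stage sketch in R along these lines: stage one fits a separate linear regression of SBP on time for each patient and extracts the slope; stage two includes the estimated slope as a covariate in a Cox PH model. As noted above, the uncertainty from stage one is ignored. Data frames and column names are hypothetical, and each patient is assumed to have at least two measurements.

```r
library(survival)

# Stage 1: per-patient slopes from simple linear regressions of SBP on time
slopes <- sapply(split(long, long$id),
                 function(d) coef(lm(sbp ~ years, data = d))[["years"]])
surv$sbp_slope <- slopes[match(surv$id, names(slopes))]

# Stage 2: Cox PH model with the estimated slope as a covariate
fit <- coxph(Surv(futime, status) ~ sbp_slope + age + sex, data = surv)
```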

The second method, and the most frequently used (n = 17, 58.6%), was group-based trajectory modelling (GBTM) of the longitudinal variable [58, 60, 61, 65,66,67,68,69,70, 72, 73, 75,76,77, 81, 84, 88]. Wang et al. identified four separate trajectories of sleep duration and used these to predict the risk of cardiovascular events or mortality [69]. Most models were fitted using the Proc Traj package in SAS [98] (n = 10, 58.8%), [60, 65,66,67,68,69,70, 73, 75, 81] although other software, including Stata (traj) [99] and R (lcmm), [100] can be used. Trajectory groups from GBTMs were also used in logistic regression (n = 1) [64] and Poisson regression (n = 1) [79] analyses of survival outcomes.
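A sketch of a GBTM-style latent class growth model in R using lcmm::hlme (one of the software options noted above); a three-class quadratic trajectory of SBP over age is assumed purely for illustration, and the column names are hypothetical.

```r
library(lcmm)

# 1-class model, used to provide initial values for the latent class model
m1 <- hlme(sbp ~ age + I(age^2), subject = "id", ng = 1, data = long)

# 3-class group-based trajectory model (no within-class random effects specified)
m3 <- hlme(sbp ~ age + I(age^2), mixture = ~ age + I(age^2),
           subject = "id", ng = 3, data = long, B = m1)

postprob(m3)   # class sizes and posterior membership probabilities
# m3$pprob holds each patient's most likely trajectory group,
# which could then be carried into a second-stage survival model
```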

Desai et al. used weighted pooled logistic regression with inverse probability weights (IPWs) to examine the association between changes in serum uric acid and risk of incident diabetes, CVD and renal decline [59]. These models are complex, but the resulting hazard ratios can be interpreted as causal estimates assuming no unmeasured confounders [101].

Joint models

A joint modelling approach, in which the longitudinal and survival outcomes are modelled simultaneously, was used in eight studies (10.0%) [86,87,88,89,90,91,92,93] (Table 5). This approach makes full use of the available data and can be more statistically efficient than a two-stage model, at the cost of greater computational complexity.

Table 5 Summary of joint modelling approaches used to incorporate longitudinal data and survival data


Five studies (62.5%) [86, 87, 91,92,93] modelled the longitudinal outcome using an LME model and the survival outcome using a Cox PH model. One of these studies used this approach to analyze the association between blood pressure and coronary artery disease [92].
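A minimal sketch of such a joint model in R using the JM package (JMbayes2 and joineR are alternatives); 'long' holds the repeated SBP measurements and 'surv' one row per patient, with hypothetical column names and the same patient ordering in both data sets, as JM requires.

```r
library(nlme)
library(survival)
library(JM)

# Longitudinal submodel: LME for SBP over time
lmeFit <- lme(sbp ~ years, random = ~ years | id, data = long)

# Survival submodel: Cox PH for the event; x = TRUE is required by jointModel()
coxFit <- coxph(Surv(futime, status) ~ age + sex, data = surv, x = TRUE)

# Joint model linking the two submodels via the current value of the longitudinal trajectory
jointFit <- jointModel(lmeFit, coxFit, timeVar = "years",
                       method = "piecewise-PH-aGH")
summary(jointFit)
```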

Batterham et al. used latent growth models, which are similar to LME models, to estimate the intercept and slope of five different cognitive tests jointly with a Cox PH model for the risk of all-cause and cause-specific mortality; the models were fitted using Mplus [89]. Ogata et al. used a GBTM jointly with a Cox PH model to predict the risk of CVD using trajectories of fasting plasma glucose [88]. van den Hout et al. used a Bayesian approach to jointly model ordinal data from the Mini-Mental State Examination: item response theory (IRT) models were used for the ordinal data, and Gompertz survival models for a multi-state outcome (e.g. healthy, history of stroke and death) [90].

Discussion

This review has identified a multitude of methods for analyzing the risk of CVD using longitudinally repeated data. The complexity of the methodology used has increased over the past two decades, with a growing proportion of studies applying more efficient approaches such as two-stage and joint models. However, many studies used only simple analyses based on a single time-point, even when more data were available.

When CVD risk was modelled in a two-stage model, two methods were commonly used: patient-level linear regression to account for longitudinal data, followed by the Cox PH model to estimate CVD risk, or GBTMs followed by the Cox PH model. On the other hand, in a joint model, the longitudinal and survival data are modelled simultaneously. Both models aimed to utilize a patient’s time-varying risk factors to predict CVD risk. These models can provide an important understanding of the association between changes in risk factors over time and CVD risk, which can be used to influence risk management decisions.

The characteristics and assumptions of a model need to be considered carefully when selecting and interpreting models. Although a time-dependent covariate Cox PH model has the advantage of enabling risk estimates to be updated during follow-up for new individuals, it assumes that covariate values are constant between two time-points and are measured without error. Computationally, the model can quickly become infeasible to fit if predictor values are updated at different time points for each individual. The model is also prone to overfitting, as a time-dependent covariate forms a complex function over time; hence, it should be used with caution [102].

In logistic regression, disease risk is estimated as an odds ratio, which should not be interpreted as a risk ratio, especially when the outcome is not rare. Odds ratios cannot be compared between datasets or between models with different independent variables, because they reflect unobserved heterogeneity between observations, which varies between datasets and models [103].

Three different methods of modelling within-patient variation with a continuous outcome were encountered: GEEs, LME models and fixed effects regression. GEEs are an extension of GLMs that allow a correlation structure between observations [104, 105]. As with GLMs, different link functions and distributions allow GEEs to model continuous, binary, count or binomial outcomes. LME models are an alternative for continuous outcomes; they assume that the residual error is normally distributed and model within-patient correlation with random effects, which are also assumed to be normally distributed and independent of the covariates. This allows LME models to make individual patient predictions rather than only the population-level predictions available from a GEE [106]. Fixed effects regression does not assume that patient-specific effects are independent of the covariates; it is computationally easier to fit than an LME model and is more appropriate if unobserved heterogeneity is correlated with covariates [107].

GBTMs are a form of finite mixture model and an effective way of identifying a fixed number of groups of individuals who follow similar trajectories [108]. However, they are computationally difficult to fit, and the results may be difficult to apply in clinical practice, as it can be hard to accurately assign a patient to a trajectory group by hand.

In a standard joint model, the longitudinal outcome is modelled by an LME model and the survival outcome by a Cox PH model. The two outcomes are linked via shared random effects to capture the time-dependent association between the longitudinal measurements and the risk of an event [109]. This association can be defined in a variety of ways; common choices include the current value of the linear predictor, its derivative (rate of change) or its integral (cumulative effect).
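As a point of reference (standard shared random effects notation, not taken from any specific included study), writing m_i(t) for the true longitudinal trajectory from the LME submodel, w_i for baseline covariates, h_0(t) for the baseline hazard and alpha for the association parameter, the association structures mentioned above can be expressed as:

```latex
\begin{align*}
  y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad
            m_i(t) = \mathbf{x}_i^\top(t)\,\boldsymbol\beta + \mathbf{z}_i^\top(t)\,\mathbf{b}_i \\
  h_i(t) &= h_0(t)\exp\bigl\{\boldsymbol\gamma^\top\mathbf{w}_i + \alpha\, m_i(t)\bigr\}
            \quad \text{(current value)} \\
  h_i(t) &= h_0(t)\exp\bigl\{\boldsymbol\gamma^\top\mathbf{w}_i + \alpha_1\, m_i(t) + \alpha_2\, m_i'(t)\bigr\}
            \quad \text{(current value and rate of change)} \\
  h_i(t) &= h_0(t)\exp\Bigl\{\boldsymbol\gamma^\top\mathbf{w}_i + \alpha \int_0^t m_i(s)\,\mathrm{d}s\Bigr\}
            \quad \text{(cumulative effect)}
\end{align*}
```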

The reasons for the slow uptake of two-stage and joint models are multi-factorial. Computationally, these models can be much harder to fit than single-stage models, with joint models in particular carrying a substantial computational burden. There may also be limited awareness of the inefficiency of simpler methods. Many studies may not include a statistician in the research team, so the authors may lack experience of analyzing longitudinal data. However, as these methods become more common, and as software to fit the models becomes more accessible and computationally more powerful, the utilization of more efficient methods should increase over time.

Different risk prediction models are appropriate for different settings. Models may be used for prediction in a clinical setting or for studying the association between an exposure and an outcome. Many risk prediction models require computation to obtain a precise risk prediction, which poses difficulties in a clinical setting. Existing risk prediction models such as QRISK3 use online calculators to predict risk from a complex model, but inputting all longitudinal data into an online calculator may not be feasible in a clinical setting. Alternatives include using single-stage models with summaries of the longitudinal data, such as means, slopes or differences, or integrating the risk prediction model into electronic health record (EHR) software. More complex models such as two-stage or joint models are very useful for explaining associations, although interpretation can require more thought; joint models in particular need careful consideration when interpreting association structures such as random effect associations. Assigning and interpreting complex groups from GBTMs can be difficult for clinicians in practice, although it is sometimes possible to give GBTM groups clear descriptions such as high, low, increasing or decreasing.

Reporting of the data in the included studies was highly variable. For example, the number of time-points used per patient was reported inconsistently, as a mean, median, range (e.g. 3–5), the maximum possible number or the frequency over the follow-up period; some studies, especially those based on electronic health records, did not report the number of time-points at all, making it difficult to ascertain exactly how many measurements were used. Follow-up length was also described variously as a date range, mean, median, maximum or dates of study waves. This resulted in a loss of clarity, especially when studies had separate follow-up periods for longitudinal data collection and for the survival outcome. In addition, some studies did not report the variables removed as part of variable selection.

Strengths and limitations

This review examined all available studies that assessed the relationship between the trajectory of longitudinal risk factors and the risk of a cardiovascular event or mortality, and summarized the methods used to analyze longitudinal risk factors for CVD risk. It can be readily used to identify methods for future analyses of longitudinal trajectories and risk prediction in CVD. However, because the search terms had this specific focus, single-stage models that underutilize the available data are likely to be underrepresented.

Queries over eligibility or article content were discussed thoroughly among the authors of this review before a final decision was reached. However, articles were searched and screened by one author, so some possibility of bias or error remains. This review searched only MEDLINE-Ovid, providing a focused and consistent search, although including other bibliographic databases might have returned additional studies.

This review was designed to highlight the strengths of statistical methods for summarizing longitudinal data to predict CVD risk. Deeper comparisons of the methods using simulated data have been reported in the literature, both when the methods were first developed and in their applications [110,111,112]. A machine learning approach may also be worth considering when designing a study, although our search identified only one study using machine learning methods [113]. Machine learning algorithms have the potential to provide stronger predictions of risk using many variables; however, this brings a greater potential for overfitting and collinearity between variables. To mitigate this, machine learning places a greater emphasis on model validation, preferably external validation [114].

Conclusions

The use of two-stage and joint models is a critical part of understanding the relationship between longitudinal risk factors and CVD. Many studies still employ single-stage approaches, which often underutilize the available longitudinal data when modelling cardiovascular risk. Future studies should aim to make optimal use of longitudinal data by using two-stage and joint models wherever possible, for a more accurate estimation of cardiovascular risk.