Introduction

Background

Advances in medicine and improvements in public health have resulted in more people living longer with chronic illnesses for which there are few or no curative treatments [1,2,3]. Identifying individuals who have a higher mortality risk, especially earlier in their illness trajectory, can assist with reassessing goals of care, re-evaluating effective treatments and illness manageability, exploring psychosocial and physical issues, and initiating palliative care referral [4, 5]. Mortality prediction also has important applications in research and policy [6, 7].

Various prognostic indices for predicting long-term mortality exist [8, 9]. The quality of such studies is variable, often due to data limitations [10]; the strongest design is to utilise a large sample divided for derivation and validation, evaluated at baseline on a wide range of potentially relevant predictors, and followed reliably from baseline to death [11, 12]. The Gateway to Global Aging family of longitudinal ageing studies offers data with these characteristics in multiple countries [13]. In 2006 in the United States, Lee and colleagues developed a four-year mortality index for the general population of adults aged 50+ using data from the Health and Retirement Study (HRS) [14]. Kobayashi and colleagues (2017) followed this template using the HRS’s sister study, the English Longitudinal Study of Ageing (ELSA), to develop a ten-year mortality index for the general population of adults aged 50+ in England [15]. Both indices identified the age, sex, comorbid and functional characteristics associated with mortality, assigned each characteristic points based on the strength of that association, and derived the index as the sum of these points. A key strength of this approach is simplicity and parsimony, increasing usability by others. For example, calculating scale scores of functional limitations requires six or eight specific data collection points. Such data collection is time-consuming in clinical practice and the data points may not be routinely available in data for research or policy. By identifying specific data points that can be elicited from routine data or very quick patient interactions (e.g. need assistance toileting), investigators have derived indices that are simple for others to apply. Both studies reported excellent discrimination and have been widely cited.

Aim and rationale

This study aimed to develop and validate an index that predicts four-year mortality in the general population aged 50+ in the Republic of Ireland, using the HRS sister study The Irish Longitudinal Study on Ageing (TILDA). First, we used TILDA data to replicate methods from the mortality indices in HRS and ELSA. Second, we compared performance of the Irish, US and English indices in TILDA data to assess how important local data inputs are to predictive power. Third, we extended this approach by including additional healthcare utilization and self-report wellbeing and social connectedness variables using TILDA data. This third aim allows us to identify important additional variables that are easy to elicit in clinical practice, and/or recorded in medical files, that are expected to be associated with mortality. For example, prior hospitalisation is a well-known risk in other screening tools [16, 17]. Similarly, self-reported wellbeing and social connectedness have also been shown to have strong relationships with mortality risk in previous research [18,19,20,21]. As healthcare utilization and self-report wellbeing and social connectedness variables have not been employed in the HRS or ELSA mortality indices, we can assess if their inclusion improves prior prediction efforts. Our results from each of these aims can inform policy preparation, the planning of future health service provision, and epidemiological research.

Methods

Study design and setting

We conducted secondary analysis of longitudinal cohort data. We split the sample 50:50 into derivation and validation; we identified in the derivation sample those factors associated with four-year mortality; we weighted all statistically significant factors according to strength of association and summed these weights so that each TILDA participant had a total index score based on their characteristics at Wave 1. We assessed performance of the index in the validation sample.
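The 50:50 split described above can be sketched as follows. This is a minimal illustration in Python, not the Stata code used in the study; the function name `split_sample`, the seed, and the use of exact halves are assumptions made for the example, and the integer IDs are hypothetical stand-ins for participants.

```python
import random

def split_sample(participant_ids, seed=0):
    """Randomly split participant IDs into derivation and validation halves."""
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical IDs 0..8173 standing in for the Wave 1 participants.
derivation, validation = split_sample(range(8174), seed=42)
```

Fixing the seed makes the split reproducible, so the index derived in one half can always be assessed against the same held-out half.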

Ireland is a country of approximately 4.98 million people in north-western Europe [22]. The Republic of Ireland refers to the 26 counties that are sovereign and separate from the additional six counties (Northern Ireland) that make up the island of Ireland. Compared to other countries in the EU, the Republic of Ireland has a relatively young population [23], meaning that the country will have proportionally larger numbers transitioning into older age and retirement within the next 20–30 years compared to the EU average [24].

Participants and data sources

TILDA is a biennial prospective nationally representative study of older adults residing in the community in the Republic of Ireland. At Wave 1 in 2009–2010, a total of 8,174 people aged 50+ were enrolled. These 8,174 participants comprised the eligible sample for this study, with all of our predictors taken at their Wave 1 enrolment. Full details of the TILDA study design, participant selection and data collection are available elsewhere [25, 26].

All predictors examined in this study are taken from the computer-assisted personal interview (CAPI) conducted face-to-face in participants’ place of residence at Wave 1. Questions used in this study relate to participants’ demographics; cardiovascular-related illness; non-cardiovascular illness; health and lifestyle variables; functional variables; and healthcare utilization.

The outcome of interest was all-cause four-year mortality. All deaths in Ireland are recorded with the General Register Office (GRO), and TILDA data are linked to GRO data to March 2018, in a process detailed elsewhere [27]. Additionally, TILDA may learn of participant deaths after being contacted by a family member or after approaching for an interview. Therefore, for this study we have a mortality file providing full coverage of death dates within Ireland during the study period (via GRO) and additional non-comprehensive information on deaths outside the State (from family members).

Variables and data measurement

We calculated the date 1,461 days (i.e. 365.25 × 4) after each individual’s Wave 1 CAPI was conducted and cross-referenced with the mortality file to give each participant a binary outcome for that date (= 1 if died within 1,461 days; = 0 if alive after 1,461 days). For this study to correspond to the HRS & ELSA mortality indices, the predictors of interest to evaluate mortality risk were analysed across six main categories [14, 15]. Table 1 shows the variables that we used following the Lee et al. (2006) approach: demographics; cardiovascular (CV)-related illness; non-CV diagnosis of serious illness; health and lifestyle variables; functional variables. Table 2 shows additional variables that we used to extend their approach: self-report wellbeing and social connectedness, and healthcare utilization.
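The outcome construction can be sketched in a few lines. This is an illustrative Python version, not the study's Stata code; the function name is hypothetical, and treating a death on exactly day 1,461 as falling within the window is an assumption, as is coding participants with no recorded death as alive.

```python
from datetime import date, timedelta
from typing import Optional

FOLLOW_UP_DAYS = 1461  # 365.25 * 4, as described in the text

def died_within_four_years(interview_date: date, death_date: Optional[date]) -> int:
    """Return 1 if death is recorded on or before the cutoff date, else 0."""
    cutoff = interview_date + timedelta(days=FOLLOW_UP_DAYS)
    if death_date is None:  # no death found in the mortality file
        return 0
    # Assumption: a death on the cutoff date itself counts as within the window.
    return 1 if death_date <= cutoff else 0

# Hypothetical participant interviewed on 1 March 2010; cutoff is 1 March 2014.
outcome = died_within_four_years(date(2010, 3, 1), date(2013, 6, 15))
```

Using a fixed day count rather than calendar years keeps the follow-up window the same length for every participant, regardless of where leap days fall.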

Table 1 Descriptive statistics for the development and validation cohorts
Table 2 Descriptive statistics for the additional self-report variables

Bias, missing data and loss to follow-up

With respect to external validity, bias concerns are very low. TILDA Wave 1 was sampled in a sophisticated way to represent the population of interest [28]. With respect to internal validity, the biggest concern was missing values, which can arise if a participant refused to respond to a question or did not know the relevant answer.

In relation to missingness, preliminary checks showed that our included predictors demonstrated either zero missingness or very low missingness (< 0.75%) in the total sample. There were two exceptions to this: drinking alcohol daily (16.9%) and BMI (28.2%). However, both of these variables were excluded after the initial bivariate analyses, as per Step one of our statistical methods (Supplementary Material), because they lacked a statistically significant association with the outcome. We tested the robustness of our main results to different approaches to missing data. We first dropped participants with missing values and second imputed age-adjusted median values. Results did not substantively change. Loss to follow-up should be low because we have full GRO coverage, but it is possible that we missed some deaths that occurred outside Ireland and that have not become apparent in later waves when we tried to conduct an interview.
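The two missing-data strategies can be illustrated with a small sketch. It assumes that "age-adjusted median" means the median of observed values within the same age band; that interpretation, the function names, the field names and the toy records are all assumptions for illustration and are not taken from the study.

```python
from statistics import median

def drop_missing(records, field):
    """Complete-case approach: keep only records with an observed value."""
    return [r for r in records if r[field] is not None]

def impute_band_median(records, field, band_key="age_band"):
    """Replace missing values with the median observed in the same age band."""
    observed = {}
    for r in records:
        if r[field] is not None:
            observed.setdefault(r[band_key], []).append(r[field])
    medians = {band: median(values) for band, values in observed.items()}
    imputed = []
    for r in records:
        # Stays None if the whole band has no observed values.
        value = r[field] if r[field] is not None else medians.get(r[band_key])
        imputed.append({**r, field: value})
    return imputed

# Hypothetical toy records, not TILDA data.
toy = [
    {"age_band": "50-64", "bmi": 24.0},
    {"age_band": "50-64", "bmi": 30.0},
    {"age_band": "50-64", "bmi": None},
    {"age_band": "65-74", "bmi": 27.0},
    {"age_band": "65-74", "bmi": None},
]
complete_cases = drop_missing(toy, "bmi")
imputed = impute_band_median(toy, "bmi")
```

Fitting the model on both `complete_cases` and `imputed` and comparing the results is the kind of robustness check described above.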

Statistical Methods

We followed and then extended the template already established to develop mortality indices in HRS [14] and ELSA [15]. Our statistical methods encompassed 10 steps and are detailed fully in the Supplementary Material. Briefly, we randomly divided the 8,174 participants into development and validation cohorts. In the development sample we checked the association between each Table 1 predictor and the outcome, first in bivariate regressions, then in multivariable regressions, using the Bayesian Information Criterion (BIC) to refine the model. We tested the final model to check its stability under different approaches to variable selection and then allocated each predictor a number of points, where the smallest coefficient was worth one point and all other coefficients were scaled accordingly. Discrimination of the model was evaluated in both the development and validation cohorts using the area under the receiver operating characteristic (ROC) curve. We then examined the effect of including Table 2 variables as additional predictors in our mortality index, following the same steps. Finally, we compared the performance of our index to those of Lee et al. (2006) [14] and Kobayashi et al. (2017) [15] using the chi-square test of equality of ROC areas. All statistical analyses were conducted using StataSE 16.1 [29].
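The point allocation step can be illustrated with a short sketch. It assumes that each regression coefficient is divided by the smallest coefficient and rounded to the nearest integer; the rounding rule, the function name and the coefficient values are illustrative assumptions, not the estimates from this study.

```python
def coefficients_to_points(coefficients):
    """Scale coefficients so the smallest is worth one point, then round."""
    smallest = min(coefficients.values())
    if smallest <= 0:
        raise ValueError("expects positive coefficients (risk-increasing predictors)")
    # int(x + 0.5) rounds halves up; Python's round() would round halves to even.
    return {name: int(beta / smallest + 0.5) for name, beta in coefficients.items()}

# Hypothetical log-odds coefficients for three unnamed predictors.
hypothetical_coefficients = {"predictor_a": 0.4, "predictor_b": 0.9, "predictor_c": 1.3}
points = coefficients_to_points(hypothetical_coefficients)
```

Integer points trade a little precision for a score that can be added up by hand, which is the usability argument made for this family of indices.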

Sensitivity analyses

For sensitivity analysis we established two- and six-year mortality outcome variables for comparison. We checked sensitivity of results to missing data using different imputation strategies.

Results

Characteristics of the sample

In Wave 1 (2009–2010) of TILDA, 8,174 individuals aged 50+ completed data collection, and we divided these randomly into development (n = 4,121) and validation (n = 4,053) cohorts.

A total of 448 (5.5%) participants died within four years of enrolment, 217 participants in the development cohort (5.2%) and 231 participants in the validation cohort (5.7%). Descriptive characteristics of both cohorts were comparable and can be found in Tables 1 and 2.

Index Development

Following Lee et al. (2006), we first regressed four-year mortality on every variable in Table 1 in bivariate regressions [14]. Associations are shown in Table 3; each predictor is binary, so each association represents the estimated odds ratio of dying within four years of enrolment for participants with a value of 1 for the predictor versus those with a value of 0. Of the 52 predictors, 42 were significantly associated with mortality risk in bivariate regression.
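For a single binary predictor, the odds ratio from an unadjusted logistic regression equals the cross-product ratio of the corresponding 2×2 table, which gives a quick way to see what each bivariate association measures. The sketch below assumes that setting; the function name and the counts are hypothetical and are not taken from Table 3.

```python
def odds_ratio(died_with, alive_with, died_without, alive_without):
    """Odds of death with the predictor divided by odds of death without it."""
    if min(died_with, alive_with, died_without, alive_without) <= 0:
        raise ValueError("all four cells must be positive")
    odds_with = died_with / alive_with
    odds_without = died_without / alive_without
    return odds_with / odds_without

# Hypothetical counts: 30 deaths and 300 survivors with the predictor,
# 70 deaths and 1,400 survivors without it.
example_or = odds_ratio(30, 300, 70, 1400)
```

An odds ratio above 1 indicates higher odds of death among participants with the characteristic; significance testing would still require a standard error or confidence interval, which this sketch omits.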

Table 3 Bivariate analysis of risk factors and 4-year mortality in development cohort

We entered these predictors into a multivariable regression using stepwise backward selection, and the model retained 20 significant predictors of mortality. Following forward and backward stability checks, which tested the stability of the list of predictors on statistical significance and BIC, our model using the Lee/Kobayashi approach included gender; age; diagnoses of heart attack and cancer; being a smoker past age 30; and difficulty walking 100 m, using the toilet and lifting 10 lb (Table 4).

Table 4 Independent risk factors for 4-year mortality in the development cohort (N = 4,121) in the multivariable analysis

We then checked whether this index was further improved by the inclusion of self-report wellbeing and social connectedness variables, and health care utilization predictors that are recorded in the TILDA survey and may be hypothesised as associated with mortality (Table 2). We performed a multivariable regression with the predictors identified using the Lee/Kobayashi approach and each additional predictor in turn, assessing model performance on BIC. Two predictors improved BIC performance: self-reported physical health described as poor, and hospital admission (ED or inpatient) in the last 12 months. When both variables were added together in a 14-variable model, this further improved BIC and all variables were significant at p < 0.05. The 14-predictor model improved discriminatory power in development and validation cohorts (ROC = 0.825 and ROC = 0.783, respectively), compared to the 12-predictor model (ROC = 0.816 and ROC = 0.774).
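The BIC comparison used to decide whether an additional predictor earns its place can be sketched as below, assuming the standard definition BIC = k ln(n) − 2 ln(L), where lower values are better. The log-likelihood values and parameter counts are hypothetical; only the development cohort size comes from the text.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

n_development = 4121  # development cohort size reported in the Results

# Hypothetical fits: adding one predictor raises the log-likelihood by 10.
bic_base = bic(log_likelihood=-700.0, n_params=13, n_obs=n_development)
bic_extended = bic(log_likelihood=-690.0, n_params=14, n_obs=n_development)
extended_is_better = bic_extended < bic_base
```

With a sample of this size each extra parameter costs about 8.3 BIC units, so a new predictor must raise the log-likelihood by more than roughly 4.2 to be retained.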

This model, presented in Table 4, is our final four-year mortality index for the Republic of Ireland. A risk score is calculated for each participant by adding the points for each risk factor present using the point system in Table 4. Within the 14-predictor model, the lowest score available is 0 points and highest is 22 points. A 65-year-old (2 points) male (1 point) who smoked past age 30 (2 points), has cancer (3 points) and has difficulty walking 100 m (1 point), but no other diagnoses or problems listed, would have a risk score of nine points (Table 4).
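The worked example above can be reproduced with a few lines of code. The sketch uses only the point values quoted in that example, so it is not the full Table 4 point system; the dictionary keys and function name are hypothetical labels.

```python
# Point values quoted in the worked example only (not the complete Table 4).
EXAMPLE_POINTS = {
    "age_65": 2,
    "male": 1,
    "smoked_past_age_30": 2,
    "cancer": 3,
    "difficulty_walking_100m": 1,
}

def risk_score(risk_factors_present, points):
    """Sum the points for every risk factor the participant has."""
    return sum(points[factor] for factor in risk_factors_present)

example_score = risk_score(
    ["age_65", "male", "smoked_past_age_30", "cancer", "difficulty_walking_100m"],
    EXAMPLE_POINTS,
)
```

Because the score is a plain sum, a participant with none of the listed risk factors scores 0, the minimum of the index.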

Model performance, risk stratification and international comparisons

For accurate comparisons, we evaluated the original 12-predictor model within the development and validation cohorts, and used that model to compare performance with that of the Lee et al. (2006) [14] and Kobayashi et al. (2017) [15] indices (Table 5). The point-based model showed excellent discrimination, with a ROC area of 0.816 in the development cohort and 0.774 in the validation cohort. Discrimination of our model was similar to that of the Lee et al. (2006) [14] and Kobayashi et al. (2017) [15] indices (0.784 and 0.779, respectively). Mortality rates ranged from < 1% (3/741; development cohort), 1% (5/720; validation cohort), < 1% (0/372; Lee et al., 2006) and 1% (2/393; Kobayashi et al., 2017) at the 0 point level to 75% (3/4; development cohort), 43% (3/7; validation cohort), 35% (8/23; Lee et al., 2006), and 9% (8/94; Kobayashi et al., 2017) at the ≥ 14 point level.
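The ROC area can be read as the probability that a randomly chosen participant who died has a higher point score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is an illustration in Python, not the Stata routine used in the study, and the scores and outcomes are hypothetical.

```python
def roc_auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison of deaths and survivors."""
    died = [s for s, y in zip(scores, outcomes) if y == 1]
    survived = [s for s, y in zip(scores, outcomes) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for d in died:
        for s in survived:
            if d > s:
                concordant += 1.0  # death scored higher: concordant pair
            elif d == s:
                concordant += 0.5  # tied scores count as half
    return concordant / (len(died) * len(survived))

# Hypothetical point scores and outcomes (1 = died within four years).
toy_scores = [0, 2, 1, 3]
toy_outcomes = [0, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_outcomes)
```

The pairwise loop is quadratic in sample size, which is acceptable for illustration; rank-based formulas give the same value more efficiently on full cohorts.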

Table 5 Validation of the prognostic index: comparing the 12-predictor model performance by point score and comparing our index to the performances of the Lee et al. (2006) & Kobayashi et al. (2017) indices

Risk stratification by points and age groups

We evaluated the point scores for three different age groups using the 14-predictor model and found that discriminatory power was satisfactory within each group in the development cohort (Fig. 1). The ROC area for age 50–64 was 0.696, age 65–74 was 0.673, and age 75+ was 0.660. We interpreted these results as indicating the 14-predictor index has similar predictive power across the age distribution of our sample.

Fig. 1 14-Predictor model of four-year mortality by risk score in differing age groups. Note. 50–64 (N = 2,349), 65–74 (N = 1,074) & 75+ (N = 697) [See additional file 1]

Discussion

Key results

We developed and validated a predictive index for four-year mortality in Ireland using high-quality longitudinal survey data of community-dwelling people aged over 50. Our aims of replicating and extending prior mortality indices were achieved. The 12-predictor replication model showed good discriminatory power within our development and validation cohorts, and performed well when compared with the US [14] and English [15] indices. The final 14-predictor extended model showed only small improvements in discriminatory power within development and validation cohorts (Table 4), and similar accuracy across different age groups (Fig. 1). Our approach followed closely prior mortality prediction efforts in the US [14] and England [15]. Our index structure exhibited a high level of consistency with those, incorporating age, gender, diagnoses of disease and functional limitations. Functional limitations data are critical to the success of these indices because they capture aspects of the severity of disease, supplementing the information in a binary diagnosis variable and thus improving on the predictive power of comorbidity indices. We extended the approach of those prior indices to include additional self-report wellbeing and social connectedness variables and healthcare utilization variables, but these resulted in only a modest improvement in model performance. While multiple studies have shown that health care use and self-reported health are significantly associated with mortality, they do not in our data substantively improve efforts to address the prediction problem. As such our results are consistent with literature elsewhere: improving the specificity of prediction tools likely requires data points that are not routinely collected [30].

Comparing the composition of different indices showed differences with both the US (where BMI, diabetes and lung disease were all risk factors) and with England (where lung disease and lack of physical activity were risk factors). There were also differences in the specific functional limitations that ended up in each model. However, when we evaluated the HRS and ELSA indices in our validation sample, all three indices had very similar performance (Table 5). International generalisability appears high, and the relevance of specific functional limitations appears low provided functional capacity is captured.

Our index may have applications in care, research and policy. In clinical settings, it is a simple guide to discriminating between individuals at low and high risk of mortality. An individual’s prognosis may be important for targeting treatments, particularly those with a long benefits horizon, e.g. cancer screening; for lifestyle advice; and for prompting goals-of-care discussions and advance care directives [9, 31]. In research, prognosis may be an important factor in designing trials and sampling in observational studies [7]. It is also a potentially powerful predictor of outcomes. For example, it is well-established that proximity to death drives health care use but this is seldom controlled for prospectively [32]. In policy, quantifying mortality risk is critical to accurate estimation of clinical benefits and health care costs. For example, efforts to estimate the population-level health effects of the COVID-19 pandemic require detailed risk stratification that goes beyond age and gender adjustment to capture mortality and morbidity [33].

Limitations

Four-year mortality from TILDA baseline was 5.5%, compared to 12% in the HRS study for the same period and 25% for the ten-year mortality outcome studied in ELSA. Given the comprehensive linkage with death data in Ireland in the study period, this difference is most plausibly explained by the TILDA sample being younger and healthier at recruitment; the age profile of the studies reflects Ireland’s young population compared to other high-income countries [23]. This also meant that we had a much smaller number of cases on which to calculate the index. Smaller cell sizes increase the uncertainty of the derived weights, which may harm generalisability. By splitting our sample into derivation and validation and comparing index performance in each, we show that internal validity is strong. By comparing performance of equivalent indices in other countries we show that external validity is strong. While it is possible that more cases in the data would change the weights and so improve the index performance, based on these internal and external comparisons there is no a priori reason to anticipate potential for large additional improvements. TILDA, like HRS, originally sampled community-dwelling adults, so the index will not be applicable in residential care populations. Future maturity of the TILDA sample, and updated GRO linkage, will allow us to investigate questions of sampling and sample size. For example, the timing of disease diagnosis, and the trajectories of lifestyle factors and functional difficulties, will likely be predictive of mortality. As the TILDA study adds more waves, investigators will be able to employ some waves prior to baseline and still have sufficient follow-up data; such analyses are planned. All included predictors are collected through self-report and interviews and may be exposed to recall error or bias. However, interviewer assistance, use of CAPI and data analysis methods are designed to combat this.

Conclusion

Our model comparisons with the US [14] and English [15] indices show that our 12-predictor (original replication) model performed well, and this replication suggests that generalisability is high across countries. Our 14-predictor (extended) model showed modest improvements compared to the 12-predictor model, indicating that the statistical utility of the two models is similar.

Our final 14-variable index offers a potentially useful tool that can predict four-year mortality in older community-dwelling adults in the Republic of Ireland. It can be delivered during patient interactions without the need for a full clinical history and utilised to develop care strategies. It can also serve as an instrument for future epidemiological research and policy and be used as a comparator tool for international populations.