Introduction

Older populations are experiencing increasing longevity and a longer old age. At the same time, length of life varies widely among individuals of the same age and sex [1]. Factors that predict survival and longevity are an ongoing area of interest for epidemiologists, gerontologists, and scholars in other fields of social and health sciences. Such factors are also likely to drive the future development of life span in the population.

Since Erdman Palmore's [2, 3] early work, evidence has continued to accumulate on the associations of medical, biological, and other health indicators, social and psychological factors, and demographic characteristics with survival [4,5,6,7]. The importance of predictors varies according to the age and other characteristics of the population under study, the length of the follow-up, the methods of analysis, the number and details of the variables, and the combinations of variables included in the final models of prediction. Overall, research indicates that the most significant predictors of mortality are multi-morbidity [8], cardiovascular disease [9], functional ability [10,11,12], self-rated health [1, 13], and cognitive ability [14]. Male sex and lower socioeconomic status are often mentioned as well [15], but the role of these distal indicators varies depending on the coverage and quality of the more proximal indicators. The relative predictive power of mortality risk factors seems to decline toward extremely high age [1, 16], possibly due to a ceiling effect, that is, very high baseline mortality [17], and because with increasing age death is the end result of several contributing factors and therefore “stochastic” by nature [1].

Most studies aim to identify the most important individual predictors, and little is known about the extent to which the variance in long-term mortality/survival can be predicted by combinations of measured individual characteristics [18]. In this study, we investigated the individual and combined predictors of mortality/survival during a long follow-up time in an older population. In doing so, our aim was to investigate the predictors of an individual's longevity advantage relative to the life expectancy of the respective cohort.

We used individual measures of longevity and related them to actuarial life expectancies of each age cohort in the study. Unlike the Cox regression model, the most frequently used method in survival/mortality studies, this method allows the calculation of the variance in mortality/survival explained by the selected variables. The method is also better suited for the study of longevity in samples with a wide age range [19]. Deeg et al. [18] and Rutherford et al. [20] suggest that, to improve accuracy in longevity predictions, individual-based measures of survival time will be more sensitive to capturing heterogeneity. Further, to maximize predictive value, it is necessary to have a large enough non-selective sample and a long enough follow-up during which the majority of the sample has died; to know the exact length of survival time; and to have a wide range of potential predictors.

In this study, the aim is (1) to identify predictors of longevity in a population-based sample of people aged 60+ at baseline, and (2) to establish the joint predictive value of the combined set of predictors, that is, what proportion of variance in survival/mortality can be explained by this combination. Our data include potential predictors from various domains describing health, functioning, social conditions and social activity measured at the baseline. We have available a relatively large population-based sample and an exceptionally long follow-up period of up to 35 years. As a cross-validation, we use two different longevity measures that are based on comparisons of each individual’s survival with age- and sex-specific survival data drawn from population tables.

Methods

Data sources

The data for our study came from the Tampere Longitudinal Study on Aging (TamELSA) in Finland [21]. TamELSA was undertaken as part of the Eleven Countries Study initiated by the WHO European Office in 1979 [22, 23]. The study area, Tampere, is the largest mainland city of Finland, with a population of 165 000 in 1979 (ca. 244 000 in 2021). At the time of the data collection, the percentage of individuals aged 60 and older was slightly higher (17.0%) than in the country as a whole (16.2%). The ethnic and cultural backgrounds were largely similar and homogeneous. The data were collected using structured questionnaires in face-to-face interviews. The 1979 sample consisted of 1,059 individuals aged 60–89 years, 49.8% male, with an 81% response rate. In 1989, a new cohort of 395 people aged 60–69 years (52% male with a 76% response rate) was added [21].

The samples were combined to increase the number of participants and thereby the statistical power; the total sample comprised 1,454 individuals. Just over half (50.8%) were men, and 51% were younger individuals aged 60–69 years.

Measures of longevity

Dates of death were obtained from the Finnish Population Register. Vital status was ascertained up to 1 January 2015. Four participants were excluded from the analysis because information about their vital status was not available, leaving 1450 individuals. By the end of the study, 1338 participants had died and 7.7% of participants were still alive. The maximum length of follow-up was 35 years.

We used two individual measures of longevity calculated according to each participant’s age and sex and based on the population life table. These measures allow the use of linear regression to provide the variance explained by potential predictors of longevity.

The realized probability of dying (RPD) is a relative measure of longevity that is based on comparison of the survival time of each individual of a specific age and sex with the survival time of his/her peers in the total population [18, 19, 24]. An individual’s time of death is compared with the cohort survival curve for the general population of the same age and sex, calculated from the study’s baseline onwards. An individual’s RPD is expressed as the proportion of the pertinent cohort still alive at the time of death of the individual. Values are between 0 and 1, with higher values indicating shorter survival. For example, if at the time of death of an individual 90% of his/her cohort is still alive, the value of an individual’s RPD is 0.90.

For those ca 8% who were still alive at the end of the study, the RPD value was imputed by multiplying the probability of survival in 2014 by 0.50. This multiplying factor is the expected value of RPD in the population alive at any moment in time (median population survival time).
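As a rough illustration of how an RPD value could be derived, the sketch below reads the proportion of a cohort still alive off a survival curve at the individual's time of death, and halves that proportion for a participant who is still alive at the end of follow-up. The function names, the linear interpolation between curve points, and the survival figures are illustrative assumptions, not the study's data or code.

```python
def survival_at(t, curve):
    """Proportion of the cohort alive t years after baseline.

    `curve` is a list of (years_since_baseline, proportion_alive) pairs,
    sorted by time with distinct time points. Linear interpolation between
    points is an assumption made for this sketch.
    """
    if t <= curve[0][0]:
        return curve[0][1]
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)
    return curve[-1][1]  # beyond the last point: hold the last value


def rpd(years_from_baseline, curve, died=True):
    """Realized probability of dying for one individual.

    For a death: the proportion of the age- and sex-matched cohort still
    alive at the time of death. For a survivor: that proportion at the end
    of follow-up multiplied by 0.50, as described in the text.
    """
    s = survival_at(years_from_baseline, curve)
    return s if died else 0.5 * s


# Hypothetical cohort survival curve (made-up numbers).
example_curve = [(0, 1.00), (5, 0.90), (20, 0.40), (35, 0.08)]

rpd_early_death = rpd(5, example_curve)             # 90% of the cohort still alive
rpd_survivor = rpd(35, example_curve, died=False)   # imputed as half of 0.08
```

In this made-up curve, a death five years after baseline occurs while 90% of the cohort is still alive, matching the 0.90 example in the text.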

The longevity difference (LD) is an absolute measure of longevity that is calculated as the difference between the number of years an individual survived after baseline and the actuarial life expectancy (LE) in his/her year of death, based on their sex and age [25]. Higher values indicate longer survival. For example, if a man aged 65 in 1979 died 15 years later, and the life expectancy for a 65-year-old male was 12.67 years in the year of his death, his LD is 2.33.

If a participant was alive at the end of the follow-up, the LD for this subject was estimated as the sum of the number of years from baseline and the LE based on the subject’s age and sex in 2014. The LD ranged between −19.4 and 18.0 years.
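The arithmetic of the LD for a participant with an observed death can be written out as a short sketch. It reproduces the worked example from the text (15 years survived against a life expectancy of 12.67 years). The function name is hypothetical, and the handling of participants still alive at the end of follow-up is deliberately left out, since it depends on life-table details not reproduced here.

```python
def longevity_difference(years_survived, life_expectancy):
    """LD for a deceased participant: years survived after baseline minus
    the actuarial life expectancy for the participant's age and sex.
    Positive values mean the person outlived the expectation."""
    return years_survived - life_expectancy


# Worked example from the text: a man aged 65 at baseline who died
# 15 years later, with a life expectancy of 12.67 years.
ld_example = longevity_difference(15, 12.67)

# A hypothetical participant who died earlier than expected gets a
# negative LD (numbers invented for illustration).
ld_short = longevity_difference(4, 12.67)
```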

The distribution of LD was near-normal, while the distribution of RPD was near-uniform. To normalize the distribution of RPD, log-transformation was used with the formula log[RPD/(1 − RPD)]. This variable (LRPD) was used in the analyses.
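The transformation above is the log-odds (logit) of RPD, and a minimal sketch of it follows. The natural logarithm is an assumption, since the base is not stated, and the function name is hypothetical. RPD values of exactly 0 or 1 have no finite transform, so the sketch rejects them rather than guessing how they should be handled.

```python
import math


def lrpd(rpd_value):
    """Log-transformed RPD: log[RPD / (1 - RPD)], natural log assumed."""
    if not 0.0 < rpd_value < 1.0:
        raise ValueError("RPD must lie strictly between 0 and 1")
    return math.log(rpd_value / (1.0 - rpd_value))


lrpd_median = lrpd(0.50)  # an individual dying at the cohort median
lrpd_early = lrpd(0.90)   # death while 90% of the cohort is still alive
lrpd_late = lrpd(0.10)    # death when only 10% of the cohort is still alive
```

The transform is symmetric around RPD = 0.50, so early and late deaths at mirrored RPD values receive LRPD values of equal size and opposite sign.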

The life tables for the Finnish population were obtained from the Human Mortality Database (HMD) [26] for the years 1979–1985, and from Official Statistics Finland (OSF) [27] for the period from 1986 to the end of 2014. HMD was used because data for those early years were not available as electronic files from the OSF; however, all the data at HMD also originally come from the OSF.

Predictors of longevity

Potential predictors were categorized into five domains (Table 1).

Table 1 Percentages of the categories of the variables, and associations of potential individual predictors with longevity difference (LD) and log-transformed realized probability of dying (LRPD); linear regressions, unstandardized coefficients (B) and p values

Sociodemographic characteristics included age, sex, marital status, and years in full-time education.

Health and functioning were described by the following variables: Self-reported diseases were coded into 17 categories according to the Finnish Edition of the International Classification of Diseases 1975 Revision (ICD-9), with the exception that cardiovascular diseases were divided into three subgroups: hypertonia, ischemic heart diseases, and other cardiovascular diseases.

Activities of daily living (ADL), mobility, and demanding activities were measured based on questions with four response options: (a) can do without difficulty, (b) can do with difficulty but without help, (c) can do only with help, and (d) cannot do. ADL was measured as ability to get in and out of bed, wash and bathe, use the lavatory, dress and undress, and feed oneself. It was scored on a scale of 0–15, a lower score indicating poorer ADL functioning. Mobility was measured as ability to (a) move outdoors, (b) walk between rooms, (c) use stairs, (d) walk at least 400 m, and (e) carry a heavy bag of 5 kg for 100 m. It was scored on a scale of 0–15. Assessments of demanding activities comprised the ability to cut toenails, cook, do light housework, and do heavy housework. The variable scores ranged from 0 to 12.
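The score ranges above are consistent with each item contributing 0 to 3 points (five items for 0–15, four items for 0–12). The sketch below shows one way such a sum score could be formed. The per-item mapping is an inference from the stated ranges and from the note that lower scores mean poorer functioning, and the names are hypothetical.

```python
# Assumed per-item scoring, inferred from the stated score ranges:
# (a) without difficulty = 3 ... (d) cannot do = 0.
ITEM_SCORES = {"a": 3, "b": 2, "c": 1, "d": 0}


def sum_score(responses):
    """Sum the item scores for one scale (ADL, mobility, or demanding
    activities). `responses` is a sequence of option letters."""
    return sum(ITEM_SCORES[r] for r in responses)


# Hypothetical respondent: five ADL items, some difficulty on two of them.
adl_example = ["a", "a", "b", "c", "a"]
adl_score = sum_score(adl_example)  # 3 + 3 + 2 + 1 + 3

best_adl = sum_score(["a"] * 5)         # top of the 0-15 range
worst_demanding = sum_score(["d"] * 4)  # bottom of the 0-12 range
```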

Self-rated health was reported in five categories from very good to poor and was scored as (0) poor, (1) fairly poor, (2) average, (3) fairly good, (4) very good.

Worsening of memory and low spirits or depression were inquired using the preset response options (a) no, (b) yes, occasionally, (c) yes, often, and (d) yes, nearly continuously. Each predictor was scored on a scale of 0–3.

Regular physical exercise was coded as yes or no.

Number of years of regular smoking was considered as a continuous variable (0 as never-smoker).

Social activity

Social participation was measured by the number of engagements in social activities during the past 12 months. The activities specified were (a) family ceremonies, parties, weddings, and funerals, (b) theater shows, movies, concerts, and art exhibitions, (c) visits to clubs or societies, (d) library, (e) sport competitions, either watching or taking part, (f) religious service, and (g) traveling to foreign countries or in home country. Social participation was scored on a scale of 0–51.

Social contacts were measured as time since last visit received and last visit paid. Both had seven options from today or yesterday to more than six months ago, and both were scored from 0 to 6.

Helping to raise grandchildren and having good friends were both coded as yes or no.

Subjective experiences

Questions concerning unwillingness to do things or lack of energy and tiredness or feeling of faintness had four response options from no to yes; occasionally, often, and continuously, and were scored on a scale of 0–3.

Feeling forgotten, feeling unnecessary, feeling tired of life, and feeling lonely were coded as (0) often, (1) sometimes, (2) never. These variables were on a scale from 0 to 2.

Satisfaction with present life was coded as (4) very satisfied, (3) satisfied, (2) reasonably satisfied, (1) unsatisfied, and (0) very unsatisfied. The variable was scored on a scale of 0–4.

Satisfaction with human relationships was coded as satisfied (1) and unsatisfied (0). The variable was scored as 0–1. Satisfaction with personal financial situation was coded as poor (0), satisfactory (1), and good (2).

Pain in joints or back trouble were coded as (0) yes, nearly continuously, (1) yes, often, (2) yes, occasionally, and (3) no, and scored 0–3.

Living conditions

Living conditions included questions on having a washing machine, telephone, freezer, and refrigerator, and having the use of a car; responses were coded as yes or no.

Being alone was coded as often, rarely, or never, ranging from 0 to 2.

Analysis

Age, number of years smoked regularly, years in full-time education, and the score of social activity were available as continuous variables. Nominal and ordinal variables were treated as continuous variables, with the exception of sex and marital status.

A power calculation with a power of 0.8 was conducted. Given the sample size of 1450, the study was sufficiently powered to detect an effect size of 0.07 or larger. The analyses for both outcome variables were conducted in three steps to maximize the variance explained and to minimize the number of variables included in the final model.

First, bivariate associations of each potential predictor with the outcomes were analyzed (Table 1) and variables were identified as potential predictors if they presented a significant association (p ≤ 0.20) with LD or LRPD, respectively.

Second, the variables that showed a significant association with the outcome, 39 variables for LD and 32 variables for LRPD, were included in multivariate analysis within each domain. Backward linear regression was performed for LD and LRPD separately to determine the variance explained by each domain. Predictors from each domain that were significant at p ≤ 0.20 were retained for the next step.

Third, full linear regression models were performed with 14 variables for LD and 18 variables for LRPD, respectively, to determine which variables retained predictive power when predictors from other domains were included. At this step, a backward linear regression was performed for both outcome variables to identify the most parsimonious models.
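The backward step used in the second and third stages can be sketched generically: fit a linear model, drop the predictor with the largest p value if it exceeds a threshold, and refit until every remaining predictor passes. The sketch below is not the SPSS procedure used in the study. It uses NumPy only, approximates p values with the normal distribution (reasonable for samples of this size), and takes 0.20 as the removal threshold by analogy with the earlier steps; all names and the synthetic data are assumptions.

```python
import math
import numpy as np


def ols_p_values(design, y):
    """OLS coefficients and two-sided p values (normal approximation)."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)                # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = beta / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, p


def backward_eliminate(X, y, names, threshold=0.20):
    """Drop the least significant predictor until all have p <= threshold."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, p = ols_p_values(design, y)
        p = p[1:]                                   # ignore the intercept
        worst = int(np.argmax(p))
        if p[worst] <= threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]


# Synthetic data: only x0 truly predicts the outcome.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(size=500)
selected = backward_eliminate(X_demo, y_demo, ["x0", "x1", "x2"])
```

With a lenient threshold such as 0.20, a pure-noise predictor can survive by chance, which is why such a threshold suits screening rather than final inference.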

Multicollinearity was not detected when calculating the tolerance and variance inflation factor (VIF) for each variable in both final models (tolerances were > 0.1 and VIFs < 10). The analyses were conducted using SPSS 25.
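For readers unfamiliar with the diagnostics, the sketch below shows how tolerance and VIF are conventionally defined: each predictor is regressed on all the others, tolerance is 1 − R² of that regression, and VIF is its reciprocal. This is a generic NumPy illustration with made-up data and hypothetical names, not the SPSS output from the study.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return factors


# Two uncorrelated (orthogonal, mean-zero) predictors: VIF is 1 for both.
X_orthogonal = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
vif_orthogonal = vif(X_orthogonal)

# Adding a third column that nearly duplicates the first inflates the VIFs.
X_collinear = np.column_stack(
    [X_orthogonal, X_orthogonal[:, 0] + np.array([0.1, -0.1, -0.1, 0.1])]
)
vif_collinear = vif(X_collinear)
tolerance_collinear = [1.0 / v for v in vif_collinear]
```

Under the thresholds quoted in the text, a predictor would be flagged when its VIF reaches 10 or, equivalently, when its tolerance falls to 0.1.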

Results

LD and RPD were found to be strongly negatively correlated (Pearson correlation r = −0.96). The mean values of LD, RPD, and LRPD were 0.42, 0.44, and −0.39, respectively. Comparison of these values with their theoretical means of 0.50 (RPD) and 0.00 (LRPD and LD) implies that the sample is slightly healthier than the general Finnish population in these age groups.

Most individual variables were associated with one or both longevity measures, the notable exceptions being some disease categories and most variables describing living conditions (Table 1). Although both LD and LRPD were defined based on age and sex, both were significantly associated with age, and LRPD also with sex in the first step.

The variance explained by domain (Table 2) ranged from 0.7% (living conditions) to 9% (health and functioning) for LD and from 1.7% (social activity) to 9.1% (health and functioning) for LRPD. No variable in the domain of living conditions was significantly associated with LRPD and therefore this domain was not included in the multivariate analysis.

Table 2 Associations of potential predictors with longevity differences (LD) and log-transformed realized probability of dying (LRPD) by domain and variance explained by each domain

In the full linear regression model (Table 3), age, marital status, mobility, self-rated health, years smoked regularly, endocrine and metabolic diseases, respiratory diseases, and unwillingness to do things or lack of energy were included as predictors of both outcomes. For LD, satisfaction with personal financial situation and having the use of a car also remained in the final model; for LRPD, neoplasms and social activities did. The total variance explained (R²) was 13.8% for LD and 14.1% for LRPD.

Table 3 Final models of predictors of longevity differences (LD) and log-transformed realized probability of dying (LRPD)

Discussion

In this study, we asked whether it is possible to predict who will live longer than their age peers. Numerous earlier studies have identified factors associated with mortality/survival, but almost all of them have focused on the strength of individual risk factors in the sample studied. Our approach is different: we focus on the longevity advantage of individuals compared to actuarial life expectancies, and aim at establishing the joint predictive value of a broad set of measures. To maximize the validity of our findings, we used both the longevity difference (LD) and the realized probability of dying (RPD) as outcome measures.

Our findings showed that many of the health and social indicators used were able to identify increased likelihood of longer or shorter longevity several decades ahead. Yet our final models could explain only less than 15% of the variance in longevity.

With a couple of exceptions, the individual indicators that remained significant in the final multivariate models were the same for both outcomes. Endocrine and metabolic diseases (most importantly diabetes), respiratory diseases, and neoplasms are known to be strong predictors of mortality [28, 29]. Self-rated health was retained with borderline significance in both models, which is consistent with several earlier studies with shorter and longer follow-ups [30, 31]. It was also a predictor of LD in the early studies by Palmore [2, 25]. Smoking and other than married marital status are also well-known predictors of mortality. A novel predictor, unwillingness to do things or lack of energy likely implies depressive feelings. Having the use of a car, another novel predictor of longevity, likely reflects both functional ability and, for older individuals in the 1970s and 1980s, socioeconomic position as well.

It is notable that several indicators of sociodemographic position, subjective experiences, and social activity, which as individual variables were strongly associated with length of life, lost their significance in multivariate models. These factors can be understood as distal predictors of mortality that affect the likelihood of mortality through their associations with health, disease and functioning. When measures of health, disease and functioning were included in the models, the independent association of these distal predictors disappeared.

Although both our longevity measures were based on individuals’ age and sex, which should eliminate their complicating effects at the entrance of the study [2, 18, 25], age was retained in both models. Deeg et al. [18, 19] explained a similar unexpected finding as the result of the way of imputing RPD and LD for individuals alive at the end of the follow-up. However, in the current study, only about 8% of the participants were alive at the end of the follow-up. The persistent positive association of age with LD and the negative association of age with LRPD likely reflect a bias toward an increasingly healthier selection of the initial sample at higher ages. Maintaining age in the final multivariate prediction models should eliminate such bias, however.

In spite of the wide range of measures representing different domains of life and the significant associations of several individual indicators with mortality, the total predictive value of the final models was rather low, 13.8% for LD and 14.1% for LRPD. The variance of longevity remains mostly unexplained. Similar results have been reported in the few other studies using the same outcomes. For example, Deeg et al. [18] were able to explain only 25% of the variance of longevity in their 24-year follow-up study using measures from several domains of life. The low predictive values may be due to changes in health status after the baseline measurements. It is known that major chronic diseases that also are leading causes of death, such as cardiovascular diseases, cancer and dementia, are highly age-dependent and their incidence increases steeply with advancing age. These conditions, again, are known to be associated with factors measured in our study such as education, a significant predictor as an individual variable and indirectly reflected in the final model by satisfaction with financial situation and having the use of a car, and years of smoking. Functional disability, a known predictor of mortality, is also strongly associated with age and increases in frequency with every added year of life even at very old age [32]. Therefore, the major conditions predicting and leading to death were likely to appear during our very long follow-up but only after the baseline measurements. Regardless, the fact that we needed to estimate future longevity for only the 8% of our sample who were still alive at the end of the follow-up is a strength of our study.

In this study, we investigated predictors of individual longevity using an exceptionally long follow-up of up to 35 years. We used a population-based sample, and almost all participants had died by the end of the follow-up. We had access to reliable information on dates of death. One strength of our study was that we had access to a wide range of baseline variables that represented different domains, describing living conditions, socioeconomic position, social activity, subjective experiences, and health and functioning. Yet our models were able to explain only a small proportion of the probability of dying. Could we have achieved greater predictive value with better, particularly biological, variables? In their 24-year follow-up, Deeg et al. [18] had information on the ApoE gene and eight commonly used blood measurements, including creatinine, CRP and serum albumin. Together, these factors added only 3.7% to the total predictive value of the model. It is plausible to conclude that with a follow-up period of several decades, it is difficult to predict individual survival time regardless of which baseline factors are considered, as social circumstances change with time, health conditions and their severity increase, and furthermore there is a stochastic element in the process of aging and in death [11, 33].

Conclusion

There are two different messages from our study. The regular, robust associations with factors, such as social position, functional status, and smoking, confirm earlier findings in that tackling inequality and promoting healthy lifestyles are likely to increase longevity in the population. But who lives longer than their age peers and exactly how long will individuals live—that is much more difficult to predict.