Background

Over the past 20 years, there has been a great intensity of research into tools and methods available to quantify the health trajectories of older adults and, more importantly, classify their vulnerability to adverse outcomes. The concept of “biological age,” which increases with accumulated damage (or wear and tear) caused by both acute and chronic environmental and pathological stressors, as opposed to “chronological age,” which is simply the passage of time, has become central to this paradigm [1, 2]. The frailty index, which attempts to operationalize this increasing state of vulnerability by adding up a variety of pathological “deficits” across multiple biological and physiological systems [3], has been shown not only to fulfill theoretical assumptions on how damage within a complex biological network might accumulate [4], but also to reliably predict numerous age-related outcomes such as cardiovascular disease risk [5], depression [6], post-operative recovery [7] and death [8, 9]. However, the frailty index has also been criticized with regard to the length of time required to implement, the vast number of deficits required to achieve a reliable score, the use of inherently biased self-reported data and how deficits are treated when deriving a score (i.e., unweighted and often dichotomously) [10].

Another approach, which has received much attention recently, is quantifying biological age using standardized laboratory or other routine clinical measures. Essentially, an increased biological age results from exhibiting levels that are commonly observed for individuals older than one’s self, as in the Klemera–Doubal approach [11], or from exhibiting levels that depart from population averages or established clinical thresholds, as in the homeostatic dysregulation [12] or FI-lab [13] approaches, respectively. Similar to the former is the use of genome-wide DNA (i.e., CpG) methylation data to train mathematical algorithms that attempt to accurately estimate one’s age, under the assumption that age-related changes in the proportion of methylation at a given locus are due to a natural chronological process (i.e., clock) and/or the response to damage that accrues over time [14]. Two of the earliest and more popular examples of this include Horvath’s 353-CpG clock [15] and Hannum’s 71-CpG clock [16], which have been shown to correlate with established health-related risk factors [17, 18] as well as predict all-cause and disease-specific mortality [19]. Newer “second-generation” clocks, such as the Dunedin Pace of Aging Methylation [20], PhenoAge [21] and GrimAge [22], which are trained using a combination of chronological age, health-related risk factors or mortality, have proven to be additionally sensitive to detecting adverse outcomes, namely cardiopulmonary and metabolic disease [22, 23], cancer [24] and death [21, 22].

Surprisingly, only a handful of cross-sectional studies have investigated the association between epigenetically determined age and frailty, most of which included only clocks trained on age [25,26,27,28,29]. While the vast majority of these studies confirm that prevalent frailty is strongly associated with epigenetic age, important questions remain unanswered. First, given that frailty, by definition, is a syndrome encompassing age-related defects in multiple biological and physiological systems that indicate vulnerability to adverse outcomes, would it associate more strongly with clocks trained solely on chronological age, with clocks trained on health-related risk factors and mortality, or with clocks trained on a combination of the two? Second, what is the capacity of these contextually different clocks to predict the change in frailty over time? In the following study, we aimed to investigate the relationships of eight epigenetic clocks with frailty, both cross-sectionally and longitudinally, using baseline and 3-year follow-up data from the Canadian Longitudinal Study on Aging. These clocks were divided into two groups: those trained solely on chronological age and those trained on a combination of age, phenotypic markers of health and mortality. The former included clocks developed by Horvath [15], Hannum [16] and Lin [30] as well as a clock by Yang [31], developed to estimate cellular turnover. The latter included four: (1) the Dunedin Pace of Aging Methylation [20], trained on a longitudinal change score of phenotypic and biological measures in relatively young adults; (2) GrimAge [22], based on DNA methylation scores for plasma biomarkers and smoking pack-year exposure, and trained to predict time-to-death; (3) PhenoAge [21], trained on a phenotypic age score known to predict all-cause mortality; and (4) a clock developed by Zhang [32], trained to predict all-cause mortality. We hypothesized that clocks trained on phenotypic markers of health or mortality would best predict changes in the frailty index over time, as compared to clocks trained on chronological age.

Results

Summary of cohort sociodemographics and frailty

As summarized in Table 1, the average age of our cohort was 62 years, half were women, and most had post-secondary education, consumed two or more servings of fruits and vegetables per day and reported a total household income of $50,000 or more. Further, nearly half were never smokers and the average physical activity score was 139, which is similar to that previously reported for community-dwelling older adults [33]. At baseline and at 3-year follow-up, the distribution of the frailty index for the entire cohort was nearly identical: 0.141 ± 0.075 (median [min/max] = 0.127 [0.0132, 0.548]) and 0.142 ± 0.077 (0.129 [0.004, 0.543]), respectively.

Table 1 Descriptive summary of participants in the current study, stratified by change in frailty

For those participants that provided data at both time points (n = 1264), an examination of the change in frailty over 3 years (i.e., frailty at follow-up minus baseline) indicated that frailty both increased and decreased (Fig. 1). The average change in frailty was just above zero, 0.006 ± 0.044, and roughly half (54%) of participants exhibited an increase over time; while the proportion was similar between women and men, the average change was greater in women (0.008 ± 0.044 vs. 0.005 ± 0.044, respectively). In terms of a clinically meaningful difference (CMD) in frailty, previously denoted as a change of 0.03 or greater [34, 35], 25% of participants exhibited at least a CMD increase at follow-up, 11% exhibited twice that, and 3.6% exhibited three times the CMD. Alternatively, 18% of participants exhibited at least a CMD decrease at follow-up, and 5% exhibited twice that.

Fig. 1

Summary of the change in frailty from baseline to 3-year follow-up. The change in frailty was calculated as the follow-up value minus the baseline value for all participants who provided follow-up data (n = 1264). The mean and standard deviation, and minimum and maximum change are shown, along with the number and frequency of participants that exhibited greater than one, two or three times the clinically meaningful difference (CMD) in frailty (i.e., 0.03; also shown as vertical blue lines). The vertical red line shows no difference

Characterization of the epigenetic clock measures

Eight different epigenetic clocks were calculated for the present cohort, many of which differ substantially in the units in which they are presented and the variance they exhibit (Fig. 2; Additional file 1: Table S1). Nonetheless, each correlated significantly (nominal p < 0.001) with chronological age; the strongest correlation was for GrimAge (r = 0.90), followed by Horvath [15] (0.87), Hannum [16] (0.86), Lin [30] (0.82), PhenoAge (0.82), Zhang [32] (0.34), Yang [31] (0.30) and Dunedin PoAm (0.21). As expected, delta age estimates for each clock tended to approximate zero and varied between 3 and 5 SDs on either side of the mean. Although the delta age estimates for all clocks were significantly correlated with one another (nominal p < 0.001), those clocks that incorporated chronological age during training (i.e., Hannum [16], Horvath [15], Lin [30] and PhenoAge) tended to correlate most strongly with each other, as did those clocks that were specifically trained on mortality or pace of aging (i.e., Dunedin PoAm, GrimAge, PhenoAge and Zhang [32]); Hannum [16] was unique in that it correlated relatively well with all other clocks, and Dunedin PoAm and Yang [31] were the only pair to be inversely correlated (Additional file 1: Fig. S1).

Fig. 2

Summary of epigenetic clock measures. In each plot, a respective epigenetic clock estimate (y-axis) relative to chronological age (x-axis) is presented, along with an inserted table describing the mean (standard deviation) and minimum/maximum for the corresponding delta age estimate. Also shown in each table is the correlation (r) between the epigenetic clock estimate and chronological age

As similarly shown in recent work by Crimmins and colleagues [36], phenotype- and/or mortality-trained clocks tended to exhibit the strongest associations with sociodemographic and lifestyle factors, the highest being GrimAge, followed by Dunedin PoAm, Zhang [32] and PhenoAge (Additional file 1: Fig. S2). All delta age measures were significantly lower in females than in males, with the exception of Yang 2016.

Associations between delta age estimates and frailty

We first measured the association of each epigenetic clock delta age estimate with frailty at baseline in separate models, adjusting for age and sex (model 1), or age, sex and other sociodemographic factors (model 2), for all participants who provided baseline data (n = 1446) (Fig. 3a; Additional file 1: Table S2). In age- and sex-adjusted models, those clocks that were trained on mortality and pace of aging exhibited the strongest associations, led by GrimAge, where frailty increased 0.020 (i.e., 27% of the SD in frailty at baseline for the entire sample) for each 1-SD change in ΔGrimAge (95% CI 0.015, 0.024); the estimates for ΔDunedin PoAm, ΔPhenoAge and ΔZhang 2017 were approximately half of that (standardized beta (95% CI): 0.013 [0.009, 0.017], 0.011 [0.007, 0.015] and 0.010 [0.006, 0.014], respectively). The clocks trained on chronological age and the mitotic clock exhibited weaker associations, and only ΔHannum 2013 (0.0057 [0.0019, 0.0095]) and ΔHorvath 2013 (0.0055 [0.0017, 0.0094]) were significant. When additionally adjusted for sociodemographics, the patterns of estimates remained the same, but were weaker in nearly every case. All mortality-trained clocks and Dunedin PoAm remained significantly associated with baseline frailty, while ΔHannum 2013 and ΔHorvath 2013 failed to retain significance.

Fig. 3

Associations between frailty and different epigenetic clock measures. Frailty at (a) baseline and (b) 3-year follow-up was regressed on standardized delta age estimates using gamma regression, with each clock in a separate model. For both panels, model 1 represents estimates adjusted for age and sex, while model 2 represents estimates adjusted for age, sex, education, income, smoking, diet and physical activity; for (b), both models were also adjusted for frailty at baseline. Beta coefficients and 95% confidence intervals (CI) are shown, and the dotted red line indicates no association

Table 2 A description of the methodology used to derive each epigenetic clock employed in the current study

To measure the association between delta age estimates and frailty at 3-year follow-up, we used the aforementioned modeling strategy and additionally adjusted for frailty at baseline for all participants who provided data at both time points (n = 1246) (Fig. 3b; Additional file 1: Table S2). As with the analysis of frailty at baseline, in age- and sex-adjusted models ΔGrimAge exhibited the strongest association with frailty at follow-up, where for every 1-SD change in ΔGrimAge frailty changed 0.003 (95% CI 0.00068, 0.00541), or approximately 7% of the SD in change in frailty for the entire sample. However, associations with ΔHannum 2013 were nearly as strong (standardized beta (95% CI): 0.0028 [0.00075, 0.00476]) and remained significant in fully adjusted models (0.0022 [0.00006, 0.00426]). We also tested whether the delta age estimates were associated with the likelihood of a CMD increase in frailty at follow-up, but only ΔGrimAge was statistically significant: in age- and sex-adjusted models, for every 1-SD increase, the odds increased by a factor of 1.22 (95% CI 1.09, 1.37), and in models additionally adjusted for sociodemographics, the odds increased by a factor of 1.26 (95% CI 1.09, 1.46) (Additional file 1: Table S2).
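The "percent of SD" figures quoted for ΔGrimAge can be reproduced from numbers already given in the text. The short Python sketch below is illustrative only: the helper name `percent_of_sd` is hypothetical, and the inputs are the rounded values reported above (baseline frailty SD of 0.075, SD of the 3-year change of 0.044), so the results are approximate.

```python
def percent_of_sd(beta, sd):
    """Express a per-1-SD regression coefficient as a percentage of the outcome SD."""
    return 100 * beta / sd


# Baseline: beta = 0.020 per 1-SD of delta GrimAge; SD of baseline frailty = 0.075.
baseline_pct = percent_of_sd(0.020, 0.075)

# Follow-up: beta = 0.003 per 1-SD of delta GrimAge; SD of change in frailty = 0.044.
change_pct = percent_of_sd(0.003, 0.044)

print(f"baseline: {baseline_pct:.1f}% of SD, change: {change_pct:.1f}% of SD")
```

Because the inputs are rounded, the percentages agree with the reported values only to the nearest whole percent.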

Discussion

The primary goal of the current study was to evaluate a series of conceptually diverse epigenetic clock measures with regard to their association with frailty and its change over 3 years. While the frailty index is an excellent predictor of adverse health outcomes in a variety of settings [5,6,7,8,9], it has also been criticized for being cumbersome and inherently biased [10]; hence, identifying standardized molecular measures that are indicative of its change, especially over relatively short intervals, would be valuable. In our sample of the CLSA, the change in frailty over 3 years was normally distributed, with an average increase (i.e., 0.006) of about 20% of what has been previously described as a clinically meaningful difference (i.e., 0.03) [34, 35]. After adjusting for the minimum age at recruitment, this is similar to what has previously been reported for community samples in the USA, Canada and Europe [37,38,39,40].

As hypothesized, clocks trained on phenotypic markers and/or mortality (i.e., GrimAge, Dunedin PoAm, PhenoAge and Zhang [32]) were most strongly associated with prevalent frailty; this is supported by recently published work [41]. Of those, GrimAge exhibited the strongest association, which is not surprising, as it exhibits robust associations with healthspan and lifespan [22] and specifically incorporates DNA methylation loci that correlate with a number of frailty-related plasma biomarkers, such as leptin [42], TIMP-1 [43], beta-2 microglobulin [44] and cystatin C [45]. Given this, it is also not surprising that GrimAge was significantly associated with the change in frailty over 3 years, which was not observed for the other mortality-trained clocks. It would appear that the unique combination of chronological age and DNA methylation scores of relevant plasma proteins and smoking pack-years gives GrimAge additional sensitivity to detect changes in health that other phenotype- or mortality-trained clocks lack.

Among the clocks that were significantly associated with prevalent frailty, Dunedin PoAm was the second highest in magnitude. This is particularly interesting as it attempts to quantify the rate (or pace) at which physiological and phenotypic health-related biomarkers change with age, instead of their levels relative to the population mean or risk of death. The relatively strong association is warranted, especially since our frailty index is predominantly composed of health-related conditions that influence the levels of many of the 18 biomarkers that are part of the Dunedin PoAm, and in a similar direction as chronological age; for example, FEV1 decreases with age and numerous cardiopulmonary disorders [46], while C-reactive protein (CRP) [47] and mean arterial pressure [48] both increase with age and depressive symptoms. Since many of these biomarkers have also been shown to be related to the incidence of frailty-related chronic conditions, it is unclear why the Dunedin PoAm was not significantly associated with the change in frailty. This may have to do with the fact that this clock was trained on the rate of biomarker change in relatively young adults, and may not reflect the “damage” that occurs later in life, which influences the breakdown of biological networks and ultimately determines the trajectory of frailty [49]. Interestingly, the only other clock to be associated with change in frailty was Hannum [16], the coefficient for which was nearly as strong as GrimAge. Like GrimAge [23, 50], Hannum [16] has been shown to be correlated with levels of the chronic inflammatory marker CRP [51, 52], which is significantly related to both prevalent [53] and incident [54, 55] frailty. Another age-trained clock we studied, Horvath [15], was not found to be significantly related to CRP in the same studies as Hannum [16, 51, 52] and was not associated with change in frailty in the current study.

Our study featured both strengths and limitations. Strengths included a relatively large sample of participants derived from the population-based Canadian Longitudinal Study on Aging, from which we derived eight conceptually diverse epigenetic clocks and a comprehensive frailty index based on 76 deficits related to chronic conditions, well-being and physical/cognitive functioning. Furthermore, we were able to investigate frailty longitudinally, which is not common in the literature. Unfortunately, the time between frailty measures was only 3 years, which may be too short an interval to reliably estimate the true trajectory of frailty.

Conclusions

In summary, we have shown that epigenetic clocks trained on phenotypic markers of health and aging and/or mortality are most strongly associated with prevalent frailty. GrimAge and Hannum [16] were the only clocks to be associated with both baseline and change in frailty, suggesting that they may be most effective at predicting health trajectories of older adults and detecting beneficial effects of healthy aging interventions.

Methods

Cohort description

This study was an analysis of data from the Canadian Longitudinal Study on Aging (CLSA) baseline (2012–2015) and first follow-up (2015–2018) collection; the CLSA study design and methods have been previously described [56]. Specifically, it was based on the CLSA comprehensive cohort (baseline dataset version 4.1; follow-up dataset version 3.0), which includes 30,097 community-dwelling adults aged 45–86 years at recruitment who provided questionnaire data through in-home interviews and provided additional physical and cognitive assessment data at one of 11 data collection sites nationwide. Within this cohort, a random pool of 10,000 participants was drawn and extensive laboratory measures, including clinical chemistry and genetics, were performed on cryopreserved blood. From this pool, 1479 participants were randomly selected for DNA methylation analysis on their baseline biospecimen. This study was approved by the Health Sciences North Research Ethics Board (#20-030).

DNA methylation analysis and description of the final sample

The proportion of methylation on cytosine–guanine (CpG) nucleotide pairs was measured using the Infinium MethylationEPIC BeadChip platform (Illumina, CA, USA) on DNA extracted from peripheral blood mononuclear cells (PBMCs); a summary of this work and the preparation of DNA methylation data can be found at: https://www.clsa-elcv.ca/doc/3491. Briefly, blood was drawn into CPT vacutainers (BD Biosciences, NJ, USA), after which PBMCs were isolated, resuspended in PBS and cryopreserved in vapor-phase liquid nitrogen. From this, DNA was extracted by QIAsymphony nucleic acid extraction platform using DNA midi kits (Qiagen, Hilden, Germany) and bisulfite-treated using the EZ DNA methylation kit (Zymo, CA, USA). Measurement of CpG methylation on converted DNA samples by MethylationEPIC array was performed according to manufacturer’s recommendations. At each step in this process (i.e., DNA extraction, bisulfite conversion and array hybridization and analysis), participant samples were batch-randomized. After the acquisition of raw data, probe-level QC was first performed using functions from the R package “minfi” [57]: The median log intensity of methylated and unmethylated channels was checked using “getQC” and exceeded the recommended threshold of 10.5 for all arrays, while the average probe detection p-value (i.e., methylated and unmethylated signals tested against background) for each array, assessed using the “detectionP” function, was at least 0.005. Array-level QC found that of the 1479 samples initially included for DNA methylation analysis, 4 were removed due to poor bisulfite conversion (i.e., < 85%), while another 29 were flagged by built-in outlier detection functions in the R packages “wateRmelon” [58] and “lumi” [59]. Hence, the final sample included 1446 participants. 
Of those, 1320 provided data at follow-up, while the remaining 126 participants either withdrew from the study (n = 53) or died (n = 24) prior to providing data, or data were not available for another reason (n = 49).
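The participant flow described above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts taken directly from the text above; variable names are illustrative.
initial_samples = 1479     # samples selected for DNA methylation analysis
failed_bisulfite = 4       # removed for poor bisulfite conversion
flagged_outliers = 29      # flagged by outlier detection

final_sample = initial_samples - failed_bisulfite - flagged_outliers

followed_up = 1320         # provided data at follow-up
not_followed_up = {"withdrew": 53, "died": 24, "other reason": 49}
lost_total = sum(not_followed_up.values())

# The follow-up and loss counts should partition the final sample.
assert final_sample == followed_up + lost_total
print(final_sample, lost_total)
```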

Derivation of estimates from published epigenetic clocks

Eight epigenetic clocks were chosen based on the phenomenon or outcome they were originally designed to estimate or predict; they are labeled using the name by which they are commonly known or by the lead author and year of the study in which they were initially published. Horvath [15], Hannum [16] and Lin [30] were trained on chronological age and therefore use DNA methylation to estimate one’s age. Yang [31] (also known as epiTOC) was also trained on chronological age, but only at CpG sites associated with Polycomb group targets that were also constitutively unmethylated in fetal tissues; based on these criteria, the authors argued that this clock should be highly related to mitotic-like processes. Dunedin Pace of Aging Methylation (PoAm) [20] was trained on a longitudinal change score of 18 biomarkers in adults between the ages of 26 and 38 years. GrimAge [22] was developed using a two-stage process in which DNA methylation scores for 12 age-related plasma biomarkers and smoking pack-year exposure were first identified and then trained to predict time-to-death. PhenoAge [21] was trained on a phenotypic age score based on nine clinically relevant blood biomarkers and chronological age that predicted all-cause mortality, while Zhang [32] was trained on all-cause mortality. Each epigenetic clock was derived using either published software or published weights, with beta values normalized according to the method that would best recapitulate the authors’ original findings; these details, along with the respective number of CpGs used in the current study, are found in Table 2. The units for each clock are as follows: Yang 2016 estimates are presented as “pctgAge,” the average methylation level across the sites comprising epiTOC; Dunedin PoAm as years of physiological decline occurring per 12 months of calendar time; and Zhang [32] as arbitrary units. All remaining clocks are presented as years.
For all clocks, delta age values represent the residual of each respective clock estimate regressed on chronological age.
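The residual-based definition of delta age can be illustrated with a minimal Python sketch. It assumes a simple linear regression of the clock estimate on chronological age with an intercept; the function name `delta_age` and the five data points are hypothetical and are not CLSA data.

```python
from statistics import mean


def delta_age(clock, age):
    """Residuals from a least-squares fit of clock estimate on chronological age."""
    age_bar, clock_bar = mean(age), mean(clock)
    sxx = sum((a - age_bar) ** 2 for a in age)
    sxy = sum((a - age_bar) * (c - clock_bar) for a, c in zip(age, clock))
    slope = sxy / sxx
    intercept = clock_bar - slope * age_bar
    # Positive residual: clock estimate is higher than expected for that age.
    return [c - (intercept + slope * a) for a, c in zip(age, clock)]


# Hypothetical values for five participants (illustration only).
ages = [50.0, 55.0, 60.0, 65.0, 70.0]
clock_estimates = [48.0, 57.0, 59.0, 68.0, 69.0]

deltas = delta_age(clock_estimates, ages)
print([round(d, 2) for d in deltas])
```

With an intercept in the model, the residuals average zero and are uncorrelated with chronological age by construction, which makes them comparable across clocks reported in different units.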

Outcomes

Frailty at baseline and 3-year follow-up was estimated using the frailty index approach [3], specifically, 76 deficits related to chronic conditions, activities of daily living, depression, perceptions of health, satisfaction with life, body mass and social participation, as per previous work [60, 61] (Additional file 1: Table S3). It is calculated as the proportion of deficits present relative to the total number of deficits considered, ranging from 0 to 1, and is gamma distributed [3, 62]; hence, increasing values represent worse health and greater risk of adverse outcomes. As an example, a person reporting ten deficits would exhibit a frailty index of 0.132 (i.e., 10 divided by 76). Frailty was defined as missing for any participant missing more than seven deficit variables (~ 10%).
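A minimal Python sketch of the frailty index calculation follows. The function name `frailty_index` is hypothetical, and one detail is an assumption not stated in the text: when some deficits are missing (up to the allowed seven), the sketch divides by the number of non-missing deficits.

```python
def frailty_index(deficits, max_missing=7):
    """Proportion of deficits present; None marks a missing deficit.

    Returns None when more than `max_missing` deficits are missing.
    Assumption: the denominator is the number of non-missing deficits.
    """
    observed = [d for d in deficits if d is not None]
    n_missing = len(deficits) - len(observed)
    if n_missing > max_missing:
        return None
    return sum(observed) / len(observed)


# Ten deficits present out of 76, none missing: 10 / 76.
complete = [1] * 10 + [0] * 66
fi_complete = frailty_index(complete)

# Eight missing deficits exceeds the limit of seven, so frailty is missing.
too_sparse = [None] * 8 + [1] * 10 + [0] * 58
fi_sparse = frailty_index(too_sparse)

print(fi_complete, fi_sparse)
```

Deficits are coded here as 0 or 1 for simplicity; graded deficits scored between 0 and 1 would work with the same function.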

Covariates

The following variables were included in regression analysis given that we have previously demonstrated their association with frailty in older adults [60]: age, sex, education, income, smoking, physical activity and diet. Ethnicity was not considered given that only 6% of participants reported belonging to a racial group other than white and, even so, only slight differences in the demographic makeup and distribution of epigenetic clock measures were observed between groups (Additional file 1: Table S4). Education was categorized as less than, at least or greater than secondary education. Total household income was defined as annual earnings of less than $20,000, $20,000–50,000, $50,000–100,000 and more than $100,000. Smoking was defined as never (have not smoked 100 cigarettes in their lifetime), former (have smoked at least 100 cigarettes, but have not smoked in the past 30 days) or current (have smoked at least 100 cigarettes and have smoked at least one cigarette in the past 30 days). Physical activity was operationalized using the Physical Activity Scale for the Elderly (PASE) [33], a continuous measure in which a greater score indicates an overall greater amount of time spent per week performing activities such as walking, housework, and sports and recreational activities. Diet was evaluated based on participant fruit and vegetable consumption and defined as less than two servings daily, two–three servings and four or more servings; this information was captured within the AB SCREEN™ II assessment tool (the AB SCREEN™ II assessment tool is owned by Dr. Heather Keller. Use of the AB SCREEN™ II assessment tool was made under license from the University of Guelph). Data for all factors were obtained by a self-reported questionnaire, and a refusal or inability to answer a given question was treated as missing.

Statistical analysis

All continuous sociodemographic variables were summarized as the mean and standard deviation (SD) and categorical variables as the count and percentage. For comparison between groups, either a t-test or Fisher’s exact test was used, and nominal (unadjusted) p-values were reported. The association of each epigenetic clock with chronological age (or among epigenetic clocks) was estimated by Pearson’s correlation, while the distribution of each delta age estimate was summarized as the mean, SD, minimum and maximum.
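For reference, Pearson's correlation between a clock estimate and chronological age can be computed as in the Python sketch below. The function name `pearson_r` and the example values are hypothetical and are not CLSA data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical chronological ages and clock estimates (illustration only).
ages = [50.0, 55.0, 60.0, 65.0, 70.0]
clock_estimates = [48.0, 57.0, 59.0, 68.0, 69.0]

r = pearson_r(ages, clock_estimates)
print(round(r, 2))
```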

Associations between delta age for each clock and sociodemographic and lifestyle factors were estimated by ordinary least squares regression using two models, the first adjusting for age and sex and the second additionally adjusting for education, income, smoking, physical activity and diet. Given that the frailty index commonly follows a gamma distribution [62], the association between delta age estimates and frailty was estimated by gamma regression (identity link) using the two models described above. In models with frailty as the dependent variable, all covariates were found to improve model fit statistics (i.e., residual deviance and Akaike's information criterion) and diagnostic criteria (i.e., normality and heteroskedasticity of residuals). For frailty at 3-year follow-up, both models were also adjusted for frailty at baseline in order to determine the association with change in frailty. In all models, delta age was standardized to have a mean of 0 and SD of 1 in order to facilitate cross-clock comparisons. Results are presented as the coefficient (i.e., beta) and 95% confidence interval, which was not adjusted for multiple testing, and any observation with missing data was excluded from analysis; p-values are not reported, as confidence intervals tend to provide greater information on the effect size(s) being presented [63]. To estimate the odds of a CMD increase in frailty at follow-up related to each delta age estimate, we used ordinal regression, where the change in frailty was categorized as no increase (i.e., ΔFI ≤ 0), up to 1× CMD (i.e., 0 < ΔFI < 0.03), 1–2× CMD (i.e., 0.03 ≤ ΔFI < 0.06) and 2× CMD or more (i.e., ΔFI ≥ 0.06). These models were adjusted for age and sex, and results are presented as the odds ratio (OR) and 95% confidence interval (as above, p-values are not provided). All analyses were performed in R version 3.6.
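Two of the data-preparation steps described above, standardizing delta age and categorizing the change in frailty for ordinal regression, can be sketched in Python as follows. This is an illustration rather than the analysis code, which was written in R. The function names are hypothetical, the cut-points are the ΔFI thresholds stated in the text, and the use of the sample SD (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev

CMD = 0.03  # clinically meaningful difference in the frailty index


def standardize(values):
    """Rescale values to mean 0 and SD 1 (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def cmd_category(delta_fi, cmd=CMD):
    """Ordered category of change in frailty (follow-up minus baseline).

    0: delta_fi <= 0
    1: 0 < delta_fi < cmd
    2: cmd <= delta_fi < 2 * cmd
    3: delta_fi >= 2 * cmd
    """
    if delta_fi <= 0:
        return 0
    if delta_fi < cmd:
        return 1
    if delta_fi < 2 * cmd:
        return 2
    return 3


# Hypothetical changes in frailty for seven participants (illustration only).
changes = [-0.01, 0.0, 0.02, 0.03, 0.059, 0.06, 0.10]
categories = [cmd_category(c) for c in changes]

# Hypothetical delta age values, standardized before modeling.
z = standardize([-1.6, 2.1, -1.2, 2.5, -1.8])
print(categories, [round(v, 2) for v in z])
```

Standardizing each clock's delta age puts every coefficient on a per-1-SD scale, so clocks reported in years, arbitrary units or pace of aging can be compared directly.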