Abstract
Delta age is a biomarker of brain aging that captures differences between the chronological age and the predicted biological brain age. Using multimodal data of brain MRI, genomics, and blood-based biomarkers and metabolomics in UK Biobank, this study investigates an explainable and causal basis of high delta age. A visual saliency map of brain regions showed that lower volumes in the fornix and the lower part of the thalamus are key predictors of high delta age. Genome-wide association analysis of the delta age using the SNP array data identified associated variants in gene regions such as KLF3-AS1 and STX1. GWAS was also performed on the volumes in the fornix and the lower part of the thalamus, showing a high genetic correlation with delta age, indicating that they share a genetic basis. Mendelian randomization (MR) for all metabolomic biomarkers and blood-related phenotypes showed that immune-related phenotypes have a causal impact on increasing delta age. Our analysis revealed regions in the brain that are susceptible to the aging process and provided evidence of the causal and genetic connections between immune responses and brain aging.
Similar content being viewed by others
Introduction
Aging is a primary risk factor for a myriad of health problems. Since aging proceeds at different rates for each individual, various methods to measure the biological age have been developed for a more accurate diagnosis of health status. Among the concerns related to aging, cerebral atrophy, which leads to cognitive decline, is a substantial risk to the individual well-being, constituting a major public health burden. Brain volume loss is also associated with neurodegenerative diseases such as Alzheimer’s disease and Parkinson’s disease1,2.
The aging process in different brain regions can be detected through structural and functional Magnetic Resonance Imaging (MRI). As large-scale datasets such as UK Biobank that contain neuroimaging data are becoming available, there have been efforts to accurately predict an individual’s chronological age with the neuroimaging datasets. Franke et al.3 used principal component analysis and relevance vector machine to predict age. Studies since then primarily used neural network models for prediction and data-driven feature extraction4,5,6,7,8,9,10,11,12. The convolutional neural network (CNN) models have been used with a high level of accuracy. The mean absolute error of the prediction in most literature with CNN models is between 2.14 and 3.4 years.
The difference between the predicted age and the actual chronological age, called delta age, has been used as an aging biomarker3,4. After estimating the delta age, phenome-wide and genome-wide association tests have been conducted to identify significantly associated genetic and clinical factors. Recent studies have shown that bone mineral density, blood pressure, and type 2 diabetes are associated with delta age7,8. Genome-wide association analyses identified that KANSL1, MAPT-AS1, CRHR1, NSF in chromosome 17, KLF3 (chromosome 4), RUNX2 (chromosome 6), and NKX6-2 gene (chromosome 10) were significantly associated5,10,11. When combined with the cognitive test results, SNPs in MED8, COLEC10, and PLIN4 genes were also significantly associated11.
To extend our understanding of the genetic and molecular basis of brain aging, we analyzed multimodal UK Biobank data. Compared to the previous studies, our analysis includes whole-exome sequencing (WES) and metabolomics data, which enabled us to identify novel genetic and biomarker associations. In addition, we carried out large-scale Mendelian randomization studies of 310 blood and metabolomic phenotypes to identify causal biomarkers and used an explainable AI method for medical images to identify brain regions that drive high delta age.
Results
Overview of the analysis
Figure 1 provides an overview of our analysis. First, a 3D CNN model was trained for age prediction with the T1-weighted structural brain MRI of healthy white British samples in the UK Biobank, excluding individuals with diseases related to cancer, diabetes, dementia, and mental disorders. The training was conducted via cross-validation to use all available samples in the downstream analysis (see “Methods”). The Integrated Gradients (IG) method was then used to identify an accurate attribution of each voxel (volume + pixel) to the prediction13. Second, genome-wide association tests were conducted on different test levels to uncover novel loci associated with brain aging. We applied a single-variant test, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), to the array-genotyped and imputed markers, and a gene-based test, SAIGE-GENE + , to the WES datasets14,15. Lastly, we examined linear and nonlinear causal relationships between the delta age and the phenotypes (metabolomics and blood) with Mendelian randomization methods. The number of samples used in each step is in Table 1.
Age prediction accuracy and saliency map
Figure 2 and Supplementary Table 1 show the prediction results of the 3D CNN model. We used the cross-validation scheme to use all available samples10. Supplementary Table 1 shows the mean absolute error of the samples with diseases and the samples without diseases and four-fold cross-validation groups (see “Methods”). The mean absolute error in the healthy individuals was about 2.6406 years in the test set and 0.8989 years in the training set. The existing studies on brain age estimation had similar accuracy to this result7,8,9,10. The mean absolute error in the samples with diseases was 2.651 years. Figure 2a shows the strong positive correlation between the chronological age (x-axis) and the predicted age (y-axis). The age-related bias was corrected by linear regression on the chronological age (Fig. 2b)16. Figure 2c is a scatter plot between the chronological age and the adjusted delta age.
Figure 3 shows the saliency maps of the age prediction model with integrated gradients. Absolute values of the integrated gradients were averaged for 100 samples with the youngest predicted age. Figure 3a shows the voxels with averaged integrated gradients greater than five. According to the Automated Anatomical Labeling atlas (AAL) and the Natbrainlab atlas, when highlighting the regions with integrated gradients greater than ten, the regions were the fornix and the lower part of the thalamus (Fig. 3b)17,18. Similar results were shown when 100 samples were chosen randomly or by descending order of the delta age values (data not shown).
Genetic variants associated with delta age
We performed genetic association analyses for the predicted delta age. We carried out single-variant tests for 38 million array-genotyped and imputed genetic variants using SAIGE. Since the single-variant test for rare variants in WES has low power, we used a gene-based test method for the WES dataset19. We used SAIGE-GENE + to test for rare variant associations (18,308 genes).
Figure 4 is the Manhattan plot of single-variant analysis results from array-genotyped and imputed data. Table 2 lists significant variants (p value under 5e−08). The single-variant test results showed that five loci in chromosomes 1, 4, 6, 10, 11, and 17 were significantly associated with the delta age. The nearest genes were STX6, MR1, KLF3-AS1, WNT16, INPP5A, NKX6-2, and several genes in chromosome 17, including KANSL1, MAPT-AS1, and NSF. Genetic heritability calculated by the variance components in the SAIGE model was 21.5%.
The gene-based rare variant test on the WES dataset identified no significant genes. The p value threshold was the Bonferroni corrected level of 0.05 (0.05/18,308). SEC62, PPM1F, ABCC2, ADAM15, and NDN were the top five genes with the smallest p value (Supplementary Table 3).
To check whether the delta age prediction was truly driven by voxel values in the fornix and the lower part of the thalamus, we carried out the same GWAS procedure with the average voxel value of the two regions. The SAIGE results showed that the significant loci (p value under 5e−08) associated with the two regions were also concentrated on chromosome 17 (Supplementary Table 2; Supplementary Fig. 1a,b). SLC39A8 and C16orf95 genes were commonly shown to be associated with the two regions. We also calculated the genetic correlation among delta age and average voxel values of the two regions and observed high genetic correlation values (Supplementary Table 4). Our analysis results clearly demonstrated the shared genetic basis of delta age and the two regions.
Additional validation on the delta age of 1610 healthy non-British white samples was conducted. In single-SNP GWAS, two SNPs in chromosome 4 with no specific gene region and 10 SNPs in chromosome 17 in genes such as KANSL1 and LRRC37A2 had p values less than 0.05. With a slightly more lenient p value cutoff of 0.1, one SNP in chromosome 4 and 40 SNPs in chromosome 17 centered on PLEKHM1, KANSL1, LINC02210-CRHR1 were additionally significant. Due to the small sample size (905 samples), none of the genes had p values < 0.1 in non-British white samples.
Causal biomarkers of delta age
Before the causal analysis, we carried out an association analysis between delta age and each of 310 (61 blood and 249 metabolomic biomarkers) phenotypes with linear regression models. Among the 310 phenotypes, HbA1C (p value = 7.31E−28) and Glucose (p value = 1.11E−26) were most significantly associated with the delta age (Supplementary Fig. 4; Supplementary Table 7) with a positive association direction after the Benjamini–Hochberg procedure (FDR = 0.05). This may indicate the association between diabetes and brain aging.
For causal analysis, 59 had p values less than 0.05 in causal estimates from at least one of the three linear MR methods (MR-Egger regression, inverse variance weighting, and weighted median). Table 3 lists the top five phenotypes by MR-Egger regression (Eosinophil count, Eosinophil percentage, Neutrophil count, Total protein, and White blood cell count), and Supplementary Table 5 shows the other 54 phenotypes. The Top five phenotypes were immune-related biomarkers and had positive relationship with the delta age. Among them Eosinophil count (p value = 5.16E−06) was statistically significant after the Benjamini–Hochberg procedure (FDR = 0.05). The reverse causality for the eosinophil count was not significant (p value = 0.68), so there was no simultaneity.
Figure 5 is a PheWAS plot of the causal estimates from the MR-Egger regression. Overall, phenotypes related to white blood cells showed more significant causal relationships with delta age than other phenotypes. Similar results were replicated by the weighted median method (Supplementary Fig. 3; Supplementary Table 6).
Nonlinear MR analysis using piecewise MR and kernel IV showed similar results. Total cholines (p value = 0.03413), total lipids in small LDL (p value = 0.03599), and cholesteryl esters to total lipids in very large HDL percentage (p value = 0.01002) passed the test of the assumptions for instrument variable regression and returned p value less than 0.05 in the trend test, indicating that there was nonlinearity in the causal relationship between the biomarkers and delta age (the values in the parenthesis indicate the p values). Total choline (p value = 0.03413) showed a positive causal relationship with delta age when it is 2.5 mmol/l or higher. It increases delta age when it is above 2.5 mmol/l. The other two biomarkers showed an inverted U-shaped relationship (Supplementary Fig. 5). When applying FDR = 0.25, cholesteryl esters to total lipids in very large HDL percentage were the only variables that showed significant nonlinear causal relationship with delta age.
Discussion
In this paper, we have analyzed the biological risk factors of brain aging with multimodal data consisting of brain MRI, genome, blood, and metabolomics. The CNN model that predicts age from brain MRI had high accuracy with a mean absolute error of 2.64 years. Visual information of the regional importance in the brain was extracted from the neural network model. Genetic variants and biomarkers that have significant links to brain aging were identified using GWAS methods and Mendelian randomization.
Delta age is known to have associations with lower bone mineral density (BMD), higher blood pressure, poorer lung function, cognitive decline, diabetes, multiple sclerosis, and neurodegenerative disease through several studies6,7,16,20. BMD, lung function, and cognitive function tend to decrease with aging, and cognitive function is known to be highly related to brain volume21,22. In addition, diabetes and neurodegenerative disease are well-known aging-related diseases23. We also found the association between delta age and various aging-related phenotypes24,25, and showed significant relationships with lower bone mineral density, cognitive decline, and poorer lung function (Supplementary Table 8). For age-related neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and Schizophrenia, due to the limitation of the low number of diseased individuals (n < 10), we investigated the association of epilepsy and Schizophrenia only. In epilepsy, there was a difference in delta age between case and control groups (p value = 3.0E−02). We also found an association between delta age and type 2 diabetes (p value = 3.32E−31), which has been confirmed in several previous studies. In genetic correlation analysis, no significant phenotype could be confirmed after Bonferroni correction (0.05/15), but overall results were consistent with the direction of aging (Supplementary Table 9). We also carried out a genetic correlation analysis between our brain MRI-based delta age and delta ages calculated from epigenetic age predictors (Supplementary Table 10)26. In this case, however, there were no significant relationship between them.
Many models for predicting brain age have been built3,4,5,6, but only some studies have used large-scale MRI data. In this study, we used N = 34,129 brain MRI to train the CNN model, which showed good performance in previous studies7. Next, we used integrated gradients to make accurate saliency maps by incorporating information from a wider range of pixel values not present in the original images. This information revealed important brain regions missed by other mapping methods. The saliency map in the previous studies highlighted the brain regions such as the hippocampus, brainstem, and amygdala7,8,27. When highlighting the important points with higher integrated gradients in our study, they were centered on the fornix and the lower part of the thalamus. This indicates that the aging process affects the brain mainly through the atrophy in the inner area connected to memory and learning ability28,29,30,31. In addition, genetic variants associated with volumes of these regions and delta age were highly similar, supporting the result.
Investigating gene-level rare variant associations in the WES data, we identified no significant genes associated with the delta age. However, the NDN gene, one of the top five genes with the smallest p value, is known to play an important role in neural differentiation and survival of postmitotic neurons32,33. In addition, this gene is associated with the PWS (Prader–Willi syndrome), which is known to have a significantly higher delta age in the PWS34.
In the single-variant test of array-genotyped and imputed variants, we replicated strong association signals in chromosome 17. The significant variants in other chromosomes were in STX6, MR1, KLF3-AS1, WNT16, INPP5A, and NKX6-2. STX6 and KLF3-AS1 relate to carcinogenesis35,36. MR1 and WNT16 participate in immune response via antigen presentation to T cells and lymphocyte proliferation37,38. Mutations in INPP5A and NKX6-2 were shown to cause neurologic problems39,40.
Finally, we performed causality analysis on 61 blood and 249 metabolomic biomarkers. Several studies have been conducted on the analysis of the relationship between delta age and lifestyle, disease, blood biomarkers, etc.8,16, but there are few cases where the causal relationship with delta age has been analyzed. Also, as far as we know, there is no case where the relationship between metabolomics and delta age has been analyzed. Our investigation of the causal effects of 310 blood and metabolomic biomarkers on delta age showed the potential causal roles of immune responses to delta age. Especially, biomarkers related to white blood cells had a significant causal effect. This claim is supported by existing literature41,42.
This study, however, is subject to several limitations. First, the analysis was done in the European ancestry group only. The aging process and genetic variants associated with aging can differ according to ancestry. Second, replication of the results in an independent dataset was not conducted due to a lack of datasets with genetics data, extensive biomarkers, and brain MRI. Future research should focus on addressing the limits and making results more generalizable.
In conclusion, our multimodal data analysis shows many aspects of brain aging, including brain regions most affected by the aging, associated genes, and causal biomarkers. As more biobanks with multimodal data are collected, more diverse aspects of brain aging can be revealed. Biobanks of other ancestry groups can identify novel biomarkers associated with brain aging not identified in this study. In addition, potential mediating factors between immune responses and brain aging can be revealed. This would allow a deeper understanding of brain aging mechanisms that develop proper prevention and treatment.
Methods
Data preprocessing
T1-weighted structural MRI images of 34,129 white British samples in the UK Biobank were used for the analysis to minimize the effect of ancestry (average age of 60.964)43. All of the images downloaded from the UK Biobank had been normalized into MNI152 space (Montreal Neurosciences Institute) to render the comparison of each voxel possible44. The samples were selected if they had proper images and if they had no relation to any other individuals in the dataset. Each image was resized from 182 × 216 × 182 to 128 × 128 × 128 to reduce the computation cost. First, a part of z-axis voxels (from 26 to 153 out of 182 points) was selected to include various brain regions in the prediction task. The voxels in the upper outermost surface of the brain were excluded since they were considered negligible in the prediction and redundant due to the inclusion of other parts of the cerebral cortex. Second, each 2-dimensional 182 × 216 image (x, y-axis) in z-axis points was resized to a 128 × 128 image with the nearest neighbors scaling algorithm. The preprocessing of nifti format MRI images was performed with oro.nifti and OpenImageR packages in R45,46.
For the genetics data, we downloaded bgen files of 93 million array-genotyped and imputed variants dataset. And we used plink files of 26 million whole-exome sequencing (WES) variants dataset from the UK Biobank. The 450 K WES data were used in our analysis, and the analysis was performed on the DNA nexus.
The 249 metabolomic phenotypes and 61 biomarkers from blood assays and blood count were used in the Mendelian randomization (MR) analysis. The metabolomic phenotypes and biomarkers were collected separately from MRI imaging, from 2006 to 2010; this would enable the investigation of the biomarkers’ longitudinal and cumulative effect on the brain. The missing values in the selected biomarkers were imputed with multiple imputation by chained equations (MICE) to fit the missing values to the overall multivariate distribution47. After filling the missing values, 8464 individuals with the brain MRI had corresponding values of metabolomic phenotypes, and all 34,129 individuals had values of blood-related phenotypes. The 310 selected biomarkers were divided into 17 groups, including amino acids, cholesterol, and glycolysis-related metabolites.
3D CNN prediction model and integrated gradients
3D CNN model was used to predict the age of the individuals. Supplementary Fig. 6 is the overall network structure of the prediction model. The neural network model takes the resized images (128 × 128 × 128) as input and has less than three million parameters to train. The spatial dropout layers were added in the first two blocks to prevent the model from overfitting. The kernel size is 3 × 3 × 3 in convolution layers and 2 × 2 × 2 in pooling layers. The model conducts max-pooling until the size of an image in each feature becomes 2 × 2 × 2. The number of features increases as the input image size reduces. Adam optimizer with a learning rate 0.001 and He uniform initializer were used since the activation function is the rectified linear unit (ReLU).
Healthy white British individuals (25,656) were selected to train the prediction model. Individuals with diseases (all types of cancers, diabetes, neoplasm, dementia, and mental disorders) were excluded from the training process. The dataset with healthy individuals was divided into four sets (CV1, CV2, CV3, and CV4) for four-fold cross-validation so that every sample is included in the test set at least once and has a predicted age value. When CV1 is the test dataset, the other three sets become the training dataset. For each training set, three separate models (the same structure in Supplementary Fig. 6 with different initial weights and dropouts) were trained for more robust prediction. They constitute a single model set. After training the models, the test images were given to the models as input. The average of the predictions from the three models becomes the final predicted age of the test images, hence a total of 12 models to train (three models for each of the four cross-validation batches). Prediction of the age of individuals with diseases was made with the average of the predicted age from the four trained model sets. The delta age value of each sample was calculated by subtracting the individual’s chronological age from the predicted age. The age-related bias in the delta age value was adjusted through linear regression on the chronological age. The adjusted delta age values were used in the later association tests.
To identify which regions in the brain contributes significantly to age prediction, the Integrated Gradients (IG) method was used. The IG method is an explainable AI method for neural networks that uses multiple images between blank and original images13. In this study, the number of images generated for each sample was 101 in reference to the recommended step size in the original paper.
Genome-wide association test with single variants and gene regions
We used SAIGE for array-genotyped and imputed variants and SAIGE-GENE + for the WES variants. Variants with minor allele frequency less than 0.0001 were excluded in the SAIGE analysis. SAIGE uses a mixed effect model to account for the relatedness among the individuals. SAIGE-GENE + is a gene-based rare variant association test and it performs BURDEN, SKAT, and SKAT-O tests15. Since the number of tests decreases in the gene-based test, multiple testing correction is less stringent.
34,129 (= N) samples were used in SAIGE analysis. The delta age values of the individuals were inverse-normal transformed. Covariates were sex, age, ten principal component scores, and four dummy variables which indicate different cross-validation test sets plus samples with diseases. N \(\times\) N genomic relation matrix (GRM) was calculated 784,256 markers in called autosomal genotypes. Leave-one-chromosome-out (LOCO) option was applied when estimating the GRM.
18,308 regions were identified with annotation on the WES genotype by the ANNOVAR software (version 2020Jun07)48. The size of the samples n in the gene-based test was 30,812 (white British individuals with proper MRI images included in the UK Biobank 450 k whole-exome sequencing data). The SAIGE-GENE + analysis was done with 3 MAF cutoffs (MAF = 0.01, 0.001, 0.0001) and 3 functional annotation groups (LOF, LOF + Missense, LOF + Missense + Synonymous). In addition to the same covariates in the SAIGE test, the batch indicator variable was included to adjust for possible batch effect in the WES dataset.
Genetic correlation among delta age and the average volume of two brain regions (the fornix and the lower part of the thalamus) was derived using LD score regression with western European LD scores49.
We also calculated the genetic correlation between delta age and age-related phenotypes and epigenetic delta ages using LD score regression. Summary statistics used for calculating genetic correlations were downloaded from (1) International Genomics of Alzheimer’s Project (IGAP) for Alzheimer’s disease, (2) Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) for schizophrenia, (3) McCartney et al., for DNA methylation biomarkers, and (4) Pan-UK Biobank for the rest of the phenotypes26,50,51.
Linear and nonlinear mendelian randomization
We used variations of Mendelian randomization methods to identify the causal effect of biomarkers (explanatory variable) on delta age (outcome). The overall workflow of the Mendelian randomization in this study is in Supplementary Fig. 7. Instrument genetic markers for each of the 310 biomarkers were selected as follows. First, we chose variants significantly associated with the explanatory variable (p value under 5e−8) among the called autosomal genetic markers. The markers with minor allele frequencies less than 0.01 were pruned because the estimation of the effect size from rare variants is unstable. The GWAS summary statistics for the metabolomics measured by the Nightingale Health are from open datasets in MRC Integrative Epidemiology Unit at the University of Bristol (IEU)52. The GWAS results of blood phenotypes are from Pan-UK Biobank GWAS summary statistics by the Broad Institute (available at https://pan.ukbb.broadinstitute.org). We used effect size, standard error, and p value from the samples with European ancestry. Second, the linkage disequilibrium (LD) pruning process was conducted with PLINK software (version 1.90) with a window size of 50 base pairs to ensure that the selected instruments were independent of each other53. Pairs of variants with a correlation coefficient larger than 0.01 were LD pruned. Lastly, since the markers should not directly affect the outcome, we excluded variants that were also significant for outcome through Bonferroni correction (0.05/number of markers left). The effect size and standard error of the remaining markers were used in the MR analysis.
Linear and nonlinear explanatory variable-outcome relationship
We first calculated the causal impacts of the biomarkers (explanatory variable) on delta age (outcome) with three linear Mendelian randomization methods: MR-Egger regression, weighted median method (WM), and inverse variance weighting method (IVW)54,55,56. The MendelianRandomization package in R was used to carry out the MR-Egger regression57. The estimates and p values from the default version of MR-Egger regression were selected. The same was done for the estimates of the WM and IVW method. The estimates in all methods were assumed to follow the normal distribution. The Benjamini–Hochberg procedure was applied to control the false discovery rates at 0.0558.
We also carried out an association analysis between delta age and blood-chemistry and metabolomics biomarkers. The following linear regression model of a single biomarker was used.
Nonlinear MR was conducted with the nlmr package in R59. The key rationale for using the nonlinear Mendelian randomization method is to find a nonlinear pattern of causal estimates from different ranges of explanatory variable, as the impact of explanatory variable on delta age can vary according to the ranges.
Instrument variable regression has some assumptions to be satisfied. First, the instruments should have significant correlation with the exposure. Second, the instrument should affect the outcome only through the exposure. These assumptions were tested by checking the p value of the Pearson’s correlation coefficient and conditional independence test. Among the instruments selected for each biomarker from the linear MR procedure, the ones with positive effect sizes were collected. We constructed each sample’s single allele score \(G\) with those markers in order to increase power and avoid weak instrument bias60. Each marker has genotype 0, 1, or 2 for each sample. When there are n samples and m genetic markers, the allele score of sample i is in (1).
Here \({\beta }_{j}\) is the effect size of the jth marker and \({g}_{ij}\) is the genotype of sample i in the jth marker. The allele scores and the explanatory variable values were tested for a significant positive relationship (p value of the Pearson’s correlation coefficient lower than 0.05 using the cor.test function in R). The exclusion restriction assumption is not perfectly testable, so we assumed that there are only confounders affecting both the explanatory variable and the outcome61. In that case, adjusting for the explanatory variable would render the instruments and outcome statistically independent except for the association induced by “collider bias”62. The collider bias is unlikely to happen due to the fact that the delta age was calculated from images taken after the level of the biomarker had been measured. This makes the direction of the effect from the explanatory variable to the outcome, not the other way around. Conditional independence test then can catch the violation of the exclusion restriction. The allele score shows a genetically determined level of the biomarker, and the delta age was calculated from images taken after the level of the biomarker had been measured. In order to test for nonlinear conditional independence, Randomized Conditional Independence Test (RCIT) was used63. We repeated the RCIT three times. The assumptions were considered to have been met if none of the three tests had a p value less than 0.05 with the null hypothesis of the conditional independence between Y and G given X. Only 12 out of 310 variables were found to satisfy all the assumptions for instrument variable regression.
Then, we conducted the piecewise MR analysis for the 12 variables that met the assumptions59. The samples were divided into ten groups according to deciles by the IV-free explanatory variable. Assumptions of the IV-explanatory variable relationship in the piecewise MR are the homogeneity and the linearity across all samples. These assumptions were tested by the heterogeneity test using Q statistics. If the null hypothesis of homogeneity in the estimates between the groups was not rejected in the heterogeneity test, the trend test was conducted on the biomarker. The trend test evaluated whether the local average causal effect in each group is explained by the average value of the explanatory variable in the corresponding group. Kernel IV regression with radial basis function kernel was performed with the 12 passed phenotypes to check if the results were replicated. Due to the heavy computation cost, we sampled 2000 individuals for the Kernel IV regression. The test values were 1000 numbers with an equal distance between the minimum and the maximum of the explanatory variables.
Ethics statement and consent to participate
UK Biobank has approval from the North West Multi-centre Research Ethics Committee (REC reference: 11/NW/03820). This research was conducted according to the principles expressed in the Declaration of Helsinki. UK Biobank participants are volunteers who have provided written informed consent. Personally identifiable information was not used.
Data availability
The UK Biobank data are publicly available (https://www.ukbiobank.ac.uk/). Our study made use of imaging-derived phenotypes generated by an image-processing pipeline developed and run on behalf of UK Biobank64. The summary statistics used in the Mendelian randomization analysis are available for public download at the IEU OpenGWAS Project (https://gwas.mrcieu.ac.uk) and the Pan-UK Biobank (https://pan.ukbb.broadinstitute.org). The summary statistics used in the genetic correlation analysis are available for public download at the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (https://www.niagads.org/datasets/ng00075) and the Psychiatric Genomics Consortium website (https://pgc.unc.edu/). The summary statistics for epigenetic biomarkers are publicly available (https://datashare.ed.ac.uk/handle/10283/3645).
Code availability
The code used in the analyses is available at our Github page. https://github.com/Flumenlucidum/Brain-Aging.
References
Fox, N. C. & Schott, J. M. Imaging cerebral atrophy: Normal ageing to Alzheimer’s disease. Lancet 363, 392–394 (2004).
Nagano-Saito, A. et al. Cerebral atrophy and its relation to cognitive impairment in Parkinson disease. Neurology 64, 224–229 (2005).
Franke, K., Ziegler, G., Klöppel, S., Gaser, C. & Initiative, A. S. D. N. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: Exploring the influence of various parameters. Neuroimage 50, 883–892 (2010).
Cole, J. H. et al. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. Neuroimage 163, 115–124 (2017).
Jónsson, B. A. et al. Brain age prediction using deep learning uncovers associated sequence variants. Nat. Commun. 10, 1–10 (2019).
Xifra-Porxas, A., Ghosh, A., Mitsis, G. D. & Boudrias, M. H. Estimating brain age from structural MRI and MEG data: Insights from dimensionality reduction techniques. Neuroimage 231, 117822 (2021).
Kolbeinsson, A. et al. Accelerated MRI-predicted brain ageing and its associations with cardiometabolic and brain disorders. Sci. Rep. 10, 1–9 (2020).
Dinsdale, N. K. et al. Learning patterns of the ageing brain in MRI using deep convolutional networks. Neuroimage 224, 117401 (2021).
Peng, H., Gong, W., Beckmann, C. F., Vedaldi, A. & Smith, S. M. Accurate brain age prediction with lightweight deep neural networks. Med. Image Anal. 68, 101871 (2021).
Ning, K. et al. Improving brain age estimates with deep learning leads to identification of novel genetic factors associated with brain aging. Neurobiol. Aging 105, 199–204 (2021).
Le Goallec, A., Diai, S., Collin, S., Vincent, T. & Patel, C. J. Using deep learning to predict brain age from brain magnetic resonance images and cognitive tests reveals that anatomical and functional brain aging are phenotypically and genetically distinct. medRxiv 20, 20 (2021).
Lam, P. K. et al. In 16th International Symposium on Medical Information Processing and Analysis. 11–20 (SPIE).
Sundararajan, M., Taly, A. & Yan, Q. In International Conference on Machine Learning. 3319–3328 (PMLR).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum, J. D. & Lin, X. Sequence kernel association tests for the combined effect of rare and common variants. Am. J. Human Genet. 92, 841–853 (2013).
Smith, S. M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T. E. & Miller, K. L. Estimation of brain age delta from brain imaging. Neuroimage 200, 528–539 (2019).
Tzourio-Mazoyer, N. et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289 (2002).
Catani, M. & De Schotten, M. T. A diffusion tensor imaging tractography atlas for virtual in vivo dissections. Cortex 44, 1105–1132 (2008).
Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: Study designs and statistical tests. Am. J. Human Genet. 95, 5–23 (2014).
Cole, J. et al. Brain age predicts mortality. Mol. Psychiatry 23, 1385–1392 (2018).
Royle, N. A. et al. Estimated maximal and current brain volume predict cognitive ability in old age. Neurobiol. Aging 34, 2726–2733 (2013).
Lövdén, M. et al. Does variability in cognitive performance correlate with frontal brain volume?. Neuroimage 64, 209–215 (2013).
Hou, Y. et al. Ageing as a risk factor for neurodegenerative disease. Nat. Rev. Neurol. 15, 565–581 (2019).
Simm, A. et al. Potential biomarkers of ageing. Biol. Chem. 389, 257–265 (2008).
Bell, J. T. et al. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. PLoS Genet. 8, e1002629 (2012).
McCartney, D. L. et al. Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging. Genome Biol. 22, 194 (2021).
Bintsi, K.-M., Baltatzis, V., Hammers, A. & Rueckert, D. Interpretability of Machine Intelligence in Medical Image Computing, and Topological Data Analysis and Its Applications for Medical Data 65–74 (Springer, 2021).
Deeb, W. et al. Fornix-region deep brain stimulation-induced memory flashbacks in Alzheimer’s disease. N. Engl. J. Med. 381, 783–785 (2019).
Foster, C. M., Kennedy, K. M., Hoagey, D. A. & Rodrigue, K. M. The role of hippocampal subfield volume and fornix microstructure in episodic memory across the lifespan. Hippocampus 29, 1206–1223 (2019).
Cherubini, A., Péran, P., Caltagirone, C., Sabatini, U. & Spalletta, G. Aging of subcortical nuclei: Microstructural, mineralization and atrophy modifications measured in vivo using MRI. Neuroimage 48, 29–36 (2009).
Wolff, M. & Vann, S. D. The cognitive thalamus as a gateway to mental representations. J. Neurosci. 39, 3–14 (2019).
Yoshikawa, K. Cell cycle regulators in neural stem cells and postmitotic neurons. Neurosci. Res. 37, 1–14 (2000).
Kuwajima, T., Nishimura, I. & Yoshikawa, K. Necdin promotes GABAergic neuron differentiation in cooperation with Dlx homeodomain proteins. J. Neurosci. 26, 5383–5392 (2006).
Azor, A. M. et al. Increased brain age in adults with Prader-Willi syndrome. Neuroimage Clin. 21, 101664 (2019).
Du, J., Liu, X., Wu, Y., Zhu, J. & Tang, Y. Essential role of STX6 in esophageal squamous cell carcinoma growth and migration. Biochem. Biophys. Res. Commun. 472, 60–67 (2016).
Liu, J.-Q. et al. lncRNA KLF3-AS1 suppresses cell migration and invasion in ESCC by impairing miR-185-5p-targeted KLF3 inhibition. Mol. Ther. Nucleic Acids 20, 231–241 (2020).
De Libero, G., Chancellor, A. & Mori, L. Antigen specificities and functional properties of MR1-restricted T cells. Mol. Immunol. 130, 148–153 (2021).
Mazieres, J. et al. Inhibition of Wnt16 in human acute lymphoblastoid leukemia cells containing the t (1; 19) translocation induces apoptosis. Oncogene 24, 5396–5400 (2005).
Liu, Q. et al. Cerebellum-enriched protein INPP5A contributes to selective neuropathology in mouse model of spinocerebellar ataxias type 17. Nat. Commun. 11, 1–13 (2020).
Chelban, V. et al. Genetic and phenotypic characterization of NKX6-2-related spastic ataxia and hypomyelination. Eur. J. Neurol. 27, 334–342 (2020).
Ising, C. & Heneka, M. T. Functional and structural damage of neurons by innate immune mechanisms during neurodegeneration. Cell Death Dis. 9, 1–8 (2018).
Corlier, F. et al. Systemic inflammation as a predictor of brain aging: Contributions of physical activity, metabolic risk, and genetic risk. Neuroimage 172, 118–129 (2018).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Grabner, G. et al. Symmetric atlasing and model based segmentation: An application to the hippocampus in older adults. Med. Image Comput. Comput. Assist. Interv. 9, 58–66 (2006).
Whitcher, B., Schmid, V. J. & Thorton, A. Working with the DICOM and NIfTI Data Standards in R. J. Stat. Softw. 44, 1–29 (2011).
Mouselimis, L. OpenImageR: An Image Processing Toolkit. R package version 1 (2017).
White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164–e164 (2010).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Abeta, tau, immunity and lipid processing. Nat. Genet. 51, 414–430 (2019).
Schizophrenia Working Group of the Psychiatric Genomics C. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Elsworth, B. L. et al. The MRC IEU OpenGWAS data infrastructure. bioRxiv (2020).
Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Human Genet. 81, 559–575 (2007).
Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 37, 658–665 (2013).
Bowden, J., DaveySmith, G. & Burgess, S. Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
Bowden, J., DaveySmith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).
Yavorska, O. O. & Burgess, S. MendelianRandomization: An R package for performing Mendelian randomization analyses using summarized data. Int. J. Epidemiol. 46, 1734–1739 (2017).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995).
Staley, J. R. & Burgess, S. Semiparametric methods for estimation of a nonlinear exposure-outcome relationship using instrumental variables with application to Mendelian randomization. Genet. Epidemiol. 41, 341–352 (2017).
Burgess, S. & Thompson, S. G. Use of allele scores as instrumental variables for Mendelian randomization. Int. J. Epidemiol. 42, 1134–1144 (2013).
Yazdani, A. et al. From classical Mendelian randomization to causal networks for systematic integration of multi-omics. Front. Genet. 13, 990486 (2022).
Glymour, M. M., Tchetgen Tchetgen, E. J. & Robins, J. M. Credible Mendelian randomization studies: Approaches for evaluating the instrumental variable assumptions. Am. J. Epidemiol. 175, 332–339 (2012).
Strobl, E. V., Zhang, K. & Visweswaran, S. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. J. Causal Inference 7, 25 (2019).
Alfaro-Almagro, F. et al. Image processing and quality control for the first 10,000 brain imaging datasets from Uk biobank. Neuroimage 166, 400–424 (2018).
Acknowledgements
This research was supported by Big Brain Project through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. 2021M3E5D2A0102249311), and the Brain Pool Plus (BP+, Brain Pool+) Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2020H1D3A2A03100666). UK Biobank data were accessed under the accession number UKB: 45227.
Author information
Authors and Affiliations
Contributions
These authors contributed equally: J.K. and J.L. J.K. and S.L. designed the experiments. J.K., J.L., and S.L. constructed and developed the age prediction model. J.K. and S.L. wrote the manuscript. K.N performed genetic correlation analysis. All authors reviewed and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kim, J., Lee, J., Nam, K. et al. Investigation of genetic variants and causal biomarkers associated with brain aging. Sci Rep 13, 1526 (2023). https://doi.org/10.1038/s41598-023-27903-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-023-27903-x
- Springer Nature Limited