Investigation of genetic variants and causal biomarkers associated with brain aging

Kim, Jangho; Lee, Junhyeong; Nam, Kisung; Lee, Seunggeun

doi:10.1038/s41598-023-27903-x

Article
Open access
Published: 27 January 2023

Volume 13, article number 1526, (2023)

Scientific Reports

Jangho Kim¹^na1,
Junhyeong Lee¹^na1,
Kisung Nam¹ &
…
Seunggeun Lee¹

Abstract

Delta age is a biomarker of brain aging that captures differences between the chronological age and the predicted biological brain age. Using multimodal data of brain MRI, genomics, and blood-based biomarkers and metabolomics in UK Biobank, this study investigates an explainable and causal basis of high delta age. A visual saliency map of brain regions showed that lower volumes in the fornix and the lower part of the thalamus are key predictors of high delta age. Genome-wide association analysis of the delta age using the SNP array data identified associated variants in gene regions such as KLF3-AS1 and STX6. GWAS was also performed on the volumes in the fornix and the lower part of the thalamus, showing a high genetic correlation with delta age, indicating that they share a genetic basis. Mendelian randomization (MR) for all metabolomic biomarkers and blood-related phenotypes showed that immune-related phenotypes have a causal impact on increasing delta age. Our analysis revealed regions in the brain that are susceptible to the aging process and provided evidence of the causal and genetic connections between immune responses and brain aging.

Introduction

Aging is a primary risk factor for a myriad of health problems. Since aging proceeds at different rates for each individual, various methods to measure the biological age have been developed for a more accurate diagnosis of health status. Among the concerns related to aging, cerebral atrophy, which leads to cognitive decline, is a substantial risk to the individual well-being, constituting a major public health burden. Brain volume loss is also associated with neurodegenerative diseases such as Alzheimer’s disease and Parkinson’s disease^1,2.

The aging process in different brain regions can be detected through structural and functional magnetic resonance imaging (MRI). As large-scale datasets that contain neuroimaging data, such as UK Biobank, become available, there have been efforts to accurately predict an individual’s chronological age from neuroimaging data. Franke et al.³ used principal component analysis and a relevance vector machine to predict age. Studies since then have primarily used neural network models for prediction and data-driven feature extraction^4,5,6,7,8,9,10,11,12. Convolutional neural network (CNN) models in particular have achieved high accuracy: the mean absolute error of the prediction in most of the literature using CNN models is between 2.14 and 3.4 years.

The difference between the predicted age and the actual chronological age, called delta age, has been used as an aging biomarker^3,4. After estimating the delta age, phenome-wide and genome-wide association tests have been conducted to identify significantly associated genetic and clinical factors. Recent studies have shown that bone mineral density, blood pressure, and type 2 diabetes are associated with delta age^7,8. Genome-wide association analyses identified that KANSL1, MAPT-AS1, CRHR1, and NSF on chromosome 17, KLF3 (chromosome 4), RUNX2 (chromosome 6), and NKX6-2 (chromosome 10) were significantly associated with delta age^5,10,11. When combined with the cognitive test results, SNPs in the MED8, COLEC10, and PLIN4 genes were also significantly associated¹¹.

To extend our understanding of the genetic and molecular basis of brain aging, we analyzed multimodal UK Biobank data. Compared to the previous studies, our analysis includes whole-exome sequencing (WES) and metabolomics data, which enabled us to identify novel genetic and biomarker associations. In addition, we carried out large-scale Mendelian randomization studies of 310 blood and metabolomic phenotypes to identify causal biomarkers and used an explainable AI method for medical images to identify brain regions that drive high delta age.

Results

Overview of the analysis

Figure 1 provides an overview of our analysis. First, a 3D CNN model was trained for age prediction with the T1-weighted structural brain MRI of healthy white British samples in the UK Biobank, excluding individuals with diseases related to cancer, diabetes, dementia, and mental disorders. The training was conducted via cross-validation to use all available samples in the downstream analysis (see “Methods”). The Integrated Gradients (IG) method was then used to identify an accurate attribution of each voxel (volume + pixel) to the prediction¹³. Second, genome-wide association tests were conducted at different test levels to uncover novel loci associated with brain aging. We applied a single-variant test, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), to the array-genotyped and imputed markers, and a gene-based test, SAIGE-GENE+, to the WES datasets^14,15. Lastly, we examined linear and nonlinear causal relationships between the delta age and the phenotypes (metabolomics and blood) with Mendelian randomization methods. The number of samples used in each step is in Table 1.

Table 1 The number of samples in each stage of the analysis.

Age prediction accuracy and saliency map

Figure 2 and Supplementary Table 1 show the prediction results of the 3D CNN model. We used a cross-validation scheme so that all available samples could be used¹⁰. Supplementary Table 1 shows the mean absolute error for the samples with diseases, the samples without diseases, and each of the four cross-validation groups (see “Methods”). The mean absolute error in the healthy individuals was 2.6406 years in the test set and 0.8989 years in the training set, which is similar to the accuracy reported in existing studies on brain age estimation^7,8,9,10. The mean absolute error in the samples with diseases was 2.651 years. Figure 2a shows the strong positive correlation between the chronological age (x-axis) and the predicted age (y-axis). The age-related bias was corrected by linear regression on the chronological age (Fig. 2b)¹⁶. Figure 2c is a scatter plot between the chronological age and the adjusted delta age.

Figure 3 shows the saliency maps of the age prediction model with integrated gradients. Absolute values of the integrated gradients were averaged for 100 samples with the youngest predicted age. Figure 3a shows the voxels with averaged integrated gradients greater than five. According to the Automated Anatomical Labeling atlas (AAL) and the Natbrainlab atlas, when highlighting the regions with integrated gradients greater than ten, the regions were the fornix and the lower part of the thalamus (Fig. 3b)^17,18. Similar results were shown when 100 samples were chosen randomly or by descending order of the delta age values (data not shown).

Genetic variants associated with delta age

We performed genetic association analyses for the predicted delta age. We carried out single-variant tests for 38 million array-genotyped and imputed genetic variants using SAIGE. Since the single-variant test for rare variants in WES has low power, we used a gene-based test method for the WES dataset¹⁹. We used SAIGE-GENE+ to test for rare variant associations (18,308 genes).

Figure 4 is the Manhattan plot of single-variant analysis results from array-genotyped and imputed data. Table 2 lists the significant variants (p value under 5e−08). The single-variant test results showed that loci in chromosomes 1, 4, 6, 10, 11, and 17 were significantly associated with the delta age. The nearest genes were STX6, MR1, KLF3-AS1, WNT16, INPP5A, NKX6-2, and several genes in chromosome 17, including KANSL1, MAPT-AS1, and NSF. Genetic heritability calculated by the variance components in the SAIGE model was 21.5%.

Table 2 Significant loci associated with delta age (p value < 5E-08).

The gene-based rare variant test on the WES dataset identified no significant genes. The significance threshold was the Bonferroni-corrected level of 0.05 divided by the number of genes tested (0.05/18,308). SEC62, PPM1F, ABCC2, ADAM15, and NDN were the five genes with the smallest p values (Supplementary Table 3).
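
The gene-level threshold is a plain Bonferroni division. As a small worked example (the variable names are illustrative and not taken from the authors' code):

```python
# Bonferroni-corrected significance threshold for the gene-based test.
# The error rate (0.05) and the number of tested genes (18,308) come from
# the text; the variable names are illustrative.
alpha = 0.05
n_genes = 18_308

bonferroni_threshold = alpha / n_genes
print(f"{bonferroni_threshold:.2e}")  # prints 2.73e-06
```

A gene would therefore need a p value below roughly 2.7e−06 to be called significant.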

To check whether the delta age prediction was truly driven by voxel values in the fornix and the lower part of the thalamus, we carried out the same GWAS procedure with the average voxel value of the two regions. The SAIGE results showed that the significant loci (p value under 5e−08) associated with the two regions were also concentrated on chromosome 17 (Supplementary Table 2; Supplementary Fig. 1a,b). SLC39A8 and C16orf95 genes were commonly shown to be associated with the two regions. We also calculated the genetic correlation among delta age and average voxel values of the two regions and observed high genetic correlation values (Supplementary Table 4). Our analysis results clearly demonstrated the shared genetic basis of delta age and the two regions.

Additional validation on the delta age of 1610 healthy non-British white samples was conducted. In single-SNP GWAS, two SNPs in chromosome 4 with no specific gene region and 10 SNPs in chromosome 17 in genes such as KANSL1 and LRRC37A2 had p values less than 0.05. With a slightly more lenient p value cutoff of 0.1, one SNP in chromosome 4 and 40 SNPs in chromosome 17 centered on PLEKHM1, KANSL1, and LINC02210-CRHR1 were additionally significant. In the gene-based test, none of the genes had p values < 0.1 in non-British white samples, likely because of the small sample size (905 samples).

Causal biomarkers of delta age

Before the causal analysis, we carried out an association analysis between delta age and each of the 310 phenotypes (61 blood and 249 metabolomic biomarkers) with linear regression models. Among the 310 phenotypes, HbA1C (p value = 7.31E−28) and glucose (p value = 1.11E−26) were most significantly associated with the delta age, both with a positive direction of association, and remained significant after the Benjamini–Hochberg procedure (FDR = 0.05) (Supplementary Fig. 4; Supplementary Table 7). This may indicate an association between diabetes and brain aging.
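
For readers unfamiliar with the Benjamini–Hochberg step-up rule, the following is a minimal sketch. Only the first two p values are taken from the text; the remaining three are made-up placeholders, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per hypothesis: True where it is rejected."""
    m = len(p_values)
    # Indices sorted by ascending p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# First two values: HbA1C and glucose from the text; the rest are placeholders.
toy_p_values = [7.31e-28, 1.11e-26, 0.02, 0.20, 0.045]
print(benjamini_hochberg(toy_p_values))
# prints [True, True, True, False, False]
```

Note that the rule rejects everything up to the largest qualifying rank, so a p value can be rejected even if it misses its own rank-specific cutoff.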

For causal analysis, 59 phenotypes had p values less than 0.05 in causal estimates from at least one of the three linear MR methods (MR-Egger regression, inverse variance weighting, and weighted median). Table 3 lists the top five phenotypes by MR-Egger regression (eosinophil count, eosinophil percentage, neutrophil count, total protein, and white blood cell count), and Supplementary Table 5 shows the other 54 phenotypes. The top five phenotypes were immune-related biomarkers and had a positive relationship with the delta age. Among them, eosinophil count (p value = 5.16E−06) was statistically significant after the Benjamini–Hochberg procedure (FDR = 0.05). The reverse causal effect of delta age on eosinophil count was not significant (p value = 0.68), so there was no evidence of simultaneity.

Table 3 Top 5 blood biomarkers with the smallest p value.

Figure 5 is a PheWAS plot of the causal estimates from the MR-Egger regression. Overall, phenotypes related to white blood cells showed more significant causal relationships with delta age than other phenotypes. Similar results were replicated by the weighted median method (Supplementary Fig. 3; Supplementary Table 6).

Nonlinear MR analysis using piecewise MR and kernel IV showed similar results. Total cholines (p value = 0.03413), total lipids in small LDL (p value = 0.03599), and the percentage of cholesteryl esters to total lipids in very large HDL (p value = 0.01002) passed the tests of the assumptions for instrumental variable regression and returned p values less than 0.05 in the trend test, indicating nonlinearity in the causal relationship between these biomarkers and delta age. Total cholines showed a positive causal relationship with delta age at levels of 2.5 mmol/l or higher. The other two biomarkers showed an inverted U-shaped relationship (Supplementary Fig. 5). When applying FDR = 0.25, the percentage of cholesteryl esters to total lipids in very large HDL was the only variable that showed a significant nonlinear causal relationship with delta age.

Discussion

In this paper, we have analyzed the biological risk factors of brain aging with multimodal data consisting of brain MRI, genome, blood, and metabolomics. The CNN model that predicts age from brain MRI had high accuracy with a mean absolute error of 2.64 years. Visual information of the regional importance in the brain was extracted from the neural network model. Genetic variants and biomarkers that have significant links to brain aging were identified using GWAS methods and Mendelian randomization.

Several studies have shown that delta age is associated with lower bone mineral density (BMD), higher blood pressure, poorer lung function, cognitive decline, diabetes, multiple sclerosis, and neurodegenerative disease^6,7,16,20. BMD, lung function, and cognitive function tend to decrease with aging, and cognitive function is known to be highly related to brain volume^21,22. In addition, diabetes and neurodegenerative disease are well-known aging-related diseases²³. We also examined the association between delta age and various aging-related phenotypes^24,25, and found significant relationships with lower bone mineral density, cognitive decline, and poorer lung function (Supplementary Table 8). For age-related neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and schizophrenia, the number of diseased individuals was low (n < 10), so we investigated the association with epilepsy and schizophrenia only. In epilepsy, there was a difference in delta age between case and control groups (p value = 3.0E−02). We also found an association between delta age and type 2 diabetes (p value = 3.32E−31), which has been confirmed in several previous studies. In the genetic correlation analysis, no phenotype was significant after Bonferroni correction (0.05/15), but the overall results were consistent with the direction of aging (Supplementary Table 9). We also carried out a genetic correlation analysis between our brain MRI-based delta age and delta ages calculated from epigenetic age predictors (Supplementary Table 10)²⁶. In this case, however, there was no significant relationship between them.

Many models for predicting brain age have been built^3,4,5,6, but only some studies have used large-scale MRI data. In this study, we used N = 34,129 brain MRI to train the CNN model, which showed good performance in previous studies⁷. Next, we used integrated gradients to make accurate saliency maps by incorporating information from a wider range of pixel values not present in the original images. This information revealed important brain regions missed by other mapping methods. The saliency map in the previous studies highlighted the brain regions such as the hippocampus, brainstem, and amygdala^7,8,27. When highlighting the important points with higher integrated gradients in our study, they were centered on the fornix and the lower part of the thalamus. This indicates that the aging process affects the brain mainly through the atrophy in the inner area connected to memory and learning ability^28,29,30,31. In addition, genetic variants associated with volumes of these regions and delta age were highly similar, supporting the result.

Investigating gene-level rare variant associations in the WES data, we identified no significant genes associated with the delta age. However, the NDN gene, one of the five genes with the smallest p values, is known to play an important role in neural differentiation and survival of postmitotic neurons^32,33. In addition, this gene is associated with Prader–Willi syndrome (PWS), and individuals with PWS are known to have a significantly higher delta age³⁴.

In the single-variant test of array-genotyped and imputed variants, we replicated strong association signals in chromosome 17. The significant variants in other chromosomes were in STX6, MR1, KLF3-AS1, WNT16, INPP5A, and NKX6-2. STX6 and KLF3-AS1 relate to carcinogenesis^35,36. MR1 and WNT16 participate in immune response via antigen presentation to T cells and lymphocyte proliferation^37,38. Mutations in INPP5A and NKX6-2 were shown to cause neurologic problems^39,40.

Finally, we performed causality analysis on 61 blood and 249 metabolomic biomarkers. Several studies have analyzed the relationship between delta age and lifestyle, disease, and blood biomarkers^8,16, but few have analyzed causal relationships with delta age. Also, as far as we know, no previous study has analyzed the relationship between metabolomics and delta age. Our investigation of the causal effects of 310 blood and metabolomic biomarkers on delta age showed potential causal roles of immune responses in delta age. In particular, biomarkers related to white blood cells had a significant causal effect. This finding is supported by existing literature^41,42.

This study, however, is subject to several limitations. First, the analysis was done in the European ancestry group only. The aging process and genetic variants associated with aging can differ according to ancestry. Second, replication of the results in an independent dataset was not conducted due to a lack of datasets with genetics data, extensive biomarkers, and brain MRI. Future research should focus on addressing these limitations and making the results more generalizable.

In conclusion, our multimodal data analysis shows many aspects of brain aging, including the brain regions most affected by aging, associated genes, and causal biomarkers. As more biobanks with multimodal data are collected, more diverse aspects of brain aging can be revealed. Biobanks of other ancestry groups may help identify novel biomarkers associated with brain aging that were not identified in this study. In addition, potential mediating factors between immune responses and brain aging can be revealed. This would allow a deeper understanding of brain aging mechanisms and support the development of proper prevention and treatment.

Methods

Data preprocessing

T1-weighted structural MRI images of 34,129 white British samples (average age of 60.964) in the UK Biobank were used for the analysis to minimize the effect of ancestry⁴³. All of the images downloaded from the UK Biobank had been normalized into MNI152 space (Montreal Neurological Institute) so that each voxel could be compared across individuals⁴⁴. Samples were selected if they had proper images and were unrelated to any other individual in the dataset. Each image was resized from 182 × 216 × 182 to 128 × 128 × 128 to reduce the computation cost. First, a part of the z-axis voxels (from 26 to 153 out of 182 points) was selected to include various brain regions in the prediction task. The voxels in the upper outermost surface of the brain were excluded since they were considered negligible in the prediction and redundant due to the inclusion of other parts of the cerebral cortex. Second, each 2-dimensional 182 × 216 image (x, y-axis) at each z-axis point was resized to a 128 × 128 image with the nearest neighbors scaling algorithm. The preprocessing of NIfTI format MRI images was performed with the oro.nifti and OpenImageR packages in R^45,46.
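
The two resizing steps can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code: it assumes the slice range 26 to 153 is 1-based and inclusive (which gives 128 slices), and it uses one common nearest-neighbour convention (sampling the source voxel under each destination cell centre), which may differ in detail from OpenImageR. All function names are hypothetical.

```python
def nearest_indices(src_len, dst_len):
    """For each destination index, the source index under its cell centre."""
    scale = src_len / dst_len
    return [min(src_len - 1, int((i + 0.5) * scale)) for i in range(dst_len)]


def resize_slice(image, dst_rows=128, dst_cols=128):
    """Nearest-neighbour resize of one 2D slice stored as a list of rows."""
    rows = nearest_indices(len(image), dst_rows)
    cols = nearest_indices(len(image[0]), dst_cols)
    return [[image[r][c] for c in cols] for r in rows]


def preprocess(volume, z_first=26, z_last=153, dst_rows=128, dst_cols=128):
    """Keep z slices z_first..z_last (1-based, inclusive), then resize each.

    `volume[z][x][y]` is a nested list; 1-based indexing is an assumption
    based on the preprocessing having been done in R.
    """
    kept = volume[z_first - 1:z_last]
    return [resize_slice(s, dst_rows, dst_cols) for s in kept]


# Toy volume: 182 z slices of 4 x 6 voxels, each filled with its 1-based z index.
toy_volume = [[[z] * 6 for _ in range(4)] for z in range(1, 183)]
resized = preprocess(toy_volume, dst_rows=2, dst_cols=3)
print(len(resized), len(resized[0]), len(resized[0][0]))  # prints 128 2 3
```

The toy slices are kept tiny so the example runs instantly; with real data the defaults give the 128 × 128 × 128 input described above.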

For the genetics data, we downloaded bgen files of the array-genotyped and imputed dataset (93 million variants) and used plink files of the whole-exome sequencing (WES) dataset (26 million variants) from the UK Biobank. The 450 K WES data were used in our analysis, and the analysis was performed on the DNAnexus platform.

The 249 metabolomic phenotypes and 61 biomarkers from blood assays and blood count were used in the Mendelian randomization (MR) analysis. The metabolomic phenotypes and biomarkers were collected separately from MRI imaging, from 2006 to 2010; this would enable the investigation of the biomarkers’ longitudinal and cumulative effect on the brain. The missing values in the selected biomarkers were imputed with multiple imputation by chained equations (MICE) to fit the missing values to the overall multivariate distribution⁴⁷. After filling the missing values, 8464 individuals with the brain MRI had corresponding values of metabolomic phenotypes, and all 34,129 individuals had values of blood-related phenotypes. The 310 selected biomarkers were divided into 17 groups, including amino acids, cholesterol, and glycolysis-related metabolites.

3D CNN prediction model and integrated gradients

A 3D CNN model was used to predict the age of the individuals. Supplementary Fig. 6 shows the overall network structure of the prediction model. The neural network model takes the resized images (128 × 128 × 128) as input and has less than three million parameters to train. Spatial dropout layers were added in the first two blocks to prevent the model from overfitting. The kernel size is 3 × 3 × 3 in convolution layers and 2 × 2 × 2 in pooling layers. The model conducts max-pooling until the size of an image in each feature becomes 2 × 2 × 2. The number of features increases as the input image size reduces. The Adam optimizer with a learning rate of 0.001 was used, together with the He uniform initializer because the activation function is the rectified linear unit (ReLU).

Healthy white British individuals (25,656) were selected to train the prediction model. Individuals with diseases (all types of cancers, diabetes, neoplasm, dementia, and mental disorders) were excluded from the training process. The dataset with healthy individuals was divided into four sets (CV1, CV2, CV3, and CV4) for four-fold cross-validation so that every sample is included in the test set at least once and has a predicted age value. When CV1 is the test dataset, the other three sets become the training dataset. For each training set, three separate models (the same structure in Supplementary Fig. 6 with different initial weights and dropouts) were trained for more robust prediction. They constitute a single model set. After training the models, the test images were given to the models as input. The average of the predictions from the three models becomes the final predicted age of the test images, hence a total of 12 models to train (three models for each of the four cross-validation batches). Prediction of the age of individuals with diseases was made with the average of the predicted age from the four trained model sets. The delta age value of each sample was calculated by subtracting the individual’s chronological age from the predicted age. The age-related bias in the delta age value was adjusted through linear regression on the chronological age. The adjusted delta age values were used in the later association tests.
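
The ensemble averaging and the delta age adjustment can be sketched as below. This is a minimal illustration under one assumption: the "linear regression on the chronological age" is read here as taking the residuals of delta age regressed on age. The numbers are toy values and the function names are hypothetical.

```python
from statistics import mean


def ensemble_prediction(per_model_predictions):
    """Average the predictions of several models, sample by sample."""
    return [mean(values) for values in zip(*per_model_predictions)]


def adjusted_delta_age(predicted, chronological):
    """Delta age with its linear dependence on chronological age removed."""
    delta = [p - a for p, a in zip(predicted, chronological)]
    age_mean, delta_mean = mean(chronological), mean(delta)
    sxx = sum((a - age_mean) ** 2 for a in chronological)
    sxy = sum((a - age_mean) * (d - delta_mean)
              for a, d in zip(chronological, delta))
    slope = sxy / sxx
    intercept = delta_mean - slope * age_mean
    # Residuals of the regression delta ~ age (assumed form of the adjustment).
    return [d - (intercept + slope * a) for d, a in zip(delta, chronological)]


# Toy example: the three models of one model set scoring five test samples.
ages = [50.0, 55.0, 60.0, 65.0, 70.0]
model_outputs = [
    [53.0, 56.5, 60.0, 63.0, 66.5],
    [54.0, 57.0, 61.0, 64.0, 67.0],
    [52.0, 56.0, 59.0, 62.0, 66.0],
]
predicted_age = ensemble_prediction(model_outputs)
delta_adjusted = adjusted_delta_age(predicted_age, ages)
print([round(d, 3) for d in delta_adjusted])
```

In the toy data the raw delta age is positive for young samples and negative for old ones, which is the typical age-related bias; after the adjustment the values are uncorrelated with chronological age.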

To identify which regions in the brain contribute significantly to age prediction, the Integrated Gradients (IG) method was used. The IG method is an explainable AI method for neural networks that uses multiple images interpolated between a blank image and the original image¹³. In this study, the number of images generated for each sample was 101, in reference to the recommended step size in the original paper.
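
To make the idea concrete, the sketch below computes integrated gradients for a toy differentiable function rather than a CNN. It assumes a straight-line path from a blank baseline to the input, evaluated at 101 interpolated points, and approximates the path integral with the trapezoidal rule (the original IG paper describes a Riemann sum; the exact rule used by the authors is not stated). All names are hypothetical.

```python
def integrated_gradients(grad_fn, x, baseline, n_images=101):
    """Approximate integrated gradients along the straight path baseline -> x.

    `n_images` interpolated inputs are evaluated (101 here, as in the text);
    the path integral is approximated with the trapezoidal rule.
    """
    n_steps = n_images - 1
    avg_grad = [0.0] * len(x)
    previous = None
    for k in range(n_images):
        alpha = k / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        current = grad_fn(point)
        if previous is not None:
            for i in range(len(x)):
                avg_grad[i] += 0.5 * (previous[i] + current[i]) / n_steps
        previous = current
    # Attribution = (input - baseline) * average gradient along the path.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Toy "model": F(x) = sum_i w_i * x_i^2, with gradient 2 * w_i * x_i.
weights = [0.5, -1.0, 2.0]


def toy_model(x):
    return sum(w * xi ** 2 for w, xi in zip(weights, x))


def toy_gradient(x):
    return [2.0 * w * xi for w, xi in zip(weights, x)]


voxels = [1.0, 2.0, 3.0]
blank = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_gradient, voxels, blank)
print([round(a, 6) for a in attributions])
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline.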

Genome-wide association test with single variants and gene regions

We used SAIGE for array-genotyped and imputed variants and SAIGE-GENE+ for the WES variants. Variants with minor allele frequency less than 0.0001 were excluded in the SAIGE analysis. SAIGE uses a mixed effect model to account for the relatedness among the individuals. SAIGE-GENE+ is a gene-based rare variant association test, and it performs BURDEN, SKAT, and SKAT-O tests¹⁵. Since the number of tests decreases in the gene-based test, multiple testing correction is less stringent.

A total of N = 34,129 samples were used in the SAIGE analysis. The delta age values of the individuals were inverse-normal transformed. Covariates were sex, age, ten principal component scores, and four dummy variables which indicate different cross-validation test sets plus samples with diseases. An N $\times$ N genomic relationship matrix (GRM) was calculated from 784,256 markers in the called autosomal genotypes. The leave-one-chromosome-out (LOCO) option was applied when estimating the GRM.
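
A rank-based inverse-normal transform can be sketched as follows. The Blom offset of 3/8 is an assumption (the text does not state which offset was used), ties are broken by input order for simplicity, and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse-normal transform (Blom offset assumed)."""
    n = len(values)
    # 1-based ranks; ties are broken by input order in this sketch.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    standard_normal = NormalDist()
    # Map each rank to a quantile of the standard normal distribution.
    return [
        standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
        for r in ranks
    ]


toy_delta_age = [0.3, -1.2, 2.5, 0.0, 1.1]
transformed = inverse_normal_transform(toy_delta_age)
print([round(z, 3) for z in transformed])
```

The transform keeps the ordering of the samples but forces the values onto a normal scale, which protects the mixed model from heavy tails in delta age.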

A total of 18,308 regions were identified with annotation on the WES genotype by the ANNOVAR software (version 2020Jun07)⁴⁸. The size of the samples n in the gene-based test was 30,812 (white British individuals with proper MRI images included in the UK Biobank 450 K whole-exome sequencing data). The SAIGE-GENE+ analysis was done with 3 MAF cutoffs (MAF = 0.01, 0.001, 0.0001) and 3 functional annotation groups (LOF, LOF + Missense, LOF + Missense + Synonymous). In addition to the same covariates in the SAIGE test, the batch indicator variable was included to adjust for possible batch effect in the WES dataset.
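
The three MAF cutoffs and three annotation groups combine into nine variant masks per gene. The enumeration below is only an illustration of that grid; the labels are hypothetical and are not SAIGE-GENE+ option names.

```python
from itertools import product

# MAF cutoffs and functional annotation groups as listed in the text.
maf_cutoffs = [0.01, 0.001, 0.0001]
annotation_groups = ["LOF", "LOF+Missense", "LOF+Missense+Synonymous"]

# Every (MAF cutoff, annotation group) pair defines one variant mask.
masks = list(product(maf_cutoffs, annotation_groups))
for maf, group in masks:
    print(f"MAF <= {maf}: {group}")
print(len(masks))  # prints 9
```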

Genetic correlation among delta age and the average volume of two brain regions (the fornix and the lower part of the thalamus) was derived using LD score regression with western European LD scores⁴⁹.

We also calculated the genetic correlation between delta age and age-related phenotypes and epigenetic delta ages using LD score regression. Summary statistics used for calculating genetic correlations were downloaded from (1) International Genomics of Alzheimer’s Project (IGAP) for Alzheimer’s disease, (2) Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) for schizophrenia, (3) McCartney et al., for DNA methylation biomarkers, and (4) Pan-UK Biobank for the rest of the phenotypes^26,50,51.

Linear and nonlinear Mendelian randomization

We used variations of Mendelian randomization methods to identify the causal effect of biomarkers (explanatory variable) on delta age (outcome). The overall workflow of the Mendelian randomization in this study is shown in Supplementary Fig. 7. Instrument genetic markers for each of the 310 biomarkers were selected as follows. First, we chose variants significantly associated with the explanatory variable (p value under 5e−8) among the called autosomal genetic markers. Markers with minor allele frequencies less than 0.01 were removed because the estimation of the effect size from rare variants is unstable. The GWAS summary statistics for the metabolomics measured by Nightingale Health are from open datasets of the MRC Integrative Epidemiology Unit (IEU) at the University of Bristol⁵². The GWAS results of blood phenotypes are from the Pan-UK Biobank GWAS summary statistics by the Broad Institute (available at https://pan.ukbb.broadinstitute.org). We used the effect size, standard error, and p value from the samples with European ancestry. Second, linkage disequilibrium (LD) pruning was conducted with PLINK software (version 1.90) with a window size of 50 base pairs to ensure that the selected instruments were independent of each other⁵³. Pairs of variants with a correlation coefficient larger than 0.01 were LD pruned. Lastly, since the markers should not directly affect the outcome, we excluded variants that were also significantly associated with the outcome after Bonferroni correction (0.05/number of markers left). The effect size and standard error of the remaining markers were used in the MR analysis.
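
The filtering logic of the instrument selection can be sketched as below. This is a simplified illustration, not the authors' pipeline: LD pruning is done by PLINK in the study, so it is represented here only by a precomputed flag, and the variant identifiers, field names, and numbers are all hypothetical.

```python
def select_instruments(variants, p_exposure_max=5e-8, maf_min=0.01, alpha=0.05):
    """Apply the three selection steps to a list of variant records."""
    # Step 1: genome-wide significant for the exposure and not rare.
    candidates = [
        v for v in variants
        if v["p_exposure"] < p_exposure_max and v["maf"] >= maf_min
    ]
    # Step 2: keep variants that survived LD pruning (done externally).
    candidates = [v for v in candidates if v.get("ld_independent", True)]
    if not candidates:
        return []
    # Step 3: drop variants associated with the outcome at the
    # Bonferroni level 0.05 / (number of markers left).
    threshold = alpha / len(candidates)
    return [v for v in candidates if v["p_outcome"] >= threshold]


toy_variants = [
    {"id": "snp_a", "p_exposure": 1e-10, "maf": 0.20, "p_outcome": 0.5},
    {"id": "snp_b", "p_exposure": 1e-6, "maf": 0.30, "p_outcome": 0.9},
    {"id": "snp_c", "p_exposure": 1e-12, "maf": 0.005, "p_outcome": 0.7},
    {"id": "snp_d", "p_exposure": 1e-9, "maf": 0.30, "p_outcome": 1e-4},
    {"id": "snp_e", "p_exposure": 1e-20, "maf": 0.10, "p_outcome": 0.03},
]
selected = select_instruments(toy_variants)
print([v["id"] for v in selected])  # prints ['snp_a', 'snp_e']
```

In the toy data, snp_b fails the exposure threshold, snp_c is too rare, and snp_d is removed because its outcome p value is below 0.05/3.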

Linear and nonlinear explanatory variable-outcome relationship

We first calculated the causal impacts of the biomarkers (explanatory variable) on delta age (outcome) with three linear Mendelian randomization methods: MR-Egger regression, weighted median method (WM), and inverse variance weighting method (IVW)^54,55,56. The MendelianRandomization package in R was used to carry out the MR-Egger regression⁵⁷. The estimates and p values from the default version of MR-Egger regression were selected. The same was done for the estimates of the WM and IVW method. The estimates in all methods were assumed to follow the normal distribution. The Benjamini–Hochberg procedure was applied to control the false discovery rates at 0.05⁵⁸.
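
As an illustration of how a summary-statistics MR estimate is formed, the sketch below implements a fixed-effect inverse variance weighted estimate with first-order weights and a normal-approximation p value. It is a hand-written simplification and does not reproduce the defaults of the MendelianRandomization package; the input numbers are toy values and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics."""
    # Per-variant Wald ratios and their first-order inverse-variance weights.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = sqrt(1.0 / total_weight)
    # Two-sided p value, assuming the estimate is normally distributed.
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return estimate, std_error, p_value


# Toy instruments whose outcome effects are exactly half the exposure effects.
toy_beta_x = [0.1, 0.2, 0.3]
toy_beta_y = [0.05, 0.10, 0.15]
toy_se_y = [0.01, 0.02, 0.01]
ivw_beta, ivw_se, ivw_p = ivw_estimate(toy_beta_x, toy_beta_y, toy_se_y)
print(round(ivw_beta, 6), round(ivw_se, 6))
```

MR-Egger differs from this by adding an intercept to the weighted regression, and the weighted median replaces the weighted mean of the ratios with a weighted median.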

We also carried out an association analysis between delta age and blood-chemistry and metabolomics biomarkers. The following linear regression model of a single biomarker was used.

$$\mathrm{delta\; age }\sim \mathrm{ biomarker }+\mathrm{ sex }+\mathrm{ age }+\mathrm{ PC\; scores }+\mathrm{ cross\;validation\; batch}$$

Nonlinear MR was conducted with the nlmr package in R⁵⁹. The key rationale for using the nonlinear Mendelian randomization method is to find a nonlinear pattern of causal estimates from different ranges of explanatory variable, as the impact of explanatory variable on delta age can vary according to the ranges.

Instrument variable regression has some assumptions to be satisfied. First, the instruments should have significant correlation with the exposure. Second, the instrument should affect the outcome only through the exposure. These assumptions were tested by checking the p value of the Pearson’s correlation coefficient and conditional independence test. Among the instruments selected for each biomarker from the linear MR procedure, the ones with positive effect sizes were collected. We constructed each sample’s single allele score $G$ with those markers in order to increase power and avoid weak instrument bias⁶⁰. Each marker has genotype 0, 1, or 2 for each sample. When there are n samples and m genetic markers, the allele score of sample i is in (1).

$${G}_{i}={\sum }_{j=1}^{m}{\beta }_{j}{g}_{ij}$$

(1)

Here ${\beta }_{j}$ is the effect size of the jth marker and ${g}_{ij}$ is the genotype of sample i in the jth marker. The allele scores and the explanatory variable values were tested for a significant positive relationship (p value of the Pearson’s correlation coefficient lower than 0.05 using the cor.test function in R). The exclusion restriction assumption is not perfectly testable, so we assumed that there are only confounders affecting both the explanatory variable and the outcome⁶¹. In that case, adjusting for the explanatory variable would render the instruments and outcome statistically independent except for the association induced by “collider bias”⁶². The allele score represents a genetically determined level of the biomarker. Collider bias is unlikely here because the delta age was calculated from images taken after the level of the biomarker had been measured, so the effect runs from the explanatory variable to the outcome and not the other way around. A conditional independence test can therefore catch violations of the exclusion restriction. To test for nonlinear conditional independence, the Randomized Conditional Independence Test (RCIT) was used⁶³. We repeated the RCIT three times. The assumptions were considered to have been met if none of the three tests had a p value less than 0.05 under the null hypothesis of conditional independence between Y and G given X, where Y is the outcome (delta age), G is the allele score, and X is the explanatory variable. Only 12 out of 310 variables were found to satisfy all the assumptions for instrumental variable regression.
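
The allele score in (1) and the relevance check can be illustrated with a few lines of code. The genotypes, effect sizes, and biomarker values are toy numbers, and the function names are hypothetical. Only the correlation coefficient is computed; the p value reported by cor.test additionally requires a t distribution, which is omitted here.

```python
from math import sqrt


def allele_scores(genotypes, effect_sizes):
    """G_i = sum_j beta_j * g_ij for every sample i."""
    return [
        sum(beta * g for beta, g in zip(effect_sizes, row))
        for row in genotypes
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data: 4 samples, 3 markers with positive effect sizes, genotypes 0/1/2.
toy_effect_sizes = [0.1, 0.2, 0.3]
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 0, 0],
]
toy_biomarker = [1.9, 1.2, 1.6, 1.0]

scores = allele_scores(toy_genotypes, toy_effect_sizes)
r = pearson_r(scores, toy_biomarker)
print([round(s, 2) for s in scores], round(r, 3))
```

A clearly positive correlation between the score and the measured biomarker is what the relevance assumption requires.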

Then, we conducted the piecewise MR analysis for the 12 variables that met the assumptions⁵⁹. The samples were divided into ten groups according to deciles by the IV-free explanatory variable. Assumptions of the IV-explanatory variable relationship in the piecewise MR are the homogeneity and the linearity across all samples. These assumptions were tested by the heterogeneity test using Q statistics. If the null hypothesis of homogeneity in the estimates between the groups was not rejected in the heterogeneity test, the trend test was conducted on the biomarker. The trend test evaluated whether the local average causal effect in each group is explained by the average value of the explanatory variable in the corresponding group. Kernel IV regression with radial basis function kernel was performed with the 12 passed phenotypes to check if the results were replicated. Due to the heavy computation cost, we sampled 2000 individuals for the Kernel IV regression. The test values were 1000 numbers with an equal distance between the minimum and the maximum of the explanatory variables.
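
The stratification step of the piecewise MR can be sketched as follows. The sketch assumes that the IV-free explanatory variable is the residual of the explanatory variable regressed on the allele score, which is how it is commonly defined for this method; the data are deterministic toy values and the function names are hypothetical.

```python
def iv_free_exposure(exposure, allele_score):
    """Residuals of the exposure after regressing it on the allele score."""
    n = len(exposure)
    g_mean = sum(allele_score) / n
    x_mean = sum(exposure) / n
    sgg = sum((g - g_mean) ** 2 for g in allele_score)
    sgx = sum((g - g_mean) * (x - x_mean)
              for g, x in zip(allele_score, exposure))
    slope = sgx / sgg
    return [x - x_mean - slope * (g - g_mean)
            for x, g in zip(exposure, allele_score)]


def quantile_groups(values, n_groups=10):
    """Assign each sample to one of n_groups equal-sized groups (0-based)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for position, idx in enumerate(order):
        groups[idx] = position * n_groups // n
    return groups


# Deterministic toy data for 20 samples.
toy_exposure = [((7 * i) % 20) + 0.1 * i for i in range(20)]
toy_score = [0.2 * (i % 5) for i in range(20)]

residual_exposure = iv_free_exposure(toy_exposure, toy_score)
decile = quantile_groups(residual_exposure, n_groups=10)
print(decile)
```

Stratifying on the residual rather than on the raw explanatory variable avoids conditioning on a variable that the instrument itself influences. Within each decile, a local causal estimate is then computed and passed to the heterogeneity and trend tests.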

Ethics statement and consent to participate

UK Biobank has approval from the North West Multi-centre Research Ethics Committee (REC reference: 11/NW/0382). This research was conducted according to the principles expressed in the Declaration of Helsinki. UK Biobank participants are volunteers who have provided written informed consent. Personally identifiable information was not used.

Data availability

The UK Biobank data are publicly available (https://www.ukbiobank.ac.uk/). Our study made use of imaging-derived phenotypes generated by an image-processing pipeline developed and run on behalf of UK Biobank⁶⁴. The summary statistics used in the Mendelian randomization analysis are available for public download at the IEU OpenGWAS Project (https://gwas.mrcieu.ac.uk) and the Pan-UK Biobank (https://pan.ukbb.broadinstitute.org). The summary statistics used in the genetic correlation analysis are available for public download at the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (https://www.niagads.org/datasets/ng00075) and the Psychiatric Genomics Consortium website (https://pgc.unc.edu/). The summary statistics for epigenetic biomarkers are publicly available (https://datashare.ed.ac.uk/handle/10283/3645).

Code availability

The code used in the analyses is available on our GitHub page (https://github.com/Flumenlucidum/Brain-Aging).

References

1. Fox, N. C. & Schott, J. M. Imaging cerebral atrophy: Normal ageing to Alzheimer’s disease. Lancet 363, 392–394 (2004).
2. Nagano-Saito, A. et al. Cerebral atrophy and its relation to cognitive impairment in Parkinson disease. Neurology 64, 224–229 (2005).
3. Franke, K., Ziegler, G., Klöppel, S., Gaser, C. & Alzheimer’s Disease Neuroimaging Initiative. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: Exploring the influence of various parameters. Neuroimage 50, 883–892 (2010).
4. Cole, J. H. et al. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. Neuroimage 163, 115–124 (2017).
5. Jónsson, B. A. et al. Brain age prediction using deep learning uncovers associated sequence variants. Nat. Commun. 10, 1–10 (2019).
6. Xifra-Porxas, A., Ghosh, A., Mitsis, G. D. & Boudrias, M. H. Estimating brain age from structural MRI and MEG data: Insights from dimensionality reduction techniques. Neuroimage 231, 117822 (2021).
7. Kolbeinsson, A. et al. Accelerated MRI-predicted brain ageing and its associations with cardiometabolic and brain disorders. Sci. Rep. 10, 1–9 (2020).
8. Dinsdale, N. K. et al. Learning patterns of the ageing brain in MRI using deep convolutional networks. Neuroimage 224, 117401 (2021).
9. Peng, H., Gong, W., Beckmann, C. F., Vedaldi, A. & Smith, S. M. Accurate brain age prediction with lightweight deep neural networks. Med. Image Anal. 68, 101871 (2021).
10. Ning, K. et al. Improving brain age estimates with deep learning leads to identification of novel genetic factors associated with brain aging. Neurobiol. Aging 105, 199–204 (2021).
11. Le Goallec, A., Diai, S., Collin, S., Vincent, T. & Patel, C. J. Using deep learning to predict brain age from brain magnetic resonance images and cognitive tests reveals that anatomical and functional brain aging are phenotypically and genetically distinct. medRxiv 20, 20 (2021).
12. Lam, P. K. et al. In 16th International Symposium on Medical Information Processing and Analysis. 11–20 (SPIE).
13. Sundararajan, M., Taly, A. & Yan, Q. In International Conference on Machine Learning. 3319–3328 (PMLR).
14. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
15. Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum, J. D. & Lin, X. Sequence kernel association tests for the combined effect of rare and common variants. Am. J. Human Genet. 92, 841–853 (2013).
16. Smith, S. M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T. E. & Miller, K. L. Estimation of brain age delta from brain imaging. Neuroimage 200, 528–539 (2019).
17. Tzourio-Mazoyer, N. et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289 (2002).
18. Catani, M. & De Schotten, M. T. A diffusion tensor imaging tractography atlas for virtual in vivo dissections. Cortex 44, 1105–1132 (2008).
19. Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: Study designs and statistical tests. Am. J. Human Genet. 95, 5–23 (2014).
20. Cole, J. et al. Brain age predicts mortality. Mol. Psychiatry 23, 1385–1392 (2018).
21. Royle, N. A. et al. Estimated maximal and current brain volume predict cognitive ability in old age. Neurobiol. Aging 34, 2726–2733 (2013).
22. Lövdén, M. et al. Does variability in cognitive performance correlate with frontal brain volume? Neuroimage 64, 209–215 (2013).
23. Hou, Y. et al. Ageing as a risk factor for neurodegenerative disease. Nat. Rev. Neurol. 15, 565–581 (2019).
24. Simm, A. et al. Potential biomarkers of ageing. Biol. Chem. 389, 257–265 (2008).
25. Bell, J. T. et al. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. PLoS Genet. 8, e1002629 (2012).
26. McCartney, D. L. et al. Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging. Genome Biol. 22, 194 (2021).
27. Bintsi, K.-M., Baltatzis, V., Hammers, A. & Rueckert, D. Interpretability of Machine Intelligence in Medical Image Computing, and Topological Data Analysis and Its Applications for Medical Data 65–74 (Springer, 2021).
28. Deeb, W. et al. Fornix-region deep brain stimulation-induced memory flashbacks in Alzheimer’s disease. N. Engl. J. Med. 381, 783–785 (2019).
29. Foster, C. M., Kennedy, K. M., Hoagey, D. A. & Rodrigue, K. M. The role of hippocampal subfield volume and fornix microstructure in episodic memory across the lifespan. Hippocampus 29, 1206–1223 (2019).
30. Cherubini, A., Péran, P., Caltagirone, C., Sabatini, U. & Spalletta, G. Aging of subcortical nuclei: Microstructural, mineralization and atrophy modifications measured in vivo using MRI. Neuroimage 48, 29–36 (2009).
31. Wolff, M. & Vann, S. D. The cognitive thalamus as a gateway to mental representations. J. Neurosci. 39, 3–14 (2019).
32. Yoshikawa, K. Cell cycle regulators in neural stem cells and postmitotic neurons. Neurosci. Res. 37, 1–14 (2000).
33. Kuwajima, T., Nishimura, I. & Yoshikawa, K. Necdin promotes GABAergic neuron differentiation in cooperation with Dlx homeodomain proteins. J. Neurosci. 26, 5383–5392 (2006).
34. Azor, A. M. et al. Increased brain age in adults with Prader-Willi syndrome. Neuroimage Clin. 21, 101664 (2019).
35. Du, J., Liu, X., Wu, Y., Zhu, J. & Tang, Y. Essential role of STX6 in esophageal squamous cell carcinoma growth and migration. Biochem. Biophys. Res. Commun. 472, 60–67 (2016).
36. Liu, J.-Q. et al. lncRNA KLF3-AS1 suppresses cell migration and invasion in ESCC by impairing miR-185-5p-targeted KLF3 inhibition. Mol. Ther. Nucleic Acids 20, 231–241 (2020).
37. De Libero, G., Chancellor, A. & Mori, L. Antigen specificities and functional properties of MR1-restricted T cells. Mol. Immunol. 130, 148–153 (2021).
38. Mazieres, J. et al. Inhibition of Wnt16 in human acute lymphoblastoid leukemia cells containing the t (1; 19) translocation induces apoptosis. Oncogene 24, 5396–5400 (2005).
39. Liu, Q. et al. Cerebellum-enriched protein INPP5A contributes to selective neuropathology in mouse model of spinocerebellar ataxias type 17. Nat. Commun. 11, 1–13 (2020).
40. Chelban, V. et al. Genetic and phenotypic characterization of NKX6-2-related spastic ataxia and hypomyelination. Eur. J. Neurol. 27, 334–342 (2020).
41. Ising, C. & Heneka, M. T. Functional and structural damage of neurons by innate immune mechanisms during neurodegeneration. Cell Death Dis. 9, 1–8 (2018).
42. Corlier, F. et al. Systemic inflammation as a predictor of brain aging: Contributions of physical activity, metabolic risk, and genetic risk. Neuroimage 172, 118–129 (2018).
43. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
44. Grabner, G. et al. Symmetric atlasing and model based segmentation: An application to the hippocampus in older adults. Med. Image Comput. Comput. Assist. Interv. 9, 58–66 (2006).
45. Whitcher, B., Schmid, V. J. & Thornton, A. Working with the DICOM and NIfTI Data Standards in R. J. Stat. Softw. 44, 1–29 (2011).
46. Mouselimis, L. OpenImageR: An Image Processing Toolkit. R package version 1 (2017).
47. White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
48. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164–e164 (2010).
49. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
50. Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Abeta, tau, immunity and lipid processing. Nat. Genet. 51, 414–430 (2019).
51. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
52. Elsworth, B. L. et al. The MRC IEU OpenGWAS data infrastructure. bioRxiv (2020).
53. Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Human Genet. 81, 559–575 (2007).
54. Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 37, 658–665 (2013).
55. Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
56. Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).
57. Yavorska, O. O. & Burgess, S. MendelianRandomization: An R package for performing Mendelian randomization analyses using summarized data. Int. J. Epidemiol. 46, 1734–1739 (2017).
58. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995).
59. Staley, J. R. & Burgess, S. Semiparametric methods for estimation of a nonlinear exposure-outcome relationship using instrumental variables with application to Mendelian randomization. Genet. Epidemiol. 41, 341–352 (2017).
60. Burgess, S. & Thompson, S. G. Use of allele scores as instrumental variables for Mendelian randomization. Int. J. Epidemiol. 42, 1134–1144 (2013).
61. Yazdani, A. et al. From classical Mendelian randomization to causal networks for systematic integration of multi-omics. Front. Genet. 13, 990486 (2022).
62. Glymour, M. M., Tchetgen Tchetgen, E. J. & Robins, J. M. Credible Mendelian randomization studies: Approaches for evaluating the instrumental variable assumptions. Am. J. Epidemiol. 175, 332–339 (2012).
63. Strobl, E. V., Zhang, K. & Visweswaran, S. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. J. Causal Inference 7, 25 (2019).
64. Alfaro-Almagro, F. et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, 400–424 (2018).

Acknowledgements

This research was supported by Big Brain Project through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. 2021M3E5D2A0102249311), and the Brain Pool Plus (BP+, Brain Pool+) Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2020H1D3A2A03100666). UK Biobank data were accessed under the accession number UKB: 45227.

Author information

These authors contributed equally: Jangho Kim and Junhyeong Lee.

Authors and Affiliations

Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
Jangho Kim, Junhyeong Lee, Kisung Nam & Seunggeun Lee

Contributions

J.K. and S.L. designed the experiments. J.K., J.L., and S.L. constructed and developed the age prediction model. J.K. and S.L. wrote the manuscript. K.N. performed the genetic correlation analysis. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Seunggeun Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kim, J., Lee, J., Nam, K. et al. Investigation of genetic variants and causal biomarkers associated with brain aging. Sci Rep 13, 1526 (2023). https://doi.org/10.1038/s41598-023-27903-x

Received: 25 September 2022
Accepted: 10 January 2023
Published: 27 January 2023
DOI: https://doi.org/10.1038/s41598-023-27903-x
Springer Nature Limited
