Introduction

Aging is a primary risk factor for a myriad of health problems. Since aging proceeds at different rates for each individual, various methods to measure the biological age have been developed for a more accurate diagnosis of health status. Among the concerns related to aging, cerebral atrophy, which leads to cognitive decline, is a substantial risk to the individual well-being, constituting a major public health burden. Brain volume loss is also associated with neurodegenerative diseases such as Alzheimer’s disease and Parkinson’s disease1,2.

The aging process in different brain regions can be detected through structural and functional Magnetic Resonance Imaging (MRI). As large-scale datasets such as UK Biobank that contain neuroimaging data are becoming available, there have been efforts to accurately predict an individual’s chronological age with the neuroimaging datasets. Franke et al.3 used principal component analysis and relevance vector machine to predict age. Studies since then primarily used neural network models for prediction and data-driven feature extraction4,5,6,7,8,9,10,11,12. The convolutional neural network (CNN) models have been used with a high level of accuracy. The mean absolute error of the prediction in most literature with CNN models is between 2.14 and 3.4 years.

The difference between the predicted age and the actual chronological age, called delta age, has been used as an aging biomarker3,4. After estimating the delta age, phenome-wide and genome-wide association tests have been conducted to identify significantly associated genetic and clinical factors. Recent studies have shown that bone mineral density, blood pressure, and type 2 diabetes are associated with delta age7,8. Genome-wide association analyses identified that KANSL1, MAPT-AS1, CRHR1, NSF in chromosome 17, KLF3 (chromosome 4), RUNX2 (chromosome 6), and NKX6-2 gene (chromosome 10) were significantly associated5,10,11. When combined with the cognitive test results, SNPs in MED8, COLEC10, and PLIN4 genes were also significantly associated11.

To extend our understanding of the genetic and molecular basis of brain aging, we analyzed multimodal UK Biobank data. Compared to the previous studies, our analysis includes whole-exome sequencing (WES) and metabolomics data, which enabled us to identify novel genetic and biomarker associations. In addition, we carried out large-scale Mendelian randomization studies of 310 blood and metabolomic phenotypes to identify causal biomarkers and used an explainable AI method for medical images to identify brain regions that drive high delta age.

Results

Overview of the analysis

Figure 1 provides an overview of our analysis. First, a 3D CNN model was trained for age prediction with the T1-weighted structural brain MRI of healthy white British samples in the UK Biobank, excluding individuals with diseases related to cancer, diabetes, dementia, and mental disorders. The training was conducted via cross-validation to use all available samples in the downstream analysis (see “Methods”). The Integrated Gradients (IG) method was then used to identify an accurate attribution of each voxel (volume + pixel) to the prediction13. Second, genome-wide association tests were conducted on different test levels to uncover novel loci associated with brain aging. We applied a single-variant test, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), to the array-genotyped and imputed markers, and a gene-based test, SAIGE-GENE + , to the WES datasets14,15. Lastly, we examined linear and nonlinear causal relationships between the delta age and the phenotypes (metabolomics and blood) with Mendelian randomization methods. The number of samples used in each step is in Table 1.

Figure 1
figure 1

Overview of the analysis.

Table 1 The number of samples in each stage of the analysis.

Age prediction accuracy and saliency map

Figure 2 and Supplementary Table 1 show the prediction results of the 3D CNN model. We used the cross-validation scheme to use all available samples10. Supplementary Table 1 shows the mean absolute error of the samples with diseases and the samples without diseases and four-fold cross-validation groups (see “Methods”). The mean absolute error in the healthy individuals was about 2.6406 years in the test set and 0.8989 years in the training set. The existing studies on brain age estimation had similar accuracy to this result7,8,9,10. The mean absolute error in the samples with diseases was 2.651 years. Figure 2a shows the strong positive correlation between the chronological age (x-axis) and the predicted age (y-axis). The age-related bias was corrected by linear regression on the chronological age (Fig. 2b)16. Figure 2c is a scatter plot between the chronological age and the adjusted delta age.

Figure 2
figure 2

Scatterplots of chronological age (x-axis) vs. (a) predicted age, (b) delta age, and (c) bias-adjusted delta age (y-axis).

Figure 3 shows the saliency maps of the age prediction model with integrated gradients. Absolute values of the integrated gradients were averaged for 100 samples with the youngest predicted age. Figure 3a shows the voxels with averaged integrated gradients greater than five. According to the Automated Anatomical Labeling atlas (AAL) and the Natbrainlab atlas, when highlighting the regions with integrated gradients greater than ten, the regions were the fornix and the lower part of the thalamus (Fig. 3b)17,18. Similar results were shown when 100 samples were chosen randomly or by descending order of the delta age values (data not shown).

Figure 3
figure 3

Integrated gradient (IG) analysis results. (a) Points with absolute IG > 5 in the four CV model sets from 100 individuals with the youngest predicted age. (b) A brain image at the x-coordinate of 62. Voxels with absolute IG > 10 are highlighted with colors.

Genetic variants associated with delta age

We performed genetic association analyses for the predicted delta age. We carried out single-variant tests for 38 million array-genotyped and imputed genetic variants using SAIGE. Since the single-variant test for rare variants in WES has low power, we used a gene-based test method for the WES dataset19. We used SAIGE-GENE + to test for rare variant associations (18,308 genes).

Figure 4 is the Manhattan plot of single-variant analysis results from array-genotyped and imputed data. Table 2 lists significant variants (p value under 5e−08). The single-variant test results showed that five loci in chromosomes 1, 4, 6, 10, 11, and 17 were significantly associated with the delta age. The nearest genes were STX6, MR1, KLF3-AS1, WNT16, INPP5A, NKX6-2, and several genes in chromosome 17, including KANSL1, MAPT-AS1, and NSF. Genetic heritability calculated by the variance components in the SAIGE model was 21.5%.

Figure 4
figure 4

Manhattan plot of Single-variant tests with the array-genotyped and imputed genetics data on the delta age.

Table 2 Significant loci associated with delta age (p value  < 5E-08).

The gene-based rare variant test on the WES dataset identified no significant genes. The p value threshold was the Bonferroni corrected level of 0.05 (0.05/18,308). SEC62, PPM1F, ABCC2, ADAM15, and NDN were the top five genes with the smallest p value (Supplementary Table 3).

To check whether the delta age prediction was truly driven by voxel values in the fornix and the lower part of the thalamus, we carried out the same GWAS procedure with the average voxel value of the two regions. The SAIGE results showed that the significant loci (p value under 5e−08) associated with the two regions were also concentrated on chromosome 17 (Supplementary Table 2; Supplementary Fig. 1a,b). SLC39A8 and C16orf95 genes were commonly shown to be associated with the two regions. We also calculated the genetic correlation among delta age and average voxel values of the two regions and observed high genetic correlation values (Supplementary Table 4). Our analysis results clearly demonstrated the shared genetic basis of delta age and the two regions.

Additional validation on the delta age of 1610 healthy non-British white samples was conducted. In single-SNP GWAS, two SNPs in chromosome 4 with no specific gene region and 10 SNPs in chromosome 17 in genes such as KANSL1 and LRRC37A2 had p values less than 0.05. With a slightly more lenient p value cutoff of 0.1, one SNP in chromosome 4 and 40 SNPs in chromosome 17 centered on PLEKHM1, KANSL1, LINC02210-CRHR1 were additionally significant. Due to the small sample size (905 samples), none of the genes had p values < 0.1 in non-British white samples.

Causal biomarkers of delta age

Before the causal analysis, we carried out an association analysis between delta age and each of 310 (61 blood and 249 metabolomic biomarkers) phenotypes with linear regression models. Among the 310 phenotypes, HbA1C (p value = 7.31E−28) and Glucose (p value = 1.11E−26) were most significantly associated with the delta age (Supplementary Fig. 4; Supplementary Table 7) with a positive association direction after the Benjamini–Hochberg procedure (FDR = 0.05). This may indicate the association between diabetes and brain aging.

For causal analysis, 59 had p values less than 0.05 in causal estimates from at least one of the three linear MR methods (MR-Egger regression, inverse variance weighting, and weighted median). Table 3 lists the top five phenotypes by MR-Egger regression (Eosinophil count, Eosinophil percentage, Neutrophil count, Total protein, and White blood cell count), and Supplementary Table 5 shows the other 54 phenotypes. The Top five phenotypes were immune-related biomarkers and had positive relationship with the delta age. Among them Eosinophil count (p value = 5.16E−06) was statistically significant after the Benjamini–Hochberg procedure (FDR = 0.05). The reverse causality for the eosinophil count was not significant (p value = 0.68), so there was no simultaneity.

Table 3 Top 5 blood biomarkers with the smallest p value.

Figure 5 is a PheWAS plot of the causal estimates from the MR-Egger regression. Overall, phenotypes related to white blood cells showed more significant causal relationships with delta age than other phenotypes. Similar results were replicated by the weighted median method (Supplementary Fig. 3; Supplementary Table 6).

Figure 5
figure 5

A PheWAS plot of linear MR causal estimates from MR-Egger regression. The horizontal red line is the p value threshold of 0.05. Each color indicates a different group of biomarkers. The two most significant in each group with p values < 0.05 were labeled. The shape of the points indicates the direction of the causal effect. Triangles are the biomarkers that contribute to higher delta age, and circles are the biomarkers that contribute to lower delta age.

Nonlinear MR analysis using piecewise MR and kernel IV showed similar results. Total cholines (p value  = 0.03413), total lipids in small LDL (p value  = 0.03599), and cholesteryl esters to total lipids in very large HDL percentage (p value  = 0.01002) passed the test of the assumptions for instrument variable regression and returned p value less than 0.05 in the trend test, indicating that there was nonlinearity in the causal relationship between the biomarkers and delta age (the values in the parenthesis indicate the p values). Total choline (p value = 0.03413) showed a positive causal relationship with delta age when it is 2.5 mmol/l or higher. It increases delta age when it is above 2.5 mmol/l. The other two biomarkers showed an inverted U-shaped relationship (Supplementary Fig. 5). When applying FDR = 0.25, cholesteryl esters to total lipids in very large HDL percentage were the only variables that showed significant nonlinear causal relationship with delta age.

Discussion

In this paper, we have analyzed the biological risk factors of brain aging with multimodal data consisting of brain MRI, genome, blood, and metabolomics. The CNN model that predicts age from brain MRI had high accuracy with a mean absolute error of 2.64 years. Visual information of the regional importance in the brain was extracted from the neural network model. Genetic variants and biomarkers that have significant links to brain aging were identified using GWAS methods and Mendelian randomization.

Delta age is known to have associations with lower bone mineral density (BMD), higher blood pressure, poorer lung function, cognitive decline, diabetes, multiple sclerosis, and neurodegenerative disease through several studies6,7,16,20. BMD, lung function, and cognitive function tend to decrease with aging, and cognitive function is known to be highly related to brain volume21,22. In addition, diabetes and neurodegenerative disease are well-known aging-related diseases23. We also found the association between delta age and various aging-related phenotypes24,25, and showed significant relationships with lower bone mineral density, cognitive decline, and poorer lung function (Supplementary Table 8). For age-related neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and Schizophrenia, due to the limitation of the low number of diseased individuals (n < 10), we investigated the association of epilepsy and Schizophrenia only. In epilepsy, there was a difference in delta age between case and control groups (p value = 3.0E−02). We also found an association between delta age and type 2 diabetes (p value = 3.32E−31), which has been confirmed in several previous studies. In genetic correlation analysis, no significant phenotype could be confirmed after Bonferroni correction (0.05/15), but overall results were consistent with the direction of aging (Supplementary Table 9). We also carried out a genetic correlation analysis between our brain MRI-based delta age and delta ages calculated from epigenetic age predictors (Supplementary Table 10)26. In this case, however, there were no significant relationship between them.

Many models for predicting brain age have been built3,4,5,6, but only some studies have used large-scale MRI data. In this study, we used N = 34,129 brain MRI to train the CNN model, which showed good performance in previous studies7. Next, we used integrated gradients to make accurate saliency maps by incorporating information from a wider range of pixel values not present in the original images. This information revealed important brain regions missed by other mapping methods. The saliency map in the previous studies highlighted the brain regions such as the hippocampus, brainstem, and amygdala7,8,27. When highlighting the important points with higher integrated gradients in our study, they were centered on the fornix and the lower part of the thalamus. This indicates that the aging process affects the brain mainly through the atrophy in the inner area connected to memory and learning ability28,29,30,31. In addition, genetic variants associated with volumes of these regions and delta age were highly similar, supporting the result.

Investigating gene-level rare variant associations in the WES data, we identified no significant genes associated with the delta age. However, the NDN gene, one of the top five genes with the smallest p value, is known to play an important role in neural differentiation and survival of postmitotic neurons32,33. In addition, this gene is associated with the PWS (Prader–Willi syndrome), which is known to have a significantly higher delta age in the PWS34.

In the single-variant test of array-genotyped and imputed variants, we replicated strong association signals in chromosome 17. The significant variants in other chromosomes were in STX6, MR1, KLF3-AS1, WNT16, INPP5A, and NKX6-2. STX6 and KLF3-AS1 relate to carcinogenesis35,36. MR1 and WNT16 participate in immune response via antigen presentation to T cells and lymphocyte proliferation37,38. Mutations in INPP5A and NKX6-2 were shown to cause neurologic problems39,40.

Finally, we performed causality analysis on 61 blood and 249 metabolomic biomarkers. Several studies have been conducted on the analysis of the relationship between delta age and lifestyle, disease, blood biomarkers, etc.8,16, but there are few cases where the causal relationship with delta age has been analyzed. Also, as far as we know, there is no case where the relationship between metabolomics and delta age has been analyzed. Our investigation of the causal effects of 310 blood and metabolomic biomarkers on delta age showed the potential causal roles of immune responses to delta age. Especially, biomarkers related to white blood cells had a significant causal effect. This claim is supported by existing literature41,42.

This study, however, is subject to several limitations. First, the analysis was done in the European ancestry group only. The aging process and genetic variants associated with aging can differ according to ancestry. Second, replication of the results in an independent dataset was not conducted due to a lack of datasets with genetics data, extensive biomarkers, and brain MRI. Future research should focus on addressing the limits and making results more generalizable.

In conclusion, our multimodal data analysis shows many aspects of brain aging, including brain regions most affected by the aging, associated genes, and causal biomarkers. As more biobanks with multimodal data are collected, more diverse aspects of brain aging can be revealed. Biobanks of other ancestry groups can identify novel biomarkers associated with brain aging not identified in this study. In addition, potential mediating factors between immune responses and brain aging can be revealed. This would allow a deeper understanding of brain aging mechanisms that develop proper prevention and treatment.

Methods

Data preprocessing

T1-weighted structural MRI images of 34,129 white British samples in the UK Biobank were used for the analysis to minimize the effect of ancestry (average age of 60.964)43. All of the images downloaded from the UK Biobank had been normalized into MNI152 space (Montreal Neurosciences Institute) to render the comparison of each voxel possible44. The samples were selected if they had proper images and if they had no relation to any other individuals in the dataset. Each image was resized from 182 × 216 × 182 to 128 × 128 × 128 to reduce the computation cost. First, a part of z-axis voxels (from 26 to 153 out of 182 points) was selected to include various brain regions in the prediction task. The voxels in the upper outermost surface of the brain were excluded since they were considered negligible in the prediction and redundant due to the inclusion of other parts of the cerebral cortex. Second, each 2-dimensional 182 × 216 image (x, y-axis) in z-axis points was resized to a 128 × 128 image with the nearest neighbors scaling algorithm. The preprocessing of nifti format MRI images was performed with oro.nifti and OpenImageR packages in R45,46.

For the genetics data, we downloaded bgen files of 93 million array-genotyped and imputed variants dataset. And we used plink files of 26 million whole-exome sequencing (WES) variants dataset from the UK Biobank. The 450 K WES data were used in our analysis, and the analysis was performed on the DNA nexus.

The 249 metabolomic phenotypes and 61 biomarkers from blood assays and blood count were used in the Mendelian randomization (MR) analysis. The metabolomic phenotypes and biomarkers were collected separately from MRI imaging, from 2006 to 2010; this would enable the investigation of the biomarkers’ longitudinal and cumulative effect on the brain. The missing values in the selected biomarkers were imputed with multiple imputation by chained equations (MICE) to fit the missing values to the overall multivariate distribution47. After filling the missing values, 8464 individuals with the brain MRI had corresponding values of metabolomic phenotypes, and all 34,129 individuals had values of blood-related phenotypes. The 310 selected biomarkers were divided into 17 groups, including amino acids, cholesterol, and glycolysis-related metabolites.

3D CNN prediction model and integrated gradients

3D CNN model was used to predict the age of the individuals. Supplementary Fig. 6 is the overall network structure of the prediction model. The neural network model takes the resized images (128 × 128 × 128) as input and has less than three million parameters to train. The spatial dropout layers were added in the first two blocks to prevent the model from overfitting. The kernel size is 3 × 3 × 3 in convolution layers and 2 × 2 × 2 in pooling layers. The model conducts max-pooling until the size of an image in each feature becomes 2 × 2 × 2. The number of features increases as the input image size reduces. Adam optimizer with a learning rate 0.001 and He uniform initializer were used since the activation function is the rectified linear unit (ReLU).

Healthy white British individuals (25,656) were selected to train the prediction model. Individuals with diseases (all types of cancers, diabetes, neoplasm, dementia, and mental disorders) were excluded from the training process. The dataset with healthy individuals was divided into four sets (CV1, CV2, CV3, and CV4) for four-fold cross-validation so that every sample is included in the test set at least once and has a predicted age value. When CV1 is the test dataset, the other three sets become the training dataset. For each training set, three separate models (the same structure in Supplementary Fig. 6 with different initial weights and dropouts) were trained for more robust prediction. They constitute a single model set. After training the models, the test images were given to the models as input. The average of the predictions from the three models becomes the final predicted age of the test images, hence a total of 12 models to train (three models for each of the four cross-validation batches). Prediction of the age of individuals with diseases was made with the average of the predicted age from the four trained model sets. The delta age value of each sample was calculated by subtracting the individual’s chronological age from the predicted age. The age-related bias in the delta age value was adjusted through linear regression on the chronological age. The adjusted delta age values were used in the later association tests.

To identify which regions in the brain contributes significantly to age prediction, the Integrated Gradients (IG) method was used. The IG method is an explainable AI method for neural networks that uses multiple images between blank and original images13. In this study, the number of images generated for each sample was 101 in reference to the recommended step size in the original paper.

Genome-wide association test with single variants and gene regions

We used SAIGE for array-genotyped and imputed variants and SAIGE-GENE + for the WES variants. Variants with minor allele frequency less than 0.0001 were excluded in the SAIGE analysis. SAIGE uses a mixed effect model to account for the relatedness among the individuals. SAIGE-GENE + is a gene-based rare variant association test and it performs BURDEN, SKAT, and SKAT-O tests15. Since the number of tests decreases in the gene-based test, multiple testing correction is less stringent.

34,129 (= N) samples were used in SAIGE analysis. The delta age values of the individuals were inverse-normal transformed. Covariates were sex, age, ten principal component scores, and four dummy variables which indicate different cross-validation test sets plus samples with diseases. N \(\times\) N genomic relation matrix (GRM) was calculated 784,256 markers in called autosomal genotypes. Leave-one-chromosome-out (LOCO) option was applied when estimating the GRM.

18,308 regions were identified with annotation on the WES genotype by the ANNOVAR software (version 2020Jun07)48. The size of the samples n in the gene-based test was 30,812 (white British individuals with proper MRI images included in the UK Biobank 450 k whole-exome sequencing data). The SAIGE-GENE + analysis was done with 3 MAF cutoffs (MAF = 0.01, 0.001, 0.0001) and 3 functional annotation groups (LOF, LOF + Missense, LOF + Missense + Synonymous). In addition to the same covariates in the SAIGE test, the batch indicator variable was included to adjust for possible batch effect in the WES dataset.

Genetic correlation among delta age and the average volume of two brain regions (the fornix and the lower part of the thalamus) was derived using LD score regression with western European LD scores49.

We also calculated the genetic correlation between delta age and age-related phenotypes and epigenetic delta ages using LD score regression. Summary statistics used for calculating genetic correlations were downloaded from (1) International Genomics of Alzheimer’s Project (IGAP) for Alzheimer’s disease, (2) Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) for schizophrenia, (3) McCartney et al., for DNA methylation biomarkers, and (4) Pan-UK Biobank for the rest of the phenotypes26,50,51.

Linear and nonlinear mendelian randomization

We used variations of Mendelian randomization methods to identify the causal effect of biomarkers (explanatory variable) on delta age (outcome). The overall workflow of the Mendelian randomization in this study is in Supplementary Fig. 7. Instrument genetic markers for each of the 310 biomarkers were selected as follows. First, we chose variants significantly associated with the explanatory variable (p value under 5e−8) among the called autosomal genetic markers. The markers with minor allele frequencies less than 0.01 were pruned because the estimation of the effect size from rare variants is unstable. The GWAS summary statistics for the metabolomics measured by the Nightingale Health are from open datasets in MRC Integrative Epidemiology Unit at the University of Bristol (IEU)52. The GWAS results of blood phenotypes are from Pan-UK Biobank GWAS summary statistics by the Broad Institute (available at https://pan.ukbb.broadinstitute.org). We used effect size, standard error, and p value from the samples with European ancestry. Second, the linkage disequilibrium (LD) pruning process was conducted with PLINK software (version 1.90) with a window size of 50 base pairs to ensure that the selected instruments were independent of each other53. Pairs of variants with a correlation coefficient larger than 0.01 were LD pruned. Lastly, since the markers should not directly affect the outcome, we excluded variants that were also significant for outcome through Bonferroni correction (0.05/number of markers left). The effect size and standard error of the remaining markers were used in the MR analysis.

Linear and nonlinear explanatory variable-outcome relationship

We first calculated the causal impacts of the biomarkers (explanatory variable) on delta age (outcome) with three linear Mendelian randomization methods: MR-Egger regression, weighted median method (WM), and inverse variance weighting method (IVW)54,55,56. The MendelianRandomization package in R was used to carry out the MR-Egger regression57. The estimates and p values from the default version of MR-Egger regression were selected. The same was done for the estimates of the WM and IVW method. The estimates in all methods were assumed to follow the normal distribution. The Benjamini–Hochberg procedure was applied to control the false discovery rates at 0.0558.

We also carried out an association analysis between delta age and blood-chemistry and metabolomics biomarkers. The following linear regression model of a single biomarker was used.

$$\mathrm{delta\; age }\sim \mathrm{ biomarker }+\mathrm{ sex }+\mathrm{ age }+\mathrm{ PC\; scores }+\mathrm{ cross\;validation\; batch}$$

Nonlinear MR was conducted with the nlmr package in R59. The key rationale for using the nonlinear Mendelian randomization method is to find a nonlinear pattern of causal estimates from different ranges of explanatory variable, as the impact of explanatory variable on delta age can vary according to the ranges.

Instrument variable regression has some assumptions to be satisfied. First, the instruments should have significant correlation with the exposure. Second, the instrument should affect the outcome only through the exposure. These assumptions were tested by checking the p value of the Pearson’s correlation coefficient and conditional independence test. Among the instruments selected for each biomarker from the linear MR procedure, the ones with positive effect sizes were collected. We constructed each sample’s single allele score \(G\) with those markers in order to increase power and avoid weak instrument bias60. Each marker has genotype 0, 1, or 2 for each sample. When there are n samples and m genetic markers, the allele score of sample i is in (1).

$${G}_{i}={\sum }_{j=1}^{m}{\beta }_{j}{g}_{ij}$$
(1)

Here \({\beta }_{j}\) is the effect size of the jth marker and \({g}_{ij}\) is the genotype of sample i in the jth marker. The allele scores and the explanatory variable values were tested for a significant positive relationship (p value of the Pearson’s correlation coefficient lower than 0.05 using the cor.test function in R). The exclusion restriction assumption is not perfectly testable, so we assumed that there are only confounders affecting both the explanatory variable and the outcome61. In that case, adjusting for the explanatory variable would render the instruments and outcome statistically independent except for the association induced by “collider bias”62. The collider bias is unlikely to happen due to the fact that the delta age was calculated from images taken after the level of the biomarker had been measured. This makes the direction of the effect from the explanatory variable to the outcome, not the other way around. Conditional independence test then can catch the violation of the exclusion restriction. The allele score shows a genetically determined level of the biomarker, and the delta age was calculated from images taken after the level of the biomarker had been measured. In order to test for nonlinear conditional independence, Randomized Conditional Independence Test (RCIT) was used63. We repeated the RCIT three times. The assumptions were considered to have been met if none of the three tests had a p value less than 0.05 with the null hypothesis of the conditional independence between Y and G given X. Only 12 out of 310 variables were found to satisfy all the assumptions for instrument variable regression.

Then, we conducted the piecewise MR analysis for the 12 variables that met the assumptions59. The samples were divided into ten groups according to deciles by the IV-free explanatory variable. Assumptions of the IV-explanatory variable relationship in the piecewise MR are the homogeneity and the linearity across all samples. These assumptions were tested by the heterogeneity test using Q statistics. If the null hypothesis of homogeneity in the estimates between the groups was not rejected in the heterogeneity test, the trend test was conducted on the biomarker. The trend test evaluated whether the local average causal effect in each group is explained by the average value of the explanatory variable in the corresponding group. Kernel IV regression with radial basis function kernel was performed with the 12 passed phenotypes to check if the results were replicated. Due to the heavy computation cost, we sampled 2000 individuals for the Kernel IV regression. The test values were 1000 numbers with an equal distance between the minimum and the maximum of the explanatory variables.

Ethics statement and consent to participate

UK Biobank has approval from the North West Multi-centre Research Ethics Committee (REC reference: 11/NW/03820). This research was conducted according to the principles expressed in the Declaration of Helsinki. UK Biobank participants are volunteers who have provided written informed consent. Personally identifiable information was not used.