Background

Alzheimer’s disease (AD), the most common form of dementia, is an irreversible neurodegenerative disease. Epidemiological investigations have reported that about 55 million people worldwide live with AD and other types of dementia today [1]. The number is expected to reach 78 million by 2030 (World Alzheimer Report 2021, www.alz.co.uk). The primary clinical manifestations of AD include progressive impairments in memory and other cognitive functions, accompanied by several pathophysiological changes, such as amyloid deposition and neurofibrillary tangle formation. However, the aetiology and pathogenesis leading to heterogeneity in these manifestations among AD patients remain unclear. In addition, no effective therapeutic strategies are available for AD [2]. High-throughput imaging and genomics studies can provide valid information on AD pathology and yield insights into the early detection and treatment of AD, and have thus attracted much attention recently.

Genomic studies have been developed over three decades [3,4,5]. In 1984, Glenner et al. [6] first isolated amyloid-β (Aβ) peptide from plaques in AD patients, and this peptide was shown to be generated from the amyloid precursor protein (APP) through its sequential cleavage by two enzymes: β-secretase and γ-secretase [3]. This finding was later confirmed by genetic mutations in APP in 1991 [7] and presenilins (PSEN1 and PSEN2) in 1995 [8, 9]. The above genomic studies support an evident molecular mechanism underlying AD, resulting in the amyloid hypothesis. Additionally, the apolipoprotein E (APOE) ɛ4 allele has been reported to be associated with AD risk [10]. APOE can bind to Aβ, which influences the clearance of soluble Aβ and Aβ aggregation [11, 12], and regulates Aβ metabolism [13]. Notably, APOE ɛ4 binds more rapidly than APOE ɛ3, resulting in accelerated formation of fibrils [14]. Furthermore, with the development of high-throughput sequencing technology, genome-wide association studies (GWAS) have identified thousands of risk variants related to complex diseases and traits, including AD [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. These studies have improved the understanding of genetic complexity and provided insights into the molecular pathways of AD pathogenesis. However, significant results are not only dependent on sufficiently large sample sizes but also require further analysis of gene-to-disease specificity.

Alternatively, neuroimaging technologies [35, 36] such as structural magnetic resonance imaging (sMRI), functional MRI (fMRI), diffusion tensor imaging (DTI), and positron emission tomography (PET) enable noninvasive detection of brain degeneration from the perspective of brain structure and function. sMRI can provide accurate in vivo quantification of specific regions with cortical and subcortical grey matter (GM) atrophy and white matter (WM) lesions associated with AD pathology, even at the mild cognitive impairment (MCI) stage [37, 38]. DTI is another MRI technique that is sensitive to translational motion of water molecules throughout the brain, providing quantification of WM tissue microstructure and visualization of WM tract abnormalities in AD patients. Resting-state fMRI measures brain activity by detecting associated changes in blood flow when no task is being performed, whereas task fMRI focuses on task-evoked activation. Moreover, PET scans can demonstrate characteristic patterns of amyloid load, tau burden and glucose metabolism in AD patients by using specific molecular imaging tracers. These advanced imaging technologies have played important roles in the quantitative assessment of biomarkers and in understanding the processes underlying AD. In 2018, the National Institute on Aging−Alzheimer’s Association (NIA−AA) outlined an unbiased descriptive AD biomarker classification scheme, called the ATN (amyloid, tau, neurodegeneration) diagnosis framework [39]. However, due to the complex heterogeneity of AD, the interactions among accessible, objective imaging markers and the complete pathological loop that they form remain unknown. The emerging field of imaging biomarker genomics, which combines multimodal imaging and high-throughput sequencing technologies, is committed to analysing associations between imaging phenotypes and genomics data and to using imaging phenotypes as intermediate phenotypes between genetic variants and clinical diagnosis to investigate the pathogenesis of AD.
Hence, the imaging biomarker genomics approach can overcome the shortcomings of separate genomics or imaging analysis, in that it can confirm gene-to-disease specificity, promote the biological interpretability of pathological biomarkers, and contribute to the diagnosis, treatment and prevention of AD with multiscale imaging and genetic features.

When combined with clinical information, the imaging biomarker genomics approach may even facilitate precision medicine (Fig. 1). In this review, we provide a comprehensive summary of the brain imaging biomarker genomics approach, including (1) the basic analytical framework of brain imaging biomarker genomics studies and (2) implementation of this approach in AD based on the ATN framework, for exploring and validating AD biomarkers/variants and performing AD diagnosis and prognosis analysis. In particular, we introduce some key considerations relevant to studies using the brain imaging biomarker genomics approach and provide perspectives on the integration of neuroimaging and multiomics data and further methodological possibilities.

Fig. 1

Landscape of advances of the AD imaging biomarker genomics field. This field covers genomics, imaging, and clinical information, ultimately pointing towards integrated diagnosis and precision medicine. CSF cerebrospinal fluid, CT computed tomography, MMSE Mini-Mental State Examination, MoCA Montreal Cognitive Assessment, AVLT Auditory-Verbal Learning Test, AFT Animal Fluency Test, BNT Boston Naming Test, MES Memory and Executive Screening scale

In particular, this review focuses on neuroimaging markers based on the ATN framework. Other biomarkers, such as various cerebrospinal fluid (CSF) biomarkers and electroencephalography (EEG) or magnetoencephalography (MEG) markers, are excluded. In addition, other risk factors for AD (e.g., sex, education, cognitive tests) will not be discussed in this paper.

Methods

The literature was searched in the Google Scholar and PubMed databases. Only human studies in the English language, published from January 1991 (the publication year of the earliest gene cloning of APP mutations) to December 2021, were reviewed. The search yielded a total of 1095 records, of which 910 remained after duplicate removal. A thorough description of the search strategy is provided in Additional file 1.

The inclusion criteria were as follows: (1) studies that identified AD candidate variants in large GWAS and meta-analyses, or described imaging biomarker genomics associations based on the ATN framework, such as genome-wide associations, polygenic score analyses, and AD classification, diagnosis and prognosis; (2) studies focused on quantitative analysis of neuroimaging markers by using amyloid PET, tau PET, fluorodeoxyglucose (FDG) PET, anatomic MRI, or other MRI techniques including fMRI and DTI; (3) studies focused on single nucleotide polymorphism (SNP) genotype analysis. Articles were excluded if they were: (1) case reports, reviews, study-design protocols, books and documents, theses, editorials, communications, opinion (methodological perspective) articles, and letters to the editors; (2) animal studies; (3) focused on methodological proposal and comparison; (4) not related to neuroimaging markers based on the ATN framework (e.g., various CSF biomarkers or EEG recording), or focused on other risk factors for AD (e.g., sex, education, cognitive tests). Finally, 105 records were included in this review. The detailed process of literature search and screening is presented in Fig. 2.

Fig. 2

A flowchart of the search and screening process for articles included in this review

Evolving technologies of brain imaging biomarker genomics

The research field of brain imaging biomarker genomics has been developing for two decades. Initially, twin-based and family-based genetic designs were used to calculate the heritability of measures derived from neuroimaging, such as brain volume [40,41,42], functional connectivity [43], and WM structure [44]. These studies have confirmed that the brain imaging measures have a moderate to strong genetic effect in AD [45], suggesting the potential value of brain imaging biomarker genomics studies in AD. In this section, we will introduce the evolving technologies in this field and describe the technical frameworks used in AD research from both genetic and imaging perspectives.

Analytical procedures for AD imaging

The systematic framework of brain imaging biomarker genomics for AD is composed of three panels: imaging, genomics and imaging biomarker genomics (Fig. 3).

Fig. 3

Systematic computational framework for studies in the field of AD brain imaging biomarker genomics. The top panel indicates the analytical steps involved in imaging: image preprocessing, identification of regions of interest, feature extraction, feature selection, and model building and evaluation. The middle panel represents genomics procedures: genetic preprocessing, feature extraction and dimension reduction, model building, and statistical analysis. The bottom panel indicates integrated analysis methods in studies of imaging biomarker genomics, including association analysis, classification and prediction

Based on the ATN framework, the commonly used imaging techniques for AD are MRI and PET. MRI mainly includes sMRI, fMRI and DTI. PET imaging includes [18F] FDG PET, [18F] AV45 or [11C] Pittsburgh compound B ([11C] PiB) amyloid PET, and [18F] AV1451 tau-PET. Advances in imaging technologies have led to noninvasive or minimally invasive imaging of biomarkers, which may help capture all aspects of the disease process, including amyloid deposition [46], tau pathology [47], functional decline [48] and neuronal loss [49]. Below are the calculation frameworks for imaging analysis.

Step 1 Image preprocessing

High-resolution sMRI preprocessing includes realignment, segmentation, spatial normalization and smoothing. PET image processing includes realignment, coregistration, partial-volume correction, spatial normalization and smoothing. Resting-state fMRI preprocessing includes removal of unstable time points, slice timing corrections, head-motion corrections, baseline drift removal, spatial normalization and spatial smoothing. DTI data preprocessing includes skull stripping, background region filtering, and head-motion and eddy-current corrections. Several toolboxes can be used for this purpose, such as FSL (FMRIB’s Software Library) that processes MRI images (task or resting-state fMRI, sMRI, DTI, etc.) [50], Freesurfer that provides a series of algorithms to quantify brain functional and structural markers [51], and statistical parametric mapping (SPM) that is used for PET image preprocessing [52, 53]. More specifically, Data Processing & Analysis for Brain Imaging (DPABI) provides a complete resting-state fMRI analysis pipeline [54]. Other toolkits, such as DPARSF (Data Processing Assistant for Resting-State fMRI) and REST (Resting-State fMRI Data Analysis Toolkit) are also useful for fMRI analysis.

Step 2 Identification of regions of interest (ROIs) and feature extraction

This step includes precise identification of ROIs and extraction of imaging features [55, 56]. There are two common approaches to locating ROIs in brain imaging analyses: the voxel-based morphometry (VBM)-based method and the atlas-based method. VBM can achieve quantitative detection of differences in voxel-level imaging characteristics between groups. The atlas-based method projects the partitioning information from the standard brain atlas onto the images to identify specific brain regions. The identification of ROIs is followed by manual/automatic extraction of imaging features. The detailed characterization and calculation of imaging features are elaborated in Table 1. Feature extraction can usually be carried out by using FSL, Freesurfer, DPABI, SPM, the radiomics tool developed by Vallieres et al. (https://github.com/mvallieres/radiomics), and the Brain Connectivity Toolbox for graph theory-based brain network analyses [57].
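
As a rough illustration of the atlas-based approach, the following Python sketch averages voxel intensities within each atlas label. It assumes that the image and the atlas have already been registered to the same space and flattened to equal-length sequences; the function name `roi_means` and the toy values are hypothetical, and real pipelines would rely on the toolboxes listed above.

```python
from collections import defaultdict
from statistics import mean


def roi_means(image, atlas, background=0):
    """Average the image intensity within each atlas label.

    `image` and `atlas` are flat sequences with one entry per voxel and are
    assumed to be aligned (same space, same voxel order).
    """
    if len(image) != len(atlas):
        raise ValueError("image and atlas must have the same number of voxels")
    voxels_by_label = defaultdict(list)
    for value, label in zip(image, atlas):
        if label != background:  # skip voxels that belong to no ROI
            voxels_by_label[label].append(value)
    return {label: mean(values) for label, values in voxels_by_label.items()}


# Toy example: six voxels, two ROIs (labels 1 and 2), label 0 is background.
image = [0.0, 1.0, 3.0, 2.0, 4.0, 9.0]
atlas = [0, 1, 1, 2, 2, 0]
features = roi_means(image, atlas)
print(features)
```

Each dictionary entry is one ROI-level feature; other summary statistics (volume, variance, texture) could be computed from the same per-label voxel lists.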

Table 1 Summary of imaging radiomics features and calculation formulas

Step 3 Feature selection and model building

The aims of feature selection are to reduce feature redundancy and remove irrelevant features. Common feature selection methods include consistent stability analysis, statistical tests (two-sample t-test and rank-sum test), correlation analysis, sparse-group lasso, etc. There are two types of model construction: classification/prediction models and other statistical analysis models, such as regression analysis, correlation analysis, and survival analysis. Finally, model generalization capabilities are evaluated in terms of accuracy, sensitivity, specificity, correlation coefficient, and regression coefficient.
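
The sketch below illustrates one possible version of this step under simplifying assumptions: features are ranked by the absolute Welch two-sample t statistic between patients and controls (a stand-in for the statistical tests mentioned above, without p-value computation), and a set of predictions is scored by accuracy, sensitivity and specificity. All function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch two-sample t statistic for a single feature.

    Assumes at least two subjects per group and non-zero pooled variance.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def select_top_features(patients, controls, k):
    """Return the indices of the k features with the largest |t|.

    `patients` and `controls` are lists of feature vectors (one per subject).
    """
    scores = []
    for j in range(len(patients[0])):
        t = welch_t([row[j] for row in patients], [row[j] for row in controls])
        scores.append((abs(t), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data: feature 0 separates the groups, feature 1 does not.
patients = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9]]
controls = [[2.0, 5.0], [2.1, 5.2], [1.9, 4.8]]
selected = select_top_features(patients, controls, k=1)
metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

To avoid optimistic estimates, the selection would normally be performed inside each cross-validation training fold rather than on the full dataset.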

The above processes could also be carried out using deep learning (DL) algorithms. End-to-end deep neural networks can automatically extract quantitative, high-throughput features from medical images, thereby avoiding complex hand-crafted feature engineering and the need for prior knowledge [58,59,60,61].

Analytical procedures for AD genomics

Early studies of brain genomics mainly focused on linkage and association analyses [62], in which candidate genetic markers were typically selected based on a hypothesis that implicates certain genes in AD pathogenesis. Advances in large-scale genotyping technologies enable comprehensive, unbiased GWAS, which can simultaneously test thousands of genetic markers. Nevertheless, GWAS might not avoid statistical artefacts that arise from the large number of tests. Systematic meta-analysis can alleviate this situation because this approach can quantitatively synthesize published genotype data for each polymorphism and produce a summary risk estimate (called the odds ratio) that contributes to the overall interpretation of association studies independent of positive or negative outcomes. Moreover, with the increase of sample sizes in GWAS analyses, polygenic scores (PGS) are emerging as a novel statistical index that associates the collective individual SNP genotypes with specific diseases [63, 64]. In summary, AD genomics studies are mainly concentrated on traditional linkage and association analyses, large-scale case–control GWAS, systematic GWAS meta-analyses and recent PGS analyses, which facilitate identification of novel AD susceptibility genes as well as early diagnosis and prevention. The calculation frameworks for genomic analysis are mainly as follows.
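
To make the PGS idea concrete, the following sketch computes a score as the sum of risk-allele dosages weighted by the log odds ratio of each SNP, which is one common formulation. The SNP identifiers, odds ratios and dosages are invented for illustration and do not come from any study cited here.

```python
from math import log


def polygenic_score(dosages, odds_ratios):
    """Sum of risk-allele dosages (0, 1 or 2) weighted by log(odds ratio).

    `dosages` maps SNP id -> dosage for one individual; `odds_ratios` maps
    SNP id -> per-allele odds ratio taken from GWAS summary statistics.
    """
    return sum(log(or_value) * dosages[snp] for snp, or_value in odds_ratios.items())


# Hypothetical summary statistics and one hypothetical individual.
odds_ratios = {"snp_a": 1.2, "snp_b": 0.9, "snp_c": 1.5}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = polygenic_score(dosages, odds_ratios)
print(round(score, 4))
```

In practice the SNP set is usually restricted by a p-value threshold and pruned for linkage disequilibrium before the score is computed; those steps are omitted here.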

Step 1 Genomic data preprocessing

As the first step, genomic data preprocessing includes quality control and imputation of genotyping data. Standard genotyping data quality control at the sample and variant level can be performed following a previously published pipeline [65, 66]. Genotyping data imputation is performed based on the Haplotype Reference Consortium (full panel) and the 1000 Genomes reference panel (for indels only).
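
A minimal sketch of variant-level quality control is given below. It filters variants by call rate and minor allele frequency (MAF); the thresholds (0.95 and 0.01) are illustrative assumptions rather than the values used in the cited pipeline, and the genotype coding (0/1/2 with `None` for missing calls) is likewise assumed.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from allele counts coded 0/1/2, ignoring missing calls."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)


def passes_variant_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Apply both filters; thresholds are illustrative defaults."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical genotypes for ten samples at three variants.
variants = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],            # common, fully called
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],            # monomorphic
    "snp_c": [0, 1, None, None, 2, 1, 0, 1, None, 1],   # 30% missing calls
}
kept = [name for name, calls in variants.items() if passes_variant_qc(calls)]
print(kept)
```

Sample-level checks (per-individual missingness, sex mismatch, relatedness) and a Hardy-Weinberg test would typically accompany these variant filters.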

Step 2 Feature extraction, selection and model building

This step aims at data mining and statistical analysis. Data mining focuses on feature extraction and dimensionality reduction, and constructs classification/prediction and statistical models with consideration of the complex nature of large genomics data. Statistical analysis mainly refers to construction of threshold-based association analysis models, including GWAS and meta-analysis. Subsequently, replication studies are always conducted to validate the results.
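
As an illustration of a threshold-based meta-analysis, the sketch below pools per-study effect sizes (log odds ratios) with inverse-variance fixed-effect weights and compares the resulting two-sided p-value with the conventional genome-wide significance threshold of 5e-8. The effect sizes are invented, and the fixed-effect model and the threshold are assumptions of this example rather than choices reported in the studies reviewed.

```python
from math import exp, sqrt
from statistics import NormalDist


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se


# Hypothetical log odds ratios for one SNP from two cohorts.
betas = [0.20, 0.10]
standard_errors = [0.10, 0.10]

pooled_beta, pooled_se = fixed_effect_meta(betas, standard_errors)
pooled_odds_ratio = exp(pooled_beta)
z = pooled_beta / pooled_se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided normal approximation
genome_wide_significant = p_value < 5e-8
```

With these toy inputs the pooled effect lies midway between the two cohorts because their standard errors are equal; a cohort with a smaller standard error would receive proportionally more weight.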

Step 3 Downstream analyses

Downstream analyses include conditional analysis, statistical fine-mapping analysis, colocalization with expression quantitative trait loci and metabolism quantitative trait loci, functional annotation, network analysis, gene-based analysis, gene set or tissue enrichment analysis, linkage disequilibrium analysis, PGS analysis, gene pleiotropy, heritability, genetic correlation calculation, etc.
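
Of these downstream analyses, linkage disequilibrium (LD) is simple to illustrate. The sketch below estimates r² between two SNPs as the squared Pearson correlation of their genotype dosages, a common approximation when haplotype phase is unknown; the estimator choice, function name and genotype vectors are assumptions made for illustration.

```python
from statistics import mean


def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two dosage vectors (0/1/2).

    Assumes equal-length vectors and that neither SNP is monomorphic.
    """
    mean_a, mean_b = mean(dosages_a), mean(dosages_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    return cov ** 2 / (var_a * var_b)


# Hypothetical dosages for six individuals.
snp_1 = [0, 1, 2, 1, 0, 2]
snp_2 = [0, 1, 2, 1, 0, 2]   # identical to snp_1: complete LD
snp_3 = [1, 1, 1, 0, 2, 1]   # only weakly related to snp_1

r2_complete = ld_r2(snp_1, snp_2)
r2_weak = ld_r2(snp_1, snp_3)
```

Pairs with high r² carry largely redundant association signals, which is why LD information is used for pruning, clumping and fine-mapping.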

Analytical procedures for AD imaging biomarker genomics

In general, the research field of AD imaging biomarker genomics is mainly focused on univariate or multivariate association analyses using imaging phenotypes as an intermediate. For example, Kim et al. [67] investigated genetic variants that influence cortical atrophy in 919 participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. They analyzed correlations between 3,041,429 SNPs selected based on GWAS and cortical thickness in the whole brain. This study included three steps: (1) imaging/genomic data preprocessing; (2) calculation of cortical thickness as an imaging feature; and (3) statistical analysis. The results of the study identified that rs10109716 in ST18 and rs661526 in NFIA are significantly associated with the mean cortical thicknesses of the left inferior frontal gyrus and left parahippocampal gyrus, respectively. In addition, Ning et al. [68] employed a neural network (NN) framework that combined both brain atrophic measurements and SNP genotype data to distinguish AD patients from healthy controls (HC). In this study, volumes of 16 ROIs selected based on prior knowledge on brain regions affected by AD were used as the imaging feature, and genotypes of APOE ɛ4 risk allele and 19 SNPs were used as the genetic features. The results showed that the NN model with both imaging and genetic features had an area under the receiver operating characteristic curve (AUC) of 0.99 in classifying AD and HC subjects.
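
A univariate association of the kind described above can be sketched as an ordinary least-squares regression of an imaging phenotype on SNP dosage. The example below uses invented cortical thickness values and omits covariates such as age and sex, which real analyses would normally include; it is not the procedure of either study cited above.

```python
from math import sqrt
from statistics import mean


def snp_association(dosages, phenotype):
    """Regress phenotype on dosage; return slope, standard error and t."""
    n = len(dosages)
    mean_x, mean_y = mean(dosages), mean(phenotype)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(dosages, phenotype))
    standard_error = sqrt(residual_ss / (n - 2) / sxx)
    return slope, standard_error, slope / standard_error


# Hypothetical data: minor-allele dosage and mean cortical thickness (mm).
dosages = [0, 0, 1, 1, 2, 2]
thickness = [2.6, 2.5, 2.4, 2.5, 2.2, 2.3]
slope, standard_error, t_statistic = snp_association(dosages, thickness)
```

A negative slope here means that each additional copy of the allele is associated with thinner cortex in the toy data. In a genome-wide setting the same model is fitted for every SNP, and the resulting p-values are corrected for multiple testing.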

Implementation of AD imaging biomarker genomics studies

Findings from studies on candidate genetic variants for AD

Since imaging biomarker genomics studies rely in part on prior knowledge of candidate genetic variants, we summarize the candidate variants in accordance with the timeline of identification in large GWAS and meta-analyses. Initially, mutations of APP, PSEN1 and PSEN2 genes were found in molecular studies in 1991 and in 1995, which caused rare, Mendelian forms of the disease, usually resulting in early-onset AD. APOE was recognized as the strongest susceptibility gene for late-onset AD (LOAD) in 1993. Subsequent GWAS and meta-analyses identified a series of additional risk loci for LOAD. The first GWAS was conducted in 2007. Later, GWAS were separately performed in four LOAD genetic consortia (Genetic and Environmental Risk in Alzheimer’s Disease, European Alzheimer’s Disease Initiative, Cohorts for Heart and Aging Research in Genomic Epidemiology, and Alzheimer’s Disease Genetic Consortium), which identified a total of 11 loci, namely, CLU, PICALM, CR1, BIN1, CD2AP, CD33, EPHA1, MS4A4A, ABCA7, MS4A6A, and MS4A4E [16, 27,28,29,30]. With support from the International Genomics of Alzheimer’s Project (IGAP), a meta-analysis including 74,046 individuals of European ancestry further identified 11 new susceptibility loci for AD, which were HLA-DRB5, SORL1, PTK2B, SLC24A4-RIN3, ZCWPW1, NME8, FERMT2, CELF1, INPP5D, MEF2C and CASS4 [31]. A case–control study of 85,133 subjects from the IGAP identified 3 rare coding variants in PLCG2, ABI3, and TREM2, which are highly expressed in microglia, highlighting the contribution of microglial-mediated innate immunity to the development of AD [32]. Given the difficulty of AD case confirmation, a case–control genome-wide association study by proxy (GWAX) was conducted with the UK Biobank dataset using family history of disease (14,482 proxy cases, i.e., relatives of affected individuals, and 100,082 proxy controls, i.e., relatives of unaffected individuals).
A meta-analysis combining the previously published IGAP GWAS results with these GWAX summary statistics identified 4 new risk loci associated with AD (HBEGF, ECHDC3, SPPL2A, and SCIMP) [33]. In the following year, a second meta-analysis of IGAP data and parental history of AD in an expanded UK Biobank dataset (n = 314,278), based on the previous proxy-phenotype AD study by Liu et al., identified 3 new loci (ADAM10, KAT8, and ACE) [34]. A larger meta-analysis with clinically diagnosed AD and AD-by-proxy (71,880 cases, 383,378 controls), using cohorts collected by the Psychiatric Genomics Consortium Alzheimer, the IGAP, the Alzheimer’s Disease Sequencing Project and AD-by-proxy from UK Biobank, yielded 8 loci (ADAMTS4, HESX1, CLNK, CNTNAP2, APH1B, ABI3, ALPK2, and AC074212.3) [21]. An expanded IGAP analysis (n = 94,437) confirmed 20 previous LOAD risk loci and identified 5 new loci (IQCK, ACE, ADAM10, ADAMTS1 and WWOX) [20], two of which (ACE and ADAM10) had been recently identified in the study of Marioni et al. [34]. Following the meta-analyses of Lambert et al. and Marioni et al., an updated meta-analysis of GWAX in the UK Biobank with the latest GWAS for AD diagnosis was performed and identified 37 risk loci and 4 new associations (CCDC6, TSPAN14, NCK2 and SPRED2) [24]. Finally, the most recent GWAS with 1,126,563 individuals, which built on the work of Jansen et al. and had the largest sample size thus far, identified 38 loci, including 7 loci (AGRN, TNIP1, TMEM106B, GRN, HAVCR2, NTN5 and LILRB2) that had not been reported previously [25]. A detailed summary of the representative AD candidate genes is shown in Table 2. Figure 4 depicts a circular diagram of AD genetic risk factors according to several postgenomics analyses based on animal and cellular models, although the AD genetic background remains largely unidentified.

Table 2 Summary of candidate genes used in AD pathology
Fig. 4

Adapted from Dourlen P et al. Acta Neuropathologica. 2019 Aug; 138 (2):221–236. Reprinted with permission from Springer Nature

Circular diagram of AD genetic risk factors. From outside to inside: (1) genomic loci in alphabetical order; (2) genes therein; (3) expression profiles of these genes in different cell types of the brain (greyscale); and (4) pathways/processes/proteins to which these genes have been functionally linked (colour lines).

Findings from studies on AD candidate imaging biomarkers

In earlier studies, pairwise univariate analysis was performed to identify associations between genetic markers and imaging phenotypes. To accommodate more flexible associations involving multiple genetic markers and multiple imaging phenotypes, multiple regression and multivariate models have been used in recent studies in combination with machine learning (ML) methods [69]. In the following, we will review candidate-gene, genome-wide and polygenic associations with imaging-derived traits, according to the ATN framework for AD biomarkers proposed by NIA-AA in 2018 (Table 3) [39].

Table 3 Summary of AD-relevant effects based on candidate imaging biomarkers and association studies

Imaging genomics analysis of “A” biomarker

In the ATN framework, “A” refers to the Aβ plaque biomarker, including cortical amyloid PET ligand binding and CSF Aβ42 level. The deposition of amyloid plaques in the brain is one of the two main pathological signs of AD. As a reliable imaging phenotype of AD, amyloid PET can selectively detect Aβ deposition in the brain. A number of studies using amyloid PET have investigated how various genetic variants influence Aβ burden.

At the candidate-gene level, Drzezga et al. [70] examined the effect of APOE genotype on the levels of [11C] PiB PET Aβ plaques in AD patients using the VBM-based method and regression analysis. The results showed higher levels of Aβ plaque deposition in ε4-positive patients in bilateral temporoparietal and frontal cortical areas. Apostolova et al. [17] investigated the associations of the top 20 AD risk variants with brain amyloidosis using ADNI datasets by multivariable linear regression analysis. The results showed that the ABCA7 gene has the strongest association with amyloid deposition, while the APOE ε4 and FERMT2 genes show stage-dependent associations with amyloid deposition, especially in the MCI stage.

At the genome-wide level, Yan et al. [149] conducted a GWAS meta-analysis using [11C] PiB PET imaging from the ADNI datasets, and found that the APOE region showed the most significant association with brain Aβ burden. Ramanan et al. [150] performed the first GWAS of cortical Aβ burden in humans using data from ADNI-2 and ADNI-Grand Opportunity and reported that APOE and BCHE (BUCHE) are independent regulators of amyloid deposition in the brain, accounting for nearly 15% of the variance in cross-sectional amyloid load. At the polygenic level, Tan et al. [151] observed a strong association between polygenic hazard scores and Aβ uptake. A detailed summary of these findings is shown in Table 3.

Imaging genomics analysis of “T” biomarker

“T” refers to the tau biomarker, including CSF phosphorylated tau and cortical tau PET. The twisted strands of the protein tau (tangles) inside neurons are the other pathological marker of AD. Although tau pathology serves as a primary brain pathology associated with cognitive impairment in AD, most previous studies have focused on CSF tau levels, which reflect tau production rather than the amount of pathological tau deposition in the brain. The recent advent of AV1451 tau-PET imaging has allowed the assessment of fibrillary tangles in the living brain.

At the candidate-gene level, Smith et al. [83] reported that [18F] AV1451 tau-PET imaging is strongly correlated with tau neuropathology in MAPT (microtubule-associated protein tau) mutation carriers. Subsequently, Yan et al. [88] explored the association of sex and APOE ε4 with brain tau deposition and atrophy in older adults with AD, and found that female APOE ε4 carriers (FACs) have an elevated tau-PET standardized uptake value ratio (SUVR) in comparison to non-FACs. Therriault et al. [86] and Neitzel et al. [89] independently evaluated different datasets and reported that APOE ε4 is associated with higher tau accumulation and that this association is independent of amyloid burden. Regarding other AD candidate genes, Franzmeier et al. [87, 90] and Neitzel et al. [91] suggested, based on analyses of variance and multiple linear regression, that the BIN1 rs744373 SNP and Klotho-VS heterozygosity are associated with higher and lower pathologic tau levels, respectively.

At the genome-wide level, Ramanan et al. [152] conducted the first neuroimaging GWAS of tau pathology in 754 individuals. The findings not only confirmed the association of MAPT with tau burden, but also identified the NTNG2-rs75546066 locus as having a novel protective effect against tau pathology.

At the polygenic level, Sun et al. [92] assessed PGS values as a predictor of tau pathology in non-demented individuals. The results showed that higher PGS values were correlated with elevated tau-PET uptake values, and the association remained significant when APOE was regressed out.

Imaging genomics analysis of “N” biomarker

“N” refers to neurodegeneration or neuronal injury, including CSF total tau level, [18F]FDG PET hypometabolism, and atrophy on sMRI. Among them, sMRI is the most widely used technology in imaging biomarker genomics studies to extract targeted imaging phenotypes, with increased discriminative power and improved biological interpretability. [18F]FDG PET can detect brain glucose metabolism and provide important pathological staging information. Several studies have also investigated how various genetic variants influence brain glucose metabolism.

At the candidate-gene level, the associations of APOE with MRI phenotypes have been investigated, especially between ε4 carriers and noncarriers. For example, Wolk et al. [95] found that the APOE genotype affects cognitive and anatomic phenotypic expression of AD, in that ε4 carriers with mild AD show greater impairment on measures of memory retention and greater medial temporal lobe (MTL) atrophy, whereas noncarriers are more impaired in working memory and show greater frontoparietal atrophy. Risacher et al. [153] found that the annual percent change rate of MRI atrophy is influenced by the APOE genotype. Morgen et al. [99] found that the genetic interaction between PICALM and APOE is associated with brain atrophy and cognitive impairment using univariate analysis of variance. Moreover, Biffi et al. [96] investigated the impact of multiple GWAS-validated and GWAS-promising candidate loci on hippocampal volume, amygdala volume, WM lesion volume, entorhinal cortical thickness, parahippocampal gyrus thickness and temporal pole cortical thickness. The study indicated that genetic variants that modulate AD risk as revealed in previous GWASs may influence neuroimaging measures. In addition, BIN1 and CNTN5 were identified as two novel loci that show associations with multiple MRI characteristics, which are of interest for further studies. Regarding brain glucose metabolism biomarkers, Lehmann et al. [77] assessed the relationships between glucose metabolism and APOE genotype in clinical AD patients, with one-way analysis of variance and Tukey’s post-hoc test, and found a greater degree of medial temporal hypometabolism in APOE ε4 carriers. Miller et al. [121] explored and confirmed the associations between rare variants in splicing regulatory element loci of EXOC3L4 and global cortical glucose metabolism in the ADNI cohort. Notably, Seo et al. [123] analyzed the effects of 132 selected susceptibility genes previously identified to be associated with LOAD on neurodegenerative brain features by using neuroimaging data from the KBASE (Korean Brain Aging Study for Early Diagnosis and Prediction of Alzheimer’s disease) cohort, including [11C]PiB PET, [18F]FDG PET, and MRI. In contrast to previous studies, this study utilized five in vivo AD pathologies and associated them with both common and rare genetic variants by performing targeted sequencing of 132 candidate genes.

At the genome-wide level, Kong et al. [122] performed the first GWAS examining brain FDG metabolism in 222 subjects from the ADNI cohort in 2018, and identified RBFOX1 (RNA-binding Fox1) SNP rs12444565 to have a strong association with brain glucose metabolism. Wang et al. [124] identified two genome-wide significant SNPs, rs4819351 in AGPAT3 (1-acylglycerol-3-phosphate O-acyltransferase 3) and rs13387360 in LOC101928196, that had strong protective effects against the longitudinal metabolic decline in the right temporal gyrus and the left angular gyrus, respectively. At the polygenic level, Desikan et al. [102] reported that the polygenic hazard score was associated with longitudinal MRI-derived volume loss in the entorhinal cortex and hippocampus.

In addition to the above “N” biomarker, many other advanced MRI technologies have also been applied to study the influence of genetic variation on functional or WM alterations. Based on the DTI technology, WM alterations have been found in AD and MCI, and APOE may play a role in modulating these alterations [140, 141, 143, 144, 146,147,148]. Some researchers have reported differences in WM integrity between healthy APOE ɛ4 carriers and noncarriers by using diffusion parameters, including fractional anisotropy, mean diffusivity, and radial diffusivity. In addition, Gu et al. [146] performed a meta-analysis of associations of the PSEN1 genotype with WM integrity and brain metabolism, and indicated that PSEN1 is associated with mean diffusivity increase in DTI markers and decreased brain metabolism. Foley et al. [106] analyzed associations between AD polygenic risk scores and diffusion-weighted parameters in young adults, and revealed that the fractional anisotropy of the right cingulum is correlated with AD polygenic risk score. Regarding fMRI, both resting-state fMRI and task-fMRI were conducted to evaluate associations of brain activity with APOE and other AD risk genes [129, 130, 133, 136]. Many of these studies were performed in healthy older adults [125,126,127,128, 131, 132, 135] to investigate potential risk-allele influences on functional brain activity. It is worth noting that Jahanshad et al. [98] explored the heritability of various brain connections based on genome-wide associations and discovered the SPON1 (F-spondin) rs2618516 variant to affect dementia severity. Besides, Su et al. [134] investigated the associations between AD PGS and functional connectivity in the default mode network, and found significant correlations in the temporal cortex.
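
For reference, the diffusion parameters named above are conventionally derived from the three eigenvalues of the diffusion tensor. The sketch below applies the standard definitions of mean diffusivity (MD), radial diffusivity (RD) and fractional anisotropy (FA); the function name and the example eigenvalues are illustrative assumptions.

```python
from math import sqrt


def diffusion_metrics(l1, l2, l3):
    """FA, MD and RD from tensor eigenvalues, with l1 the largest."""
    md = (l1 + l2 + l3) / 3            # mean diffusivity
    rd = (l2 + l3) / 2                 # radial diffusivity
    numerator = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = sqrt(1.5 * numerator / denominator) if denominator > 0 else 0.0
    return {"FA": fa, "MD": md, "RD": rd}


# Hypothetical eigenvalues in arbitrary units.
isotropic = diffusion_metrics(1.0, 1.0, 1.0)   # equal diffusion in all directions
tract_like = diffusion_metrics(1.7, 0.3, 0.3)  # diffusion mostly along one axis
```

FA ranges from 0 for fully isotropic diffusion to 1 for diffusion along a single axis, so reduced FA or increased MD in a tract is usually read as loss of WM microstructural integrity.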

Figure 5 illustrates the mapping of associations between genomic data and brain functional networks, which are classified into 7 brain networks according to Yeo’s template: the visual network, somatomotor network, dorsal attention network, salience network, limbic network, frontoparietal network, and default mode network. In summary, association studies in AD brain imaging biomarker genomics can provide new insights into the pathological and genetic mechanisms underlying AD. The number of genome-wide studies is relatively small compared with candidate-gene association studies, which may be caused by the scarcity of neuroimaging data. However, studies that focus only on selected candidate genes may ignore potential interactions among multiple significant genetic variants, which emphasizes the necessity of genome-wide interaction and PGS analyses alongside improvements in multimodal imaging databases.

Fig. 5 The relationship between genomic data and 7 specific brain networks from Yeo’s template. These associations are marked in colors consistent with the corresponding brain networks. DAN dorsal attention network, DMN default mode network, FN frontoparietal network, SMN somatomotor network, SN salience network, VN visual network

AD diagnosis and prognosis based on brain imaging biomarker genomics

Recent advances in artificial intelligence (AI) techniques enable the automatic combination of multimodal neuroimaging and genomics data to provide complementary and comprehensive information for AD diagnosis and prognosis. Specifically, ML methods, including traditional ML models and advanced DL algorithms, have been widely implemented in computer-aided diagnosis of AD. The traditional classification models include support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA) and regression models (RL). De Velasco et al. [154] compared the performance of three ML models, least absolute shrinkage and selection operator (LASSO), k-nearest neighbour (KNN), and SVM, in predicting LOAD from genetic variation data, with SVM showing the best performance (AUC = 0.72). APOE genotype is the most commonly used genomic feature. For example, Gray et al. [155] performed multi-modality classification based on joint embedding of sMRI, FDG PET, CSF biomarkers, and APOE genotype data, using a multimodal RF model and fourfold cross-validation (CV) to predict AD, and achieved an accuracy of 89% in classifying AD from healthy controls. Similarly, by combining sMRI, FDG PET, CSF biomarkers, APOE genotype, age, sex and body mass index, Kohannim et al. [156] used an SVM model with leave-one-out CV for AD and MCI classification and prediction of future cognitive decline within 1 year, and achieved a maximum accuracy of 90% for AD vs healthy controls. To distinguish between stable and progressive MCI, Dukart et al. [157] used a naive Bayes (NB) classifier based on APOE genotype, neuropsychological assessment, sMRI, and FDG PET, achieving an accuracy of approximately 87%. Moreover, Bi et al. [158] combined fMRI and SNP data and used a multimodal RF algorithm to distinguish AD from normal controls, obtaining an AD prediction accuracy of 87%. Varol et al. [159] proposed the heterogeneity through discriminative analysis (HYDRA) algorithm to predict AD based on combined sMRI and SNP data, with the highest AUC value being 0.942.
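The fourfold and leave-one-out CV schemes mentioned above differ only in how subjects are partitioned into training and test sets. The following Python sketch shows one simple way to generate such partitions; the function name and the sample size of 10 are illustrative assumptions, and in practice subjects are usually shuffled and stratified by diagnosis before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold CV.

    Setting k equal to n_samples gives leave-one-out CV.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Ten hypothetical subjects split for fourfold and leave-one-out CV.
fourfold = list(k_fold_indices(10, 4))
leave_one_out = list(k_fold_indices(10, 10))
```

Every subject appears in exactly one test fold, so performance estimates are always computed on data that the model did not see during training.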

Among DL methods, Liu et al. [160] integrated DTI and SNP data with deep convolutional neural networks for prediction of AD, and obtained AUC values of 0.8571, 0.8291, 0.8583, and 0.7756 at baseline, 6 months, 12 months and 24 months, respectively. Similarly, combining sMRI and SNP data, Ning et al. [68] used a neural network to predict AD and achieved an AUC value of 0.992. Moreover, based on sMRI, demographics, neuropsychological assessment and APOE genotype data, Spasov et al. [161] used a convolutional neural network model to distinguish MCI patients who would develop AD within 3 years from patients with stable MCI, with an AUC value of 0.925. By combining sMRI, FDG PET and SNP data, Zhou et al. [162] conducted three-stage deep feature learning and fusion to classify HC, MCI and AD simultaneously, with an accuracy of 65%, which was higher than that of other ML classification methods. In addition to the joint use of imaging and clinical information, combination with multiomics information is also an emerging trend in AD research. Shigemizu et al. [163] integrated genomic data and microRNA expression profiles to construct a prognostic model based on a proportional hazards model to identify MCI individuals at high risk of AD, and obtained a concordance index (C-index) of 0.702 on an independent test set. A detailed list of machine learning-based studies of imaging biomarker genomics is provided in Table 4.
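For prognostic models of time to conversion, performance is commonly summarized with the concordance index rather than accuracy. The Python sketch below computes Harrell's version of this index under simplifying assumptions: pairs with tied event times are ignored, and the follow-up times, event indicators, and risk scores are hypothetical values rather than data from any cited study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the subject who
    converts earlier also has the higher predicted risk.

    times       -- follow-up time for each subject
    events      -- 1 if conversion was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical MCI subjects: months of follow-up, conversion flag, predicted risk.
times = [12, 24, 36, 48]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering; in this toy example four of the five comparable pairs are ordered correctly.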

Table 4 Application of machine learning based on imaging biomarker genomics in AD diagnosis and prognosis

In summary, the above-mentioned studies show that ML methods that take multimodal data, such as imaging, clinical and multiomics data, as input are valuable tools for prognosis and risk stratification of AD with improved accuracy.

Key considerations and perspectives regarding AD imaging biomarker genomics

As a novel approach, the brain imaging biomarker genomics technique still needs further optimization, mainly in the following aspects.

Variable control in calculations

Calculations in AD imaging biomarker genomics can be influenced by various factors. Differences in physiological, demographic, and environmental factors can affect heritability estimates and measurements of brain-related features, which may obscure disease-related effects and limit the utility of brain-related features as endophenotypes. For example, some recent studies have investigated associations of APOE ε4 status and sex with cognitive memory [88, 95, 168,169,170]. Such potential confounding factors should therefore be included as covariates to improve the comparability and reliability of findings. In particular, sex, education and APOE ε4 status are commonly used as covariates in large imaging–genomics GWAS and meta-analyses. Another way to avoid these potential influences is to carry out studies in healthy individuals or in a single ethnic or sex group. Ethnicity is another critical factor: comprehensive and ethnicity-homogeneous databases are needed to verify the generalizability and robustness of significant results, and independent replication and meta-analyses remain the most reliable methods for reducing false-positive findings [171]. Finally, compared with candidate-gene analyses, which cannot account for epistatic effects between genes, genome-wide analysis is less biased, which again underscores the importance of large samples in future studies.
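To illustrate why covariate adjustment matters, the Python sketch below estimates the effect of a SNP dosage on an imaging phenotype with and without adjusting for a single covariate. It relies on the standard result that, in a linear model, the adjusted coefficient equals the slope obtained after regressing the covariate out of both variables. All values are simulated under the assumption of an exactly linear model with age as the only confounder; real analyses include several covariates and noise.

```python
def slope(y, x):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def residualize(values, covariate):
    """Residuals of `values` after regressing out `covariate`."""
    n = len(values)
    mean_v, mean_c = sum(values) / n, sum(covariate) / n
    b = slope(values, covariate)
    return [(v - mean_v) - b * (c - mean_c) for v, c in zip(values, covariate)]

def adjusted_effect(phenotype, dosage, covariate):
    """SNP effect on the phenotype after adjusting both for the covariate."""
    return slope(residualize(phenotype, covariate), residualize(dosage, covariate))

# Simulated data: hippocampal volume depends on SNP dosage AND on age,
# and older subjects happen to carry more risk alleles.
age = [60, 65, 70, 75, 80, 85]
dosage = [0, 1, 0, 2, 1, 2]
volume = [5.0 - 0.2 * d - 0.03 * a for d, a in zip(dosage, age)]

unadjusted = slope(volume, dosage)                 # confounded by age
adjusted = adjusted_effect(volume, dosage, age)    # recovers the simulated effect
```

Here the unadjusted estimate is roughly twice the simulated effect of -0.2 because age is correlated with both dosage and volume, whereas the adjusted estimate recovers it.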

Use of prior knowledge in calculations

Interpretation of results is a focus of brain imaging biomarker genomics for AD. The use of prior knowledge, such as the Allen Human Brain Atlas (AHBA), can facilitate calculations in brain imaging biomarker genomics and correlate spatial variations at the molecular scale with macroscopic neuroimaging phenotypes. For example, Franzmeier et al. [90] and Neitzel et al. [91] used the AHBA to explore associations of BIN1 rs744373 and KL-VS heterozygosity with tau accumulation, respectively. Moreover, Sepulcre et al. [172] developed a novel graph theory approach named directional graph theory regression (DGTR) to investigate the intersection of tau/Aβ pathological changes in the brain and the genetic transcriptome of the AHBA. This approach can potentially be applied to explore more phenotype-genotype associations. Taken together, it is essential to increase the sensitivity and power to detect genetic effects, make adequate use of ROIs, reliably stimulate responses, and highlight differences among individuals. For example, first identifying differential masks as ROIs on a unique dataset will lead to higher sensitivity.

Generalization of multivariate approaches beyond GWAS

Currently, biomarkers derived from GWASs are usually identified based on clinical outcomes. This approach has both advantages and disadvantages. Compared with imaging phenotypes, which are limited by the scarcity of neuroimaging data, clinical phenotypes are easier to obtain in large numbers, thus better meeting the prerequisites of large-scale GWAS and greatly reducing false-positive results. However, the accuracy of this approach is influenced by the sample size and statistical methods. In contrast, combining neuroimaging markers with GWAS genetic phenotypes can explain potential biological mechanisms in relatively small sample sizes.

Therefore, imaging biomarker genomics studies can offer insights beyond those of traditional GWAS analyses. For example, data-driven multivariate approaches, such as sparse canonical correlation analysis and parallel independent component analysis [69], are emerging to explain more imaging-genetic variants. These multivariate approaches have provided increased detection power but have also raised new technical challenges, including data dimensionality reduction and feature selection strategies. In addition, GWAS analysis pipelines are expected to be further optimized to process complex and high-dimensional genetic data automatically.
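The idea behind sparse canonical correlation analysis can be sketched in a few lines of Python: two weight vectors, one over imaging features and one over genetic features, are updated alternately, and small weights are shrunk to zero so that only a few features remain. The version below is a simplified rank-1 illustration and not the algorithm of any cited study: it uses a heuristic threshold (a fixed fraction of the largest weight) in place of a tuned penalty, and the data are synthetic continuous values in which only the first imaging feature and the first genetic feature share a common signal.

```python
import random
from math import sqrt

def standardize(column):
    """Center a feature and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

def soft_threshold(vector, fraction):
    """Shrink entries towards zero by `fraction` of the largest magnitude,
    then rescale the result to unit length."""
    cut = fraction * max(abs(v) for v in vector)
    shrunk = []
    for v in vector:
        magnitude = abs(v) - cut
        shrunk.append((magnitude if v > 0 else -magnitude) if magnitude > 0 else 0.0)
    norm = sqrt(sum(v * v for v in shrunk))
    if norm == 0.0:
        raise ValueError("all weights were shrunk to zero")
    return [v / norm for v in shrunk]

def sparse_cca(x_cols, y_cols, fraction=0.5, n_iter=20):
    """Rank-1 sparse CCA sketch with alternating soft-thresholded updates."""
    x_std = [standardize(c) for c in x_cols]
    y_std = [standardize(c) for c in y_cols]
    # cross[i][j]: cross-product between imaging feature i and genetic feature j
    cross = [[sum(a * b for a, b in zip(xc, yc)) for yc in y_std] for xc in x_std]
    v = [1.0 / sqrt(len(y_std))] * len(y_std)
    u = []
    for _ in range(n_iter):
        u = soft_threshold(
            [sum(row[j] * v[j] for j in range(len(v))) for row in cross], fraction)
        v = soft_threshold(
            [sum(cross[i][j] * u[i] for i in range(len(u))) for j in range(len(v))],
            fraction)
    return u, v

# Synthetic data: feature 0 in each block shares a latent signal, the rest is noise.
random.seed(0)
n = 200
latent = [random.gauss(0, 1) for _ in range(n)]
imaging = [[z + 0.3 * random.gauss(0, 1) for z in latent]]
imaging += [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
genetic = [[z + 0.3 * random.gauss(0, 1) for z in latent]]
genetic += [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]

u, v = sparse_cca(imaging, genetic)
```

In this toy setting the largest weights in `u` and `v` fall on the two features that share the latent signal, which is the feature selection behaviour that makes such methods attractive for high-dimensional imaging-genetic data.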

Combination of AI and brain imaging biomarker genomics

Currently, ML methods have been widely used for AD diagnosis and prognosis. On the one hand, traditional ML and advanced DL algorithms are relatively mature computational methods in AD imaging studies and include model building, feature processing and model evaluation. On the other hand, the combination of genomics calculations with ML algorithms has not been widely performed. Applications of deep neural networks in genetic studies are still scarce, although seminal studies have demonstrated the applicability of deep neural networks to DNA sequence data, resulting in networks such as DeepBind, DeepSEA and Basset [173,174,175,176]. Therefore, more effort should be devoted to solving technical challenges, especially for DL algorithms, such as how to reduce the dimensionality of multimodal data, how to integrate imaging and genomics data, and how to interpret the effectiveness of DL features.
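Sequence-based networks of this kind commonly take DNA as a one-hot encoded matrix with one channel per base. The Python sketch below shows that input representation only; the function name and the choice to encode unknown bases such as N as all-zero rows are assumptions made for illustration, and individual tools may handle such symbols differently.

```python
BASES = "ACGT"

def one_hot_encode(sequence):
    """Return one row per nucleotide with a 1 in the column of its base.

    Symbols outside A, C, G and T (for example N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]

# A short hypothetical sequence around a variant of interest.
encoded = one_hot_encode("ACGTN")
```

The resulting matrix has shape (sequence length, 4) and can be passed to convolutional layers in the same way as a one-dimensional image with four channels.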

Integration of multiomics data

AD imaging biomarker genomics research has identified numerous novel genetic variants and gained insights into disease mechanisms. However, the pathological mechanisms underlying AD are still far from well understood. Apart from the development of methods, the integration of multimodal imaging data and genomics, microRNAomics, metabolomics, proteomics, and transcriptomics will continue to be an important research direction. Genomics is now the most mature omic technology with development of high-throughput genotyping arrays and sequencing strategies. Other omic technologies have also been incorporated into research domains. For example, mass spectrometry-based proteomics has driven deep profiling of the proteome in AD. The AD proteomic review by Bai et al. [177] indicated that proteomics-driven systems biology would be a promising frontier to link genotype, proteotype, and phenotype and accelerate improvement in AD models and treatment strategies. Besides, neuroimaging markers are not limited to MRI and PET markers. During the last few decades, EEG and MEG techniques have also been commonly applied in AD studies. For instance, alterations of brain rhythms and functional connectivity have been revealed in EEG and MEG studies [178,179,180]. Relationships between various AD genetic risk factors and EEG phenotypes have also been reported [181,182,183,184]. Hence, compared with a single omics category, integration of multiomics information allows systemic exploration at multiscale layers to better understand the comprehensive biological information flow that underlies the disease and to pave the way for precision medicine.

Conclusions

The field of brain imaging biomarker genomics has made tremendous progress in the last decade in capturing novel genetic variants and exploring potential mechanisms of disease pathophysiology. Future studies in this field are anticipated to move towards precision medicine, to identify significant findings that can be used in clinical practice, and to achieve computer-aided AD diagnosis and prognosis. Therefore, further development of current research methods and integration of information will continue to be an important research direction. Unbiased genome-wide approaches remain critical, and replication studies are necessary. Advances in next-generation sequencing approaches, coupled with more refined brain mapping (such as the AHBA, which maps gene expression across brain tissue), are increasingly improving the interpretability of findings from imaging biomarker genomics. In addition, DL algorithms allow for integration of multiple preprocessing steps into a single model to improve AD diagnosis and prognosis. In summary, current studies in the AD imaging biomarker genomics field have profiled brain mechanisms at an unprecedented scale, raising new hypotheses for subsequent validation.