Background

Along with amyloid pathology and tau-related neurodegeneration, multiple other molecular alterations and pathway dysregulations have been reported in Alzheimer’s disease (AD). Indeed, there is strong evidence that pathophysiological changes involving neuroinflammation [1], lipid metabolism [2], one-carbon metabolism [3], amino acids [4], and glucose metabolism [5], among others, are present in AD. However, the contribution and relevance of these alterations to the clinical manifestation and progression of the disease, as well as their inter-individual variation and complex interrelations, remain poorly understood. While these processes are generally not considered part of the “core” AD pathology, they may substantially contribute to the development of amyloid pathology and neurodegeneration and precipitate the manifestation of symptoms. Because they may occur at early clinical and preclinical disease stages, a better understanding of these processes may be highly relevant for early diagnosis and prognosis and for the design of targeted interventions to interfere with developing AD pathology and clinical disease progression.

‘Omics approaches and technologies have made major progress over the past decade in resolving the complexity of the metabolome, lipidome and proteome [6]. As powerful phenotyping technologies, ‘omics significantly accelerate the understanding of mechanisms of pathophysiological alterations that underlie complex diseases such as AD [7, 8]. Beyond the potential of identifying altered biofluid molecule profiles that could be used as biomarkers, these technological advances also offer the opportunity to explore different types of molecules in parallel by combining multiple ‘omics methods. Recent statistical advances have made it possible to integrate the information from multiple data modalities for a thorough exploration of endophenotype networks and biological interactions related to disease [9]. While multi-omics approaches have recently shown their potential in relation to other pathological conditions [10,11,12], these methods still need to be more broadly adapted and applied in AD [13].

Here, we hypothesized that specific patterns of proteins, lipids, neuroinflammatory markers, and metabolites are associated with core features of the AD pathology and indicate disease-related, inter-connected biological pathway alterations. We investigated these alterations across multiple biochemical pathways by using a multi-layer dataset acquired by analysis of cerebrospinal fluid (CSF) from a cohort of elderly subjects with normal cognition, mild cognitive impairment, or mild AD dementia. In order to integrate data from different ‘omics platforms in an unbiased fashion while considering interactions between modalities, we combined different approaches including single ‘omics analysis and multi-omics factor analysis (MOFA) [14,15,16].

Methods

Study population

One hundred and twenty community-dwelling individuals, aged 55 or older, including subjects with normal cognition, mild cognitive impairment (MCI), or mild AD dementia (defined as previously described [3]), were enrolled into a brain aging study conducted in the Department of Psychiatry and the Department of Clinical Neurosciences, University Hospital of Lausanne, Switzerland. They were recruited among memory clinic outpatients or through advertisement. An overall clinical, neurological, and comprehensive neuropsychological assessment was performed between 2013 and 2016, which included the Mini-Mental State Examination (MMSE, [17]) and Clinical Dementia Rating (CDR, [18]). Candidates with unstable medical conditions or with neurological or psychiatric diseases that could interfere with cognitive performance were excluded as previously described [19]. Clinical and neuropsychological follow-up evaluations were performed at 18 and 36 months using the same methods and tests.

Study procedures

Clinical assessment

We determined MMSE, CDR, and CDR sum of boxes (CDR-SoB) scores for all participants. CDR-SoB and CDR were based on the information available from the participant and his/her relative, the clinical examination, and comprehensive neuropsychological test performance, as previously described [19].

Biochemical sample collection and handling

Ten to 12 ml of cerebrospinal fluid (CSF), obtained from lumbar punctures conducted after an overnight fast at participant inclusion, were spun down at 4 °C, immediately aliquoted, snap frozen, and stored at −80 °C until assayed [19], with no freeze-thaw cycles allowed. Samples were stored for a maximum of 3 years before analysis. Study personnel blinded to clinical data performed biochemical and genetic analyses.

Cerebrospinal fluid AD biomarkers

CSF beta-amyloid 1-42 (Aβ1-42), total-tau (Tau), and tau phosphorylated at threonine 181 (P-Tau) concentrations were measured using commercially available ELISA kits (Fujirebio, Gent, Belgium) in all samples within the cohort.

Analyte measurements

Multiple ‘omics data from different pathways and various biological levels were acquired from a vast majority of participants within the cohort (n = 114/120 for proteomics, 118 for metabolomics, 119 for neuroinflammation and one-carbon metabolism, and 120 for lipidomics).

CSF samples were measured using an untargeted shotgun proteomic workflow based on liquid chromatography (LC) tandem MS (MS/MS) using an Ultimate 3000 RSLC nano system and a hybrid linear ion trap-Orbitrap (LTQ-OT) Elite (Thermo Scientific, San Jose, CA, USA) [20, 21]. Relative quantification of proteins between the samples was obtained using isobaric tagging with the tandem mass tag technology [22]. Full experimental details and parameters of the proteomic analysis have been published previously [23, 24]. A targeted subset of thirty-seven CSF inflammatory proteins including IFNγ, IL-1β, IL-2, IL-4, IL-6, IL-8, IL-10, IL-13, TNFα, IL-1α, IL-5, IL-7, IL-12/23p40, IL-15, IL-16, IL-17A, TNFβ, VEGFA, Eotaxin, MIP-1β, Eotaxin-3, TARC, IP-10, MIP-1α, MCP-1, MDC, MCP-4, VEGF-C, VEGF-D, Tie-2, sFLT-1 (VEGFR-1), PIGF-1R, bFGF, SAA1, CRP, sVCAM-1, and sICAM-1 were separately quantified using a sandwich immunoassay (Meso Scale Diagnostics (MSD), Rockville, MD, USA), according to the manufacturer’s instructions. This platform has been validated by the manufacturer (https://www.mesoscale.com/~/media/files/product%20inserts/neuroinflammation%20panel%201%20human%20insert.pdf) and has been previously used successfully [25].

CSF lipids were quantified using an MS-based shotgun approach [26]. This technology can cover 22 quantifiable lipid classes encompassing more than 200 lipid species; it achieves absolute quantification by inclusion of internal standards for every lipid class measured. Figures of merit are an average coefficient of variation of < 10% (intra-day), approximately 10% (inter-day), and approximately 15% (inter-site) for most lipid species.

Metabolomic profiling was carried out by means of 1H NMR spectroscopy, as reported previously [27]. This approach covered major metabolic pathways, including amino acids, carboxylic acids, and central energy metabolism. Metabolites within the one-carbon pathway are a hypothesis-driven subset of metabolites [3] and were separately analyzed using LC-MS/MS as previously described [28] with an Accela UHPLC 1250 Pump coupled to a TSQ Quantum Vantage triple quadrupole mass spectrometer equipped with a heated electrospray ionization source (Thermo Scientific). Selected reaction monitoring transitions have been described previously [28].

The initial number of analytes measured in CSF, the final number of analytes selected per platform (a total of 891 analytes covered), and quantification method used for each platform are summarized in Table 1.

Table 1 Datasets used in this study

Genetic measures

The APOE genotype was determined by PCR as previously described [29]. Participants with one or more e4 alleles were classified as carriers.

Data preparation and transformation

Lipidomics

Twenty-six high-quality intact lipids with less than 5% of null values were selected as continuous numerical markers from 65 original measurements. Numerical lipid marker values were log10-transformed prior to analysis.

Metabolomics

Seventy-one peak integrals were originally measured in CSF. Sixty-three analytes with less than 5% missing values were selected from the obtained spectra. Peak integral values were log10-transformed prior to analysis.

One-carbon metabolomics

Seventeen analytes were initially measured in CSF. Some analytes could not be measured in the majority of samples and were excluded from the analysis (i.e., homocysteic acid, dimethylglycine, betaine, total homocysteine, pyridoxine, and pyridoxamine); taurine and glycine data were inconsistent and were also filtered out resulting in 9 measured analytes (i.e., choline, cystathionine, methionine, riboflavin, S-adenosylhomocysteine, S-adenosylmethionine, serine, cysteine and 5-methyltetrahydrofolate). Analytes with more than 5% missing data points were also removed. Numerical values were log10-transformed prior to analysis.

Neuroinflammatory markers

Thirty-eight markers were measured in CSF. Calibration curves and batch effects were checked, analytes with more than 5% missing data points were removed, and the lower limit of quantification was controlled. After this quality control, 17 markers were removed, resulting in 21 markers selected. Concentrations were log10-transformed prior to analysis.

Proteomics

Relative quantification data were available for all subject samples as log2 ratios as previously described [24]. Analytes with more than 5% missing data points were removed, resulting in 768 proteins measured from an initial number of 791.

Before analysis, outliers (i.e., data points that exceeded the cutoff value of mean ± 3 × standard deviation) were replaced by the cutoff value in all datasets (n = 28 for lipidomics, 36 for metabolomics, none for one-carbon metabolomics, 17 for neuroinflammatory markers, and 345 for proteomics). For all datasets, this represented below 1% of all data points. Missing values were replaced using an iterative Markov chain Monte Carlo method before single-modality feature selection approaches, but were not replaced for multi-omics analysis as the MOFA method can handle large proportions of missing values [14].
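The per-platform preparation steps described above (missing-value filter, log10 transformation, and capping of outliers at mean ± 3 × standard deviation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, missing values are represented as None, the sample standard deviation is used, and the imputation step is not shown.

```python
import math
import statistics


def keep_analyte(values, max_missing=0.05):
    """Return True if at most `max_missing` of the measurements are missing (None)."""
    missing = sum(v is None for v in values)
    return missing / len(values) <= max_missing


def log10_transform(values):
    """log10-transform observed values; missing values (None) are passed through."""
    return [None if v is None else math.log10(v) for v in values]


def cap_outliers(values, n_sd=3.0):
    """Replace points beyond mean +/- n_sd * SD by the corresponding cutoff value.

    Returns the capped list and the number of replaced data points.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD (an assumption of this sketch)
    low, high = mean - n_sd * sd, mean + n_sd * sd
    capped = [None if v is None else min(max(v, low), high) for v in values]
    n_replaced = sum(1 for v in observed if v < low or v > high)
    return capped, n_replaced
```

Applied to one analyte at a time, the second return value of cap_outliers gives the per-analyte count of replaced data points, which can be summed per platform.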

Statistical and analytical approaches

Descriptive statistics for the cohort were computed using t tests comparing AD and control groups for continuous variables and chi-square tests for categorical variables. Data were clustered by hierarchical clustering across samples and factor values or loadings.

Feature selection methods

Single-modality approaches

To overcome the bias resulting from correlation between variables, and the resulting unreliability and saturation of standard regression techniques, we used Elastic-Net regularization (α = 0.5) for regression analysis. This was performed separately for each individual ‘omics platform in the whole cohort using R software with custom routines built on the glmnet package [34]. Each pre-specified CSF biomarker endpoint was considered as a continuous dependent variable, and associated features were identified using the value of λ (lambda) that minimized the 10-fold cross-validated error.
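As an illustration of the regularized regression described above, the following sketch implements Elastic-Net by coordinate descent together with a simple k-fold choice of λ. It is a pure-Python stand-in, not the glmnet routines used in the study. It assumes a centered outcome and standardized predictors (no intercept), minimizes the objective (1/2n)‖y − Xβ‖² + λ[α‖β‖₁ + (1 − α)/2 ‖β‖²], and assigns folds by sample index; all function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used for the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate-descent Elastic-Net; X is a list of rows, y a list of outcomes."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta initialised at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j excluded)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def cv_error(X, y, lam, alpha=0.5, k=10):
    """Mean squared prediction error over k folds (folds assigned by index modulo k)."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        beta = elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def select_lambda(X, y, lambdas, alpha=0.5, k=10):
    """Return the candidate lambda with the smallest cross-validated error."""
    return min(lambdas, key=lambda lam: cv_error(X, y, lam, alpha, k))
```

Features with a non-zero coefficient at the selected λ would then be reported as associated with the endpoint.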

Multi-omics factor analysis

This analysis was performed using the MOFA package on the whole cohort in R and Python software [14]. Latent factors (also referred to as LFs) were selected to explain a minimum of 2% variance in at least one data type. The MOFA model was trained over 938 iterations with a convergence threshold of 0.1. Individual analytes were selected if their normalized absolute loading value was > 0.8 within any given LF in order to include only analytes with strong associations. More details about the MOFA method can be found in Additional file 1. The trained MOFA model was validated using both a correlation approach and CSF AD biomarker predictions (Additional file 2, Figures S1 and S2, respectively).
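The loading-based selection rule can be sketched as below. The sketch assumes that "normalized" means scaling each loading by the largest absolute loading within the same latent factor; the exact normalization used in the study is described in Additional file 1 and may differ. The function name and the nested-dictionary input format are hypothetical and not part of the MOFA package.

```python
def select_by_loading(weights, threshold=0.8):
    """Select analytes whose normalized absolute loading exceeds `threshold`.

    weights: dict mapping latent factor name -> dict mapping analyte -> loading.
    Returns a dict mapping latent factor name -> sorted list of selected analytes.
    """
    selected = {}
    for factor, loadings in weights.items():
        max_abs = max(abs(w) for w in loadings.values())
        if max_abs == 0:
            selected[factor] = []
            continue
        # Assumption: normalize by the maximum absolute loading within the factor
        selected[factor] = sorted(
            analyte for analyte, w in loadings.items() if abs(w) / max_abs > threshold
        )
    return selected
```

Because an analyte can pass the threshold in more than one factor, the union across factors should be taken when counting unique selected analytes.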

Associations with CSF biomarkers of AD

In order to evaluate the correlations of the analytes identified by the MOFA model with CSF biomarkers of AD (CSF Aβ1-42, Tau and P-Tau), we used two-tailed Spearman’s rho. Benjamini-Hochberg correction of P values for multiple testing was then applied using a false-discovery rate of 0.1.
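For illustration, Spearman's rho (Pearson correlation of average ranks) and the Benjamini-Hochberg step-up procedure can be written as follows. This is a minimal sketch with hypothetical function names; the P values passed to the correction are assumed to have been computed elsewhere, since the significance test for rho is not implemented here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient between two equal-length lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans: True where the hypothesis is rejected at `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank  # largest rank meeting the step-up criterion
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

Note that the step-up rule rejects every hypothesis ranked at or below the largest rank meeting the criterion, even if some smaller P values individually miss their own threshold.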

Models for the prediction of AD and of cognitive decline

Predictions were run using the glm function in R. Subjects were classified as AD or controls according to the presence or absence of an AD CSF biomarker profile, defined by a CSF P-Tau/Aβ1-42 ratio > 0.0779 based on center data [30]. We constructed a reference model including the following covariates for AD prediction: age, sex, years of education, baseline MMSE score, and APOE4 carrier status. MMSE change at last available follow-up, available for one hundred and three participants (nineteen followed up at 18 months only and eighty-four at 36 months), was used to classify participants with decreased global cognition as follows: MMSE score at baseline – MMSE at last follow-up visit ≥ 2. Another reference model including age, sex, years of education, baseline MMSE score, APOE4 carrier status, and time to last follow-up was constructed for cognitive decline prediction. We then used an iterative approach: at each iteration, every analyte identified by the MOFA model was added individually to the current model (starting from the reference model above), and the analyte yielding the smallest Akaike information criterion (AIC) value was retained. We repeated this process over successive iterations, adding a single analyte each time. Performance of the models was analyzed by comparing area under the curve (AUC) of the resulting ROC curves using the DeLong method. No further improvements to the AUC were observed after four iterations for both predictions. Confusion matrices to assess sensitivity and specificity were calculated for all models.
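The classification rules and the stepwise selection scheme can be sketched as follows. The ratio cutoff and the MMSE criterion are taken from the text; everything else is an assumption made for illustration. In particular, aic_of is a hypothetical callable that fits the reference model plus the given analytes and returns its AIC, the number of steps is fixed by the caller rather than by an AUC stopping rule, and P-Tau and Aβ1-42 are assumed to be expressed in the same units.

```python
AD_RATIO_CUTOFF = 0.0779  # CSF P-Tau/Abeta1-42 cutoff stated in the text


def has_ad_profile(p_tau, abeta42):
    """AD CSF biomarker profile: P-Tau/Abeta1-42 ratio above the cutoff."""
    return p_tau / abeta42 > AD_RATIO_CUTOFF


def has_cognitive_decline(mmse_baseline, mmse_last):
    """Decline in global cognition: MMSE drop of at least 2 points."""
    return mmse_baseline - mmse_last >= 2


def forward_select(candidates, aic_of, n_steps):
    """Greedy forward selection: add the analyte giving the smallest AIC at each step."""
    chosen = []
    for _ in range(n_steps):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = min(remaining, key=lambda c: aic_of(chosen + [c]))
        chosen.append(best)
    return chosen


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity from binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In practice, aic_of would wrap a logistic regression fit, and the AUC comparison against the reference model would be carried out separately.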

Pathway enrichment

Proteins selected by the MOFA model were searched for in the UniProt database [35], and their entry numbers were then used within the Reactome database [36]. A separate over-representation analysis was performed for each LF. This analysis used a hypergeometric test to determine which pathways and biological reactions were over-represented within the dataset. Over-represented pathways were then manually grouped into broader ontology-based categories (Additional file 3, Table S1).
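The hypergeometric test underlying an over-representation analysis can be illustrated as follows. This is a generic sketch, not the Reactome implementation: the function name and arguments are hypothetical, and the choice of background set (here passed in as a count) is an assumption left to the user.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    n_background: number of proteins in the background set
    n_in_pathway: number of background proteins annotated to the pathway
    n_selected:   number of selected proteins (e.g., from one latent factor)
    n_overlap:    number of selected proteins annotated to the pathway
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, i) * comb(n_background - n_in_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small P value indicates that the pathway contains more selected proteins than expected by chance; P values across pathways would normally be corrected for multiple testing.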

Results

Cohort description

The clinical and demographical characteristics of the participants included in this study are shown in Table 2.

Table 2 Study cohort

Single-modality feature selection

Elastic-Net regression within each single ‘omics modality identified 82 molecules associated with CSF “core” biomarkers of AD pathology (i.e., Aβ1-42, Tau and P-Tau) within the whole cohort (Tables 3, 4, and 5). Strikingly, distinct panels of CSF analytes were associated with either Aβ1-42, or Tau and P-Tau, reflecting alterations of different pathways in relation to amyloid pathology, neurodegeneration, and tau pathology, with very little overlap (Fig. 1). Only protein 14-3-3 zeta/delta was associated with all three biomarkers.

Table 3 Analytes associated with CSF Aβ1-42
Table 4 Analytes associated with CSF Tau
Table 5 Analytes associated with CSF P-Tau
Fig. 1

Venn diagram of associations with CSF core AD biomarkers. Venn diagram of associations of analytes obtained by regression models with CSF core AD biomarkers. Number of molecules identified as well as those shared between biomarkers is shown. The full list of associated molecules is presented in Tables 3, 4, and 5

Overview of the MOFA model

In parallel, we trained a MOFA model on the whole cohort to identify major dimensions of heterogeneity (latent factors; LFs) responsible for the variance within the cohort. This model identified five LFs that each explained a minimum of 2% variance in at least one of the analyzed metabolic levels. Among these factors, LF1 and LF2 were present in most multi-omics modalities, revealing a broad contribution to variance within the cohort (Fig. 2). On the other hand, the remaining LFs only captured variance across some modalities (three for LF4 and LF5, two for LF3) and had a smaller contribution to overall variance. Across all LFs, the CSF AD biomarkers accounted for 38.5%, proteins 39.8%, lipids 10.3%, neuroinflammation markers 10.3%, one-carbon metabolites 9%, and other metabolites 3.7% of the variance contained within the cohort (Fig. 2). We next produced clustered heatmaps of the weight (i.e., the association of an individual molecule with the LF) of each analyte across different LFs (Fig. 3a–e), revealing specific patterns of associations between analytes within each analyzed multi-omic level and LFs. For example, a subset of proteins with a negative association with LF1 have a positive association with LF2 (Fig. 3a), and molecules within the one-carbon metabolism are differentially associated with LF1 and LF2 (Fig. 3c). These patterns suggest groups of molecules interacting together with specific effects on LFs through common pathways. Because only three CSF core AD biomarkers were measured, we did not produce heatmaps to analyze the association of Aβ1-42, Tau and P-Tau with these five LFs, but rather, we inspected their absolute individual loadings across all LFs (Fig. 4). This revealed that individual CSF AD biomarkers had different contributions across the identified LFs. CSF Tau and P-Tau levels were strongly associated with LF1, LF2, and LF3, while Aβ1-42 was the main contributor among the CSF AD biomarkers to LF4 and LF5, indicating that these latter LFs were associated with amyloid pathology and the former with tau pathology and neurodegeneration.

Fig. 2

Overview of the MOFA model. Overview of the trained MOFA model showing variance (R2) within the cohort explained by each modality (top) and latent factors (LFs, bottom) from the trained MOFA model

Fig. 3

Clustering of loadings across latent factors. Heatmaps of hierarchical clustering of the measured loadings across LFs for data obtained from proteomics (a), neuroinflammation markers (b), one-carbon metabolism (c), metabolomics (d), and lipidomics (e) showing clusters of analytes along the X-axis and the association of each individual analyte with each LF (shown on the Y-axis). Note the distinct pattern within each LF. Color scale indicates both the direction and strength of relative associations

Fig. 4

Loadings of CSF AD biomarkers. Normalized loadings of CSF AD biomarkers shown on the X-axis across the five latent factors of the trained MOFA model. Positive or negative signs indicate the relative direction of the CSF AD biomarkers with the associated latent factor. Note that signs are relative within a single latent factor for biomarker weights

Individual analyte contributions to LFs

We next addressed the contribution of individual molecules to variance within the cohort and how these molecules aligned with CSF AD biomarkers. We selected analytes with absolute normalized loadings > 0.8 within any given LF derived from the MOFA model. This approach selected 37 proteins, 7 neuroinflammatory markers, 3 one-carbon metabolites, 5 lipids, and 7 other metabolites (not counting analytes selected in multiple LFs) that contributed the most to variance within the cohort (Table 6). We next investigated the relationship between the identified analytes and the expression levels of CSF AD biomarkers in the relevant LFs. We have shown that the LFs are differentially associated with the individual CSF AD biomarkers. Molecules selected within each LF are therefore associated with the CSF AD biomarkers. Indeed, twenty out of thirty-seven proteins were correlated with at least one CSF AD biomarker. Only two neuroinflammatory molecules displayed no correlation with CSF AD biomarkers. Conversely, only two molecules at metabolomics level and two lipids were correlated with CSF AD biomarkers. Finally, in one-carbon metabolism, total cysteine showed no correlations (Additional file 3, Table S2). As they were selected by the MOFA model, these molecules are part of LFs that have an effect on the variance within the cohort. By considering which LF they were selected from, we can infer that they are part of an interacting set of analytes that is associated with changes in CSF AD biomarkers. For example, despite showing no correlations with any CSF AD biomarker, total cysteine is selected by our MOFA model in LF1, LF2, and LF5. Because these LFs have strong associations with Tau, P-Tau, and Aβ1-42 (Fig. 4), total cysteine is related to a pathway or group of interacting molecules in these LFs that together are associated with the markers of core AD pathology. Kininogen-1 (KNG1) also displayed no correlation with any CSF AD biomarker, but we can infer it is part of a group of molecules associated with Tau and P-Tau, since it was selected in LF2. In some cases, the MOFA model revealed supplementary associations. For example, cholesteryl ester SE 27:1 16:0 showed no correlations with Aβ1-42 and strong correlations with both Tau and P-Tau. This lipid was selected by our MOFA model in LF4, however, suggesting it is part of an interactome with a strong association with Aβ1-42.

Table 6 Analytes associated with latent factors

Cross-modality interactions

Some of the identified LFs only contain a subset of the tested modalities (Fig. 2). For instance, one-carbon metabolism and metabolomics were only weakly associated with LF3 and LF4, whereas lipidomics was nearly absent from LF3 and LF5. Therefore, the contribution of individual LFs to total variance results from a specific combination of the different ‘omics modalities. In addition, individual molecules also presented different patterns of association across LFs. For example, a subset of lipids, including PC 32:0, PC 34:1, LPA 18:3, and TAG 54:3, had a strong positive association with LF2 and a weak negative association with LF4. Since LF2 was associated with all tested modalities (Fig. 2), this suggests that these analytes interact within multiple biological pathways and could be within a hub of metabolic changes. Furthermore, LF2 is associated with both Tau and P-Tau, suggesting that neurodegeneration and tau pathology could relate to a more general metabolic alteration. This is supported by the association of PC 32:0 with tau pathology in single ‘omics. In contrast, LF4 is strongly associated with amyloid pathology and is only associated with changes in lipids and proteins (in addition to CSF AD biomarkers). Therefore, only a subset of lipids appears to be associated with amyloid pathology. Taken together, these results suggest that the different aspects of AD pathology relate to different biological pathways and alterations.

Prediction of AD pathology and cognitive decline using MOFA-selected molecules

In addition to associations with CSF biomarkers of AD pathology, we found associations with AD reported in the literature for 37 out of 58 molecules selected by our MOFA model (Table 6). Also, 29 of the selected proteins and 5 lipids correlated with either baseline CDR-SoB score or MMSE score (Additional file 3, Table S3), while 14 of the selected proteins were associated with the presence of cognitive impairment at baseline in regression models (Additional file 3, Table S4). Therefore, in order to confirm a posteriori the importance of the molecules with high weights selected from LFs within the MOFA model, we evaluated their ability to predict either cerebral AD pathology or global cognitive decline. The model for AD prediction selected four analytes (protein 14-3-3 zeta/delta, clusterin, interleukin-15, and transgelin-2) that together improved the AUC of the ROC curve when compared to the reference model (Fig. 5a, p value = 0.002). In addition, both sensitivity (0.71 to 0.86) and specificity (0.87 to 0.96) were improved from the reference model. Further, adding four selected molecules (protein 14-3-3 zeta/delta, clusterin, cholesteryl ester 27:1 16:0, and monocyte chemoattractant protein-1) to a reference model for the prediction of cognitive decline improved its AUC (Fig. 5c, p value = 0.0047). This also improved sensitivity (0.56 to 0.80) but not specificity (0.89 to 0.88). For both prediction of cerebral AD and of cognitive decline, the addition of single molecules to the reference models did not improve prediction (data not shown).

Fig. 5

Clinical predictions. Binary logistic regression models to improve clinical predictions. a ROC curves and AUCs for the reference model including APOE status (green) and the final prediction model of AD pathology (red) obtained after addition of four analytes (14-3-3 zeta/delta, clusterin, interleukin-15, and transgelin-2) selected by the MOFA model. b Confusion matrix of the final prediction model of AD. c ROC curves and AUCs for the reference model including APOE status (green) and the final prediction model of cognitive decline (red) obtained after addition of four analytes (14-3-3 zeta/delta, clusterin, cholesteryl ester 27:1 16:0 and monocyte chemoattractant protein-1) selected by the MOFA model. d Confusion matrix of the final prediction model of cognitive decline

Metabolic pathway enrichment

Using the Reactome database and coarse-grain ontological categories (see the “Methods” section and Additional file 3, Table S1), we investigated which biological pathways were over-represented within each LF for the proteomic modality. Other modalities were not analyzed in this fashion since they were selected a priori to represent distinct metabolic pathways (one-carbon metabolism and inflammatory markers) or did not contain enough molecules to conduct pathway analysis. Lipids were also excluded from this analysis since our quantification method did not allow us to distinguish between different isoforms of compounds with the same chemical formula. This approach revealed an over-representation of the hemostasis (28.8%), immune response (20.8%), and extracellular matrix signaling pathways (8.8%) (Fig. 6).

Fig. 6

Pathway enrichment. Pathway enrichment analysis of identified proteins across LFs and overall. The number of over-represented categories within each LF (expressed as a percentage) as well as across all LFs is represented. NB: the low number of analytes associated with LF3 did not allow for an enrichment analysis

Discussion

Here, we applied a multi-layered integrative approach to disentangle sources of variance within a cohort of elderly participants with normal cognition, mild cognitive impairment, or mild AD dementia. We identified five major dimensions of heterogeneity that together comprehensively explained the variance within the cohort and were associated with core AD pathology. Further analysis revealed multiple interactions between single ‘omics modalities, distinct multi-omics molecular patterns differentially associated with amyloid aggregation, neurodegeneration, and tau hyperphosphorylation, and novel molecules associated with cognitive impairment. Specific signatures of four molecules improved the accuracy of both AD and cognitive decline prediction. Additionally, pathway enrichment showed over-representation of the hemostasis, immune response and extracellular matrix signaling pathways in association with AD.

Single-modality feature selection

We first used Elastic-Net regression to identify molecules associated with individual biomarkers of CSF AD pathology without considering any possible interactions between different ‘omics modalities. This approach identified several proteins (SPARC-related modular calcium-binding protein 1, brain acid soluble protein 1, neuromodulin, pyruvate kinase PKM, thymosin beta-10, 14-3-3 protein zeta/delta, and fructose-bisphosphate aldolase A) in strong accordance with recent studies of the AD CSF proteome [24, 65]. The zeta/delta isoform of protein 14-3-3 was associated with Aβ1-42, Tau, and P-Tau levels. This apoptosis inhibitor, one of the most abundant proteins in the brain, was previously found to exhibit altered levels in AD and modulate AD risk [66, 67]. We also identified associations of neurofilament medium polypeptide with Tau levels and of reelin with Aβ1-42 and Tau levels. Both these molecules have previously been associated with AD [68,69,70]. Regarding neuroinflammatory molecules, C-reactive protein and monocyte chemoattractant protein-1 have previously been associated with AD, albeit in plasma [71]. In addition, we have also previously shown that soluble intercellular adhesion molecule-1 in CSF is associated with AD [30]. At metabolite level, we identified 10 molecules in CSF associated with Tau and P-Tau, which differ from the blood biomarkers associated with AD identified in a recent study in a large sample [72]. Overall, our approach identified more molecules associated with AD pathology as compared to previous studies. A likely source of differences is the use of Elastic-Net regression in the current study, which avoids saturation of the regression and could therefore identify more associations. On the other hand, our approach did not identify any lipids or metabolites associated with Aβ1-42 levels. 
It is possible that it is not the individual levels of these molecules that are associated with Aβ1-42 levels, but rather lipidomic pathway alterations where associations of individual lipids are weak but the overall pathway in which they are embedded is strongly associated with amyloid pathology.

Heterogeneity within the cohort

An important strength of our study is that it considers interactions between multiple biological levels and their associations with the heterogeneity within the cohort. This was achieved by training a MOFA model on the multi-omics dataset, which has the advantage of not giving any additional analysis weight to the established CSF biomarkers of core AD pathology while also reducing the complexity of the data to better depict the sources of variation. This revealed proteomic measures and CSF core AD biomarkers as the main contributors to the variance, with both having very similar contributions, albeit from 768 proteins versus 3 AD biomarkers. The biomarker contribution was expected as our sample contains a large proportion of participants with AD, each displaying CSF AD biomarkers significantly different from subjects without AD. The large contribution of proteomics to variance could derive, at least in part, from the fact that protein expression levels reflect the effects of different environments, lifestyles, health conditions, and genetic backgrounds, all factors potentially affecting protein expression and regulation [73]. Nonetheless, MOFA analysis identified 21 proteins with previously reported association to AD, suggesting, along with the associations with AD biomarkers observed here, that protein contribution to variance is linked to AD pathology. These findings further suggest that the MOFA approach can disentangle the inter-individual heterogeneity driven by AD pathology and differentiate between individual (i.e., not repeated in the dataset) and cohort heterogeneity (i.e., underlying changes in many participants). Conversely, the metabolomic dataset was only responsible for a small amount of the cohort heterogeneity (3.7%), a possible explanation being that it represents individual heterogeneity for the most part caused by the environment, disease processes, or nutritional habits. 
This low contribution of metabolomics to variance could also result from the lower dimensionality of the metabolomics dataset as molecules within had lower concentrations compared to molecules in the other modalities. Yet, despite this low level of variance, our model was able to correctly retrieve metabolites previously reported in association with AD, underlining the sensitivity of the model. This is further supported by the ability of our approach to determine a four-molecule signature that improves the prediction of AD pathology.
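
The per-modality contributions to variance discussed above can be illustrated with a minimal sketch. It assumes that the variance explained in one modality is defined as the coefficient of determination of the factor-model reconstruction; this is a common definition, and we have not verified that it matches the exact computation in the MOFA software. All data are synthetic, and the view sizes and noise levels are hypothetical choices made only to show that a noisier modality yields a lower explained fraction.

```python
import numpy as np

def variance_explained(Y, Z, W):
    """R^2 of the reconstruction Z @ W.T for one data view Y.

    Y: (samples, features), Z: (samples, factors), W: (features, factors).
    """
    Y = Y - Y.mean(axis=0)              # center each feature
    residual = Y - Z @ W.T              # part of Y not captured by the factors
    return 1.0 - (residual ** 2).sum() / (Y ** 2).sum()

rng = np.random.default_rng(0)
n_samples, n_factors = 100, 5
Z = rng.normal(size=(n_samples, n_factors))
Z -= Z.mean(axis=0)                     # centered factors give a centered reconstruction

# Hypothetical views: (number of features, noise standard deviation).
views = {"proteomics": (768, 0.5), "metabolomics": (40, 3.0)}

r2 = {}
for name, (n_features, noise_sd) in views.items():
    W = rng.normal(size=(n_features, n_factors))   # synthetic weights, not fitted
    noise = rng.normal(scale=noise_sd, size=(n_samples, n_features))
    Y = Z @ W.T + noise
    r2[name] = variance_explained(Y, Z, W)

for name, value in r2.items():
    print(f"{name}: R^2 = {value:.2f}")
```

In this toy setting the weights are the ones used to simulate the data rather than fitted estimates, so the sketch shows only how the explained fraction is defined, not how a MOFA model is trained.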

Associations between LF and specific aspects of AD pathology

Our analysis revealed that LFs 1–3 were primarily associated with neuronal injury, while LF4 and LF5 were mainly associated with amyloid pathology. In addition, both Tau and P-Tau were negatively associated with LF3, while Aβ1-42 presented a positive, albeit weaker, association with this same LF. Conversely, Tau and P-Tau were positively associated with LF4, while Aβ1-42 was strongly negatively associated with it. Whether LF3 and LF4, which show congruent relationships with amyloid aggregation, tau pathology and neurodegeneration, are particularly relevant for AD remains to be confirmed. The finding that the zeta/delta isoform of protein 14-3-3, selected in LF4, is associated with all AD CSF biomarkers, along with its contribution to predictive models of cerebral AD, is in line with the hypothesis that these LFs are involved in AD pathology.

Interactions between LFs and ‘omics modalities

Besides the identification of molecular profiles and metabolic pathway alterations associated with AD, our approach offers the unique ability to disentangle how components of individual LFs interact with each other to explain variance within the cohort. This not only reveals specific interactions between subsets of molecules and particular metabolic pathways but also shows how multiple biological levels interact in the context of AD pathology and how they are related to specific aspects of the pathology. In addition, this approach has the advantage of not being biased towards any known biological alteration of AD pathology and of not giving any particular weight to a specific molecule or metabolic pathway. In the context of AD, this could lead to the identification of pathways and alterations not directly related to the core features of AD pathology, better reflecting the heterogeneity of the disease. The presence of clusters within each LF also suggests that specific groups of molecules interact with each other across LFs. A more comprehensive analysis of the role of these clusters of analytes may be addressed in future studies.

Novel associations uncovered by the MOFA model

The MOFA model uncovered additional relationships not revealed by other analysis paradigms, since it considers not only molecules from one modality but the whole dataset from different ‘omics. These additional findings may result from the downstream effects of these molecules or from interactions with other modalities. They include the association of total cysteine with CSF Aβ1-42, Tau, and P-Tau and the association of kininogen-1 (KNG1) with Tau and P-Tau. While cysteine was previously linked with AD [60], KNG1 has been associated with other neurodegenerative disorders [74], but its association with AD, and in particular with tau pathology and neuronal injury, is novel. The MOFA model also identified molecules previously associated with cognitive impairment, such as dynein light-chain 2, cytoplasmic (DYL2), and neurexophilin-4 (NXPH4). Both were associated with LF1 and with cognitive impairment. DYL2 is thought to regulate dynein function [75] and maintain cytoskeletal structure, thereby regulating synaptic function [76]. NXPH4 structurally resembles neurexophilin-1, an α-neurexin ligand, which promotes adhesion between dendrites and axons and modulates specific cerebellar synapses and motor functions [77]. Altered levels of these proteins may therefore be associated with neurodegenerative processes and related to cognitive impairment in AD. Another novel analyte we identified is the cholesteryl ester SE 27:1 16:0. While links between phosphatidylcholine metabolism and AD in general [78] and PC 32:0 in particular [64] have been previously reported, to our knowledge, cholesteryl esters have not previously been associated with AD pathology. In our MOFA model, this cholesteryl ester was strongly correlated with LF4, suggesting a role in amyloid pathology. These molecules were also associated with cognitive performance as measured by MMSE. Together, these results demonstrate the capacity of integrative multi-omics to provide additional insights into the relationship of molecular alterations with specific aspects of AD pathology.

Prediction of AD pathology and cognitive decline using MOFA-selected molecules

Molecular signatures associated with AD or predictive of cognitive decline were derived from our model. Each signature contains four molecules taken from multiple biological levels, and each significantly improved prediction performance when added to reference models. The four molecules from the combination related to AD pathology have each been associated with AD previously [38, 42, 49, 54]. From the molecule combination that improved prediction of cognitive decline, three molecules have been linked to cognitive decline in previous reports [79,80,81], while one molecule, cholesteryl ester 27:1 16:0, has not. The two signatures share two molecules, protein 14-3-3 zeta/delta and clusterin, suggesting that these belong to common biological pathways that are both associated with AD and relevant for cognitive decline. Cholesteryl ester 27:1 16:0 and monocyte chemoattractant protein-1 may indicate pathway alterations without a strong and direct link to core AD pathology but with an impact on the rapidity of cognitive decline. These predictive models also demonstrate the ability of this approach to identify biomarker candidates for both AD pathology and cognitive decline. Additional investigation and validation in independent cohorts are required before possible clinical use.
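
To make the notion of improved prediction performance concrete, the following minimal sketch compares two sets of predicted scores using the area under the ROC curve (AUC). The use of AUC is an assumption for illustration and may not be the metric used in this study; the labels and scores are synthetic toy values, and no model fitting is shown.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic toy data: 1 = outcome present, 0 = outcome absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical scores from a reference model and from the same model
# extended with additional molecules.
reference_scores = [0.90, 0.60, 0.40, 0.70, 0.50, 0.30, 0.65, 0.20]
extended_scores = [0.90, 0.70, 0.50, 0.80, 0.55, 0.30, 0.60, 0.20]

auc_reference = auc(reference_scores, labels)
auc_extended = auc(extended_scores, labels)
print(f"reference AUC = {auc_reference}, extended AUC = {auc_extended}")
```

In practice, such a comparison would be made on held-out or cross-validated predictions and accompanied by a statistical test of the difference, neither of which is included in this sketch.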

Inferred pathway relationships with AD pathology

One important strength of the MOFA approach is that it makes it possible to address the relationships between multiple biological pathways and to associate them with sources of variance (i.e., LFs). Using over-representation analysis of metabolic pathways, we were able to show that individual LFs, and the main related pathological aspects of AD (i.e., amyloid aggregation, neurodegeneration and tau pathology), are associated with distinct pathways. Hemostasis and immune response were the most over-represented. Only the immune response was associated with all LFs in which individual pathways could be identified. LF1 and LF2 presented a significant enrichment in biomolecules implicated in hemostasis, suggesting an association between this pathway and neuronal injury and tau pathology. While an association between hemostasis and amyloid pathology was previously described [82], in particular related to expression of amyloid precursor protein and release of Aβ [83], there have also been recent reports of an association between Tau and hemostasis [84]. Molecules involved in the extracellular matrix were significantly enriched in LF2, also suggesting an association with tau-related pathology, in line with previous reports [85]. However, this pathway was not detected within LF1 or other LFs. We therefore hypothesize that the molecules involved are those presenting a specific pattern of association with LF2, such as PC 32:0, PC 34:1, LPA 18:3, and TAG 54:3. Neuronal function was confined to associations with LF5, suggesting little variation in signal transmission and synaptic function across the cohort, since this LF explained only 8% of the variance. Nonetheless, this result suggests an association with amyloid pathology, which is in accordance with previous findings of amyloid being released from neurons in an activity-dependent fashion and modulating synaptic function and plasticity [86, 87]. Overall, the enriched metabolic pathways suggest that AD pathology not only affects pathways related to neuronal biological systems but is also linked to a broader spectrum of metabolic dysfunctions.
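
Over-representation analysis is often implemented as a one-sided hypergeometric test, which asks how likely it is to observe at least the given overlap between a selected set of molecules and a pathway by chance. The sketch below assumes this formulation; whether it matches the exact test and multiple-testing correction used in this study is not stated here. All counts are hypothetical.

```python
from math import comb

def overrepresentation_p(background, in_pathway, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of molecules measured in total
    in_pathway: number of background molecules annotated to the pathway
    selected:   number of molecules selected (e.g., top-weighted in one LF)
    overlap:    number of selected molecules annotated to the pathway
    """
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(background, selected)

# Hypothetical counts, not taken from the study: 8 of 40 selected molecules
# fall in a 50-molecule pathway, where about 2 would be expected by chance.
p_value = overrepresentation_p(background=1000, in_pathway=50, selected=40, overlap=8)
print(f"p = {p_value:.2e}")
```

When many pathways are tested, the resulting p-values would normally be adjusted for multiple comparisons before any pathway is called enriched.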

Limitations

A limitation of this study is the differing number of analytes measured by each quantification method (e.g., > 500 proteins vs. < 100 metabolites/lipids), resulting from methodological differences. This prevents measuring the relative importance of the analytes or their combinations but still allows the identification of altered pathways or molecular signatures from different modalities. Furthermore, since the data entered in the MOFA analyses did not include information regarding clinical stages, stage-specific alterations have not been addressed. The inclusion of some targeted analysis results in the multi-omics models may also be considered a limitation. While the proteomic and lipidomic datasets are hypothesis-free measurements and the study could have been limited to these data, we chose to include further available modalities. In particular, we considered neuroinflammation and one-carbon metabolism given their previously reported associations with AD and relevance for brain metabolism. However, the replication of these and other previously reported associations in our MOFA model, along with the ability of the MOFA-selected marker combinations to improve prediction of AD and of cognitive decline, supports the validity of the new findings revealed in the present study.

Conclusions

Here, applying integrative multi-omics in AD, we have identified five axes of variation within a cohort of individuals with normal cognition or with cognitive impairment. These five LFs were associated with different aspects of the core AD pathology. We confirmed several previously reported associations with AD and identified new molecular patterns interrelated within each LF. Additionally, we identified molecular biomarker signatures improving the diagnosis of AD pathology and the prediction of future cognitive decline. Furthermore, using pathway enrichment analysis, we have revealed metabolic pathways represented within single LFs and explored specific relationships with markers of amyloid pathology, neuronal injury, and tau hyperphosphorylation. These findings demonstrate the added value of integrative multi-omics analysis to uncover interrelated pathway alterations in AD and its ability to identify biomarker combinations that may be used in clinical practice. This is relevant for the development of both personalized diagnosis and tailored therapeutic interventions in AD.