Background

Alzheimer’s disease (AD) is a progressive neurodegenerative disease and the most common cause of dementia, inflicting enormous personal, social and economic costs [1]. Classical AD pathology comprises amyloid beta (Aβ) plaque deposition and hyperphosphorylated tau (p-tau) tangle accumulation [2], although Aβ plaques are frequent in brains of cognitively unimpaired elderly individuals [3]. Early diagnosis, essential for effective interventions, is difficult. Genetic, behavioural, imaging and fluid biomarker methods have all been proposed. In cerebrospinal fluid (CSF), levels of the Aβ42 and Aβ40 peptides, total or hyperphosphorylated tau (tTau or pTau), neurofilament light (NfL; nonspecific marker of neuronal damage) and glial fibrillary acidic protein (GFAP; marker of astrocytic reactivity) collectively establish the AT(N) (amyloid/tau/neurodegeneration) diagnostic framework for AD, currently the best predictive biomarker set [4]. Given the invasive nature and cost of CSF collection and brain imaging, reliable blood biomarkers that facilitate AD diagnosis are much needed; plasma AT(N) biomarkers can be informative in predicting AD [5, 6], but additional biomarkers are needed. AD genetics point to the immune system as a potential source.

Inflammation has long been recognised as a culprit in neurodegenerative diseases, with many chronic inflammatory conditions, including obesity, diabetes and inflammatory bowel disease, associated with increased risk for AD [7,8,9]. An intact blood–brain barrier (BBB) ensures brain immune privilege; however, BBB disruption occurs in AD, permitting immune mediators from the periphery to penetrate the brain parenchyma and cause brain inflammation. Neuroinflammation markers, including pro-inflammatory cytokines [10], have been proposed as biomarkers for neurodegenerative diseases. Complement is a core part of the innate immune system and a potent driver of inflammation in immune defence and in pathology. Evidence implicating complement in AD emerged in the 1980s; immunostaining of brain tissue showed C1q, C3, factor H (FH), and clusterin colocalising with Aβ plaques and surrounding sites of neuronal damage [11, 12]. Moreover, aggregated Aβ directly activated the complement cascade by interacting with C1q [13], and clusterin displayed chaperone activity to inhibit aggregation of Aβ1–42 fibrils in vitro [14]. Many studies of complement proteins as AD biomarkers in plasma and/or CSF have been published with inconsistent results, although plasma clusterin consistently emerges as a biomarker for AD [15, 16].

Sporadic AD shows a significant contribution from genetics, accounting for 68–79% heritability for late onset (after 65 years) AD (LOAD), rising to over 90% for those with onset before 65 (early onset; EOAD) [17]. Seventy-five genome-wide significant AD risk loci have been identified to date [18]; these include the complement genes CLU and CR1, both in the top 5 most significant hits in GWAS [19], and recently a suggestive association with C1S (OR 1.05, p = 9.9 × 10⁻⁷) [18]. The risk variants in these genes are single nucleotide polymorphisms (SNPs) in non-coding regions (rs11136000, rs9331888 in CLU; rs6656401 in CR1; rs3919533 in C1S), or within exonic regions, causing amino acid substitutions (rs6691117 in CR1; I2065V). The CLU minor allele at rs11136000 is associated with reduced AD risk [20]. AD risk SNPs in CR1 are associated with CR1 length polymorphism; the minor allele at rs6656401 marks carriers of the CR1*2 isoform expressing an additional long homologous repeat and C3b binding site [21]. Initially identified as risk factors for age-related macular degeneration (AMD), SNPs in CFH have been associated with rate of cognitive decline in AD and shown to modify FH mRNA expression in the brain, leading to impaired complement regulation [22, 23]. For other non-coding SNPs in complement genes, the mechanism of effect is not known.

While previous large genetic studies have performed GWAS to identify new loci associated with AD, GWAS using an intermediate phenotype (including plasma or CSF protein levels, imaging data, or any other quantifiable trait) allow a deeper analysis of uncharacterised mechanisms of effect [24,25,26,27,28]. The endophenotype more directly interrogates the effect of a single gene, is genetically simpler, evaluates the direct effect of the SNP and is less impacted by other genetic and biological variables than conventional GWAS. The resultant reduction in “noise” increases statistical power enabling the use of smaller cohorts. Using protein biomarker concentrations as endophenotype permits interpretation of the role of GWAS-implicated intronic SNPs by linking them directly to a change in plasma levels. This approach can also identify novel SNPs that influence the expression of a protein of interest and increase statistical power by focusing on a single quantitative trait [29, 30]. Although our focus was on AD GWAS-implicated SNPs in complement genes, the endophenotype approach enables additional complex genetic analysis to screen for all genomic loci that associate with changes in plasma levels of the complement biomarkers, critical because protein expression is determined by a multi-locus consensus involving coding and non-coding regions.

We measured five complement biomarkers, clusterin, soluble CR1 (sCR1), C1s, C1q and FH, selected based on genetic or functional association with AD, in a cohort comprising 504 cognitively unimpaired elderly controls, 912 EOAD and 492 LOAD subjects. Predictive value of individual complement biomarkers and sets of complement biomarkers in combination with AT(N) markers was assessed in ROC analysis. Complement biomarker levels were then used as endophenotype in a GWAS to identify SNPs that impacted complement biomarker levels.

Main body

Methods

Subjects

Sporadic AD and control plasma samples [n = 504 control, 912 EOAD (onset < 65 years), 492 LOAD (onset > 65 years)] were a subset of the AD Cardiff Cohort, collected between 2004 and 2020 from individuals recruited from UK community and hospital settings using MRC, Moondance Foundation, and Health and Care Research Wales (HCRW) funding (Table 1). All individuals were of Caucasian descent. The effect of storage time was tested and found not to significantly affect concentrations of the measured proteins.

Table 1 Cohort clinical, demographic, and genetic information

AD diagnosis was established using a comprehensive, standardized and validated clinical and neuropsychological assessment [31], in accordance with the National Institute of Neurological and Communication Disorders and Stroke and the Alzheimer's disease and Related Disorders Associations (NINCDS-ADRDA) clinical diagnostic criteria for AD. All diagnoses were based on a semi-structured interview with known validity for AD pathology (i.e. positive predictive value of 92–95%) which included: mini-mental state examination (MMSE); The Cambridge Mental Disorders of the Elderly Examination (CAMDEX; informant interview); The Blessed Dementia Scale; The Bristol Activities of Daily Living Scale; Webster Rating Scale; Global Deterioration Scale (GDS); Cornell Scale for Depression in Dementia; Neuropsychiatric Inventory (NPI) (12-item version) [31, 32]. Control subjects were either spouses of AD patients or selected from primary-care practices situated in the same geographical areas. Assessment of controls was as described above, including CAMDEX and GDS; exclusion criteria were the presence of dementia, depression, delirium, or other illnesses likely to significantly reduce cognitive function. Controls were purposely selected for advanced age (age range 59–100; mean 82.5) with no dementia to reduce the potential for conversion to disease [33]. Age at assessment, sex, MMSE score, and genome-wide array genotyping (Illumina 610, Illumina 550, or global screening array) were available for most samples. Age at onset was also available for 1396 cases, and disease duration was calculated for these. Ethical approval was obtained from the Multi-centre Research Ethics Committee, relevant local ethics committees and NHS trusts in the recruiting regions. Demographics are summarised in Table 1.

Measurement of complement proteins by ELISA

Five complement components (clusterin, sCR1, C1s, C1q, FH) were measured in all plasma samples by ELISA. Antibodies, protein standards, and assay characteristics are detailed in Additional file 1: Table S1. Plasma samples stored at − 80 °C were defrosted immediately prior to assay, vortexed briefly, diluted in phosphate-buffered saline containing 0.05% Tween-20 (PBST) and 0.2% bovine serum albumin (BSA) and kept on ice or stored at − 80 °C until used.

Capture antibodies were immobilised overnight at 4 °C on 96-well immunoplates (Fisher Scientific #1039451) at concentrations between 2–20 µg/ml in 50 µl/well carbonate-bicarbonate buffer (pH 9.6). Wells were blocked by incubation with 100 µl 2% BSA in 0.05% PBST for 1 h at 37 °C, washed once with PBST, and plasma samples or protein standards (50 µl) added at a suitable dilution (Additional file 1: Table S1). Plates were incubated for 90 min at 37 °C, washed three times and detection antibodies added at concentrations between 1–2 µg/ml in 50 µl/well 0.2% BSA in PBST for 1 h at 37 °C. For assays where the detection antibody was not directly labelled, HRP-labelled secondary antibody (anti-mouse or anti-rabbit IgG as appropriate, Jackson ImmunoResearch #715-035-151, #711-035-152) was added to washed plates at a suitable dilution for 1 h at 37 °C. Finally, plates were washed and developed using OPD substrate (Sigma-Aldrich, #P9187) for 3–15 min (consistently for each assay), followed by addition of 5% H2SO4 to quench the reaction. Optical densities were read at 492 nm using a microplate reader (Infinite F50, Tecan #30190077). All samples were measured in duplicate, blinded to diagnosis. Intra- and inter-assay coefficients of variation were below 15% for all assays.
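The duplicate-well precision check mentioned above can be illustrated with a minimal Python sketch. It assumes that the coefficient of variation (CV) is computed per duplicate pair as the sample standard deviation divided by the mean, and that the plate-level figure is the mean of those values; the function names and optical density values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def duplicate_cv(od_a: float, od_b: float) -> float:
    """Percent coefficient of variation for one pair of duplicate wells."""
    values = [od_a, od_b]
    return 100.0 * stdev(values) / mean(values)

def intra_assay_cv(duplicates: list[tuple[float, float]]) -> float:
    """Mean percent CV across all duplicate pairs on one plate."""
    return mean(duplicate_cv(a, b) for a, b in duplicates)

# Made-up optical densities for three samples measured in duplicate
plate = [(0.52, 0.50), (1.10, 1.16), (0.81, 0.80)]
cv = intra_assay_cv(plate)

# Pairs exceeding the 15% limit quoted in the text would be candidates for re-assay
flagged = [pair for pair in plate if duplicate_cv(*pair) > 15.0]
```

Inter-assay CV would be computed in the same way from a control sample repeated across plates.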

Measurement of p-tau181, Aβ40, Aβ42, NfL, GFAP

Plasma concentrations of p-tau181, Aβ40, Aβ42, NfL and GFAP had previously been measured in these samples using Simoa assays (Quanterix, Billerica, MA, USA) [34]. The measurements were performed in one round of experiments using one batch of reagents with the analysts blinded to diagnosis and clinical data. Intra-assay coefficients of variation were below 10%.

Statistics

Data were analysed by constructing an 8-point standard curve using known concentrations of pure protein for each assay, interpolating the averaged optical density values for each sample on the curve, and multiplying the obtained values by the dilution factor. Data were plotted using GraphPad Prism 5, tested for normality using the Kolmogorov–Smirnov test and found not to be normally distributed. Data were analysed statistically (α = 0.05) using IBM SPSS Statistics 26 by Mann–Whitney, Kruskal–Wallis tests with Dunn’s multiple comparisons post-hoc test, generalised linear models adjusting for sex and age (formula: Protein ~ Age + Sex + Phenotype) or Pearson correlations as appropriate. To test the effect of SNPs on protein levels, an interaction term between SNP and disease was used (formula: Protein ~ Age + Sex + Phenotype*SNP).
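The conversion from optical density to concentration described above can be sketched as follows. The curve-fitting model is not stated in the text, so this example assumes simple piecewise-linear interpolation between neighbouring standards (a four-parameter logistic fit is a common alternative); the standard values, the dilution factor and the function names are hypothetical.

```python
from bisect import bisect_left

def interpolate_concentration(od, curve):
    """Piecewise-linear interpolation of a concentration from an optical density.

    `curve` is a list of (concentration, od) pairs for the standards.
    Returns None when the OD lies outside the range covered by the standards.
    """
    points = sorted(curve, key=lambda p: p[1])   # order standards by OD
    ods = [p[1] for p in points]
    if od < ods[0] or od > ods[-1]:
        return None                              # do not extrapolate
    i = bisect_left(ods, od)
    if ods[i] == od:
        return points[i][0]
    (c0, od0), (c1, od1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (od - od0) / (od1 - od0)

def sample_concentration(od_duplicates, curve, dilution_factor):
    """Average the duplicate ODs, read off the curve, correct for dilution."""
    mean_od = sum(od_duplicates) / len(od_duplicates)
    conc = interpolate_concentration(mean_od, curve)
    return None if conc is None else conc * dilution_factor

# Hypothetical 8-point standard curve: (concentration in ng/ml, OD at 492 nm)
standards = [(0.0, 0.05), (1.0, 0.15), (2.0, 0.25), (4.0, 0.45),
             (8.0, 0.80), (16.0, 1.30), (32.0, 1.90), (64.0, 2.40)]
result = sample_concentration((0.34, 0.36), standards, dilution_factor=100)
```

Returning `None` for readings outside the standard range makes out-of-range samples explicit, so they can be re-assayed at a different dilution instead of being extrapolated.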

For ROC analyses, a series of generalised linear models (GLMs) using different combinations of protein measurement data were constructed in R using the base “stats” package, with a “binomial” model for error distribution and link function specified. EOAD and LOAD phenotypes were separated and classified as “1” for EOAD, “2” for LOAD and “0” for controls. The GLMs followed the formula pattern: Phenotype ~ Protein 1 (+ Protein 2…). A stepAIC (Akaike Information Criterion) model was run for both EOAD and LOAD to identify the optimal features to retain in the final model. To enhance the generalisability and applicability of our approach, models with fewer protein measurements were favoured. Data were randomly split 70:30 into “training” and “test” sets to prevent over-fitting and stratified to maintain case/control proportions; area under the curve (AUC) of the “test” data in ROC analysis was reported. Prior to analysis, protein levels were adjusted for age and sex and standardised to a mean of 0 and standard deviation of 1 to maintain equal contributions of each protein to analyses and prevent bias from proteins with wider ranges; both unadjusted and adjusted values were tested in the ROC analysis. The different models were compared using ROC curves; 95% confidence intervals were calculated using the default ‘bootstrap’ method with 2000 replicates for each AUC to provide a measure of uncertainty and model stability.
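The evaluation steps above (stratified 70:30 split, AUC on the held-out set, bootstrap confidence interval with 2000 replicates) can be sketched in plain Python. This is an illustration under stated assumptions, not the R workflow used in the study: a single standardised marker stands in for the fitted GLM linear predictor, the data are simulated, the bootstrap resamples cases and controls separately with a percentile interval, and all function names are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), preserving the case/control proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)

def bootstrap_ci(scores, labels, replicates=2000, seed=0):
    """Percentile 95% CI for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    stats = []
    for _ in range(replicates):
        sample = rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg))
        stats.append(auc([scores[i] for i in sample], [labels[i] for i in sample]))
    stats.sort()
    return stats[int(0.025 * replicates)], stats[int(0.975 * replicates) - 1]

# Simulated data: 100 controls (label 0) and 100 cases (label 1) with a shifted marker
rng = random.Random(42)
labels = [0] * 100 + [1] * 100
marker = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(0.8, 1.0) for _ in range(100)]

train_idx, test_idx = stratified_split(labels)
test_scores = [marker[i] for i in test_idx]
test_labels = [labels[i] for i in test_idx]
test_auc = auc(test_scores, test_labels)
ci_low, ci_high = bootstrap_ci(test_scores, test_labels)
```

In a full analysis the model would be fitted on the training indices only and the held-out scores would be its predictions; reporting the test-set AUC with its interval is what guards against an over-optimistic estimate.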

GWAS

All individuals included in the analysis had both genetic and biomarker information available. Genotype data were quality controlled (QCed) as described previously [35] and imputed via the Michigan Imputation Server using Minimac3 [36] and the Haplotype Reference Consortium reference panel. The data were combined and individuals were excluded if they had heterozygosity abs(F-het) > 5%, missing data proportion per person > 5%, relatedness \(\hat{\pi}\) > 0.2 with another individual, gender mismatch, or were population outliers relative to the European population from 1000 Genomes [37]. Variants with minor allele frequency (MAF) < 1%, missing data proportion (MISS) > 5%, or Hardy–Weinberg Equilibrium (HWE) p ≤ 10⁻⁶ were excluded. To exclude batch effects, an association test was run between controls and variants; those variants with p-value < 0.001 were excluded, retaining 480,021 variants. Genetic data were aligned to human genome assembly GRCh37/hg19 and imputed as described above. Finally, all data were combined, related individuals with \(\hat{\pi}\) > 0.2 were removed, and variants with MAF < 5%, poor accuracy of imputation (INFO) < 0.8, MISS > 5% or HWE p ≤ 10⁻⁶ were removed. After these corrections the final dataset contained 4,618,496 variants.
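The pre-imputation variant filters above (MAF, missingness, HWE) can be illustrated for a single variant with the sketch below. It is not the pipeline used in the study: the HWE p-value here comes from a one-degree-of-freedom chi-square approximation chosen for brevity, whereas dedicated tools typically use an exact test, and the function name and genotype vectors are hypothetical.

```python
import math

def variant_qc(genotypes, maf_min=0.01, miss_max=0.05, hwe_p_min=1e-6):
    """Return (passes, stats) for one variant.

    `genotypes` holds allele counts per person (0, 1, 2) or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False, {"maf": 0.0, "miss": 1.0, "hwe_p": float("nan")}
    miss = 1.0 - len(called) / len(genotypes)
    n = len(called)
    n_het = called.count(1)
    n_hom_alt = called.count(2)
    n_hom_ref = n - n_het - n_hom_alt
    p = (2 * n_hom_alt + n_het) / (2 * n)        # frequency of the counted allele
    maf = min(p, 1.0 - p)
    # Chi-square test (1 df) of observed vs Hardy-Weinberg expected genotype counts
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))     # survival function of chi-square(1)
    passes = maf >= maf_min and miss <= miss_max and hwe_p > hwe_p_min
    return passes, {"maf": maf, "miss": miss, "hwe_p": hwe_p}

# Hypothetical variants in a sample of 1000 people
in_hwe = [0] * 490 + [1] * 420 + [2] * 90        # allele frequency 0.3, in equilibrium
all_het = [1] * 1000                             # extreme heterozygote excess
rare = [0] * 990 + [1] * 10                      # allele frequency 0.005

ok_common, stats_common = variant_qc(in_hwe)
ok_het, stats_het = variant_qc(all_het)
ok_rare, stats_rare = variant_qc(rare)
```

The thresholds are the first-stage values quoted in the text; the post-imputation stage would use the stricter MAF cut-off and an additional imputation-quality (INFO) filter that is not modelled here.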

Genome-wide SNP-based association analyses were performed for each complement biomarker using linear regression modelling with PLINK as previously described [34]. Association analyses of SNPs with the biomarkers were adjusted for age and sex, five principal components (PCs) and case–control status (“caseness”), the latter introduced to reduce the variation due to differences in association patterns of biomarkers between cases and controls while retaining all available samples in the analysis to maintain statistical power. Genetic data for index SNPs associated with AD in CLU, CR1, C1S, and APOE (rs7412, rs429358) were extracted using PLINK, and LD between them determined using PLINK (https://www.cog-genomics.org/plink/2.0/, [38]). Manhattan plots were generated using the qqman library in R, and association results for a particular gene/region were visualised using the LocusZoom online tool (http://locuszoom.org, [39]). Raw GWAS data were uploaded to LocusZoom and set for PLINK analysis, the SNP of interest was specified under “region to display” with a ± 50–200 Kb flanking size, and r² was determined in relation to the specified SNP, or the most significant SNP for CFH. r² values were calculated using the hg19/1000 Genomes Nov 2014 EUR dataset.
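For reference, the per-SNP association model described above can be written out explicitly. This is a sketch that assumes additive coding of the genotype (allele dosage 0, 1 or 2), which the text does not state; the symbols are introduced here for illustration only.

\[
Y_i = \beta_0 + \beta_G\, G_i + \beta_{\mathrm{age}}\, \mathrm{age}_i + \beta_{\mathrm{sex}}\, \mathrm{sex}_i + \sum_{k=1}^{5} \gamma_k\, \mathrm{PC}_{ik} + \delta\, D_i + \varepsilon_i
\]

Here \(Y_i\) is the plasma level of one complement biomarker in individual \(i\), \(G_i\) is the allele dosage at the tested SNP, \(\mathrm{PC}_{ik}\) are the five principal components, \(D_i\) is case–control status (“caseness”) and \(\varepsilon_i\) is the residual. The quantity tested at each SNP is \(\beta_G\), the change in biomarker level per copy of the allele after adjustment for the covariates.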

Results

Levels of complement proteins in plasma are altered in AD

We measured complement components C1q, clusterin, sCR1, C1s, and FH, selected because each has been implicated in AD by immunohistochemistry, biomarker analyses or genetics. Compared to controls, early and late AD cases had significantly higher levels of plasma C1q (149.8 µg/ml EOAD, 142.9 µg/ml LOAD, 107.9 µg/ml control; p < 0.001, Fig. 1A) and clusterin (224.6 µg/ml EOAD, 205.8 µg/ml LOAD, 195.8 µg/ml control, p < 0.005; Fig. 1B). In control but not AD subjects, clusterin levels were significantly lower in men compared to women (182.8 vs 206.8 µg/ml, p < 0.001; Fig. 1B). Compared to controls, early and late AD cases had significantly lower levels of plasma sCR1 (15.29 ng/ml EOAD, 15.32 ng/ml LOAD, 16.74 ng/ml control; p = 0.031, Fig. 1C) and FH (353.7 µg/ml EOAD, 359.7 µg/ml LOAD, 382.3 µg/ml control, p < 0.05, Fig. 1E). Plasma sCR1 levels were lower in males compared to females in AD and control samples (14.83 vs 15.82 ng/ml in AD, p < 0.001; 15.41 vs 17.82 ng/ml in control, p < 0.001, Fig. 1C). In AD cases, FH levels were significantly lower in men compared to women (346.3 vs 366.2 µg/ml, p < 0.01, Fig. 1E). Plasma levels of C1s did not differ between AD and control subjects (28.22 µg/ml EOAD, 28.02 µg/ml LOAD, 27.72 µg/ml control, Fig. 1D). In control but not AD samples, males had significantly lower levels of C1s compared to females (26.52 vs 28.71 µg/ml, p < 0.05, Fig. 1D).

Fig. 1

Complement proteins in AD. A C1q plasma levels were significantly elevated in both early and late onset AD compared to controls; there were no significant gender differences in C1q levels. B Clusterin levels were significantly elevated in early and late onset AD compared to controls; levels were significantly higher in females compared to males in controls but not AD. C Soluble CR1 levels were significantly decreased in AD compared to controls though significance was lost when split into early and late onset AD groups; levels were significantly lower in male subjects in AD and control groups. D Plasma C1s levels did not differ significantly between AD and controls; levels were significantly lower in males compared to females in the control group. E FH levels were significantly decreased in both early and late onset AD compared with controls; levels were significantly lower in males compared to females in the AD group. Data were corrected for age and sex; results are shown as mean ± SD, analysed using Mann–Whitney tests or generalised linear models including age and sex as covariates. *p < 0.05; **p < 0.01; ***p < 0.001. M male, F female. Numbers for each data set are shown in Table 1

Complement protein levels correlate with each other and some AT(N) markers

To identify patterns of dysregulation in the complement system and relationships between complement and already established AD biomarkers, we performed correlation analyses. Significant positive correlations were identified between C1s/C1q, C1q/FH, C1s/FH in both AD and control groups (p < 0.001, Additional file 1: Table S2). No significant correlations were identified between complement protein levels and MMSE score, age at onset/study inclusion or disease duration. In controls, significant positive correlations were found between complement and AT(N) biomarker levels: clusterin/NfL; sCR1/GFAP, sCR1/NfL; C1q/Aβ40, while in the AD group there were significant negative correlations between clusterin/Aβ40, clusterin/Aβ42 (Additional file 1: Table S2).

ROC analyses identify complement proteins that distinguish AD from controls

ROC analyses were performed on complement and AT(N) biomarkers to determine their utility in distinguishing LOAD and EOAD from control (Fig. 2). The AD groups were analysed separately because direction of effect for each protein (Additional file 1: Fig. S1) and distributions of protein concentrations across groups and ages (Additional file 1: Figs. S2–S3) differed significantly between LOAD and EOAD (Additional file 1: Table S3). Protein levels, unadjusted (Fig. 2A, B) and adjusted for age/sex (Fig. 2C, D) were included in ROC analyses for comparison. Among complement proteins analysed individually, C1q reached the highest AUC (0.601 for EOAD, 0.655 for LOAD), while amyloid markers performed best among AT(N) proteins (Aβ40 0.616 for EOAD, 0.657 for LOAD; Aβ42 0.611 for EOAD, 0.665 for LOAD; Fig. 2A–D); the Aβ42/Aβ40 ratio had a low AUC in both AD groups (0.491 EOAD, 0.561 LOAD). StepAIC models were used to identify best model for each phenotype. For EOAD, combining Aβ40, Aβ42/Aβ40, GFAP and C1q gave an AUC of 0.681 (AIC 1064.17); for LOAD, Aβ40, Aβ42/Aβ40, pTau181, NfL, GFAP, CR1, C1q, FH gave an AUC of 0.824 (AIC 644.41) (Additional file 1: Table S4). Inclusion of other biomarkers did not significantly improve performance (Fig. 2E, F). DeLong tests confirmed that levels of the implicated proteins had a significant effect on the predictive capacity of the overall model after accounting for age, sex and APOE status (Additional file 1: Table S5). Complete ROC statistics for the protein biomarker comparisons between EOAD and controls, LOAD and controls and EOAD and LOAD, both unadjusted and adjusted for age and sex are shown in Additional file 1: Tables S6 and S7.

Fig. 2

ROC curves demonstrate the AD predictive capacity of complement biomarkers. ROC curves were generated using multiple GLMs for each protein. AUC statistics for individual (A–D) or combined (E, F) analytes are shown for EOAD (A, C, E) and LOAD (B, D, F). 95% confidence intervals, calculated using bootstrapping with 2000 replicates, are included in brackets for each GLM. Proteins unadjusted (A, B) and adjusted (C, D) for age and sex were included for comparison. Linear predictors were selected based on results of stepAIC models of adjusted data for EOAD (E) and LOAD (F), with the best model based on lowest AIC relative to model complexity plotted first (EOAD = Aβ40 + Aβ42/Aβ40 + GFAP + C1q; LOAD = Aβ40 + Aβ42/Aβ40 + pTau181 + NfL + GFAP + CR1 + C1q + Factor H) and then 3 regressive steps from the stepAIC results plotted sequentially

Clusterin levels are not impacted by AD risk SNPs or APOE status

SNPs in CLU previously identified in AD GWAS [18] [rs11136000, rs9331896, rs2279590 in linkage disequilibrium (LD; r² > 0.85) and rs9331888 (r² with rs11136000 = 0.26)] were tested for impact on plasma clusterin by comparing levels in carriers of the minor alleles at rs11136000 (T/C, used as surrogate for SNPs in LD) and rs9331888 (G/C) with carriers of the major allele. Neither the SNP cluster defined by rs11136000 nor the unlinked SNP rs9331888 impacted clusterin levels, whether assessed in the whole sample set, AD alone or control alone (Fig. 3A, B). When using an interaction term between SNPs and disease status in a series of GLMs, no significant effects were found for either rs11136000 or rs9331888 on clusterin levels (Additional file 1: Table S8). There was no significant effect of APOE status on clusterin levels, although there was a non-significant trend towards increased levels in APOEε4 carriers (Fig. 3C). To determine whether any other SNPs in the genome were associated with plasma clusterin levels, a GWAS was performed with plasma clusterin as endophenotype. The resulting Manhattan plot did not show any genetic variants in CLU or elsewhere in the genome that significantly associated with clusterin plasma levels (Fig. 3D).

Fig. 3

Effects of genetic variants in CLU on clusterin protein levels. A, B No significant effects of either rs11136000 (A: marking the LD block) or rs9331888 (B) in CLU were identified on plasma levels of clusterin. C Clusterin levels were not significantly different between APOE genotypes. Data are shown as mean ± SD and were analysed statistically using Kruskal–Wallis test with Dunn’s multiple comparisons post-hoc test. Numbers for homozygote and heterozygote carriers of each SNP are shown in Table 1. D Manhattan plot of GWAS results of the whole sample set using plasma clusterin levels as an intermediate phenotype found no significant variants

AD risk SNPs in CR1 significantly impact sCR1 levels

The impact of AD-associated SNPs in CR1 on plasma sCR1 levels was tested. Presence of the minor allele at rs6691117 (Ile2065Val, G/A) was associated with significantly reduced plasma sCR1 levels in the combined cohort, but significance was lost when the cohort was separated into AD and control groups (Fig. 4A). The intronic SNP rs6656401 (A/G) marks the AD-associated copy number variation in CR1; presence of the minor allele was associated with significantly higher sCR1 levels in the whole population and after separation into AD and control groups (Fig. 4B). An interaction term between SNPs and disease status in a series of GLMs revealed no significant effects for either rs6691117 or rs6656401 on plasma sCR1 levels (Additional file 1: Table S8). GWAS on the combined cohort identified a cluster of genome-wide significant (GWS) loci in chromosome 1; LocusZoom views showed that these were within CR1 in high LD with rs6656401, further strengthening the robust association between the rs6656401-tagged LD block and increased plasma sCR1 levels (Fig. 4C, D). In contrast, rs6691117 did not reach GWS for association with plasma sCR1 changes (Fig. 4E).

Fig. 4

Effects of genetic variants in CR1 on sCR1 protein levels. A, B The minor allele (G) at rs6691117 was significantly associated with a decrease in sCR1, while the minor (A) allele at rs6656401 was strongly associated with increased sCR1 levels in plasma. Data are shown as mean ± SD and were analysed using Kruskal–Wallis with Dunn’s multiple comparisons post-hoc tests. *p < 0.05; **p < 0.01; ***p < 0.001. Numbers for homozygote and heterozygote carriers of each SNP are shown in Table 1. C Manhattan plot of GWAS results on the whole sample set (n = 1667) with plasma sCR1 as endophenotype identifies multiple loci in the CR1 gene on chromosome 1 significantly associated with changes in plasma sCR1 levels. D, E LocusZoom plots in the region identify a cluster of significant SNPs in high LD with rs6656401 (D), while rs6691117 was below the significance threshold (E)

C1s levels are significantly affected by SNPs in C1S

Carriers of the minor allele at rs3919533 (C/T) in C1S had significantly lower C1s levels (Fig. 5A). GWAS performed on the combined cohort identified significant hits impacting C1s concentration in chromosome 12; LocusZoom views revealed a cluster of significant SNPs within C1S on chromosome 12 in moderate LD with the risk SNP rs3919533 located upstream of C1S (Fig. 5B, C).

Fig. 5

Effects of genetic variants in C1S on C1s protein levels. A Presence of the minor allele (C) at rs3919533 was gene dose-dependently associated with lower C1s levels in both the AD and control groups. Data are shown as mean ± SD and were analysed statistically using Kruskal–Wallis tests with Dunn’s multiple comparisons post-hoc test. **p < 0.01; ***p < 0.001. Numbers for homozygote and heterozygote carriers of the SNP are shown in Table 1. B Manhattan plot of GWAS results on the whole sample set (n = 1713) shows loci in chromosome 12 were significantly associated with changes in plasma C1s levels. C LocusZoom plot displays a cluster of significant SNPs within C1S, in moderate LD with rs3919533

FH levels are significantly affected by SNPs in CFH

The impact of AD-associated SNPs in CFH on plasma FH levels was tested. Carriers of the minor allele at rs6664877 (T/C) in CFH had a significant increase in FH levels in the combined cohort and after separation into AD and controls (Fig. 6A). GWAS performed on the combined cohort using plasma FH levels as endophenotype identified GWS hits in chromosome 1; LocusZoom views revealed clusters of significant SNPs within CFH and also in the adjacent CFHR4 gene (Fig. 6B, C). Rs6664877 (T/C) in CFH was the most significant SNP impacting FH levels (p = 1.05 × 10⁻⁹).

Fig. 6

SNPs in CFH are significantly associated with changes in plasma FH levels. A Minor allele (T) carriers at rs6664877 had significantly higher plasma FH levels compared to major allele carriers in the combined cohort and in AD and controls analysed separately. Data are means ± SD and were analysed statistically using Mann–Whitney tests. **p < 0.01; ***p < 0.001. Numbers for homozygote and heterozygote carriers of the SNP are shown in Table 1. B Manhattan plot of GWAS results on the whole sample set (n = 1713) using plasma FH as endophenotype identifies loci in chromosome 1 significantly associated with changes in plasma FH levels. C LocusZoom analysis identifies a cluster of significant SNPs within CFH and a second cluster in the adjacent CFHR4 gene that influence plasma FH levels. Rs6664877 was the most significant hit

Discussion

As the incidence of AD climbs ever higher, there is a critical need for early detection, accurate diagnosis and prediction of disease risk. Predictive tests would enable population screening, close monitoring of at-risk individuals and pre-symptomatic intervention. Numerous studies have explored fluid biomarkers. The combination of CSF pTau, Aβ and neurofilament (AT(N)) aids diagnosis and is highly informative of disease progression, making it particularly useful in selection for and monitoring of clinical trials [5, 6]. However, CSF sampling is invasive and the high sensitivity assays required to measure the AT(N) markers in plasma are costly, restricting their broader use. This prompted us to seek blood biomarkers reflecting changes in the inflammatory component of AD. Complement has long been associated with AD; many studies have shown complement proteins, including C1q, clusterin, and FH, co-localising with Aβ plaques, and significant alterations in complement mRNA and protein levels in brain, plasma, and CSF in AD [12, 40]. Moreover, GWAS have consistently identified complement genes significantly associated with AD [18, 19].

We investigated plasma levels of AD-relevant complement proteins (C1q, clusterin, sCR1, C1s, FH) in AD patients and controls, and assessed their prediction accuracy in diagnosing AD alongside AT(N) biomarkers. C1q, the initiator of classical pathway activation, is present in amyloid plaques; in vitro, C1q enhanced Aβ aggregation but inhibited uptake of Aβ by microglia [41]. We found that plasma C1q was significantly increased in AD patients, suggesting increased complement activating capacity. C1q showed the best prediction accuracy among the measured complement proteins, particularly in late onset disease (AUC = 0.601 for EOAD, 0.655 for LOAD); adding in other complement biomarkers or AT(N) biomarkers did not increase the predictive accuracy of C1q alone. C1q levels are reported to increase with age and associate with age-related arterial stiffness [42]. The controls in our cohort were deliberately selected for advanced age; therefore, the higher C1q levels observed in AD are unlikely to be explained by an age effect. C1q levels positively correlated with Aβ40 in controls, highlighting a subset of these who may be at increased risk for developing Aβ pathology.

Clusterin, also known as ApoJ, is the complement protein most studied as an AD biomarker. It is a multifaceted protein with important roles in AD pathology. It inhibits Aβ nucleation and enhances its clearance from the brain, and clusterin knockout mice develop Aβ deposition on cerebrovasculature [43,44,45]. Among its many roles, clusterin regulates the terminal pathway of complement, inhibiting formation of the membrane attack complex. Clusterin was previously reported to be increased in AD and mild cognitive impairment (MCI) plasma and CSF, although results are inconsistent [15, 46, 47]. In agreement with most of these reports, we found a significant increase in plasma clusterin levels in AD patients, particularly in EOAD; however, this did not translate into good prediction accuracy in ROC analyses (AUC = 0.581 for EOAD, 0.529 for LOAD). Clusterin can readily cross the BBB and has been shown to sequester Aβ40 and prevent Aβ42 aggregation [14, 48,49,50]; the inverse correlation with plasma Aβ we identified may thus reflect ongoing changes in the AD brain. In controls, plasma levels of clusterin and NfL were positively correlated.

CLU, the gene encoding clusterin, is a major AD GWAS hit with multiple intronic SNPs identified, including a SNP cluster in tight LD (rs11136000, rs9331896, rs2279590) and an unlinked SNP (rs9331888) [19]. The SNP cluster was associated with decreased AD risk and better cognitive scores [19, 20]; however, it was not associated with the increased CLU mRNA levels reported in AD brains [51]. GWAS using plasma clusterin as endophenotype revealed no significant associations between clusterin levels and variants in CLU or elsewhere in the genome, suggesting that the observed differences in clusterin plasma levels were not caused by genetic variation in CLU or other genes included in current GWAS genotyping arrays. The report that clusterin levels were increased in brains but not plasma of APOEε4 carriers [52,53,54] prompted us to test the impact of APOE status on plasma clusterin levels; no significant differences were seen.

CR1, a receptor for the complement opsonic fragments C3b/C4b that plays key roles in immune complex handling in the periphery, was linked to AD in GWAS [19]. We recently showed that CR1 is abundantly expressed in brain, and that expression is markedly increased in AD [55]. Four isoforms of human CR1 exist due to gene duplications and rearrangements, differing considerably in length. The two common forms, CR1*1 and CR1*2, comprise respectively 4 and 5 functional units termed long homologous repeats; the latter confers risk for AD [21, 55]. In the current study, sCR1 was significantly decreased in AD patients, supporting our past findings in an independent cohort [56]; however, ROC analyses showed poor predictive accuracy for sCR1 (AUC = 0.543 for EOAD, 0.554 for LOAD). sCR1 levels correlated with NfL and GFAP levels in controls. SNPs in CR1 have previously been shown to significantly contribute to AD risk and influence plasma sCR1 levels [21, 57]. The minor allele at rs6656401, associated with expression of the risk variant CR1*2 [21], was linked to accelerated cognitive decline [58], reduced CR1 density on erythrocytes and increased sCR1 plasma levels [59]. We also found that carriers of the minor allele at rs6656401, whether AD or control, had significantly increased plasma sCR1 levels. In contrast, carriers of the minor allele at rs6691117, a missense variant (I2065V) associated with decreased brain volume in MCI subjects [60], had significantly decreased sCR1 levels; the membrane-proximal position of this amino acid change suggests an effect on proteolytic cleavage of CR1. The causative association between these SNPs and changes in plasma sCR1 levels was tested using endophenotype GWAS. The rs6656401-tagged LD block associated with expression of CR1*2 was highly significant in the analysis, confirming its direct effect on plasma sCR1 levels; in contrast, the rs6691117 SNP did not reach GWS for association with plasma sCR1 changes.

C1s in the C1 complex cleaves C4 and C2 to initiate the classical complement pathway. Our interest in C1s was sparked by the recent report of a near-GWS association between rs3919533, a SNP located 5 kb upstream of C1S, and AD risk in European subjects [18]. No studies of plasma C1s levels in AD have been published, although C1s expression was decreased in the CSF proteome of AD patients [61]. We found no difference in plasma C1s levels between AD patients and controls. Nevertheless, we investigated whether the GWAS-implicated SNP impacts plasma C1s levels and found that carriers of the minor allele at rs3919533 had significantly lower C1s levels. When plasma C1s levels were used as an intermediate phenotype in GWAS, we identified a cluster of highly significant SNPs located within C1S, including rs3919533, that associated with significantly decreased plasma C1s levels in both AD and control subjects. Taken together, the data suggest that variants in C1S may contribute to AD risk through changes in plasma C1s levels.
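For readers less familiar with endophenotype (intermediate phenotype) GWAS, the per-variant test can be sketched as a regression of plasma protein level on minor-allele dosage (0, 1 or 2 copies). The sketch below assumes a simple additive model with no covariates; real analyses typically adjust for factors such as age, sex and ancestry, and the exact model used in this study is not restated in this section. The function name `additive_effect`, the dosages and the protein levels are hypothetical.

```python
def additive_effect(dosages, levels):
    """Least-squares fit of level = intercept + slope * dosage.

    dosages: minor-allele counts per subject (0, 1 or 2).
    levels:  plasma protein level per subject, same order.
    Returns (slope, intercept); a negative slope means each copy of the
    minor allele is associated with a lower protein level.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical genotypes and protein levels (arbitrary units).
dosages = [0, 0, 1, 1, 2, 2]
levels = [10.0, 12.0, 9.0, 9.0, 7.0, 7.0]

slope, intercept = additive_effect(dosages, levels)
print(f"per-allele effect = {slope:.2f}, baseline = {intercept:.2f}")
```

In a genome-wide scan this fit is repeated for every genotyped or imputed variant, and a variant is reported only if the test of its slope against zero passes the genome-wide significance threshold.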

FH, an essential regulator of the alternative pathway, was previously reported to be decreased in plasma and CSF from AD patients, the latter specifically in amyloid positive cases [62, 63]. Consistent with these reports, we found a significant decrease in plasma FH in AD patients. Lower FH levels may impact regulation of the alternative pathway amplification loop. Although no variants in CFH have been reported to associate with AD risk in GWAS in Caucasian populations, there are numerous associations with other inflammatory diseases, notably AMD, a retinal neurodegenerative disease that has many similarities to AD [64, 65]. Two coding SNPs in CFH, rs1061170 (Y402H) and rs800292 (I62V), strongly impact risk of AMD. Although neither of these SNPs was significant in AD GWAS in Caucasians, both were strongly associated with AD risk and rate of atrophy in a Chinese cohort [22]. In a small case–control study in a Caucasian population, rs1061170 was associated with AD risk but only in individuals carrying the APOEε4 allele [23]. Although neither of these SNPs was significant in the FH endophenotype GWAS, the analysis identified an LD cluster of GWS SNPs in CFH and downstream, adjacent to CFHR4, that associated with changes in plasma FH levels. The lead variant, rs6664877, was associated with significantly increased plasma FH levels in both AD and control subjects. This intronic CFH SNP has not been previously described or related to any pathology, making it, together with the SNPs in strong LD with it, an interesting candidate for further studies into roles of CFH variants in AD. A recent study of the genetic architecture of the human plasma proteome in healthy blood donors identified several associations between protein levels and complement genes; notably, variants in CFH significantly associated with 59 proteins [66].

Conclusions

We demonstrate dysregulation of the complement system in AD plasma compared to controls. Clusterin and C1q were elevated, and sCR1 and FH decreased. C1q levels distinguished AD from controls with good predictive power, particularly for LOAD (AUC 0.655). We show that SNPs in CR1, C1S, and CFH, some previously associated with AD, others novel, significantly influenced plasma concentrations of the respective proteins, suggesting a mechanism by which they impact disease risk. Although the changes observed are modest, we have shown before that even small changes in complement protein activities or levels can markedly impact risk of systemic diseases [67]. Limitations of the work relate to the nature of the cohort: 1. Assignment to AD or control groups was based on extensive cognitive testing in the cohort, without recourse to imaging or CSF biomarker data; 2. The control and AD groups were not age-matched; indeed, controls were deliberately selected for advanced age with no evidence of cognitive impairment in order to exclude incipient cases, a clear advantage for genetic studies; 3. We split the AD cohort into early (onset before 65) and late (onset after 65), an arbitrary but useful distinction that can highlight early changes, but did not consider other potential stratifiers; 4. We did not include any individuals with mild cognitive impairment who could be followed over time; a future study could explore the time course of complement dysregulation in early disease to identify causative roles. Despite these limitations, our findings build a strong case for roles of genetically determined complement parameters in dictating risk of AD; these parameters may be useful in predicting AD and identifying novel routes to therapy.