Background

Cerebrospinal fluid (CSF) biomarkers are increasingly used to support a diagnosis of Alzheimer’s disease (AD). CSF amyloid beta (Aβ)1–42, total tau (T-tau), and phosphorylated tau (P-tau) have utility in differentiating AD from controls and in predicting conversion from mild cognitive impairment (MCI) to AD dementia [1, 2]. Consequently, these measures are included in clinical [3] and research diagnostic criteria [4].

A variety of other CSF measures relevant to neurodegeneration are now available. These include markers of amyloid processing (AβX-38, AβX-40, AβX-42, soluble amyloid precursor protein (sAPP)α, and sAPPβ), large fibre axonal degeneration (neurofilament light chain (NFL)), and neuroinflammation (chitinase-3-like protein 1, also known as YKL-40). The AβX-42/X-40 ratio rather than Aβ1–42 alone may correct for inter-individual differences in amyloid production [5] and may improve clinical diagnostic specificity [6]. Meta-analytical data confirm that YKL-40 and NFL are elevated in clinically diagnosed AD CSF compared with controls [2].

While most prior studies have focussed on distinguishing patients with AD from controls or predicting MCI conversion to AD, a major challenge in clinical practice is to distinguish AD from other neurodegenerative disorders, including frontotemporal dementia (FTD), dementia with Lewy bodies (DLB), semantic dementia (SD), and progressive non-fluent aphasia (PNFA). Here, the role of CSF biomarkers is much less well established.

The principal aims of this study were to determine the diagnostic utility of an extended panel of CSF biomarkers (including two biomarker ratios) both individually and in models incorporating multiple biomarkers to distinguish AD from a range of other primary neurodegenerative dementias in clinical practice, and to validate diagnostic cut-points using a second, independent cohort.

Methods

The study was conducted in accordance with relevant clinical research regulations, and with ethical approvals in place (Queen Square ethics committee approval reference numbers 13 LO 1155 and 12 LO 1504). Written informed consent was obtained from participants where appropriate.

Two independent cohorts were studied. A test cohort was used to estimate cut-points and to determine the diagnostic utility of each biomarker for differentiating AD from the other groups. A validation cohort was then used to assess the sensitivity and specificity of these cut-points to distinguish AD from all other subjects, from controls, and from other dementias.

Test cohort

We included individuals referred to the Queen Square Specialist Cognitive Disorders service who had a diagnostic CSF examination between 1 January 2008 and 1 January 2012. Without knowledge of the CSF result, electronic patient records were interrogated to determine the pre-lumbar puncture (LP) diagnosis, most recent clinical diagnosis, time from earliest symptom (reported by individual or their family/caregiver) to LP, mini-mental state examination (MMSE) score at LP, and time from LP to most recent clinical assessment. Consensus criteria were used to classify individuals as: probable AD (including amnestic, logopenic aphasia, and posterior cortical atrophy variants) [3]; DLB [7]; behavioural variant FTD (bvFTD) [8]; PNFA [8]; and SD [8, 9]. The diagnosis was confirmed in 20 cases at autopsy; two patients with AD had presenilin 1 mutations, and three cases of bvFTD had C9ORF72 mutations and one a Tau mutation. The pre-LP clinical diagnosis (i.e. without the CSF result) was used for establishing biomarker utility. A second neurologist independently assessed approximately 45% of the case notes; there was 95.8% diagnostic agreement between raters.

Validation cohort

All individuals seen in our service who had a diagnostic CSF examination between 16 May 2013 and 16 May 2016 and who fulfilled consensus criteria for a dementia diagnosis (as above) were included. Twelve individuals with an AD diagnosis had an amyloid positron emission tomography (PET) scan, which was positive in all cases.

Healthy controls

Healthy controls were recruited for research and were usually partners of affected individuals. No control had a memory complaint at recruitment or at 1-year follow-up.

Sample treatment and analysis

CSF was collected as previously described [10], i.e. by LP between 9 am and 3 pm into a polypropylene vessel, centrifuged, and frozen. Samples were thawed at the bench for 1 h. The volume of CSF differed between individuals; accordingly, not all biomarker measurements were made for all members of the test cohort (see Table 1 for details).

Table 1 Test cohort demographic and biomarker data for all diagnostic groups

Aβ1–42, T-tau, and P-tau assays were performed in batches according to local laboratory standard operating procedures to achieve inter-day coefficients of variation (CV) < 10%. Other assays (AβX-38, AβX-40 and AβX-42, NFL, YKL-40, sAPPα, and sAPPβ) were carried out at a single time point in the Neurochemistry laboratory of the University of Gothenburg by board-certified laboratory technicians. We achieved inter-plate CVs of < 10% for all assays except sAPPα and sAPPβ (details are provided in Additional file 1). Samples from the validation cohort were tested at the Institute of Neurology, UCL. Details of the CSF methodology are provided in Additional file 1.

Statistical analysis

Analyses were carried out using Stata Version 14.1 (Texas, USA). Data distribution was assessed and values outside an assay’s reliable detectable range were assigned maximum/minimum values. Medians and interquartile ranges were used to describe demographic and clinical characteristics and CSF biomarker data by diagnostic group. Missing CSF biomarker values were assumed to be missing completely at random [11], i.e. that the missingness mechanism was unrelated to any covariates relevant to the analysis. CSF biomarkers were compared between diagnostic groups using log-transformed data due to skewed and/or truncated data, and a generalised least squares linear regression model was used (an extension of the t test/analysis of variance (ANOVA) model that allows different group-specific residual variances). These global tests for differences between groups were assessed first across all groups including healthy controls, then only in cases with dementia, and finally in cases with dementia also adjusting for age, sex, and disease duration. Post-hoc pairwise comparisons between diagnostic groups were made when the initial (unadjusted) global test across dementia-only groups was statistically significant (p < 0.05), and in any biomarker where the unadjusted p value was > 0.05 but the adjusted p value was < 0.05. For the pairwise comparisons, a conservative Bonferroni-adjusted threshold p value for significance (p < 0.003) was also used, based on 15 pairwise tests for each biomarker.
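As a brief illustration of the multiple-comparison threshold described above, the sketch below derives the Bonferroni-adjusted p value from the number of pairwise tests. It assumes that the 15 tests arise from all pairs among six groups (five dementia groups plus healthy controls); the function name `bonferroni_threshold` is hypothetical and is not part of the study's Stata analysis.

```python
from math import comb


def bonferroni_threshold(n_groups, alpha=0.05):
    """Return (number of pairwise tests, Bonferroni-adjusted alpha)."""
    n_tests = comb(n_groups, 2)  # every unordered pair of groups
    return n_tests, alpha / n_tests


# Assumption: six groups (AD, DLB, bvFTD, PNFA, SD, healthy controls).
n_tests, threshold = bonferroni_threshold(6)
print(n_tests)              # 15
print(round(threshold, 4))  # 0.0033
```

Under this assumption the adjusted threshold is 0.05/15 ≈ 0.0033, which corresponds to the p < 0.003 threshold quoted in the text.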

Non-parametric receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to quantify how well each biomarker discriminated between AD and each other diagnostic group (or combinations of groups). The group sizes varied greatly, reflecting the prevalence of these conditions in the population. For a biomarker associated with disease, the AUC can be interpreted as the probability that a randomly selected case has a higher biomarker value than a randomly selected control when higher values indicate disease (and a lower value when lower values indicate disease) [12].
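The probabilistic interpretation of the AUC can be made concrete with a minimal sketch that compares every case with every control and counts ties as half. The biomarker values are hypothetical and the function name `auc` is illustrative only; the study itself used Stata's non-parametric ROC routines.

```python
def auc(case_values, control_values):
    """Non-parametric AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for c in case_values:
        for h in control_values:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5  # ties contribute half a 'win'
    return wins / (len(case_values) * len(control_values))


# Hypothetical T-tau values (higher values associated with AD).
ad_ttau = [820, 640, 510, 930, 450]
hc_ttau = [300, 450, 280, 520]
auc_ttau = auc(ad_ttau, hc_ttau)
print(auc_ttau)  # 0.875

# Hypothetical Abeta1-42 values (lower values associated with AD):
# swap the arguments so the AUC still measures discrimination of AD.
ad_abeta = [350, 420, 500]
hc_abeta = [700, 480, 900]
auc_abeta = auc(hc_abeta, ad_abeta)
```

Swapping the arguments is equivalent to taking one minus the AUC, which is the 'vice versa' case for biomarkers that are lower in disease.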

For the five best-performing (based on AUCs) biomarkers for each of the group comparisons, cut-points and conservative exact binomial confidence intervals were estimated for a set sensitivity of 85%, as suggested by the Reagan consensus report [13], and the associated specificities calculated. For a set sensitivity of 85% (i.e. an 85% probability of a positive test among patients with disease), given that AD is always set as the ‘case’ in any comparison, the optimal cut-point for any specific biomarker is the same regardless of which other diagnostic group is being used as the comparator; it is the specificity that changes for different comparators.
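The point that the cut-point depends only on the AD group, while the specificity depends on the comparator, can be sketched as follows. All values are hypothetical, the function names are illustrative, and the exact binomial confidence intervals reported in the study are not reproduced here.

```python
def cutpoint_at_sensitivity(case_values, sens_pct=85, higher_is_positive=True):
    """Cut-point at which at least sens_pct% of cases test positive."""
    n = len(case_values)
    k = -(-sens_pct * n // 100)  # ceil(sens_pct * n / 100), integer arithmetic
    ordered = sorted(case_values, reverse=higher_is_positive)
    return ordered[k - 1]  # the k-th most 'disease-like' case value


def is_positive(value, cutpoint, higher_is_positive=True):
    return value >= cutpoint if higher_is_positive else value <= cutpoint


def sensitivity(case_values, cutpoint, higher_is_positive=True):
    hits = sum(is_positive(v, cutpoint, higher_is_positive) for v in case_values)
    return hits / len(case_values)


def specificity(comparator_values, cutpoint, higher_is_positive=True):
    negatives = sum(
        not is_positive(v, cutpoint, higher_is_positive) for v in comparator_values
    )
    return negatives / len(comparator_values)


# Hypothetical biomarker values; AD is always the 'case' group.
ad = [100 + 10 * i for i in range(20)]  # 100, 110, ..., 290
controls = [60, 90, 110, 120, 140, 150, 95, 80, 170, 125]
other_dementia = [120, 135, 160, 90, 200]

cut = cutpoint_at_sensitivity(ad)                # 130, set by the AD group alone
sens = sensitivity(ad, cut)                      # 0.85
spec_controls = specificity(controls, cut)       # 0.7
spec_other = specificity(other_dementia, cut)    # 0.4
```

The same cut-point (130 in this toy example) is applied to both comparators; only the specificity changes. Tied values at the cut-point can push the achieved sensitivity slightly above the target.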

ROC curves from logistic regression models incorporating up to five of the best-performing biomarkers (based on highest AUC) were used to calculate AUCs where group sizes were sufficiently large (> 10 subjects in each of the two groups compared) to avoid over-fitting, with bias-corrected bootstrapped confidence intervals for the AUC (2000 replications). The analyses used the ‘leave one out’ approach to address the potential for over-optimistic estimates of AUCs and specificities obtained from these joint models, as this can particularly be an issue when AD cases greatly outnumber the comparator group.
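A minimal sketch of the ‘leave one out’ idea is given below: each subject's predicted probability comes from a logistic model fitted without that subject, and the AUC is then computed from these held-out predictions. Several details are assumptions rather than the study's method: the model is fitted by plain gradient descent instead of Stata's maximum-likelihood routine, the held-out predictions are pooled to form a single AUC, the data are hypothetical standardised log-biomarker values, and the bootstrap confidence interval is omitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, x):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def loo_probabilities(X, y):
    """Predict each subject from a model fitted on all other subjects."""
    probs = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        probs.append(sigmoid(linear_score(w, X[i])))
    return probs


def auc(pos, neg):
    """Non-parametric AUC with ties counted as one half."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: two standardised biomarkers per subject; y = 1 for AD.
X = [[2.0, 0.5], [1.5, 0.3], [1.8, 0.9], [0.4, -0.2], [1.1, 0.1],
     [-0.5, -0.8], [0.2, 0.4], [-1.0, -0.3], [0.9, -0.6], [-0.3, 0.2]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

probs = loo_probabilities(X, y)
loo_auc = auc([p for p, yi in zip(probs, y) if yi == 1],
              [p for p, yi in zip(probs, y) if yi == 0])
print(round(loo_auc, 2))
```

Because no subject contributes to the model that scores it, the resulting AUC is less prone to the optimism that affects an AUC computed on the same data used for fitting.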

The estimated cut-points of those biomarkers which showed utility in differentiating AD from one or more groups in the test cohort, and for which measures were available, were used to calculate sensitivity and specificity in the validation cohort; due to the small numbers in some diagnostic groups, we only assessed the ability to distinguish AD from controls, from other dementias, and from other dementias and controls combined. Similarly, sensitivity and specificity were calculated in the pathology/genetically confirmed sub-cohort to distinguish AD from other dementias.

Results

Subject demographics

We included 418 subjects, 275 in the test and 143 in the validation cohorts. The test cohort comprised 245 patients with dementia (AD (n = 156, including 27 posterior cortical atrophy (PCA) and 12 logopenic progressive aphasia (LPA)), DLB (n = 20), bvFTD (n = 45), PNFA (n = 17), and SD (n = 7)), and 30 controls. All groups had a similar disease duration (symptom onset to LP) except for the SD group who presented later (Table 1). The DLB group was older than the other disease groups and the proportion of males was higher for DLB and SD than the other groups. Of the 143 individuals in the validation cohort, 104 had AD, 29 had other dementias (5 DLB, 12 bvFTD, 3 PNFA, and 9 SD) and 10 were controls.

Pathology and genetic confirmation

In total, 26 subjects were pathologically or genetically confirmed. Eleven subjects in the test cohort who received a clinical diagnosis of AD (including two with PCA) had a pathological diagnosis of AD at autopsy. None of the subjects diagnosed with AD during life had a non-AD pathological diagnosis. A further two subjects had presenilin 1 mutations known to cause AD. One subject with DLB received a pathological diagnosis of mixed AD/DLB pathology. Five cases with a clinical diagnosis of bvFTD received a pathological diagnosis: one had frontotemporal lobar degeneration with TDP-43 pathology type 3; one had a tauopathy with features compatible with chronic traumatic encephalopathy; one had Pick’s disease; one had FTLD-TDP Type A; and one had mixed AD, Lewy body pathology, and TDP-43 pathology. Four further cases had confirmed genetic bvFTD (three with C9ORF72 mutations and one a Tau mutation). Two patients with PNFA reached autopsy. One had mixed pathology with Pick’s disease, AD pathology, cerebral amyloid angiopathy, and Lewy body pathology, and the other had FTLD-TDP Type A pathology. One patient with SD received a pathological diagnosis of FTLD-TDP Type C pathology.

CSF biomarker concentrations

The biomarker profile of each diagnostic group in the test cohort is shown in Table 1 and box-plots are provided in Fig. 1. In the validation cohort data were available for five biomarkers/ratios: Aβ1–42 (n = 143); T-tau (n = 143); P-tau (n = 131); T-tau/Aβ1–42 ratio (n = 143); and AβX-42/X-40 ratio (n = 140). In the pathology/genetically confirmed sub-cohort, data were available for: Aβ1–42 (n = 26); T-tau (n = 26); P-tau (n = 19); T-tau/Aβ1–42 ratio (n = 26); and AβX-42/X-40 ratio (n = 17).

Fig. 1

Box-plots and whiskers (25th–75th percentiles) and outliers of measured biomarker concentrations presented by disease group (pre-lumbar puncture diagnosis) and unadjusted pairwise comparisons (p-values). X-axis: pre-lumbar puncture diagnosis. Aβ amyloid beta, AD Alzheimer’s disease, APP amyloid precursor protein, bvFTD behavioural variant frontotemporal dementia, DLB dementia with Lewy bodies, HC healthy controls, NFL neurofilament light chain, PNFA progressive non-fluent aphasia, P-tau phosphorylated tau, SD semantic dementia, T-tau total tau

Comparisons between the groups based on regression analyses are shown in Table 2. There was a significant difference (p < 0.05) between disease groups for all tested biomarkers when controls were included. When excluding the control group this remained the case for nine measures. Additionally, when adjusting for age, sex, and disease duration there was evidence for a difference (p = 0.04) between groups for one additional biomarker (YKL-40) whereas no difference had been apparent in the unadjusted analysis (p = 0.51).

Table 2 Regression analyses comparing biomarkers between all disease groups classified according to pre-lumbar puncture diagnosis, with and without healthy controls

Figure 1 shows pairwise comparisons between diagnostic groups where the (unadjusted) global test across dementia-only groups was statistically significant (unadjusted p < 0.05). A summary of where there was evidence of a difference in mean biomarker concentration is shown in Table 3 for each pairwise comparison, both for an unadjusted p < 0.05 threshold for significance and a conservative Bonferroni-adjusted p < 0.003 threshold.

Table 3 Summary of the biomarkers that are significantly different between neurodegenerative disorders

Based on the conservative Bonferroni-adjusted threshold for significance, T-tau/Aβ1–42 ratio, T-tau, and P-tau were significantly elevated in AD compared with each of the other neurodegenerative disorders tested, except PNFA. AβX-42/AβX-40 was significantly lower in the AD cohort than in bvFTD and SD. Aβ1–42 concentrations were lowest in the AD and DLB groups; there was no evidence this biomarker differed between these two disease groups. NFL was significantly higher in all neurodegenerative disorders compared with healthy controls (Fig. 1); concentrations were higher in the SD and PNFA groups compared with the AD group (Table 3). APPα and APPβ were significantly lower in bvFTD compared with AD, PNFA, and healthy controls (Fig. 1).

AβX-38 and AβX-40 concentrations were lower in all neurodegenerative diseases, except SD, compared with controls (p < 0.001), but there were no significant pairwise differences between the diseases. YKL-40 concentrations were higher across all dementias relative to healthy controls but did not differ between diseases in the unadjusted analyses; after adjusting for age, sex, and time from symptom onset to LP there was evidence of a difference between DLB and bvFTD (p = 0.003).

Diagnostic utility of CSF biomarkers

Cut-points for each biomarker at a pre-determined fixed sensitivity of 85% are shown in Table 4. A summary of the ‘top 5’ biomarkers (by AUC) is given in Table 5, with the highest AUCs varying between 0.79 and 0.95; the specificities are also shown and varied between 24% and 100%.

Table 4 Optimal cut-point (95% CI) for AD* at a sensitivity of 85%
Table 5 AUC (and 95% CI) and specificity (at a fixed sensitivity of 85%) of the ‘top 5’ biomarkers, comparing AD with other neurodegenerative disorders and controls

Table 5 also shows the results from incorporating the best-performing biomarkers into a single model for each of the comparisons of AD against other groups. There was no suggestion that including more than one biomarker usefully improved AUC or specificity when compared to the single biomarker with highest AUC or specificity, respectively.

Validation

In the validation cohort we calculated sensitivity and specificity for Aβ1–42, T-tau, P-tau, T-tau/Aβ1–42, and AβX-42/X-40 using the optimal cut-points determined in the test cohort that provided a sensitivity of 85% (Additional file 2: Table S1). Sensitivities were consistent with the target of 85%, ranging from 83% to 88% for all biomarkers across all group comparisons, except for Aβ1–42, where the sensitivity was lower (71%). We also calculated sensitivities and specificities of these biomarkers for the pathologically or genetically defined cases (n = 26) (Additional file 2: Table S1), finding superior sensitivities (83–100%) and broadly comparable specificities given the smaller sample sizes and missing values for some biomarkers.

Discussion

In this single-centre, primarily clinic-based study we show that some biomarkers with proven ability to distinguish AD from healthy controls [2] also have utility for differentiating AD from other neurodegenerative dementias in clinical practice. In particular, T-tau/Aβ1–42 and AβX-42/X-40 ratios combine high sensitivity (85%) and good specificity (> 70%) for distinguishing AD not only from controls but also from SD and bvFTD; Aβ1–42 performed similarly well for distinguishing AD from controls and SD. In contrast, none of the biomarkers, or models with multiple biomarkers, could reliably differentiate AD from DLB or PNFA with high specificity.

The cut-points we generated are similar to those found in other studies. For differentiating AD subjects from healthy controls we found broad agreement with those reported in previous studies [14] for Aβ1–42, T-tau/Aβ1–42, and AβX-42/X-40. The exception was P-tau, where our cut-point (48.9 pg/mL) was lower than that quoted by the kit manufacturer (61 pg/mL) [15]. This may reflect our choice of a set sensitivity of 85% (resulting in a specificity of 54%) compared with the manufacturer’s 80% (with a specificity of 87%).

Overall, we found no evidence that models incorporating multiple biomarkers (or simple ratios) materially improved AUC or specificity compared to the best-performing single biomarker (or ratio) with highest AUC or specificity, respectively. Specifically, for AD vs healthy controls we were able to achieve good sensitivity and specificity using Aβ1–42, T-tau/Aβ1–42, and AβX-42/X-40 without using complex models of multiple biomarkers or formulae that have been proposed in other studies [16, 17].

It was possible to differentiate AD from SD or bvFTD with good sensitivity and specificity, particularly using AβX-42/X-40. While the 100% specificity for AβX-42/X-40 to distinguish AD from SD is inevitably influenced by the small SD sample size, the generally high specificities are likely to reflect the fact that SD is very pathologically homogeneous, typically being underpinned by TDP-43 type C pathology [18, 19], as was the case in the one SD case in this cohort who came to autopsy. Using AβX-42/X-40, the specificity for AD versus bvFTD was still high (85%) despite the fact that bvFTD can sometimes be caused by AD pathology, or have co-existent AD pathology [19].

We found that no single CSF biomarker or biomarker ratio achieved useful specificity for distinguishing AD from DLB [20, 21]. P-tau and T-tau were the best-performing biomarkers but, consistent with a previous meta-analysis [22], they were not diagnostically useful, achieving specificities of only approximately 50%. This is likely to reflect the fact that AD pathology is very common in pathologically confirmed DLB [23], as was seen in the one subject in this cohort with clinically diagnosed DLB who had mixed AD/DLB pathology at autopsy. Improving specificity is therefore likely to require a positive biomarker for DLB pathology, e.g. a reliable marker of alpha-synuclein inclusions. An enzyme-linked immunosorbent assay (ELISA) biomarker for DLB pathology has slightly improved the diagnostic utility of CSF biomarkers for differentiating AD from DLB [24]; more recently, a real-time quaking-induced conversion (RT-QuIC) assay showed significant promise as a highly specific test for DLB pathology [25].

None of the biomarkers was useful for differentiating AD from PNFA; the best-performing measure was NFL, which achieved a specificity of only 50%. PNFA is classically considered within the FTD spectrum, but 10–30% of cases have AD pathology at autopsy [26, 27]. In this cohort two PNFA cases had an autopsy, where mixed pathology (Pick’s disease, AD, cerebral amyloid angiopathy, and Lewy body pathology) and FTLD-TDP pathology were found. The relatively poor specificity for any CSF biomarker in this group is therefore likely to reflect cases of PNFA due to AD, and PNFA with mixed AD pathology, and emphasises the need for pathology-specific biomarkers for the non-AD dementias.

While the T-tau/Aβ1–42 ratio performed well in several of the disease group comparisons, neither T-tau nor P-tau was diagnostically useful alone, conferring specificities of at most 64%. CSF Aβ1–42 alone was relatively poor at distinguishing AD from other neurodegenerative disorders (except for SD), in line with other studies [22]. Specificity was, however, consistently improved using the AβX-42/X-40 ratio [28, 29, 30]. AβX-40 is the most abundant soluble Aβ peptide and less likely than Aβ1–42 to aggregate, and thus incorporating both in a ratio may account for inter-individual physiological differences in amyloid processing [31]. The AβX-42/X-40 ratio performed at least as well as the T-tau/Aβ1–42 ratio; adding T-tau to AβX-42/X-40 did not improve specificity, suggesting that the AβX-42/X-40 ratio alone may be a reliable means of identifying brain amyloid deposition.

While the focus of the study was on differentiating AD from other dementias, a number of potentially interesting findings emerge from some of the more novel biomarkers. Our finding that NFL concentration was highest in SD is consistent with a number of previous studies [32, 33, 34]. NFL is thought to be a marker of large axonal neurodegeneration [35] and is elevated in a number of non-AD diseases [36, 37, 38], particularly FTD and motor neurone disease [39]. We found that the concentration of YKL-40 was elevated in AD compared to controls, in keeping with prior studies [2, 40]. We did not find either APPα or APPβ to be useful in differentiating AD from controls.

This study has a number of caveats. We used clinical diagnosis based on a blinded independent assessment using contemporary clinical criteria to establish the diagnosis, rather than post-mortem confirmation of underlying pathology or pathologies. Very few CSF studies in dementia have pathological confirmation of diagnosis, and this is therefore a limitation of most work in the literature. However, we were able to confirm a definite pathological or genetic diagnosis in 26/245 subjects with dementia in the test cohort. In cases fulfilling clinical criteria for AD, approximately 10% had either pathological confirmation, genetic confirmation, or supportive amyloid imaging, with no false positive diagnoses. Similarly, bvFTD and SD diagnoses were supported by pathological confirmation in approximately 10% of cases with all having FTD pathology or mixed FTD/AD pathology.

There is not perfect concordance between clinical diagnosis and underlying pathology, and this varies considerably depending on the clinical syndrome. In patients diagnosed with probable AD, the sensitivity and specificity for underlying AD pathology are in the order of approximately 75% and 60%, respectively [41]. AD pathology is found in approximately 55% of cases of DLB [42], approximately 40% of PNFA cases [43], 5–6% of bvFTD [44], and between 0 and 15% of SD cases [19, 45, 46]. The results in this study are broadly consistent with these figures; indeed, the best specificity found for each group is strikingly similar to the proportion who would be expected not to have AD pathology at post mortem (SD 100%, bvFTD 85%, PNFA 50%, DLB 50%). This is therefore consistent with our interpretation that current biomarkers are good at distinguishing AD from syndromes that are not usually caused by AD (e.g. SD and bvFTD) but not from those commonly caused by AD (PNFA) or where there is AD co-pathology (DLB).

The numbers of samples in some groups were comparatively small, particularly in the rarer clinical syndromes, but are likely to represent the proportion of patients who might undergo diagnostic CSF examination. There is no optimal means of determining biomarker cut-points [12], but we used a consistent and recommended method of fixing sensitivity at 85%. Inter-plate variability differed depending on the analyte measured. While most assays achieved inter-day and inter-plate variability of < 10%, we acknowledge that the inter-plate CVs for the APP ELISA assays were > 10% and that these results should be interpreted with caution. Finally, while we used an extended CSF panel, this was not comprehensive and did not, for example, include neurogranin, which may have good specificity for AD [47].

Conclusions

Biomarkers in routine clinical use (particularly AβX-42/X-40 and T-tau/Aβ1–42 ratios) not only have utility in distinguishing AD from controls, but also from bvFTD and SD. These measures, and the other biomarkers tested, have less utility in differentiating AD from DLB and PNFA, likely reflecting varying degrees of AD (amyloid) pathology in these conditions. This study provides an evidence base for the use of CSF biomarkers for the differential diagnosis of AD, highlights the potential utility of the AβX-42/X-40 ratio, and shows that novel biomarkers specific for other non-AD disorders are required.