Introduction

High-quality, reliable, well-validated biomarkers that reflect biological processes are required to understand the complexity of neurodegenerative diseases, such as Alzheimer’s disease (AD) [1]. In AD, biomarkers would make it possible to increase diagnostic accuracy, guide patient stratification, monitor the effect of treatment on underlying pathologies, and obtain surrogate measures of disease activity to monitor and evaluate outcomes [2]. Comparability of cerebrospinal fluid (CSF) biomarkers between studies has been limited, partially due to methodological differences across cohorts. This issue has been somewhat circumvented by fully automated assays used in research and standardization of preanalytical procedures [3, 4]. Importantly, some CSF biomarker immunoassays (amyloid-β1–42 [Aβ42], phosphorylated 181P tau [pTau], and total tau [tTau]) are well validated for future widespread use in the clinical setting [5,6,7]. However, the challenge remains to have standardized clinical endpoints, statistical approaches, and immunoassay platforms that would enable unified biomarker utility and interpretation of results across independent multicenter studies.

The NeuroToolKit (NTK; Roche Diagnostics International Ltd) is a panel of 12 automated CSF immunoassays for biomarkers linked to neurodegeneration [8, 9]. This panel is designed to accelerate biomarker development in AD and other neurological disorders by generating robust, comparable, high-quality biomarker data across multiple research and clinical cohorts.

Using CSF biomarker data collected from participating sites, we aimed to address three prioritized research questions: (i) Comparative: Can a correction factor for biomarkers affected by different preanalytical procedures be applied that allows for comparison across multiple cohorts? (ii) Diagnostic: How much do the biomarker concentrations vary between cognitively unimpaired (CU) individuals and patients with mild cognitive impairment (MCI) or AD-dementia? (iii) Clinical: How well do biomarker concentrations correlate with clinical measures of cognition?

Methods

This analysis utilizes data from three cohorts participating in the NTK project, which were selected to provide data spanning the entire AD continuum. The ALFA+ study (NCT02485730) aimed to characterize preclinical AD in CU individuals, most with a family history of AD (n=398) [8]. The Wisconsin cohort (n=651) comprised several longitudinal studies that utilized the same preanalytic protocol and included CU individuals, participants with MCI, or AD-dementia, enriched for parental history of AD [10]. The Abby/Blaze cohort (n=164) comprised participants in the ABBY (NCT01343966) and the BLAZE (NCT01397578) studies for patients with mild to moderate AD-dementia [11, 12]. Full eligibility criteria for each of the respective cohorts are described in the Supplementary Methods. All cohorts in the present analysis excluded participants who had comorbidities that would affect cognition. Some medications that affected cognition, such as sleep aids, were permitted in the Wisconsin cohort.

For the purposes of this analysis, the correction reference group for each cohort was defined as participants who were CU, APOE-ε4 allele non-carriers, and aged <65 years. As the Abby/Blaze cohort only included participants with AD-dementia, a correction reference group could not be defined.

Biomarkers

CSF biomarkers included chitinase-3-like protein-1 (YKL40), soluble triggering receptor expressed on myeloid cells 2 (sTREM2), glial fibrillary acidic protein (GFAP), interleukin (IL)-6, neurofilament light (NfL), neurogranin, S100, alpha-synuclein (α-Syn), amyloid-β1–40 (Aβ40), Aβ42, pTau, and tTau. CSF biomarker samples obtained at baseline/enrollment were included. All biomarkers were measured using the NTK panel of immunoassays, which currently includes the commercially available Elecsys β-amyloid (1–42) CSF, Elecsys total Tau CSF, and Elecsys phospho-Tau (181P) CSF immunoassays, and robust prototype assays for the nine remaining biomarkers. Biomarkers Aβ42, Aβ40, pTau, tTau, S100, and IL-6 were measured using the cobas e 601 analyzer, and the remaining biomarkers were measured using the cobas e 411 analyzer (both Roche Diagnostics International Ltd).

Preanalytical factor correction

The preanalytical procedures employed by each cohort are detailed in the Supplementary Materials. Sample collection within the Wisconsin cohort was initiated ahead of standardized preanalytical protocol dissemination [9]; therefore, correction factors were calculated in the respective correction reference groups (participants who were CU, APOE-ε4 allele non-carriers, and aged <65 years) of the Wisconsin and ALFA+ cohorts, with the ALFA+ cohort taken as the “standard cohort.” The correction factor was calculated using the formula:

$$\textrm{Correction factor}=\frac{\textrm{median}\left(\textrm{ALFA+ cohort}\right)}{\textrm{median}\left(\textrm{Wisconsin cohort}\right)}$$

Application of the correction factor was deemed successful in accounting for preanalytical variations by assessment of biomarker distribution overlap before and after correction, i.e., if following correction the biomarker distributions had good overlap, the correction was a success. The correction was applied to CSF biomarkers: α-Syn, Aβ40, and Aβ42 (Table S1), which are known to be significantly affected by preanalytical protocols, specifically related to the ability of these biomarkers to stick to the tubes used during testing [4, 13]. Conversely, the remaining biomarkers, such as pTau and tTau, appear to be unaffected by the tubes employed [4]. Natural variations between the cohorts were unaffected. Variations between cohorts may also result from inherent cohort differences or from cultural bias, e.g., in cognitive assessments.
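As an illustration only, the correction step described above can be sketched in a few lines of Python. The `Sample` record, the cohort labels, and the function names are hypothetical and are not part of the NTK software. The sketch also assumes that the factor is applied multiplicatively to the Wisconsin values, which follows from the formula but is not stated explicitly in the text.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    cohort: str                    # "ALFA+" or "Wisconsin" (illustrative labels)
    cognitively_unimpaired: bool
    apoe4_carrier: bool
    age: float
    value: float                   # concentration of one biomarker, e.g. Abeta42


def in_reference_group(s: Sample) -> bool:
    """CU, APOE-e4 non-carrier, aged <65 years, as defined in the Methods."""
    return s.cognitively_unimpaired and not s.apoe4_carrier and s.age < 65


def correction_factor(samples: list[Sample]) -> float:
    """median(ALFA+ reference group) / median(Wisconsin reference group)."""
    ref = [s for s in samples if in_reference_group(s)]
    alfa = [s.value for s in ref if s.cohort == "ALFA+"]
    wisconsin = [s.value for s in ref if s.cohort == "Wisconsin"]
    return median(alfa) / median(wisconsin)


def apply_correction(samples: list[Sample], factor: float) -> list[float]:
    """Assumption: Wisconsin values are scaled; ALFA+ (the standard) is unchanged."""
    return [s.value * factor if s.cohort == "Wisconsin" else s.value
            for s in samples]
```

The factor is estimated only in the reference group but applied to every Wisconsin sample, so that differences between disease stages within that cohort are preserved.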

CSF amyloid-β cut-off value derivation

Amyloid-β pathology was determined by CSF Aβ42/Aβ40 ratio for this analysis; the results are provided in the Supplementary Materials. To derive the cut-off values for the CSF Aβ42/Aβ40 ratio, Gaussian mixture modeling was independently applied to the ALFA+ and Wisconsin cohorts. The optimal number of Gaussians was set as two, after testing models with two, three, and four Gaussians (Supplementary Materials). Derived cut-off values were defined as x±2*s with differently defined parameters for x and s: (i) x=μ, s=σ (mu, sigma; Gaussian parameters of the amyloid-β negative [A−] population); (ii) x=mean, s=SD of samples assigned to A− population; and (iii) x=median, s=rSD of samples assigned to A− population (Tables S2–S5). The resulting cut-off values for the Aβ42/Aβ40 ratio were defined as 0.071 for the ALFA+ cohort [8] and 0.060 (0.075 after correction) for the Wisconsin cohort. Only patients with AD-dementia were included in the ABBY and BLAZE studies; therefore, cut-off values were not defined as this cohort was not divided by amyloid-β status. For comparison, the cut-off values determined for the ALFA+ cohort were applied to the Wisconsin cohort after correction.
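The cut-off derivation can be illustrated with a minimal sketch, under several assumptions: a plain expectation-maximization fit of a two-component one-dimensional Gaussian mixture stands in for whatever software was actually used, the component with the higher mean is taken to be the A− population, and both x−2s and x+2s are returned because this section does not state which bound serves as the cut-off. All function names are hypothetical, and the fit has no safeguards against degenerate components.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=100):
    """Plain EM for a two-component 1-D Gaussian mixture.

    Returns a list of (weight, mu, sigma) tuples sorted by mu.
    """
    xs = sorted(values)
    n = len(xs)
    halves = [xs[: n // 2], xs[n // 2:]]
    # Crude initialisation from the lower and upper halves of the data.
    mu = [mean(h) for h in halves]
    sigma = [max(pstdev(h), 1e-6) for h in halves]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update weight, mean, and SD of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return sorted(zip(weight, mu, sigma), key=lambda c: c[1])


def two_s_bounds(x, s, k=2.0):
    """Return (x - k*s, x + k*s) for any of the three (x, s) definitions."""
    return x - k * s, x + k * s
```

Definition (i) corresponds to passing the fitted μ and σ of the A− component to `two_s_bounds`; definitions (ii) and (iii) use the same function with the mean and SD, or the median and rSD, of the samples assigned to that component.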

Cognitive assessments

All participants completed the Mini-Mental State Examination (MMSE) [14] and Clinical Dementia Rating scale Sum of Boxes (CDR-SB) [15] cognitive assessments during the respective studies. The time between CSF biomarker collection at baseline/enrollment and cognitive assessment varied for each participant and in some cases was up to 1 year. For the longitudinal studies, the cognitive assessment closest to the first lumbar puncture was used in this analysis, including those cognitive assessments performed before the lumbar puncture. The ALFA+ cohort only included participants with CDR-SB=0, per the study exclusion criteria [16]. Calculations of a modified Preclinical Alzheimer Cognitive Composite (PACC) were based on methods proposed by Donohue et al. [17], Papp et al. [18], and Jonaitis et al. [19]. Variables included in the composite in the ALFA+ cohort were Semantic Fluency (animal naming), Free and Cued Selective Reminding Test with Total Immediate Recall, and Wechsler Adult Intelligence Scale-Revised Coding subtest. In the Wisconsin cohort, Semantic Fluency (animal naming), Rey Auditory Verbal Learning Test Trials 1–5 Sum, and Wechsler Adult Intelligence Scale-Revised Coding subtest were included. PACC was not used for the Abby/Blaze cohort.

Statistical analyses

To compare biomarker concentrations across cohorts, the median concentration and interquartile range of all NTK CSF biomarkers before and after correction were calculated for all cohorts, therefore enabling the inclusion of outlying samples. The robust-to-outliers standard deviation (rSD) was estimated based on percentile values (rSD=[value of 84.13% percentile − value of 15.87% percentile]/2). The distributions of the CSF biomarker concentrations within the same disease stage across the cohorts were statistically compared before and after correction. To compare baseline/enrollment values for each CSF biomarker for both CU individuals and patients with AD-dementia separately, correlation values were computed using Spearman’s rho.
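The two summary statistics named above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the percentile helper uses linear interpolation between order statistics, which is an assumption (the text does not specify the percentile method), and Spearman’s rho is computed as the Pearson correlation of average ranks. All function names are hypothetical.

```python
import math


def percentile(values, p):
    """Percentile with linear interpolation (assumed method), p in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def robust_sd(values):
    """rSD = (84.13th percentile - 15.87th percentile) / 2."""
    return (percentile(values, 84.13) - percentile(values, 15.87)) / 2.0


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

For normally distributed data the 15.87th and 84.13th percentiles lie one standard deviation either side of the mean, so the rSD matches the ordinary SD while being far less sensitive to outlying samples.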

Fold change in CSF biomarker concentrations was calculated using the canonical fold change calculation, comparing CU A− individuals from either the ALFA+ or Wisconsin cohorts with (i) CU amyloid-β-positive (A+) individuals (ALFA+), (ii) CU A+ individuals (Wisconsin), (iii) patients with MCI A+ (Wisconsin), (iv) patients with AD-dementia (Wisconsin), and (v) patients with AD-dementia (Abby/Blaze). Receiver operating characteristic (ROC) analyses, presented with an area under the curve (AUC) and 95% confidence intervals, were performed to compare CSF biomarker concentrations in CU A− individuals with (i) CU A+ individuals (ALFA+), (ii) CU A+ individuals (Wisconsin), (iii) patients with MCI A+ (Wisconsin), and (iv) patients with AD-dementia (Wisconsin).
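A minimal sketch of these two comparisons is given below, with hedged assumptions: the fold change is taken here as the ratio of the group median to the CU A− reference median (the exact definition is not given in this section), and the AUC is computed from pairwise comparisons, which is equivalent to the Mann-Whitney statistic. Confidence intervals are omitted because their method is not specified. The function names are hypothetical.

```python
from statistics import median


def fold_change(group, reference):
    """Assumed definition: median(group) / median(CU A- reference)."""
    return median(group) / median(reference)


def roc_auc(positives, negatives):
    """Probability that a random positive exceeds a random negative.

    Ties count as one half. For biomarkers that decrease with disease
    (e.g. Abeta42) the result falls below 0.5, and 1 - AUC can be reported.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Here `positives` would hold the biomarker concentrations of a disease-stage group (for example, patients with AD-dementia) and `negatives` those of the CU A− individuals.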

Spearman’s rho correlation between the concentration of all the biomarkers and cognitive performance, reflected in MMSE and/or PACC scores, in the different disease stages was computed. To assess cognitive scores from each cohort on a similar scale relative to that cohort’s control participants, standardization of each individual raw score into z-scores was performed using the means and rSDs obtained from the A− samples of each control group as a reference. All three obtained z-scores were averaged. The obtained PACC values were re-standardized using the mean and rSD from the A− samples of the control group. Missing data in any of the raw scores led to a missing PACC value.
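The composite calculation described above can be sketched as follows, assuming the reference means and rSDs of the A− control samples have already been computed. The function names and the dictionary layout are hypothetical and serve only to make the steps concrete.

```python
def z_score(raw, ref_mean, ref_rsd):
    """Standardize a raw score against the A- control reference."""
    return (raw - ref_mean) / ref_rsd


def composite(raw_scores, ref_stats):
    """Average the z-scores of the component tests.

    raw_scores: test name -> raw score, or None when missing
    ref_stats:  test name -> (mean, rSD) in the A- control samples
    Returns None if any component is missing.
    """
    zs = []
    for test, (ref_mean, ref_rsd) in ref_stats.items():
        raw = raw_scores.get(test)
        if raw is None:
            return None
        zs.append(z_score(raw, ref_mean, ref_rsd))
    return sum(zs) / len(zs)


def pacc(raw_scores, ref_stats, composite_ref_mean, composite_ref_rsd):
    """Re-standardize the averaged composite against the A- controls."""
    c = composite(raw_scores, ref_stats)
    if c is None:
        return None
    return z_score(c, composite_ref_mean, composite_ref_rsd)
```

Because each cohort is standardized against its own A− control group, the resulting values express performance relative to that cohort’s controls rather than on a shared raw-score scale.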

Results

No differences were found between cohorts in age, sex, years of education, MMSE score, or APOE-ε4 carriership status within the different disease stages (Table 1). Prior to correction for preanalytical protocol differences, concentrations of α-Syn, Aβ40, and Aβ42 were significantly higher in CU A− individuals in the ALFA+ cohort compared with the Wisconsin cohort. Biomarker distributions for the ALFA+ and Wisconsin cohorts (Fig. S1) illustrate that biomarker concentrations were comparable between cohorts in the reference groups following correction. CSF biomarkers GFAP, IL-6, S100, pTau, and pTau/Aβ42 were significantly lower in the ALFA+ cohort compared with the Wisconsin cohort in CU A− individuals (Table 2). These differences may arise from intrinsic cohort differences or from measurement bias; therefore, the raw values of these biomarkers cannot be directly compared between cohorts, but the trends within cohorts can be compared with each other. For the remaining biomarkers, there were no significant differences between cohorts. The details of all the CSF biomarker concentrations, including before and after correction for those affected, across all cohorts and disease stages are reported in Table 2. Biomarker concentration distributions before and after correction represented as boxplots are shown in Figs. 1 and 2. Classification of amyloid-β status, including the use of a single cut-off value for both the ALFA+ and Wisconsin cohorts, had no impact on the comparability of biomarker concentrations across cohorts (Table S6, Table S7).

Table 1 Characterization of cohorts (amyloid status as defined by Aβ42/Aβ40 ratio)
Table 2 Biomarker data for each cohort, including uncorrected and corrected data (amyloid status as defined by Aβ42/Aβ40 ratio)
Fig. 1

Box plots of uncorrected biomarkers by disease stage and amyloid-β status as defined by Aβ42/Aβ40 ratio. Aβ40, amyloid-β1–40; Aβ42, amyloid-β1–42; AD, Alzheimer’s disease; CU, cognitively unimpaired; GFAP, glial fibrillary acidic protein; MCI, mild cognitive impairment; NfL, neurofilament light; pTau, phosphorylated tau; tTau, total tau; YKL40, chitinase-3-like protein-1

Fig. 2

Box plots of biomarkers by disease stage and amyloid-β status as defined by Aβ42/Aβ40 ratio; A uncorrected values and B corrected values. Aβ40, amyloid-β1–40; Aβ42, amyloid-β1–42; AD, Alzheimer’s disease; CU, cognitively unimpaired; MCI, mild cognitive impairment; pTau, phosphorylated tau

Of the biomarker values that did not undergo correction, YKL40, GFAP, NfL, neurogranin, pTau, and tTau were all increased in patients with AD-dementia or MCI who were A+ compared with CU individuals and patients with MCI who were A−. Both α-Syn and the pTau/Aβ42 ratio values were increased in patients with AD-dementia or MCI who were A+. Values for Aβ42 and the Aβ42/Aβ40 ratio were decreased in patients with AD-dementia or MCI. These results applied to both the corrected and uncorrected values. Correlations of CSF biomarkers were comparable across cohorts within the same disease stage (Supplementary Results, Fig. S2).

Diagnostic variations

The fold change in CSF biomarker concentration compared with CU A− individuals was comparable across cohorts (Tables S8–S10). Across all cohorts, the fold change showed a similar pattern regardless of whether it was calculated using values from the CU A− individuals group from the same cohort or from a different cohort. NfL, pTau, and tTau concentrations were higher in patients with AD-dementia, followed by patients with MCI, compared with CU individuals. Figure 3 shows the fold change displayed as a forest plot of the biomarker concentration compared with the CU A− individual group (derived from either the Wisconsin cohort or the ALFA+ cohort).

Fig. 3

Fold change vs CU, Aβ42/Aβ40 amyloid-β negative with age <65 years. Aβ40, amyloid-β1–40; Aβ42, amyloid-β1–42; AD, Alzheimer’s disease; CU, cognitively unimpaired; GFAP, glial fibrillary acidic protein; IL, interleukin; MCI, mild cognitive impairment; NfL, neurofilament light; pTau, phosphorylated tau; sTREM2, soluble triggering receptor expressed on myeloid cells 2; tTau, total tau; YKL40, chitinase-3-like protein-1

ROC analyses (Fig. S3) and AUC data (Fig. 4 and Table S11) confirmed the results shown by fold change. Among the CSF biomarkers evaluated, NfL (followed by YKL40, GFAP, Aβ42, pTau, and tTau) had the greatest diagnostic value for discriminating CU A− individuals from patients with MCI or AD-dementia. The results were not affected by amyloid-β status as Aβ42 showed the greatest ability of all the biomarkers measured to discriminate CU A− individuals from CU A+ individuals in both the ALFA+ and Wisconsin cohorts.

Fig. 4

Forest plot of ROC analyses AUC with 95% confidence intervals (amyloid status as defined by Aβ42/Aβ40 ratio). Aβ40, amyloid-β1–40; Aβ42, amyloid-β1–42; AD, Alzheimer’s disease; AUC, area under the curve; CI, confidence interval; CU, cognitively unimpaired; GFAP, glial fibrillary acidic protein; IL, interleukin; MCI, mild cognitive impairment; NfL, neurofilament light; pTau, phosphorylated tau; ROC, receiver operating characteristics; sTREM2, soluble triggering receptor expressed on myeloid cells 2; tTau, total tau; YKL40, chitinase-3-like protein-1

Correlation with clinical measures of cognition

The correlation between CSF biomarker concentration and cognitive scores (MMSE and/or PACC) appears comparable across cohorts. The correlation of the biomarker concentration and the cognitive scores in the different disease stages and cohorts is shown as a forest plot in Fig. 5. For CU A+ individuals, the strongest correlation among different biomarker concentrations and PACC score was found in YKL40, GFAP, and NfL. For patients with AD-dementia (Wisconsin and Abby/Blaze cohorts), the strongest correlation between biomarker concentrations and MMSE was found in NfL.

Fig. 5

Baseline correlation between NTK biomarkers and cognitive scores (amyloid status as defined by Aβ42/Aβ40 ratio). Aβ40, amyloid-β1–40; Aβ42, amyloid-β1–42; AD, Alzheimer’s disease; CI, confidence interval; CU, cognitively unimpaired; GFAP, glial fibrillary acidic protein; IL, interleukin; MCI, mild cognitive impairment; NfL, neurofilament light; NTK, NeuroToolKit; PACC, Preclinical Alzheimer Cognitive Composite; pTau, phosphorylated tau; sTREM2, soluble triggering receptor expressed on myeloid cells 2; tTau, total tau; YKL40, chitinase-3-like protein-1

Discussion

The NTK project involves the evaluation of a large panel of CSF biomarkers of AD pathology and glial activity that may address the goals outlined in the introduction for diagnosis, prognosis, predicting treatment response, and surrogate outcomes in AD. Following recently published studies describing the utility of the NTK immunoassays in clinical settings [8, 9, 20], this analysis demonstrates that the application of a correction factor for biomarkers affected by preanalytical variability improves the comparability of NTK data across three independent cohorts spanning the AD continuum. After establishing the adequacy of the correction factor, we examined the cohorts for insights regarding biomarker relationships across the disease stages. In response to the three prioritized research questions discussed in the introduction, this analysis provided the following answers.

Comparative: The correction factor for preanalytical protocol variations employed here enabled the comparison of data across cohorts, rendering significant differences in CSF biomarker concentrations irrelevant. The correction factor was not developed specifically for the present cohorts and is generalizable, which enables the introduction of further cohorts into the comparative framework. Comparability of CSF biomarker data acquired with the NTK was found to be robust to methods of amyloid-β status classification, suggesting the possibility of introducing future cohorts with or without having calculated their own cut-off values. Further investigation with additional cohorts is needed. Correlations between all CSF biomarkers were consistent for CU individuals in both the ALFA+ and Wisconsin cohorts and for patients with AD-dementia in both the Wisconsin and Abby/Blaze cohorts. These results indicate that the data produced by the NTK were comparable across cohorts, meaning results and subsequent analyses may be combined.

Diagnostic: CSF biomarker differences for CU A− individuals compared with patients with AD-dementia were comparable across cohorts, regardless of the CU A− individual cohort origin. The fold change for CU A− individuals compared with patients with MCI and patients with AD-dementia was largest for NfL, pTau, and tTau. The diagnostic utility of the aforementioned biomarkers was confirmed with ROC and AUC analyses, suggesting that these biomarkers, together with Aβ42, are the most appropriate for disease stage differentiation.

Clinical: Correlations of CSF biomarker concentrations with cognitive score were comparable across cohorts. Associations between biomarker concentrations and clinical cognitive scores have been the focus of several studies to validate measures of cognition, but only a few studies include a panel of CSF biomarkers [21,22,23,24,25]. Correlations between CSF biomarker concentrations and cognitive scores are indicative of the prognostic utility of the biomarker in question [21]. Here, we found that clinical cognitive scores had the strongest correlation with NfL in the CU A+ and AD-dementia disease stages. For CU A− individuals, NfL correlated with the PACC score in the Wisconsin cohort, but less so in the ALFA+ cohort. Our findings on the moderate, inverse correlation of NfL with clinical measures of cognition are consistent with the literature [26], indicating that markers of neurodegeneration track with cognition. No strong correlations of biomarkers for patients with MCI A−/A+ were observed.

Consistently across the three prioritized research questions examining correlations with biomarker concentrations, NfL is the most promising CSF biomarker for AD along with the well-established Aβ42, pTau, and tTau. No statistically significant differences in NfL concentration between cohorts were observed within disease stage. Additionally, NfL differentiated disease stage and correlated with measures of cognition. The observation of NfL differentiation between disease stages is consistent with recent literature, which includes the correlation of NfL with imaging markers of neurodegeneration [27]. However, NfL is not a specific biomarker for AD, but rather for several neuroinflammatory diseases [28,29,30]. Analyzing the biomarker results of NfL within the context of other biomarkers of neuroinflammation, glial activity, and known AD pathology is key to the utility of NfL within the AD landscape. As such, the NTK panel provides clinical utility from both the biomarkers at an individual level and as a whole. In the current study, NfL did not strongly correlate with the other biomarkers included in the NTK; these results are in contrast to the literature, which describes NfL correlations with (plasma) GFAP [31], pTau, tTau, and neurogranin [26].

In the AD field, vast amounts of biomarker data are generated; however, comparison between biomarker datasets is not routinely performed due to the heterogeneity of the disease and the technical variability associated with preanalytical protocols and the immunoassay platforms used. Several pre-competitive efforts to generate comparable datasets and to harmonize and maximize the interpretability of the biomarker results collected, such as the NTK project and the Global Biomarker Standardization Consortium (GBSC; a collaborative effort in the acceleration of biomarker standardization [32]), are ongoing. Measures have also been taken to standardize methods of CSF collection and measurement to ensure reproducible and consistent results across multiple cohorts and immunoassays [3, 4]. The aim is to clarify the clinical utility of biomarkers to inform clinical trial design and diagnostic development.

To our knowledge, the NTK project is the first large-scale project that aims to generate robust and comparable biomarker data across multiple independent cohorts in AD. While cross-cohort examinations of datasets have been employed for single biomarker validation studies [33], a project with the number of biomarkers described in this analysis is uncommon. Other large-scale studies investigating the utility of biomarkers at various disease stages include the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [34] and the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging [35]. However, these studies do not include multiple cohorts and are geographically restricted, and the biomarkers are measured using multiple immunoassay platforms. The NTK project seeks to maximize the comparability of biomarker measurements between cohorts by generating and analyzing biomarker data using a common platform and immunoassay.

A key strength of this study is the large dataset included, spanning several countries and native languages. Importantly, the cloud-based DRE employed in the NTK project allows for an increase in data sharing with interpretation unified across several cohorts. The researchers involved in the NTK project retain control over their data and through collaborative efforts they can work with other consenting cohorts to enrich their datasets and provide further insights. The standardized statistical analysis on single cohorts, as well as the correction procedure, comparisons, and graphical representations across different cohorts described here, can also be applied to other cohorts via the NTK app (currently under development). Following the approval and willingness of collaboration of all data owners, and with the accrual of added data and cohorts, the feasibility of the NTK app will be thoroughly explored.

Limitations

Possible limitations of the present study include the decision not to use amyloid-β positron emission tomography data as a common comparator across cohorts to determine amyloid-β status, and the use of only single-timepoint data for individual assessments (e.g., only one MMSE assessment was included per individual). Future studies should include the clinical follow-up data for the participants. In the longitudinal cohorts included in this study, CSF sample collection and cognitive assessments were not necessarily completed at the same time. Cognitive assessments and CSF sample collection were performed within a year of each other, or the sample was excluded from this analysis. Although AD is a slowly evolving disease, this may have led to variations in the correlations for these measures. However, for individuals with MCI and CU individuals, pTau, tTau, Aβ42, and neurogranin have been shown to be stable over a 2-year period [36, 37]. In addition, in the Abby/Blaze cohort of individuals with AD-dementia, cognitive assessment and CSF sample collection both occurred at baseline. Compared with the CU (n=944) and AD-dementia (n=213) patient populations, the MCI population (n=56) is relatively small. Expanding the results within this important patient population may lead to further insights into AD pathogenesis and the clinical utility of the NTK panel of immunoassays.

The correction for preanalytical protocols was based on several assumptions: (i) the ALFA+ cohort samples were all collected according to the standardized protocol, but the Wisconsin cohort samples were not; (ii) sample handling procedures only affect α-Syn, Aβ42, and Aβ40, which may be disproved with future research; (iii) sample handling procedures have a similar effect on samples with high concentrations and those with lower concentrations. These assumptions may explain the variations between cohorts seen in the biomarker values that were not corrected; however, it is beyond the scope of this study to determine the cause of such variations. In addition, correction for preanalytical protocols was employed rather than harmonization of all biomarker results. While harmonization of the biomarker results would have allowed for head-to-head analysis, no bridging samples (measured on the same instrument at the same time from all cohorts), which are required to perform a harmonization, were collected; therefore, correction was employed. The rationale for the correction factor was that existing cohorts were used, meaning some of the samples had been collected prior to the initiation of the present analysis. As such, reference materials and methods were not used. The correction factor is therefore employed as a solution to enable the inclusion of these data. If the analysis were to be performed as part of a prospective study, the correction factor might not be required, provided that standardized procedures and reference materials were employed.

In summary, the robust prototype NTK panel of immunoassays provides biomarker data that can be used to support the utility of biomarkers in clinical trials and in the diagnostic clinical setting. Our study supports the feasibility of cross-cohort collation of data provided by the NTK immunoassays to enable further insight gathering into the underlying pathogenesis of AD. Our next step is to use a DRE to implement the standardized statistical analysis plan and increase the interpretation of results across studies. In addition, the NTK project will expand to include supplementary CSF immunoassays beyond the 12 in this study, as well as plasma biomarker immunoassays.