Background

Biomarkers such as amyloid beta1-42 (Abeta) and phosphorylated tau (p-tau) in cerebrospinal fluid (CSF) provide evidence of the neuropathological process underlying a patient’s cognitive decline [1]. Determining the underlying cause of cognitive complaints is particularly useful in the pre-dementia stage of mild cognitive impairment (MCI), as it provides important prognostic information [2]. Appropriate use criteria have been published to guide clinicians in the use of CSF biomarkers; these criteria consider longstanding and unexplained MCI an indication for additional biomarker testing [3, 4]. The clinical practice guidelines of the American Academy of Neurology (AAN) for MCI are more reluctant and recommend against the use of biomarkers in clinical practice, as it is currently unclear how to value additional diagnostic testing in pre-dementia stages [5]. In line with these guidelines, clinicians tend to implicitly steer against biomarker testing in MCI patients [6], even though multiple studies have shown the prognostic value of CSF biomarkers in MCI at the group level [7, 8]. We think that this suboptimal use of biomarkers in the clinic might be due to a lack of practical, cost-efficient tools.

In previous work, we constructed personalized prognostic models that estimate the prognosis of an individual MCI patient, in terms of conversion to dementia, based on the available biomarkers [9, 10]. We showed that the use of CSF biomarkers improves prognostic performance over demographic and magnetic resonance imaging (MRI) information alone. Nonetheless, biomarker testing is unlikely to contribute to a more accurate prognosis in every MCI patient [11,12,13]. Here, we took as a starting point the notion that these same models could have additional value as a decision support tool to aid clinicians in selecting patients for additional CSF biomarker testing.

We aimed to derive an algorithm to select MCI patients for CSF testing and to provide an estimate of the optimal proportion of patients to undergo CSF biomarker testing.

Methods

Patients

We selected n = 402 patients with a baseline diagnosis of MCI from the Amsterdam Dementia Cohort [14, 15]. Inclusion criteria were availability of MRI data, CSF data, and at least 6 months of follow-up. Diagnostic workup consisted of a standardized 1-day baseline assessment. Clinical diagnosis was made by consensus in a multidisciplinary meeting [14]. Until early 2012, the MCI diagnosis was based on Petersen’s criteria [16]. From 2012 onwards, we used the core clinical criteria for MCI of the National Institute on Aging-Alzheimer’s Association (NIA-AA) [2]. Standardized annual follow-up included a visit with the neurologist and neuropsychologist, after which the diagnosis was re-evaluated in a multidisciplinary meeting of the professionals involved. Specific dementia types were diagnosed using established clinical criteria [17,18,19,20,21,22].

MRI

Scans before 2008 were performed on 1.0 and 1.5 Tesla scanners (Siemens Magnetom Avanto, Vision, Impact and Sonata, GE Healthcare Signa HDXT). From 2008 onwards, MRI of the brain was performed on 3 T scanners (MR750, GE Medical Systems, Milwaukee, WI, USA; Ingenuity TF PET/MR, Philips Medical Systems, Best, The Netherlands; Titan, Toshiba Medical Systems, Japan). All images were acquired according to a standardized protocol [23]; in this study, we used only the sagittal 3D T1-weighted images with coronal reformats. All scans were reviewed by experienced neuroradiologists. We quantified left and right hippocampal volumes (HCV, mL) using FSL FIRST (FMRIB’s Integrated Registration and Segmentation Tool) and summed them for analysis [24].

CSF analysis

CSF was obtained by lumbar puncture, collected in polypropylene tubes (Sarstedt, Nümbrecht, Germany), and processed according to international guidelines [25,26,27]. Abeta (1-42) and phosphorylated tau (p-tau) concentrations were measured using sandwich ELISAs (Innotest, Fujirebio, Gent, Belgium) [28]. We adjusted Abeta concentrations for upward drift [29].

Stepwise approach

To determine which proportion of MCI patients should receive additional biomarker testing, we applied a stepwise approach. The procedure consisted of three steps.

Step 1: obtain progression probability

We took as a starting point our recently published and validated prognostic models that predict the probability of progression to dementia within 3 years in MCI patients. These models were constructed with Cox regression and are described and validated in van Maurik et al. (2019) [9]. Here, we assigned dementia progression probabilities (range 0–100%) to patients based on clinical data only (i.e., without CSF biomarkers), under two diagnostic scenarios, using the following two models [9]:

  1. Prognostic model based on demographic characteristics only (age, sex, and Mini-Mental State Examination (MMSE) score; further referred to as “demographics only”)

  2. Prognostic model based on demographic characteristics and hippocampal volume (HCV) (age, sex, MMSE, and HCV; further referred to as “demographics and MRI”)

We report Harrell’s C statistic [30] and 3-year Brier scores [31, 32]. Harrell’s C statistic compares the event times of pairs of patients and hence measures how well the model discriminates between patients with different times to dementia. A high Harrell’s C does not, however, imply that the model’s progression probabilities are well calibrated to the data. Therefore, we report Harrell’s C together with the 3-year Brier score. The 3-year Brier score measures the quadratic distance between the dementia status after 3 years and the model’s progression probability, and thus reflects prognostic accuracy, capturing both discrimination and calibration.
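To make the two performance measures concrete, the following Python sketch computes Harrell’s C for right-censored follow-up data and a 3-year Brier score. It is a simplified illustration, not the implementation used in this study: the function names `harrell_c` and `brier_at_horizon` are hypothetical, pairs with tied follow-up times are skipped, and patients censored before 3 years are simply dropped from the Brier score instead of being handled with censoring weights.

```python
import numpy as np


def harrell_c(time, event, risk):
    """Harrell's C for right-censored follow-up data (simplified sketch).

    A pair (i, j) is comparable when patient i progressed (event[i] == 1)
    before patient j's follow-up time. It is concordant when i also has the
    higher predicted risk; tied risks count as half. Pairs with tied times
    are skipped in this sketch.
    """
    time, event, risk = (np.asarray(a, dtype=float) for a in (time, event, risk))
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if event[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def brier_at_horizon(time, event, prob, horizon=3.0):
    """Naive Brier score for dementia status at `horizon` years.

    Status is 1 if dementia was diagnosed by the horizon and 0 if the patient
    was followed dementia-free beyond it. Patients censored before the horizon
    have unknown status and are dropped (no censoring weights applied).
    """
    time, event, prob = (np.asarray(a, dtype=float) for a in (time, event, prob))
    progressed = (time <= horizon) & (event == 1)
    known = progressed | (time > horizon)
    status = progressed.astype(float)
    return float(np.mean((prob[known] - status[known]) ** 2))
```

Here `risk` can be any score where higher values mean faster expected progression, such as the model’s 3-year progression probability; for the Brier score, lower values indicate better prognostic accuracy.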

Step 2: refine prognosis using a stepwise approach

We reasoned that patients with high or low progression probabilities based on clinical data only are unlikely to benefit from additional biomarker testing in terms of improving the prognostic accuracy for dementia conversion. By contrast, in patients whose initial progression probability lies near the center of the distribution of all patients’ probabilities, additional biomarker testing could improve the prognosis. Therefore, in our MCI group, we considered the median 3-year progression probability based on the demographic and/or MRI information to be the most uncertain, since it corresponds to the predicted prevalence of 3-year progression.

Subsequently, we used a stepwise approach in which CSF biomarker data (Abeta and p-tau concentrations in CSF; further referred to as additional CSF) were added to refine the prognosis in the 10% of patients (between percentiles 45–55) surrounding the median 3-year progression probability. After this first 10%, the prognosis was refined with biomarker data in 20% of patients (between percentiles 40–60), then in 30% (between percentiles 35–65), and so on. Supplemental Table 1 provides an overview of the 3-year progression probabilities (i.e., probability thresholds) that correspond with these percentiles. Patients with 3-year progression probabilities outside these percentile ranges received a prognosis from the simpler demographic or MRI model. Of note, because of the high correlation between p-tau and total tau (t-tau), t-tau concentrations were not included in the models; details on the selection of variables in the models are described elsewhere [9, 10]. We performed this stepwise approach with fivefold cross-validation, adding CSF biomarkers to (1) the demographic information only and (2) the demographic and MRI information. The overall cross-validated performance of the stepwise model was determined on the combined set of patients whose probabilities were based on clinical information only and patients with additional CSF biomarker testing.
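The band selection in Step 2 can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions: `stepwise_probabilities` is a hypothetical helper, both inputs are assumed to be 3-year progression probabilities already predicted for the same patients (in the study, these would be the cross-validated predictions), and fitting the underlying Cox models is not shown.

```python
import numpy as np


def stepwise_probabilities(p_clinical, p_csf, proportion):
    """Combine a clinical-only model with a CSF model for one cohort.

    Patients whose clinical-only probability lies in the central band between
    percentiles 50 - 50*proportion and 50 + 50*proportion (e.g. 45-55 for
    proportion=0.1) receive the CSF-refined probability; all other patients
    keep their clinical-only probability.
    Returns the combined probabilities and the boolean selection mask.
    """
    p_clinical = np.asarray(p_clinical, dtype=float)
    p_csf = np.asarray(p_csf, dtype=float)
    lower, upper = np.percentile(p_clinical, [50 - 50 * proportion, 50 + 50 * proportion])
    selected = (p_clinical >= lower) & (p_clinical <= upper)
    return np.where(selected, p_csf, p_clinical), selected
```

With 100 patients and `proportion=0.5`, the 50 patients between the 25th and 75th percentiles of the clinical-only probabilities receive the CSF-refined estimate; `proportion=0` reproduces the clinical-only model and `proportion=1` the CSF model for all patients, which are the two reference curves in Step 3.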

Step 3: classification performance comparison

We plotted the cross-validated Harrell’s C and 3-year Brier scores of stepwise models with an increasing proportion of patients receiving biomarker testing against those of the models with clinical data only (demographics only/demographics and MRI) and of the model with additional CSF testing for all patients. This allowed us to identify the optimal proportion of patients, at which the stepwise approach performed better than the model with clinical data only and as well as the additional CSF biomarker model in terms of prognostic discrimination (Harrell’s C) and prognostic accuracy (3-year Brier scores).
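One way to operationalize this comparison is sketched below in Python. The helper `optimal_proportion` and its `tolerance` parameter are hypothetical: the text does not state a numeric criterion for “as well as”, so it is encoded here as an assumed maximum gap to the full CSF model. The score curves in the example are toy values, not results from this study.

```python
def optimal_proportion(proportions, scores, clinical_score, full_score,
                       tolerance, higher_is_better=True):
    """Smallest tested proportion whose stepwise score beats the clinical-only
    model and lies within `tolerance` of the full CSF model.

    Use higher_is_better=True for Harrell's C and False for the Brier score.
    Returns None if no tested proportion satisfies both conditions.
    """
    sign = 1.0 if higher_is_better else -1.0
    for proportion, score in sorted(zip(proportions, scores)):
        beats_clinical = sign * (score - clinical_score) > 0
        close_to_full = sign * (full_score - score) <= tolerance
        if beats_clinical and close_to_full:
            return proportion
    return None


# Toy curves (not study data): proportion of patients tested vs. score
proportions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
c_curve = [0.60, 0.61, 0.63, 0.65, 0.66, 0.695, 0.70]
brier_curve = [0.200, 0.198, 0.196, 0.193, 0.190, 0.188, 0.187]

p_c = optimal_proportion(proportions, c_curve, 0.60, 0.70, tolerance=0.01)
p_b = optimal_proportion(proportions, brier_curve, 0.200, 0.186,
                         tolerance=0.005, higher_is_better=False)
chosen = max(p_c, p_b)  # both criteria must hold -> 0.5
```

Taking the larger of the two proportions requires the stepwise model to satisfy the criterion for both discrimination and accuracy, mirroring the joint use of Harrell’s C and the Brier score.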

As we used percentiles of the prognostic probabilities calculated with demographic and/or MRI data, the selected optimal proportion corresponds with specific demographic- or MRI-model-derived probabilities (supplemental Table 1). As a result, the optimal proportion also provided us with an algorithm that defines the range of demographic- or MRI-model-derived probabilities within which additional biomarker testing would be indicated; the bounds of this range are further referred to as probability thresholds.
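Freezing the percentile band as fixed probability thresholds, which is what makes the rule transferable to a new cohort, can be sketched as follows. The helpers `derive_thresholds` and `recommend_csf` are hypothetical and the cohorts are toy data, not the values in supplemental Table 1; the example shows that thresholds fixed in one cohort need not select exactly the same proportion of patients in another cohort with a different distribution of probabilities.

```python
import numpy as np


def derive_thresholds(p_clinical_dev, proportion):
    """Probability thresholds bounding the central `proportion` of patients
    in the development cohort (percentiles 50 -/+ 50*proportion)."""
    lower, upper = np.percentile(np.asarray(p_clinical_dev, dtype=float),
                                 [50 - 50 * proportion, 50 + 50 * proportion])
    return float(lower), float(upper)


def recommend_csf(p_clinical, thresholds):
    """True where a clinical-only probability falls inside the fixed band."""
    lower, upper = thresholds
    p = np.asarray(p_clinical, dtype=float)
    return (p >= lower) & (p <= upper)


# Toy cohorts (not study data): thresholds frozen on a development cohort
dev = np.arange(100) / 100
thresholds = derive_thresholds(dev, 0.5)        # approx. (0.2475, 0.7425)
external = 0.1 + 0.8 * np.arange(100) / 100     # narrower spread of probabilities
share_selected = recommend_csf(external, thresholds).mean()  # 0.62, not 0.50
```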

Evaluation of stepwise approach

Lastly, we applied the probability thresholds identified by the stepwise approach to the BioFINDER cohort [33]. From the BioFINDER study, we included n = 221 patients with a baseline diagnosis of MCI, available MRI and CSF data, and at least 6 months of follow-up. Prognostic probabilities were calculated based on demographic information only and on demographic and MRI information. Based on the identified probability thresholds, the prognosis was refined with additional CSF for only a proportion of patients. Discriminative performance and prediction accuracy in this independent cohort were determined on the combined set of patients with probabilities based on clinical information only and patients with additional CSF biomarker testing.

We illustrate the practical use of the developed algorithm with two cases: one in whom additional CSF testing added prognostic information and one in whom it did not. To help the reader appreciate the clinical characteristics of MCI patients who were or were not selected for additional CSF testing, we report the clinical and demographic data of selected patients, of patients below the lower probability threshold (not selected), and of patients above the upper probability threshold (not selected).

Results

Table 1 presents the patient characteristics. Mean age of the MCI patients was 66 ± 8 years, 164 (41%) were female, and mean MMSE score was 27 ± 2 points. Overall, 189 (47%) patients progressed to dementia during 3 ± 2 years of follow-up.

Table 1 Patient characteristics

Fig. 1 shows the stepwise approach from demographic information only to additional CSF testing. It compares the prognostic discrimination and prognostic accuracy of the stepwise model with those of the demographics-only model (Harrell’s C = 0.60, 3-year Brier score = 0.198) and of the demographics with additional CSF model when CSF results were included for all patients (Harrell’s C = 0.70, 3-year Brier score = 0.186). The discriminative performance of the stepwise model started to increase once 10% of the patients surrounding the median received CSF testing. It then increased gradually until the stepwise model performed similarly to the CSF model (Fig. 1a) when 50% of the patients underwent CSF testing (Harrell’s C = 0.67). Brier scores showed a similar pattern and were comparable with those of the CSF model when 50% of the patients received CSF testing (3-year Brier score = 0.190, Fig. 1b).

Fig. 1

Model performance comparison from demographics only to additional CSF. Comparison of model performance of the stepwise approach (black) from demographic information only (red) to additional CSF testing (blue). a. Prognostic discrimination measured with cross-validated Harrell’s C. b. Prognostic accuracy measured with cross-validated 3-year Brier scores. A lower Brier score indicates better prognostic accuracy. CSF, cerebrospinal fluid

Figure 2 shows the stepwise approach from demographic and MRI information (Harrell’s C = 0.61, 3-year Brier score = 0.195) to additional CSF testing (Harrell’s C = 0.70, 3-year Brier score = 0.187). The discriminative performance of the stepwise model again started to increase once 10% of the patients received CSF testing, and the model performed similarly to the model with CSF in all patients (Fig. 2a) when 50% of the patients received CSF testing (Harrell’s C = 0.67). Brier scores showed a similar, although more irregular, pattern and were comparable with those of the full CSF model when 50% of the patients received CSF testing (3-year Brier score = 0.190, Fig. 2b). Table 2 shows the characteristics of patients who were and were not selected based on demographic and/or MRI information.

Fig. 2

Model performance comparison from demographics and MRI to additional CSF. Comparison of model performance of the stepwise approach (black) from demographic and MRI information only (red) to additional CSF testing (blue). a. Prognostic discrimination measured with cross-validated Harrell’s C. b. Prognostic accuracy measured with cross-validated 3-year Brier scores. A lower Brier score indicates better prognostic accuracy. CSF, cerebrospinal fluid; MRI, magnetic resonance imaging

Table 2 Characteristics of MCI patients stratified by selection for additional CSF testing

Subsequently, we evaluated the identified probability thresholds (supplemental Table 1) in the BioFINDER study. Patient characteristics of the BioFINDER study are reported in supplemental Table 2. Applying the probability thresholds identified by the stepwise approach to the BioFINDER study would select 51% of patients for CSF biomarker testing based on demographic information, and 48% of patients based on demographic and MRI information. CSF testing in only this proportion of patients yielded better performance than CSF testing in none of the patients and similar performance to CSF testing in all MCI patients (supplemental Table 3).

To illustrate the practical implementation of our algorithm for additional CSF biomarker testing, we present two clinical cases. For patient A, based on age (70 years), sex (female), and MMSE score (28), the 3-year progression probability was estimated at 49.7%. This probability falls within the probability thresholds of the 50% of patients surrounding the median, and therefore additional CSF testing would be recommended. Adding CSF information (Abeta = 1188, p-tau = 47) resulted in a far lower progression probability of 17.8%.

For patient B, both demographic and imaging information were available. Based on age (54 years), sex (male), MMSE (29), and HCV (sum, 7 mL), the 3-year progression probability was estimated at 14.0%. This probability falls outside the probability thresholds of the 50% of patients surrounding the median based on demographic and MRI information. As the progression probability was already low, the algorithm does not recommend additional CSF testing. The progression probability of the ATN model (additional CSF testing; Abeta = 1349, p-tau = 44) for this patient was 9.2%, confirming that CSF did not meaningfully alter the estimated prognosis of this patient.

Discussion

We developed an algorithm to identify the MCI patients most likely to benefit from additional biomarker testing. We showed that CSF biomarker testing in half of the MCI patients, selected on the basis of clinical information, achieves prognostic performance similar to testing all patients. The findings were replicated in an independent cohort. As such, we achieved a CSF-sparing recommendation without reducing prognostic accuracy.

In the decision to perform additional diagnostic testing, it is important to specify the purpose of the test, e.g., to identify or exclude Alzheimer’s disease (AD) pathology, predict clinical progression, change disease management, and/or improve well-being. The BIOMARKAPD project, a multidisciplinary working group, ranked these clinical questions by importance and showed that CSF biomarkers are particularly useful to identify AD pathology and to predict progression to AD dementia in MCI patients [34]. Their recommendations are similar to those of the appropriate use criteria for CSF [4]; both advise CSF testing in all MCI patients. However, these recommendations are based on studies that investigated the added value of CSF in terms of diagnostic or prognostic accuracy at the group level. In such studies, CSF is tested in all (MCI) patients, which provides no information on its usefulness in specific patients. Moreover, the appropriate use criteria rightly state that a comprehensive clinical evaluation should precede the use of CSF biomarkers [4]. Clinicians should then determine, based on the available information, in which patients CSF biomarkers contribute to the diagnosis and clinical decision making. Such statements in the appropriate use criteria, however, are hard for clinicians to operationalize, especially in pre-dementia stages.

The current study provides clinicians with an easy-to-use algorithm that uses readily available information (i.e., age, sex, MMSE, and hippocampal volume if available) to identify MCI patients for CSF biomarker measurement. We took as a starting point progression probabilities based on basic clinical information only. By identifying the range of progression probabilities close to the progression prevalence in the population, where CSF is likely to add prognostic value, we allow the clinician to make an informed decision on performing biomarker testing. The clinician could also use this information to inform the patient before embarking on biomarker testing and to manage expectations about potential outcomes. The communication of considerations to perform or not perform a diagnostic test was given high priority in a recent Delphi consensus study among clinicians, patients, and caregivers [35]. The BIOMARKAPD workgroup also acknowledges the importance of these considerations; they recommend that “in the case of positive biomarkers a personal follow-up plan should be offered and appropriate support should be initiated in the case of symptom progression” and that “in the case of negative AD biomarkers, an intensive follow-up plan may not be necessary”. Although these recommendations address both possible outcomes, they remain general and do not take the available clinical information into account.

In the search for practical guidance on which patients to test, several previous studies developed prediction models for amyloid positivity. Although these studies differ in their methodological details, they all focus on only one of the pathological hallmarks of AD, as tauopathy and neurodegeneration are not considered [36, 37]. Moreover, most of these studies compare patients with AD dementia with controls, so their results cannot be generalized to the MCI population. Finally, these algorithms identify the individuals most likely to benefit from additional testing for amyloid positivity, which is most relevant in a trial setting, whereas in clinical settings the clinical outcome, i.e., progression to (any type of) dementia, is more relevant. One previous study used a computer algorithm to select patients in whom CSF testing was likely to contribute to a more accurate differential diagnosis between types of dementia [38]. In that study, CSF testing was recommended in 26% of the cases; however, MCI patients were not included. In the current study, we extended the available literature, with a keen eye for the needs of clinical practice, by providing an algorithm to select MCI patients in whom CSF testing is most likely to contribute to a more accurate prognosis.

One of the strengths of the current study is that our algorithms use validated prognostic models to estimate each patient’s prognosis from available clinical information (patient characteristics and/or hippocampal atrophy). Moreover, we used measures that are easily available to the clinician, i.e., patient characteristics, the widely used MMSE score, and hippocampal volume. Although we described this stepwise approach for the decision to perform CSF testing in MCI patients, the approach is generally applicable to investigating a stepwise combination of any two prognostic models. The novelty of our study is that we used a data-driven approach to define the proportion of patients that would benefit from additional biomarker testing, i.e., the proportion at which the stepwise approach performs better than the clinical model and similarly to the full (demographics, MRI, and CSF) model. Similar approaches have been proposed for the classification of cancer samples by means of high-dimensional genomic markers [39]. With our full model, we have a measure of amyloid (A), tauopathy (T), and neurodegeneration (N) and thus align with the ATN criteria reported by the NIA-AA [1]. Lastly, we validated our stepwise approach in an independent cohort. The success of this validation may have resulted from the fact that BioFINDER patients had a risk profile similar to that of the MCI patients from Amsterdam, as similar diagnostic guidelines for MCI were used in both cohorts. The usefulness of this stepwise approach in a population with a different composition of risk profiles should be a topic for further research.

Limitations

Among the limitations is that we were unable to construct a stepwise approach from the demographic model to the MRI model, as the demographic and MRI models performed similarly in our sample (data not shown). Although adding MRI did not result in a more accurate prognosis in our MCI patients, performing MRI or CT is still valuable to exclude other (reversible) causes of cognitive impairment. Other diagnostic tests, such as amyloid-PET, were not part of the current study. In a future study, we aim to apply the same approach to amyloid-PET; based on previous research on prognostic models including amyloid-PET, we expect results similar to those reported here [40]. Another direction for further research is the definition of the prognostic accuracy measure. In this paper, we chose two well-established measures of model performance, i.e., Harrell’s C and the Brier score. Harrell’s C has the limitation that it is purely a discriminative measure, which may select models that are poorly calibrated to the actual data. The Brier score also captures calibration, but the quadratic distance it uses to measure accuracy may not be the most appropriate measure for clinical decision making.

Conclusion

In conclusion, we showed that by performing CSF testing in 50% of the MCI patients, a prognostic accuracy similar to that of testing all patients is reached. Our algorithm uses prognostic models without CSF data to identify the patients most likely to benefit from CSF testing. This has important implications for the cost-efficient use of CSF testing. Furthermore, this approach helps clinicians set appropriate expectations before diagnostic testing.