Key Summary Points
Study Aim
What is the concordance between subjective and objective assessments of Alzheimer’s disease (AD) severity?
What was learned from the study?
Clinical notes with subjective (clinician’s judgement) and objective (cognitive test) assessments of AD severity were extracted from the Veterans Affairs Informatics and Computing Infrastructure database using text integration utilities (2008–2021). Among 7514 notes, concordance between subjective and objective assessments was 53%. In the subjectively assessed mild AD cohort, objective assessments were more severe in 40% of notes.
In real-world settings, clinicians may be considering extra-cognitive factors when determining AD severity; there is a critical need for improved understanding of clinical assessments/decision-making in AD.


Alzheimer’s disease (AD) is a progressive neurodegenerative disorder affecting an estimated 6.5 million Americans aged older than 65 years, and over 30 million people globally [1,2,3,4,5]. AD places a devastating burden on each patient and their family. The cost of care for AD in the USA alone is estimated to be well over 300 billion US dollars (USD), with the global societal cost of dementia exceeding 1 trillion USD [4,5,6].

Anti-amyloid therapies have been explored for patients with mild cognitive impairment (MCI) or mild dementia stage AD [7]. However, uncertainty surrounding the accurate assessment of early-stage or “mild” AD may limit the selection of patients for appropriate care.

Assessment of AD severity in clinical practice is often based on the clinician’s subjective judgement as well as objective instruments such as the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA) [8, 9]. Physicians’ subjective assessments of severity may be influenced by many factors including objective tests, specific diagnoses, and types of behaviors exhibited by the patient. Treatment decisions for patients with AD may be made on the basis of either subjective or objective data, or a combination thereof [10]; however, whether there is concordance between subjective and objective clinical assessments of AD is not well established.

To advance the understanding of AD stage determination, we analyzed clinical assessments of patients with AD in the Veterans Affairs (VA) Healthcare System and evaluated concordance between subjective and objective assessments.


Data Source and Extraction

This retrospective analysis utilized clinical notes extracted via Text Integration Utilities (TIU) from the VA Informatics and Computing Infrastructure (VINCI) database [11]. Our study cohort was based on 2,586,768 veterans with 357,608,246 inpatient/outpatient visits from 2008 to 2021. An initial sample of clinical notes was extracted from April 1, 2008 through October 14, 2021 using the following targeted keyword search for Alzheimer’s and disease severity: (“AD” or “Alzheimer”) and (“mild” or “moderate” or “severe”) within one word of one another, to identify notes with a subjective assessment of AD severity. Our proprietary Python algorithm was then applied to this initial sample of clinical notes to extract MMSE or MoCA test scores. Notes with only Saint Louis University Mental Status (SLUMS) examination scores were also documented, but not included in the analysis because this test lacks a severity classification rubric and is less commonly used in the VA healthcare system. Our final study sample of clinicians’ notes contained both subjective (clinical judgement-based) AD staging and objective (MMSE- or MoCA-based) AD staging. Validation by manual chart review of 100 randomly selected notes for 92 unique patients found approximately 80% accuracy: among all notes, 79% and 81% of TIU-extracted MMSE and MoCA scores, respectively, were consistent with chart-reviewed scores. This study was approved by the Bedford VA Healthcare System Institutional Review Board, and all data were fully de-identified before access. This study was performed in accordance with the Declaration of Helsinki of 1964 and its later amendments.
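The keyword criterion can be illustrated with a minimal Python sketch. It is not the proprietary algorithm used in the study: the function name `find_severity_mentions`, the regular expressions, and the reading of "within one word" as a severity term directly before or directly after the disease term are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the actual search terms and matching rules may differ.
# Severity words are matched case-insensitively; "AD" is matched case-sensitively
# so that ordinary words such as "ad" are not picked up.
SEVERITY = r"(?i:mild|moderate|severe)"
DISEASE = r"(?:AD\b|(?i:Alzheimer)\S*(?:\s+(?i:disease|dementia))?)"

# Severity term directly before the disease term ("mild AD"),
# or directly after it ("Alzheimer's disease, moderate").
PATTERN = re.compile(
    rf"\b(?P<before>{SEVERITY})\s+{DISEASE}|\b{DISEASE}\W+(?P<after>{SEVERITY})\b"
)


def find_severity_mentions(note_text):
    """Return the severity words found adjacent to an AD mention, lowercased."""
    mentions = []
    for match in PATTERN.finditer(note_text):
        word = match.group("before") or match.group("after")
        mentions.append(word.lower())
    return mentions


print(find_severity_mentions("Pt with mild AD, MMSE 22/30."))  # ['mild']
```

A pattern of this kind cannot tell whether the severity word actually describes the patient's AD, which is one reason a validating chart review is useful.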

Clinically Determined AD Severity Cohorts

Mild AD, moderate AD, and severe AD cohorts were identified on the basis of the initial targeted keyword search of clinician’s notes. If a note contained keywords corresponding to more than one AD severity stage, it was classified into the cohort of lower severity.
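The tie-breaking rule for notes that mention more than one severity stage can be sketched as follows. The names `SEVERITY_ORDER` and `assign_cohort` are hypothetical, and the sketch assumes that the severity keywords have already been extracted from the note.

```python
# Ordering of the three clinician-assigned stages, from least to most severe.
SEVERITY_ORDER = {"mild": 0, "moderate": 1, "severe": 2}


def assign_cohort(severity_mentions):
    """Assign a note to the lowest-severity stage among its keyword mentions.

    Returns None when the note contains no severity keyword.
    """
    if not severity_mentions:
        return None
    return min(severity_mentions, key=SEVERITY_ORDER.__getitem__)


print(assign_cohort(["severe", "mild"]))  # mild
```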

MMSE and MoCA-Based Staging of AD Severity

Objective AD severity staging in this analysis was based on published and publicly available standard ranges/cutoffs for MMSE and MoCA test scores [12,13,14,15]. Mild AD was defined as a score of 21–24 on MMSE and 18–25 on MoCA; moderate AD was defined as a score of 13–20 on MMSE and 11–17 on MoCA; severe AD was defined as a score of 12 or less on MMSE and 10 or less on MoCA.
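These cutoffs translate directly into a lookup, sketched below. The names `CUTOFFS` and `objective_stage` are hypothetical. Returning `None` for scores above the mild range (MMSE above 24, MoCA above 25) is an assumption of this sketch, since the staging definitions above cover only mild, moderate, and severe AD.

```python
# Score ranges (inclusive) taken from the staging definitions in the text.
CUTOFFS = {
    "MMSE": [("severe", 0, 12), ("moderate", 13, 20), ("mild", 21, 24)],
    "MoCA": [("severe", 0, 10), ("moderate", 11, 17), ("mild", 18, 25)],
}


def objective_stage(test, score):
    """Map an MMSE or MoCA score to an AD severity stage.

    Returns None when the score falls outside the staged ranges
    (an assumption of this sketch, not a rule stated in the study).
    """
    for stage, low, high in CUTOFFS[test]:
        if low <= score <= high:
            return stage
    return None


print(objective_stage("MMSE", 22))  # mild
print(objective_stage("MoCA", 17))  # moderate
```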

Study Endpoints

Primary Endpoint: Concordance Between Subjective and Objective Assessments

Using the clinician-defined mild, moderate, and severe AD cohorts, we determined overall concordance and discordance between the clinician’s (subjective) assessment and the test-derived (objective) assessments of AD severity stage. The following designations were used to connote concordance: S = O, subjective and objective severity assessments agreed; S < O, subjective assessment (e.g., mild) was less severe than objective assessment (e.g., moderate or severe); S > O, subjective assessment (e.g., moderate or severe) was more severe than objective assessment (e.g., mild or moderate).
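The three designations amount to comparing two ordinal ranks, as in the sketch below. The names `RANK`, `concordance_label`, and `summarize` are hypothetical, the example pairs are invented, and the sketch handles only the three staged categories (it does not cover objective scores that fall outside the mild-to-severe ranges).

```python
from collections import Counter

# Ordinal rank of each stage, from least to most severe.
RANK = {"mild": 0, "moderate": 1, "severe": 2}


def concordance_label(subjective, objective):
    """Return 'S = O', 'S < O', or 'S > O' for one note."""
    s, o = RANK[subjective], RANK[objective]
    if s == o:
        return "S = O"
    return "S < O" if s < o else "S > O"


def summarize(pairs):
    """Return the proportion of notes in each concordance category."""
    counts = Counter(concordance_label(s, o) for s, o in pairs)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("S = O", "S < O", "S > O")}


# Invented (subjective, objective) pairs, for illustration only.
example = [
    ("mild", "mild"),
    ("mild", "moderate"),
    ("severe", "moderate"),
    ("moderate", "moderate"),
]
print(summarize(example))  # {'S = O': 0.5, 'S < O': 0.25, 'S > O': 0.25}
```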

Secondary Endpoints

Variation in concordance between subjective and objective assessments was assessed over time as well as by selected symptoms and comorbidities identified by ICD-9/10 codes (eTable 1 in the supplementary material), major clinician type, practice setting (dementia vs non-dementia), and Veterans Integrated Service Networks (VISNs).

Statistical Analysis

Descriptive statistics were used to present the concordance and directional concordance data. An additional analysis was conducted to determine concordance when MMSE and MoCA scores were examined separately rather than grouped together. Chi-square tests were used for between-group comparisons. A p value of less than 0.05 was considered statistically significant.
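A between-group comparison of this kind can be illustrated with a hand-computed Pearson chi-square test for a 2 × 2 table (for example, concordant vs discordant notes in two groups). The function name `chi_square_2x2` is hypothetical, the counts are invented and are not study data, and no continuity correction is applied, which may differ from the software actually used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the test statistic and the p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: rows are two groups, columns are concordant / discordant notes.
statistic, p_value = chi_square_2x2(30, 10, 10, 30)
print(statistic)        # 20.0
print(p_value < 0.05)   # True
```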


Study Sample

The initial sample consisted of 65,196 clinician notes with AD staging based on the keyword search (mild AD, 44,504; moderate AD, 20,663; severe AD, 29,619). From this initial sample, we identified 6054 notes that included an MMSE test score and 2834 notes that included a MoCA test score. We excluded 899 notes that had a SLUMS rating but did not have either an MMSE or MoCA score (approximately 10% of notes sampled). A total of 8888 notes (corresponding to 5150 unique patients) contained documentation of both the clinician’s subjective judgement of AD severity and objective MMSE or MoCA test scores. After exclusion of notes with clinical diagnoses corresponding to classifications other than mild/moderate/severe AD (e.g., excluding MCI; 1373), our final analysis sample included 7514 notes (corresponding to 4469 patients).

Among patients for whom demographic information was available (eTable 2 in the supplementary material), the mean (SD) age was 78 (9) years; most patients were male (96.5%) and White (77.8%). Black/African American and Hispanic/Latino patients represented 11.2% and 5.8% of patients, respectively.

Concordance Between Subjective and Objective Assessments of AD Severity: Primary Endpoint

Approximately half (53%) of the 7514 notes in our analysis sample were concordant (S = O) (Table 1). This concordance rate was consistent regardless of AD severity cohort: clinicians’ subjective assessments matched the objective test score assessments in 54%, 52%, and 53% of notes in the mild, moderate, and severe cohorts, respectively. Among discordant notes in the mild AD cohort, almost all subjective assessments of severity were lower than the test score assessment: S < O represented approximately 90% of discordant notes in this cohort. Among discordant assessments in the moderate AD cohort, S < O was slightly more common, representing approximately 57% of discordant notes in this cohort. All discordant notes in the severe AD cohort were S > O (this was expected since AD cohorts were defined on the basis of clinical assessment, and the objective assessment could not be greater than severe).
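Shares of discordant notes can be converted to shares of all notes in a cohort by multiplying by the cohort's discordance rate. The sketch below does this for the mild AD cohort using the rounded percentages reported above, so the result is approximate; the function name `share_of_all_notes` is hypothetical.

```python
def share_of_all_notes(concordance_rate, share_of_discordant):
    """Convert a share of discordant notes into a share of all notes in a cohort."""
    return (1.0 - concordance_rate) * share_of_discordant


# Mild AD cohort: 54% concordant, and S < O in about 90% of discordant notes.
mild_s_lt_o = share_of_all_notes(0.54, 0.90)
print(f"{mild_s_lt_o:.1%}")  # 41.4%
```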

Table 1 Concordance between subjective and objective assessments of AD severity by AD cohort

Secondary Endpoints

Concordance by Year

Overall, an increase in the proportion of concordant assessments was observed over the study period (Fig. 1). From 2015 onward, among discordant assessments, clinician assessments tended to be less severe than the objective test-based assessments.

Fig. 1 Concordance between subjective and objective assessments of AD severity by year. The figure depicts the proportion of clinical notes that were S = O, S < O, and S > O in each year of the study period (2008–2021). S = O, subjective and objective severity assessments agreed (red); S < O, subjective assessment was less severe than objective assessment (green); S > O, subjective assessment was more severe than objective assessment (blue)

Concordance by Instrument Used

We separated notes into MMSE and MoCA subgroups to examine concordance between the subjective assessments and each type of objective test’s assessment. Approximately half (53%) of the notes were concordant (S = O) regardless of whether the MMSE or MoCA score was being utilized (Table 2). Directional patterns of discordant assessments were comparable for the two tests, but a higher proportion of discordant notes for MoCA were S < O.

Table 2 Concordance between subjective and objective assessments of AD severity by individual objective tests

Concordance by Symptoms/Comorbidities

Among the select symptoms and comorbidities, the proportion of notes with concordant assessments (S = O) ranged widely from 21% to 73% (Table 3). Symptoms associated with the highest concordance rates were “wander” (73.3%) and “aberrant motor” (67.4%); symptoms associated with the lowest concordance rates were “delusion” (20.8%) and “agitation/aggression” (39.9%). Diseases/conditions associated with the highest concordance rates were type 1 diabetes (60.5%), Graves’ disease (58.8%), and post-traumatic stress disorder (PTSD) (56.0%); diseases associated with the lowest concordance rates were schizophrenia (42.0%), Crohn’s (34.4%), and celiac (33.3%). Among discordant notes associated with psychiatric comorbidities, subjective assessments were more commonly less severe than objective assessments: the proportion of discordant notes with S < O was 66.7% (30/45) for irritability, 76.7% (66/86) for agitation/aggression, 88.9% (88/99) for delusion, 60% (308/513) for PTSD, 67.2% (548/816) for anxiety, and 64% (375/586) for bipolar/mania.

Table 3 Concordance between subjective and objective assessments of AD severity by select symptoms and comorbidities

Concordance by Healthcare System-Based Factors

Across the major clinician types in our notes sample, the proportion of notes with concordant assessments (S = O) ranged from approximately 47% to 58% (Table 4). More than half the notes in our sample (3778/7514) were by psychiatrists and neurologists; 53% of notes by these clinicians were concordant. For all clinicians, when notes were discordant, the subjective assessment of AD severity stage was generally lower than the test score-based assessment (23% to 38% of all notes were S < O; 11% to 21% of all notes were S > O). Among psychiatrists and neurologists, 29% of all notes were S < O and 18% were S > O.

Table 4 Concordance between subjective and objective assessments of AD severity by clinician specialty/type and practice setting

Notes associated with dementia clinic visits comprised 6% of our sample. A higher proportion of these notes were concordant (S = O) compared with non-dementia clinic notes: 61% vs 53% (Table 4). In both settings, discordant notes were more commonly S < O (57% of discordant dementia clinic notes; 55% of discordant non-dementia clinic notes). Across VISNs with more than 200 notes each, the proportion of concordant notes (S = O) ranged from 40% to 64% (eTable 3 in the supplementary material).


In this descriptive retrospective analysis of the VINCI database (2008–2021), we found that approximately half of the clinical notes of veterans with AD showed concordance between clinicians’ judgement and cognitive test-based assessments of AD severity. The concordance rate was consistent across the clinically defined mild, moderate, and severe AD cohorts (52–54%). A slight rise in the proportion of concordant notes was observed over the study period. Of note, the overall concordance rates were similar regardless of whether subjective assessments were compared to objective ratings from the MMSE or the MoCA; this is consistent with the reported high correlation between these two instruments [15].

From 2015 onward, the overall trend among discordant assessments was for clinicians’ judgements of severity to be less severe than the instrument-based ratings. In the mild AD cohort, subjective assessment underestimated severity relative to the objective test in approximately 41% of all notes (90% of discordant notes). These findings suggest that in real-world settings, clinician judgement may take into account extra-cognitive considerations such as psychiatric, neuropsychological, behavioral, or functional factors, and thus deviate from the defined MMSE and MoCA thresholds for severity that were used in our study. Furthermore, while these tests are used for dementia assessment in clinical practice, they both lack specificity for AD and low scores are not diagnostic for dementia or AD [16]. Nonetheless, these tests are widely utilized to assist clinical diagnosis of AD, in conjunction with other diagnostic methods such as brain imaging [2].

Certain symptoms and comorbidities were associated with markedly higher or lower concordance between subjective and objective assessments of AD severity. “Wander” was associated with the highest concordance (73%), possibly because this symptom is a hallmark of AD. In contrast, “delusion” was associated with the lowest concordance (21%), possibly because this symptom may have been attributed to other conditions; in 70% of notes with delusion, the clinician’s assessment of AD severity was less than that of the objective instrument. Among discordant notes associated with most psychiatric symptoms/diseases, subjective assessments tended to be less severe than objective assessments. This observation suggests that clinicians may be considering the independent impact of comorbidities that have overlapping symptoms with AD, and is worthy of further study. Interpretation of disease severity also depends on a clinician’s appreciation of agnosia, a common symptom of Alzheimer’s dementia, and this may contribute to the variation and discordance found in this study. In many cases, the reason a particular symptom or comorbidity impacted concordance was not clear, and interpretation may be limited by small numbers of notes with these comorbid conditions.

There are several potential explanations for clinician bias toward lower subjective assessments. First, there is a perceived lack of effective treatments for AD. Second, patients may tend to minimize their symptoms either for social reasons or as a result of agnosia. Third, patients are often interviewed in the presence of family members, who may prefer to minimize symptoms out of concern for patient dignity. Patients and families often understand that a diagnosis of dementia may lead to loss of independence (e.g., driving) and increased demands on families to provide support for transportation, medication management, and financial supervision. Fourth, clinicians may have less time for subjective assessment when an MMSE or MoCA test is performed. All of these factors may impact clinicians’ use of language when describing dementia in clinical documentation.

In the era of anti-amyloid therapy for AD, identification of patients in early stages of AD is essential to ensure that therapies with the potential to alter the progression of this debilitating disease are initiated in a timely manner in patients who are most likely to benefit. The decision to initiate certain therapies for chronic neurological conditions may be complex, and ideally depends on a precise assessment of disease severity [2]. Unfortunately, the clinical management of AD currently lacks clear, well-defined assessment and treatment guidelines. This is in stark contrast to the current practice paradigm for multiple sclerosis (MS) in which clinicians have evidence-based consensus guidelines [17, 18] and can rely on direct biomarkers (e.g., demyelinating plaque burden assessed quantitatively by magnetic resonance imaging [MRI]; presence of active demyelination assessed by gadolinium-enhanced MRI) [18, 19] to weigh the relative risks and benefits of the therapy.

Disease burden in AD may be estimated by a combination of clinical assessment and measurement of biomarkers of brain amyloid-beta (Aβ) protein deposition (i.e., low cerebrospinal fluid [CSF] Aβ42 and positive positron emission tomography [PET] amyloid imaging), as well as surrogate biomarkers of neuronal injury (e.g., increased CSF tau, decreased fluorodeoxyglucose [FDG] uptake on PET as a marker of metabolic activity, and atrophy measurement by MRI) [3, 20, 21]. Given that biomarkers for AD are indirect and that their measurement may either be prohibitively expensive for patients or dependent on invasive procedures, AD assessments may often be primarily based on clinical evaluation [22]; clinicians’ subjective assessments are often based on consideration of cognition, function, and behavioral symptoms. In practice, clinical evaluation typically entails subjective assessments made by clinicians using a combination of clinical history, patient interview with short cognitive screening instruments, and exclusionary data to rule out other causes of cognitive impairment [23]. Parallel assessments are made using more detailed and objective cognitive assessment tools, such as the MMSE and the MoCA, which are solely based on cognition; however, these instruments lack specificity and sensitivity for AD diagnosis [8, 9, 16].

We performed a post hoc sensitivity analysis to assess whether using ranges that may overlap with a classification of MCI would influence our overall findings; AD severity ranges were drawn from a study that reported the following MMSE cutoffs: 21–25 for mild, 11–20 for moderate, and 0–10 for severe AD [13]. We found that the proportion of assessments that were S > O in the mild AD cohort exhibited a minor increase from 4.7% to 6.7%; overall concordance and discordance findings remained at similar levels (eTable 4 in the supplementary material). For a more comprehensive evaluation, future studies comparing objective assessment of AD with the clinician’s subjective assessment could incorporate a brief interview to detect dementia (e.g., AD8) [24] and the Reisberg Functional Assessment Staging (FAST) [25].

The current study provides insight into assessments of AD severity in one of the largest US managed care settings; however, several limitations must be considered. Since there are no ICD codes specific to AD severity stages, we utilized keywords to identify mild, moderate, and severe AD cohorts. This method allowed us to efficiently examine a large volume of clinical note data, but likely missed clinical context that may have clarified some of the discordant findings. For example, the temporal relationship between a clinical assessment and objective assessment that appeared within the same note was not evaluated. Such context would require a validating chart review. In addition, retrospective analyses are inherently more susceptible to confounding variables than prospective studies [26]. We examined factors that could have impacted concordance, such as concurrent symptoms and comorbidities. Symptoms and comorbidities were identified using diagnostic codes, which can be subject to inaccuracy and/or undercoding [27]. We could not explore possible root causes of discordance between subjective and objective assessments of patients with AD since it was beyond the scope of this descriptive study. Clinician’s subjective assessment and test-derived objective assessment are two reiterative processes that are difficult to separate. Furthermore, MoCA screening might only be appropriate for assessment of earlier stages of AD in the elderly, which may have contributed to some of the observed discordance [8]. Finally, these findings from a population of veterans in the VA healthcare system may not be generalizable to the overall US population and healthcare system. While our findings do not specifically address changes that may be needed in clinical practice, they serve to raise awareness regarding discrepancies between subjective and objective assessments of cognitive impairment—these discrepancies can impact clinical decisions.

The burden of AD among veterans is expected to increase, not only as a result of population aging, but also because of the prevalence of other potential risk factors for dementia such as traumatic brain injury and PTSD in this population [28,29,30]. Additional areas of investigation in the veteran population include evaluating the impact of AD diagnosis on healthcare utilization and exploring whether certain comorbidities may influence the duration of time from MCI diagnosis to onset of AD.


We found higher concordance between clinical assessment and objective instruments in dementia specialty clinics compared with non-dementia clinics. Remarkably, approximately 40% of notes in the mild AD cohort were assessed as more severe by the objective test assessments. Since early-stage AD is the preferred target of anti-amyloid therapies, these data indicate a critical need for improved understanding of AD clinical assessments and clinical decision-making.