Key Summary Points

Why carry out this study?

Identification of mild cognitive impairment (MCI) and Alzheimer’s disease (AD) among Veterans in the United States (US) Veterans Affairs Healthcare System (VAHS) is a clinical and research imperative

We hypothesized that unstructured clinical notes in the VAHS electronic health record could serve as an important tool for identifying Veterans with recorded MCI or AD and could provide more contextual information for identification than diagnostic codes from structured administrative data

What was learned from the study?

By using iterative searches of clinical notes over the study period (Fiscal Years 2010–2019), we identified 339,007 Veterans recorded with MCI due to any cause and 572,063 with AD; clinical note-based identification captured more Veterans recorded with MCI and AD than diagnostic code-based identification

We found that, in the absence of an in-depth expert review/interpretation, notes did not offer substantial contextual information regarding clinician judgments; nonetheless, by using note- and code-based approaches, we were able to designate research cohorts for MCI and AD that will serve to provide comprehensive representation of the AD population, inclusive of early stage and at-risk individuals

Introduction

An estimated one in ten older Veterans in the United States Veterans Affairs Healthcare System (VAHS) have dementia [1]. Alzheimer’s disease (AD), the cause of 60–80% of dementia cases, has a devastating and progressive course [2]. Prior to the onset of dementia, the AD continuum includes preclinical AD (beginning of AD pathologic changes in an otherwise asymptomatic person) and mild cognitive impairment (MCI) due to AD [2]. Early and accurate identification of individuals with AD is a clinical and research imperative for clinical interventions, especially now that anti-amyloid treatments (e.g., aducanumab and lecanemab) indicated for MCI due to AD and mild AD are available. Treatments such as lecanemab have led to reductions in levels of brain amyloid as well as less decline on clinical measures of cognition/function [3, 4].

There are many challenges to identifying MCI and AD in clinical practice. Despite advancements in biomarkers and neuroimaging for AD (e.g., positron emission tomography, cerebrospinal fluid amyloid, and tau/phosphorylated tau quantification), there is no established “universal” diagnostic test, nor is there a definitive diagnostic guideline for validation of MCI due to AD or AD; these challenges contribute to uncertainty and variations in clinical practice [5, 6]. Stigma and lack of knowledge regarding MCI and AD may contribute to delays in individuals seeking evaluation [7]. For example, declines in cognitive function may be misattributed to “normal” aging [8]. Despite community-dwelling older adults having concerns about their memory, few discuss those concerns with their doctors [9]. In addition, clinician barriers can impact evaluation of AD due to intrinsic variations among clinicians’ subjective assessments and cognitive test score-based objective assessments [10, 11].

Diagnostic codes are the primary means for identifying individuals with MCI and AD in structured data. However, in the VAHS, low use of diagnostic codes related to cognitive impairment prior to dementia diagnosis and overuse of nonspecific dementia diagnostic codes have been reported [11, 12]. These findings highlight the limitations of using diagnostic codes for MCI and AD case identification in large populations. In addition, diagnostic coding is routinely used in clinical practice to indicate a rule-out evaluation for a specific, suspected health disorder. We hypothesized that clinical notes in the VAHS electronic health record (EHR) could serve as an important tool for capturing more Veterans with either MCI or AD than using diagnostic codes alone and that notes would provide more contextual information for identification since codes are primarily from structured administrative data.

Herein, we describe our use of clinical notes in the VAHS EHR (1) to identify Veterans diagnosed with or being evaluated for MCI due to any cause or AD and (2) to establish distinct cohorts for future research investigations.

Methods

Data Source and Extraction

A retrospective analysis was conducted using the Veterans Affairs Informatics and Computing Infrastructure (VINCI) database [13]. Text Integration Utilities (TIU) was used to query clinical notes. TIU is a set of basic natural language processing (NLP) software tools designed to handle clinical documents in a standardized manner and is available in the Microsoft Structured Query Language (SQL) Server relational database system. Clinical notes recording MCI or AD in Veterans were queried from the VAHS EHR using targeted keyword searches from fiscal year (FY) 2010 through FY 2019. In this analysis, FY was defined as the period from October 1 through September 30 (designated by the calendar year in which it ended). This study was approved by the Bedford VAHS Institutional Review Board; informed consent was not required as all data were fully deidentified before access. This study was conducted in accordance with the Declaration of Helsinki 1964 and its later amendments.
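The fiscal year definition above can be illustrated with a minimal Python sketch. The function names `fiscal_year` and `in_study_period` are hypothetical and are not part of the study's SQL-based workflow; the sketch only encodes the October 1 through September 30 convention stated in the text.

```python
from datetime import date


def fiscal_year(d: date) -> int:
    """Return the FY containing d, labelled by the calendar year in which the FY ends."""
    # October through December belong to the FY that ends in the following calendar year.
    return d.year + 1 if d.month >= 10 else d.year


def in_study_period(d: date, first_fy: int = 2010, last_fy: int = 2019) -> bool:
    """True if d falls within FY 2010 through FY 2019 (the study period)."""
    return first_fy <= fiscal_year(d) <= last_fy
```

Under this convention, a note dated October 1, 2009 belongs to FY 2010, and a note dated October 1, 2019 falls outside the study period.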

Clinical Note-Based Identification of MCI and AD

Inclusion Criteria for MCI and AD Clinical Notes

All clinical notes in the VINCI database for Veterans above the age of 50 years from FY 2010–2019 were searched for relevant keywords (detailed in the next section) related to MCI due to any cause or AD.

For MCI, “MCI” and “mild cognitive impairment” were selected as keywords, although additional exclusions were required for the “MCI” keyword to avoid selecting notes related to “millicurie” (more details in Supplemental Appendix). Our methodology did not distinguish whether MCI was due to AD or other causes. For AD, the “Alz*” keyword was selected since it includes possible common misspellings of Alzheimer (e.g., Alzeimer and Alzhemer) as well as variations in how this term is documented in the clinical notes (e.g., Alzheimers and Alzheimer’s). The “AD” keyword was initially considered, but review of results revealed wide usage of this acronym for unrelated conditions/designations (e.g., “advance directive,” “active duty,” “attention deficit,” “antidepressant,” “Addison’s disease,” etc.).
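The keyword logic described above might be sketched as follows. This is an illustration only: the regular expressions, the `classify_note` function, and in particular the millicurie exclusion rule are assumptions made for this example; the study's actual exclusions are detailed in its Supplemental Appendix and were applied in SQL Server, not Python.

```python
import re
from typing import Set

# Illustrative patterns only (assumptions, not the study's actual rules).
MCI_PHRASE = re.compile(r"\bmild\s+cognitive\s+impairment\b", re.IGNORECASE)
MCI_ACRONYM = re.compile(r"\bMCI\b")  # case-sensitive, so "mCi" does not match
# Hypothetical exclusion: a numeric dose followed by "mCi", or the word "millicurie".
MILLICURIE = re.compile(r"\d+(?:\.\d+)?\s*mci\b|\bmillicuries?\b", re.IGNORECASE)
# Analogue of the "Alz*" wildcard: catches Alzheimer, Alzheimers, Alzeimer, Alzhemer, ...
AD_WILDCARD = re.compile(r"\balz\w*", re.IGNORECASE)


def classify_note(text: str) -> Set[str]:
    """Return the set of keyword families ("MCI", "AD") found in a note."""
    hits = set()
    if MCI_PHRASE.search(text):
        hits.add("MCI")
    elif MCI_ACRONYM.search(text) and not MILLICURIE.search(text):
        # The acronym alone counts only when no millicurie usage appears in the note.
        hits.add("MCI")
    if AD_WILDCARD.search(text):
        hits.add("AD")
    return hits
```

Note that the bare "AD" acronym is deliberately not searched, mirroring the decision described in the text, so a note mentioning only an "advance directive (AD)" yields no AD hit.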

Notes with pharmacy-related titles were excluded as they were deemed unreliable for making MCI or AD designations (Supplemental Appendix). Notes were further limited to Veterans who had an inpatient and/or outpatient encounter in the FY with the note containing the MCI or AD keyword; any purely administrative notes (i.e., not associated with a clinical encounter/visit/admission) were excluded.

Iterative Inclusion Processes: MCI/AD Diagnostic Code and Problem List Mentions in Clinical Notes

In addition to the keyword searches, we also searched for diagnostic codes in the clinical notes that were within ± 8 words of the relevant search keywords: for MCI we searched for International Classification of Diseases (ICD)-9-Clinical Modification (CM) “331.83” and ICD-10-CM “G31.84;” for AD we searched for ICD-9-CM “331.0” and ICD-10-CM “G30*.” Additionally, the VAHS EHR clinical notes have a “problem list” feature that serves as another tool to capture diagnoses, where the content from the Computerized Patient Record System (CPRS) is directly transferred to the clinical notes; in this study, notes that listed MCI or AD in the EHR “problem list” were included. In addition to the more structured “problem list” section, we included notes that contained “problem” along with other common headings such as “impression” and “assessment.” These notes included MCI or AD as a problem along with other comorbidities.
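A proximity search of the kind described above (a diagnostic code within ± 8 words of a keyword) could be sketched as below. The tokenization on whitespace, the pattern constants, and the function `code_near_keyword` are assumptions for illustration; how the study counted words is not specified here.

```python
import bisect
import re

# Illustrative patterns (assumptions); the codes themselves are those named in the text.
MCI_KEYWORDS = r"\bmild\s+cognitive\s+impairment\b|\bMCI\b"
MCI_CODES = r"\b331\.83\b|\bG31\.84\b"
AD_KEYWORDS = r"\balz\w*"
AD_CODES = r"\b331\.0\b|\bG30(?:\.\d+)?\b"  # "G30*" read as G30 with an optional decimal part


def code_near_keyword(text, keyword_pattern, code_pattern, window=8):
    """True if any code match lies within `window` whitespace-delimited words of a keyword."""
    token_starts = [m.start() for m in re.finditer(r"\S+", text)]

    def token_index(char_pos):
        # Index of the whitespace-delimited token that contains this character offset.
        return bisect.bisect_right(token_starts, char_pos) - 1

    keyword_spans = [
        (token_index(m.start()), token_index(m.end() - 1))
        for m in re.finditer(keyword_pattern, text, re.IGNORECASE)
    ]
    code_tokens = [token_index(m.start()) for m in re.finditer(code_pattern, text)]

    for first, last in keyword_spans:
        for c in code_tokens:
            # Word distance from the code to the nearest edge of the keyword phrase.
            gap = (first - c) if c < first else ((c - last) if c > last else 0)
            if gap <= window:
                return True
    return False
```

Measuring distance from the nearest edge of a multi-word keyword is one possible design choice; measuring from the first word of the phrase would be slightly stricter for codes that follow the keyword.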

Iterative Exclusion Processes for MCI/AD Notes

Iterative exclusion processes were used to establish a sample of notes with a positive predictive value (PPV) of at least 80%. Once the initial note samples were identified by keyword searches, a randomly selected subset of at least 100 notes was reviewed manually to determine the PPV. If the PPV did not achieve a threshold of ≥ 80%, findings from the review were used to develop exclusions that were then applied to the entire sample. Once exclusions were applied, another randomly selected subset of notes was reviewed manually. This process was repeated until the PPV threshold of ≥ 80% was achieved. Details of the iterative exclusion process are included in the Supplemental Appendix and Tables S1 and S2. Figure 1 summarizes the workflow for the MCI and AD clinical notes.
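The review loop described above can be outlined in a short sketch. Everything here is a simplification under stated assumptions: `is_true_positive` stands in for the manual reviewer, and `exclusion_rules` is a pre-supplied list of hypothetical rules, whereas in the study the exclusions were developed from the findings of each manual review round.

```python
import random


def estimate_ppv(notes, is_true_positive, sample_size=100, rng=None):
    """Estimate PPV from a random subset: true positives / notes reviewed."""
    rng = rng or random.Random(0)
    sample = rng.sample(notes, min(sample_size, len(notes)))
    if not sample:
        raise ValueError("no notes left to review")
    return sum(1 for n in sample if is_true_positive(n)) / len(sample)


def refine_until_threshold(notes, is_true_positive, exclusion_rules,
                           threshold=0.80, sample_size=100, seed=0):
    """Apply exclusion rules one at a time until the sampled PPV reaches the threshold."""
    rng = random.Random(seed)
    remaining = list(notes)
    pending = list(exclusion_rules)
    while True:
        ppv = estimate_ppv(remaining, is_true_positive, sample_size, rng)
        if ppv >= threshold or not pending:
            # Stop when the threshold is met or no further rules are available.
            return remaining, ppv
        rule = pending.pop(0)
        # Each exclusion is applied to the entire sample, not just the reviewed subset.
        remaining = [n for n in remaining if not rule(n)]
```

A fresh random subset is drawn after each exclusion, matching the description that another randomly selected subset was reviewed once exclusions were applied.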

Fig. 1

Workflow for notes containing MCI and AD keywords. AD Alzheimer’s disease, FY fiscal year, ICD International Classification of Diseases, MCI mild cognitive impairment

Creation of Distinct MCI and AD Cohorts

The MCI and AD cohorts were created based on Veterans who were identified via clinical notes that met the iterative search criteria (i.e., qualifying notes) from the unstructured data, along with Veterans who were identified via diagnostic codes for MCI or AD from the structured data. These cohorts included Veterans qualified based on clinical notes alone (“Notes only”), diagnostic codes alone (“Codes only”), or a combination of note(s) and diagnostic code(s) (“Notes + code”). Subjects with dementia not otherwise specified were not included if they did not meet the above criteria.
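The three cohort designations amount to set operations on Veteran identifiers. The sketch below assumes two collections of identifiers (one from qualifying notes, one from structured diagnostic codes); the function name `designate_cohort` and the identifiers in the test are hypothetical.

```python
def designate_cohort(note_ids, code_ids):
    """Split Veterans into the three designations used for each cohort."""
    notes, codes = set(note_ids), set(code_ids)
    return {
        "Notes only": notes - codes,    # qualifying note, no diagnostic code
        "Notes + code": notes & codes,  # qualifying note and at least one code
        "Codes only": codes - notes,    # at least one code, no qualifying note
    }
```

The three groups are mutually exclusive, so their sizes sum to the number of Veterans with a qualifying note and/or a diagnostic code.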

Validation of MCI and AD Cohorts

The distribution of clinical notes that fulfilled the iterative criteria for MCI or AD was examined by FY over the entire study period. Demographic characteristics in FY 2010 were summarized for all Veterans with clinical notes with MCI or AD meeting the iterative criteria.

Among Veterans in the MCI and AD cohorts who were identified based on “Notes + code,” the distributions over time of the first (index) qualified MCI or AD designation from the unstructured data (clinical notes) and the first diagnostic code were examined over the entire study period.

Statistical Methods

Descriptive statistics were used to summarize our findings, using counts and percentages for categorical variables (Veteran’s sex, race/ethnicity) and mean, standard deviation (SD), minimum, and maximum for continuous variables (Veteran’s age, number of notes).

The PPVs for keyword-based identification of Veterans with MCI and AD were calculated by dividing the number of true positive notes by the number of total keyword-identified notes randomly selected for manual review.
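Written as a formula, with symbols introduced here for illustration (TP for true positive notes, FP for false positive notes, and N for the number of keyword-identified notes randomly selected for manual review), the calculation described above is:

```latex
\mathrm{PPV} = \frac{TP}{N} = \frac{TP}{TP + FP}
```

As a hypothetical example, if 97 of 100 reviewed notes were judged true positives, the PPV would be 97/100 = 97%.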

Results

MCI and AD Identified by Clinical Notes in the VAHS

A total of 2,134,661 notes from 339,007 Veterans were qualified based on clinical notes for MCI over the entire study period (FY 2010–2019). A total of 4,231,933 notes from 572,063 Veterans were qualified based on clinical notes for AD over the entire study period (Supplement, Table S3C). The “mild cognitive impairment” iterative search yielded more notes (1,756,362 notes from 272,754 Veterans; Supplement, Table S3A) than the “MCI” iterative search (469,686 notes from 140,640 Veterans; Supplement, Table S3B). The PPVs for the “mild cognitive impairment,” “MCI,” and “Alz*” iterative searches were 97%, 86%, and 83%, respectively (Supplement, Table S1).

At the start of FY 2010, the mean (SD) ages of the Veterans whose notes fulfilled the iterative criteria for MCI and AD were 68 (11.2) and 70 (11.5) years, respectively (Table 1). In both groups, 96% of Veterans were male, and approximately 70% were Non-Hispanic Whites.

Table 1 Demographics of Veterans in clinical note-identified sample (FY 2010)

Over the 10-year study period, the average number of notes recording MCI per Veteran was 6 (minimum 1; maximum 725), and the average number of notes recording AD per Veteran was 7 (minimum 1; maximum 2105). The number of clinical notes and Veterans meeting the AD criteria was generally stable from one FY to the next, whereas the numbers for MCI more than doubled over the 10-year time span (Fig. 2).

Fig. 2

Distribution by FY of A Veterans and B clinical notes that fulfilled the iterative criteria for MCI or AD. AD Alzheimer’s disease, FY fiscal year, MCI mild cognitive impairment

Designation of VAHS MCI and AD Research Cohorts

A visualization of the MCI and AD cohorts encompassing the clinical note-identified sample (with or without diagnostic codes from the structured data) is shown in Fig. 3. Among 339,007 Veterans with a qualifying MCI note, approximately 55% (n = 187,593) were identified based on having “Notes Only” and 45% (n = 151,414) were identified based on having “Notes + code.” Among 249,693 Veterans with one or more diagnostic codes for MCI in their structured data, approximately 39% (n = 98,279) were identified based on having a “Code only” (i.e., had no qualifying note recording MCI). Among 572,063 Veterans with a qualifying AD note, approximately 75% (n = 427,623) were identified based on having “Notes Only” and 25% (n = 144,440) based on having “Notes + code.” Among 155,622 Veterans with one or more diagnostic codes for AD in their structured data, approximately 7% (n = 11,182) were identified based on having a “Code only” (i.e., had no qualifying note recording AD).
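The totals and percentages above follow directly from the three reported group counts, which can be checked with a short sketch. The dictionary layout and the `summarize` function are illustrative; only the counts come from the text.

```python
# Group counts as reported in the text.
counts = {
    "MCI": {"notes_only": 187_593, "notes_plus_code": 151_414, "code_only": 98_279},
    "AD": {"notes_only": 427_623, "notes_plus_code": 144_440, "code_only": 11_182},
}


def summarize(c):
    """Derive totals and rounded percentages from the three group counts."""
    with_note = c["notes_only"] + c["notes_plus_code"]
    with_code = c["notes_plus_code"] + c["code_only"]
    return {
        "with_note": with_note,
        "with_code": with_code,
        "pct_notes_only_of_noted": round(100 * c["notes_only"] / with_note),
        "pct_code_only_of_coded": round(100 * c["code_only"] / with_code),
    }


for name, c in counts.items():
    print(name, summarize(c))
```

For MCI this reproduces 339,007 Veterans with a qualifying note and 249,693 with a code (55% and 39%); for AD it reproduces 572,063 and 155,622 (75% and 7%).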

Fig. 3

Distribution of Veterans with a qualifying clinical note and/or at least one diagnostic code for A MCI(a) and B AD(b). AD Alzheimer’s disease, MCI mild cognitive impairment. (a) Forty-five percent of Veterans with a qualifying MCI note also had at least one MCI diagnostic code; 61% of Veterans with at least one MCI diagnostic code also had a qualifying MCI note. (b) Twenty-five percent of Veterans with a qualifying AD note also had at least one AD diagnostic code; 93% of Veterans with at least one AD diagnostic code also had a qualifying AD note. Notes Only: identified by notes only; Notes + code: identified by notes plus at least one code in the structured data; Code Only: identified by at least one code in the structured data, but no note meeting the iterative search criteria for MCI or AD

Veterans Identified Via “Notes + code”

Among the Veterans who had both a clinical note for MCI and a structured diagnostic code for MCI (i.e., “Notes + code,” n = 151,414), more Veterans received a clinical note for MCI before receiving a diagnostic code for MCI from FY 2010 to FY 2015. However, from FY 2016 to FY 2019, diagnostic coding for MCI was more than double that in preceding years, and the number of Veterans whose first clinical recording of MCI was a note was similar to the number whose first recording was a diagnostic code (Fig. 4A). Among the Veterans who had both a clinical note and a structured diagnostic code for AD (i.e., “Notes + code,” n = 144,440), over twice as many Veterans received a clinical note for AD before receiving a diagnostic code for AD in each of the FYs assessed (FY 2010 through FY 2019) (Fig. 4B).

Fig. 4

Distribution by FY of the first note and first diagnostic code for A MCI and B AD. AD Alzheimer’s disease, FY fiscal year, MCI mild cognitive impairment. The number of “first notes” observed in FY 2010 among Veterans identified using “notes + code” is likely artificially elevated relative to other years, since FY 2010 was the first year of data extraction (i.e., some of the Veterans counted as having their first note in FY 2010 may have had their actual first notes in the FYs preceding the data extraction period)

Approximately two-thirds of the Veterans with MCI and AD who were identified by “Notes + code” received their first clinical note and first diagnostic code in the same FY. Among Veterans with MCI, the first clinical note for MCI preceded the first diagnostic code for MCI in approximately 22% of Veterans, while the first clinical note followed the first diagnostic code in approximately 13%. Among Veterans with AD, the first clinical note for AD preceded the first diagnostic code for AD in approximately 30%, while the first clinical note followed the first diagnostic code in approximately 3%.
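The comparison of first note and first diagnostic code can be expressed as a three-way classification per Veteran. The sketch below assumes each “Notes + code” Veteran is represented by a pair of fiscal years (first note FY, first code FY); the function names and the example pairs in the test are hypothetical, not study data.

```python
from collections import Counter


def ordering(first_note_fy, first_code_fy):
    """Classify a Veteran by whether the first note or the first code came earlier."""
    if first_note_fy == first_code_fy:
        return "same FY"
    return "note first" if first_note_fy < first_code_fy else "code first"


def ordering_shares(pairs):
    """Share of Veterans in each ordering category, given (note FY, code FY) pairs."""
    tallies = Counter(ordering(note_fy, code_fy) for note_fy, code_fy in pairs)
    total = sum(tallies.values())
    return {k: tallies[k] / total for k in ("same FY", "note first", "code first")}
```

Because the comparison is made at FY resolution, a note and a code recorded months apart within the same FY fall in the “same FY” category.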

Discussion

In this study, we used keyword searches of electronic clinical notes in the VAHS to identify Veterans diagnosed with or being evaluated for MCI or AD from FY 2010 through FY 2019. Through an iterative process we were able to achieve the PPV thresholds of ≥ 80% and identify over 2 million MCI clinical notes from 339,007 Veterans and over 4 million AD clinical notes from 572,063 Veterans. Using these clinical note-based samples as well as samples of Veterans with diagnostic codes for MCI or AD, we established cohorts for future research. Depending on whether “notes only,” “codes only,” or “notes + code” approaches were used, approximately 100,000 to 190,000 Veterans were identified for the MCI cohort and 11,000 to over 400,000 Veterans were identified for the AD cohort. These results emphasize that there is no single simple approach to identifying individuals with MCI and/or AD, and reliance on either structured or unstructured data alone is not likely to be sufficient.

While the number of Veterans identified by notes recording AD per FY was stable over our 10-year study period, the number of Veterans identified by notes recording MCI per FY rose markedly. This rise in MCI is consistent with findings from a previous diagnostic code-based analysis in the VAHS [14]. Dinesh et al. reported that while AD prevalence and incidence had a modest decline from 2004 to 2019 (with relative stability from 2010 to 2019), MCI prevalence and incidence rose sharply over the same period (particularly between 2010 and 2019) [14]. Of note, the prior study sought to identify individuals with an MCI or AD diagnosis using the criteria of at least two diagnostic codes from the structured administrative data, whereas our current study aimed to capture individuals with MCI or AD recorded in the clinical notes and/or with at least one diagnostic code for MCI or AD.

In the current analysis, we found that clinical notes identified a substantially higher proportion of Veterans with or being evaluated for MCI or AD than diagnostic codes from the structured database. For MCI, clinical notes identified > 35% more Veterans than diagnostic codes; for AD, clinical notes identified > 267% more Veterans than diagnostic codes. The larger gap in the number of Veterans identified via clinical notes recording AD vs. diagnostic codes for AD likely reflects the underuse of AD-specific diagnostic codes [12]. Individuals may be coded for unspecified/general dementia and, due to a combination of patient-, provider-, and system-related factors, they may not seek intervention until their disease is more advanced/severe, at which time an AD diagnosis may be given. Of note, in the VAHS, diagnostic codes are not used for reimbursement purposes but rather for administrative and clinical reporting. Therefore, the methods and routines used for diagnostic coding in the VAHS likely represent clinical practice patterns that differ from those reflected in traditional claims databases.
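The relative differences quoted above can be reproduced from the reported totals of Veterans with a qualifying note and Veterans with at least one diagnostic code. The helper `percent_more` is a name introduced for this sketch.

```python
def percent_more(n_notes, n_codes):
    """Percentage by which note-identified Veterans exceed code-identified Veterans."""
    return 100 * (n_notes - n_codes) / n_codes


# Totals as reported in the Results.
mci_excess = percent_more(339_007, 249_693)  # between 35% and 36%
ad_excess = percent_more(572_063, 155_622)   # between 267% and 268%
print(f"MCI: {mci_excess:.1f}% more; AD: {ad_excess:.1f}% more")
```

Both values are consistent with the "> 35%" and "> 267%" figures stated in the text.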

We chose to employ clinical note-based cohort identification because of known limitations of diagnostic code-based identification (i.e., rule-out coding, coding representing clinical work-up instead of clinical judgment, coding for reimbursement). However, there are limitations in our study. We were confronted with the reality that clinical note-based identification did not offer substantial contextual information regarding clinician judgments, in the absence of an in-depth expert review/reading of the notes for interpretation. An immense investment of human power and time would be required to annotate clinical notes text as inputs for machine learning (ML) that can potentially discern the clinical meaning. Use of artificial intelligence and ML modalities in AD research is complicated and still preliminary [15, 16]. Whether emerging technologies such as generative pre-trained transformer can be adapted for clinical note-based applications is not yet established. Our simplified rule-based approach was not intended to understand or reveal the process by which clinicians assign MCI or AD diagnoses but rather to identify individuals who had or were being evaluated for MCI or AD. Our iterative process involved manual review of thousands of notes to develop exclusion criteria to achieve greater precision and accuracy; thus, we were able to identify a robust sample from complete records of clinical notes from which we could identify individuals for research of MCI and AD. While clinical notes offered a rich source of information, the quantity and quality of notes varied between Veterans as well as by provider types. While we recognized that evaluation of AD may be impacted by clinician type [10], we intentionally did not limit our clinical notes searches to specialists, since this would have reduced our sample and likely identified individuals with later stages of disease. 
Additionally, we did not wish to limit our research cohorts to only Veterans with confirmed diagnoses, but rather aimed to also capture Veterans who were being evaluated for MCI and AD followed by iterative exclusion of non-MCI/AD subjects (see Methods/Supplement). Despite the reliable PPVs for our cohorts, our approaches do not distinguish between Veterans with a confirmed diagnosis of MCI or AD and those who are being evaluated for MCI or AD. Of note, MCI in this study could be due to any cause and could remain as stable MCI, since there is not yet a diagnostic code for “MCI due to AD” and our iterative searches of notes captured all-cause MCI.

We believe our note-based approach is an important complement to diagnostic code-based approaches in AD research, especially given that disease-specific diagnostic codes are being underutilized. Indeed, we did identify many additional Veterans for inclusion in the MCI and AD cohorts by using clinical notes. We are developing an algorithm for MCI or AD prevalence estimation in a separate epidemiologic study. While our purpose for this article was to explore the feasibility of using clinical notes in the VAHS as a novel approach for MCI and AD case ascertainment from the electronic health record, we believe that further refinement of our approaches is needed prior to providing dependable epidemiologic estimates of incidence and/or prevalence of all-cause MCI and AD in the VAHS. Additional future research may include longitudinal evaluation of the adoption of disease-modifying therapies to better understand patient/provider/system characteristics that may impact adoption. However, we do not currently plan to recruit patients based on these de-identified cohorts or use them for referral to clinical trials and disease-modifying therapy treatments. Future research may also focus on notes with titles specific to MCI/AD (e.g., memory, behavior), compare notes from primary care clinicians versus specialists (e.g., neurologists, psychiatrists), and examine the overlap between cohorts (i.e., Veterans with notes/codes for both MCI and AD). In addition, clinical note-based cohorts offer a source of information regarding AD severity/stage that can be used to study transition between disease stages to better understand AD progression; this is especially important since staging information for AD is not yet available in diagnostic codes/structured data.

Conclusions

We found that our clinical note-based identification method captured more Veterans with or being evaluated for MCI or AD than diagnostic code-based identification. Using various combinations of clinical notes and/or diagnostic codes from the unstructured and structured portions of the EHR, respectively, we were able to designate research cohorts for MCI and AD that will serve to provide comprehensive representation of the AD population, inclusive of early stage and at-risk individuals.