Background

Medical achievements have produced a population whose lifespan has increased by 30 years since the beginning of the twentieth century [1]. In 2012, there were 40.7 million people aged 65 and over in the United States (13.2% of the total population), with 38.7% reported to have one or more disabilities [2]. The aging population has also led to an increase in persons living with cognitive impairment (CI), now more than 17 million people in the United States [3], costing patients, their families, and society an estimated $18 billion annually in lost income and direct cost of care [4]. Herein, we define CI as either mild cognitive impairment (MCI) or dementia. In 2015, Alzheimer’s Disease International estimated that dementias affected 46.8 million individuals worldwide and projected that the number would nearly triple by 2050, reaching 131.5 million people [5]. MCI is therefore of particular importance, as it is a transitional zone between normal aging and dementia. One study indicated that clinicians were not aware of CI in more than 40% of their patients [6]. The failure to diagnose cognitive complaints delays appropriate care plans for underlying diseases and comorbid conditions, and may cause safety issues for patients and others [7, 8]. In many cases, CI worsens over time [8,9,10]. Thus, early diagnosis of CI can be of utmost importance and may reduce the large later burden on medical and social care.

The impact of CI on activities of daily living (ADL) has been used as a criterion to differentiate MCI and dementia [11]. ADL is often divided into basic ADL (b-ADL), which includes activities such as personal hygiene, clothing, feeding, and toileting [12], and instrumental ADL (i-ADL), which commonly refers to independent living abilities such as household activities, handling money, shopping, and transportation [13,14,15]. The i-ADL places a higher demand on cognitive function than the b-ADL and is important for living an independent life in society [16]. ADL is highly dependent on cognitive function and behavior [17]. Therefore, assessments should be capable of detecting changes in ADL as soon as changes in cognition and behavior are detected [17].

In this study, we first examined basic statistics of the electronic health record (EHR) corpus relevant to CI diagnosis. Temporal trends of ADL in elderly patients (age 65 or older), mined from EHRs before they developed CI, were compared between CI and cognitively unimpaired (CU) patients. We used both structured data (current visit information provided by patients) and unstructured data (clinical notes). Furthermore, we applied machine learning techniques (i.e., three topic modeling methods) to clinical notes to extract meaningful semantics (i.e., topics and terms) residing in clinical free text and to examine their potential association with future CI development.

Different studies have used machine learning algorithms to differentiate between cognitively normal and MCI individuals [18, 19], to predict conversion from MCI to Alzheimer’s disease (AD) [20], and to predict the time to this conversion [19]. The researchers in [21] developed a two-layer model in which the first layer is a screening test that categorizes individuals into a normal or abnormal group, and the second layer is a closer examination that classifies MCI or dementia. They compared the results of various machine learning approaches; support vector machines, multi-layer perceptrons, and logistic regression showed high performance. Conversion from MCI to AD has also been studied using a deep learning model with MRI, neuropsychological, and demographic data [22].

Another study [23] tried to predict MCI from spontaneous spoken utterances. Classifying cognitive profiles using machine learning with fMRI data as an addition to cognitive data was explored in [24]. In that work, fMRI data are used only to train the classifier, and classification of new data is based solely on cognitive data. Another study [25] focused on the early diagnosis of AD with deep learning, utilizing a sparse auto-encoder. It used neuroimages obtained from a neuroimaging initiative database to identify the regions of brain images that are sensitive to AD progression. These previous studies tried to improve their results by incorporating neuroimaging data such as fMRI into their models. Although this may have a positive impact on the results, not all patients have fMRI data, so the approach may not be as broadly applicable as one using routine EHR data in the health care population. Also, these studies did not try to identify new risk factors associated with CI patients in EHRs, but relied only on known medical conditions and fMRI data to predict CI.

Other studies have focused on predicting progression from MCI to dementia using neuropsychological data. The researchers in [26] considered neuropsychological test results and examined their applicability for predicting dementia using a machine learning algorithm. They used a feature selection ensemble approach to choose the features available in the neuropsychological tests as predictors of developing AD dementia. The use of neuropsychological tests to predict the time to conversion from MCI was also investigated in [27]. In that study, MCI patients were grouped according to whether they progressed to dementia (converter MCI) or remained MCI (stable MCI) during a specified time window. A prognostic model was then developed to predict conversion as early as 5 years before the development of dementia.

Unlike the previous studies, we applied a machine learning approach (i.e., topic modeling) to examine topics and terms in EHR free text that can potentially be used for early detection of CI. A few studies have focused on the early diagnosis of CI [28]; however, these studies followed the conventional approach of assessing patients by i-ADL and b-ADL rather than utilizing machine learning algorithms and EHR free text.

Methods

The basic EHR corpus statistics (i.e., distributions of event types and practice settings of the first CI diagnosis, numbers of clinical notes between CI and CU patients) were examined. Temporal trends of patient ADL were compared and topics in the clinical free text were analyzed over time using three machine learning models between physician-diagnosed CI and CU patient groups.

Data

The study cohort was selected from patients 65 years of age or older at the time of enrollment in the Mayo Clinic Biobank (n = 22,772), where we identified physician-diagnosed CI patients (n = 1,435; male 55%) and CU patients (n = 1,435) matched by age (+/− 1 year) and sex. The physician-diagnosed CI patients were determined based on diagnosis (i.e., dementia, cognitive impairment, cognitive deficit, cognitive decline, mild cognitive impairment) under the diagnosis section in clinical notes [29].
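The age- and sex-matching step can be illustrated with a minimal sketch. It assumes a greedy 1:1 match without replacement, which the text does not specify; the record layout and the function name `match_controls` are hypothetical and are not taken from the study's code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: {"id": str, "age": int, "sex": str}
Patient = Dict[str, object]


def match_controls(cases: List[Patient], pool: List[Patient],
                   max_age_gap: int = 1) -> List[Tuple[str, Optional[str]]]:
    """Greedily pair each CI case with one unused CU patient of the same sex
    whose age differs by at most max_age_gap years (assumed strategy)."""
    used = set()
    pairs = []
    for case in cases:
        # Unused, same-sex candidates within the age window, closest age first.
        candidates = sorted(
            (p for p in pool
             if p["id"] not in used
             and p["sex"] == case["sex"]
             and abs(p["age"] - case["age"]) <= max_age_gap),
            key=lambda p: (abs(p["age"] - case["age"]), p["id"]),
        )
        match_id = candidates[0]["id"] if candidates else None
        if match_id is not None:
            used.add(match_id)
        pairs.append((case["id"], match_id))
    return pairs
```

A case with no eligible candidate is paired with `None`, so unmatched cases stay visible instead of being silently dropped.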

Corpus analysis

The basic EHR corpus statistics relevant to CI diagnosis (i.e., the distributions of event types and practice settings of the first CI diagnosis) were examined, and the numbers of clinical notes over time were compared between CI and CU patients.

Analysis of activity of daily living

The ADL was collected from two sources: 1) the current visit information, which is provided and updated by the patients every 6 months when they visit the Mayo Clinic, and 2) certain sections in clinical notes (i.e., instructions for continuing care, ongoing care orders, system review). The current visit information includes questionnaires, in a structured format, that assess the ability of patients to accomplish ADL (a binary assessment of difficulty with each ADL: yes or no). To extract ADL-related concepts, the clinical notes were processed by the MedTaggerIE module in MedTagger [30, 31], an open-source pipeline developed by Mayo Clinic for pattern-based information extraction with assertion detection (i.e., negated, possible, hypothetical, associated with a patient) and normalization. These concepts were automatically mapped to the corresponding predefined ADL categories through the MedTaggerIE implementation (i.e., a rule-based normalization process). We included only non-negated ADL-related concepts.
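The idea of pattern-based extraction with a negation filter can be sketched as follows. This is a simplified stand-in and not the MedTaggerIE API: the patterns, the negation cues, and the character-window heuristic are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; the actual MedTaggerIE rules are not given in the text.
ADL_PATTERNS = {
    "bathing": re.compile(r"\b(bath(e|ing)?|shower(ing)?)\b", re.I),
    "dressing": re.compile(r"\bdress(ing)?\b", re.I),
    "shopping": re.compile(r"\bshop(ping)?\b", re.I),
}
# Assumed negation cues; real assertion detection is considerably richer.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without)\b", re.I)


def extract_adl(sentence: str, window: int = 30) -> List[Tuple[str, bool]]:
    """Return (category, negated) for each ADL mention in the sentence.

    A mention is treated as negated if a cue occurs within `window`
    characters before it (a crude heuristic)."""
    results = []
    for category, pattern in ADL_PATTERNS.items():
        for m in pattern.finditer(sentence):
            left = sentence[max(0, m.start() - window):m.start()]
            results.append((category, bool(NEGATION_CUES.search(left))))
    return results


def non_negated(sentence: str) -> List[str]:
    """Keep only the ADL categories that are not negated."""
    return [category for category, negated in extract_adl(sentence) if not negated]
```

For example, `non_negated("Denies difficulty with shopping.")` drops the mention because the cue "Denies" precedes it.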

Once we obtained ADL concepts, they were mapped to items in Katz’s index (b-ADL) [12] and the Lawton scale (i-ADL) [13,14,15], which are the most commonly used tools for assessing ADL. The ADL items used in this study for each ADL category are: 1) b-ADL: bathing, dressing, transferring, toileting, and feeding; 2) i-ADL: using transportation, shopping, preparing food, housekeeping, responsibility for own medications, and handling finances. These items can be mapped to the International Classification of Functioning, Disability, and Health (ICF) [32], allowing for broad information exchange. The temporal trends of b-ADL and i-ADL were compared between CI and CU patients every 6 months for the 5 years before the first physician-diagnosed CI (CI patients) and before the latest visit (CU patients).
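The grouping of ADL items and the 6-month binning over 5 years can be sketched as below. The item lists follow the text; the bin width in days, the event layout, and the choice of denominator (all patients in the group) are assumptions, since the text does not spell them out.

```python
from collections import defaultdict
from datetime import date

B_ADL = {"bathing", "dressing", "transferring", "toileting", "feeding"}
I_ADL = {"using transportation", "shopping", "preparing food", "housekeeping",
         "responsibility for own medications", "handling finances"}


def adl_group(item):
    """Map an ADL item to its category, or None if it is not a listed item."""
    if item in B_ADL:
        return "b-ADL"
    if item in I_ADL:
        return "i-ADL"
    return None


def half_year_bin(event_date, index_date, days_per_bin=182.5, n_bins=10):
    """Return 0..n_bins-1 (0 = the 6 months right before index_date),
    or None if the event falls outside the 5-year window."""
    days_before = (index_date - event_date).days
    if days_before < 0:
        return None
    b = int(days_before // days_per_bin)
    return b if b < n_bins else None


def deterioration_ratio(events, index_dates, n_patients):
    """events: iterable of (patient_id, event_date, adl_item).
    index_dates: patient_id -> first CI diagnosis (CI) or latest visit (CU).
    Returns {(group, bin): share of patients with a deteriorated ADL}."""
    patients = defaultdict(set)
    for pid, event_date, item in events:
        group = adl_group(item)
        b = half_year_bin(event_date, index_dates[pid])
        if group is not None and b is not None:
            patients[(group, b)].add(pid)
    return {key: len(pids) / n_patients for key, pids in patients.items()}
```

Counting distinct patients per bin, rather than mentions, keeps a patient with many notes from inflating the ratio.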

Analysis of topics in clinical notes

The topics in clinical notes were investigated in two ways: 1) how topic terms evolve in CI patients each year for the past 5 years (experiment 1), and 2) how topic terms differ between CI and CU patients over the 5-year period before the development of CI (experiment 2). This step-wise time frame allows us to observe how the topics change over time, motivated by the expert recommendation that people older than 65 years should visit doctors every 6 months to determine whether symptoms are staying the same, improving, or growing worse [17]. We examined the topics in 1) entire clinical notes, 2) individual sections (i.e., history of present illness, diagnosis, current medication) independently, and 3) the set of sections that most likely include medical concepts of interest (i.e., chief complaint, history of present illness, system review, past medical history, physical examination, impression/report/plan, and diagnosis).

As preprocessing for the topic models, we kept the 2,000 most frequent words as the vocabulary after removing stop words and stemming. We applied three different machine learning models: two conventional topic modeling methods (LDA and TKM) and one deep learning approach (KATE), as described below. The number of topics was determined based on the self-regulatory capability embedded in the TKM model.
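The vocabulary construction described above can be sketched with the standard library alone. The stop-word list and the suffix-stripping `crude_stem` function are toy stand-ins; the text does not name the stop-word list or stemmer actually used.

```python
import re
from collections import Counter

# Toy stop-word list (assumption); a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "with", "for"}


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(notes, max_size=2000):
    """Return the max_size most frequent stemmed terms across all notes."""
    counts = Counter()
    for note in notes:
        counts.update(tokenize(note))
    return [word for word, _ in counts.most_common(max_size)]
```

Terms outside the returned vocabulary would simply be ignored when the notes are converted to model input.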

Latent Dirichlet allocation (LDA)

LDA is a generative probabilistic model in which each document is viewed as a mixture of various topics and each topic as a distribution over words [33]. We set the number of topics to 20, with 10 words per topic. Other hyperparameters were set as in the code implemented in [34].
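Reporting a fixed number of words per topic amounts to ranking each topic's word weights and keeping the top entries. The sketch below assumes the fitted model exposes a topic-by-word weight matrix aligned with the vocabulary; the function name `top_words` and the input layout are hypothetical.

```python
def top_words(topic_word_weights, vocabulary, n_words=10):
    """topic_word_weights: one list of weights per topic, aligned with vocabulary.
    Returns the n_words highest-weighted terms for each topic."""
    topics = []
    for weights in topic_word_weights:
        # Rank vocabulary indices by descending weight for this topic.
        ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
        topics.append([vocabulary[i] for i in ranked[:n_words]])
    return topics
```

With 20 topics and `n_words=10`, this yields the 20 lists of 10 stemmed terms of the kind shown in the topic tables.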

Topic keyword model (TKM)

This method addresses a shortcoming of the LDA approach (i.e., ignoring the order of words). In TKM, the score of each word in a topic reflects how common the word is within that topic and how common it is across the other topics [35]. Another advantage of this method is that redundant topics are removed automatically. We used the hyperparameters as described in [35].

K competitive autoencoder (KATE)

An autoencoder is a neural network that can automatically learn data representations by reconstructing its input at the output layer. Many variants of autoencoders have been proposed, mainly for image data. KATE, however, was designed to overcome the weakness of traditional autoencoders, which are not well suited to textual data [34]. The number of topics in this experiment was set to 20, with 10 words per topic. Other deep learning parameters were set as discussed in the original paper [34].

Results

We first examined basic EHR corpus statistics of the cohort. Then, we analyzed patterns of temporal trends of 1) b-ADL and i-ADL and 2) individual ADL between CI and CU patients before patients develop CI. The outcomes of three topic modeling methods (i.e., terms and topics mined from clinical notes) were analyzed and compared between the two patient groups over time, both qualitatively and quantitatively, in order to better understand patient medical conditions that may contribute more to CI development.

Corpus statistics

Figure 1 shows the major event types (i.e., note types) and practice settings in which a physician first diagnosed CI, along with their occurrences. Consultation was the most common event type for diagnosing CI (28%), followed by subsequent visit (19%), limited exam (18%), multi-system evaluation (11%), and supervisory (6%); together these cover more than 80% of all CI diagnosis events. For practice setting, neurology (31%) was the most common, followed by primary care (26%), general internal medicine (12%), family medicine (6%), and brain (3%).

Fig. 1 Distribution of the first CI diagnosis (CON: consult, SV: subsequent visit, LE: limited exam, ME: multi-system evaluation, SUP: supervisory, SE: specialty evaluation, ADM: admission; GIM: general internal medicine)

Table 1 contains the statistics of clinical notes for the past 5 years before CI diagnosis (CI patients) and before the latest visit date (CU patients). As can be seen, CI patients consistently had a higher number of clinical notes than CU patients, and the difference was largest in the first year before CI diagnosis.

Table 1 Average number of clinical notes for CI and CU patients (SD in parentheses)

ADL distribution

Figure 2 shows temporal distributions of the deteriorated b-ADL and i-ADL of CI and CU patients in three age groups (65–74, 75–84, and 85 and older). Overall, CI patients had worse b-ADL and i-ADL (i.e., a higher ratio of deteriorated ADL) than CU patients in all age groups, and this trend was more pronounced closer to the physician-diagnosed CI. The deteriorated b-ADL and i-ADL did not differ much between the 65–74 and 75–84 age groups for either CI or CU patients. Interestingly, CU patients’ b-ADL was overall worse than their i-ADL, but the opposite held for CI patients: their i-ADL became worse than their b-ADL over time, mainly about 1 to 1.5 years before the physician-diagnosed CI.

Fig. 2 Distribution of b-ADL and i-ADL for CI and CU patient groups (x-axis is year(s) before the 1st physician-diagnosed CI for CI patients and the latest visit for CU patients; y-axis is the ratio of patients who have a deteriorated ADL)

We also examined individual ADL trajectories for the entire patient cohort, comparing CI and CU patients. Overall, CI patients had more deteriorated ADL than CU patients over time in all ADL categories. The most deteriorated ADL in the 6 months prior was transferring (17% for CI and 14% for CU patients) among b-ADL and housekeeping (14% for CI and 10% for CU patients) among i-ADL. The difference between the two groups was relatively small for housekeeping and transferring, but large for bathing and responsibility for own medications (Fig. 3).

Fig. 3 ADL distributions for CU and CI patient groups (x-axis is year(s) before the 1st physician-diagnosed CI for CI patients and the latest clinical visit for CU patients; y-axis is the ratio of patients who have a deteriorated ADL)

Topic modeling

Qualitative analysis

We examined the topic terms extracted by three different models (i.e., LDA, TKM, and KATE) from clinical notes to compare hidden topics in CI patients before they develop CI. This approach may reveal potential patient medical conditions that lead to CI. Tables 2, 3 and 4 include topic terms generated by different topic models in different portions of clinical notes for 6 months before physician-diagnosed CI. The bold font in the topic denotes the correlated words in a given topic relevant to CI.

Table 2 Topic words by TKM (6 months before CI diagnosis)
Table 3 Topic words by KATE (6 months before CI diagnosis)
Table 4 Topic words by LDA (6 months before CI diagnosis)

The words in the tables are stemmed. We included one representative cluster of topics for each section. As can be seen in the tables, the topics are distinguishable from each other and capture a meaningful representation of the text data. For example, in Table 2, all sections show some symptoms related to “fatigue,” which may be a potential risk for dysfunction [36]; the topic in the set of sections is relevant to “sleep issue,” which can be observed in individuals suffering from cognitive disorders [36, 37]. Among the topic words in the history of present illness section, we can observe glucose, diabetes, insulin, and hydrochlorothiazide, which are related to diabetes, a disease considered a potential risk factor for cognitive decline [38]. For the topic in the medication section, we observed medications used to control high blood sugar [38]. The topic in the diagnosis section includes terms related to cancer [39,40,41,42].

In Table 3, the set of sections and the history of present illness section include hyperlipidemia, which can be considered a risk factor for CI [43], as well as coronary artery disease and hypertension, which are relevant to cognitive decline [44, 45]. Table 4 shows that LDA produced outcomes similar to those of TKM and KATE. Words such as edema, distress, memory, hypertension, coronary, urinary, and hyperlipidemia have been discussed as potential risk factors for cognitive dysfunction [44,45,46,47]. Carcinoma, melanoma, cancer, and squamous in the last row are terms related to cancer [39,40,41,42].

Quantitative analysis

We quantified how the topic terms learned by the topic models 1) change in CI patients as they approach physician-diagnosed CI, comparing year by year for the past 5 years (experiment 1), and 2) differ between CI and CU patients over the entire past 5 years (experiment 2). We utilized the aggregated term frequency of the topic terms over time.

For the first approach (experiment 1), the differences in topic term frequencies between two consecutive years prior to CI diagnosis were computed, starting from 1 year prior to the CI diagnosis and repeating for each year over the whole 5-year period. We used 400 topic terms for each year. This may allow us to identify potential topic terms associated with CI development, because terms relevant to CI may become more frequent as the CI diagnosis date approaches. For the second approach (experiment 2), we used the same aggregated term-frequency differences, but between CI and CU patients over the entire 5-year period. In this way, the topic terms common to CI and CU patients can be sorted out, and the remaining terms are likely the ones associated with CI. We used the entire 5 years because we did not observe any significant difference when comparing year by year.
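The two comparisons can be sketched with simple counters. This sketch assumes that only terms whose aggregated frequency increases are kept, which is one reading of the description; the function names and the input layout (lists of topic terms per year or per group) are hypothetical.

```python
from collections import Counter


def term_frequency_gain(target_terms, reference_terms):
    """Terms that are more frequent in target_terms than in reference_terms,
    mapped to the size of the difference (assumed: positive differences only)."""
    # Counter subtraction discards zero and negative counts.
    return dict(Counter(target_terms) - Counter(reference_terms))


def experiment_1(topic_terms_by_year):
    """topic_terms_by_year[y]: topic terms for year y before CI diagnosis
    (1 = closest to diagnosis). Compares each year with the year before it."""
    return {
        (y, y + 1): term_frequency_gain(topic_terms_by_year[y],
                                        topic_terms_by_year[y + 1])
        for y in sorted(topic_terms_by_year)
        if y + 1 in topic_terms_by_year
    }


def experiment_2(ci_terms, cu_terms):
    """Terms more frequent among CI than CU patients over the whole period."""
    return term_frequency_gain(ci_terms, cu_terms)
```

The resulting differences could then drive a visualization in which larger values are drawn as larger words.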

Figure 4 shows the high-level concept of our approach using aggregated term differences. The results of these approaches are visualized in Figs. 5, 6, 7, 8, 9 and 10. Larger words appear more frequently in the result of topic modeling on clinical notes compared to the previous year (experiment 1) or over the whole 5 years (experiment 2); the corresponding raw data for Figs. 5, 6, 7, 8, 9 and 10 are given in the tables in the Appendix. The results were compared with recent publications to verify whether this approach generates meaningful outcomes relevant to CI.

Fig. 4 Aggregated term frequencies. The first table shows the frequencies one year before CI development, the middle table shows the frequencies two years before CI development, and the last table shows the result, i.e., the terms that were repeated most

Fig. 5 Topic terms for CI patients - TKM (Experiment 1)

Fig. 6 Topic terms for CI patients - KATE (Experiment 1)

Fig. 7 Topic terms for CI patients - LDA (Experiment 1)

Fig. 8 Topic terms in the TKM model (Experiment 2)

Fig. 9 Topic terms in the KATE model (Experiment 2)

Fig. 10 Topic terms in the LDA model (Experiment 2)

The disease term “lymphoma” was seen in multiple results (Figs. 5a, 6b, c, 7a, b, c, 8b, 9b, c and 10a, c); Hodgkin lymphoma patients have been reported to complain of cognitive deterioration and fatigue [48], and cognitive decline was found to be more severe and frequent in these patients than in the healthy population [48]. According to a recent study, patients with “nocturnal hypoxia” had poorer memory retention than healthy individuals [49]. Relatedly, “oximetry” (Fig. 5a) refers to the measurement of blood oxygen saturation, which is used in patients with hypoxemia.

Another study [50] demonstrated that “global cerebral edema” is an important risk factor for cognitive dysfunction; the term appears more frequently in Figs. 5a, 6a, and 10b, c. Researchers have studied the association between cancer and cognitive decline at older ages [39,40,41,42] and concluded that cancer therapy can negatively impact cognition in some patients. In this regard, the words “metastasi,” “squamous,” “chemotherapy,” “oxaliplatin,” and “carcinoma” can be seen in Figs. 5c, 7b, c, 8a, b, 9b, and 10a, c. Patients with “tinnitus” have been reported to be at higher risk of cognitive deficit [51]; the term is shown in Fig. 5b, c. The word “bevacizumab” in Fig. 8a is a cancer medicine that interferes with the growth of cancer cells in the body; it is used to treat certain types of brain cancer or kidney cancer. The relation between urinary disease and CI has been investigated in several studies (Figs. 6c and 8b) [46, 47]. Words such as “depression,” “confusion,” “memory,” and “pressure,” which are already known as signs of CI, can be seen in Figs. 6b, 7a, b, c, 9b, c, and 10b, c.

A couple of studies explored the relationship between CI in late life and hyperlipidemia, hypertension, and coronary disease (Figs. 6a, 9a, 8c, and 10c). Heavy snoring and sleep apnea (Figs. 6a, b, c and 8a) have been investigated extensively and show a strong link to earlier cognitive decline [37]. The apnea/hypopnea index, which is usually used to indicate the severity of sleep apnea in patients, is another extracted topic term; it was repeated 8 times more often in the CI population than in the CU population. CPAP is used to treat sleep-related breathing disorders, including sleep apnea (Fig. 8c).

Diabetes has been identified as a potential risk factor for cognitive dysfunction [38]; the related topic terms diabetes, glucose, and sugar [44, 45, 52] can be seen in Figs. 6c, 9c, and 10a. The researchers in [53] showed that memory impairment has a particular association with the presence of left ventricular hypertrophy (Figs. 9b and 10b). Atrial fibrillation has been studied in [54] as a risk factor for cognitive decline (Figs. 6c, 8c, and 9c). The relation between “osteomyel” patients and CI is discussed in [55], as illustrated in Fig. 8b.

The researchers in [56] reported that cognitive function is disrupted after ischemia (Fig. 8c). The word “lung” appears in Figs. 8b and 10b, c; some studies, including [57], have discussed lung diseases as a determinant of cognitive decline.

Apart from the topics and words discussed here, some words had a high frequency in the years close to CI diagnosis and therefore appear bold and large. Some of them, for example caregiver, care, exercise, and neuropathy, may be indirectly relevant to CI. However, common boilerplate-like words such as problem, pain, sudden, disease, and status can appear in notes for any disease and need to be filtered out.

Discussion

It is important to identify early signs of CI so that clinicians can plan accordingly and take appropriate actions, relieving potential cost and burden. In this study, we examined basic EHR corpus statistics relevant to CI patients, and analyzed temporal trends of patient ADL and topics in clinical notes between CI and CU patient groups in order to characterize and better understand elderly patients’ medical conditions before they develop CI.

Consultation was the most common event type, and neurology was the most common practice setting, in which physicians first diagnosed CI. The consistently higher number of clinical notes for CI patients than for CU patients suggests that CI patients likely visit hospitals or clinics more often than CU patients. Temporal trends of individual ADL and of the ADL groups (i.e., b-ADL and i-ADL) were examined over the 5 years before the first physician-diagnosed CI (CI patients) and before the latest visit (CU patients). The trajectories of ADL deterioration became steeper in CI patients than in CU patients approximately 1 to 1.5 years before the actual physician diagnosis of CI. More notably, the deterioration of i-ADL was worse than that of b-ADL in CI patients during this period, which was not the case in CU patients. Considering the significant delay in CI diagnosis and the missed opportunity for appropriate planning in current practice [4, 5], this observation may help promote early detection of CI. The trajectories of bathing (b-ADL) and responsibility for own medications (i-ADL) deteriorated much more rapidly in CI patients than in CU patients over time. These measures might also serve as potential surrogate symptoms to facilitate early CI diagnosis.

The results of this study suggest that topic modeling can help discover meaningful and hidden topics and terms in clinical notes. The results were promising, as discussed in the qualitative and quantitative analyses. We observed that the words within a topic were mostly correlated and captured the underlying semantics. The models were able to extract words relevant to CI, such as hypertension, depress, and memory, which are potential indications associated with CI. We were also able to identify other potential factors that may be relevant to CI according to recent publications.

Overall, the more recent models, TKM and KATE, were better than LDA at capturing a semantically meaningful representation of the data. Further, the KATE model generated more words related to CI, in the categories of memory, depression, hypertension, dizziness, and confusion, than the TKM model. We assessed the results of the topic modeling based on aggregated term frequencies. The results were visualized to show the hidden potential topics that may contribute to developing CI. These results were supported by recent publications and showed promising outcomes. However, some common topic words that are not relevant to CI and may appear for any disease were also captured. A further post-processing step would be required to filter them out.

Generally, CI is diagnosed by health professionals by asking patients questions to assess memory, concentration, and understanding. However, this is not routinely performed in many healthcare institutions, causing delays in CI diagnosis. Considering this fact, using EHR free text to analyze early signals of CI could be a potential way to automate or support CI assessment and thus facilitate routine practice for detecting CI in advance.

The limitations of this study include the use of physician-diagnosed CI, which does not differentiate the severity of CI, instead of a full assessment or test, which was unavailable. However, our study is still useful because its focus is to explore the use of EHR documentation to promote early detection of CI, considering the significant delay in CI diagnosis by clinicians in current health care practice. Another limitation is a potentially imbalanced distribution of clinical notes for certain illnesses (e.g., cancer patients are seen more often than others and have more clinical notes). This may affect the result of topic modeling; however, we examined a broad range of topics and demonstrated good potential applicability.

Conclusion

There are notable differences in the temporal trends of b-ADL and i-ADL between CI and CU patients approximately 1 to 1.5 years before the actual physician-diagnosed CI: the slope of overall ADL deterioration was steeper, and i-ADL was worse than b-ADL, in CI patients during this period. The trajectories of certain individual ADL (bathing and responsibility for own medications) were closely associated with CI development. The topics and terms obtained over time by topic modeling methods from clinical free text have the potential to show how CI patients’ conditions evolve and to reveal overlooked conditions as they approach CI diagnosis. These observations may promote early detection of CI and thus expedite appropriate care of underlying diseases and comorbid conditions. In the future, we plan to use neuroimaging and assessment data to obtain a more granular classification of cognitive function and to develop a prediction model that leverages our observations to detect patients at high risk of different stages of CI and to identify associated longitudinal risk factors.