Abstract
Dementia and mild cognitive impairment can be underrecognized in primary care practice and research. Free-text fields in electronic medical records (EMRs) are a rich source of information that might support increased detection and enable a better understanding of populations at risk of dementia. We used natural language processing (NLP) to identify dementia-related features in EMRs and compared the performance of supervised machine learning models for classifying patients with dementia. We assembled a cohort of primary care patients aged 66 years and older in Ontario, Canada, from EMR notes collected until December 2016: 526 with dementia and 44,148 without dementia. We identified dementia-related features by applying published lists, clinician input, and NLP with word embeddings to free-text progress and consult notes, and we organized the features into thematic groups. Using machine learning models, we compared how well these features detected dementia, both overall and within time periods defined relative to dementia case ascertainment in health administrative databases. Over 900 dementia-related features were identified and grouped into eight themes (including symptoms, social, function, and cognition). Using notes from all time periods, the LASSO model had the best performance (F1 score: 77.2%, sensitivity: 71.5%, specificity: 99.8%). Model performance was poor when only notes written before case ascertainment were included (F1 score: 14.4%, sensitivity: 8.3%, specificity: 99.9%) but improved as later notes were added. While similar models may eventually improve recognition of cognitive issues and dementia in primary care EMRs, our findings suggest that further research is needed to identify which additional EMR components might help promote early detection of dementia.
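The abstract reports the F1 score alongside sensitivity and specificity. Because only about 1% of the cohort has dementia (526 of 44,674 patients), very high specificity can coexist with a low F1 score: even a tiny false-positive rate applied to tens of thousands of patients without dementia can produce as many false positives as true positives. The Python sketch below illustrates this arithmetic from confusion-matrix counts. The `ConfusionCounts` type, the helper functions, and all counts are hypothetical illustrations sized to the cohort; they are not the study's confusion matrices, and the sketch makes no claim about how the study's metrics were computed (for example, on the full cohort or on a held-out subset).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical confusion-matrix counts; positive class = dementia."""
    tp: int  # patients with dementia flagged by the model
    fn: int  # patients with dementia missed by the model
    fp: int  # patients without dementia flagged by the model
    tn: int  # patients without dementia correctly not flagged


def _ratio(num: int, den: int) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def sensitivity(c: ConfusionCounts) -> float:
    """True-positive rate (recall): TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical counts (NOT study data), sized like the cohort.
    n_pos, n_neg = 526, 44_148
    scenarios = {
        # Few cases found; 44 false alarms among 44,148 non-cases.
        "A": ConfusionCounts(tp=44, fn=n_pos - 44, fp=44, tn=n_neg - 44),
        # Most cases found; 88 false alarms among 44,148 non-cases.
        "B": ConfusionCounts(tp=376, fn=n_pos - 376, fp=88, tn=n_neg - 88),
    }
    for name, c in scenarios.items():
        print(f"{name}: sensitivity={sensitivity(c):.3f} "
              f"specificity={specificity(c):.4f} "
              f"precision={precision(c):.3f} F1={f1_score(c):.3f}")
```

In hypothetical scenario A, specificity is about 99.9% but F1 is only about 0.14, because the 44 false positives equal the 44 true positives (precision 0.5) and sensitivity is about 8%. Scenario B, which finds most cases, reaches an F1 of about 0.76. These invented numbers only illustrate the qualitative pattern described in the abstract; they do not reproduce the study's analysis.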

Data Availability
The dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at www.ices.on.ca/DAS (email: das@ices.on.ca).
Code Availability
The full dataset creation plan and underlying analytic code are available from the authors upon request, with the understanding that the computer programs may rely on coding templates or macros that are unique to ICES and may therefore be inaccessible or require modification.
Acknowledgements
This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). This document used data adapted from the Statistics Canada Postal CodeOM Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from © Canada Post Corporation and Statistics Canada. Parts of this material are based on data and information compiled and provided by the Canadian Institute for Health Information (CIHI) and the Ontario Ministry of Health. We thank IQVIA Solutions Canada Inc. for the use of their Drug Information File.
Funding
This study was supported by the Ontario Neurodegenerative Disease Research Initiative (ONDRI) through the Ontario Brain Institute, an independent non-profit corporation, funded partially by the Ontario government. MA is funded by the Canadian Institutes of Health Research Vanier Scholarship Program. DAH is funded by an Alzheimer Society of Canada Research Program Doctoral Award.
Ethics declarations
Disclaimer
The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred.
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Maclagan, L.C., Abdalla, M., Harris, D.A. et al. Can Patients with Dementia Be Identified in Primary Care Electronic Medical Records Using Natural Language Processing? J Healthc Inform Res 7, 42–58 (2023). https://doi.org/10.1007/s41666-023-00125-6
DOI: https://doi.org/10.1007/s41666-023-00125-6
Keywords
- Electronic health records
- Dementia
- Primary health care
- Artificial intelligence
- Natural language processing