Abstract
We developed algorithms to identify pregnant women with suicidal behavior using information extracted from clinical notes in electronic medical records by natural language processing (NLP). Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify features relevant to suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior. We then validated the algorithms in an independent validation dataset. Among 275,843 women with codes related to pregnancy or delivery, 9,331 women screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9,145). Using expert-curated features, our algorithm achieved an area under the curve of 0.83. By setting a positive predictive value comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, a specificity of 0.96, and a negative predictive value of 0.83. The algorithm identified 1,423 pregnant women with suicidal behavior among the 9,331 women who screened positive. Mining unstructured clinical notes using NLP resulted in an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with reliance on diagnostic codes alone.
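The thresholding step described in the abstract (fixing a target positive predictive value, then reporting the sensitivity, specificity, and negative predictive value obtained at that cutoff) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `metrics_at` and `threshold_for_ppv`, the toy scores, and the toy labels are all hypothetical, and the sketch assumes a classifier that already outputs one risk score per patient alongside a gold-standard label.

```python
from typing import NamedTuple, Optional, Sequence


class Metrics(NamedTuple):
    threshold: float
    ppv: float
    sensitivity: float
    specificity: float
    npv: float


def metrics_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Metrics:
    """Confusion-matrix metrics when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return Metrics(
        threshold=threshold,
        ppv=ratio(tp, tp + fp),
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        npv=ratio(tn, tn + fn),
    )


def threshold_for_ppv(
    scores: Sequence[float], labels: Sequence[int], target_ppv: float
) -> Optional[Metrics]:
    """Return metrics at the lowest observed score cutoff whose PPV meets the target.

    Sensitivity can only fall as the cutoff rises, so the lowest qualifying
    cutoff is also the one with the highest sensitivity.
    """
    for t in sorted(set(scores)):
        m = metrics_at(scores, labels, t)
        if m.ppv >= target_ppv:
            return m
    return None


# Hypothetical toy data: ten patients, four gold-standard positives.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

result = threshold_for_ppv(toy_scores, toy_labels, target_ppv=0.71)
print(result)
```

On the toy data the chosen cutoff is 0.70, where three of the four flagged patients are true positives. Raising the PPV target trades sensitivity for precision, which is the same trade-off behind the modest sensitivity reported at a PPV of 0.71.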
Acknowledgements
This research was supported by awards from the National Institutes of Health (the National Institute on Minority Health and Health Disparities: T37-MD001449; and the National Center for Research Resources (NCRR), the National Center for Advancing Translational Sciences (NCATS): 8UL1TR 000170-09). The NIH had no further role in study design; in the collection, analysis, and interpretation of data; in the writing of the manuscript; or in the decision to submit the paper for publication. The authors thank the Enterprise Research Infrastructure & Services at Partners HealthCare for the provision of computing resources. The authors also thank Laurie Bogosian and Stacey Duey of the Research Patient Data Repository at Partners HealthCare for their in-depth support. This research was done in partial fulfillment of the requirements of a Doctor of Science degree by one of the authors (QYZ) in the Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. The authors thank Dr. Michael G. Napolitano for valuable discussions.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhong, QY., Mittal, L.P., Nathan, M.D. et al. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol 34, 153–162 (2019). https://doi.org/10.1007/s10654-018-0470-0
DOI: https://doi.org/10.1007/s10654-018-0470-0