Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem

  • PSYCHIATRIC EPIDEMIOLOGY
European Journal of Epidemiology

Abstract

We developed algorithms to identify pregnant women with suicidal behavior using information extracted from clinical notes by natural language processing (NLP) in electronic medical records. Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify relevant features for suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior. We then validated the algorithms in an independent validation dataset. From 275,843 women with codes related to pregnancy or delivery, 9,331 women screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9,145). Using expert-curated features, our algorithm achieved an area under the curve of 0.83. When we fixed the positive predictive value at a level comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, a specificity of 0.96, and a negative predictive value of 0.83. The algorithm identified 1,423 pregnant women with suicidal behavior among the 9,331 women who screened positive. Mining unstructured clinical notes using NLP resulted in an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with reliance on diagnostic codes alone.
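
The evaluation step described in the abstract (fixing a positive predictive value, then reading off sensitivity, specificity, and negative predictive value) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy scores, and the toy labels are hypothetical, and the scores stand in for the output of any fitted classifier such as an adaptive elastic net. The sketch assumes that a chart is called positive when its score is at or above a cutoff, and that the preferred cutoff is the lowest one whose PPV reaches the target, since that choice keeps sensitivity as high as possible.

```python
from typing import NamedTuple, Optional, Sequence


class Metrics(NamedTuple):
    threshold: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is empty.
    return num / den if den else float("nan")


def metrics_at(scores: Sequence[float], labels: Sequence[int],
               threshold: float) -> Metrics:
    """Confusion-matrix metrics when score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    return Metrics(
        threshold,
        _ratio(tp, tp + fn),  # sensitivity
        _ratio(tn, tn + fp),  # specificity
        _ratio(tp, tp + fp),  # positive predictive value
        _ratio(tn, tn + fn),  # negative predictive value
    )


def threshold_for_ppv(scores: Sequence[float], labels: Sequence[int],
                      target_ppv: float) -> Optional[Metrics]:
    """Metrics at the lowest observed cutoff whose PPV meets the target."""
    for t in sorted(set(scores)):
        m = metrics_at(scores, labels, t)
        if m.ppv >= target_ppv:  # NaN compares False, so empty cells are skipped
            return m
    return None


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the share of correctly ordered
    positive/negative pairs, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: classifier scores and gold-standard chart labels.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    print(threshold_for_ppv(toy_scores, toy_labels, target_ppv=0.71))
    print(auc(toy_scores, toy_labels))
```

On real data the cutoff would be chosen on the training set and the four metrics reported on the independent validation set, so that the reported values are not optimistically biased by the cutoff search.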

Acknowledgements

This research was supported by awards from the National Institutes of Health (the National Institute on Minority Health and Health Disparities: T37-MD001449; and the National Center for Research Resources (NCRR), the National Center for Advancing Translational Sciences (NCATS): 8UL1TR 000170-09). The NIH had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the manuscript; and in the decision to submit the paper for publication. The authors thank the Enterprise Research Infrastructure & Services at Partners HealthCare for the provision of computing resources. The authors also thank Laurie Bogosian and Stacey Duey of the Research Patient Data Repository at Partners HealthCare for the in-depth support. This research was done as partial fulfillment of the requirements of a Doctor of Science degree by one of the authors (QYZ) in the Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. The authors thank Dr. Michael G. Napolitano for valuable discussions.

Author information

Corresponding author

Correspondence to Qiu-Yue Zhong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 29 kb)

About this article

Cite this article

Zhong, QY., Mittal, L.P., Nathan, M.D. et al. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol 34, 153–162 (2019). https://doi.org/10.1007/s10654-018-0470-0

  • DOI: https://doi.org/10.1007/s10654-018-0470-0
