Generating Positive Psychosis Symptom Keywords from Electronic Health Records

  • Natalia Viani
  • Rashmi Patel
  • Robert Stewart
  • Sumithra Velupillai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11526)


The development of Natural Language Processing (NLP) solutions for information extraction from electronic health records (EHRs) has grown in recent years, as most clinically relevant information in EHRs is documented only in free text. One of the core tasks for any NLP system is to extract clinically relevant concepts such as symptoms. This information can then be used for more complex problems such as determining symptom onset, which requires temporal information. In the mental health domain, comprehensive vocabularies for specific disorders are scarce and rarely contain keywords that reflect real-world terminology use. We explore the use of embedding techniques to automatically generate lexical variants of psychosis symptoms for inclusion in vocabularies that can be used in complex downstream NLP tasks. We study the impact of the underlying text material on generating useful lexical entries, experimenting with different corpora and with unigram and bigram models. We also propose a method to automatically compute thresholds for choosing the most relevant terms. Our main contribution is a systematic study of unsupervised vocabulary generation using different corpora for an understudied clinical use case. The resulting lexicons are publicly available.
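The abstract describes expanding symptom vocabularies with embedding neighbours and keeping only the terms above an automatically computed threshold, but it does not spell out the procedure. The following is a minimal Python sketch under our own assumptions, not the authors' method: hand-written toy vectors stand in for a trained embedding model, bigrams are represented as single underscore-joined tokens, and the threshold is a simple largest-gap heuristic chosen purely for illustration. The names `VECTORS`, `rank_neighbours`, `largest_gap_cutoff` and `expand` are hypothetical.

```python
import math

# Toy embeddings (hypothetical values). In practice these would come from an
# embedding model trained on clinical text; bigrams are single "_"-joined tokens.
VECTORS = {
    "hallucinations": [1.0, 0.1, 0.0],
    "hallucination": [0.95, 0.15, 0.0],
    "halucinations": [0.9, 0.2, 0.05],           # misspelled variant
    "auditory_hallucinations": [0.8, 0.3, 0.0],  # bigram token
    "voices": [0.7, 0.4, 0.1],
    "medication": [0.1, 0.9, 0.3],
    "appointment": [0.0, 0.2, 1.0],
    "discharge": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_neighbours(seed, vectors):
    """All other vocabulary terms paired with their similarity to the seed, most similar first."""
    target = vectors[seed]
    scored = [(term, cosine(target, vec)) for term, vec in vectors.items() if term != seed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def largest_gap_cutoff(ranked):
    """Hypothetical threshold rule: cut the ranked list at its largest drop in similarity.

    Returns how many top-ranked terms to keep.
    """
    sims = [sim for _, sim in ranked]
    if len(sims) < 2:
        return len(sims)
    gaps = [sims[i] - sims[i + 1] for i in range(len(sims) - 1)]
    return gaps.index(max(gaps)) + 1


def expand(seed, vectors):
    """Candidate lexical variants of `seed`: the neighbours above the automatic cut-off."""
    ranked = rank_neighbours(seed, vectors)
    return [term for term, _ in ranked[: largest_gap_cutoff(ranked)]]


if __name__ == "__main__":
    # prints ['hallucination', 'halucinations', 'auditory_hallucinations', 'voices']
    print(expand("hallucinations", VECTORS))
```

With these toy vectors, expanding the seed `hallucinations` keeps the singular form, the misspelling, the bigram and `voices`, and drops the unrelated terms, because the largest drop in similarity falls between `voices` and `medication`. A data-driven cut-off of this kind adapts to each seed's score distribution, which a single fixed similarity value would not.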


Keywords: Natural language processing · Electronic health records · Embedding models · Schizophrenia



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IoPPN, King’s College London, London, UK
  2. South London and Maudsley NHS Foundation Trust, London, UK
