Soft Computing, Volume 23, Issue 21, pp 10969–10975

Word embeddings for negation detection in health records written in Spanish

  • Sara Santiso
  • Arantza Casillas
  • Alicia Pérez
  • Maite Oronoz
Methodologies and Application

Abstract

This work presents a system to detect negated medical entities in electronic health records (EHRs) written in Spanish. The task matters because negation inverts the truth value of a clause and therefore affects the automatic understanding of the information. We explore a novel continuous characterization as an alternative to previous negation extraction approaches based on discrete characterizations. The aim is to obtain a characterization that generalizes better than discrete features do. We also include other features that could be useful for negation detection. Negation detection is approached as a named entity recognition task in which only the negated entities are to be found. EHRs are represented by the corresponding embeddings, and this approach is compared with a traditional discrete characterization based on words. Both representations are fed to a supervised classifier, conditional random fields, to infer the predictive model. The approach is assessed on two corpora of health records from different hospitals, IxaMed-GS and IULA. The best performance is achieved with the embedding-based characterization, with an f-measure of 75.3 on the IxaMed-GS corpus and 81.6 on the IULA corpus. These results show that embedding-based representations can also be useful for the detection of negated medical entities.
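To make the task framing concrete, the following is a minimal sketch of how negated entities could be encoded as sequence labels and scored with an f-measure. It is an illustration under stated assumptions, not the authors' implementation: the `B-NEG`/`I-NEG`/`O` tag names, the exact-match scoring criterion, the helper names `spans_from_bio` and `f_measure`, and the example sentence are all hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def spans_from_bio(tags: List[str]) -> Set[Span]:
    """Collect spans of negated entities from B-NEG / I-NEG / O tags.

    The tag names are an assumption made for this sketch.
    """
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != "I-NEG":
            spans.add((start, i))
            start = None
        if tag == "B-NEG":
            start = i
    return spans


def f_measure(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span matching (assumed criterion)."""
    gold_spans = {(n, s) for n, tags in enumerate(gold) for s in spans_from_bio(tags)}
    pred_spans = {(n, s) for n, tags in enumerate(pred) for s in spans_from_bio(tags)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: "Patient without fever or chest pain."
tokens = ["Paciente", "sin", "fiebre", "ni", "dolor", "torácico", "."]
gold = [["O", "O", "B-NEG", "O", "B-NEG", "I-NEG", "O"]]
pred = [["O", "O", "B-NEG", "O", "B-NEG", "O", "O"]]  # second entity cut short

precision, recall, f1 = f_measure(gold, pred)
print(precision, recall, f1)
```

In this toy example the prediction recovers "fiebre" but truncates "dolor torácico", so one of two predicted spans matches one of two gold spans and precision, recall and f-measure are all 0.5. A partial-match criterion would score the truncated span differently; the abstract does not say which criterion was used.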

Keywords

Negation detection, Word embeddings, Machine learning, Text mining

Notes

Acknowledgements

This work was partially funded by the Spanish Ministry of Science and Innovation (PROSAMED: TIN2016-77820-C3-1-R and TADEEP: TIN2015-70214-P) and the Basque Government (DETEAMI: Ministry of Health 2014111003, Predoctoral Grant: PRE 2016 1 0128).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. IXA Group, University of the Basque Country (UPV-EHU), Donostia/San Sebastián, Spain
