Abstract
This work presents a system for detecting negated medical entities in electronic health records (EHRs) written in Spanish. The task matters because negation inverts the truth value of a clause and therefore strongly affects the automatic understanding of the information. We explore a novel continuous characterization as an alternative to previous negation extraction approaches based on discrete characterizations, with the aim of generalizing better than discrete features allow. We also include other features that could be useful for negation detection. Negation detection is approached as a named entity recognition task in which only the negated entities are to be found. EHRs are represented by their corresponding embeddings, and this approach is compared with a traditional discrete characterization based on words. Both representations are used by a supervised classifier (conditional random fields) to infer the predictive model. The approach is assessed on health records from different hospitals, namely the IxaMed-GS and IULA corpora. The best performance is achieved with the embedding-based characterization, which yields an F-measure of 75.3 and 81.6 on the IxaMed-GS and IULA corpora, respectively. These results show that embedding-based representations can also be useful for the detection of negated medical entities.
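As a rough illustration of the framing described in the abstract, the sketch below casts negated-entity detection as sequence labeling and builds the two kinds of token features that are compared: discrete word-based features and continuous embedding-based features. It is a minimal sketch under stated assumptions, not the authors' implementation: the BIO tag names (`B-NEG`, `I-NEG`), the feature names, the example sentence, and the three-dimensional `TOY_EMBEDDINGS` values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Toy 3-dimensional embeddings with made-up values (assumption, illustration only).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "no": [0.9, 0.1, 0.0],
    "presenta": [0.2, 0.7, 0.1],
    "fiebre": [0.1, 0.2, 0.8],
    "ni": [0.8, 0.2, 0.1],
    "tos": [0.1, 0.3, 0.7],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary words


def bio_labels(n_tokens: int, negated_spans: List[Tuple[int, int]]) -> List[str]:
    """Tag tokens inside negated entity spans as B-NEG / I-NEG, all others as O.

    Spans are (start, end) token offsets with an exclusive end. Only negated
    entities receive a label, mirroring the NER-style framing of the task.
    """
    labels = ["O"] * n_tokens
    for start, end in negated_spans:
        labels[start] = "B-NEG"
        for i in range(start + 1, end):
            labels[i] = "I-NEG"
    return labels


def discrete_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Word-based (discrete) features: current and previous lowercased word."""
    feats = {"word=" + tokens[i].lower(): 1.0}
    if i > 0:
        feats["prev=" + tokens[i - 1].lower()] = 1.0
    return feats


def embedding_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Continuous features: one real-valued feature per embedding dimension."""
    vec = TOY_EMBEDDINGS.get(tokens[i].lower(), UNKNOWN)
    return {f"emb_{d}": v for d, v in enumerate(vec)}


# "No presenta fiebre ni tos": both "fiebre" and "tos" are negated entities.
tokens = ["No", "presenta", "fiebre", "ni", "tos"]
labels = bio_labels(len(tokens), [(2, 3), (4, 5)])
X_discrete = [discrete_features(tokens, i) for i in range(len(tokens))]
X_embedding = [embedding_features(tokens, i) for i in range(len(tokens))]
```

Per-token feature dictionaries of this shape, paired with the label sequence, are the kind of input a conditional random field trainer typically consumes. The continuous variant lets words with similar vectors share evidence, which is the generalization argument made in the abstract.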
Acknowledgements
This work was partially funded by the Spanish Ministry of Science and Innovation (PROSAMED: TIN2016-77820-C3-1-R and TADEEP: TIN2015-70214-P) and the Basque Government (DETEAMI: Ministry of Health 2014111003, Predoctoral Grant: PRE 2016 1 0128).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Santiso, S., Casillas, A., Pérez, A. et al. Word embeddings for negation detection in health records written in Spanish. Soft Comput 23, 10969–10975 (2019). https://doi.org/10.1007/s00500-018-3650-7
DOI: https://doi.org/10.1007/s00500-018-3650-7