Towards Knowledge Extraction from Electronic Health Records - Automatic Negation Identification
With the adoption of Electronic Health Records (EHRs), the health care industry has taken a step forward, as storing and retrieving information has become easier and less time consuming. EHRs are free-text documents: although some structure may exist, the information is usually spread across the entire document with no standard structure. Our current work is a step towards providing an accessible format for EHRs. The ultimate goal of structuring EHRs is knowledge extraction to assist medical diagnosis, and for that the correct identification of terms in documents is essential. To this end, the paper deals with negation identification. We treat negation as two disjoint cases: negation expressed with explicit terms and negation expressed with prefixes. Most medical terms negated by a prefix are not found in common dictionaries that could help in identifying their truth values. We implemented a rule-based approach for identifying explicit negation and a vocabulary-based approach for negation expressed with prefixes. We compare our approach with the standard NegEx on a dataset of 2132 sentences and report better performance in negation identification (a 27% improvement in recall, at the cost of less than 3% degradation in precision). To the best of our knowledge, other works dealing with negation do not include an analysis of negation expressed with prefixes, although in medical documents it is an important means of communicating information. In contrast to the baseline algorithm, NegEx, and other similar systems, our approach automatically identifies negated concepts and includes the identification of negation expressed using prefixes.
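The two-part approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: the trigger list, scope terminators, prefix list, vocabulary, and the five-token scope window are all hypothetical values chosen for illustration, and the function names are invented here. The sketch applies a simple rule (a trigger word negates vocabulary terms in a short window after it) for explicit negation, and a vocabulary lookup (a word is prefix-negated only if stripping a negation prefix leaves a known term) for prefix negation.

```python
import re

# All lists below are hypothetical examples, not taken from the paper.
EXPLICIT_TRIGGERS = {"no", "not", "without", "denies"}
SCOPE_TERMINATORS = {"but", "however", "except"}
NEGATION_PREFIXES = ("non", "un", "in", "an", "a")
VOCABULARY = {"febrile", "symptomatic", "tender", "fever", "cough"}
SCOPE_WINDOW = 5  # assumed number of tokens a trigger can negate


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def prefix_negated_stem(token):
    """Return the stem if token is a negation prefix plus a vocabulary term."""
    for prefix in NEGATION_PREFIXES:
        stem = token[len(prefix):]
        if token.startswith(prefix) and stem in VOCABULARY:
            return stem
    return None


def find_negated_concepts(sentence):
    """Collect concepts negated explicitly or by a prefix, in order."""
    negated = []
    remaining_scope = 0
    for token in tokenize(sentence):
        if token in EXPLICIT_TRIGGERS:
            remaining_scope = SCOPE_WINDOW  # open a negation scope
            continue
        if token in SCOPE_TERMINATORS:
            remaining_scope = 0  # a terminator closes the scope
            continue
        stem = prefix_negated_stem(token)
        if stem is not None:
            negated.append(stem)
        elif remaining_scope > 0 and token in VOCABULARY:
            negated.append(token)
        if remaining_scope > 0:
            remaining_scope -= 1
    return negated


if __name__ == "__main__":
    print(find_negated_concepts(
        "Patient is afebrile and denies cough but has fever."))
```

The vocabulary check is what keeps a word such as "anemia" from being read as a negation: it starts with "an" and "a", but neither remainder is a known term. Words that are both prefix-negated and inside an explicit negation scope are not given any special treatment in this sketch.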
Keywords: Text Mining, information extraction, prefix negation, Electronic Health Records, vocabulary