Information Retrieval, Volume 11, Issue 6, pp 499–538

Negation recognition in medical narrative reports

  • Lior Rokach
  • Roni Romano
  • Oded Maimon


Substantial medical data, such as discharge summaries and operative reports, are stored in electronic textual form. Databases containing free-text clinical narrative reports often need to be searched to find relevant information for clinical and research purposes. The context of negation, that is, a negative finding, is of special importance, since many of the most frequently described findings are negated. When searching free-text narratives for patients with a certain medical condition, many of the retrieved documents will be irrelevant if negation is not taken into account. Hence, negation is a major source of poor precision in medical information retrieval systems. Previous research has shown that negated findings may be difficult to identify if the words implying negation (negation signals) are more than a few words away from them. We present a new pattern learning method for automatic identification of negative context in clinical narrative reports. We compare the new algorithm to previous methods proposed for the same task and show its advantages: it improves accuracy over other machine learning methods, and it is much faster than manual knowledge engineering techniques while matching their accuracy. The new algorithm can also be applied to further context identification and information extraction tasks.
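To make the distance problem concrete, the following is a minimal sketch of a fixed-window negation check. It is an illustrative baseline only, not the pattern learning method presented in the paper. The signal list, the function name `is_negated`, and the `window` parameter are all assumptions introduced for this example, and the check handles single-word findings only.

```python
import re

# Hypothetical, deliberately tiny list of negation signals (an assumption,
# not a list taken from the paper).
NEGATION_SIGNALS = {"no", "not", "denies", "without"}


def is_negated(sentence, finding, window=3):
    """Return True if a negation signal occurs within `window` tokens
    before the first occurrence of the single-word `finding`."""
    # Lowercase and keep alphabetic tokens only; punctuation is dropped.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = finding.lower()
    if target not in tokens:
        return False
    idx = tokens.index(target)
    # Look only at the `window` tokens immediately preceding the finding.
    preceding = tokens[max(0, idx - window):idx]
    return any(tok in NEGATION_SIGNALS for tok in preceding)


sentence = "The patient denies chest pain or nausea."
near = is_negated(sentence, "nausea", window=3)  # signal is 4 tokens away
far = is_negated(sentence, "nausea", window=5)   # wider window reaches it
```

With a window of 3 the signal "denies" lies outside the inspected tokens and the negated finding is missed, while a window of 5 catches it. Widening the window indefinitely would in turn start to negate findings that are actually affirmed, which is one reason a fixed window is a weak baseline.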


Text classification, Part-of-speech tagging, Negation, Narrative medical reports, Artificial intelligence



The authors would like to thank J. Kannry, M.D. (Center for Medical Informatics, Department of Medicine, Mount Sinai-NYU, New York, USA), T. Karson, M.D. (Departments of Clinical Informatics and Cardiology, Mount Sinai School of Medicine, New York, USA) and M. Averbuch, M.D. (Tel-Aviv Sourasky Medical Center, Israel) for providing the data used in the experimental study and for helping with the initial studies that eventually led to this work.



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Information Systems Engineering, Ben Gurion University, Beer Sheva, Israel
  2. Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv, Israel
