Natural Language Processing of Medical Reports



A significant amount of information regarding the observations, assessments, and recommendations related to a patient's case is documented within free-text medical reports. The ability to structure and standardize clinical patient data has been a grand goal of medical informatics since the inception of the field, especially if this structuring can be achieved automatically at the patient bedside and within the modus operandi of current medical practice. A computational infrastructure that transforms clinical data collection from an uncontrolled into a highly controlled operation (i.e., a precise, completely specified, standard representation) can facilitate medical knowledge acquisition and its application to improve healthcare. Medical natural language processing (NLP) systems attempt to interpret free text to facilitate a clinical, research, or teaching task. An NLP system translates a source language (e.g., free text) into a surrogate, computer-understandable target representation (e.g., first-order logic), which in turn can support the operations of a driving application. NLP is thus a transformation from a representational form that is of little use to a computer (a sequence of characters) into one that is useful (a logic-based representation of the text's meaning). In general, the accuracy and speed of translation are heavily dependent on the end application. This chapter presents work related to natural language processing of clinical reports, covering issues of representation, computation, and evaluation. We first summarize a number of typical clinical applications. We then present a high-level formalization of the medical NLP problem to show how the various aspects of NLP fit together and complement one another. Examples of approaches that target various forms of representation and degrees of potential accuracy are discussed. Individual NLP subtasks are subsequently discussed.
We conclude this chapter with evaluation methods and a discussion of the directions expected in the processing of clinical medical reports. Throughout, we describe applications illustrating the many open issues revolving around medical natural language processing.
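
To make the free-text-to-logic transformation described above concrete, the following is a minimal toy sketch, not the authors' method. Everything in it is a hypothetical assumption for illustration: the three-entry lexicons, the concept names, the `locatedIn` predicate, the ASCII logic notation, and the rule that a leading negation cue ("no", "without", "no evidence of") negates the whole sentence. Practical systems rely on far richer lexicons, parsing, and context handling.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicons: surface phrase -> concept name.
FINDINGS = {
    "pneumothorax": "Pneumothorax",
    "pleural effusion": "PleuralEffusion",
    "nodule": "Nodule",
}
LOCATIONS = {
    "left lung": "LeftLung",
    "right lung": "RightLung",
}
# Simplified negation cues; a cue at the start of the sentence is assumed
# to negate every finding in that sentence.
NEGATION_CUES = ("no evidence of", "no", "without")


@dataclass(frozen=True)
class Assertion:
    finding: str
    location: Optional[str]
    negated: bool

    def to_logic(self) -> str:
        """Render the assertion as a first-order-logic-style string (ASCII)."""
        body = f"{self.finding}(x)"
        if self.location is not None:
            body += f" & locatedIn(x, {self.location})"
        formula = f"exists x. ({body})"
        return f"~{formula}" if self.negated else formula


def _find(lexicon: dict, text: str) -> List[str]:
    """Return concepts whose surface phrase occurs as whole words in text."""
    return [concept for phrase, concept in lexicon.items()
            if re.search(rf"\b{re.escape(phrase)}\b", text)]


def translate(sentence: str) -> List[Assertion]:
    """Map one free-text sentence to zero or more structured assertions."""
    text = sentence.lower().strip().rstrip(".")
    # Word boundary after the cue keeps "no" from matching "nodule".
    negated = any(re.match(rf"{re.escape(cue)}\b", text) for cue in NEGATION_CUES)
    locations = _find(LOCATIONS, text)
    location = locations[0] if locations else None
    return [Assertion(f, location, negated) for f in _find(FINDINGS, text)]


for s in ["No evidence of pneumothorax in the left lung.",
          "A 5 mm nodule in the right lung."]:
    for a in translate(s):
        print(a.to_logic())
```

Run as written, the loop prints `~exists x. (Pneumothorax(x) & locatedIn(x, LeftLung))` and `exists x. (Nodule(x) & locatedIn(x, RightLung))`. Separating the structured `Assertion` from its logic rendering mirrors the idea that the target representation, not the character string, is what a driving application would consume.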


Keywords (added by machine, not by the authors): Noun Phrase; Natural Language Processing; Word Sense Disambiguation; Word Feature; Clinical Text



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Medical Imaging Informatics Group, Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, USA
