Two Applications of Statistical Modelling to Natural Language Processing

  • William DuMouchel
  • Carol Friedman
  • George Hripcsak
  • Stephen B. Johnson
  • Paul D. Clayton
Part of the Lecture Notes in Statistics book series (LNS, volume 112)

Abstract

Each week the Columbia-Presbyterian Medical Center collects several megabytes of English text transcribed from radiologists’ dictation and notes of their interpretations of medical diagnostic x-rays. The goal is to automate the extraction of diagnoses from these natural language reports. This paper reports on two aspects of this project that require advanced statistical methods. First, the identification of pairs of words and phrases that tend to appear together (collocate) uses a hierarchical Bayesian model that adjusts to different word and word pair distributions in different bodies of text. Second, we present an analysis of data from experiments that compare the performance of the computer diagnostic program with that of a panel of physician and lay readers of randomly sampled texts. A measure of inter-subject distance with respect to the diagnoses is defined, for which estimated variances and covariances are easily computed. This allows statistical conclusions to be drawn about the similarities and dissimilarities among diagnoses by the various programs and experts.

Keywords

Chronic Obstructive Pulmonary Disease; Natural Language Processing; Word Pair; Clinical Information System; Hierarchical Bayesian Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • William DuMouchel (1)
  • Carol Friedman (2)
  • George Hripcsak (1)
  • Stephen B. Johnson (1)
  • Paul D. Clayton (1)
  1. Department of Medical Informatics, Columbia University, New York, USA
  2. Department of Computer Science, Queens College, CUNY, Flushing, USA
