Text Mining for Medical Documents Using a Hidden Markov Model

  • Hyeju Jang
  • Sa Kwang Song
  • Sung Hyon Myaeng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)


We propose a semantic tagger that assigns high-level concept information to phrases in clinical documents. It identifies this information in the statements that doctors write in patient records. The tagging, based on a Hidden Markov Model (HMM), is performed on documents that have already been annotated with Unified Medical Language System (UMLS), part-of-speech (POS), and abbreviation tags. The result can be used to extract clinical knowledge that supports decision making or quality assurance of medical treatment.
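As an illustration of how HMM-based tagging of this kind can work, the following is a minimal sketch of Viterbi decoding over a toy model. It is not the authors' implementation: the abstract does not specify the state set, the observation symbols, or any probabilities. The concept labels (`SYMPTOM`, `TREATMENT`), the observation symbols (`finding`, `drug`, `procedure`, standing in for UMLS-style tags), and all numbers below are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    if not observations:
        return []

    floor = 1e-12  # crude floor for unseen events (an assumption, not from the paper)

    def lp(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s at step t, previous state)
    first = observations[0]
    best = [{s: (lp(start_p[s]) + lp(emit_p[s].get(first, 0.0)), None) for s in states}]

    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + lp(emit_p[s].get(obs, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model: hidden states are high-level concepts,
# observations are lower-level tags already attached to each phrase.
states = ["SYMPTOM", "TREATMENT"]
start_p = {"SYMPTOM": 0.6, "TREATMENT": 0.4}
trans_p = {
    "SYMPTOM": {"SYMPTOM": 0.7, "TREATMENT": 0.3},
    "TREATMENT": {"SYMPTOM": 0.2, "TREATMENT": 0.8},
}
emit_p = {
    "SYMPTOM": {"finding": 0.8, "drug": 0.1, "procedure": 0.1},
    "TREATMENT": {"finding": 0.1, "drug": 0.5, "procedure": 0.4},
}

tags = viterbi(["finding", "finding", "drug", "procedure"],
               states, start_p, trans_p, emit_p)
print(tags)  # ['SYMPTOM', 'SYMPTOM', 'TREATMENT', 'TREATMENT']
```

In a real tagger the start, transition, and emission probabilities would be estimated from a training corpus rather than written by hand, and the observation for each phrase could combine several of the existing tags.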


Unified Medical Language System; Training Corpus; High Level Concept; Clinical Document; Medical Document
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hyeju Jang (1)
  • Sa Kwang Song (2)
  • Sung Hyon Myaeng (1)
  1. Department of Computer Science, Information and Communications University, Daejeon, Korea
  2. Electronics and Telecommunications Research Institute, Daejeon, Korea
