A Lightweight Approach for Extracting Disease-Symptom Relation with MetaMap toward Automated Generation of Disease Knowledge Base

  • Takashi Okumura
  • Yuka Tateisi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7231)


Diagnostic decision support systems require a disease knowledge base, and building it may account for a dominant portion of the total development cost of such systems. Accordingly, toward automated generation of a disease knowledge base, we conducted a preliminary study on the efficient extraction of symptomatic expressions using MetaMap, a tool that assigns UMLS (Unified Medical Language System) semantic tags to phrases in medical literature text.

We first used several tags in the MetaMap output that relate to symptoms and findings to extract symptomatic terms. This straightforward approach yielded a recall of 82% and a precision of 64%. We then applied a heuristic that exploits certain patterns of tag sequences that frequently appear in typical symptomatic expressions. This simple approach achieved a 7% gain in recall without sacrificing precision.
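The two-step idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (a list of phrase and tag pairs), the choice of the "sosy" (Sign or Symptom) and "fndg" (Finding) tags, the single tag-sequence pattern, and the toy data are all assumptions made for illustration. The abstract does not state which tags or patterns were actually used.

```python
from typing import List, Set, Tuple

# (phrase, UMLS semantic type abbreviation); a hypothetical simplification
# of MetaMap output, not its real output format.
Tagged = Tuple[str, str]

# Assumed tag set: "sosy" = Sign or Symptom, "fndg" = Finding.
SYMPTOM_TAGS = {"sosy", "fndg"}

def extract_by_tag(phrases: List[Tagged]) -> Set[str]:
    """Step 1: keep every phrase whose tag is symptom-related."""
    return {text for text, tag in phrases if tag in SYMPTOM_TAGS}

def extract_by_pattern(phrases: List[Tagged]) -> Set[str]:
    """Step 2: hypothetical tag-sequence pattern, a Qualitative Concept
    ("qlco") immediately followed by a body part ("bpoc")."""
    found = set()
    for (t1, g1), (t2, g2) in zip(phrases, phrases[1:]):
        if g1 == "qlco" and g2 == "bpoc":
            found.add(f"{t1} {t2}")
    return found

def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision = TP / extracted, recall = TP / gold."""
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Toy example (invented data, not from the study).
phrases = [("fever", "sosy"), ("enlarged", "qlco"), ("liver", "bpoc"),
           ("patient", "podg"), ("history", "fndg")]
gold = {"fever", "enlarged liver"}

tag_only = extract_by_tag(phrases)
combined = tag_only | extract_by_pattern(phrases)
print(precision_recall(tag_only, gold))
print(precision_recall(combined, gold))
```

In this toy case the pattern recovers a multi-word expression ("enlarged liver") that tag filtering alone misses, which is the kind of recall gain the heuristic aims for. The figures it prints say nothing about the reported 82%, 64%, or 7%.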

Although the extracted information requires manual inspection, the study suggests that this simple approach can extract symptomatic expressions at very low cost. We also performed a failure analysis of the output to guide further performance improvements.


Symptomatic Expression; Free Text Format; Lightweight Approach; Clinical Synopsis; Diagnostic Decision Support System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Takashi Okumura (1)
  • Yuka Tateisi (1)
  1. National Institute of Public Health, Saitama, Japan
