Automatic Extraction for Creating a Lexical Repository of Abbreviations in the Biomedical Literature

  • Min Song
  • Il-Yeol Song
  • Ki Jung Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4081)


The sheer volume of biomedical text is growing at an exponential rate. This growth creates challenges for both human readers and automatic text processing algorithms. One such challenge arises from the common and uncontrolled use of abbreviations in the biomedical literature, which in turn requires that biomedical lexical ontologies be continuously updated. In this paper, we propose a hybrid approach that combines lexical analysis techniques with the Support Vector Machine (SVM) to create an automatically generated and maintained lexicon of abbreviations. The proposed technique differs from others in the following respects: 1) It incorporates lexical analysis techniques into supervised learning for extracting abbreviations. 2) It makes use of text chunking techniques to identify the long forms of abbreviations. 3) It significantly improves Recall compared to other techniques. The experimental results show that our approach outperforms the leading abbreviation algorithms, ExtractAbbrev and ALICE, by at least 6% and 13.9%, respectively, in both Precision and Recall on the Gold Standard Development corpus.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Min Song (1)
  • Il-Yeol Song (2)
  • Ki Jung Lee (2)

  1. Information Systems Department, New Jersey Institute of Technology, University Heights, Newark
  2. College of Information Science & Technology, Drexel University, Philadelphia
