Massive Biomedical Term Discovery

  • Conference paper
Discovery Science (DS 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3735)

Abstract

Most technical and scientific terms consist of complex, multi-word noun phrases, but not all noun phrases are technical or scientific terms. Specific terminology can be distinguished from common, non-specific noun phrases based on the observation that terms exhibit much less distributional variation than non-specific noun phrases. We formalize this limited paradigmatic modifiability of terms and test the corresponding algorithm on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using an existing biomedical terminology, curated by the wider research community, as the evaluation gold standard, we show that our algorithm significantly outperforms standard term identification measures and therefore qualifies as a high-performance building block for any terminology identification system.
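The abstract does not spell out the formula, so the following is only a minimal sketch of how "limited paradigmatic modifiability" could be scored, assuming a table of noun-phrase n-gram counts is already available. For each way of keeping k of the n token positions fixed (1 ≤ k < n) while the others vary, it takes the share of that pattern's occurrences contributed by the exact n-gram, and multiplies these shares. A phrase whose slots are rarely filled by other words keeps shares near 1 and scores high; a freely modifiable phrase scores low. The helper names (`pattern_frequency`, `modifiability_score`, `rank_candidates`, `precision_at_k`), the restriction of each pattern to n-grams of the same length, and the toy counts and gold set are all hypothetical illustrations, not the authors' implementation or data.

```python
from collections import Counter
from itertools import combinations


# Hypothetical helper: total frequency of the pattern obtained by keeping the
# tokens at positions `fixed` and letting every other position vary freely.
# Only n-grams of the same length are counted (a simplifying assumption).
def pattern_frequency(ngram, fixed, counts):
    return sum(
        c
        for other, c in counts.items()
        if len(other) == len(ngram) and all(other[i] == ngram[i] for i in fixed)
    )


# Hypothetical modifiability score: for every choice of 1..n-1 fixed positions,
# multiply in the share of the pattern's occurrences taken by this exact n-gram.
# Shares near 1 mean the open slots are rarely filled by anything else
# (a weakly modifiable, term-like phrase); small shares mean many variants.
def modifiability_score(ngram, counts):
    n = len(ngram)
    f = counts[ngram]
    if f == 0:
        return 0.0
    score = 1.0
    for k in range(1, n):
        for fixed in combinations(range(n), k):
            score *= f / pattern_frequency(ngram, fixed, counts)
    return score


# Rank candidates of one length (2, 3 or 4 tokens) from most to least term-like.
def rank_candidates(counts, n):
    candidates = [g for g in counts if len(g) == n]
    return sorted(candidates, key=lambda g: modifiability_score(g, counts), reverse=True)


# Share of the top-k ranked candidates that also appear in a gold-standard term set.
def precision_at_k(ranked, gold, k):
    top = ranked[:k]
    return sum(1 for g in top if g in gold) / len(top) if top else 0.0


# Invented toy counts of trigram noun phrases, for illustration only.
counts = Counter({
    ("t", "cell", "receptor"): 10,
    ("new", "cell", "line"): 2,
    ("new", "drug", "target"): 2,
    ("novel", "drug", "target"): 3,
    ("new", "cell", "receptor"): 1,
})

ranked = rank_candidates(counts, 3)
print(ranked[0])  # ('t', 'cell', 'receptor')
print(round(modifiability_score(ranked[0], counts), 3))  # 0.636
print(precision_at_k(ranked, {("t", "cell", "receptor")}, 2))  # 0.5
```

Scoring each length separately mirrors the abstract's separate treatment of bigrams, trigrams and quadgrams; `precision_at_k` shows one common way to compare such a ranking against a gold-standard terminology, so the same helper could be applied to rankings produced by any baseline measure.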

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wermter, J., Hahn, U. (2005). Massive Biomedical Term Discovery. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_24

  • DOI: https://doi.org/10.1007/11563983_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29230-2

  • Online ISBN: 978-3-540-31698-5

  • eBook Packages: Computer Science, Computer Science (R0)