Massive Biomedical Term Discovery

Wermter, Joachim; Hahn, Udo

doi:10.1007/11563983_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3735)

Included in the conference series: International Conference on Discovery Science

Abstract

Most technical and scientific terms are complex, multi-word noun phrases, but certainly not all noun phrases are technical or scientific terms. Specific terminology can be distinguished from common, non-specific noun phrases on the basis of the observation that terms show a much lower degree of distributional variation than non-specific noun phrases. We formalize this limited paradigmatic modifiability of terms and then test the corresponding algorithm on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using an existing, community-curated biomedical terminology as the evaluation gold standard, we show that our algorithm significantly outperforms standard term identification measures and therefore qualifies as a high-performance building block for any terminology identification system.
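
The notion of limited modifiability can be illustrated with a small sketch. The Python below is a simplified illustration only and is not the authors' formalization, which the abstract does not spell out. It assumes that "modifiability" can be approximated slot by slot: for each position in an n-gram, the n-gram's frequency is compared with the total frequency of all n-grams that differ from it only in that position. The function name `slot_modifiability_scores`, the `WILDCARD` placeholder and the toy counts are hypothetical.

```python
from collections import Counter
from math import prod

WILDCARD = None  # assumed placeholder for a slot that may hold any token


def slot_modifiability_scores(ngram_counts):
    """Hypothetical helper: score n-grams by how little each slot varies.

    ngram_counts maps token tuples to positive corpus frequencies. For
    every slot, the n-gram's frequency is divided by the total frequency
    of all n-grams that agree with it everywhere except that slot. The
    score is the product of these ratios; higher means less variation.
    """
    # Sum frequencies per pattern, where a pattern is an n-gram with
    # exactly one slot replaced by the wildcard.
    pattern_totals = Counter()
    for ngram, freq in ngram_counts.items():
        for i in range(len(ngram)):
            pattern_totals[ngram[:i] + (WILDCARD,) + ngram[i + 1:]] += freq

    scores = {}
    for ngram, freq in ngram_counts.items():
        ratios = (
            freq / pattern_totals[ngram[:i] + (WILDCARD,) + ngram[i + 1:]]
            for i in range(len(ngram))
        )
        scores[ngram] = prod(ratios)
    return scores


# Toy bigram counts, invented for illustration only.
toy_counts = {
    ("t", "cell"): 9,
    ("t", "shirt"): 1,
    ("large", "cell"): 3,
    ("large", "house"): 6,
    ("small", "house"): 6,
    ("old", "house"): 6,
}
scores = slot_modifiability_scores(toy_counts)
for ngram, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(ngram), round(score, 3))
```

On this toy data the bigram "t cell" ranks highest, because neither of its slots is often filled by another token, while "large cell" ranks lowest, because both of its slots vary freely. A real system would apply such a score only to noun phrase candidates extracted from a corpus and would compare the ranking against a gold-standard terminology, as the abstract describes.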

Author information

Authors and Affiliations

Language and Information Engineering (Julie) Lab, Jena University, Fürstengraben 30, Jena, 07743, Germany
Joachim Wermter & Udo Hahn

Editor information

Editors and Affiliations

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
Achim Hoffmann
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
Hiroshi Motoda
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer

Cite this paper

Wermter, J., Hahn, U. (2005). Massive Biomedical Term Discovery. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_24

DOI: https://doi.org/10.1007/11563983_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer Science, Computer Science (R0)
