Acquisition of Elementary Synonym Relations from Biological Structured Terminology

  • Thierry Hamon
  • Natalia Grabar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)

Abstract

The acquisition and enrichment of lexical resources have long been acknowledged as an important research topic in computational linguistics. Nevertheless, such resources are still lacking, particularly in specialised domains. Specialised domains such as biomedicine do, however, offer several structured terminologies. In this paper, we propose a high-precision method for exploiting a structured terminology to infer a specialised lexicon of elementary synonyms. The method is based on the analysis of the syntactic structure of complex terms. We evaluate the approach in the biomedical domain using the Gene Ontology terminological resource. The method achieves over 93% precision. Comparison with an existing synonym resource (the general-language resource WordNet) shows very little overlap between the induced synonym lexicon and the WordNet synsets.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thierry Hamon (1)
  • Natalia Grabar (2)
  1. LIPN – UMR 7030, Université Paris 13 – CNRS, Villetaneuse, France
  2. Université Paris Descartes, UMR_S 872, Paris, F-75006 France; INSERM, U872, Paris, F-75006 France
