Improving Term Extraction with Terminological Resources

  • Sophie Aubin
  • Thierry Hamon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)

Abstract

Studies of different term extractors on a biomedical corpus revealed that their performance decreases when they are applied to highly technical texts. Faced with the difficulty, or impossibility, of customizing existing tools, we developed a tunable term extractor. It exploits linguistically based rules in combination with the reuse of existing terminologies, i.e. exogenous disambiguation. The experiments reported here show that combining the two strategies allows the extraction of a greater number of term candidates with a higher level of reliability. We further describe the extraction process, involving both endogenous and exogenous disambiguation, as implemented in the term extractor YaTeA.
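The abstract distinguishes endogenous disambiguation (evidence drawn from the corpus itself) from exogenous disambiguation (evidence drawn from existing terminologies). The following is a minimal sketch, not YaTeA's actual implementation, of how the two kinds of evidence might be combined to choose a binary bracketing for a candidate noun phrase. It assumes candidates arrive already tokenized, that the terminology and the corpus-attested phrases are both sets of word tuples, and that exogenous evidence is consulted before endogenous evidence; the names `bracket` and `_supported` are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple, Union

# A parse is either a single word or a (left, right) pair of sub-parses.
Parse = Union[str, Tuple["Parse", "Parse"]]


def _supported(span: Tuple[str, ...], known: Set[Tuple[str, ...]]) -> bool:
    """A span is supported if it is a single word or a multi-word unit in `known`."""
    return len(span) == 1 or span in known


def bracket(words: Sequence[str],
            terminology: Set[Tuple[str, ...]],
            corpus_phrases: Set[Tuple[str, ...]]) -> Optional[Parse]:
    """Return a binary bracketing of `words`, or None if it stays ambiguous.

    Hypothetical sketch: exogenous evidence (`terminology`) is tried first,
    then endogenous evidence (`corpus_phrases`, phrases attested elsewhere
    in the corpus as standalone candidates).
    """
    words = tuple(words)
    if len(words) == 1:
        return words[0]
    if len(words) == 2:
        # Only one bracketing is possible for two words.
        return (words[0], words[1])
    for known in (terminology, corpus_phrases):
        # Top-level split points whose two halves are both supported by this source.
        splits = [k for k in range(1, len(words))
                  if _supported(words[:k], known) and _supported(words[k:], known)]
        if len(splits) == 1:
            k = splits[0]
            left = bracket(words[:k], terminology, corpus_phrases)
            right = bracket(words[k:], terminology, corpus_phrases)
            if left is not None and right is not None:
                return (left, right)
        # Zero or several supported splits: fall through to the next source.
    return None


if __name__ == "__main__":
    terms = {("gene", "expression")}
    attested = {("expression", "regulation")}
    print(bracket(["gene", "expression", "regulation"], terms, attested))
    print(bracket(["gene", "expression", "regulation"], set(), attested))
```

A split is accepted only when exactly one bracketing is supported by a given source; if a source supports none or several, the next source is consulted, and `None` marks a phrase left unresolved. In the usage example, the terminology entry wins over the corpus evidence, giving `(('gene', 'expression'), 'regulation')`, while the corpus alone yields `('gene', ('expression', 'regulation'))`.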

Keywords

Noun Phrase, Term Candidate, Term Extractor, Content Coverage, Nominal Phrase
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sophie Aubin (1)
  • Thierry Hamon (1)
  1. UMR CNRS 7030, LIPN, Villetaneuse