Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations

Interactive Multi-modal Question-Answering

Abstract

The extraction of terms and their variants is an important task in many applications of natural language processing (NLP), such as question answering and information retrieval. This chapter discusses a method for automatically extracting medical terms and their variants from a multilingual corpus of parallel translations. As a first step, terms are extracted using a pattern-based approach. To determine which terms are variants of each other, a distributional method calculates the semantic similarity between terms on the basis of their translations in multiple languages. Word alignment techniques are used in combination with phrase extraction techniques from phrase-based machine translation to extract phrases and their translations from a medical parallel corpus. The approach provides a promising strategy for extracting term variants using straightforward and fully automatic techniques. Moreover, because the approach is independent of domain and language, it can be applied to any domain and language for which parallel multilingual corpora exist.
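
The idea of comparing terms through their translations can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not name a similarity measure, so cosine similarity is assumed here, and the terms, translations, counts, and function names (`cosine`, `rank_variants`) are all hypothetical toy values chosen for illustration. Each term is represented as a vector of counts over (language, translation) pairs, as might be obtained from word-aligned phrase pairs.

```python
from collections import Counter
from math import sqrt

# Hypothetical alignment counts: term -> Counter over (language, translation).
# In practice these would come from word alignment and phrase extraction
# over a parallel corpus; the numbers below are invented.
translation_counts = {
    "heart attack": Counter({
        ("de", "herzinfarkt"): 8, ("fr", "infarctus"): 6, ("es", "infarto"): 5,
    }),
    "myocardial infarction": Counter({
        ("de", "herzinfarkt"): 5, ("de", "myokardinfarkt"): 4,
        ("fr", "infarctus"): 7, ("es", "infarto"): 3,
    }),
    "headache": Counter({
        ("de", "kopfschmerzen"): 9, ("fr", "céphalée"): 4,
        ("es", "dolor de cabeza"): 6,
    }),
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_variants(term, counts):
    """Rank all other terms by translation-based similarity to `term`."""
    target = counts[term]
    scores = [(other, cosine(target, vec))
              for other, vec in counts.items() if other != term]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for candidate, score in rank_variants("heart attack", translation_counts):
        print(f"{candidate}: {score:.3f}")
```

In this toy data, terms that share translations across several languages score high, while terms with no translations in common score zero. Pooling evidence from multiple languages is what makes the vectors informative: a single shared translation may be accidental, whereas agreement across languages is stronger evidence that two terms are variants.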

Author information

Corresponding author

Correspondence to Lonneke van der Plas.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

van der Plas, L., Tiedemann, J., Fahmi, I. (2011). Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations. In: van den Bosch, A., Bouma, G. (eds) Interactive Multi-modal Question-Answering. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17525-1_7

  • DOI: https://doi.org/10.1007/978-3-642-17525-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17524-4

  • Online ISBN: 978-3-642-17525-1

  • eBook Packages: Computer Science, Computer Science (R0)
