Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations

van der Plas, Lonneke; Tiedemann, Jörg; Fahmi, Ismail

doi:10.1007/978-3-642-17525-1_7

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

The extraction of terms and their variants is an important issue in various applications of natural language processing (NLP), such as question answering and information retrieval. This chapter discusses a method to automatically extract medical terms and their variants from a multilingual corpus of parallel translations. As a first step, terms are extracted using a pattern-based approach. To determine which terms are variants of each other, a distributional method calculates the semantic similarity between terms on the basis of their translations in multiple languages. Word alignment techniques are used in combination with phrase extraction techniques from phrase-based machine translation to extract phrases and their translations from a medical parallel corpus. The approach provides a promising strategy for the extraction of term variants using straightforward and fully automatic techniques. Moreover, the approach is independent of domain and language and can thus be applied to any domain and any language for which parallel multilingual corpora exist.
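The idea of comparing terms through their translations can be illustrated with a minimal sketch. It assumes that phrase extraction has already produced, for each term, counts of the translations it was aligned with in each language. The toy counts, the names `ALIGNMENTS`, `cosine`, and `rank_variants`, and the choice of cosine similarity are all assumptions made for illustration; the abstract does not state which similarity measure or weighting the chapter uses.

```python
from collections import Counter
from math import sqrt

# Hypothetical alignment counts: term -> {(language, translation): count}.
# Invented toy data for illustration, not taken from the chapter's corpus.
ALIGNMENTS = {
    "heart attack": Counter({
        ("de", "herzinfarkt"): 8,
        ("fr", "crise cardiaque"): 5,
        ("fr", "infarctus du myocarde"): 2,
    }),
    "myocardial infarction": Counter({
        ("de", "herzinfarkt"): 6,
        ("fr", "infarctus du myocarde"): 7,
    }),
    "headache": Counter({
        ("de", "kopfschmerzen"): 9,
        ("fr", "mal de tête"): 4,
    }),
}


def cosine(a, b):
    """Cosine similarity between two sparse translation-count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_variants(term, alignments):
    """Rank all other terms by how similar their translations are to `term`'s."""
    target = alignments[term]
    scored = [
        (other, cosine(target, vector))
        for other, vector in alignments.items()
        if other != term
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for candidate, score in rank_variants("heart attack", ALIGNMENTS):
        print(f"{candidate}\t{score:.3f}")
```

In this sketch, terms that share translations across several languages receive a high score and surface as candidate variants, while terms with no translation in common score zero. Keying each feature by both language and translation keeps identical strings from different languages apart.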

Author information

Authors and Affiliations

Department of Linguistics, University of Geneva, Geneva, Switzerland
Lonneke van der Plas
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Jörg Tiedemann
Gresnews Media, Amsterdam, The Netherlands
Ismail Fahmi

Corresponding author

Correspondence to Lonneke van der Plas.

Editor information

Editors and Affiliations

Fac. Humanities, Tilburg University, Tilburg, Netherlands
Antal van den Bosch
Information Science, University of Groningen, NL-9700 AS Groningen, Netherlands
Gosse Bouma

About this chapter

Cite this chapter

van der Plas, L., Tiedemann, J., Fahmi, I. (2011). Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations. In: van den Bosch, A., Bouma, G. (eds) Interactive Multi-modal Question-Answering. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17525-1_7

DOI: https://doi.org/10.1007/978-3-642-17525-1_7
Published: 08 April 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17524-4
Online ISBN: 978-3-642-17525-1
eBook Packages: Computer Science, Computer Science (R0)