Abstract
The extraction of terms and their variants is an important issue in various applications of natural language processing (NLP), such as question answering and information retrieval. This chapter discusses a method to automatically extract medical terms and their variants from a multilingual corpus of parallel translations. As a first step, terms are extracted using a pattern-based approach. To determine which terms are variants of each other, a distributional method is used that calculates the semantic similarity between terms on the basis of their translations in multiple languages. Word alignment techniques are combined with phrase extraction techniques from phrase-based machine translation to extract phrases and their translations from a medical parallel corpus. The approach provides a promising strategy for the extraction of term variants using straightforward, fully automatic techniques. Moreover, the approach is independent of domain and language, and can thus be applied to other domains and languages for which multilingual parallel corpora exist.
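The central idea can be illustrated with a small sketch. Each term is represented by the translations it is aligned to in other languages, and two terms are compared by how much those translation contexts overlap. The aligned (term, language, translation) triples below are invented toy data standing in for phrase pairs extracted from a word-aligned corpus. The function names are hypothetical. Using raw counts with cosine similarity is an assumption chosen for brevity, and the chapter's actual weighting scheme and similarity measure may differ.

```python
from collections import Counter
from math import sqrt

# Invented toy data (assumption): (source term, language, aligned translation)
# triples, as might be read off phrase pairs from a word-aligned parallel corpus.
ALIGNED = [
    ("heart attack", "nl", "hartaanval"),
    ("heart attack", "de", "Herzinfarkt"),
    ("heart attack", "fr", "crise cardiaque"),
    ("myocardial infarction", "nl", "hartinfarct"),
    ("myocardial infarction", "de", "Herzinfarkt"),
    ("myocardial infarction", "fr", "infarctus du myocarde"),
    ("myocardial infarction", "fr", "crise cardiaque"),
    ("stroke", "nl", "beroerte"),
    ("stroke", "de", "Schlaganfall"),
    ("stroke", "fr", "accident vasculaire cérébral"),
]


def translation_vectors(triples):
    """Map each term to a count vector over its (language, translation) contexts."""
    vectors = {}
    for term, lang, translation in triples:
        vectors.setdefault(term, Counter())[(lang, translation)] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_variants(term, vectors):
    """Rank all other terms by translation-based similarity to `term`."""
    target = vectors[term]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != term]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    vectors = translation_vectors(ALIGNED)
    for other, score in rank_variants("heart attack", vectors):
        print(f"{other}: {score:.3f}")
    # prints:
    # myocardial infarction: 0.577
    # stroke: 0.000
```

In this toy data, "heart attack" and "myocardial infarction" share two translations (German "Herzinfarkt" and French "crise cardiaque"), so they score as likely variants. "Stroke" shares none and scores zero. The multilingual setting matters here because every additional language contributes more translation contexts that can confirm or rule out a variant relation.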
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
van der Plas, L., Tiedemann, J., Fahmi, I. (2011). Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations. In: van den Bosch, A., Bouma, G. (eds) Interactive Multi-modal Question-Answering. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17525-1_7
DOI: https://doi.org/10.1007/978-3-642-17525-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17524-4
Online ISBN: 978-3-642-17525-1
eBook Packages: Computer Science; Computer Science (R0)