Synonym Acquisition across Domains and Languages

van der Plas, Lonneke; Tiedemann, Jörg; Manguin, Jean-Luc

doi:10.1007/978-3-642-21384-7_4

Part of the book series: Studies in Computational Intelligence (SCI, volume 361)

Abstract

We describe an approach to the automatic extraction of synonyms that is easy to port across domains and languages. The approach relies on automatic word alignments in parallel texts and uses distributional methods to compute the semantic similarity of words from these alignments. For a given word, the system outputs a ranked list of candidate synonyms. We apply the method to French, a language for which an extensive electronic synonym dictionary is available, and use that dictionary to evaluate the method. We compare its performance with that of a system that acquires synonyms automatically from syntactic contexts, and show that the alignment-based method outperforms the syntactic method by a large margin. In addition, we show that the method can be adapted to colloquial language use by replacing the parallel corpus with one that contains a large amount of conversational speech: a corpus of movie subtitles. Finally, we apply the method to another language, Dutch, with similar performance.
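
To make the idea concrete, the following is a minimal sketch of alignment-based distributional similarity, not the chapter's implementation. It assumes that word alignment has already been run and that its output is available as (source word, aligned translation) pairs. The function names, the toy French-English pairs, and the use of cosine similarity over raw alignment counts are all illustrative assumptions; the abstract does not state which weighting or similarity measure the authors use.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_vectors(aligned_pairs):
    """Map each source word to a Counter of the translations it was aligned to."""
    vectors = defaultdict(Counter)
    for source_word, translation in aligned_pairs:
        vectors[source_word][translation] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * v[feature] for feature, count in u.items() if feature in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(word, vectors, top_n=5):
    """Return up to top_n (candidate, score) pairs with non-zero similarity."""
    target = vectors.get(word, Counter())
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored = [(other, score) for other, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical alignment output: French words paired with English translations.
toy_alignments = (
    [("voiture", "car")] * 3
    + [("voiture", "vehicle")]
    + [("automobile", "car")] * 2
    + [("automobile", "automobile")]
    + [("maison", "house")] * 3
    + [("maison", "home")]
)

vectors = build_vectors(toy_alignments)
ranked = rank_candidates("voiture", vectors)
print(ranked)
```

In this toy data, "voiture" and "automobile" share the translation "car", so "automobile" is the only candidate returned for "voiture", while "maison" shares no translations and is filtered out. Swapping the parallel corpus (for example, parliamentary proceedings for subtitles) changes only the input pairs, which is what makes this kind of setup easy to port across domains and languages.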

Author information

Authors and Affiliations

University of Geneva, Switzerland
Lonneke van der Plas
Uppsala University, Sweden
Jörg Tiedemann
CNRS/University of Caen, France
Jean-Luc Manguin

Editor information

Editors and Affiliations

InterAnalytics, Rue des Savoises, 19, 1205, Geneva, Switzerland
Vincenzo Pallotta
CRS4, Center of Advanced Studies Research and Development in Sardinia, Parco Scientifico della Sardegna, Ed. 1, 09010, Loc. Piscinamanna Pula, CA, Italy
Alessandro Soro
Department of Electrical and Electronic Engineering, University of Cagliari, 09123, Piazza d’Armi, Cagliari, Italy
Eloisa Vargiu

About this chapter

Cite this chapter

van der Plas, L., Tiedemann, J., Manguin, JL. (2011). Synonym Acquisition across Domains and Languages. In: Pallotta, V., Soro, A., Vargiu, E. (eds) Advances in Distributed Agent-Based Retrieval Tools. Studies in Computational Intelligence, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21384-7_4

DOI: https://doi.org/10.1007/978-3-642-21384-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21383-0
Online ISBN: 978-3-642-21384-7
eBook Packages: Engineering, Engineering (R0)
