Bilingually Learning Word Senses for Translation

Casteleiro, João; Lopes, Gabriel Pereira; Silva, Joaquim

doi:10.1007/978-3-642-54903-8_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

All words in every natural language are ambiguous, especially when translation is at stake. Translation tasks therefore require finding adequate translations for such words in the contexts where they occur. This article describes a bilingual strategy for clustering words according to their meanings, using a publicly available sentence-aligned parallel corpus. Word senses are discriminated by their translations and by the words occurring in a window, in both the source- and target-language parallel sentences. The strategy is language independent and uses a correlation algorithm to filter out irrelevant features. The clusters obtained were evaluated in terms of F-measure (averaging 94%), and their homogeneity and completeness were determined using V-measure (averaging 83%). The learned clusters are then used to train a support vector machine that tags ambiguous words with their translations in the contexts where they occur. This task was also evaluated in terms of F-measure and compared against a baseline.
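
As an illustration of the cluster evaluation mentioned in the abstract, the following is a minimal sketch of how homogeneity, completeness, and V-measure can be computed from gold sense labels and cluster assignments, following the standard conditional-entropy definition. It is not the authors' code; the function names, the `beta` weight, and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """Conditional entropy H(targets | given), in nats."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, V) for gold classes vs. clusters.

    beta weights completeness against homogeneity; beta=1 gives the
    harmonic mean of the two (an assumption, not a value from the paper).
    """
    h_c = entropy(classes)
    h_k = entropy(clusters)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    # Completeness: all members of a class should land in the same cluster.
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v


# Hypothetical toy data: gold senses of four occurrences of an ambiguous word.
gold = ["river", "river", "money", "money"]
perfect = v_measure(gold, [0, 0, 1, 1])   # clusters match the senses exactly
merged = v_measure(gold, [0, 0, 0, 0])    # everything in a single cluster
```

In this toy case the exact clustering scores 1.0 on all three values, while the single merged cluster is fully complete but has zero homogeneity, so its V-measure is 0.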

Author information

Authors and Affiliations

Faculdade de Ciências e Tecnologia, Departamento de Informática, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
João Casteleiro, Gabriel Pereira Lopes & Joaquim Silva

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Casteleiro, J., Lopes, G.P., Silva, J. (2014). Bilingually Learning Word Senses for Translation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_24

DOI: https://doi.org/10.1007/978-3-642-54903-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)
