Language Resources and Evaluation, Volume 47, Issue 3, pp 607–638

The cross-lingual lexical substitution task

  • Diana McCarthy
  • Ravi Sinha
  • Rada Mihalcea
Original Paper

Abstract

In this paper we provide an account of the cross-lingual lexical substitution task run as part of SemEval-2010. In this task, both the annotators (native Spanish speakers proficient in English) and the participating systems had to find Spanish translations for target words in the context of an English sentence. Because only translations of a single lexical unit were required, the task does not necessitate a full-blown translation system. We hoped this would encourage those working specifically on lexical semantics to participate without needing machine translation software, though participants were free to use whatever resources they chose. We pay particular attention to the resources used by the participating systems and present analyses that demonstrate the relative strengths of the systems as well as their resource requirements. In addition to analysing the individual systems, we present the results of a combined system based on voting among the individual systems. We demonstrate that this combined system is better at finding the annotators' most frequent translation than the highest-ranked translations provided by the individual systems. This supports our other analyses showing that the systems are heterogeneous, with different strengths and weaknesses.
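The abstract mentions a combined system built by voting over the individual systems' outputs and scored against the annotators' most frequent translation. The abstract does not specify the voting scheme or the scoring details, so what follows is only a hypothetical sketch under stated assumptions. It assumes each system returns a ranked list of Spanish translations per item and that every distinct translation a system proposes earns one vote. Ties are broken first by the best rank any system gave the translation, then alphabetically. An item counts as a hit only when the winning guess matches a unique most frequent annotator response. The names `combine_by_voting` and `unique_mode` and the toy data are invented for illustration.

```python
from collections import Counter


def combine_by_voting(system_outputs):
    """Pick one translation for an item by voting across systems.

    Hypothetical scheme (an assumption, not taken from the paper): each
    system casts one vote for every distinct translation in its ranked
    list; ties are broken by the best (lowest) rank any system gave the
    translation, then alphabetically so the result is deterministic.
    """
    votes = Counter()
    best_rank = {}
    for ranked in system_outputs:
        # dict.fromkeys drops duplicates while keeping the system's order.
        for rank, translation in enumerate(dict.fromkeys(ranked)):
            votes[translation] += 1
            best_rank[translation] = min(rank, best_rank.get(translation, rank))
    if not votes:
        return None
    return min(votes, key=lambda t: (-votes[t], best_rank[t], t))


def unique_mode(annotator_translations):
    """Return the single most frequent annotator translation, or None on a tie."""
    counts = Counter(annotator_translations).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return None
    return counts[0][0]


# Invented toy data for one item (English target "coach").
systems = [
    ["entrenador", "autobús"],
    ["autocar", "entrenador"],
    ["entrenador"],
    ["autobús", "autocar"],
]
gold = ["entrenador", "entrenador", "autocar", "entrenador"]
print(combine_by_voting(systems), unique_mode(gold))  # → entrenador entrenador
```

With these made-up outputs, two of the four systems rank a different translation first, while the vote recovers the annotators' most frequent answer. This mirrors the kind of complementarity the abstract describes, but the data are invented and say nothing about the actual SemEval-2010 results.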

Keywords

SemEval 2010; Cross-lingual; Lexical substitution

Notes

Acknowledgments

This material is based in part upon work supported by the National Science Foundation CAREER award #0747340. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We thank the anonymous reviewers for their helpful feedback.

Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  1. DTAL, University of Cambridge, Cambridge, UK
  2. University of North Texas, Denton, USA