The English lexical substitution task

Language Resources and Evaluation

Abstract

Since the inception of the Senseval series there has been a great deal of debate in the word sense disambiguation (WSD) community about what the right sense distinctions are for evaluation, with the consensus being that the distinctions should be relevant to the intended application. One solution to this issue is lexical substitution, i.e. the replacement of a target word in context with a suitable alternative substitute. In this paper, we describe the English lexical substitution task and report an exhaustive evaluation of the systems that participated in the task organized at SemEval-2007. The aim of this task is to provide an evaluation where the sense inventory is not predefined and where performance on the task would bode well for applications. The task not only reflects WSD capabilities but can also be used to compare lexical resources, whether man-made or automatically created, and has the potential to benefit several natural-language applications.
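
To make the task definition concrete, the minimal sketch below shows the shape of a single lexical substitution item: a sentence with a marked target word, the multiset of substitutes supplied by the annotators, and the most frequent response (the mode). The sentence, target, and substitute counts are invented for illustration and are not drawn from the task data.

```python
from collections import Counter

# One hypothetical lexical substitution item: a target word in a sentence and
# the substitutes proposed by five annotators (all invented for illustration).
sentence = "The committee will table the proposal until next month."
target = "table"

annotator_substitutes = ["postpone", "postpone", "postpone", "defer", "shelve"]

gold = Counter(annotator_substitutes)   # Counter({'postpone': 3, 'defer': 1, 'shelve': 1})
top = max(gold.values())
modes = [s for s, freq in gold.items() if freq == top]
mode = modes[0] if len(modes) == 1 else None   # a mode exists only when a single substitute is most frequent

print(gold, "mode:", mode)
```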


Notes

  1. Available from http://www.informatics.sussex.ac.uk/research/groups/nlp/mccarthy/task10index.html.

  2. There were only 19 verbs because of an error in the automatic selection of one of the verbs picked for manual sentence selection.

  3. Full instructions given to the annotators are posted at http://www.informatics.susx.ac.uk/research/nlp/mccarthy/files/instructions.pdf.

  4. In the SemEval-2007 task, there was also a third subtask on multiwords. Only one system participated in the multiword subtask, so we do not describe it here. The scoring measures for all three subtasks are described in the document at http://nlp.cs.swarthmore.edu/semeval/tasks/task10/task10documentation.pdf released with our trial data.

  5. We also calculated precision over only the items attempted by a system, which can be contrasted with recall, which is computed over all items (see the scoring sketch after these notes). Since systems typically left out only a few items, the precision figures are similar to the recall figures, and we do not report them here for lack of space.

  6. We only used single words as substitutes for the baseline as we did not have frequency data for multiwords.

  7. For the WordNet oot baseline we applied the same criteria in order until ten synonyms were found. We do not report the oot baselines here for lack of space and because we observed a similar pattern to the best baseline.

  8. We used 0.99 as the value of the parameter α for this measure.

  9. usyd and hit used WordNet version 2.1; the other WordNet-based systems used version 2.0.

  10. Where more than one substitute is ranked highest by frequency, the recall score for that item is capped by the frequency of any one of the substitutes sharing the highest rank (see the scoring sketch after these notes).

  11. Recall that these are only calculated on items where there is a mode.

  12. To highlight the problem of duplicates, we have added a warning to the release version of the scorer; it indicates where a duplicate is found and states that, on oot, systems that include duplicates should NOT be compared with those that do not.

  13. We have not tried to calculate human agreement on the oot task because the gold standard is the same as for best, and it is not realistic for humans to come up with ten substitutes for a given item. The oot task was envisaged as a way of compensating for the fact that we had only five annotators, who may not have thought of every acceptable substitute; it therefore gives systems a better chance of finding the substitutes that the five annotators did provide.

  14. We do not further complicate this analysis by considering the frequency of these responses.

  15. Note that although the post hoc evaluation looked at only 100 sentences, the post hoc annotators examined 1342 substitutes in total for these sentences.

  16. This could be rectified to some extent by recruiting more annotators, possibly using volunteers in a web-based game (Mihalcea and Chklovski 2003).
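
The distinction between precision and recall in note 5 and the tie behaviour in note 10 can be illustrated with a small scoring sketch. It assumes the best-measure credit described in the task documentation referenced in note 4: for each item, the system's answers earn the sum of their frequencies in the gold multiset, divided by the number of answers given and by the total number of gold responses; precision averages this credit over attempted items only, while recall averages it over all items. All data and function names here are invented for illustration.

```python
from collections import Counter

def item_credit(answers, gold):
    """Per-item credit, assuming the best-measure definition in the task
    documentation: the summed gold frequency of the system's answers,
    divided by the number of answers and the total number of gold responses."""
    if not answers:
        return 0.0
    total_gold = sum(gold.values())
    return sum(gold.get(a, 0) for a in answers) / (len(answers) * total_gold)

def precision_recall(system_answers, gold_standard):
    """Precision averages credit over attempted items only (note 5);
    recall averages the same credit over all items in the gold standard."""
    credits = {i: item_credit(ans, gold_standard[i])
               for i, ans in system_answers.items() if ans}
    precision = sum(credits.values()) / len(credits) if credits else 0.0
    recall = sum(credits.values()) / len(gold_standard)
    return precision, recall

# Invented gold standard for two items. Item 2 has two substitutes tied for the
# highest frequency, so even a perfect single guess scores only 2/5 there (note 10).
gold_standard = {
    1: Counter({"postpone": 3, "defer": 1, "shelve": 1}),
    2: Counter({"bright": 2, "clever": 2, "sharp": 1}),
}

print(precision_recall({1: ["postpone"], 2: ["bright"]}, gold_standard))  # P = R = 0.5
print(precision_recall({1: ["postpone"], 2: []}, gold_standard))          # P = 0.6, R = 0.3
```

With two gold substitutes tied at frequency 2 out of 5 responses on item 2, the best attainable credit on that item is 0.4, which is the cap described in note 10; leaving an item unattempted lowers recall but not precision, which is the contrast described in note 5.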

References

  • Barnard, J. (Ed.) (1986). Macquarie Thesaurus. Sydney: Macquarie Library.

  • Brants, T., & Franz, A. (2006). Web 1T 5-gram corpus version 1.1. Technical Report.

  • Briscoe, E., & Carroll, J. (2002). Robust accurate statistical annotation of general text. In Proceedings of the third international conference on Language Resources and Evaluation (LREC) (pp. 1499–1504). Las Palmas, Canary Islands, Spain.

  • Carpuat, M., & Wu, D. (2007). Improving statistical machine translation using word sense disambiguation. In Proceedings of the joint conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007) (pp. 61–72). Prague, Czech Republic.

  • Chan, Y. S., Ng, H. T., & Chiang, D. (2007). Word sense disambiguation improves statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics (pp. 33–40). Prague, Czech Republic.

  • Dagan, I., Glickman, O., & Magnini, B. (2005). The PASCAL recognising textual entailment challenge. In Proceedings of the PASCAL first challenge workshop (pp. 1–8). Southampton, UK.

  • Dahl, G., Frassica, A.-M., & Wicentowski, R. (2007). SW-AG: Local context matching for English lexical substitution. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 304–307). Prague, Czech Republic.

  • Fellbaum, C. (Ed.) (1998). WordNet, an electronic lexical database. Cambridge, MA: The MIT Press.

  • Giuliano, C., Gliozzo, A., & Strapparava, C. (2007). FBK-irst: Lexical substitution task exploiting domain and syntagmatic coherence. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 145–148). Prague, Czech Republic.

  • Graff, D. (2003). English Gigaword. Philadelphia: Linguistic Data Consortium.

  • Hanks, P. (2000). Do word meanings exist? Computers and the Humanities. Senseval Special Issue, 34(1–2), 205–215.

  • Hassan, S., Csomai, A., Banea, C., Sinha, R., & Mihalcea, R. (2007). UNT: SubFinder: Combining knowledge sources for automatic lexical substitution. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 410–413). Prague, Czech Republic.

  • Hawker, T. (2007). USYD: WSD and lexical substitution using the Web1T corpus. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 446–453). Prague, Czech Republic.

  • Ide, N., & Véronis, J. (1998). Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24(1), 1–40.

  • Ide, N., & Wilks, Y. (2006). Making sense about sense. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation, algorithms and applications (pp. 47–73). Springer.

  • Kilgarriff, A. (2004). How dominant is the commonest sense of a word? In Proceedings of text, speech, dialogue. Brno, Czech Republic.

  • Kilgarriff, A. (2006). Word senses. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation, algorithms and applications (pp. 29–46). Springer.

  • Lee, L. (1999). Measures of distributional similarity. In Proceedings of the 37th annual meeting of the association for computational linguistics (pp. 25–32).

  • Leech, G. (1992). 100 million words of English: The British National Corpus. Language Research, 28(1), 1–13.

  • Lin, D. (1998). An information-theoretic definition of similarity. In Proceedings of the 15th international conference on machine learning. Madison, WI.

  • Lindberg, C. (Ed.) (2004). The Oxford American Writer’s Thesaurus. Oxford: Oxford University Press.

  • Martinez, D., Kim, S. N., & Baldwin, T. (2007). MELB-MKB: Lexical substitution system based on relatives in context. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 237–240). Prague, Czech Republic.

  • McCarthy, D. (2002). Lexical substitution as a task for WSD evaluation. In Proceedings of the ACL workshop on word sense disambiguation: Recent successes and future directions (pp. 109–115). Philadelphia, USA.

  • McCarthy, D. (2008). Lexical substitution as a framework for multiword evaluation. In Proceedings of the sixth international conference on Language Resources and Evaluation (LREC 2008). Marrakech, Morocco.

  • McCarthy, D., & Navigli, R. (2007). SemEval-2007 Task 10: English lexical substitution task. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 48–53). Prague, Czech Republic.

  • Mihalcea, R., & Chklovski, T. (2003). Open mind word expert: Creating large annotated data collections with Web Users’ help. In Proceedings of the EACL 2003 workshop on linguistically annotated corpora (pp. 53–60). Budapest.

  • Mihalcea, R., & Csomai, A. (2005). SenseLearner: Word sense disambiguation for all words in unrestricted text. In Proceedings of the 43rd annual meeting of the association for computational linguistics. University of Michigan, USA.

  • Miller, G. A., Leacock, C., Tengi, R., & Bunker, R. T. (1993). A semantic concordance. In Proceedings of the ARPA workshop on human language technology (pp. 303–308).

  • Mohammad, S., Hirst, G., & Resnik, P. (2007). Tor, TorMd: Distributional profiles of concepts for unsupervised word sense disambiguation. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 326–333). Prague, Czech Republic.

  • Navigli, R. (2006). Meaningful clustering of senses helps boost word sense disambiguation performance. In Proceedings of the 44th annual meeting of the association for Computational Linguistics joint with the 21st International Conference on Computational Linguistics (COLING-ACL 2006) (pp. 105–112). Sydney, Australia.

  • Navigli, R., Litkowski, K. C., & Hargraves, O. (2007). SemEval-2007 Task 7: Coarse-grained English all-words task. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 30–35). Prague, Czech Republic.

  • Palmer, M. (2000). Consistent criteria for sense distinctions. Computers and the Humanities. Senseval Special Issue, 34(1–2), 217–222.

  • Palmer, M., Dang, H. T., & Fellbaum, C. (2007). Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Natural Language Engineering, 13(2), 137–163.

  • Pantel, P., & Lin, D. (2002). Discovering word senses from text. In Proceedings of ACM SIGKDD conference on knowledge discovery and data mining (pp. 613–619). Edmonton, Canada.

  • Resnik, P., & Yarowsky, D. (2000). Distinguishing systems and distinguishing senses: New evaluation methods for word sense disambiguation. Natural Language Engineering, 5(3), 113–133.

  • Roget, P. M. (1911). Roget’s International Thesaurus (1st ed.). New York, USA: Crowell.

  • Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.

  • Schütze, H., & Pedersen, J. O. (1995). Information retrieval based on word senses. In Proceedings of the fourth annual symposium on document analysis and information retrieval (pp. 161–175). Las Vegas, NV.

  • Sharoff, S. (2006). Open-source corpora: Using the net to fish for linguistic data. International Journal of Corpus Linguistics, 11(4), 435–462.

  • Stokoe, C. (2005). Differentiating homonymy and polysemy in information retrieval. In Proceedings of the joint conference on human language technology and empirical methods in natural language processing (pp. 403–410). Vancouver, BC, Canada.

  • Stokoe, C., Oakes, M. P., & Tait, J. (2003). Word sense disambiguation in information retrieval revisited. In Proceedings of SIGIR (pp. 159–166).

  • Thesaurus.com. (2007). Roget’s New Millennium™ Thesaurus (1st ed., v 1.3.1). Lexico Publishing Group, LLC. http://thesaurus.reference.co.

  • Yuret, D. (2007). KU: Word sense disambiguation by substitution. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 207–214). Prague, Czech Republic.

  • Zhao, S., Zhao, L., Zhang, Y., Liu, T., & Li, S. (2007). HIT: Web based scoring method for English lexical substitution. In Proceedings of the 4th workshop on Semantic Evaluations (SemEval-2007) (pp. 173–176). Prague, Czech Republic.

Acknowledgements

We acknowledge support from the Royal Society UK for funding the annotation for the project, and for a Dorothy Hodgkin Fellowship to the first author. We also acknowledge support to the first author from the UK EPSRC project EP/C537262/1 “Ranking Word Senses for Disambiguation: Models and Applications” and to the second author from INTEROP NoE (508011, 6th EU FP). We thank the annotators for their hard work, the anonymous reviewers for their useful feedback, Serge Sharoff for the use of his Internet corpus, Julie Weeds for the distributional similarity software and Suzanne Stevenson for suggesting the oot task.

Author information

Corresponding author

Correspondence to Diana McCarthy.

Cite this article

McCarthy, D., Navigli, R. The English lexical substitution task. Lang Resources & Evaluation 43, 139–159 (2009). https://doi.org/10.1007/s10579-009-9084-1
