Computers and the Humanities, Volume 34, Issue 1–2, pp 15–48

Framework and Results for English SENSEVAL

  • A. Kilgarriff
  • J. Rosenzweig


Senseval was the first open, community-based evaluation exercise for word sense disambiguation programs. It adopted the quantitative approach to evaluation developed in MUC and other ARPA evaluation exercises. It took place in 1998. In this paper we describe the structure, organisation and results of the SENSEVAL exercise for English. We present and defend various design choices for the exercise, describe the data and gold-standard preparation, consider issues of scoring strategies and baselines, and present the results for the 18 participating systems. The exercise identifies the state of the art for fine-grained word sense disambiguation, where training data is available, as 74–78% correct, with a number of algorithms approaching this level of performance. For systems that did not assume the availability of training data, performance was markedly lower and also more variable. Human inter-tagger agreement was high, with the gold-standard taggings being around 95% replicable.

Keywords: evaluation, SENSEVAL, word sense disambiguation
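The abstract mentions scoring strategies for the exercise. As a rough illustration only (this is not the official SENSEVAL scorer, and the function names and sense-tag format below are invented for the example), one common fine-grained scoring scheme credits a system with the probability mass it assigns to any sense tag the human annotators accepted, so that a hedged answer earns partial credit:

```python
# Illustrative sketch of fine-grained WSD scoring with partial credit.
# Not the official SENSEVAL scoring software; names and tag format are
# assumptions made for this example.

def score_instance(system_answer, gold_senses):
    """Credit for one test instance.

    system_answer: dict mapping sense tag -> probability assigned.
    gold_senses: set of sense tags accepted by the annotators.
    Returns a credit between 0.0 and 1.0.
    """
    return sum(p for sense, p in system_answer.items() if sense in gold_senses)

def score_corpus(answers, gold):
    """Average credit over all instances (unanswered instances score 0)."""
    total = sum(score_instance(a, g) for a, g in zip(answers, gold))
    return total / len(gold)

# A confident correct answer, a hedged answer, and a wrong answer:
answers = [
    {"bank%1": 1.0},                 # correct: full credit
    {"bank%1": 0.5, "bank%2": 0.5},  # hedged: half credit
    {"bank%2": 1.0},                 # wrong: no credit
]
gold = [{"bank%1"}, {"bank%1"}, {"bank%1"}]
print(score_corpus(answers, gold))  # 0.5
```

Under such a scheme a system that always returns a single best guess is scored as plain accuracy, which is how headline figures like 74–78% correct can be read.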





Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  1. A. Kilgarriff, ITRI, University of Brighton, Brighton, UK
  2. J. Rosenzweig, University of Pennsylvania, USA
