Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation

Language Resources and Evaluation

Abstract

Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.

Notes

  1. Definite clauses are first-order clauses containing exactly one positive literal.

  2. Horn clauses are first-order clauses that can contain at most one positive literal. A Horn clause with exactly one positive literal is a definite clause.

  3. Where \(\wedge\) represents logical conjunction (and), \(\models\) logical entailment, and \(\Box\) falsity.

  4. A clause is satisfiable if there exists at least one model for it, i.e., at least one interpretation (a set of ground facts) under which the clause is true.

  5. See Sect. 4.1 for a discussion of OntoNotes' treatment of phrasal verbs such as “come back”.
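
The definitions in notes 1, 2 and 4 can be illustrated with a minimal Python sketch restricted to ground clauses. It is not the authors' implementation: the `Literal` class, the helper functions and the atom names (`sense_return`, `has_particle_back`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    atom: str              # a ground atom (hypothetical names below)
    positive: bool = True  # False marks a negated atom


def is_horn(clause):
    """Horn clause: at most one positive literal (note 2)."""
    return sum(lit.positive for lit in clause) <= 1


def is_definite(clause):
    """Definite clause: exactly one positive literal (note 1)."""
    return sum(lit.positive for lit in clause) == 1


def is_true_under(clause, interpretation):
    """A ground clause is true under an interpretation (the set of ground
    atoms taken to be true) if at least one of its literals is true."""
    return any((lit.atom in interpretation) == lit.positive for lit in clause)


# sense_return <- has_particle_back, written as a disjunction of literals.
rule = [Literal("sense_return"), Literal("has_particle_back", positive=False)]
# A Horn clause with no positive literal, so it is not definite.
goal = [Literal("has_particle_back", positive=False)]

print(is_horn(rule), is_definite(rule))  # both hold for the rule
print(is_horn(goal), is_definite(goal))  # Horn, but not definite
# The empty interpretation is a model of the rule, so the rule is satisfiable.
print(is_true_under(rule, set()))
```

An interpretation that makes `has_particle_back` true but not `sense_return` falsifies the rule, whereas the empty interpretation satisfies it; the existence of a single such model is what note 4 requires for satisfiability.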

Acknowledgments

We are grateful for the feedback provided by the anonymous reviewers of this paper. Mark Stevenson was supported by the UK Engineering and Physical Sciences Research Council (grants EP/E004350/1 and EP/D069548/1).

Author information

Corresponding author

Correspondence to Mark Stevenson.

About this article

Cite this article

Specia, L., Stevenson, M. & das Graças Volpe Nunes, M. Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation. Lang Resources & Evaluation 44, 295–313 (2010). https://doi.org/10.1007/s10579-009-9107-y

  • DOI: https://doi.org/10.1007/s10579-009-9107-y
