Machine Learning, Volume 76, Issue 1, pp 109–136

An investigation into feature construction to assist word sense disambiguation

  • Lucia Specia
  • Ashwin Srinivasan
  • Sachindra Joshi
  • Ganesh Ramakrishnan
  • Maria das Graças Volpe Nunes

Abstract

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used, or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.
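
The idea of using constructed clauses as features can be illustrated with a minimal sketch. It is not the paper's implementation: the clauses below are hand-written Python predicates standing in for clauses an ILP system might construct, and the example fields (`context_words`, `object_classes`), the helper names, and the toy sentences are all assumptions made for illustration. Each clause becomes one Boolean feature, so every example is mapped to a 0/1 vector that a feature-based learner such as a support vector machine could consume.

```python
from typing import Callable, Dict, List, Set

# One example = background facts about the context of an ambiguous word.
# The field names are hypothetical, chosen only for this sketch.
Example = Dict[str, Set[str]]
Clause = Callable[[Example], bool]


def has_collocate(word: str) -> Clause:
    """Lexical condition: `word` occurs in the context of the target word."""
    return lambda ex: word in ex.get("context_words", set())


def has_object_in_class(cls: str) -> Clause:
    """Semantic condition: the target's object belongs to class `cls`."""
    return lambda ex: cls in ex.get("object_classes", set())


def conj(*conditions: Clause) -> Clause:
    """A clause body: true only if every condition holds for the example."""
    return lambda ex: all(cond(ex) for cond in conditions)


# Stand-ins for clauses an ILP system might construct (assumed, not learned here).
CLAUSES: List[Clause] = [
    has_object_in_class("vehicle"),
    has_collocate("train"),
    conj(has_object_in_class("illness"), has_collocate("cold")),
]


def to_feature_vector(ex: Example, clauses: List[Clause]) -> List[int]:
    """Map an example to one Boolean feature per clause (1 = clause is true)."""
    return [int(clause(ex)) for clause in clauses]


examples: List[Example] = [
    {"context_words": {"she", "caught", "the", "train"},
     "object_classes": {"vehicle"}},
    {"context_words": {"he", "caught", "a", "cold"},
     "object_classes": {"illness"}},
]

# Rows of this matrix, paired with sense labels, would be the classifier's input.
feature_matrix = [to_feature_vector(ex, CLAUSES) for ex in examples]
```

In this sketch the two uses of "caught" receive different feature vectors, which is the property a downstream classifier would rely on to separate the senses.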

Keywords

ILP, Word sense disambiguation, Feature construction, Randomised search

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Lucia Specia (1)
  • Ashwin Srinivasan (2)
  • Sachindra Joshi (2, corresponding author)
  • Ganesh Ramakrishnan (3)
  • Maria das Graças Volpe Nunes (4)

  1. Xerox Research Centre Europe, Meylan, France
  2. IBM India Research Laboratory, New Delhi, India
  3. Department of Computer Science & Engineering, IIT Bombay, Mumbai, India
  4. ICMC, Universidade de São Paulo, São Carlos, Brazil
