SBFC: An Efficient Feature Frequency-Based Approach to Tackle Cross-Lingual Word Sense Disambiguation

  • Dieter Mourisse
  • Els Lefever
  • Nele Verbiest
  • Yvan Saeys
  • Martine De Cock
  • Chris Cornelis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7499)

Abstract

The Cross-Lingual Word Sense Disambiguation (CLWSD) problem is a challenging Natural Language Processing (NLP) task that consists of selecting the correct translation of an ambiguous word in a given context. Different approaches have been proposed to tackle this problem, but they are often complex and need tuning and parameter optimization.

In this paper, we propose a new classifier, Selected Binary Feature Combination (SBFC), for the CLWSD problem. The underlying hypothesis of SBFC is that a translation is a good classification label for new instances if the features that occur frequently in the new instance also occur frequently in the training feature vectors associated with the same translation label.
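The hypothesis above can be illustrated with a minimal sketch. This is not the SBFC algorithm itself, whose feature selection and combination steps are not described in this abstract; it only scores each translation label by how often the new instance's active binary features occur in the training vectors for that label. The function names `train` and `classify`, the normalization by label count, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(instances):
    """Count, per translation label, how often each binary feature is active.

    instances: iterable of (set_of_active_features, translation_label).
    """
    feature_counts = defaultdict(Counter)  # label -> feature -> count
    label_counts = Counter()               # label -> number of training instances
    for features, label in instances:
        label_counts[label] += 1
        feature_counts[label].update(features)
    return feature_counts, label_counts


def classify(features, feature_counts, label_counts):
    """Pick the label whose training instances share the active features most often.

    Assumption: the score is the sum of relative feature frequencies per label.
    """
    def score(label):
        n = label_counts[label]
        return sum(feature_counts[label][f] / n for f in features)

    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(label_counts), key=score)


# Hypothetical toy data: English "bank" with two Dutch translations.
training = [
    ({"money", "account"}, "bank"),
    ({"money", "loan"}, "bank"),
    ({"river", "water"}, "oever"),
    ({"river", "grass"}, "oever"),
]
model = train(training)
prediction = classify({"river", "money", "water"}, *model)
```

Here "oever" scores 1.5 (river in 2 of 2 instances, water in 1 of 2) against 1.0 for "bank" (money in 2 of 2), so `prediction` is "oever". Because the sketch only counts and sums, it has no parameters to tune, which matches the spirit of the claim in the abstract but not necessarily the authors' exact scoring.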

The advantage of SBFC over existing approaches is that it is intuitive and therefore easy to implement. The algorithm is also fast, which allows processing of large text mining data sets. Moreover, no tuning is needed, and experimental results show that SBFC outperforms state-of-the-art models for the CLWSD problem in terms of accuracy.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dieter Mourisse (1)
  • Els Lefever (1, 2)
  • Nele Verbiest (1)
  • Yvan Saeys (3, 4)
  • Martine De Cock (1)
  • Chris Cornelis (1, 5)
  1. Department of Applied Mathematics and Computer Science, Ghent University, Gent, Belgium
  2. LT3, University College Ghent, Gent, Belgium
  3. Department of Plant Systems Biology, VIB, Gent, Belgium
  4. Department of Molecular Genetics, Ghent University, Gent, Belgium
  5. Granada University, Granada, Spain
