Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus

  • Rubén Izquierdo-Beviá
  • Lorenza Moreno-Monteagudo
  • Borja Navarro
  • Armando Suárez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)

Abstract

This paper presents an approach to semantic disambiguation for Spanish based on machine learning and semantic classes. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which linguistic information can be learned automatically. In particular, all-words sense-annotated corpora such as SemCor do not contain enough examples for many senses when used with a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, is used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
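
The effect described above, more training examples per label and lower polysemy once senses are grouped into classes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the annotated tokens, the sense identifiers, and the sense-to-class mapping are hypothetical toy data, not drawn from Cast3LB or Spanish WordNet, and the helper names are not from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations as (word, sense_id) pairs; NOT real Cast3LB data.
annotated = [
    ("banco", "banco#1"),
    ("banco", "banco#2"),
    ("banco", "banco#3"),
    ("orilla", "orilla#1"),
    ("caja", "caja#1"),
    ("caja", "caja#2"),
]

# Hypothetical mapping from each sense to a coarse class, written in the style
# of WordNet lexicographer file names. The assignments are illustrative only.
sense_to_class = {
    "banco#1": "noun.group",
    "banco#2": "noun.artifact",
    "banco#3": "noun.artifact",
    "orilla#1": "noun.object",
    "caja#1": "noun.artifact",
    "caja#2": "noun.group",
}


def examples_per_label(pairs, relabel=lambda sense: sense):
    """Count how many annotated tokens carry each label."""
    return Counter(relabel(sense) for _, sense in pairs)


def polysemy(pairs, relabel=lambda sense: sense):
    """Number of distinct labels observed for each word."""
    labels = defaultdict(set)
    for word, sense in pairs:
        labels[word].add(relabel(sense))
    return {word: len(found) for word, found in labels.items()}


to_class = sense_to_class.__getitem__

sense_counts = examples_per_label(annotated)            # one label per sense
class_counts = examples_per_label(annotated, to_class)  # one label per class
sense_polysemy = polysemy(annotated)
class_polysemy = polysemy(annotated, to_class)

print("examples per sense:", dict(sense_counts))
print("examples per class:", dict(class_counts))
print("polysemy by sense: ", sense_polysemy)
print("polysemy by class: ", class_polysemy)
```

In this toy data every sense has a single example, whereas the classes pool examples from several words, and a word whose senses fall into the same class becomes less ambiguous at the class level. A classifier trained per class would therefore see more examples per label, which is the motivation stated in the abstract.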

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rubén Izquierdo-Beviá (1)
  • Lorenza Moreno-Monteagudo (1)
  • Borja Navarro (1)
  • Armando Suárez (1)
  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain
