Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus

  • Conference paper
MICAI 2006: Advances in Artificial Intelligence (MICAI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4293)

Abstract

This paper presents an approach to semantic disambiguation for Spanish based on machine learning and semantic classes. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which linguistic information can be learned automatically. In particular, all-words sense-annotated corpora such as SemCor do not contain enough examples of many senses to train a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, is used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
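The effect of replacing senses with semantic classes can be illustrated with a minimal sketch. The annotations, sense identifiers, and sense-to-class mapping below are invented for illustration and are not taken from Cast3LB or Spanish WordNet; only the idea (fewer, better-populated labels and lower polysemy per lemma) comes from the abstract.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (lemma, sense_id) pairs.
# These are NOT real Cast3LB data; they only illustrate the idea.
annotations = [
    ("banco", "banco#1"), ("banco", "banco#1"),
    ("banco", "banco#2"), ("banco", "banco#3"),
    ("caja", "caja#1"), ("caja", "caja#2"),
]

# Hypothetical mapping from senses to coarse classes, named in the style
# of WordNet lexicographer files (the assignments are assumptions).
sense_to_class = {
    "banco#1": "noun.group",
    "banco#2": "noun.artifact",
    "banco#3": "noun.group",
    "caja#1": "noun.artifact",
    "caja#2": "noun.group",
}


def polysemy(pairs):
    """Return the number of distinct labels observed for each lemma."""
    labels = defaultdict(set)
    for lemma, label in pairs:
        labels[lemma].add(label)
    return {lemma: len(found) for lemma, found in labels.items()}


# Relabel every occurrence with its class instead of its sense.
class_annotations = [(lemma, sense_to_class[s]) for lemma, s in annotations]

# Training examples available per label, before and after the mapping.
sense_counts = Counter(sense for _, sense in annotations)
class_counts = Counter(cls for _, cls in class_annotations)

# Ambiguity per lemma, before and after the mapping.
sense_polysemy = polysemy(annotations)
class_polysemy = polysemy(class_annotations)

print("examples per sense:", dict(sense_counts))
print("examples per class:", dict(class_counts))
print("polysemy over senses:", sense_polysemy)
print("polysemy over classes:", class_polysemy)
```

In this toy setting the same six occurrences are spread over five sense labels but only two class labels, so each class has more training examples, and the lemma "banco" drops from three candidate labels to two.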

This paper has been supported by the Spanish Government under projects CESS-ECE (HUM2004-21127-E) and R2D2 (TIC2003-07158-C04-01).




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Izquierdo-Beviá, R., Moreno-Monteagudo, L., Navarro, B., Suárez, A. (2006). Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_84

  • DOI: https://doi.org/10.1007/11925231_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49026-5

  • Online ISBN: 978-3-540-49058-6

  • eBook Packages: Computer Science, Computer Science (R0)
