Abstract
This paper presents an approach to semantic disambiguation for Spanish based on machine learning and semantic classes. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which linguistic information can be learned automatically. In particular, all-words sense-annotated corpora such as SemCor do not contain enough examples of many senses to train a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, has been used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
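The effect of replacing senses with semantic classes can be illustrated with a minimal sketch. The annotations and the sense-to-class mapping below are invented for illustration and are not taken from Cast3LB or Spanish WordNet; only the class labels follow the naming style of WordNet lexicographer files. The helper names (`examples_per_label`, `average_polysemy`, `to_classes`) are likewise hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations as (lemma, sense_id) pairs.
# Illustrative only: these are not real Cast3LB annotations.
ANNOTATED = [
    ("banco", "banco#1"),
    ("banco", "banco#2"),
    ("banco", "banco#3"),
    ("orilla", "orilla#1"),
    ("caja", "caja#1"),
    ("caja", "caja#2"),
]

# Hypothetical sense-to-class mapping (assumed, not from Spanish WordNet).
SENSE_TO_CLASS = {
    "banco#1": "noun.group",
    "banco#2": "noun.artifact",
    "banco#3": "noun.group",
    "orilla#1": "noun.object",
    "caja#1": "noun.artifact",
    "caja#2": "noun.group",
}


def examples_per_label(pairs):
    """Count how many annotated examples each label (sense or class) has."""
    return Counter(label for _, label in pairs)


def average_polysemy(pairs):
    """Average number of distinct labels observed per lemma."""
    labels_by_lemma = defaultdict(set)
    for lemma, label in pairs:
        labels_by_lemma[lemma].add(label)
    return sum(len(s) for s in labels_by_lemma.values()) / len(labels_by_lemma)


def to_classes(pairs, mapping):
    """Relabel every sense-annotated example with its semantic class."""
    return [(lemma, mapping[sense]) for lemma, sense in pairs]


sense_counts = examples_per_label(ANNOTATED)
class_pairs = to_classes(ANNOTATED, SENSE_TO_CLASS)
class_counts = examples_per_label(class_pairs)

print("examples per sense:", dict(sense_counts))
print("examples per class:", dict(class_counts))
print("average polysemy (senses): ", average_polysemy(ANNOTATED))
print("average polysemy (classes):", average_polysemy(class_pairs))
```

In this toy data every sense has a single example, while the classes pool examples across words, and the average number of labels per lemma drops once senses that share a class are merged.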
This paper has been supported by the Spanish Government under projects CESS-ECE (HUM2004-21127-E) and R2D2 (TIC2003-07158-C04-01).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Izquierdo-Beviá, R., Moreno-Monteagudo, L., Navarro, B., Suárez, A. (2006). Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_84
DOI: https://doi.org/10.1007/11925231_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer Science (R0)