Abstract
At present, information plays an important role in modern societies. In this context, the Internet is one of the most widespread mechanisms for communicating and distributing information around the world. Today, because of the extremely large number of information sources, automatic mechanisms are needed to filter the information that may be useful to each user. However, one problem that conventional automatic text categorization techniques have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of texts in Spanish. Contextual exploration techniques were used as the key mechanism for guiding the disambiguation process, and a specific lexical database, together with its semantic relations, was used to categorize the analyzed text appropriately. To validate this analyzer, we developed a tool that classifies web pages by semantic sense. We present performance results for this classifier and, finally, report a comparison with four other classification tools.
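The abstract does not spell out the disambiguation algorithm. As a rough illustration only, the sketch below uses a simplified Lesk-style context overlap. This is a swapped-in technique, not the authors' contextual exploration method. Each sense of an ambiguous word in a hypothetical toy lexicon (`LEXICON`, standing in for a real lexical database) carries a domain label and a set of related words. The sense whose related words overlap most with a fixed context window is selected, and the text is assigned the domain that receives the most votes from its disambiguated words. The lexicon entries, domain labels, and window size are all assumptions.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption, not the paper's database):
# each polysemous Spanish word maps to its senses; each sense has a
# domain label and a set of semantically related words.
LEXICON = {
    "banco": [
        {"sense": "banco#finance", "domain": "economy",
         "related": {"dinero", "cuenta", "crédito", "préstamo"}},
        {"sense": "banco#seat", "domain": "furniture",
         "related": {"sentarse", "parque", "madera", "asiento"}},
    ],
    "planta": [
        {"sense": "planta#factory", "domain": "industry",
         "related": {"fábrica", "producción", "energía"}},
        {"sense": "planta#plant", "domain": "botany",
         "related": {"hojas", "flores", "jardín", "riego"}},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def disambiguate(tokens, index, window=5):
    """Return the sense of tokens[index] whose related words overlap most
    with the surrounding context window, or None if the word is unknown."""
    senses = LEXICON.get(tokens[index])
    if not senses:
        return None
    start = max(0, index - window)
    context = set(tokens[start:index] + tokens[index + 1:index + 1 + window])
    # max() keeps the first-listed sense on ties, acting as a default sense.
    return max(senses, key=lambda s: len(s["related"] & context))


def categorize(text, window=5):
    """Assign the domain most voted by the disambiguated words, or None."""
    tokens = tokenize(text)
    votes = Counter()
    for i in range(len(tokens)):
        sense = disambiguate(tokens, i, window)
        if sense is not None:
            votes[sense["domain"]] += 1
    return votes.most_common(1)[0][0] if votes else None
```

For example, `categorize("La planta de la fábrica aumentó su producción de energía")` picks the factory sense of *planta* because *fábrica* falls inside the window. A real system would also lemmatize tokens and draw related words from the database's semantic relations rather than from a hand-written set.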
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Montes Rendon, A., Vargas A., R., Estrada Esquivel, H., Gonzalez Serna, J.G., Ruiz Ascencio, J. (2008). A Method for Automatic Text Categorization Using Word Sense Disambiguation. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_92
DOI: https://doi.org/10.1007/978-3-540-69848-7_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69840-1
Online ISBN: 978-3-540-69848-7
eBook Packages: Computer Science, Computer Science (R0)