A Method for Automatic Text Categorization Using Word Sense Disambiguation

Montes Rendon, Azucena; Vargas A., Rocio; Estrada Esquivel, Hugo; Gonzalez Serna, Juan G.; Ruiz Ascencio, Jose

doi:10.1007/978-3-540-69848-7_92

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5073)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Information plays a central role in today's societies, and the Internet is one of the most widespread mechanisms for communicating and distributing information around the world. Because of the extremely large number of information sources, automatic mechanisms are needed to filter the information that could be useful to each user. However, one problem that the usual automatic text categorization techniques have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of texts in Spanish. Context exploration techniques are used as the key mechanism for guiding the disambiguation process. A specific lexical database and its semantic relations are used to categorize the analyzed text appropriately. To validate the analyzer, we developed a tool that classifies web pages by semantic sense. We present performance results for this classifier and report a comparison with four other classification tools.
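
As a rough illustration of the idea in the abstract (resolve each polysemous word from its context, then categorize the text by the senses chosen), the following minimal Python sketch may help. It is not the authors' analyzer: it swaps in a simplified Lesk-style overlap count in place of the paper's context exploration rules, and the `TOY_LEXICON` entries, domain labels, and function names are all hypothetical stand-ins for a real lexical database.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption, not data from the paper):
# each polysemous Spanish word maps to candidate senses, and each sense
# is a (domain label, set of context clue words) pair.
TOY_LEXICON = {
    "banco": [
        ("economy", {"dinero", "cuenta", "crédito", "préstamo"}),
        ("furniture", {"sentarse", "parque", "madera", "plaza"}),
    ],
    "ratón": [
        ("computing", {"teclado", "pantalla", "clic", "ordenador"}),
        ("zoology", {"queso", "gato", "roedor", "cola"}),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def disambiguate(tokens, index, window=5):
    """Pick the domain of the sense whose clue words overlap most with
    the words around tokens[index]; return None if nothing matches."""
    senses = TOY_LEXICON.get(tokens[index])
    if not senses:
        return None
    start = max(0, index - window)
    context = set(tokens[start:index] + tokens[index + 1:index + 1 + window])
    best_domain, best_score = None, 0
    for domain, clues in senses:
        score = len(clues & context)
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain


def categorize(text, window=5):
    """Assign the text to the domain that wins the most disambiguated words."""
    tokens = tokenize(text)
    votes = Counter()
    for i in range(len(tokens)):
        domain = disambiguate(tokens, i, window)
        if domain is not None:
            votes[domain] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Under these assumptions, `categorize("Abrí una cuenta en el banco para pedir un préstamo")` selects the financial sense of "banco" because "cuenta" and "préstamo" appear in its context window. A real system would draw senses and domains from a lexical resource rather than a hand-written dictionary.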

Author information

Authors and Affiliations

Centro Nacional de Investigación y Desarrollo Tecnológico, Interior Internado Palmira s/n, col. Palmira, Cuernavaca, Morelos, México, C.P. 62490
Azucena Montes Rendon, Rocio Vargas A., Hugo Estrada Esquivel, Juan G. Gonzalez Serna & Jose Ruiz Ascencio

Editor information

Osvaldo Gervasi, Beniamino Murgante, Antonio Laganà, David Taniar, Youngsong Mun, Marina L. Gavrilova

Cite this paper

Montes Rendon, A., Vargas A., R., Estrada Esquivel, H., Gonzalez Serna, J.G., Ruiz Ascencio, J. (2008). A Method for Automatic Text Categorization Using Word Sense Disambiguation. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_92

DOI: https://doi.org/10.1007/978-3-540-69848-7_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69840-1
Online ISBN: 978-3-540-69848-7
eBook Packages: Computer Science, Computer Science (R0)
