A Method for Automatic Text Categorization Using Word Sense Disambiguation

  • Conference paper
Computational Science and Its Applications – ICCSA 2008 (ICCSA 2008)

Abstract

Information plays a central role in today's societies, and the Internet is one of the most widely used mechanisms for communicating and distributing it around the world. Because the number of information sources is now extremely large, automatic mechanisms are needed to filter the information that could be useful to each user. However, one problem that the usual techniques of automatic text categorization have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of texts in Spanish. Context exploration techniques are used as the key mechanism for guiding the disambiguation process. A specific lexical database and its existing semantic relations are used to categorize the analyzed text appropriately. To validate the analyzer, we developed a tool that classifies web pages by semantic sense. We present performance results for this classifier and, finally, a comparison with four other classification tools.
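To make the idea of context-guided disambiguation concrete, here is a minimal sketch of how a polysemous Spanish word could be resolved from its surrounding words and then used to vote for a text category. It is not the authors' analyzer: the toy lexicon, the domain labels, the cue words, and the function names are all hypothetical, and a simple cue-overlap score stands in for the context exploration techniques and lexical database mentioned in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> list of (domain label, cue words).
# A real system would take senses and domains from a lexical database.
TOY_LEXICON = {
    "banco": [
        ("economy", {"dinero", "cuenta", "crédito", "préstamo"}),
        ("furniture", {"sentarse", "parque", "madera", "plaza"}),
    ],
    "planta": [
        ("botany", {"hojas", "regar", "flor", "jardín"}),
        ("industry", {"fábrica", "producción", "obreros", "energía"}),
    ],
}

PUNCTUATION = ".,;:¡!¿?()\""


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in stripped if t]


def disambiguate(tokens, index, window=5):
    """Return the domain whose cue words overlap most with the context window.

    Returns None if the word is not in the lexicon or no cue word matches.
    """
    senses = TOY_LEXICON.get(tokens[index])
    if not senses:
        return None
    start = max(0, index - window)
    context = set(tokens[start:index] + tokens[index + 1:index + window + 1])
    best_domain, best_score = None, 0
    for domain, cues in senses:
        score = len(cues & context)
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain


def categorize(text, window=5):
    """Assign the text the domain that wins the most disambiguated words."""
    tokens = tokenize(text)
    votes = Counter()
    for i in range(len(tokens)):
        domain = disambiguate(tokens, i, window)
        if domain is not None:
            votes[domain] += 1
    return votes.most_common(1)[0][0] if votes else None


print(categorize("Abrí una cuenta en el banco para pedir un préstamo."))
print(categorize("Me senté en el banco de madera del parque."))
```

In this sketch the same word, "banco", pushes the first sentence toward the hypothetical "economy" category and the second toward "furniture", which is the behavior a purely word-based categorizer cannot reproduce.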



Editor information

Osvaldo Gervasi Beniamino Murgante Antonio Laganà David Taniar Youngsong Mun Marina L. Gavrilova

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Montes Rendon, A., Vargas A., R., Estrada Esquivel, H., Gonzalez Serna, J.G., Ruiz Ascencio, J. (2008). A Method for Automatic Text Categorization Using Word Sense Disambiguation. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_92

  • DOI: https://doi.org/10.1007/978-3-540-69848-7_92

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69840-1

  • Online ISBN: 978-3-540-69848-7

  • eBook Packages: Computer Science, Computer Science (R0)
