
A Method for Automatic Text Categorization Using Word Sense Disambiguation

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 5073)

Abstract

Information currently plays a central role in modern societies. In this context, the Internet is one of the most widespread mechanisms for communicating and distributing information around the world. Because of the extremely large number of information sources, automatic mechanisms are now needed to filter the information that may be useful to each user. However, one problem that conventional automatic text categorization techniques have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of texts in Spanish. Contextual exploration techniques were used as the key mechanism for guiding the disambiguation process. A specific lexical database and its semantic relations were used to categorize the analyzed text appropriately. To validate this analyzer, we developed a tool that classifies web pages by semantic sense. We present performance results for this classifier and, finally, report a comparison with four other classification tools.
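The abstract does not spell out the disambiguation algorithm. Purely as an illustration of the general idea of using context plus a domain-labelled lexical database, the sketch below scores each sense of an ambiguous word by how often its domain labels appear among the senses of neighbouring words (a simplified, domain-overlap variant of Lesk-style disambiguation), then assigns the text the domain with the most votes. This is an assumption-laden toy, not the authors' analyzer: the lexicon `LEXICON`, its sense and domain labels, the `window` parameter, and the functions `disambiguate` and `categorize` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon: each word maps to its senses, and each sense is
# tagged with a set of domain labels (in the spirit of a domain-labelled
# lexical database). Real resources are far larger.
LEXICON = {
    "banco": {"banco#finance": {"economy"}, "banco#seat": {"furniture"}},
    "interés": {"interés#rate": {"economy"}, "interés#curiosity": {"psychology"}},
    "préstamo": {"préstamo#loan": {"economy"}},
    "silla": {"silla#chair": {"furniture"}},
}


def context_domains(words, target_index, window=3):
    """Count domain labels of all senses of the words around the target."""
    counts = Counter()
    lo = max(0, target_index - window)
    hi = min(len(words), target_index + window + 1)
    for i in range(lo, hi):
        if i == target_index:
            continue
        for domains in LEXICON.get(words[i], {}).values():
            counts.update(domains)
    return counts


def disambiguate(words, target_index, window=3):
    """Return the sense of words[target_index] whose domains best fit the context."""
    senses = LEXICON.get(words[target_index])
    if not senses:
        return None  # word not in the lexicon
    ctx = context_domains(words, target_index, window)
    # Score each sense by how often its domains occur in the context window;
    # on a tie, max() keeps the first sense listed.
    return max(senses, key=lambda s: sum(ctx[d] for d in senses[s]))


def categorize(words, window=3):
    """Assign the text the domain that receives the most votes from chosen senses."""
    votes = Counter()
    for i, word in enumerate(words):
        sense = disambiguate(words, i, window)
        if sense is not None:
            votes.update(LEXICON[word][sense])
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    sentence = ["el", "banco", "cobra", "interés", "por", "el", "préstamo"]
    print(disambiguate(sentence, 1))  # banco#finance
    print(categorize(sentence))  # economy
```

In "el banco cobra interés por el préstamo", the economy-labelled neighbours push "banco" toward its financial sense, whereas in "banco y silla" the furniture-labelled neighbour selects the seat sense. A real system would replace the fixed window with richer contextual cues and a full lexical database.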

Keywords

  • Semantic Analyzer
  • Word Sense Disambiguation
  • Lexical Database
  • Automatic Text
  • Domain Index

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Montes Rendon, A., Vargas A., R., Estrada Esquivel, H., Gonzalez Serna, J.G., Ruiz Ascencio, J. (2008). A Method for Automatic Text Categorization Using Word Sense Disambiguation. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_92

  • DOI: https://doi.org/10.1007/978-3-540-69848-7_92

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69840-1

  • Online ISBN: 978-3-540-69848-7

  • Chapter length: 12 pages

  • eBook Packages: Computer Science, Computer Science (R0)