Formal Distance vs. Association Strength in Text Processing

  • José E. Medina Pagola
  • Ansel Y. Rodríguez González
  • Abdel Hechavarría Díaz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4756)

Abstract

Text information processing depends critically on proper document representation. Traditional models, such as the vector space model, have significant limitations because they do not consider semantic relations among terms. In this paper we analyze a document representation based on an association graph scheme called the Global Association Distance Model (GADM), examine the significance of formal distance for association strength, and apply several distance-strength functions in this model. We evaluate this significance on topic classification tasks.

Keywords

  • Document modelling
  • Document processing
  • Document representation

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • José E. Medina Pagola (1)
  • Ansel Y. Rodríguez González (1)
  • Abdel Hechavarría Díaz (1)
  1. Advanced Technologies Application Center (CENATAV), 7th Avenue # 21812, % 218 and 222, Siboney, Playa, Havana City, Cuba
