Similarity Measures in Documents Using Association Graphs

  • José E. Medina Pagola
  • Ernesto Guevara Martínez
  • José Hernández Palancar
  • Abdel Hechavarría Díaz
  • Raudel Hernández León
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3773)

Abstract

In this paper we present a new model, called the Association Graph, that improves document representation and facilitates the ontological dimension. We explain how to generate and use this kind of graph, and we analyze several document similarity measures based on this representation. A classical vector space model was used to evaluate the model and the measures and to investigate their strengths and weaknesses. The proposed model gave promising results.
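
For orientation, the classical vector space model named in the abstract is commonly paired with the cosine measure. The sketch below illustrates only that conventional baseline under simple assumptions (whitespace tokenization, raw term frequencies as weights); it is not the Association Graph model or any measure from the paper, and the names `term_vector` and `cosine_similarity` are hypothetical helpers introduced here.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents, invented purely for illustration.
doc1 = term_vector("graph model for document similarity")
doc2 = term_vector("vector model for document retrieval")
doc3 = term_vector("cuban coffee harvest")

sim_related = cosine_similarity(doc1, doc2)    # shares three of five terms
sim_unrelated = cosine_similarity(doc1, doc3)  # shares no terms
```

Because this baseline treats terms as independent dimensions, two documents score zero unless they share literal terms, which is the kind of limitation that term-association models generally aim to address.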

Keywords

Information Retrieval; Vector Model; Collaborative Filter; Vector Space Model; Cosine Measure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • José E. Medina Pagola (1)
  • Ernesto Guevara Martínez (2)
  • José Hernández Palancar (1)
  • Abdel Hechavarría Díaz (1)
  • Raudel Hernández León (1)

  1. Centro de Aplicaciones de Tecnologías de Avanzada (CENATAV), Playa, C. de la Habana, Cuba
  2. Instituto Superior Politécnico “José Antonio Echeverria” (ISPJAE), Marianao, C. de la Habana, Cuba
