Abstract
In this paper we present a new model, called the Association Graph, that improves document representation by facilitating the ontological dimension. We explain how to generate and use this kind of graph, and we analyze different document similarity measures based on this representation. A classical vector space model was used to evaluate the model and the measures and to investigate their strengths and weaknesses. The proposed model gave promising results.
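For orientation, the classical vector space model used in the evaluation can be sketched as follows. This is a minimal illustration and not the authors' Association Graph method: it assumes whitespace tokenization and raw term frequencies, and the names `term_vector` and `cosine_similarity` are hypothetical helpers introduced only for this sketch.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Illustrative documents (made up for this sketch).
doc1 = term_vector("graph models improve document representation")
doc2 = term_vector("vector models improve document retrieval")
doc3 = term_vector("unrelated words only")

sim_12 = cosine_similarity(doc1, doc2)  # three shared terms out of five each
sim_13 = cosine_similarity(doc1, doc3)  # no shared terms
```

In this representation two documents are similar only when they share literal terms, which is the kind of limitation a richer document representation would aim to address.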
Keywords
- Information Retrieval
- Vector Model
- Collaborative Filter
- Vector Space Model
- Cosine Measure
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pagola, J.E.M., Martínez, E.G., Palancar, J.H., Díaz, A.H., León, R.H. (2005). Similarity Measures in Documents Using Association Graphs. In: Sanfeliu, A., Cortés, M.L. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2005. Lecture Notes in Computer Science, vol 3773. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11578079_77
DOI: https://doi.org/10.1007/11578079_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29850-2
Online ISBN: 978-3-540-32242-9
eBook Packages: Computer Science (R0)
Published in cooperation with
http://www.iapr.org/
