Iberoamerican Congress on Pattern Recognition

CIARP 2005: Progress in Pattern Recognition, Image Analysis and Applications, pp 741–751

Similarity Measures in Documents Using Association Graphs

  • José E. Medina Pagola,
  • Ernesto Guevara Martínez,
  • José Hernández Palancar,
  • Abdel Hechavarría Díaz &
  • Raudel Hernández León
Part of the Lecture Notes in Computer Science book series (LNIP, volume 3773)

Abstract

In this paper we present a new model, called the Association Graph, to improve document representation and facilitate the ontological dimension. We explain how to generate and use this kind of graph, and we analyze different document similarity measures based on this representation. A classical vector space model was used to evaluate the model and the measures, and to investigate their strengths and weaknesses. The proposed model gave promising results.

Keywords

  • Information Retrieval
  • Vector Model
  • Collaborative Filter
  • Vector Space Model
  • Cosine Measure

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Centro de Aplicaciones de Tecnologías de Avanzada (CENATAV), 7a # 21812 e/ 218 y 228, Rpto. Siboney, CP. 12200, Playa, C. de la Habana, Cuba

    José E. Medina Pagola, José Hernández Palancar, Abdel Hechavarría Díaz & Raudel Hernández León

  2. Instituto Superior Politécnico “José Antonio Echeverría” (ISPJAE), Ave. 114 # 11901, CP. 10390, Marianao, C. de la Habana, Cuba

    Ernesto Guevara Martínez

Editor information

Editors and Affiliations

  1. Dept. System Engineering and Automation, Universitat Politècnica de Catalunya (UPC) Barcelona, Spain

    Alberto Sanfeliu

  2. Pattern Recognition Group, ICIMAF, Havana, Cuba

    Manuel Lazo Cortés

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pagola, J.E.M., Martínez, E.G., Palancar, J.H., Díaz, A.H., León, R.H. (2005). Similarity Measures in Documents Using Association Graphs. In: Sanfeliu, A., Cortés, M.L. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2005. Lecture Notes in Computer Science, vol 3773. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11578079_77

  • DOI: https://doi.org/10.1007/11578079_77

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29850-2

  • Online ISBN: 978-3-540-32242-9

  • eBook Packages: Computer Science, Computer Science (R0)

Published in cooperation with The International Association for Pattern Recognition (http://www.iapr.org/)
