Clustering Abstracts Instead of Full Texts

  • Pavel Makagonov
  • Mikhail Alexandrov
  • Alexander Gelbukh
Conference paper

DOI: 10.1007/978-3-540-30120-2_17

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3206)
Cite this paper as:
Makagonov P., Alexandrov M., Gelbukh A. (2004) Clustering Abstracts Instead of Full Texts. In: Sojka P., Kopeček I., Pala K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg

Abstract

The accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50–100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation, and we suggest simple procedures for both. We evaluate our approach on data from two international conferences. One of our conclusions is a suggestion that digital libraries and other repositories provide document images of the full texts of papers along with their abstracts for open access via the Internet.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Pavel Makagonov (1)
  • Mikhail Alexandrov (2)
  • Alexander Gelbukh (2, 3)
  1. Mixteca University of Technology, Mexico
  2. National Polytechnic Institute, Center for Computing Research, Mexico
  3. Computer Science and Engineering Department, Chung-Ang University, Seoul, Korea
