Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm

Conference paper

DOI: 10.1007/978-81-322-0491-6_88

Volume 131 of the book series Advances in Intelligent and Soft Computing (AINSC)
Cite this paper as:
Hanumanthappa M., Prakash B.R., Mamatha M. (2012) Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm. In: Deep K., Nagar A., Pant M., Bansal J. (eds) Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011) December 20-22, 2011. Advances in Intelligent and Soft Computing, vol 131. Springer, India

Abstract

Document clustering is an effective tool for managing information overload. Grouping similar documents together lets a human observer browse large document collections quickly and grasp their distinct topics and subtopics easily. In this paper we survey the most important problems and techniques related to text information retrieval: document pre-processing and filtering, and word sense disambiguation. We then present text clustering using a Modified FPF algorithm and compare our clustering algorithms against FPF, which is the most used algorithm in the text clustering context. Finally, we introduce the problem of cluster labeling, which is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure.
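
For readers unfamiliar with the baseline, the sketch below illustrates one common reading of FPF, assuming the abbreviation refers to the furthest-point-first heuristic for the k-center problem. It is not the Modified FPF algorithm of the paper, whose details are not given in this abstract. The function names, the Euclidean distance, and the toy two-dimensional points are illustrative assumptions; a document clustering setup would more likely use term vectors with a cosine-based distance.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fpf_centers(points, k, dist=euclidean, first=0):
    """Pick k centers with the furthest-point-first heuristic.

    Returns (centers, assign): the indices of the chosen centers, and for
    each point the position in `centers` of its closest center.
    Hypothetical helper written for illustration only.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [first]
    # nearest[i] is the distance from point i to its closest chosen center.
    nearest = [dist(p, points[first]) for p in points]
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        # Reassign every point that is closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1

    return centers, assign


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
    print(fpf_centers(toy, k=3))
```

Each round costs one pass over the collection, so selecting k centers among n points takes on the order of n·k distance computations under these assumptions.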

Keywords

Clustering, Document clustering, Cluster labeling, Information retrieval

Copyright information

© Springer India Pvt. Ltd. 2012

Authors and Affiliations

  1. Department of Computer Science & Applications, Bangalore University, Bangalore, India
  2. Department of Computer Science, Siddaganga College for Women, Tumkur, India