Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011), December 20-22, 2011

Volume 131 of the series Advances in Intelligent and Soft Computing, pp. 957-966

Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm

  • M. Hanumanthappa, Department of Computer Science & Applications, Bangalore University
  • B. R. Prakash, Department of Computer Science & Applications, Bangalore University
  • M. Mamatha, Department of Computer Science, Siddaganga College for Women

Abstract

Document clustering is an effective tool for managing information overload. By grouping similar documents together, it enables a human observer to browse large document collections quickly and to grasp their distinct topics and subtopics easily. In this paper, we survey the most important problems and techniques related to text information retrieval, namely document pre-processing and filtering, and word sense disambiguation. We then present text clustering using a Modified FPF (Furthest-Point-First) algorithm and compare our clustering algorithms against FPF, the most widely used algorithm in the text clustering context. Finally, we introduce the problem of cluster labeling, which we address by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure.
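To make the pipeline named in the abstract concrete, here is a minimal Python sketch under stated assumptions. It uses the classic furthest-point-first (FPF) k-center heuristic over unit-length term-frequency vectors with cosine distance, then labels each cluster by ranking terms with standard information gain between term occurrence and cluster membership. It is not the authors' Modified FPF, nor their information-gain variant, neither of which is detailed here. The toy stopword list, the function names (`preprocess`, `tf_vector`, `fpf`, `label_clusters`, etc.), the sample documents, and the filter that keeps only terms relatively more frequent inside a cluster are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter

# Toy stopword list (assumption; real systems use a much larger list).
STOPWORDS = {"a", "an", "and", "as", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tf_vector(tokens):
    """Term-frequency vector scaled to unit length, stored as a sparse dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def cosine_distance(u, v):
    """1 - cosine similarity of two unit-length sparse vectors."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def fpf(vectors, k):
    """Classic FPF k-center; assignment[i] indexes the nearest center in `centers`."""
    n = len(vectors)
    centers = [0]  # arbitrary first center: the first document
    assignment = [0] * n
    dist = [cosine_distance(vectors[0], v) for v in vectors]
    while len(centers) < min(k, n):
        # New center: the document furthest from every current center.
        far = max(range(n), key=lambda i: dist[i])
        if dist[far] <= 1e-12:
            break  # every document already coincides with a center
        centers.append(far)
        for i, v in enumerate(vectors):
            d = cosine_distance(vectors[far], v)
            if d < dist[i]:
                dist[i] = d
                assignment[i] = len(centers) - 1
    return centers, assignment


def entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(token_sets, in_cluster, term):
    """Bits of information that 'term occurs' gives about cluster membership."""
    n = len(token_sets)
    has_term = [term in d for d in token_sets]
    conditional = 0.0
    for flag in (True, False):
        group = [c for h, c in zip(has_term, in_cluster) if h == flag]
        if group:
            conditional += len(group) / n * entropy(sum(group) / len(group))
    return entropy(sum(in_cluster) / n) - conditional


def label_clusters(token_sets, assignment, top=3):
    """Label each cluster with its `top` terms ranked by information gain."""
    vocab = set().union(*token_sets)
    labels = {}
    for c in sorted(set(assignment)):
        member = [a == c for a in assignment]
        size, rest = sum(member), max(len(member) - sum(member), 1)
        scored = []
        for t in vocab:
            inside = sum(1 for d, m in zip(token_sets, member) if m and t in d)
            outside = sum(1 for d, m in zip(token_sets, member) if not m and t in d)
            # Assumed filter: keep terms relatively more frequent inside the cluster.
            if inside / size > outside / rest:
                scored.append((information_gain(token_sets, member, t), t))
        scored.sort(key=lambda s: (-s[0], s[1]))  # highest IG first, ties alphabetical
        labels[c] = [t for _, t in scored[:top]]
    return labels


docs = [
    "Stock market shares rally as prices rise",
    "Shares and stock prices fall on the market",
    "The football team wins the league match",
    "League football match ends in a draw",
]
tokens = [preprocess(d) for d in docs]
centers, assignment = fpf([tf_vector(t) for t in tokens], k=2)
labels = label_clusters([set(t) for t in tokens], assignment)
print(assignment)  # [0, 0, 1, 1]
print(labels)  # {0: ['market', 'prices', 'shares'], 1: ['football', 'league', 'match']}
```

Each FPF iteration makes one pass that updates every document's distance to its nearest center, so the sketch performs O(nk) distance computations for n documents and k clusters. Information gain already contrasts a term's occurrence inside and outside a cluster, which is why it is a natural baseline for the intra-/inter-cluster labeling the abstract describes; the paper's own variant may weight these differently.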

Keywords

Clustering, Document clustering, Cluster labeling, Information retrieval