Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm

Abstract

Document clustering is an effective tool for managing information overload. Grouping similar documents together lets a human observer browse large document collections quickly and grasp their distinct topics and subtopics. In this paper we survey the most important problems and techniques related to text information retrieval: document pre-processing and filtering, and word sense disambiguation. We then present text clustering using a Modified FPF algorithm and compare our clustering algorithms against FPF, which is the most used algorithm in the text clustering context. Finally, we address the problem of cluster labeling: labels are obtained by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure.
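
For readers unfamiliar with the baseline, the following is a minimal Python sketch of the standard furthest-point-first heuristic for k-center clustering, assuming that is what FPF denotes here. It is not the modified variant proposed in the paper. The function name `fpf_cluster`, the Euclidean distance, and the choice of the first point as the initial center are illustrative assumptions; a text clustering setting would more likely use a distance defined over document term vectors.

```python
import math


def fpf_cluster(points, k, dist=math.dist):
    """Standard furthest-point-first heuristic (illustrative sketch only).

    Returns (centers, assign): `centers` holds the indices of the chosen
    center points, and assign[i] is the position in `centers` of the
    center closest to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]  # assumption: the first point seeds the clustering
    # nearest[i] = distance from point i to its closest chosen center
    nearest = [dist(p, points[0]) for p in points]
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(far)
        # Reassign every point that is closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign


# Hypothetical 2-D points standing in for document vectors.
centers, assign = fpf_cluster([(0, 0), (0, 1), (10, 10), (10, 11), (30, 0)], 3)
```

Each round costs one pass over the points, so selecting k centers takes O(nk) distance computations, which is one reason this heuristic is attractive as a baseline for large collections.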