Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm
- Cite this paper as:
- Hanumanthappa M., Prakash B.R., Mamatha M. (2012) Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm. In: Deep K., Nagar A., Pant M., Bansal J. (eds) Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011) December 20-22, 2011. Advances in Intelligent and Soft Computing, vol 131. Springer, India
Document clustering is an effective tool for managing information overload. Grouping similar documents together lets a human observer browse large document collections quickly and grasp their distinct topics and subtopics. In this paper we survey the most important problems and techniques related to text information retrieval: document pre-processing and filtering, and word sense disambiguation. We then present text clustering using a modified FPF algorithm and compare our clustering algorithms against FPF, which is the most widely used algorithm in the text clustering context. Finally, we introduce the problem of cluster labeling, which is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure.
Keywords: Clustering, document clustering, cluster labeling, information retrieval
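For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of FPF, assuming the acronym refers to the standard furthest-point-first heuristic for k-center clustering. It is not the authors' implementation, and it does not include their modifications, which the abstract does not describe. The function names `cosine_distance` and `fpf_cluster`, the choice of cosine distance, and the use of the first document as the initial center are all illustrative assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length term-weight vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def fpf_cluster(vectors, k, distance=cosine_distance):
    """Plain furthest-point-first clustering (baseline sketch only).

    Returns (centers, assignment): centers holds the indices of the chosen
    center documents, and assignment[i] is the position in centers of the
    center closest to document i.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")

    centers = [0]  # assumption: start from the first document
    nearest = [distance(v, vectors[0]) for v in vectors]
    assignment = [0] * len(vectors)

    while len(centers) < k:
        # The next center is the document furthest from all current centers.
        new_center = max(range(len(vectors)), key=lambda i: nearest[i])
        centers.append(new_center)
        cluster_id = len(centers) - 1
        # Reassign every document that is closer to the new center.
        for i, v in enumerate(vectors):
            d = distance(v, vectors[new_center])
            if d < nearest[i]:
                nearest[i] = d
                assignment[i] = cluster_id

    return centers, assignment


# Toy usage: four two-term documents forming two obvious groups.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers, assignment = fpf_cluster(docs, k=2)
```

Each round costs one distance computation per document, so the sketch performs on the order of n·k distance computations for n documents and k clusters.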