Frequent Term-Based Text Clustering Using Hidden Support

  • Harsha Patil
  • Ramjeevan Singh Thakur
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 34)

Abstract

As Internet use grows in every direction, the volume of digital documents is growing rapidly. Handling these documents so that any query receives an appropriate response is a challenge for current technology. Many researchers have taken up this challenge and shown interest in mining this digitized store of knowledge to find hidden information. The high dimensionality of text documents has always been a major challenge for researchers, and clustering high-dimensional text documents accurately is a further one. In this paper, we propose a method, hidden term-based document clustering (HTBDC), which uses frequent itemset-based mining. Our approach tries to trade off high dimensionality against clustering accuracy, and we obtained good results. We evaluate our method on the basis of F-score on standard datasets, and the results show that our method performs comparatively better.
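
The abstract does not state which F-score definition is used for evaluation. As a minimal sketch, assuming the overall F-measure that is common in document clustering work (for each true class, take the best F1 over all clusters, then average weighted by class size), the computation could look like the following. The function name `clustering_f_score` and its arguments are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_score(true_labels, cluster_ids):
    """Overall clustering F-score (assumed definition, not confirmed by the paper).

    For every true class, the best F1 over all clusters is taken, and the
    results are averaged with weights proportional to class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("inputs must be non-empty")

    class_sizes = Counter(true_labels)                 # n_i: documents per class
    cluster_sizes = Counter(cluster_ids)               # n_j: documents per cluster
    overlap = Counter(zip(true_labels, cluster_ids))   # n_ij: class i in cluster j

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue  # no shared documents, so F1 is zero for this pair
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Toy usage: four documents, two classes, perfectly separated clusters.
print(clustering_f_score(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
```

Under this assumed definition, a clustering that places every class in its own cluster scores 1.0, and merging unrelated classes into one cluster lowers precision and therefore the score.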

Keywords

Clustering, Text mining, F-score, Itemsets, Score function

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India
  2. Ashoka Center for Business and Computer Studies, Nashik, India
