Text Clustering Using Reference Centered Similarity Measure

  • Ch. S. Narayana
  • P. Ramesh Babu
  • M. Nagabushana Rao
  • Ch. Pramod Chaithanya
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 249)


Most clustering techniques must assume some cluster relationship among the items in the data set. Similarity between items is usually defined either explicitly or implicitly. In this paper, we introduce a novel multiple-reference-centered similarity measure and two related clustering approaches. The main difference between a traditional dissimilarity/similarity measure and ours is that the former uses a single viewpoint, the origin, whereas ours uses multiple reference points. Using several reference points, a more informative assessment of similarity can be achieved. Two criterion functions for document clustering are proposed based on this novel measure. We compare them with a well-known clustering approach based on cosine similarity and show the improvement. A performance analysis is conducted and the results are compared.
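The abstract does not state the formula for the measure, so the following is only a minimal sketch of the general idea under an assumed formulation: cosine similarity views two unit-length document vectors from the origin, while a reference-centered variant views them from each reference point in turn and averages the results. The function names, the use of difference vectors, and the plain averaging are illustrative assumptions, not the authors' definition.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine_similarity(a, b):
    """Single-viewpoint similarity: both documents are seen from the origin."""
    return dot(normalize(a), normalize(b))


def multi_reference_similarity(a, b, references):
    """Assumed multi-reference similarity (hypothetical formulation).

    For every reference point r, take the dot product of (a - r) and
    (b - r), then average over all reference points.
    """
    if not references:
        raise ValueError("at least one reference point is required")
    a, b = normalize(a), normalize(b)
    total = 0.0
    for r in references:
        r = normalize(r)
        da = [x - y for x, y in zip(a, r)]
        db = [x - y for x, y in zip(b, r)]
        total += dot(da, db)
    return total / len(references)


if __name__ == "__main__":
    doc_a, doc_b = [1.0, 2.0], [2.0, 1.0]
    refs = [[0.0, 3.0], [3.0, 0.5]]  # made-up reference documents
    print(cosine_similarity(doc_a, doc_b))
    print(multi_reference_similarity(doc_a, doc_b, refs))
```

In this sketch, passing the origin as the only reference point reduces the measure to ordinary cosine similarity, which mirrors the single-viewpoint versus multiple-reference contrast drawn in the abstract.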


Keywords: Document Clustering, Similarity Measure, Cosine Similarity, Multi View Point Similarity Measure





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ch. S. Narayana (1)
  • P. Ramesh Babu (1)
  • M. Nagabushana Rao (2)
  • Ch. Pramod Chaithanya (1)

  1. CSE Department, Malla Reddy Engineering College (Autonomous), Hyderabad, India
  2. CSE Department, Swarnandra Engineering College, Narsapur, India
