Symbolic Representation of Text Documents Using Multiple Kernel FCM

  • B. S. Harish
  • M. B. Revanasiddappa
  • S. V. Aruna Kumar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9468)


In this paper, we propose a novel method for representing text documents based on clustering of term frequency vectors. To cluster the term frequency vectors, we use Multiple Kernel Fuzzy C-Means (MKFCM). After clustering, the term frequency vectors of each cluster are used to form an interval-valued (symbolic) representation from their mean and standard deviation. These interval-valued features are stored in a knowledge base as the representative of the cluster. To corroborate the efficacy of the proposed model, we conducted extensive experiments on the standard Reuters-21578 and 20 Newsgroups datasets. We compare the classification accuracy achieved by the symbolic classifier with that of existing Naive Bayes, KNN, and SVM classifiers. The experimental results reveal that the symbolic classifier achieves higher classification accuracy than the other three classifiers.
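
The pipeline summarized in the abstract (cluster term frequency vectors, summarize each cluster as intervals, match new documents against the stored intervals) can be sketched as below. This is a simplified, non-authoritative illustration, not the authors' implementation. Assumptions: it runs kernel fuzzy c-means on a fixed, equally weighted sum of two Gaussian (RBF) kernels instead of learning the kernel weights as MKFCM does; it clusters each class's training documents separately; and it represents each cluster by the per-term interval [mean - std, mean + std]. The kernel widths `gammas`, the `weights`, `clusters_per_class`, the knowledge-base layout, and the term-counting rule in `classify` are hypothetical choices made for illustration, and the toy data is made up.

```python
import numpy as np


def rbf_kernel(X, gamma):
    # Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def combined_kernel(X, gammas, weights):
    # Assumption: a FIXED convex combination of RBF kernels. MKFCM learns
    # these weights during clustering; that update step is omitted here.
    return sum(w * rbf_kernel(X, g) for w, g in zip(weights, gammas))


def kernel_fcm(K, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    # Kernel fuzzy c-means with cluster prototypes in the kernel feature space.
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)           # each row of memberships sums to 1
    for _ in range(n_iter):
        Um = U ** m
        A = Um / Um.sum(axis=0, keepdims=True)  # prototype v_c = sum_j A[j, c] * phi(x_j)
        # squared feature-space distance ||phi(x_i) - v_c||^2 via the kernel trick
        D = (np.diag(K)[:, None] - 2.0 * K @ A
             + np.einsum("jc,jl,lc->c", A, K, A)[None, :])
        D = np.maximum(D, 1e-12)
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # standard FCM membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def interval_representatives(X, U):
    # Harden memberships, then describe each non-empty cluster by the
    # per-term interval [mean - std, mean + std].
    labels = U.argmax(axis=1)
    reps = []
    for c in np.unique(labels):
        members = X[labels == c]
        mu, sd = members.mean(axis=0), members.std(axis=0)
        reps.append((mu - sd, mu + sd))
    return reps


def build_knowledge_base(X, y, clusters_per_class, gammas, weights):
    # Hypothetical layout: cluster each class separately and store
    # (lower, upper, class_label) for every cluster representative.
    kb = []
    for label in np.unique(y):
        Xc = X[y == label]
        U = kernel_fcm(combined_kernel(Xc, gammas, weights), clusters_per_class)
        kb.extend((lo, hi, label) for lo, hi in interval_representatives(Xc, U))
    return kb


def classify(x, kb):
    # Hypothetical matching rule: count the terms of x that fall inside a
    # representative's intervals; return the label of the best-matching one.
    scores = [np.sum((x >= lo) & (x <= hi)) for lo, hi, _ in kb]
    return kb[int(np.argmax(scores))][2]


# Toy term frequency vectors over 4 terms (made-up data, two classes).
X_toy = np.array([[5, 4, 0, 1], [6, 5, 1, 0], [4, 4, 0, 0], [5, 6, 1, 1],
                  [0, 1, 5, 6], [1, 0, 6, 5], [0, 0, 4, 5], [1, 1, 5, 4]], dtype=float)
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

kb = build_knowledge_base(X_toy, y_toy, clusters_per_class=2,
                          gammas=(0.1, 0.5), weights=(0.5, 0.5))
print(classify(np.array([0.0, 1.0, 5.0, 5.0]), kb))  # prints 1
```

In this sketch, each class is clustered on its own, so every interval representative inherits a class label and classification reduces to finding the best-matching representative. Replacing the fixed kernel weights with a learned weight update, or the term-counting rule with a different interval similarity measure, would only change `combined_kernel` and `classify`.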


Keywords: Classification, Text documents, Representation, Symbolic feature, Multiple Kernel FCM



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • B. S. Harish (1)
  • M. B. Revanasiddappa (1)
  • S. V. Aruna Kumar (1)
  1. Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India
