Abstract
In this paper, we propose a novel method of representing text documents based on clustering of term frequency vectors. To cluster the term frequency vectors, we use Multiple Kernel Fuzzy C-Means (MKFCM). After clustering, the term frequency vectors of each cluster are used to form an interval-valued (symbolic) representation from their mean and standard deviation. These interval-valued features are then stored in a knowledge base as the representative of the cluster. To corroborate the efficacy of the proposed model, we conducted extensive experiments on the standard datasets Reuters-21578 and 20 Newsgroups. We compared the classification accuracy achieved by the symbolic classifier with that of the existing Naive Bayes, KNN, and SVM classifiers. The experimental results show that the symbolic classifier achieves better classification accuracy than the other three classifiers.
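As a rough illustration of the interval-valued representation described above, the following minimal sketch builds one symbolic vector per cluster from term frequency vectors. It is not the authors' implementation. It assumes that the clustering step (MKFCM in the paper) has already been run and reduced to one hard cluster label per document, and that each interval is taken as [mean - std, mean + std] per term; the abstract states only that mean and standard deviation are used. The function names and the matching rule in `classify` (count how many feature values fall inside a cluster's intervals) are hypothetical choices made for this example.

```python
from statistics import fmean, pstdev


def interval_representation(tf_vectors, labels):
    """Build a knowledge base: one list of (low, high) intervals per cluster.

    Assumption: intervals are mean - std .. mean + std for each term.
    """
    clusters = {}
    for vec, label in zip(tf_vectors, labels):
        clusters.setdefault(label, []).append(vec)

    knowledge_base = {}
    for label, members in clusters.items():
        # zip(*members) yields one column (one term) at a time.
        intervals = []
        for column in zip(*members):
            mu = fmean(column)
            sigma = pstdev(column)  # population standard deviation
            intervals.append((mu - sigma, mu + sigma))
        knowledge_base[label] = intervals
    return knowledge_base


def classify(doc_vector, knowledge_base):
    """Return the cluster whose intervals contain the most feature values.

    Assumption: a simple containment count stands in for the symbolic classifier.
    """
    def score(intervals):
        return sum(low <= x <= high for x, (low, high) in zip(doc_vector, intervals))

    return max(knowledge_base, key=lambda label: score(knowledge_base[label]))


# Toy term frequency vectors (4 documents, 3 terms) with assumed hard labels.
tf_vectors = [
    [2.0, 0.0, 1.0],
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 5.0],
    [0.0, 5.0, 7.0],
]
labels = [0, 0, 1, 1]

knowledge_base = interval_representation(tf_vectors, labels)
predicted = classify([3.0, 0.0, 1.0], knowledge_base)
```

In this toy setting, each cluster is summarized by three intervals instead of all of its member vectors, which is the compression the symbolic representation is meant to provide.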
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Harish, B.S., Revanasiddappa, M.B., Aruna Kumar, S.V. (2015). Symbolic Representation of Text Documents Using Multiple Kernel FCM. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_10
DOI: https://doi.org/10.1007/978-3-319-26832-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science (R0)