Symbolic Representation of Text Documents Using Multiple Kernel FCM

Harish, B. S.; Revanasiddappa, M. B.; Aruna Kumar, S. V.

doi:10.1007/978-3-319-26832-3_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9468)

Included in the following conference series:

International Conference on Mining Intelligence and Knowledge Exploration

Abstract

In this paper, we propose a novel method of representing text documents based on clustering of term frequency vectors. To cluster the term frequency vectors, we use Multiple Kernel Fuzzy C-Means (MKFCM). After clustering, the term frequency vectors of each cluster are used to form an interval-valued (symbolic) representation from their mean and standard deviation. The interval-valued features are then stored in a knowledge base as the representative of the cluster. To corroborate the efficacy of the proposed model, we conducted extensive experiments on standard datasets such as Reuters-21578 and 20 Newsgroups. We compared the classification accuracy achieved by the symbolic classifier with that of the existing Naive Bayes, KNN, and SVM classifiers. The experimental results reveal that the symbolic classifier achieves better classification accuracy than the other three classifiers.
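The interval-valued representation described in the abstract can be illustrated with a minimal sketch. It assumes the clustering step (MKFCM in the paper) has already grouped the term frequency vectors, so the clusters here are simply given. The interval form [mean - std, mean + std], the containment-based scoring, the function names, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def interval_representation(vectors):
    """Build one (low, high) interval per term from a cluster's vectors.

    Assumption: the interval is [mean - std, mean + std] per term.
    """
    intervals = []
    for term_values in zip(*vectors):  # iterate term by term (column-wise)
        mu = mean(term_values)
        sigma = pstdev(term_values)  # population standard deviation
        intervals.append((mu - sigma, mu + sigma))
    return intervals


def containment_score(doc, intervals):
    """Fraction of terms whose frequency lies inside the cluster's interval."""
    hits = sum(1 for value, (low, high) in zip(doc, intervals)
               if low <= value <= high)
    return hits / len(intervals)


def classify(doc, knowledge_base):
    """Return the label whose interval representation best contains the doc."""
    return max(knowledge_base,
               key=lambda label: containment_score(doc, knowledge_base[label]))


# Hypothetical toy data: three terms, two already-formed clusters.
clusters = {
    "grain": [[4, 0, 1], [6, 1, 1], [5, 0, 2]],
    "trade": [[0, 5, 3], [1, 7, 2], [0, 6, 4]],
}

# The knowledge base stores one interval vector per cluster.
knowledge_base = {label: interval_representation(vectors)
                  for label, vectors in clusters.items()}

predicted = classify([5, 0, 1], knowledge_base)
print(predicted)
```

Storing only one interval per term and cluster keeps the knowledge base small relative to the full set of training vectors, which is the usual motivation for a symbolic representation of this kind.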

Author information

Authors and Affiliations

Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India
B. S. Harish, M. B. Revanasiddappa & S. V. Aruna Kumar

Corresponding author

Correspondence to B. S. Harish.

Editor information

Editors and Affiliations

Norwegian Univ. of Science & Technology, Trondheim, Norway
Rajendra Prasath
Intl Inst of Info Tech Hyderabad, Hyderabad, India
Anil Kumar Vuppala
V.H.N.S.N.College (Autonomous), Virudhunagar, Tamil Nadu, India
T. Kathirvalavakumar

About this paper

Cite this paper

Harish, B.S., Revanasiddappa, M.B., Aruna Kumar, S.V. (2015). Symbolic Representation of Text Documents Using Multiple Kernel FCM. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_10

DOI: https://doi.org/10.1007/978-3-319-26832-3_10
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)