K-means and Wordnet Based Feature Selection Combined with Extreme Learning Machines for Text Classification

Roul, Rajendra Kumar; Sahay, Sanjay Kumar

doi:10.1007/978-3-319-28034-9_13


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9581)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology


Abstract

The rapid growth of online documents on the Web has renewed interest in text classification, whose aim is to assign text documents to a set of pre-defined categories. However, poor feature selection, an extremely high-dimensional feature space, and the complexity of natural language remain obstacles to this process. To address these issues, we propose a k-means clustering based feature selection technique for text classification. Bi-Normal Separation (BNS), combined with Wordnet and cosine similarity, helps to form a high-quality, reduced feature vector for training the Extreme Learning Machine (ELM) and Multi-layer Extreme Learning Machine (ML-ELM) classifiers. The 20-Newsgroups and DMOZ datasets are used for the experiments. The empirical results on these two benchmark datasets demonstrate the applicability, efficiency, and effectiveness of our approach using ELM and ML-ELM as the classifiers, compared with state-of-the-art classifiers.
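
The abstract names two scoring primitives, Bi-Normal Separation and cosine similarity, without giving formulas. The following is a minimal sketch of both, assuming the usual BNS definition |F⁻¹(tpr) − F⁻¹(fpr)| with F the standard normal CDF (Forman, 2003). The function names, the argument names, and the clipping constant `eps` are illustrative assumptions, not taken from the paper, and the sketch does not show how the authors combine these scores with k-means or Wordnet.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation of one term for one category.

    tp  -- positive-class documents that contain the term
    fp  -- negative-class documents that contain the term
    pos -- number of positive-class documents
    neg -- number of negative-class documents
    eps -- assumed clipping bound that keeps the inverse CDF finite
    """
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # define similarity to a zero vector as 0


if __name__ == "__main__":
    # A term seen in 80 of 100 positive documents and 10 of 100 negative ones
    # separates the classes better than one seen in 55 and 45 respectively.
    print(bns_score(80, 10, 100, 100) > bns_score(55, 45, 100, 100))
    print(cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))
```

A term that occurs equally often in both classes gets a BNS score of zero, so ranking terms by this score pushes class-neutral vocabulary to the bottom of the list.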


Notes

1. http://ai.stanford.edu/~rion/parsing/minipar_viz.html
2. https://radimrehurek.com/gensim/tutorial.html
3. The script is run iteratively over a range of values of m, and the value of m that gives the best result is selected.
4. http://qwone.com/~jason/20Newsgroups/
5. http://www.dmoz.org
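
Note 3 describes a simple parameter sweep over m. A minimal sketch of such a sweep follows; what m denotes is not defined in this excerpt, and both `select_best_m` and the `evaluate` callback are hypothetical names standing in for whatever script and quality measure the authors used.

```python
def select_best_m(candidates, evaluate):
    """Return (best_m, best_score) over the candidate values of m.

    candidates -- iterable of values of m to try
    evaluate   -- hypothetical callback mapping m to a score (higher is better)
    """
    best_m, best_score = None, float("-inf")
    for m in candidates:
        score = evaluate(m)
        if score > best_score:  # keep the first m that reaches the top score
            best_m, best_score = m, score
    return best_m, best_score


if __name__ == "__main__":
    # Toy score that peaks at m = 3, used only to exercise the loop.
    print(select_best_m(range(1, 6), lambda m: -(m - 3) ** 2))
```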


Author information

Authors and Affiliations

BITS-Pilani, K.K. Birla Goa Campus, Goa, India
Rajendra Kumar Roul & Sanjay Kumar Sahay


Corresponding author

Correspondence to Rajendra Kumar Roul.

Editor information

Editors and Affiliations

Microsoft Research, Redmond, Washington, USA
Nikolaj Bjørner
Indian Institute of Technology Delhi, New Delhi, India
Sanjiva Prasad
IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA
Laxmi Parida


Cite this paper

Roul, R.K., Sahay, S.K. (2016). K-means and Wordnet Based Feature Selection Combined with Extreme Learning Machines for Text Classification. In: Bjørner, N., Prasad, S., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2016. Lecture Notes in Computer Science, vol 9581. Springer, Cham. https://doi.org/10.1007/978-3-319-28034-9_13


DOI: https://doi.org/10.1007/978-3-319-28034-9_13
Published: 25 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28033-2
Online ISBN: 978-3-319-28034-9
eBook Packages: Computer Science, Computer Science (R0)
