Abstract
The rapid growth of digital documents and of internet users on the web increases the time an end user needs to find a document, which degrades the performance of the search engine. Text classification is therefore needed to reduce search time and to increase the efficiency of the search engine. Efficient text classification, in turn, depends on selecting good features. To address this issue, the current paper proposes an approach called Combined Correlation Discriminative Power Measure (CCDPM): the highly correlated terms (features) are first removed from the corpus, and the remaining uncorrelated features are then ranked using the scores generated by the discriminative power measure technique. The top k features are selected to generate the reduced training feature vector. An Extreme Learning Machine (ELM) is used for classification, and the empirical results on four benchmark datasets show the efficiency of the proposed approach compared with other state-of-the-art feature selection techniques. The results of ELM are also more promising than those of other conventional classifiers.
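The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the correlation threshold of 0.9, the greedy order in which correlated terms are dropped, and the DPM formula used here (a common formulation that sums, over classes, the absolute difference between the fraction of in-class and out-of-class documents containing a term). The function names, the toy matrix, and the hidden-layer size are likewise hypothetical. The ELM part follows the standard recipe of random hidden weights plus output weights solved with a pseudoinverse.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedily keep a term only if |Pearson r| with every kept term is <= threshold.
    The threshold and the greedy left-to-right order are assumptions."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns yield NaN; treat as uncorrelated
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, i]) <= threshold for i in kept):
            kept.append(j)
    return np.array(kept)

def dpm_scores(X, y):
    """Assumed DPM: sum over classes of |P(term present | class) - P(term present | other classes)|."""
    present = X > 0
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = y == c
        scores += np.abs(present[in_c].mean(axis=0) - present[~in_c].mean(axis=0))
    return scores

def select_features(X, y, k, threshold=0.9):
    """Remove correlated terms, rank the rest by DPM score, return the top-k column indices."""
    kept = drop_correlated(X, threshold)
    scores = dpm_scores(X[:, kept], y)
    order = np.argsort(-scores, kind="stable")[:k]
    return kept[order]

def elm_fit(X, y, n_hidden=20, seed=0):
    """Basic ELM: random, untrained hidden layer; output weights via Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden activations
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    beta = np.linalg.pinv(H) @ T
    return W, b, beta, classes

def elm_predict(model, X):
    W, b, beta, classes = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return classes[np.argmax(H @ beta, axis=1)]

# Toy term-document matrix: 6 documents x 4 terms, two classes.
# Term 1 is exactly twice term 0, so it is removed as highly correlated.
X = np.array([[2, 4, 0, 1],
              [1, 2, 0, 1],
              [3, 6, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 2, 1],
              [0, 0, 3, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

selected = select_features(X, y, k=2)
model = elm_fit(X[:, selected], y)
predictions = elm_predict(model, X[:, selected])
print(selected, predictions)
```

On this toy matrix, terms 0 and 2 are selected: term 1 duplicates term 0 up to scaling, and term 3 occurs equally often in both classes, so its score is zero. A real corpus would need a tuned threshold and a held-out split for evaluating the classifier.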
Notes
- 1. Decided based on the experiment so that we should not lose more terms.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Roul, R.K., Sahoo, J.K. (2018). Text Categorization Using a Novel Feature Selection Technique Combined with ELM. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-8633-5_23
DOI: https://doi.org/10.1007/978-981-10-8633-5_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8632-8
Online ISBN: 978-981-10-8633-5
eBook Packages: Engineering (R0)