Text Categorization Using a Novel Feature Selection Technique Combined with ELM

  • Conference paper
Recent Findings in Intelligent Computing Techniques

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 709))

Abstract

The rapid growth of digital documents and Internet users on the web increases the time an end user spends searching for a document, which degrades the performance of search engines. Hence, to reduce searching time and increase search-engine efficiency, text classification is a pressing need. However, efficient text classification depends equally on selecting good features. To address this issue, this paper proposes an approach called Combined Correlation Discriminative Power Measure (CCDPM). It first removes highly correlated terms (features) from the corpus. It then ranks the remaining uncorrelated features using the scores generated by the discriminative power measure technique. The top k features are selected to generate the reduced training feature vector. Extreme Learning Machine (ELM) is used for classification. Empirical results on four benchmark datasets show the efficiency of the proposed approach compared with other state-of-the-art feature selection techniques. ELM also gives more promising results than other conventional classifiers.
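The CCDPM pipeline described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation.

The abstract does not give the correlation measure, the correlation cutoff, or the exact discriminative power measure formula. The sketch therefore makes three assumptions:

  • Correlation is measured with Pearson correlation, using a hypothetical cutoff of 0.9.
  • The DPM score is a hypothetical one: the sum over classes c of |P(term present | c) - P(term present | not c)|, estimated from document frequencies.
  • The classifier follows the standard single-hidden-layer ELM formulation: random hidden weights and biases, sigmoid activation, and output weights obtained from the Moore-Penrose pseudo-inverse. The hidden-layer size and random seed are hypothetical.

The function and class names (`remove_correlated_terms`, `dpm_scores`, `ccdpm_select`, `ELM`) are illustrative only.

```python
import numpy as np


def remove_correlated_terms(X, threshold=0.9):
    """Greedily keep term columns whose |Pearson r| with every already-kept
    column is <= threshold. X is an (n_docs, n_terms) term-weight matrix.
    threshold=0.9 is a hypothetical default, not the paper's value."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns yield NaN; treat them as uncorrelated
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def dpm_scores(X, y):
    """Hypothetical DPM: for each term, sum over classes c of
    |P(term present | c) - P(term present | not c)|, estimated from
    document frequencies. y is a NumPy label array with >= 2 classes."""
    present = (X > 0).astype(float)
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = y == c
        scores += np.abs(present[in_c].mean(axis=0) - present[~in_c].mean(axis=0))
    return scores


def ccdpm_select(X, y, k, threshold=0.9):
    """Drop correlated terms, rank the survivors by DPM, return top-k column indices."""
    kept = remove_correlated_terms(X, threshold)
    scores = dpm_scores(X[:, kept], y)
    order = np.argsort(-scores, kind="stable")  # highest score first; ties keep column order
    return kept[order[:k]]


class ELM:
    """Single-hidden-layer ELM: random input weights and biases, sigmoid
    hidden layer, output weights solved as beta = pinv(H) @ T."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        z = X @ self.W + self.b
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))  # clip avoids overflow in exp

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # Moore-Penrose least-squares solution
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


if __name__ == "__main__":
    # Toy 4-document, 4-term corpus; term 1 is an exact multiple of term 0.
    X = np.array([[1, 2, 1, 1], [1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]], dtype=float)
    y = np.array([0, 0, 1, 1])
    selected = ccdpm_select(X, y, k=2)
    print(selected.tolist())  # [0, 3]
    elm = ELM(n_hidden=20, seed=0).fit(X[:, selected], y)
    print(elm.predict(X[:, selected]).tolist())
```

The greedy correlation pass always keeps the first column of any highly correlated group, so column order matters. A variant could instead keep whichever member of the group has the higher DPM score.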

Notes

  1. Determined experimentally, so that too many terms are not lost.

  2. https://www.dmoz.org/.

  3. http://qwone.com/~jason/20Newsgroups/.

  4. http://www.daviddlewis.com/resources/testcollections/reuters21578/.

  5. http://www.dataminingresearch.com/index.php/2010/09/classic3-classic4-datasets/.

Author information

Corresponding author

Correspondence to Rajendra Kumar Roul.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Roul, R.K., Sahoo, J.K. (2018). Text Categorization Using a Novel Feature Selection Technique Combined with ELM. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-8633-5_23

  • DOI: https://doi.org/10.1007/978-981-10-8633-5_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8632-8

  • Online ISBN: 978-981-10-8633-5

  • eBook Packages: Engineering (R0)