News Topic Classification Using Machine Learning Techniques

  • Conference paper
International Conference on Communication, Computing and Electronics Systems

Abstract

News topic classification is the task of assigning news articles, given as text, to one of a set of predefined classes or labels. It is one application of text classification, which is also used for spam filtering, language identification, segmenting customer feedback, and sorting technical documents. This paper studies news topic classification on AG's News Topic Classification Dataset using the following machine learning algorithms: linear support vector machine, multinomial Naive Bayes classifier, k-nearest neighbors, Rocchio, bagging, and boosting. Classification proceeds in three steps: text pre-processing, feature extraction, and applying the machine learning algorithms. The algorithms are compared using accuracy, precision, recall, and F1 score.
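The three steps above, followed by evaluation, can be sketched in plain Python. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical stop-word list and smoothed TF-IDF weighting, and it shows only one of the six algorithms, Rocchio (nearest class centroid under cosine similarity). The helper names (`preprocess`, `fit_idf`, `tfidf`, `fit_rocchio`, `predict`, `evaluate`) are illustrative. Precision, recall, and F1 are macro-averaged here, which is also an assumption because the abstract does not say which averaging was used. The training and test sentences are invented toy examples labelled with AG News's four class names (World, Sports, Business, Sci/Tech). They are not rows from the dataset, and the scores they produce say nothing about the paper's results.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; real pipelines use a fuller list.
STOP_WORDS = {"a", "an", "and", "as", "at", "for", "in", "its", "of", "on", "the", "to"}

def preprocess(text):
    """Step 1: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency for each term seen in training (an assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Step 2: L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def fit_rocchio(vectors, labels):
    """Step 3 (Rocchio): represent each class by the mean of its TF-IDF vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {label: {t: w / sizes[label] for t, w in s.items()} for label, s in sums.items()}

def predict(centroids, vec):
    """Assign the class whose centroid is most cosine-similar to the document."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

def evaluate(y_true, y_pred):
    """Step 4: accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(prec) / k,
        "recall": sum(rec) / k,
        "f1": sum(f1) / k,
    }

# Invented toy sentences using AG News's four class names; NOT real dataset rows.
TRAIN = [
    ("World", "Leaders meet at the United Nations summit on peace talks"),
    ("World", "Election results spark protests in the capital"),
    ("Sports", "Striker scores twice as the team wins the league match"),
    ("Sports", "Tennis champion wins the final set in the open tournament"),
    ("Business", "Shares rise as the bank reports quarterly profit"),
    ("Business", "Oil prices fall and markets react to profit warnings"),
    ("Sci/Tech", "New software update improves smartphone battery life"),
    ("Sci/Tech", "Researchers launch a satellite to study climate data"),
]
TEST = [
    ("World", "Peace talks resume at the summit"),
    ("Sports", "Team wins the match in the final minute"),
    ("Business", "Bank shares rise on strong profit"),
    ("Sci/Tech", "Smartphone software gets a battery update"),
]

train_docs = [preprocess(text) for _, text in TRAIN]
idf = fit_idf(train_docs)
centroids = fit_rocchio([tfidf(d, idf) for d in train_docs], [label for label, _ in TRAIN])
predictions = [predict(centroids, tfidf(preprocess(text), idf)) for _, text in TEST]
report = evaluate([label for label, _ in TEST], predictions)
print(report)
```

Because pre-processing, feature extraction, and evaluation are separate functions, swapping `fit_rocchio`/`predict` for another classifier (for example, multinomial Naive Bayes over raw term counts) leaves the rest of the pipeline unchanged. That separation is what makes a side-by-side comparison of several algorithms on the same features straightforward.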


Acknowledgements

This work was supported by Ramaiah Institute of Technology, Bangalore-560054, and Visvesvaraya Technological University, Jnana Sangama, Belagavi-590018.

Author information

Correspondence to Pramod Sunagar.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sunagar, P., Kanavalli, A., Nayak, S.S., Mahan, S.R., Prasad, S., Prasad, S. (2021). News Topic Classification Using Machine Learning Techniques. In: Bindhu, V., Tavares, J.M.R.S., Boulogeorgos, AA.A., Vuppalapati, C. (eds) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 733. Springer, Singapore. https://doi.org/10.1007/978-981-33-4909-4_35

  • DOI: https://doi.org/10.1007/978-981-33-4909-4_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-4908-7

  • Online ISBN: 978-981-33-4909-4

  • eBook Packages: Engineering, Engineering (R0)
