Abstract
News topic classification assigns news articles, available as text data, to predefined classes or labels. It is one application of text classification, which is also used for spam filtering, language recognition, segmenting customer feedback, segregating technical documents, and similar tasks. This paper studies news topic classification on AG's News Topic Classification Dataset using the following machine learning algorithms: linear support vector machine, multinomial Naive Bayesian classifier, K-Nearest Neighbor, Rocchio, bagging, and boosting. Classification proceeds in three steps: pre-processing the text, applying feature extraction techniques, and implementing the machine learning algorithms. The algorithms are compared using evaluation metrics such as Accuracy, Recall, Precision, and F1 Score.
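The three steps named in the abstract can be sketched end to end in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes TF-IDF as the feature extraction technique (the abstract does not name one), uses Rocchio (nearest centroid) because it is the most compact of the listed algorithms, and reports macro-averaged metrics. The headlines and the stop-word list are invented for the example; the four labels are the classes of AG's News.

```python
import math
import re
from collections import Counter, defaultdict

# Toy stop-word list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "after", "as"}

def preprocess(text):
    """Step 1: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def fit_idf(token_lists):
    """Step 2a: smoothed inverse document frequency for each training term."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

def tfidf(tokens, idf):
    """Step 2b: L2-normalised sparse TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def fit_rocchio(vectors, labels):
    """Step 3a: one centroid per class, the mean of that class's vectors."""
    sums, counts = defaultdict(Counter), Counter(labels)
    for vec, label in zip(vectors, labels):
        sums[label].update(vec)  # Counter.update adds the values term by term
    return {lab: {t: v / counts[lab] for t, v in s.items()} for lab, s in sums.items()}

def predict(vec, centroids):
    """Step 3b: pick the class whose centroid has the highest cosine similarity."""
    def cosine(centroid):
        dot = sum(v * centroid.get(t, 0.0) for t, v in vec.items())
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        return dot / norm if norm else 0.0
    return max(sorted(centroids), key=lambda lab: cosine(centroids[lab]))

def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for lab in sorted(set(y_true)):
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    k = len(per_class)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(c[0] for c in per_class) / k,
        "recall": sum(c[1] for c in per_class) / k,
        "f1": sum(c[2] for c in per_class) / k,
    }

# Invented headlines, two per class, standing in for the real dataset.
train = [
    ("Government leaders meet for peace talks after border conflict", "World"),
    ("President signs treaty with neighbouring government", "World"),
    ("Team wins championship match in final minute", "Sports"),
    ("Striker scores twice as team wins league match", "Sports"),
    ("Shares fall as company reports lower profit", "Business"),
    ("Bank raises interest rates and stock market shares drop", "Business"),
    ("New software update improves smartphone security", "Sci/Tech"),
    ("Researchers release software to speed up computer chips", "Sci/Tech"),
]
test = [
    ("Coach praises team after the match", "Sports"),
    ("Company profit lifts shares", "Business"),
    ("Government signs peace treaty", "World"),
    ("Software update for smartphone security", "Sci/Tech"),
]

train_tokens = [preprocess(text) for text, _ in train]
idf = fit_idf(train_tokens)
centroids = fit_rocchio([tfidf(t, idf) for t in train_tokens], [lab for _, lab in train])

predictions = [predict(tfidf(preprocess(text), idf), centroids) for text, _ in test]
metrics = evaluate([lab for _, lab in test], predictions)
print(predictions)
print(metrics)
```

The toy test headlines share vocabulary only with their own class, so the scores here say nothing about performance on the real dataset; the sketch only shows how the pre-processing, feature extraction, classification, and evaluation stages connect.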
Acknowledgements
This work was supported by Ramaiah Institute of Technology, Bangalore-560054, and Visvesvaraya Technological University, Jnana Sangama, Belagavi-590018.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sunagar, P., Kanavalli, A., Nayak, S.S., Mahan, S.R., Prasad, S., Prasad, S. (2021). News Topic Classification Using Machine Learning Techniques. In: Bindhu, V., Tavares, J.M.R.S., Boulogeorgos, AA.A., Vuppalapati, C. (eds) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 733. Springer, Singapore. https://doi.org/10.1007/978-981-33-4909-4_35
DOI: https://doi.org/10.1007/978-981-33-4909-4_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4908-7
Online ISBN: 978-981-33-4909-4
eBook Packages: Engineering, Engineering (R0)