An intelligent model based on integrated inverse document frequency and multinomial Naive Bayes for current affairs news categorisation

Kumar, Sachin; Sharma, Aditya; Reddy, B Kartheek; Sachan, Shreyas; Jain, Vaibhav; Singh, Jagvinder

doi:10.1007/s13198-021-01471-7

Original article
Published: 07 November 2021

Volume 13, pages 1341–1355, (2022)

International Journal of System Assurance Engineering and Management

Sachin Kumar¹ (ORCID: orcid.org/0000-0002-2150-4502), Aditya Sharma¹, B Kartheek Reddy¹, Shreyas Sachan¹, Vaibhav Jain¹ & Jagvinder Singh²

510 Accesses
6 Citations

Abstract

Digital technologies, and the products and services built on them, have enabled the masses to generate information at an ever faster pace. Information-sharing platforms such as news websites and social media platforms such as Facebook, Twitter, Instagram and WhatsApp have flooded the information space, because information is easy to generate and can be disseminated to the masses instantly. Information classification has long been an important task, especially in newspapers and media organisations. In other areas too, text classification plays an important role, so that vital information can be assigned to predefined categories. In journalism, editors and resource persons were traditionally given the task of recognising and classifying news stories so that they could be placed in predefined categories such as economy and business news, political news, social news, the editorial section, education and career, and sports. Nowadays, classifying and segregating textual information has become challenging because of the diverse and vast flow of information. The pace at which information arrives and is updated, the ease of access, and the competition among media houses have made it more challenging still. Hence, automated and intelligent tools that can classify text accurately and efficiently are needed to reduce human effort and time and to increase productivity. This paper presents an efficient and robust machine learning model based on Multinomial Naive Bayes (MNB) to classify current affairs news stories. The proposed Inverse Document Frequency (IDF) integrated MNB model achieves a classification accuracy of 87.22 per cent. The experimental results are also compared with other machine learning models, namely Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Random Forest (RF). The results demonstrate that the presented model is better in terms of accuracy and may be deployed in real-world information classification and media settings to improve the productivity and efficiency of the current affairs news classification process.
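
To make the idea of an IDF-integrated Multinomial Naive Bayes classifier concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the exact IDF formula, the smoothing, the preprocessing or the dataset, so the sketch assumes a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1, Laplace smoothing with alpha = 1, whitespace tokenisation, and a tiny invented corpus. All function names (`fit_idf`, `train`, `predict`, `accuracy`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation, no stemming or stop words.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def weighted_counts(doc, idf):
    # Term counts scaled by IDF; terms unseen during training are dropped.
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train(docs, labels, alpha=1.0):
    # Multinomial NB fitted on IDF-weighted counts instead of raw counts.
    idf = fit_idf(docs)
    vocab = sorted(idf)
    totals = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in weighted_counts(doc, idf).items():
            totals[label][term] += weight
    model = {"idf": idf, "log_prior": {}, "log_lik": {}}
    for label, n_label in Counter(labels).items():
        model["log_prior"][label] = math.log(n_label / len(docs))
        denom = sum(totals[label].values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            t: math.log((totals[label].get(t, 0.0) + alpha) / denom)
            for t in vocab
        }
    return model


def predict(model, doc):
    # Pick the class with the highest log prior plus weighted log likelihood.
    feats = weighted_counts(doc, model["idf"])
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_lik = model["log_lik"][label]
        scores[label] = log_prior + sum(w * log_lik[t] for t, w in feats.items())
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Invented toy corpus, for illustration only (not the paper's data).
TRAIN_DOCS = [
    "government passes new budget bill in parliament",
    "election results announced as party wins majority",
    "team wins cricket match in final over",
    "striker scores twice in football league match",
    "stock market rises as bank shares gain",
    "company reports quarterly profit and revenue growth",
]
TRAIN_LABELS = ["politics", "politics", "sports", "sports",
                "business", "business"]

model = train(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, "parliament debates election bill"))
```

Weighting counts by IDF reduces the influence of terms that appear in many documents (here, "in" or "wins") relative to terms concentrated in one category. Whether the paper applies the weighting in exactly this way cannot be determined from the abstract alone.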


Funding

No funding was received for this research work.

Author information

Authors and Affiliations

Cluster Innovation Centre, University of Delhi, Delhi, India
Sachin Kumar, Aditya Sharma, B Kartheek Reddy, Shreyas Sachan & Vaibhav Jain
Dept of Management, Delhi Technological University, Delhi, India
Jagvinder Singh

Corresponding author

Correspondence to Sachin Kumar.

Ethics declarations

Conflict of Interest

There is no conflict of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, S., Sharma, A., Reddy, B.K. et al. An intelligent model based on integrated inverse document frequency and multinomial Naive Bayes for current affairs news categorisation. Int J Syst Assur Eng Manag 13, 1341–1355 (2022). https://doi.org/10.1007/s13198-021-01471-7

Received: 06 December 2020
Revised: 20 September 2021
Accepted: 19 October 2021
Published: 07 November 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s13198-021-01471-7
