An intelligent model based on integrated inverse document frequency and multinomial Naive Bayes for current affairs news categorisation

  • Original article
International Journal of System Assurance Engineering and Management

Abstract

Digital technologies and their products and services have empowered the masses to generate information at a faster pace. Information-sharing platforms built on these technologies, such as news websites and social media platforms (Facebook, Twitter, Instagram, WhatsApp, etc.), have flooded the information space because information is easy to generate and can be disseminated to the masses instantly. Information classification has long been an important task, especially in newspapers and media organisations. In other areas too, information or text classification plays an important role, so that vital information can be sorted into predefined categories. In journalism, editors and resource persons have been assigned the task of recognising and classifying news stories so that they can be placed in predefined categories such as economy and business news, political news, social news, the editorial section, education and career, and sports. Nowadays, classifying and segregating textual information has become challenging because of the flow of vast and diverse information. The pace at which information arrives and is updated, together with access and competition among media houses, has made it more challenging still. Hence, automated and intelligent tools that can classify information and text accurately and efficiently are needed to reduce human effort and time and to increase productivity. This paper presents an efficient and robust intelligent machine learning model based on Multinomial Naive Bayes (MNB) to classify current affairs news stories. The proposed Inverse Document Frequency (IDF) integrated MNB model achieves a classification accuracy of 87.22 per cent. The experimental results are also compared with other machine learning models, namely Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Random Forest (RF). The results demonstrate that the presented model is better in terms of accuracy and may be deployed in real-world information classification and the media domain to improve the productivity and efficiency of the current affairs news classification process.
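To make the idea of an IDF-integrated MNB classifier concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how IDF is integrated, so the sketch assumes one common choice, namely weighting each term occurrence by a smoothed IDF value before accumulating per-class counts with Laplace smoothing. The function names, the smoothing constant `alpha`, the IDF formula and the six toy headlines are all illustrative assumptions, and the toy data has no connection to the dataset or the 87.22 per cent figure reported above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenisation; a real pipeline would also
    # strip punctuation and stop words (assumption: kept minimal here).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1,
    # so every term seen in training keeps a positive weight.
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def train_mnb(docs, labels, idf, alpha=1.0):
    # Accumulate IDF-weighted term mass per class, then convert it to
    # Laplace-smoothed log-likelihoods plus a log class prior.
    class_docs = Counter(labels)
    weights = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for t in tokens:
            weights[label][t] += idf[t]
    vocab = list(idf)
    model = {}
    for c, n_c in class_docs.items():
        total = sum(weights[c].values()) + alpha * len(vocab)
        log_prior = math.log(n_c / len(docs))
        log_like = {t: math.log((weights[c][t] + alpha) / total) for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def predict(model, idf, text):
    # Out-of-vocabulary tokens are ignored; known tokens contribute
    # their IDF weight times the class log-likelihood.
    tokens = [t for t in tokenize(text) if t in idf]
    scores = {
        c: log_prior + sum(idf[t] * log_like[t] for t in tokens)
        for c, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, invented purely for illustration.
train_texts = [
    "parliament passes budget bill",
    "minister announces election date",
    "team wins cricket final",
    "striker scores winning goal",
    "stock market rallies on bank earnings",
    "central bank raises interest rates",
]
train_labels = ["politics", "politics", "sports", "sports", "business", "business"]

train_docs = [tokenize(t) for t in train_texts]
idf = fit_idf(train_docs)
model = train_mnb(train_docs, train_labels, idf)
print(predict(model, idf, "bank raises rates after market rallies"))
```

A comparison against LR, SVM, KNN and RF of the kind the abstract describes would typically feed the same IDF-weighted features to each classifier and evaluate all of them on a held-out split; that step is omitted here because the abstract gives no details of the evaluation protocol.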

Funding

No funding was received for this research work.

Author information

Corresponding author

Correspondence to Sachin Kumar.

Ethics declarations

Conflict of Interest

There is no conflict of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, S., Sharma, A., Reddy, B.K. et al. An intelligent model based on integrated inverse document frequency and multinomial Naive Bayes for current affairs news categorisation. Int J Syst Assur Eng Manag 13, 1341–1355 (2022). https://doi.org/10.1007/s13198-021-01471-7

  • DOI: https://doi.org/10.1007/s13198-021-01471-7
