1 Introduction

Fake news identification has become a prominent research topic in recent years (Sengupta et al. 2021). Before the advent of digital technology, fake news was often spread through yellow journalism, which favored sensational content such as humorous stories, accidents, rumors, and crime news (Islam et al. 2020). In the digital era, spreading fake news is far simpler: owing to the characteristics of social media, a user can pass fake news on to neighbors, friends, and other contacts (Habib et al. 2019). Because nearly everyone uses social media, fake news can therefore propagate in a cyclic fashion (Singh and Sharma 2021). Moreover, the comments attached to fake news vary over time and reduce the perceived reliability of real news, and fake news spreads faster than real news (Yang et al. 2021). Fake news can have a wide range of impacts, misleading or influencing governments or whole populations (Kim and Ko November 2021). Existing techniques for detecting fake news draw on several approaches, including machine learning, linguistic, and knowledge-based techniques (Vereshchaka et al. 2020). Major social media platforms such as Twitter, YouTube, Instagram, Facebook, and WhatsApp offer news and entertainment, supported by the rising use of mobile devices and easier WiFi connectivity (Ribeiro Bezerra 2021). Social media and emerging technologies host numerous profiles through which fake news is propagated (Sharma 2021). Every technology has its extremes and limitations alongside its positive effects on society and social media (Mridha et al. 2021). Moreover, the recent literature analyzes the several advantages of fake news detection.

Online fake news has become a central concern as interest in online media, social-networking sites, and online news portals grows (Bondielli and Marcelloni 2019). However, most people are unable to spend adequate time cross-checking references and verifying the credibility of news (Zhou and Zafarani 2020; D’Ulizia 2021). Fake news detection has therefore attracted increasing attention from the research community. Many studies on fake news detection have appeared recently (Rama Krishna et al. 2021), although several concentrate only on specific categories of news, such as political news or e-commerce reviews. Consequently, their features are designed around standard datasets for the topic of interest. Such studies suffer from dataset bias and perform poorly when detecting news on other topics (Beer and Matthee 2020). It is therefore necessary to examine whether these models suit the diverse classes of news propagated on social media, by evaluating different models on diverse datasets and investigating their efficiency and performance (Ahmad et al. 2020). In addition, conventional studies of fake news detection techniques focus on either a limited number of models or a particular category of dataset (Dabbous et al. 2020a). Thus, there is a need to review fake news detection models.

Research on fake news detection requires extensive evaluation of machine learning techniques on a broad range of datasets (Kansal 2021). New methods should capture deep knowledge about the nature of fake news and the way it spreads across the world. Recent work contributes in this direction by implementing models based on novel approaches, which confirm the value of deep learning for detecting fake news (Meneses Silva et al. 2021). Among them, the “Convolutional Neural Network (CNN)” model has shown highly competitive performance compared with other existing machine learning models. In addition, “long short-term memory (LSTM)” has been used to analyze linguistic features and has shown noteworthy performance (Simko et al. 2021). In particular, different variants of CNN can be applied to fake news detection. Although deep learning algorithms deliver strong classification results, they face challenges such as lack of interpretability, the need for large training datasets, and the difficulty of finding optimal hyperparameters for every dataset and problem (Silva et al. 2021). Recent advances in bio-inspired approaches help address these deep learning constraints through optimization, and the need for advanced intelligent techniques is also growing to solve the problems that persist in existing works (Chauhan and Palivela 2021). Consequently, there is a need to study recent research on fake news identification models to help social media users obtain real news.

The main objectives of this survey on fake news detection models are as follows.

  • To prepare an in-depth survey on fake news detection models by collecting noteworthy information from recent studies along with diverse algorithms utilized for achieving it.

  • To present a chronological review of fake news detection models together with the related works, their contributions, research designs, and general findings.

  • To analyze the performance metrics, targeted applications, datasets, and simulation platforms used in existing fake news detection models, together with their research gaps and challenges.

The remainder of the paper is organized as follows. Section 2 discusses the literature survey, research designs, and general findings on fake news detection, with a chronological review. Section 3 specifies the algorithmic classification, feature extraction techniques, and datasets used in existing fake news detection models. Section 4 gives the simulation platforms and applications targeted by existing fake news detection models. Section 5 describes the architectural view of general fake news detection models and the performance measures used in state-of-the-art models. Section 6 presents the consequences of fake news, the research challenges, and the future scope of fake news detection models. Section 7 concludes the survey.

2 Literature survey, research designs, and general findings on fake news detection with a chronological review

2.1 Related works

2.1.1 Existing fake news detection model approaches

In 2019, Ko et al. (2019) used a reverse-tracking approach to determine the likelihood that articles taken from a cognitive system contained fake news; the resulting model was termed the Fake News Detection System (FNDS). The model was tested in two case studies. The first article, posted on February 9, 2017, concerned the "blacklisting of extreme right movie studios and Kim Jong Dae, a member of the National Assembly". The second, posted in September 2017, concerned "North Korea, nuclear test, earthquake, Kim Jeong Eun". The model was faster than conventional models. In 2019, Barbado et al. (2019) implemented a feature framework for detecting fake reviews in the consumer electronics domain. The work consisted of four stages, the first of which built a dataset for classifying fake reviews in the consumer electronics domain in four cities through a scraping approach. Second, a feature scheme for fake review detection was proposed that exploited the social perspective. The work also focused on selecting fake reviews in order to organize and characterize the features. The results showed that the AdaBoost classifier performed better than the other classifiers. In 2019, Shu et al. (2019) implemented FakeNewsTracker to gather social context and news pieces for creating valuable datasets; useful features were then extracted through the Social Article Fusion model, and different machine learning models were built to detect fake news. In 2020, Henrique and Ferreira (2020) implemented a model for detecting fake news in social media texts in Germanic, Latin, and Slavic languages. The detection was carried out through “support vector machines and random forest”. In 2020, Talwar et al. (2020) adopted a mixed-method approach to explore fake news sharing behavior.
This model identified six behavioral manifestations correlated with fake news sharing from qualitative data. Gender and age were taken as control variables. The model found a positive effect of religiosity and lack of time on fake news sharing. It also suggested that social media users took active corrective action with respect to sharing fake news. In 2020, Xu et al. (2020) characterized how numerous real and fake news items were shared through comments, reactions, and shares on Facebook from two perspectives, content understanding and domain reputation. The study revealed that the websites of news publishers exhibited varied domain popularity, domain ranking, registration timing, and registration behaviors. Additionally, fake news disappeared after a certain amount of time. Further, news was fed to “latent Dirichlet allocation (LDA) topic modeling” and TF-IDF for fake news detection, with document similarity computed from the word and term vectors. In 2020, Oliveira et al. (2020) implemented a “computational-stylistic analysis based on NLP”. This model used a one-class SVM to detect fake news and applied dimensionality reduction approaches such as data compaction and latent semantic analysis (LSA) to the data. In 2020, Li et al. (2020) introduced the MCNN to capture global semantics and local convolutional features, obtaining semantic information from texts for classifying fake news. The weight of sensitive words (TFW) method was used to compute the significance of words with respect to the true or fake labels. Thus, MCNN-TFW focused on extracting the weight of sensitive words and the article representation for each news item. This model achieved higher accuracy than other existing approaches on the evaluated datasets. In 2020, Vereshchaka et al. 
(2020) addressed the problem of predicting fake news by capturing the socio-cultural and textual characteristics of fake news and by analyzing and detecting fake news features. Further, data analytics was used to construct a concordance of phrase and word frequency. They built binary classifiers that extracted features through deep learning algorithms such as GRU, RNN, and LSTM. In 2020, Kaur et al. (2020) implemented a fake news identification model through a multi-level voting ensemble of 12 classifiers, where features were extracted using Hashing-Vectorizer (HV), “Count-Vectorizer (CV), and Term Frequency–Inverse Document Frequency (TF–IDF)” on three datasets. It predicted fake textual content from online social media websites. Its performance was verified in terms of lower training time, better efficiency, and the trade-off between accuracy and efficiency. From this analysis, the best classifier was selected for both higher accuracy and efficiency. In 2021, Shahbazi and Byun (2021) implemented an integrated model that combined natural language processing and blockchain with machine learning approaches for detecting fake news, and that offered better prediction on posts and on the accounts of fake users. A reinforcement learning approach was used for this process. The scheme used a decentralized blockchain framework for security, which offered a method for outlining digital content. Lastly, the learning rate of the model was predicted for detection to explore the correlation among contents. In 2021, Mehta et al. (2021) focused on a fake news classification model through “Bidirectional Encoder Representations from Transformers termed as BERT”. 
It required only nominal pre-processing and used two different versions of BERT, which showed considerable improvement on binary classification measures. The designed model showed higher reliability in multi-label classification. In 2021, Shishah (2021) implemented a fake news detection model through BERT with a joint learning scheme integrating Named Entity Recognition (NER) and Relational Features Classification (RFC). In 2021, Jiang et al. (2021) investigated the efficiency of three deep learning models and five machine learning models. The superior performance of the designed model was verified against other existing approaches. In 2021, Kumari and Ekbal (2021) implemented a multimodal fake news detection scheme with a suitable fusion of multimodal features, which leveraged information from images and text and sought to maximize the correlation between them for an efficient multimodal distributed representation. Combining text with images improved the performance of the designed model. Experiments conducted to validate the model showed that it attained superior performance to others.
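
Several of the works reviewed above (for example, Xu et al. 2020 and Kaur et al. 2020) rely on TF-IDF vectors and on document similarity computed from them. The following is a minimal sketch of that general idea in plain Python, not the implementation used in any of the cited studies. The tokenizer, the smoothed IDF formula, the function names, and the example sentences in `DOCS` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical example documents, invented for this sketch.
DOCS = [
    "earthquake reported after nuclear test",
    "nuclear test triggers earthquake",
    "new phone review praises battery",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document.

    Assumption: weight = raw term count * smoothed IDF, where
    idf = ln((1 + N) / (1 + df)) + 1. Other weighting variants exist.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this sketch, two documents that share terms receive a positive similarity, while documents with no terms in common receive zero. A real pipeline would normally add stop-word removal and pass the vectors to a classifier.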

2.1.2 Fake news detection model with existing diverse classifiers

In 2018, Jang et al. (2018) studied the problem of fake news in the 2016 US presidential election. They gathered 307,738 tweets related to 30 real and 30 fake news stories. There was thus a need to examine the evolution patterns, the producers of the source, and the root content. They used an evolution tree modeling method to examine misinformation, the transmission of news, and its management. Finally, fake and real news were identified based on their differing evolution patterns. In 2019, Altunbey and Alatas (2020) designed a two-step method to identify “fake news on social media”, which involved pre-processing, vector conversion, and classification. Pre-processing was carried out to convert the unstructured datasets into a structured format with text mining algorithms. In the first step, the texts in the news dataset were represented as vectors through the Document-term matrix and the Term-Frequency (TF) weighting method. In the second step, 23 supervised algorithms were evaluated on the structured dataset: kernel logistic regression (KLR), IBk, decision tree, bagging, sequential minimal optimization (SMO), J48, attribute selected classifier (ASC), simple cart, ordinal learning model (OLM), Ridor, multilayer perceptron (MLP), weighted instances handler wrapper (WIHW), classification via clustering (CvC), locally weighted learning (LWL), logistic model tree (LMT), randomizable filtered classifier (RFC), CV parameter selection (CVPS), stochastic gradient descent (SGD), ZeroR, decision stump, OneR, JRip, and BayesNet. In 2019, Jadhav and Thepade (2019) implemented a framework for detecting and classifying fake news messages through the Deep Structure Semantic Model (DSSM) and improved RNN classifiers. Initially, the Twitter data was pre-processed using tokenization, and TF-IDF and CV were then used to extract features. 
Further, semantic features and multi-layer projection were computed in DSSM, and the data were then forwarded to the improved RNN for classifying fake news. In 2020, Kumar et al. (2020) developed a fake news detection model based on a deep CNN (FNDNet), which learned discriminatory features automatically through several hidden layers to classify fake news. Various features were extracted at every layer to maximize detection accuracy. In 2020, Singh et al. (2020) explored Bernoulli’s Naive Bayes Classifier with the help of "Multinomial Naive Bayes with predictors as Boolean variables" for detecting fake news. This model classified the data into two classes, 1 or 0, with one label denoting unique news articles and the other denoting fake news. In 2020, Hiriyannaiah et al. (2020) discussed the adverse effects of fake news on society, particularly given the vast usage of social media. Generative adversarial networks (GANs) gave more efficient results for fake news detection in terms of validation accuracy than other existing classifiers. The model learned complex functions to reach a higher accuracy rate and addressed the problem of gradient descent in GANs. It used SeqGAN, where the REINFORCE algorithm was used in the generator to update its weights based on the output of the discriminator network. In 2020, Umer et al. (2020) proposed a hybrid neural network architecture combining LSTM and CNN for classifying news articles with stance labels, using data gathered from news articles. Initially, features were obtained from the articles using word2vec, and dimensionality reduction was carried out to obtain a reduced feature set. Finally, the hybrid CNN-LSTM was used to detect fake news and showed its effectiveness. In 2020, Agarwal et al. 
(2020) proposed a deep learning system for predicting the nature of an article given as input. They pre-processed the texts using “word embedding (GloVe)” to construct a vector space of words and establish linguistic relationships. They combined CNN and RNN to obtain benchmark results in predicting fake news. Moreover, the model reduced overfitting through a dropout layer and produced more accurate values. The designed model was shown to attain superior outcomes compared with other algorithms. In 2020, Shrivastava et al. (2020) investigated the propagation of fake news and described the dissemination of misinformation between groups under the influence of several misinformation-refuting metrics. The model considered fake news prediction in online social networks during COVID-19. They also analyzed the equilibrium and stability of the model, which helped prevent the spread of fake news. This was also verified by examining users in online social networks. Several conditions were evaluated to demonstrate the stability of the social network, and the theoretical outcomes were verified by experimental outcomes. In 2020, Mahabub (2020) proposed an Ensemble Voting Classifier for a fake news detection model that separates fake from real news. Detection was performed by considering several classifiers; the three best machine learning algorithms after cross-validation were then used in the “Ensemble Voting Classifier”. The model attained superior results compared with existing classifiers in detecting fake messages, fake profiles, etc.; finally, the results verified the superior scores of the individual classifiers. 
In 2020, Choudhary and Arora (2020) proposed a solution for detecting and classifying fake news through a linguistic model that captures the properties of content to generate language-driven features. It extracted the readability, sentiment, grammatical, and syntactic features of specific news. The model addresses the problems of handcrafted features and time consumption. A neural sequential learning model was then applied to obtain superior results in detecting fake news in terms of accuracy and time. In 2021, Ying et al. (2021a) designed an "end-to-end Multi-level Multi-modal Cross-attention Network (MMCN)". High-quality representations were generated for image regions and text words by pre-trained ResNet and BERT models, respectively. They then fused the feature embeddings of the image regions and text words to account for both diverse and duplicate modalities. A multi-level encoding network exploited the different layers of the transformer architecture to capture multi-level semantics and enhance the representation of posts. In 2021, Li et al. (2021a) proposed an automatic model that returned and added accurate outcomes to help the neural network obtain positive sample cases and thereby improve its accuracy. Initially, the model gathered data, and supervised and unsupervised tasks were then trained simultaneously through a semi-supervised deep learning network. In 2021, Sivasankari and Vadivu (2021) studied a detection and identification model for learning discriminative features from Facebook posts and tweet content through social network graphs. It often confirmed the minimization of their propagation. In 2021, Braşoveanu and Andonie (2021) improved fake news detection through semantic features. 
They proposed a semantic fake news detection model with relational features such as facts, entities, or sentiment extracted from text. They mostly considered short texts with several degrees of truth and showed that using semantic features improved accuracy. In 2021, Setiawan et al. (2021) implemented a hybrid “Support Vector Machine (SVM)” for detecting fake news, where data gathered from a standard dataset were subjected to a feature extraction phase through TF-IDF. Classification was then performed by the hybrid SVM. In 2021, Raj and Meel (2021) implemented a coupled ConvNet architecture with image-CNN and text-CNN modules for detecting fake news. Initially, the input data were pre-processed in both modules and then passed to the CNN. The coupled ConvNet architecture extended the usage of CNN and was also suitable for larger datasets. In 2021, Kaliyar et al. (2021a) designed a BERT-based deep learning technique integrating several parallel blocks of a single-layer deep CNN with different filters and kernel sizes. It could thus handle ambiguity and outperformed other existing models in terms of accuracy. It also had a strong capability for capturing long-distance and semantic dependencies in sentences. In 2021, Altunbey and Alatas (2021) proposed a model for detecting fake news through “Adaptive Salp swarm optimization with oscillating strategy inertia weight (ASSO-OSIW) using an oscillating inertia weight and nonlinear decreasing coefficient and Grey Wolf Optimizer (GWO)” techniques to find better optimal solutions when evaluating online social media content. It used flexible fitness functions to obtain superior performance. In 2021, Qureshi et al. 
(2021) introduced a source-based approach focused on the news propagation community, consisting of re-tweeters and posters, for detecting fake content in a Twitter-based real-world COVID-19 dataset. It included several features: complex network metrics were explored for identifying the news labels, and user profile features were examined. The results showed superior performance with CATBoost and RNN in detecting fake news. In 2021, Song et al. (2021) implemented a fake news detection model through a temporal propagation framework with a graph neural network that fused temporal information, content semantics, and topological structures. This model attained superior performance compared with other existing algorithms. In 2021, Saleh et al. (2021) adopted an optimized CNN model for detecting fake news, where features were extracted from the input data by N-gram and TF-IDF. They used several layers to extract low-level and high-level features. The parameters in every layer were optimized through grid search and hyperopt optimization algorithms. The model detected fake news with a high level of accuracy compared with other approaches. In 2021, Ali et al. (2021) investigated the reliability of four deep learning architectures: hybrid CNN-RNN, RNN, CNN, and multilayer perceptron (MLP). In addition, detector complexity was explored, relating the robustness of the learned model to the training loss and input sequence length. The work also focused on addressing the vulnerabilities of recent fake-news detectors. In 2021, Ni et al. (2021) proposed a fake news detection model with multi-view attention networks (MVANs) for examining online social media. The model included propagation structure attention and text semantic attention, which ensured better capture of information. 
This model ensured performance in terms of accuracy. It also offers some interpretability from both the propagation-structure and text perspectives. In 2021, Li et al. (2021b) proposed a fake news detection model based on an autoencoder, which improved performance. The internal relationships among features and hidden information were obtained by adding a self-attention layer and bidirectional GRU layers to the autoencoder, and the remainder was then reconstructed for detecting fake news. Experiments on two real-world datasets showed superior results compared with other approaches. In 2021, Verma et al. (2021) proposed a two-phase benchmark model for authenticating news on social media. They used word embedding over linguistic features: first, data pre-processing was performed and the veracity of news content was validated through linguistic features; second, the linguistic features were merged with word embedding and applied to voting classification. The designed model was compared with other existing approaches and showed superior efficiency in detecting fake news. In 2021, Ying et al. (2021b) implemented an end-to-end multi-modal topic memory network (MTMN) that incorporated a topic memory phase for an explicit characterization of the final representation. For multimodal fusion, a blended attention phase was implemented that exploited the intra-modal correlation within image regions or sentence words and also learned the inter-modal interrelation between image regions and sentence words, enhancing and complementing each feature for high-quality multimodal representations. The designed model showed better efficiency than others. In 2021, Han et al. 
(2021) implemented a two-stream network for detecting fake videos on the FaceForensics++ dataset, which can handle low-quality data. The designed model divided the input videos. Spatial-rich model filters were then used to leverage the extracted noise features in the second stream. In addition, considerable improvement was observed with both stream fusion and segmental fusion. The model achieved state-of-the-art performance compared with others. In 2020, Dong et al. (2021b) designed two-path deep semi-supervised learning with CNN for detecting fake news, in which one path was used for unsupervised learning and the other for supervised learning. The unsupervised path learned from a large amount of unlabeled data, while the supervised path focused on the limited amount of labeled data. The two paths, implemented with CNNs, were optimized jointly for semi-supervised learning. Further, a shared CNN was constructed to extract low-level features from both unlabeled and labeled data and feed them into the two paths. The experimental results verified higher efficiency in recognizing fake news with little labeled data. In 2021, Do et al. (2021) implemented a generic model that considered both social context and news content for identifying fake news. Several aspects of the news content were explored through deep and shallow representations. The deep representations were created through transformer-based systems, while the shallow representations were generated with doc2vec and word2vec models. These representations can separately or jointly address four tasks: toxicity detection, sentiment analysis, clickbait detection, and bias detection. Additionally, graph CNN and mean-field layers were exploited to capture the structural information of news articles. 
Finally, the correlation among the articles was explored by leveraging the social context information. The designed model was verified to be more efficient than others. In 2021, Caravanti et al. (2021) implemented a network-based technique through label propagation with positive and unlabeled learning, where classification was performed by transductive and one-class semi-supervised learning techniques. They considered Portuguese and English and used class balancing to achieve better balance among datasets. The designed model outperformed other algorithms such as positive and unlabeled learning and one-class learning. Superior performance was observed even on unbalanced datasets. In 2021, Kaliyar et al. (2021b) modeled a deep neural network architecture for analyzing social context and news content to obtain superior results in detecting fake news. A joint matrix-tensor factorization was used to obtain a latent representation of news articles, and a comparative analysis was conducted on three techniques: social context-based, news content-based, and a combination of both. Higher accuracy was observed than with existing approaches. In 2021, Kaliyar et al. (2021c) proposed a fake news identification model that considered the existence of echo chambers in the social network and the content of the news article. They designed an efficient deep learning algorithm with tensor factorization. The model was implemented with different numbers of filters across every dropout layer, followed by a dense layer. A deep neural network (DNN) was implemented with optimal hyperparameters for classifying the social content and news content-based information individually. The designed model was more efficient than conventional models in detecting fake news. In 2021, Saad et al. 
(2021) implemented an approach in which three different models were trained and an ensemble of all models was created through an aggregation approach to generate the final predictions. It extracted rich information from text reviews through parallel CNNs and bag-of-n-grams. Both non-textual and textual linguistic features were used for detecting fake news. In 2021, Choudhary et al. (2021) implemented a deep learning model called BerConvoNet based on BERT and CNN, which classified news text as real or fake with the lowest error. The model consisted of two major building blocks: a multi-scale feature block and a news embedding block. Experiments with batch size, kernel size, and article embedding were performed to ensure prediction quality. The experimental analysis showed superior performance over existing models on several performance measures. In 2021, Samadi et al. (2021) designed three classifiers, CNN, MLP, and single-layer perceptron (SLP), combined with pre-trained models such as RoBERTa, GPT2, BERT, and funnel transformer to obtain features from deep contextualized representations. The performance analysis on three datasets showed the efficiency of the designed model compared with conventional approaches in terms of classification accuracy. In 2021, Meel et al. (2021) implemented an intelligent CNN-based semi-supervised scheme using self-ensemble theory that considered stylometric information and leveraged the “linguistic information of annotated news articles”, while exploring hidden patterns in unlabeled data. The model achieved the highest classification accuracy in fake news recognition. It also aimed to save cost, labor, and time and to resolve inconsistencies arising during the data annotation procedure. In 2021, Esther et al. 
(2021) implemented the “Attention-based Convolutional Bidirectional Long Short-Term Memory (AC-BiLSTM)” approach to detect fake news and classify it into six classes. The input data were gathered from a standard dataset and fed to the AC-BiLSTM, which classifies the news through several layers. The model addresses fake news detection in a multi-class setting and improved the detection accuracy. In 2021, Paka et al. (2021) presented a “Cross-stitch based Semi-supervised End-to-end neural Attention Network (Cross-SEAN)” model that leverages a large amount of unlabeled data and learns from relevant external knowledge in order to generalize to emerging COVID-19 fake news.

2.1.3 Fake news detection model based on Non-English existing approaches

In 2020, Silva et al. (2020) offered a fake news detection model for Portuguese with a detailed analysis of machine learning approaches. A reference corpus of fake and true news was constructed manually. Several approaches were used to evaluate different classes of features, such as distributed, distributive, and linguistic-based features with text representations. Ensemble learning with SVM, RF, Bagging, and AdaBoost was utilized for evaluating the performance. In 2019, Zervopoulos et al. (2019) detected fake news in tweets about the Hong Kong protests, focusing on patterns in both the structure and the linguistic content of tweets. A custom filtering procedure based on hashtag co-occurrences was used. In the performance analysis, the designed model improved on conventional deep learning techniques. In 2021, Gokhan et al. (2021) used natural language processing approaches for detecting fake news in Turkish-language posts on particular topics on Twitter. Word embeddings were pre-trained on Turkish text, and word2vec and Term Frequency-Inverse Document Frequency (TF-IDF) gave superior performance in fake news detection. Social network analysis was applied for identifying fake news in data collected through the Twitter API. In 2021, Mitra et al. (2021) presented a new neural network-based approach for detecting fake videos through CNN with a classifier network including Resnet50 and Inception V3. To classify a video, the features extracted by the convolutional neural network (CNN) were fed to the subsequent classifier. The model attained lower computational requirements and higher accuracy than conventional research works. In 2021, Meesad (2021) suggested a new framework for reliable detection of fake news in the Thai language, consisting of three major modules. It is also composed of two stages: the data collection stage and the machine learning model building stage. In the data collection stage, web-crawler information retrieval was used for obtaining data from Thai online news websites. Then, the data were analyzed through natural language processing approaches to obtain suitable features from the web data. The detection was performed by LSTM, which was compared with other conventional techniques for detecting fake news.

2.1.4 Fake news detection model with existing diverse analytics approaches

In 2019, Zhang et al. (2019) offered a new FakE News Detection (FEND) system. The designed model is a two-layered method that identifies fake topics and fake events. It groups legitimate news into several clusters based on topic, so that the articles in a cluster share common topics. Then, events are extracted from the gathered articles through an event-extraction scheme. Further, they proposed and implemented a credibility metric to evaluate the authenticity of news. FEND thus detects fake news by leveraging a large database of legitimate news. In 2020, Kauffmann et al. (2020) designed a methodology for automatically analyzing reviews that transforms positive and negative user opinions into a quantitative score. The model analyzed online reviews of high-tech products on Amazon using sentiment analysis, and fake reviews were detected and removed. Brands were then rated based on consumer sentiment, and the resulting scores support detailed and appropriate decision-making.

2.1.5 Fake news detection model with existing diverse ensemble learning approaches

In 2020, Huang and Chen (2020) suggested a new fake news detection system based on deep learning algorithms. Initially, the news articles were pre-processed with several techniques, such as LIWC, text analysis, grammar analysis, and word tokenization, to obtain uni-grams and bi-grams. Further, four different models, “N-gram CNN, LIWC CNN, depth LSTM, and LSTM”, were combined to form an ensemble learning model for detecting fake news. Here, the “Self-Adaptive Harmony Search (SAHS) algorithm was used for optimizing the weights of the ensemble learning” model to obtain a higher accuracy rate. The cross-domain intractability issue was also addressed by experimenting on datasets from different domains. In 2020, Reddy et al. (2020) discussed several techniques for detecting fake news using features obtained from the text of the news alone, without its metadata. They employed an ensemble learning approach that integrates text-based vector representations and “stylometric features” for the accurate prediction of fake news. The ensemble was designed by considering voting, boosting, and bagging classifiers. An advantage of the designed model is that it requires neither the media content of the news articles nor any information about the users.
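To make the idea of a weighted ensemble concrete, the following minimal sketch combines the fake-news probabilities of several base models by a weighted average (soft voting). It is only an illustration under stated assumptions: the function names, the probabilities, and the weights are hypothetical, and the weights are fixed by hand here, whereas a search procedure such as SAHS would tune them in the work described above.

```python
from typing import List, Sequence


def weighted_soft_vote(probabilities: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model P(fake) values.

    probabilities[m][i] is model m's estimated probability that
    article i is fake; weights[m] is the weight given to model m.
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_articles = len(probabilities[0])
    combined = []
    for i in range(n_articles):
        score = sum(w * model[i] for w, model in zip(weights, probabilities))
        combined.append(score / total)  # normalise so weights need not sum to 1
    return combined


def predict_labels(combined: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Turn combined probabilities into 'fake' / 'real' labels."""
    return ["fake" if p >= threshold else "real" for p in combined]


# Hypothetical outputs of four base models on three articles.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.4],
    [0.7, 0.1, 0.7],
    [0.6, 0.3, 0.5],
]
weights = [0.4, 0.3, 0.2, 0.1]  # assumed, hand-picked weights

combined = weighted_soft_vote(model_probs, weights)
labels = predict_labels(combined)
print(combined, labels)
```

An optimizer over the ensemble weights would repeatedly call a function like `weighted_soft_vote` with candidate weights and keep the candidate that maximizes validation accuracy.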

2.1.6 Fake news detection model with existing optimization algorithms

In 2021, Sheikhi (2021) implemented a new fake news detection model using content-based features and the Extreme Gradient Boosting Tree (xgbTree) algorithm optimized by the Whale Optimization Algorithm (WOA). Initially, the data were collected from the ISOT Fake News dataset, and content-based features were extracted and the significant ones selected. The selected features were then fed to the WOA-xgbTree algorithm to separate fake news from real news. The classification outcomes revealed that the designed model attained superior performance when compared with existing algorithms.

2.1.7 Detecting fake news during the COVID-19 pandemic based on existing approaches

Wang et al. (2022) identified the factors that determine the acceptance of fake news rebuttals on Sina Weibo. Here, the elaboration likelihood model (ELM) was used to analyze the central route, the peripheral route, and rebuttal acceptance. The results indicated both negative and positive effects of the examined factors. Zheng et al. (2022) analyzed Internet users' responses to health-related online fake news (HOFN) during the coronavirus (COVID-19) pandemic (Nistor and Zadobrischi 2022) with the help of the protective action decision model (PADM). The data were investigated using a multi-level linear model. Gupta et al. (2022) investigated news from across the world. The datasets were acquired from Twitter using keywords supplied to a Web crawler. The resulting datasets were then explored through word clouds covering the period of the COVID-19 pandemic (Ncube and Mare 2022).

2.2 Chronological review

The chronological review summarizes the number of contributions made so far in the field of fake news detection using deep learning and machine learning-based algorithms. It is depicted in Fig. 1 in terms of the share of contributions per year. From the graphical representation, 1.5% of the reviewed contributions are from 2018, 9.2% from 2019, 30.7% from 2020, and 58.4% from 2021. This rising trend may encourage researchers to develop further innovative techniques in the coming years.

Fig. 1 A chronological review of the existing fake news detection models

2.3 Research designs and general findings

There is no common definition of fake news. Fake news can spread across the world and can arise in any field, such as COVID-19, politics, e-commerce, and marketing. So, there is a need to analyze fake news in order to identify the real news in any particular field. For example, some writers, publishers, and vendors post non-authentic online comments, or have third parties pose as real customers, and thereby spread fake news on online social media to increase product sales. Similarly, a vast number of social media users can broadcast fake news based on their opinions. In the tourism field, users may post tweets about a destination based on imagination, without ever having stayed there (Das et al. 2021), which may lead to the loss of genuine consumers. In general, fake news is considered a major threat to freedom of expression, journalism, and democracy. It also has political impacts, and it can be generated and amplified through the comments, reactions, and shares posted on Facebook, WhatsApp, Instagram, and common websites (Brenes Peralta et al. 2021; Chang 2021). More specifically, fake news detection can be divided into four perspectives: source-based, propagation-based, style-based, and knowledge-based approaches. Finally, recent advances in deep learning have been utilized for detecting fake news on online social media platforms. Compared with traditional machine learning approaches, deep learning offers superior accuracy, the capability of extracting high-dimensional features, and a lighter dependence on data pre-processing. Moreover, the broader “availability of data and programming schemes has increased the robustness and utilization of deep learning-based algorithms”. Thus, in the past years, various research articles have proposed fake news detection models based on deep learning techniques.

3 Algorithmic classification, feature extraction techniques, and dataset used in the existing fake news detection models

3.1 Dataset

In this section, the datasets utilized in existing studies for evaluating model performance are listed in Table 1. These studies use benchmark datasets for both training and testing. A major problem in detecting fake news is the lack of large-scale benchmark datasets with ground-truth labels. For example, some of the datasets, such as PolitiFact, LIAR, and Weibo, are constructed only from political statements. The Twitter dataset consists of social media posts, whereas the FNC-1 dataset is built from news articles. Moreover, datasets vary in size, labels, and modalities. In addition, most of the studies use self-collected data from news articles or social media platforms.

3.2 NLP techniques used in fake news detection

Natural language processing (NLP) is a field of machine learning concerned with the ability of a computer to learn, analyze, manipulate, and possibly generate human language. It includes several tasks, such as pre-processing, word embedding, and feature extraction. Many fake news detection models use data pre-processing as the initial step, which handles obscure attributes, missing words, binarization of attributes, and attributes with complicated structures. Various visualization techniques are also useful during pre-processing. Data pre-processing helps save space and computational time and reduces the noise in the data. Secondly, word vectorization maps text or words to vectors. Bag of words and TF-IDF are often utilized in machine learning frameworks for detecting fake news. More recently, fake news identification models have employed pre-trained word-embedding models like word2vec and GloVe, owing to their ability to exploit large training corpora. Some of the NLP approaches and word vector models employed in deep learning-based fake news detection models are reviewed in Table 2.
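As a small illustration of the word vectorization step, the sketch below computes TF-IDF weights for a toy corpus using only the Python standard library. It assumes the basic unsmoothed variant, with term frequency taken as the relative count within a document and inverse document frequency as log(N / df); library implementations typically add smoothing and normalization. The tokenizer and the example headlines are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dictionary per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        if not tokens:  # an empty document has no weights
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy headlines.
documents = [
    "Breaking: miracle cure found",
    "Officials confirm cure trial results",
    "Miracle weight loss trick",
]
vectors = tf_idf(documents)
for doc, vec in zip(documents, vectors):
    print(doc, "->", {term: round(w, 3) for term, w in vec.items()})
```

Terms that occur in many documents (here "cure" and "miracle") receive lower weights than terms specific to one document, which is the property that makes TF-IDF useful as classifier input.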

Table 1 The description of publicly available datasets used in conventional fake news detection models
Table 2 Benefits and limitations of the word vector models
Table 3 Benefits and limitations of some machine learning-based fake news detection models

Analyzing a huge number of variables requires a large amount of memory and computational power. Furthermore, classifiers trained on so many variables tend to overfit the training samples and perform poorly on new samples. Feature extraction is a procedure for constructing combinations of variables that overcome these complications while still describing the data with sufficient accuracy. Stylometric features (Reddy et al. 2020) have also been utilized for analyzing social media content. A few fake news detection models use social context features (Shu et al. 2019) alongside the news content. N-gram (Vereshchaka et al. 2020; Agarwal et al. 2020; Saleh et al. 2021; Kaliyar et al. 2021b) features are generated from the words and characters of the content with several n-gram orders. Finally, the N-gram vectors are combined into one feature vector for each news item. Linguistic feature extraction (Verma et al. Aug. 2021) is used for analyzing fake news and covers several feature classes, such as quantity features, user credibility, stylistic features, psycho-linguistic features, and readability index. Word embedding (Kaliyar et al. 2021a; Choudhary et al. 2021; Trueman et al. 2021; Kumari and Ekbal December 2021) generates the word vectors for the downstream tasks. However, it is costly to construct word vectors from scratch for a large vocabulary on a large-scale dataset. Thus, this review describes these tasks with their features and challenges to help future research works. The contribution of NLP tasks is depicted in Fig. 2.

Fig. 2 Natural language processing tasks utilized in existing fake news detection models
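The N-gram feature extraction described in this section can be sketched as follows. This is a minimal illustration, not the procedure of any surveyed work: the function names, the chosen n-gram orders, and the `w2:` / `c3:` prefixes used to keep word and character n-grams apart are assumptions made for the example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Contiguous word sequences of length n."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Contiguous character sequences of length n (spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_feature_vector(text, word_orders=(1, 2), char_orders=(3,)):
    """Merge the counts of several n-gram orders into one sparse feature vector."""
    features = Counter()
    for n in word_orders:
        features.update(f"w{n}:{gram}" for gram in word_ngrams(text, n))
    for n in char_orders:
        features.update(f"c{n}:{gram}" for gram in char_ngrams(text, n))
    return features


print(word_ngrams("Fake news spreads fast", 2))
print(char_ngrams("news", 3))
print(ngram_feature_vector("fake news fake news").most_common(5))
```

Each news item is thus represented by a single sparse vector of n-gram counts, which can be passed to a classifier directly or re-weighted first, for example with TF-IDF.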

3.3 Algorithmic classification

Machine learning is widely used in fake news detection models and is divided into two categories, namely supervised and unsupervised learning. Unsupervised learning extracts useful feature information from unlabeled data, which makes training data much easier to obtain. However, the detection efficiency of unsupervised learning approaches is often inferior to that of supervised learning approaches. Supervised learning depends on the information in labeled data, with classification being the most common task, though labeling of data is often “time consuming and expensive”. Consequently, the lack of sufficient labeled data is a major challenge for supervised learning. Deep learning is a recent research paradigm often utilized for identification models because of “recent achievements of these techniques in complex natural language processing tasks”, and fake news detection can likewise be performed by deep learning algorithms. The common algorithms used for fake news identification models are categorized in Fig. 3.

Fig. 3 Algorithmic classification of the existing fake news detection model

The shallow models, also known as traditional machine learning models, include supervised and unsupervised learning algorithms. Unsupervised learning includes k-means (Zhang et al. 2019), whereas supervised learning consists of techniques like evolution tree analysis (Jang et al. July 2018), SVM (Faustini and Covões November 2020; Kauffmann et al. October 2020; Oliveira et al. 2020), Hybrid SVM (Setiawan et al. 2021), Bernoulli’s naive Bayes (Singh et al. 2020), LDA (Reddy et al. 2020), and the voting classifier (Verma et al. 2021).

Deep learning models comprise several deep networks, in which the unsupervised learning algorithms are GAN (Srinidhi Hiriyannaiah et al. 2020) and Autoencoder (Li et al. 2021b), whereas the supervised learning (Li et al. 2021a; Souza et al. 2021) algorithms are AdaBoost (Barbado et al. July 2019), WOA-Xgbtree (Sheikhi 2021), RNN (Shu et al. 2019; Agarwal et al. 2020), LSTM (Umer et al. 2020; Braşoveanu and Andonie 2021; Meesad 2021), GRU (Vereshchaka et al. 2020), DSSM-RNN (Jadhav and Thepade 2019), AC-BiLSTM (Trueman et al. 2021), Ensemble (Ozbay and Alatas February 2020; Huang and Chen November 2020; Silva et al. May 2020; Reddy et al. 2020; Jiang et al. 2021; Javed et al. 2021; Kumari and Ekbal December 2021), ensemble voting (Mahabub 2020; Qureshi et al. 2021), multi-level voting ensemble (Kaur et al. 2020), CNN (Kaliyar et al. June 2020; Umer et al. 2020; Agarwal et al. 2020; Kaliyar et al. 2021a; Saleh et al. 2021; Huu Do et al. 2021; Mitra et al. 2021; Samadi et al. 2021; Meel and Vishwakarma September 2021; Dong et al. Dec. 2020), MCNN (Li et al. 2020), C-LSTM (Zervopoulos et al. 2019), Coupled ConvNet (Raj and Meel 2021), BerConvoNet (Choudhary et al. 2021), MMCN (Ying et al. 2021a), MVAN (Ni et al. 2021), NN (Choudhary and Arora 2020; Jain et al. 2021), Graph neural network (Song et al. 2021), MTMN (Ying et al. 2021b), DNN (Ali et al. 2021; Kaliyar et al. 2021c), Lyapunov function (Shrivastava et al. Oct. 2020), social network graph (Sivasankari and Vadivu 2021; Taskin et al. 2021), Xception (Han et al. July 2021), XGBoost (Kaliyar et al. 2021b), and Cross-SEAN (Paka et al. 2021), while reinforcement learning (Shahbazi and Byun 2021) includes BERT (Mehta et al. 2021; Shishah 2021; Kaliyar et al. 2021a). These algorithms learn features directly from the original data, such as texts and images, without the need for manual feature engineering, and they can be executed in an end-to-end manner. Compared with the shallow models, deep learning methods differ considerably on large datasets in terms of interpretability, learning capacity, feature representation, number of parameters, and running time. In addition, some miscellaneous techniques are used for detecting fake news, namely the reverse-tracking approach (Ko et al. June 2019), the honeycomb framework (Talwar et al. 2020), and ASSO-OSIW and GWO (Ozbay and Alatas 2021). The most used techniques, with some advantages and challenges, are listed in Table 3.

Table 4 Performance measures considered for evaluating the efficiency of existing fake news identification models

4 Architectural view of fake news detection models and performance measures used in traditional fake news detection models

4.1 Architectural view of the fake news detection model

The overall procedure of the fake news detection model is given in Fig. 4.

Fig. 4 The architecture of the general fake news detection model with deep learning-based models

The major goal of a fake news detection model is to ensure the authenticity of news by differentiating fake news from real news. Initially, the data are gathered from social-networking sites like Hike, WhatsApp, Instagram, Twitter, and Facebook, or from online news articles. As the gathered data require special attention, pre-processing is needed before machine learning or deep learning algorithms can be applied. Various methods have been suggested in recent years to pre-process the text and make it ready for further processing. Secondly, NLP tasks like word embedding and feature extraction are conducted to obtain the most relevant information from the data, which reduces time and computational complexity. Finally, the features are fed to the classification model, which labels each item as fake or real with the help of neural network-based algorithms. The predicted outcomes then identify the real news, and the aim is to achieve high detection accuracy.
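The three stages of this general procedure (pre-processing, feature extraction, classification) can be sketched end to end in a few lines. The example below is only a toy illustration under clearly stated assumptions: it substitutes a multinomial naive Bayes classifier with add-one smoothing for the neural network-based classifier of the general architecture, uses bag-of-words counts as features, and relies on a hypothetical stop-word list and four made-up training headlines.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "this", "you", "will", "not", "is", "of"}


def preprocess(text):
    """Stage 1: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayesDetector:
    """Stages 2 and 3: bag-of-words counts fed to multinomial naive Bayes."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in preprocess(text) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / n_docs)  # log prior
            for t in tokens:
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log(
                    (self.word_counts[label][t] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled headlines.
train_texts = [
    "Shocking miracle cure doctors hate",
    "You will not believe this shocking secret",
    "Government publishes annual budget report",
    "City council approves new budget",
]
train_labels = ["fake", "fake", "real", "real"]

detector = NaiveBayesDetector().fit(train_texts, train_labels)
print(detector.predict("Shocking secret cure revealed"))
print(detector.predict("Council approves annual report"))
```

In a deep learning-based model, the bag-of-words step would typically be replaced by word embeddings and the classifier by a network such as a CNN or LSTM, while the overall flow of data stays the same.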

4.2 Performance measures

To evaluate the efficiency of the existing fake news identification models, several performance metrics have been suggested, which are listed in Table 4. The most commonly used measures are discussed here.

F1 score: “harmonic mean between precision and recall. It is used as a statistical measure to rate performance”.

$$F1score = \frac{{2T^{p} }}{{2T^{p} + F^{p} + F^{n} }}$$
(1)

Precision: It is the ratio of correctly predicted positive observations to the total number of observations predicted as positive.

$$Pes = \frac{{T^{p} }}{{T^{p} + F^{p} }}$$
(2)

Accuracy: It is the ratio of correctly predicted observations to the total number of observations.

$$Ac = \frac{{\left( {T^{p} + T^{n} } \right)}}{{\left( {T^{p} + T^{n} + F^{p} + F^{n} } \right)}}$$
(3)

Recall: It is the ratio of true positive results to the total number of actual positive observations, that is, true positives plus false negatives.

$${\text{Re}} = \frac{{T^{p} }}{{T^{p} + F^{n} }}$$
(4)

Here, “\(T^{p}\),\(T^{n}\),\(F^{p}\),\(F^{n}\) refer to the true positives, true negatives, false positives, and false negatives”, respectively.
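For completeness, the count-based form of the F1 score in Eq. (1) follows from its definition as the harmonic mean of precision and recall. Using the symbols of Eqs. (2) and (4), a short derivation is:

$$F1score = \frac{2 \cdot Pes \cdot {\text{Re}}}{Pes + {\text{Re}}} = \frac{2 \cdot \frac{T^{p}}{T^{p} + F^{p}} \cdot \frac{T^{p}}{T^{p} + F^{n}}}{\frac{T^{p}}{T^{p} + F^{p}} + \frac{T^{p}}{T^{p} + F^{n}}}$$

Multiplying the numerator and the denominator by \(\left( T^{p} + F^{p} \right)\left( T^{p} + F^{n} \right)\) and cancelling one factor of \(T^{p}\) (assuming \(T^{p} > 0\)) gives

$$F1score = \frac{2\left( T^{p} \right)^{2}}{T^{p}\left( T^{p} + F^{n} \right) + T^{p}\left( T^{p} + F^{p} \right)} = \frac{2T^{p}}{2T^{p} + F^{p} + F^{n}}$$

which agrees with Eq. (1).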

A confusion matrix is “a summary of prediction results on a classification problem. The number of correct and incorrect predictions are summarized with count values and broken down by each class”.
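The measures in Eqs. (1) to (4) can be computed directly from the four confusion-matrix counts, as in the following minimal sketch. Treating "fake" as the positive class, the function names, and the example labels are assumptions made for illustration; returning 0.0 when a denominator is zero is likewise a convention chosen here.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count true positives, true negatives, false positives, and false negatives."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1 score as in Eqs. (1)-(4)."""
    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical ground truth and predictions for eight news items.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
y_pred = ["fake", "fake", "real", "real", "fake", "real", "real", "fake"]

counts = confusion_counts(y_true, y_pred)
scores = evaluate(*counts)
print(counts)   # (tp, tn, fp, fn)
print(scores)
```

For this toy example, the counts are three true positives, three true negatives, one false positive, and one false negative, so all four measures equal 0.75.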

4.3 Applications considered in identifying the fake news

Fake news can propagate in any field, such as health, education, democracy, politics, and COVID-19, where it can negatively affect individuals and society. The application areas addressed by recent fake news detection models are represented in Fig. 5. From the analysis, most of the articles consider politics as their major focus. The remaining studies focus on news articles or social media comments regarding tourism, culture, COVID-19, marketing, and e-commerce topics. This overview can help future researchers identify less-explored fields and gather information about them, opening new directions for research on fake news detection.

Fig. 5 Applications focused on the existing fake news detection model

5 Consequences of fake news and Research challenges, future scope on fake news detection models

5.1 Consequences of fake news

Fake news has existed since the start of human civilization. However, its propagation has been amplified by modern technologies and the global media landscape. Fake news affects several fields, including economic, political, and social environments (Aslam et al. 2021), and fake information takes many forms. Its impact is considerable because information shapes people's view of the world: critical decisions are made on the basis of information, and fabricated, distorted, or false information on the Internet leads to wrong decisions. The major impacts of fake news concern democracy, finance, health, and innocent people.

Democratic Impact: Fake news has been widely discussed in the media because of its major role in elections. It is also considered an essential democratic problem. Hence, there is a need to detect fake news and stop it from spreading.

Financial Impact: Fake news has recently become a complicated issue in business and industry. To increase their own profits, dishonest businesspeople may propagate fake reviews or news. Consequently, fake information can ruin the reputation of a business.

Impact on Health: Health-related news is widely searched on the Internet, so people’s lives can be affected by the emergence of fake health news. Thus, this is one of the noteworthy problems in recent times. Consequently, social media platforms have introduced policy changes to ban or limit the spread of misinformation in health-related articles, as it affects health advocates, lawmakers, and doctors.

Impact on Innocent People: Specific individuals can be targeted by rumors and harassed on social media. They may also face threats and insults that result in real-life consequences.

5.1.1 Research gaps and future works

This review supports society in countering the propagation of fake news by raising awareness of fake news and its impact on social media (Alsaeedi and Al-Sarem 2020). The major aim of detecting fake news is the betterment of society. The existing models use several deep learning approaches like LSTM and NNs, and NN-based training has improved the identification of misleading news (Savyan and Bhanu 2020). Nevertheless, the existing studies on fake news identification have several limitations. To eradicate fake news from social media platforms at scale, fake news can be recognized among real news by detecting fake news subjects and creators (Kapusta and Obonya 2020). However, addressing the fake news detection problem remains complex.

A major issue is that fake news is inherently multimodal and multilingual: it comprises information in an array of languages and in auditory, visual, or textual forms, and it often involves communication in a language that may be unfamiliar to users (Hakak et al. 2021). Although deep learning-based approaches offer a superior accuracy rate compared with other algorithms, making them more acceptable remains a future research perspective (Gravanis et al. 2019). Fake news detection is also affected by the choice of feature extraction and classifier algorithms, so research studies must consider which classification technique is most applicable to specific features (Al-Ahmad et al. 2021). Moreover, sequence models must process long textual features. Hence, more attention is needed in choosing features and classifiers to enhance performance. Since deep learning models can still produce inaccurate outcomes, albeit with lower probability, there is a need to adopt intelligent approaches to detecting (Ambati 2021) fake news.

A future direction is to extend and improve conventional research studies toward designing an automated system for e-commerce websites, where identification of fake news has become considerably significant (Faustini and Covões 2020). Research works on fake news detection rely only on supervised models, for which the text alone is not sufficient in all cases. This problem can be addressed by adding further information, such as information about the authors. A promising technique would be a knowledge-based automatic fake news detection model (Jwa et al. 2019). Such a model would extract information from the text, check it against the information in the dataset, and alert users when the news is considered fake. Based on this framework, consumers can become aware of untrusted information (Mouratidis et al. 2021). A major problem of the existing techniques is identifying misinformation and health-related fake news, and thus, detecting health-related fake news is a further future scope.

6 Conclusion

This paper has offered a detailed review of fake news detection models based on a set of contributions from recent years. The survey has presented different machine learning and deep learning techniques. Additionally, it has given details regarding datasets, simulation environments, algorithms, and their features and challenges. Furthermore, it has covered the performance metrics used for evaluation, and the research gaps and challenges for developing a new fake news detection model. Therefore, this survey can motivate future researchers to focus on novel fake news identification models with intelligent approaches. It will help researchers gain a concise, better perspective of conventional problems, solutions, and future directions in detecting fake news.