To benchmark our method, we have considered several classification approaches. These methods are based on the news article’s textual content, user-context, and user-based relations.
Dataset
Experiments have been conducted to validate the performance of our proposed model using two real-world fake news datasets, BuzzFeed and PolitiFact, from FakeNewsNet (see Footnote 3). The number of news articles and users in each dataset is tabulated in Table 2. We have taken 145 news articles for training and 37 for testing the model (80:20 ratio). We have also validated the performance of our model using 37 fake news articles. The useful information in the dataset is as follows:
- News content: contains the attributes news-id, URL, title, text, authors, and source of news.
- News–user engagement: records how many times a news article has been shared by a user on social media.
- User–user engagement: contains the relationships between the users.
Table 2 FakeNewsNet dataset
Feature extraction and hyperparameter setting
Feature extraction
In this research, we have considered content, context, and user-community-based features for fake news classification. The Sklearn library is used to construct the feature matrices. The dimensions of all the matrices (used as input features) are shown in Table 3.
We have extracted 81 communities (featured as Echo chamber) using the Clauset–Newman–Moore algorithm [41].
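As an illustration of the community-extraction step, the following minimal sketch applies greedy modularity maximization (the Clauset–Newman–Moore algorithm, available in NetworkX as `greedy_modularity_communities`) to a toy user–user graph. The edge list, the user names, and the `user_to_community` mapping are hypothetical and are not taken from FakeNewsNet; the sketch should be read under these assumptions rather than as the implementation used in this work.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical user-user relationships (not from the FakeNewsNet data).
edges = [
    ("u1", "u2"), ("u2", "u3"), ("u1", "u3"),  # first tightly connected group
    ("u4", "u5"), ("u5", "u6"), ("u4", "u6"),  # second tightly connected group
    ("u3", "u4"),                              # single bridge between the groups
]

graph = nx.Graph()
graph.add_edges_from(edges)

# Greedy modularity maximization (Clauset-Newman-Moore).
communities = [set(c) for c in greedy_modularity_communities(graph)]

# Map each user to a community index, usable as an echo-chamber feature.
user_to_community = {
    user: idx for idx, members in enumerate(communities) for user in members
}
```

On a real user–user network, the number of detected communities depends on the graph itself; the two triangles above are chosen only so that the grouping is easy to inspect.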
Table 3 Dimensionality of feature matrices
Hyperparameter setting
Hyperparameters [52,53,54] are the variables of a learning algorithm that are fixed before training rather than learned from the data, and they govern the training and testing of any classification model. There are two main approaches to selecting and optimizing context-specific hyperparameters: manual selection and automatic selection. The choice between them is typically a trade-off, since automatic selection incurs a high computational cost. In our approach (for more details refer to Table 4), we have set the values of the hyperparameters before training and optimizing the weights and biases.
Table 4 Hyperparameters for EchoFakeD
Performance parameters
To validate the performance of our proposed model, different performance parameters have been considered as evaluation metrics: precision, recall, \(F_1\)-score, accuracy, and the confusion matrix.
Confusion matrix
The actual and predicted labels of the samples can be summarized in a confusion matrix. For binary classification, the confusion matrix is shown in Table 5.
Table 5 Representation of confusion matrix
Precision and recall
Recall is defined as:
$$\begin{aligned} {\text {Recall}}=\frac{{\text {TP}}}{{\text {TP}}+{\text {FN}}} \end{aligned}$$
(20)
Precision is defined as:
$$\begin{aligned} {\text {Precision}} = \frac{{\text {TP}}}{{\text {TP}}+{\text {FP}}} \end{aligned}$$
(21)
\(F_1\)-score
The \(F_1\)-score is the harmonic mean of precision and recall:
$$\begin{aligned} F_1=\frac{2*({\text {Precision}}* {\text {Recall}})}{({\text {Precision}}+{\text {Recall}})} \end{aligned}$$
(22)
Accuracy
Accuracy is defined as:
$$\begin{aligned} {\text {Accuracy}}=\frac{{\text {TP}}+{\text {TN}}}{{\text {TP}}+{\text {TN}}+{\text {FP}}+{\text {FN}}} \times 100 \end{aligned}$$
(23)
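Here, TP (true positives) is the number of correctly identified instances, FP (false positives) is the number of incorrectly identified instances, TN (true negatives) is the number of correctly rejected instances, and FN (false negatives) is the number of incorrectly rejected instances.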
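The definitions in Eqs. (20)–(23) can be checked with a short sketch that computes all four metrics from the entries of a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and are not values from our tables; following Eq. (23), accuracy is reported as a percentage, while precision, recall, and \(F_1\) are reported as fractions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1 (fractions) and accuracy (percent)
    from the four entries of a binary confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

The guards against zero denominators are a design choice of this sketch: they return 0.0 when a metric is undefined (for example, when no instance is predicted positive) instead of raising an error.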
Experiments
To classify news articles using the combination of news content and social context-based information, a tensor factorization-based method has been deployed. The classification tasks were performed in the following order:
- EchoFakeD (our proposed deep neural network) with news content: For this experiment, the input feature matrix is the count matrix N.
- EchoFakeD with social context: For this experiment, social context-based features are used. The matrix \((X_1)\) obtained after mode-1 matricization is used as the input feature.
- EchoFakeD with news content and social context: For this experiment, we have used both news content and social context-based features for classification. Our proposed model has given state-of-the-art results with this combination of features.
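The mode-1 matricization used in the second experiment can be sketched as follows. The sketch assumes a 3-mode tensor whose first mode indexes news articles and follows the common convention in which the column index of \(X_1\) runs fastest over the second mode; the tensor sizes, the variable names, and the helper `unfold_mode1` are hypothetical.

```python
import numpy as np

# Hypothetical sizes: news articles x users x communities.
n_news, n_users, n_communities = 4, 3, 2
tensor = np.arange(n_news * n_users * n_communities, dtype=float).reshape(
    n_news, n_users, n_communities
)


def unfold_mode1(x):
    """Mode-1 matricization of a 3-mode tensor.

    Rows index mode 1; column j + k * J holds entry (i, j, k),
    where J is the size of mode 2.
    """
    return x.reshape(x.shape[0], -1, order="F")


X1 = unfold_mode1(tensor)  # shape: (n_news, n_users * n_communities)
```

Each row of `X1` then collects every social-context entry of one news article, which is the form a row-per-article classifier input requires.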
Experimental results
Experiments have been conducted using our proposed deep learning classifier (EchoFakeD) with different learning paradigms. Fig. 5 shows the architecture of our proposed deep neural network. The classification results demonstrate that feature selection and the choice of classification model play an important role in the detection of fake news. In this research, a real-world fake news dataset (FakeNewsNet) has been used for classification.
Further experiments have been conducted with our proposed deep neural network using both the content and the social context of news articles. Tables 8 and 11 show that the combination of features gives more accurate results when a deep neural network is employed. The confusion matrices for the deep learning approaches are shown in Tables 6, 7, 8, 9, 10 and 11. The elements of the confusion matrices give the number of correct and incorrect classifications. Our proposed model performed better than existing benchmarks that employ tensor factorization methods with deep learning.
Table 6 Confusion matrix for news content-based classification with EchoFakeD (BuzzFeed)
Table 7 Confusion matrix for social context-based classification with EchoFakeD (BuzzFeed)
Table 8 Confusion matrix for news content + social context-based classification with EchoFakeD (BuzzFeed)
Table 9 Confusion matrix for news content-based classification with EchoFakeD (PolitiFact)
Table 10 Confusion matrix for social context-based classification with EchoFakeD (PolitiFact)
Table 11 Confusion matrix using content and context-based features with EchoFakeD (PolitiFact)
To compare the performance of our proposed model with the existing methods, several performance parameters have been considered: precision, recall, \(F_1\)-score, false-positive rate, false-negative rate, and accuracy. Complete classification results (using the BuzzFeed and PolitiFact datasets) are tabulated in Tables 12 and 13. In Table 13, the results using different combinations (news content, social context, and content + context) are presented with our proposed approach. Using content-based and social context-based features separately, our proposed model achieved accuracies of 86.84% and 89.19%, respectively. Combining social-context and news-content features, our proposed model achieved a marginal improvement over the baseline methods with an accuracy of 92.30%. These results support the effectiveness of social context-based features for fake news classification.
Table 12 Performance of our proposed model with BuzzFeed
Table 13 Performance of our proposed model with PolitiFact
In this research, considering the performance of all classifiers, we found that our proposed deep architecture achieved a validation accuracy of 92.30% using the PolitiFact dataset. From Figs. 6 and 7, we can observe that with our proposed deep neural network, the validation accuracy is high and the cross-entropy loss is low on both real-world fake news datasets. Our proposed model achieved an accuracy of 91.80% using the BuzzFeed dataset (refer to Fig. 7). To further validate the performance of our model, two more performance parameters have been included: false-positive rate (FPR) and false-negative rate (FNR). The false-positive rate is 9.52% and the false-negative rate is 13.64% with our proposed model using the BuzzFeed dataset (refer to Table 14 for more details). Using the PolitiFact dataset, the two error rates are 13.04% and 9.52% (refer to Table 15 for the FPR and FNR values). These results encourage the use of EchoFakeD in future research on fake news classification.
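For reference, the two error rates can be computed from the confusion-matrix entries using their standard definitions, FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below expresses both as percentages; the function name `error_rates` and the counts are hypothetical and do not reproduce the values in our tables.

```python
def error_rates(tp, fp, tn, fn):
    """Return (FPR, FNR) in percent from binary confusion-matrix counts."""
    # FPR: share of actual negatives that were wrongly flagged as positive.
    fpr = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of actual positives that were missed.
    fnr = 100.0 * fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical counts, chosen only to illustrate the definitions.
fpr, fnr = error_rates(tp=45, fp=5, tn=35, fn=15)
```

Note that FNR equals one minus recall, so a low false-negative rate and a high recall express the same property of the classifier.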
Table 14 False-positive rate (FPR) and false-negative rate (FNR) using BuzzFeed
Table 15 False-positive rate (FPR) and false-negative rate (FNR) using PolitiFact
Comparison with existing classification methods
Tables 16 and 17 compare existing classification benchmarks with our proposed model (EchoFakeD). Table 16 shows the classification results with the BuzzFeed dataset and Table 17 shows the classification results with the PolitiFact dataset. Our proposed deep neural network has shown the highest accuracy among all existing benchmarks. The false-positive rate and the false-negative rate are also lower with our proposed model. Existing studies have primarily focused on news content-based analysis. In contrast, we have investigated the problem of fake news using not only content-based attributes but also the relationship between news articles and users on social media, which takes our approach one step beyond the existing ones. In our approach, we have investigated the problem of fake news with an efficient deep neural network using the feature vectors obtained from the coupled matrix–tensor factorization method applied to a 3-mode tensor. In this method, a tensor is created using the social context of news articles together with the existing communities in the network. This method improved the performance of fake news classification compared to the existing methods. These results further motivated us to use our deep neural network rather than existing traditional methods.
Table 16 Comparison with existing benchmarks with BuzzFeed
Table 17 Comparison with existing benchmarks using PolitiFact
Discussion
In Fig. 8, an example of fake news is shown. In this paper, we have performed extensive feature set-based studies for the classification of fake news. News content-based methods primarily focus on extracting different features from fake news articles, including both content-based (B) and style-based features. Style-based methods mainly focus on the writing style of the manipulators and creators (A) of fake news. It is evident that content-based methodologies alone are not sufficient for efficient fake news detection; we also need to investigate fake news articles with social context-based methods. Social context-based methods deal with the relationships among users, news articles, and related publishers, and they are efficient in recognizing fake news articles. Social context (C) provides valuable information about user interactions with both fake news and real news. On any social media platform, a user is typically connected to a specific group of people who share the same mindset or preferences; such a group is called a user community (D). These user communities can be an essential factor for fake news classification due to their common perception about sharing articles. Therefore, we have designed an effective deep neural network combining (\(B+C+D\)) the content-level features of news articles with users’ social engagement (echo-chamber infused) to achieve significant results.
Subsequently, the tensor factorization-based approach has been used with content as well as context-based information. In this paper, a user’s engagement with the news articles is captured and fused with user-community interaction to form a 3-mode tensor (content, social context, and user-community information). This tensor is capable of handling multi-relational data and provides a higher-dimensional generalization of matrices. Tensor factorization decomposes the higher-order tensor into low-rank tensors. The resulting low-rank tensors capture the complex relations between the objects represented by the modes of the tensor. Therefore, in this research, a coupled matrix–tensor factorization method is used to obtain a latent representation of the news articles’ content and context. In the coupled matrix–tensor factorization method (which builds on the CP decomposition), we have used the standard factorization method to decompose the matrix. To validate the classification performance, the proposed deep learning model (EchoFakeD) is employed with both content and context-based information. Our model outperformed existing and appropriate baselines for fake news detection and achieved an accuracy of 92.30%.
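To make the factorization step in the discussion concrete, the sketch below fits a plain CP decomposition of a single 3-mode tensor with alternating least squares (ALS) in NumPy. It is a simplified stand-in: it does not include the coupling with the news-content matrix that the coupled matrix–tensor factorization adds, and the function names (`khatri_rao`, `cp_als`), the tensor sizes, the rank, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, rank)


def cp_als(x, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-mode tensor via ALS.

    Returns factor matrices (a, b, c) such that
    x[i, j, k] is approximated by sum_r a[i, r] * b[j, r] * c[k, r].
    """
    rng = np.random.default_rng(seed)
    i, j, k = x.shape
    a = rng.standard_normal((i, rank))
    b = rng.standard_normal((j, rank))
    c = rng.standard_normal((k, rank))
    # Row-major unfoldings along each mode, matching khatri_rao's row order.
    x1 = x.reshape(i, j * k)
    x2 = np.moveaxis(x, 1, 0).reshape(j, i * k)
    x3 = np.moveaxis(x, 2, 0).reshape(k, i * j)
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor,
        # holding the other two fixed.
        a = x1 @ khatri_rao(b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = x2 @ khatri_rao(a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = x3 @ khatri_rao(a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
    return a, b, c


# Synthetic rank-2 tensor built from known factors (hypothetical data).
rng = np.random.default_rng(42)
true_a, true_b, true_c = (rng.standard_normal((n, 2)) for n in (5, 4, 3))
tensor = np.einsum("ir,jr,kr->ijk", true_a, true_b, true_c)

a, b, c = cp_als(tensor, rank=2)
approx = np.einsum("ir,jr,kr->ijk", a, b, c)
rel_error = np.linalg.norm(tensor - approx) / np.linalg.norm(tensor)
```

In this setting, the rows of the first factor matrix `a` would play the role of latent news-article representations that can be passed to a classifier. A coupled variant would additionally require the same factor to reconstruct the content matrix, which this sketch omits.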