1 Introduction

Misinformation has been a recurring feature of human history, and the rampant circulation of fake news is its most recent form. The spread of fake news through online media is dangerous [1, 2]: its effects include casualties [3], psychological harm [4, 5], character assassination [5], manipulated elections [6], and civil unrest [7]. During the pandemic, fake news concerning COVID-19 spread massively and caused widespread misunderstanding in national and global communities. Detecting this misinformation on the Internet is a crucial and challenging task, since humans have struggled against this phenomenon for a long time.

Our research concerns detecting fake news associated with COVID-19 using natural language processing approaches and the Constraint @ AAAI2021-COVID19 fake news detection in English dataset [8]. The training and test data were released by the Constraint shared task organizers [9], whose goal is to fight COVID-19-related fake news on social media platforms such as Facebook, Twitter, and Instagram, and on popular news websites. The dataset contains 10,700 social media posts labeled as real or fake; all text is written in English. Several previous studies have contributed to this shared task with various methods: Azhan et al. [10] applied pre-trained ULMFiT, Kakwani et al. [11] compiled the IndicGLUE benchmark, and Baris et al. [12] proposed a BERT-based modeling framework. Considering the difficulty of the task and the number of studies using this dataset, we decided to join the COVID-19 fake news detection shared task [8] with a different method.

To contribute to the COVID-19 fake news detection shared task, we utilized easy data augmentation (EDA) [13] and graph neural networks [14,15,16,17,18]. Data augmentation techniques have been employed in image processing, visual recognition, and text classification because new data can be generated with simple and fast transformations [19]. Augmentation aims to increase the number of training samples and thereby reduce model overfitting [20, 21]. Graph neural networks (GNNs) are neural models that capture relationships in graphs via message passing between nodes. GNNs have achieved strong performance in many research tasks such as protein interface prediction, molecular fingerprint learning, physics system modeling, and disease analysis, and they have also been used for graph classification, link prediction, and node classification [22]. Some studies report that GNNs can perform well even with a small number of training samples [23, 24]. Variants of GNNs such as the graph recurrent network (GRN), graph attention network (GAT), and graph convolutional network (GCN) have achieved notable results on various deep learning tasks [25].

Our contributions are as follows:

  1. For the input of the graph models, we created edges among nodes by combining word co-occurrence across the entire corpus (word-word edges) and word occurrence in documents (document-word edges). The term frequency-inverse document frequency (TF-IDF) of a word within a document is used as the weight of the corresponding document-word edge. We tested three graph models, trained on datasets with and without text augmentation: the graph convolutional network, the graph attention network, and GraphSAGE (SAmple and aggreGatE). For augmentation, we utilized easy data augmentation [random deletion (RD), random insertion (RI), random swap (RS), and synonym replacement (SR)]. GraphSAGE with data augmentation gave the highest precision and F1-score in our experiments.

  2. To demonstrate the robustness and stability of its precision, we trained our proposed method on various dataset sizes (30, 50, 80, and 100%) of the Constraint @ AAAI2021-COVID19 fake news detection shared task dataset.

As a study combining graph neural networks and text augmentation, our proposed approach improves precision even when the training dataset is limited, and the precision it attains varies only slightly across all the dataset sizes we tested. The rest of the paper is organized as follows. Section 2 reviews related work on fake news detection, text augmentation, and graph neural networks. Section 3 describes our proposed method (the dataset, preprocessing, the main model, and each graph neural network). Section 4 explains the experiment and the task. Section 5 presents the results, and Sect. 6 concludes the paper.

2 Related work

2.1 Fake news detection

The current state of misleading-information prevention and fake news detection research shows that most studies base their detection on the textual level, which is well supported by natural language processing (NLP) tools [26]. Some fake news detection studies use machine learning [27,28,29], and others use deep learning [30, 31]. Generally, these techniques can be classified into news content-based and social context-based approaches. News content-based methods deal with the writing styles of published news articles, extracting textual cues of misleading information such as word order and style. Social context-based approaches deal with the latent connections between writers and news articles; social engagements can serve as a valuable feature for detecting fake news, capturing the semantic connection between writers and their articles [32]. Many datasets have been utilized in fake news detection research, such as PolitiFact [33, 34], the Kaggle fake news dataset [27, 35], the fake news challenge (FNC-1) [36, 37], and Constraint @ AAAI2021-COVID19 fake news detection [9,10,11,12]. Ahmad et al. [27] used machine learning ensemble methods consisting of random forest (RF), support vector machine, and logistic regression for fake news detection. Monti et al. [30] proposed a method that works on graph-structured data; they claimed their approach belongs to a novel class of deep learning techniques that exploit geometric deep learning to learn fake-news-specific propagation patterns. Konkobo et al. [33] created a model to distill the expressions and opinions in users' comments; their work evaluated users' credibility and built a small network around a given piece of news using the CredRank algorithm.

2.2 Text augmentation

Augmentation has helped numerous classification tasks. Data are added and modified to increase the number of samples and to compensate for incomplete sample availability. Feeding more diverse patterns to the model during training improves generalization, so the system makes better predictions on new examples. Many websites now offer automatic paraphrasing, but the most reliable way to augment sentences is to employ language experts or native speakers, even though this is costly. Textual data augmentation can also rely on language rules written by linguists, but this remains challenging because of limited human resources and low-resource languages. Many studies have addressed augmentation for text classification, for example Word2vec-based augmentation [38], WordNet-based augmentation [13], and semantic-enriched representations [39].

In this paper, we used easy data augmentation techniques for boosting performance on text classification tasks [13] in our experiments. EDA uses WordNet to obtain the synonyms needed for random insertion and synonym replacement, together with 300-dimensional GloVe word embeddings. First, it induces some noise that helps prevent overfitting by producing augmented data similar to the original data. Second, when the synonym replacement and random insertion operations are active, it introduces new words, allowing models to encounter vocabulary in the test set that was not in the training set. EDA provides four operations for text augmentation: (1) random deletion (RD): remove each word in the text with probability p; (2) random insertion (RI): find a random synonym of a random non-stopword in the text and insert it at a random position, repeated n times; (3) random swap (RS): randomly choose two words in the text and swap their positions, repeated n times; (4) synonym replacement (SR): randomly pick n non-stopwords from the text and replace each with a randomly chosen synonym. An example EDA result is shown in Table 1, and a minimal sketch of the four operations is given after the table.

Table 1 Example sentences produced by easy data augmentation
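To make the four operations concrete, the following is a minimal Python sketch of EDA-style augmentation. It assumes NLTK's WordNet corpus is available; the function names and the tiny stopword list are illustrative, not the original EDA implementation.

    import random
    from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

    STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}  # illustrative subset

    def get_synonyms(word):
        # Collect WordNet lemma names that differ from the original word
        syns = {l.name().replace("_", " ") for s in wordnet.synsets(word) for l in s.lemmas()}
        syns.discard(word)
        return list(syns)

    def synonym_replacement(words, n):
        new = words[:]
        candidates = [w for w in set(words) if w not in STOPWORDS and get_synonyms(w)]
        for w in random.sample(candidates, min(n, len(candidates))):
            new = [random.choice(get_synonyms(w)) if x == w else x for x in new]
        return new

    def random_insertion(words, n):
        new = words[:]
        for _ in range(n):
            candidates = [w for w in new if w not in STOPWORDS and get_synonyms(w)]
            if candidates:
                syn = random.choice(get_synonyms(random.choice(candidates)))
                new.insert(random.randrange(len(new) + 1), syn)
        return new

    def random_swap(words, n):
        new = words[:]
        for _ in range(n):
            i, j = random.randrange(len(new)), random.randrange(len(new))
            new[i], new[j] = new[j], new[i]
        return new

    def random_deletion(words, p):
        kept = [w for w in words if random.random() > p]
        return kept if kept else [random.choice(words)]

Each function mirrors one row of Table 1; in EDA the count n is typically derived from the parameter α and the sentence length, as discussed in Sect. 3.3.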

2.3 Graph neural network

Recently, graph classification has become a significant task and has achieved excellent performance in many sophisticated applications. Researchers from various organizations have developed effective graph neural networks, relying on graph embeddings that preserve global structural knowledge based on relational structure, and the use of graph embeddings and graph neural networks has grown into a broad field of research [40, 41] that keeps expanding. For example, Zhang et al. [42] built a subgraph convolutional network (SCN) by applying regular convolution operations to subgraphs computed with quantum walks. Zhang et al. [43] built a SortPooling layer based on the Weisfeiler-Lehman (WL) algorithm to capture vertex information both globally and locally, resulting in the deep graph convolutional neural network (DGCNN) model. For text classification, several recent studies have investigated graph neural networks [15, 44,45,46,47]; however, they either treat a text or a sentence as a graph of word nodes [45,46,47] or build the graph from citation relations between articles, which are not routinely available [15].

Our work is based on the following methods:

GCN Gori et al. [24] and Scarselli et al. [48] introduced graph neural networks as a form of recurrent neural network operating on graphs. Later, Li et al. (2016) incorporated modern recurrent neural network training practices into the original graph neural network framework. Duvenaud et al. [49] devised a convolution-like propagation rule on graphs for graph-level analysis. Kipf et al. [15] developed convolutional neural networks based on spectral graph theory, an idea first introduced by Bruna et al. [50] and then extended by Defferrard et al. [45] with fast localized convolutions. Finally, Yao et al. [14] built a single text graph based on word co-occurrence in the corpus and applied a GCN to text classification.

GAT The graph attention network can be viewed as a particular instance of MoNet [30]. It also shares a neural network across edges, which is reminiscent of the formulation of relational networks [51] and VAIN [52], wherein relations between agents or objects are aggregated pair-wise by applying a shared mechanism. As an attention model, the approach is also related to Denil et al. [54] and Duan et al. [53], which employ a neighborhood attention operation. Finally, Veličković et al. [16] introduced the recent neural network architecture operating on graph-structured data known as the graph attention network (GAT).

GraphSAGE This approach has been applied successfully for a wide range of purposes. GraphSAGE (SAmple and aggreGatE) is conceptually related to node embedding approaches [55,56,57,58,59], supervised learning over graphs [23, 24], and graph convolutional networks [45, 49, 50]. In contrast to transductive node embedding approaches, GraphSAGE [17] trains a model that leverages node feature information to produce embeddings for unseen nodes. In the supervised setting, GraphSAGE generates useful representations for individual nodes, and with respect to graph convolutional networks, it can be seen as an extension of the GCN framework to the inductive setting.

3 Proposed method

3.1 Dataset statistics

The shared task uses the Constraint @ AAAI2021-COVID19 fake news detection in English dataset [8], which contains 10,700 human-annotated media articles and posts acquired from multiple platforms. It is divided into training (6420 rows), validation (2140 rows), and test (2140 rows) sets. The training set contains 30,046 unique words, the longest post has 1481 tokens, and the distribution of Real and Fake labels is balanced. Each record contains the post ID, the tweet, and the label (Table 2).

Table 2 Data distribution for constraint @ AAAI2021—COVID19 fake news detection

Table 3 shows example posts containing URLs, mentions, retweets, hashtags, HTML special entities, and numbers.

Table 3 Examples of fake and real posts

3.2 Data preprocessing

First, we performed tweet and text preprocessing to remove punctuation marks that are useless for text classification, keeping the symbols '@' and '#' because they carry specific semantics in tweets. Second, we lowercased the text and replaced URLs, mentions, and emojis with unique tokens. Third, we used the Python emoji library to replace each emoji with a short textual description such as :redheart: or :thumbsup:. Finally, we converted hashtags into plain words ("#COVID" → "COVID").
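A minimal sketch of this preprocessing pipeline is shown below. The regular expressions and the placeholder tokens <url> and <mention> are illustrative assumptions; only the demojize call comes from the emoji package mentioned above.

    import re
    import emoji  # pip install emoji

    def preprocess_tweet(text):
        text = text.lower()
        text = re.sub(r"https?://\S+|www\.\S+", " <url> ", text)   # replace URLs with a unique token
        text = re.sub(r"@\w+", " <mention> ", text)                 # replace mentions with a unique token
        text = emoji.demojize(text)                                  # emoji -> :short_textual_description:
        text = re.sub(r"#(\w+)", r"\1", text)                        # "#covid" -> "covid"
        text = re.sub(r"[^\w<>:#@\s]", " ", text)                    # drop other punctuation, keep @, #, tokens
        return re.sub(r"\s+", " ", text).strip()

    print(preprocess_tweet("Check https://t.co/xyz @WHO #COVID \U0001F600"))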

3.3 Augmentation

We use easy data augmentation, which consists of four simple yet robust operations: random deletion (RD), random insertion (RI), random swap (RS), and synonym replacement (SR) [13]. In our case, generating more than four augmented sentences per original sentence was not helpful, since models already tend to generalize properly when a large number of samples is available. The EDA paper [13] recommends the usage parameters listed in Table 4.

Table 4 Recommended usage parameters

The alpha (α) parameter roughly means the "percentage of words in a sentence changed by each augmentation," and Naug is the number of augmented sentences generated by EDA per original sentence. Since our training set contains more than 5000 rows, we used α = 0.1 and Naug = 4 to produce one additional row each for SR, RI, RS, and RD.
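As a reference for how α is applied, the EDA reference implementation scales the per-operation counts to the sentence length; the snippet below sketches that convention (our assumption of the exact rule, complementing the operation sketch in Sect. 2.2) for α = 0.1 and Naug = 4.

    alpha, n_aug = 0.1, 4                       # parameters used in this work; each post yields n_aug new posts

    def eda_counts(sentence, alpha=alpha):
        words = sentence.split()
        n = max(1, int(alpha * len(words)))     # words touched by SR, RI, and RS
        p_rd = alpha                            # deletion probability for RD
        return n, p_rd

    n, p_rd = eda_counts("coronavirus cases in india rise sharply amid new lockdown rules")
    print(n, p_rd)                              # a 10-word post -> n = 1, p_rd = 0.1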

3.4 Graph convolutional networks (GCN)

A graph convolutional network (GCN) is a multilayer neural network that operates directly on a graph and induces node embedding vectors from the properties of the nodes' neighborhoods. The underlying graph is \(G=(V,E)\), where \(V\) (with \(|V|=N\)) is the set of nodes and \(E\) is the set of edges. For nodes \({v}_{i}\in V\) and edges \(({v}_{i},{v}_{j})\in E\), the graph has an adjacency matrix \(A\in {\mathbb{R}}^{N\times N}\) (binary or weighted) and a degree matrix \({D}_{ii}=\sum_{j}{A}_{ij}\).

Figure 1 shows each document node connected to many word nodes by edges, and each word node connected to many document nodes. Document nodes are labeled with the prefix "O," and the remaining nodes are word nodes. Document-word edges are drawn as bold black edges, and word-word edges as magenta edges. X is the matrix of node feature vectors (the words in a sentence), and R(X) is the representation (embedding) of X. Dotted lines indicate document classes (two classes are shown in different colors).

Fig. 1

Diagram of text GCN

As shown in Fig. 1, the large, heterogeneous text graph of Text GCN [14] contains both document nodes and word nodes, which allows global word co-occurrence to be modeled explicitly and graph convolution to be adapted easily. The number of nodes in the text graph, |V|, is the sum of the number of unique words in the corpus (vocabulary size) and the number of documents (corpus size). Each word or document is given a one-hot vector as input, forming the feature matrix X of the graph convolutional network. Edges are created from word co-occurrence across the whole corpus (word-word edges) and word occurrence in documents (document-word edges), and the weight of an edge between a document node and a word node is the term frequency-inverse document frequency (TF-IDF) of that word in that document.
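A minimal sketch of how such a text graph could be assembled is shown below. It uses scikit-learn's TfidfVectorizer for document-word edge weights and a simple sliding-window co-occurrence count for word-word edges (Text GCN itself uses PMI for the latter), so the details approximate the pipeline rather than reproduce it.

    from collections import Counter
    from itertools import combinations
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["covid cases rise in india", "new vaccine claim is fake news"]  # toy corpus

    # Document-word edges weighted by TF-IDF
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(docs)                      # shape: (num_docs, vocab_size)
    num_docs, vocab_size = tfidf.shape                   # node ids: 0..num_docs-1 = docs, then words

    src, dst, weight = [], [], []
    rows, cols = tfidf.nonzero()
    for d, w in zip(rows, cols):
        src.append(d)                                    # document node
        dst.append(num_docs + w)                         # word node
        weight.append(tfidf[d, w])

    # Word-word edges from co-occurrence inside a sliding window (Text GCN weights these by PMI)
    window = 3
    cooc = Counter()
    for doc in docs:
        ids = [vec.vocabulary_[t] for t in doc.split() if t in vec.vocabulary_]
        for i in range(len(ids)):
            for a, b in combinations(ids[i:i + window], 2):
                if a != b:
                    cooc[(a, b)] += 1
    for (a, b), c in cooc.items():
        src.append(num_docs + a)
        dst.append(num_docs + b)
        weight.append(float(c))

    print(len(src), "edges over", num_docs + vocab_size, "nodes")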

The graph convolution layer used in the GCN is defined by dgl.ai [18] as follows:

$$h_{i}^{{\left( {l + 1} \right)}} = \sigma \left( {b^{\left( l \right)} + \mathop \sum \limits_{j \in {\mathscr{N}}\left( i \right)} \frac{1}{{c_{ji} }}h_{j}^{\left( l \right)} W^{\left( l \right)} } \right)$$

where \({\mathscr{N}}(i)\) is the set of neighbors of node i and \(c_{ji}\) is the product of the square roots of the node degrees, i.e., \(c_{ji}=\sqrt{|{\mathscr{N}}(j)|}\sqrt{|{\mathscr{N}}(i)|}\). \(\sigma (\cdot )\) is an activation function, such as ReLU(·) = max(0, ·), applied to the updated node features. \({h}_{j}^{(l)}\) is the current input feature and \({h}_{i}^{(l+1)}\) is the corresponding output feature; \({H}^{(l)}\in {\mathbb{R}}^{N\times M}\) is the matrix of activations in the lth layer, with \({H}^{(0)}=X\). The constant \(c_{ji}\) implements the normalizer, of which there are three types: 'right,' 'none,' and 'both.' With 'right,' the aggregated messages are divided by each node's in-degree, which is equivalent to averaging the received messages; with 'none,' no normalization is applied; in our case we use 'both.' \({b}^{\left(l\right)}\) is a learnable bias added to the output, and \({W}^{(l)}\) is a layer-specific trainable weight matrix. If a weight tensor is provided on each edge, the weighted graph convolution is defined as:

$${h}_{i}^{(l+1)}=\sigma ({b}^{\left(l\right)}+\sum_{j\in {\mathscr{N}}(i)}\frac{{e}_{ji}}{{c}_{ji}}{h}_{j}^{(l)}{W}^{(l)})$$

where \({e}_{ji}\) is the scalar weight on the edge from node j to node i. Note that this formulation is not equivalent to the weighted graph convolutional network proposed in the original GCN paper.
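To illustrate the update rule, the following NumPy sketch computes one weighted graph convolution layer with the 'both' normalization described above; it is an illustration of the formula, not the DGL implementation used in our experiments.

    import numpy as np

    def gcn_layer(A, H, W, b, activation=lambda x: np.maximum(0, x)):
        # One weighted GCN layer: h_i = sigma(b + sum_j (e_ji / c_ji) h_j W), 'both' normalization
        n_neigh = (A > 0).sum(axis=1)                               # |N(i)|, the number of neighbors
        c = np.sqrt(np.outer(n_neigh, n_neigh))                     # c_ji = sqrt(|N(j)|) * sqrt(|N(i)|)
        msg = np.divide(A, c, out=np.zeros_like(A), where=c > 0)    # e_ji / c_ji for every edge
        return activation(msg @ H @ W + b)                          # aggregate neighbors, project, add bias

    # Toy example: 3 nodes, 2 input features, 2 output features
    A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])        # adjacency (edge weights e_ji)
    H = np.random.rand(3, 2)                                        # H^(0) = X
    W = np.random.rand(2, 2)
    b = np.zeros(2)
    print(gcn_layer(A, H, W, b).shape)                              # (3, 2)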

3.5 Graph attention networks

The input to our GAT layer is a set of node features, \(h=\left\{{\overrightarrow{h}}_{1},{\overrightarrow{h}}_{2},\dots ,{\overrightarrow{h}}_{N}\right\}, {\overrightarrow{h}}_{i}\in {\mathbb{R}}^{F}\), where N is the number of nodes and F is the number of features per node. We used the same method to transform features into nodes as shown in Fig. 1 in the word-document graph box. The output of the layer is a new set of node features, \({h}^{\prime}=\left\{{\overrightarrow{h}}_{1}^{\prime},{\overrightarrow{h}}_{2}^{\prime},\dots ,{\overrightarrow{h}}_{N}^{\prime}\right\}, {\overrightarrow{h}}_{i}^{\prime}\in {\mathbb{R}}^{{F}^{\prime}}\). According to dgl.ai [18], the value of each edge is obtained with the following equation:

$${e}_{ij}^{l}=LeakyReLU\left({\overrightarrow{a}}^{T}\left[W{\overrightarrow{h}}_{i} \parallel W{\overrightarrow{h}}_{j}\right]\right)$$

W is the weight matrix of a shared linear transformation, \(W\in {\mathbb{R}}^{{F}^{\prime}\times F}\), applied to every node. \(\overrightarrow{a}\) is a shared attentional mechanism that performs self-attention on the nodes, \(\overrightarrow{a}: {\mathbb{R}}^{{F}^{\prime}}\times {\mathbb{R}}^{{F}^{\prime}}\to {\mathbb{R}}\). \({e}_{ij}^{l}\) are the edge values used as attention coefficients. LeakyReLU is the leaky rectified linear unit, an activation function based on ReLU that has a small slope for negative values instead of a flat one.

$${\alpha }_{ij}^{l}={softmax}_{i}({e}_{ij}^{l})$$

where \({\alpha }_{ij}^{l}\) is the attention score between node i and node j, and softmax is the normalized exponential function that turns the coefficients \({e}_{ij}^{l}\) into a probability distribution over the neighbors of node i.

$${h}_{i}^{(l+1)}=\sigma (\sum_{j\in {\mathscr{N}}(i)}{\alpha }_{i,j}{h}_{j}^{(l)}{W}^{(l)})$$

where \({h}_{j}^{(l)}\) is the current input feature and \({h}_{i}^{(l+1)}\) is the current output feature.
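The following NumPy sketch computes the single-head attention coefficients and node update from the three equations above; it is an illustrative re-implementation under simplifying assumptions (one head, dense adjacency, ReLU output), not the DGL GAT layer used in our experiments.

    import numpy as np

    def leaky_relu(x, slope=0.2):
        return np.where(x > 0, x, slope * x)

    def gat_layer(A, H, W, a):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); alpha = softmax over neighbors; h' = ReLU(sum_j alpha_ij W h_j)
        Z = H @ W.T                                           # W h_i for every node, shape (N, F')
        N = Z.shape[0]
        e = np.full((N, N), -np.inf)
        for i in range(N):
            for j in range(N):
                if A[i, j] > 0:                               # attend only over the neighbors of node i
                    e[i, j] = leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
        alpha = np.exp(e - e.max(axis=1, keepdims=True))      # softmax per neighborhood
        alpha = alpha / alpha.sum(axis=1, keepdims=True)      # assumes every node has at least one neighbor
        return np.maximum(0, alpha @ Z)

    A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])  # toy word/document adjacency
    H = np.random.rand(3, 4)                                  # F = 4 input features per node
    W = np.random.rand(2, 4)                                  # F' = 2 output features
    a = np.random.rand(4)                                     # attention vector of size 2F'
    print(gat_layer(A, H, W, a).shape)                        # (3, 2)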

3.6 GraphSAGE

GraphSAGE uses forward propagation for embedding generation. The embedding algorithm assumes that the model has already been trained and that the weight matrices and aggregator function parameters are fixed. For each node, the algorithm iteratively aggregates information from the node's neighbors, its neighbors' neighbors, and so on. In this paper, we used the same method to transform features into nodes as shown in Fig. 1 in the word-document graph box. According to dgl.ai [18], GraphSAGE is described by the following equations:

$${h}_{N(i)}^{(l+1)}=aggregate(\{{h}_{j}^{l},{\forall }_{j}\in \mathcal{N}(i)\})$$
$${h}_{i}^{(l+1)}=\sigma (W\cdot concat({h}_{i}^{l},{h}_{N(i)}^{l+1}))$$
$${h}_{i}^{(l+1)}=norm({h}_{i}^{(l+1)})$$

where \({h}_{j}^{(l)}\) is the current input feature, \({h}_{i}^{(l+1)}\) is the current output feature, and \(\sigma (\cdot )\) is an activation function. Each aggregation step l depends on the representations generated at the previous iteration. After aggregating the neighboring feature vectors, GraphSAGE concatenates the node's current representation \({h}_{i}^{l}\) with the aggregated neighborhood vector \({h}_{N(i)}^{l+1}\), and this concatenated vector is passed through a fully connected layer with the nonlinear activation function σ, which produces the representations used at the next step. If a weight tensor is provided on each edge, the aggregation becomes:

$${h}_{N(i)}^{(l+1)}=aggregate(\{{e}_{ji}{h}_{j}^{l},{\forall }_{j}\in \mathcal{N}(i)\})$$

where \({e}_{ji}\) is the scalar weight on the edge from node j to node i, broadcastable with \({h}_{j}^{l}\).
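The three steps above (aggregate, concatenate and project, normalize) can be sketched in a few lines of NumPy; this mean-aggregator version is an illustration of the equations, not the DGL SAGE layer used in the experiments.

    import numpy as np

    def sage_layer(A, H, W):
        # One GraphSAGE layer with a mean aggregator: h_N(i) = mean_j h_j; h_i = norm(ReLU(W [h_i || h_N(i)]))
        deg = A.sum(axis=1, keepdims=True)
        h_neigh = np.divide(A @ H, deg, out=np.zeros_like(H), where=deg > 0)  # mean over neighbors N(i)
        h_cat = np.concatenate([H, h_neigh], axis=1)                          # concat(h_i, h_N(i))
        h_new = np.maximum(0, h_cat @ W.T)                                    # fully connected layer + ReLU
        norm = np.linalg.norm(h_new, axis=1, keepdims=True)
        return np.divide(h_new, norm, out=h_new, where=norm > 0)              # L2-normalize each embedding

    A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])   # adjacency (or a sampled neighborhood)
    H = np.random.rand(3, 4)                                    # current node features h^(l)
    W = np.random.rand(2, 8)                                    # maps concat(4 + 4) -> 2 output features
    print(sage_layer(A, H, W).shape)                            # (3, 2)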

4 Experiment and task

4.1 Experiment setup

The experiments ran on an Intel Core (TM) i7-8700 processor (6 cores, 3.20 GHz) with 16 GB RAM and an Nvidia GeForce GTX 1050 Ti GPU with 4 GB of memory. The base program and training use Python 3.7.8 and TensorFlow 2.1.0. We used TfidfVectorizer (NLTK 3.5) and cosine similarity (SciPy 1.4.1) for preprocessing and for computing distances between words, and dgl.ai as the main tool for building the graph. The main hyper-parameters of this architecture are a learning rate of 0.01, 15 epochs, and a batch size of 16.

4.2 Description of task

To obtain comprehensive results and demonstrate that our proposed method achieves good precision, we tested the models on various training dataset sizes. We took 30, 50, 80, and 100 percent of the total dataset rows, referred to as Train-30, Train-50, Train-80, and Train-100, respectively; the corresponding augmented datasets are Train-30 + Aug, Train-50 + Aug, Train-80 + Aug, and Train-100 + Aug. Next, we performed preprocessing and augmentation as explained in the proposed method, which produces additional rows through random deletion (RD), random insertion (RI), random swap (RS), and synonym replacement (SR). We then ran the word-removal and graph-building steps to construct the nodes and edges of the graph (word-word and word-document edges). Finally, we tested each model on the test dataset; the results are presented in the next section.
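A minimal sketch of how the training fractions could be drawn is given below; the stratified sampling via pandas and the column names ('tweet', 'label') follow the dataset fields in Table 2, while the file name and random seed are illustrative assumptions.

    import pandas as pd

    train = pd.read_csv("Constraint_Train.csv")           # columns: id, tweet, label (file name is illustrative)

    def take_fraction(df, frac, seed=42):
        # Stratified subsample so the real/fake balance is preserved in Train-30/50/80/100
        return (df.groupby("label", group_keys=False)
                  .apply(lambda g: g.sample(frac=frac, random_state=seed))
                  .reset_index(drop=True))

    subsets = {f"Train-{int(f * 100)}": take_fraction(train, f) for f in (0.3, 0.5, 0.8, 1.0)}
    for name, df in subsets.items():
        print(name, len(df), df["label"].value_counts().to_dict())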

5 Result and analysis

This section presents the results and analysis of the proposed method and the experiments described in the previous sections. The results are arranged in the order of the experiments, with a discussion for each table and figure.

5.1 Statistic and precision result

Table 5 shows the dataset statistics after preprocessing and augmentation. The number of training rows with augmentation is always five times that of the training data without augmentation, because we keep the original text and add four texts generated with SR, RI, RS, and RD. The increased number of tokens after augmentation shows that our augmentation method successfully introduces new vocabulary into the text, and the maximum-token column indicates that augmentation makes the text longer on average.

Table 5 Dataset statistics after preprocessing and augmentation

We utilized precision and F1-score as metrics to compare the reliability of each model. Precision is important because it measures how many of the labels the model predicts are actually correct, and the F1-score is important because it is the harmonic mean of precision and recall. As displayed in Table 6, we used non-graph networks (LSTM and CNN) as baselines alongside the three graph methods (GCN, GAT, and GraphSAGE). Within the graph group, the Train-100 + Aug dataset obtained the highest precision, recall, and F1-score, owing to its more diverse vocabulary compared with the other datasets (see Table 5). In contrast, the two non-graph networks, LSTM and CNN, achieved their highest results on Train-80 + Aug; we attribute this to differences in model structure, even though both the GNNs and the non-graph networks use TfidfVectorizer for the primary input processing. The worst precision was obtained with Train-30, Train-30, and Train-80 for GCN, GAT, and GraphSAGE, respectively. We assume that Train-80 produced a worse result for GraphSAGE because the model was still underfitting after 15 epochs in this setting.

Table 6 Precision and F1-score of testing results

5.2 Effect of augmentation operation on precision

For more comprehensive results, we tested each augmentation operation separately. We took the Train-100 + Aug dataset, enabled the selected operations one by one, and then examined which operation has the greatest impact on testing precision.

Table 7 shows that using all the operations together yields the best results. When only one operation was activated, random insertion (RI) achieved the highest precision, closest to that obtained with all operations enabled; this is because RI adds new vocabulary to the text without removing existing vocabulary. The lowest precision was obtained with random deletion (RD), because removing words from the text degrades the quality of the training process.

Table 7 Comparison between operations

5.3 Model framework analysis

Table 6 shows that GraphSAGE attained the highest precision in almost every test group. This is because the core of GraphSAGE is a general inductive framework: the model can start from the nodes of a complete text simultaneously and drill down to the most important nodes. The GraphSAGE mechanism begins by generating embeddings from samples and aggregators over neighboring nodes; in our case, this mechanism traverses every salient word in the text, and our TF-IDF approach makes it easy to find node samples. Unlike the standard convolution in GCN, GraphSAGE also supports multiple aggregators, which can be a mean aggregator, an LSTM aggregator, or a pooling aggregator. Instead of using all of a node's neighbors, it draws a fixed-size set of neighbors by uniform sampling, a task-dependent heuristic. GraphSAGE also replaces full graph Laplacian operations with learnable aggregations, allowing it to skip hidden nodes or select the most valuable ones. In short, GraphSAGE uses random neighbor sampling to limit the expansion of the receptive field. This differs from GAT and GCN, which use all of their neighbors, making them more time-consuming and less suitable for massive graph structures. For our case, GraphSAGE is therefore more relevant and robust.

Table 6 also shows that the model that most often attains the lowest precision is GAT. Although the attention mechanism has been applied effectively in sequence-based applications such as machine translation and machine reading, the attention-based GNN yielded the lowest results in our experiments. Compared with the standard GCN, which treats all neighboring nodes equally, the attention mechanism assigns different attention scores and identifies the most important neighbors. We assume that GAT scored lowest because its stronger tendency to re-weight the vocabulary of its neighbors assigns less precise values to individual words during learning, introducing some weighting mistakes.

5.4 Comparison of training accuracies for the same models

This subsection presents and compares the results of the experiments on all datasets for the three models.

As shown in Fig. 2, the accuracies are quite similar for all models by the final epoch. Train-30 + Aug achieved high accuracy for every model, and the figure shows that Train-30 + Aug also increased the stability of the accuracy growth during training. The worst accuracies were obtained with Train-50, Train-50, and Train-80 for GCN, GAT, and GraphSAGE, respectively. All of the worst results come from datasets without augmentation, while the highest accuracies are obtained with augmented datasets, even when those datasets do not use 100 percent of the rows.

Fig. 2

GCN, GAT, and GraphSAGE training accuracies

5.5 Comparison between different numbers of samples in different models

In contrast to Table 6, which reports the precision, recall, and F1-score on the test dataset, here we compare and discuss the training accuracy of GCN, GAT, and GraphSAGE on the Train-100 + Aug dataset.

According to Fig. 3, the accuracy of the GCN, GAT, and GraphSAGE models increases steadily from the first epoch to the last in our fake news detection task. The differences in accuracy between the models are also very small, only about 0.0096, showing that GCN, GAT, and GraphSAGE perform almost indistinguishably. According to Table 6, GCN and GAT reach the same precision, while GraphSAGE is the highest. We hypothesize that these small precision gaps between models are due to the well-prepared training datasets, which produce a helpful data encoding.

Fig. 3

Comparison of training accuracy for GCN, GAT, and SAGE toward Train-100 + Aug dataset

5.6 Effect of sentence length

In our test dataset, 1210 rows (56.54%) contain 15 words or fewer, 878 rows (41.02%) contain 16–30 words, 47 rows (2.19%) contain 31–45 words, and five rows (0.23%) contain more than 45 words.

Table 8 shows the effect of sentence length on each model for the Train-100 + Aug dataset. Sentences containing 16–30 words obtain better accuracy for most models. All models follow the same accuracy trend within a small range, which shows that their detection abilities are not significantly different. Sentences containing more than 45 words have the lowest accuracy, owing to the uneven distribution of the data.

Table 8 Effect of sentence length on classification accuracy

5.7 Most common terms in fake news our model can detect

We also explored the most frequent words in the fake news that our model can detect. We found that the terms frequently used in fake news are similar to those that appear most often in the entire dataset.

Figure 4 shows the most frequently used words in the dataset that our model can detect. Each line in the dataset may contain more than one of these common words. Because the dataset is related to COVID-19, we present only the essential COVID-19-related terms. The words "Covid," "Cases," and "Coronavirus" are the most frequently used words in this fake news dataset; in fact, these words appear commonly in the news media and attract readers the most. After the term "India," the frequencies of the remaining words do not differ significantly.

Fig. 4

The most common terms detected by our models in the Constraint @ AAAI2021-COVID19 fake news detection dataset

6 Conclusion

Detecting misinformation on the Internet is a crucial and challenging task, as humans have struggled against this phenomenon for a long time, and people also find it difficult to detect fake news from context. To address the COVID-19 fake news detection shared task, we used easy data augmentation (EDA) and graph neural networks: the graph convolutional network (GCN), the graph attention network (GAT), and GraphSAGE (SAmple and aggreGatE). We took 30, 50, 80, and 100 percent of the total dataset rows and then augmented them. The Train-100 + Aug dataset obtained the highest precision, recall, and F1-score thanks to its more diverse vocabulary. The worst precision was obtained with Train-30, Train-30, and Train-80 for GCN, GAT, and GraphSAGE, respectively, and the precision differences between models are small. For our case, GraphSAGE is the most relevant and robust: it replaces full graph Laplacian operations with learnable aggregations, allowing it to skip hidden nodes or select the most valuable ones, and it uses random neighbor sampling to limit receptive field expansion. Unlike GAT and GCN, which use all of their neighbors, GraphSAGE is less time-consuming. Sentences containing 16–30 words obtained better accuracy for most models, and the words "Covid," "Cases," and "Coronavirus" are the most frequently used words in this fake news dataset.