1 Introduction

In recent years, social media platforms have grown explosively in popularity [29]. Societies and social behavior can now be mapped onto these online social networks (OSNs), which in turn give rise to virtual societies. By studying these virtual societies with computational tools, human behavior patterns can be analyzed to reach a better understanding of societies [52]. However, the absence of fact-checking of posts and the unregulated nature of the internet make social media fertile ground for the spread of unverified and false information. Since the volume of data posted on social media is enormous, manual fact-checking is not feasible; computational tools can therefore be very effective in countering false news in an automated manner. To this aim, we provide a comprehensive framework adapted from several scholarly studies. The framework detects and classifies information into three types: real, disinformation and satire. The process highlights the use of interdisciplinary approaches that derive from fundamental theories of the social sciences and integrate them with modern computational tools and techniques.

The spreading of disinformation is an old problem, and with the use of OSNs its spread has become exponential. Social science offers theories that bear on the problem, but they have rarely been considered while developing tools and techniques to fight it. This study combines the two strands and generalizes them into four perspectives, namely knowledge-based, style-based, propagation-based and source-based. For instance, psychology holds that a fabricated text written with intent to harm the public differs in style from real text (the Undeutsch hypothesis) [5]. Throughout history, false news has been used deliberately to manipulate people's beliefs and opinions. The ancient Indian epic 'Mahabharata' contains one of the earliest references to fake news, where false news was spread to kill Dronacharya, the guru of the Pandavas and Kauravas [7]. During World War II, the Allied forces planned 'Operation Mincemeat', a successful British deception operation. Earlier, fake news had limited impact, but online sources of information such as OSNs have made its reach global. This reach was starkly highlighted during the 2016 US presidential election, which remains under investigation. During the Covid-19 pandemic, social media platforms became a key forum for the rapid dissemination of information; at the same time, a huge amount of health-threatening false information spread faster than the virus itself. Moreover, while the whole world was suffering from the pandemic, India went through communal riots that appear to have been caused by disinformation. This paper highlights this recent event (described in Section 6.1), which was portrayed as propaganda by a community in order to harm public interests. These instances make clear that the motive of these platforms is to keep users engaged and earn business revenue rather than to provide factual information. Social media platforms also suffer from echo chambers, in which users see only their points of interest without dwelling on facts [20]. Thus, users get trapped in propaganda rather than following authentic news, which can even lead to a national crisis. The issue is complex, serious and multi-faceted.
To this end, the highlights of this paper are: (1) Most studies have considered politics as the domain for fake news detection; our work introduces a multi-labelled corpus related to an event that happened due to disinformation, which can be used for the development of a cross-domain fake news detection model, together with a complete annotation guideline that considers both the authenticity and the intention of the news. (2) Our work provides an integrated approach by combining two perspectives (style-based and social-context-based). (3) Existing studies highlight various features used to differentiate disinformation from real news, but features inspired by fundamental theories are more explainable. (4) The ANOVA statistical technique has been used to select significant features that distinguish disinformation from real and satire news, and the selected features have performed well on our dataset. Figure 1 summarizes the approach followed in the paper.

Fig. 1 Background Flow: The figure summarises the approach followed

1.1 The problem

The overarching goal of this research is to automate the process of estimating the probability that a particular tweet is disinformation, satire or real news, using an integrated approach on a check-worthy and largely unexplored domain. We define the null and alternate hypotheses as follows: Null hypothesis, Ho: There is no significant difference between the means of features for disinformation, satire and real news.

Alternate hypothesis, Ha: There is a significant difference between the means of features for disinformation, satire and real news.

Since this is a multiclass classification problem, the ANOVA (Analysis of Variance) [15] statistical test has been performed on each feature shown in Table 2. The p-value obtained for most of the features is less than or equal to 0.05, which indicates that we may reject the null hypothesis and accept the alternate hypothesis. This is explained in detail in Section 4.3. Therefore, we define the problem statement as:

“Given a tweet feature matrix, popularity matrix, semantic matrix, multiclass partial label vector, and a generated secondary matrix, we aim to predict the labels of the remaining unlabeled tweets.”

1.2 Key contributions

The literature highlights various research gaps which are explored in this paper. The key contributions, which address research tasks identified in the future scope of earlier studies and aim to improve current fake news detection models, are as follows:

  • C1. Intention-based detection of fake content The study considers both authenticity and intention when measuring fake content. Style-based features help to capture intention, and intention also depends on the data labels. To the best of our knowledge, current studies have not provided clear guidelines on how annotators manually evaluate text to determine the intention behind sharing political information. To consider the intent of social media posts, this paper describes complete guidelines for manual annotation in Section 3.1.

  • C2. Integrated approach The four perspectives described in the introduction of this paper are not independent of each other, and it is highly desirable to predict fake news using features from multiple perspectives jointly. Style-based approaches capture intention but depend heavily on writing style, which varies with domain, language and time. Thus, this paper describes the formulation of an integrated approach that combines social features with style-based features derived from fundamental social science theories. Features inspired by well-established theories are more explainable and help to detect disinformation accurately.

  • C3. Cross-domain fake news analysis Current studies on fake news detection primarily consider politics as the domain. This paper presents an analysis of a recent incident related to communal riots of national scale that were amplified by disinformation spread on social media platforms. The work forms the basis for building a comprehensive fake news detection model as part of ongoing research.

  • C4. Identifying check-worthy topics A topic or event is check-worthy if the content causes extensive debate on social media, relates to national affairs and has historically had the potential to be fake. In this paper, a check-worthy topic concerning a national crisis (the case study described in Section 6.1) has been considered.

  • C5. Use of multilabel classification to find the veracity of fake content To detect partially correct news, multilabel classification is required. In this paper, we predict the probability of a tweet being fake, which is then used to score the text on a scale from 1 to 5.

The remaining sections are structured as follows. Section 2 outlines the background of different terms related to false information, fundamental social science theories, an overview of the various perspectives and a review of related works. Section 3 introduces the methodology and proposed framework. Section 4 presents the research experiments conducted to evaluate features iteratively, along with the respective results, with the intent of finding the most suitable model. Section 5 presents benchmark observations and visualization. Section 6 gives the case study and limitations. Section 7 makes concluding remarks.

2 Background and related works

2.1 Important concepts related to fake news

The problem with social media posts leading to a national crisis is not that the information is completely false or that certain events never happened. Rather, it is the misleading context presented in the posts, possibly with an intent to harm, that does most of the actual damage. Hence, a major focus is to find the intention behind spreading false information. The literature provides different terms related to fake news, such as misinformation, satire, disinformation and many more, based on authenticity and intention. However, there is no universal definition available in the literature, since the terminology varies with the account of the event [38]. Broadly, the literature defines these terms based on intention and authenticity: false information spread with the intent of causing harm is called disinformation; false information with no intention to harm the audience is called misinformation; and information created for fun, with the intent to entertain the audience, is called satirical news [33, 57, 60]. Furthermore, to develop a solid foundation for false news analysis, a clear definition is given below for each category, which is used for representation purposes in this paper:

$$ label(t_{i})= \begin{cases} 0 & \text{if}\ t_{i}\ \text{is verified and unbiased}\\ 1 & \text{if}\ t_{i}\ \text{is false and the intention is to mislead}\\ 2 & \text{if}\ t_{i}\ \text{is false but the intention is to entertain} \end{cases} $$

For authenticity, governments should encourage credible sources of information without compromising freedom of expression, which is itself very challenging to ensure. Credible sources can be maintained by domain experts who check the authenticity of information manually, but it is impractical to manually check the voluminous data on social media against credible sources. Therefore, this paper aims to automate the process of flagging data that has a high probability of being fake. Essentially, false information with an intent to harm is written in such a way that it can deceive the targeted audience; hence, for the analysis of intent, different social science theories have been studied [62]. Intent analysis does, however, require some level of manual annotation, and the accuracy of such annotations determines the accuracy of the resulting machine learning models.

2.2 Fundamental theories

The problem of detecting false information requires inter-disciplinary approaches derived from areas like psychology, philosophy, economics and others [61]. Therefore, this paper has identified fundamental social science theories which can be potentially used to understand the problem. In our work, we have mapped these theories to important features used in social media as shown in Table 1.

Table 1 Theories in Social Sciences helpful in deterring the spread of false information

2.3 The four perspectives for detecting fake news

Zhou et al. [60] have specified four perspectives for detecting fake news (and related terms): knowledge-based, style-based, propagation-based and source-based. Knowledge-based methods use fact-checking (manual or automatic) to check the authenticity of text. To check the intent behind spreading false information, style-based approaches are considered. Style-based methods work on the assumption that malicious users adopt a different style of writing to attract an audience and gain its trust, for example writing text with extreme emotion. Writing-style parameters, such as extreme emotion and certainty words, are used to distinguish fake text from real text. Broadly, textual features fall into two categories: (1) general features: lexicon-, syntax- and semantics-based language features [12]; and (2) latent features: text embeddings at the word or sentence level that yield vectors such as word2vec and doc2vec [37]. Recently, Zhou et al. [62] have claimed that general features outperform latent features and that combined features outperform single-level features over machine learning classification models. Style-based features are useful for early fake news detection, before the news spreads deeply on the social media platform. However, this approach faces challenges: different domains have different stylistic patterns, and malicious users can change their writing style to hide deception (domain, language and time dependence). To overcome these challenges, social features need to be combined via propagation-based methods. Propagation-based methods are network-based approaches that can be used to check how differently fake news propagates in comparison to real news. Additional information such as PageRank and stance scores has also been included to examine the variety in propagation patterns. It has been observed that fake news (especially political fake news) spreads faster and farther, and becomes more popular, than facts. However, propagation features are insufficient for the early detection of fake news since limited information is available before the news spreads. Nevertheless, Zhao et al. [59] and other researchers have observed that fake news spreads differently from real news even at an early stage of propagation. Moreover, some studies have also considered user engagements to detect compromised accounts on Twitter [47]. Finally, source-based methods check the credibility of the creator, publisher and spreader (social media accounts) in order to detect fake news indirectly [10]. Accounts can belong either to human users or to non-human users (i.e., bots). Source-based approaches work on the assumption that articles posted by unauthenticated users are fake. To check the authenticity of an account, features such as registration age, the number of friends and followers, the user's history of spreading fake content, posting frequency and anomalous analysis of cyborgs and bots have been explored in the literature. The major challenge for this approach is the presence of echo chambers on social media platforms. Source-based approaches are useful for fake news mitigation. Notably, the four perspectives are not independent and should be used together for effective detection of fake content. In this paper, we propose an integrated approach using the multi-perspective features listed in Table 2. Some of the features are selected from Zhou's ten-dimensional feature set, while some novel additional features have been added.
Propagation-based features depicting popularity in terms of user engagement have also been integrated to predict fake news from multiple perspectives. In our proposed approach, we assume that sources have been chosen in a balanced way to avoid biases and echo chambers; hence, the analysis does not consider source-based features.

Table 2 Extracted Features based on perspectives

Of the four perspectives described in the introduction, current fake news studies are mainly based on style-based and propagation-based approaches. We have reviewed recent advancements in both approaches, together with research based on the latent approach, and present them in tabular form (Table 3) for better understanding. Many previous studies are based on style-based features and have shown promising results. In the literature, popular style-based features are linguistic features such as n-grams [1], psycholinguistic features using LIWC, the number of punctuation marks and stopwords, readability scores (e.g., number of complex words, long words, syllables, characters) [41], and syntax- and dictionary-based features [43]. Psychological features such as sentiment and emotion are strong differentiating factors between fake and real content [2, 25]. Siering et al. proposed a framework based on the verbal cues of the content (e.g., average sentence length, subjectivity, PoS) to understand the deception process, the psychology of fake spreaders and the type of cues [51]. Zhang et al. introduced non-verbal features, i.e., the social behaviour of a user (e.g., follower count, photo count, posting rate), which improved the performance of the model [58]. Style-based features are useful for the early detection of fake news, but they suffer from domain, language and time dependence. To overcome this challenge, propagation and user-oriented features have been presented by several studies. Extant models for automated fake news detection rely on user-oriented features of social media platforms such as the number of likes, retweets, shares, replies and comments [21, 35]. Due to the low cost of creating deceitful websites and the high volume of software-controlled profiles (social bots), the problem of misinformation has become more complex. These social bots can post content in bulk and target an audience that is easy to deceive [49]. Therefore, user-oriented features play a vital role in the development of a fake news detection model. Some studies have followed a reverse approach, identifying users who are more inclined to share fake news on OSNs; different supervised classification models have been tested over combined features including stylometry, personality, emotion and embeddings [16], and the assembled features have also been employed to detect fake profiles [39]. Furthermore, obtaining a balanced dataset from social media platforms is also an issue, since the amount of real news is assumed to be larger than that of fake news. The overflow of posts, comments and other user engagement on social media calls for big data strategies; some studies have applied novel approaches to handle veracity in big data, such as TF-IDF with a temporal Louvain approach for categorization, and clustering for appropriate document gathering [26, 27]. Data annotation is also a challenging task in any domain. Some studies presented semi-automated tools to reduce data annotation time [48], while a few introduced web-based annotation tools, such as BRAT, that use Natural Language Processing (NLP) techniques [53]. A few state-of-the-art studies have employed feature-based techniques for annotation in interdisciplinary domains [6, 30]. However, the majority of studies in this domain (i.e., fake news) have performed manual data annotation through human annotators with domain expertise.
Generally, annotated data can be obtained in different ways: from fact-checking websites (e.g., “Snopes” or “PolitiFact”), which are mainly focussed on one domain (mostly politics), satirical websites (e.g., “The Onion” or “Faking News”), crowdsourcing services (AMT), industry detectors and expert journalists. Researchers have also collected fake and real data by targeting well-known fake and real news sources; Horne et al. [25] listed a few real, fake and satire news sources to avoid manual annotation of the scraped online data. In this paper, annotation guidelines have been carefully designed to consider both authenticity and intention, which can be useful for future annotations. To the best of our knowledge, complete annotation guidelines have not been provided in previous studies. Overall, the literature reveals various research gaps, which have been incorporated into this paper as the key contributions in Section 1.2. For instance, the literature has mostly considered political data; thus, other domains need to be explored. Therefore, in this research, we investigate a recent sensitive event that happened in India, during which different narratives were presented in the media. To the best of our knowledge, this noteworthy event has not been investigated before. Also, no prior study has adapted social science theories to derive effective features, which provides the foundation for our study. Therefore, we propose a highly accurate model to detect real, fake and satire news using a set of effective features from multiple perspectives.

Table 3 Literature Review

The following sections present the research design and methodology along with the experimental results and model interpretation.

3 Research design and methodology

3.1 Dataset and annotation

Data has been collected from Twitter using the top trending hashtags in India on the Nizamuddin Tablighi Jamaat case, with the most relevant keywords such as Nizamuddin, TablighiJamaat, CoronaJihad, TablighiVirus and Islamophobia. We collected tweets from 29th March 2020 to 14th April 2020, the period when the topic was trending the most on Twitter. The steps involved in building the dataset are outlined in Fig. 2. In this paper, a multiclass corpus named Fak_Cov (with three labels: real, disinformation and satire) has been developed for the complete analysis of disinformation during the Covid-19 Nizamuddin Tablighi Jamaat case. The Fak_Cov corpus contains content (tweet text), temporal (date and time) and social (likes, retweets and replies) information. A total of 3000 tweets were extracted and annotated by two human annotators to obtain the ground truth concerning the presence of reliable tweets related to the event. Annotators were given a discard-tweet option if the tweet text did not contain sufficient information. The definitions of the terms disinformation and satire are given in the introduction of this paper, whereas real news is verified news from authentic sources. To assess news intention suitably, one typically relies on training labels annotated by experts [60]. Most current studies have not described clear annotation guidelines for considering intention within a dataset. Table 4 lists the guidelines provided to the annotators in order to consider authenticity as well as intention, which satisfies our contribution C1. Figure 3 shows the complete architecture of the framework proposed to automate the process of flagging data that has a high probability of being fake. Annotators analyzed each question with the help of the given URLs and rated each tweet as 0 (real), 1 (disinformation) or 2 (satire). During our pilot study, we observed that the options 'disinformation', 'satire' and 'real' sometimes appeared redundant, since a few tweets seemed mixed and unclear to the annotators; hence, for the final annotation, only one of the options was kept per tweet. Inter-annotator agreement (IAA) has been studied to measure how often the two annotators made the same decision. Cohen's kappa coefficient (K) [4] was used as the statistic to measure inter-annotator reliability, using the formula:

$$ K = \frac{P_{o} - P_{e}}{1 - P_{e}} $$

where Po is the observed agreement among the annotators and Pe is the hypothetical probability of chance agreement. The evaluation resulted in an overall kappa score of 0.758 (a score greater than 0.7 implies good agreement between annotators). The annotation for each tweet was accepted only when the annotators agreed, and tweets for which the annotators gave different ratings were skipped. After removing tweets with conflicting ratings or insufficient information, we obtained 1758 tweets in the final annotated corpus.
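As an illustration, a minimal sketch of this agreement computation with scikit-learn is given below; the two label arrays are hypothetical placeholders, not the actual annotations.

```python
# Minimal sketch: Cohen's kappa between two annotators with scikit-learn.
# The label arrays below are hypothetical placeholders (0 = real,
# 1 = disinformation, 2 = satire), not the actual Fak_Cov annotations.
from sklearn.metrics import cohen_kappa_score

annotator_a = [0, 1, 1, 2, 0, 1, 2, 0]
annotator_b = [0, 1, 2, 2, 0, 1, 2, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")  # > 0.7 is treated as good agreement here
```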

Fig. 2 Data collection flowchart

Table 4 Annotators Guidelines
Fig. 3 Generic framework for detection of fake news

3.2 Data preprocessing

To acquire insights from the dataset, it is a general practice to do preliminary and exploratory data analysis.

Imbalanced dataset: A vital step for classification models is to check whether the dataset is balanced. If the majority of the data belongs to one particular class, the model will classify every record into the majority class and still obtain a decent accuracy while effectively ignoring the other classes. Approaches exist to deal with an imbalanced dataset, such as under-sampling the majority class, oversampling the minority class, or using error metrics such as recall and precision. Figure 4 shows the percentage and length distribution of articles across the categories. The dataset is not perfectly balanced, but the length is evenly distributed and most texts contain more than 200 characters, so there is enough information in most of the tweets to perform the classification. Furthermore, oversampling techniques are used to handle the imbalanced dataset when the data is not sufficient. Therefore, SMOTE (Synthetic Minority Over-sampling Technique) [11] has been applied to create synthetic samples for the minority classes (satire and disinformation) instead of creating copies, as sketched below.
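A minimal sketch of this oversampling step, assuming the imbalanced-learn package and a placeholder feature matrix, is given below.

```python
# Minimal sketch: oversampling the minority classes with SMOTE from
# imbalanced-learn. X and y are placeholder data, not the Fak_Cov features.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(1758, 31))                 # placeholder feature matrix
y = rng.choice([0, 1, 2], size=1758,            # 0 = real, 1 = disinfo, 2 = satire
               p=[0.6, 0.25, 0.15])

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_res))  # class counts before and after
```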

Fig. 4 Percentage and length distribution of data for each category

Features extracted before cleaning: A variety of information has been extracted from the raw text, including the numerical counts of the style-based features described in Table 2. Dictionaries for modal verbs, tentative words, generalizing verbs and certainty words have been created, and the counts of these terms have been computed for every tweet. The extracted information was saved to a separate file prior to the preprocessing pipeline.

Linguistic Cleaning: A pipeline of cleaning steps, described in the framework architecture, has been applied to the Fak_Cov corpus, including removal of special characters, punctuation marks, URLs and stopwords, conversion to lowercase, and lemmatization. A sketch of such a pipeline is shown below.
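The following is a minimal, illustrative sketch of such a cleaning pipeline; it assumes NLTK with its stopword and WordNet resources already downloaded, and is not necessarily the exact pipeline used.

```python
# Minimal sketch of the cleaning pipeline: remove URLs, special characters and
# punctuation, lowercase, drop stopwords, then lemmatize.
# Assumes NLTK stopword and WordNet data have been downloaded beforehand.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)             # special chars / punctuation
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(LEMMATIZER.lemmatize(t) for t in tokens)

print(clean_tweet("Breaking!! #CoronaJihad spreading via https://example.com ..."))
```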

Features extracted after preprocessing: Thereafter, sentiment analysis has been performed using TextBlob and the NRC lexicon. With TextBlob we obtained polarity and subjectivity, while per-emotion scores have been obtained using the NRC lexicon.
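A minimal sketch of these sentiment features is shown below. TextBlob directly provides polarity and subjectivity; the NRCLex package is used here as an assumed convenience wrapper around the NRC emotion lexicon, since the paper only states that NRC lexicons were used.

```python
# Minimal sketch of the post-cleaning sentiment features: polarity and
# subjectivity from TextBlob, and per-emotion counts from the NRC lexicon via
# the NRCLex package (an assumed wrapper; any NRC lexicon lookup would do).
from textblob import TextBlob
from nrclex import NRCLex

text = "the congregation was an irresponsible act that endangered everyone"

blob = TextBlob(text)
polarity, subjectivity = blob.sentiment.polarity, blob.sentiment.subjectivity

emotions = NRCLex(text).raw_emotion_scores   # e.g. {'fear': 1, 'anger': 1, ...}
print(polarity, subjectivity, emotions)
```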

Analyse data distribution: Furthermore, differences between disinformation, real and satire news have been examined using some of the numerical and categorical features from the ten feature sets and represented with the help of boxplots. In Fig. 5, the x-axis represents the classes (real: 0, disinfo: 1, satire: 2), while the y-axis shows the value range of each feature. The box plots show that most of the features, namely character count, hashtag count, mention count, unique word count, word count, polarity, subjectivity and stop word count, are highest in fake content. The general motive behind fake news is to reach the maximum number of people, so hashtags, mentions, stop words and extreme sentiment are used to make the content attractive and to hide the deception.

Fig. 5 Data distribution boxplots of linguistic features: (a) Char_count (b) Hashtag_count (c) Mean_word_len (d) Mention_count (e) Unique_word_count (f) Word_count (g) Stop_word_count (h) Punct_count (i) Url_count (j) Likes (k) Retweets (l) Replies (m) Polarity (n) Subjectivity (o) Modal_verbs

Outliers Removal: Moreover, interpretation of the box plots showed that much of the observed data is highly skewed, since a few outliers have extremely high or low values. Therefore, a logarithmic transformation (an automatic statistical method) has been used to treat the skewed variables [17]. However, in a few cases there are valid reasons for these observations to be outliers; hence, we experimented with the classification models using both the data with outliers and the transformed, approximately normally distributed data. Table 5 shows the skewness of features in the raw data and after outlier treatment. A few features were highly positively skewed, and after the logarithmic transformation their skewness became close to zero, i.e., approximately normally distributed; exactly zero skewness for continuous data rarely occurs in nature. Features whose skewness was already near zero were not transformed. A sketch of the transformation is given below.
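A minimal sketch of the skewness check and log transformation on a hypothetical feature column is given below; log1p is used to keep zero counts defined, which is an assumption about how the transformation was applied.

```python
# Minimal sketch: measure skewness of a hypothetical count feature and apply a
# log transform; log1p (log(1 + x)) keeps zero counts defined.
import numpy as np
from scipy.stats import skew

likes = np.array([0, 1, 2, 3, 5, 8, 400, 1200])   # heavily right-skewed counts

print("skew before:", round(skew(likes), 2))
likes_log = np.log1p(likes)
print("skew after :", round(skew(likes_log), 2))   # much closer to zero
```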

Table 5 Skewed values of features

3.3 Feature engineering

Before discussing the computational results, the vectorization technique, handcrafted features and embedding technique used are described below:

  • TF-IDF Vectorization with N-gram features: TF-IDF (term frequency–inverse document frequency) is used to calculate relative term frequency and works well at various language levels, while n-gram models capture sequences of words. In this paper, word-level n-grams have been used to represent the context of a sentence and to generate TF-IDF features for classification [55]. It is an efficient and popular technique in text categorization.

  • Style-based Features: The style-based features described in the sections above have a rich literature. Lexical, syntactic and psycholinguistic features are collectively called linguistic features. These include character-level (e.g., character count, mean word length), word-level (e.g., total word count, unique words) and sentence-level features (e.g., punctuation count). Rosas et al. performed an exploratory analysis of the linguistic differences between fake and legitimate news content [41]. Rashkin et al. used Linguistic Inquiry and Word Count (LIWC), a lexicon widely used in social science studies, along with other linguistic features for analyzing the language of fake news [44]. Moreover, some fake news detection models are based on a sentiment score [46]. TextBlob's API (http://textblob.readthedocs.io/en/dev/) has been used to compute the sentiment score of a tweet. Disinformation articles are emotionally charged, with provocative language. Ajao et al. proposed the hypothesis that there is a relation between fake messages or rumours and the sentiment of the texts posted online [3]. The style-based features used in this pilot study are described in Table 2.

  • Social Features: Since content style-based features are not sufficient for detecting fake information, social features need to be incorporated. Therefore, while extracting data using the Twitter API, additional features related to the popularity of each tweet have also been extracted. The literature makes various claims about the differences between legitimate and false news in terms of social and propagation parameters; Vosoughi et al. claimed that false news spreads faster than real news online [56]. Therefore, social features such as the number of likes, retweets and replies have been explored in order to distinguish real, disinformation and satirical tweets.

  • Word Embeddings: Text embeddings are latent textual features that can be used directly as input to classification models for prediction. TF-IDF considers only the content, while word embeddings consider both content and context. Word embeddings are a kind of word representation that gives words with similar meaning a similar representation [22]. Word2vec is not a single algorithm but a combination of two methods, CBOW (continuous bag of words) and the skip-gram model; both are shallow neural networks that map a word to a target word, and the learned weights serve as the vector representations. Word embeddings preserve word relationships in such a way that words with similar context have similar vector representations, so such words lie close to each other in the vector space, which in turn helps classification algorithms to work effectively. In this paper, GloVe (Global Vectors for Word Representation) has been used as the word embedding technique [18]. It is an unsupervised learning technique for generating vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. The pre-trained release used here contains English word vectors trained on the combined Wikipedia 2014 + Gigaword 5 corpora (6B tokens, 400K vocabulary) in 50, 100, 200 and 300 dimensions; we have used the 100-dimensional word embeddings in our experiments. The literature shows rich research on text classification using GloVe embeddings, because pre-trained embeddings are an efficient way to quantify word co-occurrence (which also captures some aspects of word meaning). Figure 6 shows word embeddings of the Fak_Cov dataset using the t-SNE dimensionality reduction visualization technique. A sketch of how tweets can be embedded with GloVe vectors is given after this list.
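A minimal sketch of the GloVe-based tweet representation is given below; the file name glove.6B.100d.txt refers to the publicly released pre-trained vectors, and mean pooling over word vectors is an assumed (common) way of turning tokens into a tweet-level vector.

```python
# Minimal sketch: load 100-d GloVe vectors (assumed file glove.6B.100d.txt from
# the pre-trained Wikipedia 2014 + Gigaword 5 release) and represent a cleaned
# tweet as the mean of its word vectors.
import numpy as np

def load_glove(path="glove.6B.100d.txt", dim=100):
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors, dim

def tweet_vector(tokens, vectors, dim):
    hits = [vectors[t] for t in tokens if t in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim, dtype=np.float32)

glove, dim = load_glove()
print(tweet_vector("tablighi jamaat congregation delhi".split(), glove, dim)[:5])
```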

Fig. 6 Word embeddings using t-SNE

3.4 Proposed framework

Our proposed fake news detection framework has three major components: integrated feature generation, supervised classifiers, and an analysis of variance (ANOVA) test for selecting important features. The proposed generic framework for fake news detection is explained using a flow chart (Fig. 7). The steps of the methodology are as follows:

  • Developed a Twitter scraper to extract tweets and related information about the input query.

  • Stored the extracted information in a database; it was then annotated by two human annotators considering intention and authenticity.

  • Style-based and propagation-based features have been extracted to generate an integrated feature set for classification.

  • The ten extracted feature sets with a total of thirty-one features have been added iteratively to the classification models in order to improve the performance at each level. The performance has been monitored at each iteration.

  • Finally, the ANOVA statistical test has been applied to select significant features for differentiating the three categories: disinformation, satire and real.

Fig. 7 Flow chart of the proposed generic framework

4 Experiment results

4.1 Setup

The experiments were run using 5-fold cross-validation; in each validation round, 80% of the data was used for training and 20% for testing, on several widely accepted supervised learning classifiers [40]. The performance of the experiments has been evaluated in terms of accuracy, precision, recall and F1 score. The learning algorithms were used to train the model and then to predict the labels for the test dataset. Hyperparameter tuning is an important step in machine learning-based classification [9]. It is often carried out by hand, gradually refining a grid over the hyperparameter space. In our analysis, different hyperparameters were tuned to gauge model performance with the goal of achieving bias-reduced performance evaluations. After initial testing, the hyperparameters were set to a specific set of values for all the experiments, as shown in Table 6. The performance of classifiers has been measured using accuracy, F1 score and kappa score. We purposely used the F1 score because it is the harmonic mean of recall and precision; using this score also facilitates comparison with existing studies. The kappa score compares the observed accuracy with the expected (chance) accuracy and is thus less misleading. Figure 8 shows the ensemble model architecture used in the analysis. In the ensemble, the training data is divided into various subsets and used to train different classifiers at level 0; the outputs of these classifiers are used as training data for the level-1 classifier, i.e., logistic regression. The goal of the ensemble is to improve the accuracy of weak classifiers by combining them. Therefore, in each experiment, models have been chosen according to the performance of the classifiers in that particular scenario. In most of the experiments, MNB, KNN and DT obtained lower accuracy, so they were chosen as base models in the ensemble; a sketch of this stacking setup is given below. The different experiments performed to select the feature set and model that best capture deceptive tweets are explained in the following subsections.
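A minimal sketch of this two-level stacking setup, assuming a scikit-learn implementation with placeholder data, is given below; the exact base learners and hyperparameters varied across experiments as described above.

```python
# Minimal sketch of the two-level ensemble: weak level-0 learners (MNB, KNN, DT)
# whose out-of-fold predictions train a level-1 logistic regression, evaluated
# with 5-fold cross-validation. X and y are placeholders, not the real corpus.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(300, 31)).astype(float)   # non-negative counts for MNB
y = rng.integers(0, 3, size=300)                        # 0 = real, 1 = disinfo, 2 = satire

ensemble = StackingClassifier(
    estimators=[("mnb", MultinomialNB()),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                           # internal folds for the level-1 training data
)

scores = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print("mean CV accuracy:", scores.mean())
```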

Table 6 Selected values of hyperparameters majorly affecting the overall performance
Fig. 8 Ensemble Model Architecture

4.2 Computational results using N-grams with TF-IDF

In the first experiment, the TF-IDF feature extraction method with n-grams of varying size, from n = 1 (unigrams) up to n = 4 (quad-grams), has been studied. Figure 9 shows different n-grams obtained for the Fak_Cov corpus; for example, the unigrams display single words from a diverse vocabulary. The performance of the aforementioned machine learning classifiers has been checked on the corpus to predict whether a tweet is real, satire or disinformation. Table 9 shows the accuracy obtained from different classifiers using uni-, bi-, tri- and quad-grams. The classifiers perform well with unigrams, and performance decreases as the value of n increases. Moreover, the chi-square test [42] has been used to determine which unigrams (n = 1), bigrams (n = 2), trigrams (n = 3) and quad-grams (n = 4) are most correlated with each category; we observed that unigrams performed best in terms of correlation with the category. A rule of thumb is that the training size should be about ten times the number of features to avoid dimensionality problems. The experiment was run three times with different values of the feature size (FS): 30, 50 and 100. The classifiers performed well with all the chosen values; although the differences are minute, FS = 30 shows the best results, and we therefore restricted all n-gram experiments to a smaller number of features because of the small dataset. A sketch of this vectorization and selection step is given below.
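A minimal sketch of this step, with hypothetical tweets, is given below: word-level TF-IDF n-grams are generated and a chi-square selection restricts the representation to a small feature size.

```python
# Minimal sketch: word-level n-gram TF-IDF features restricted to a small
# feature size via chi-square selection. The tweets and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

tweets = ["tablighi jamaat spread corona on purpose",
          "health ministry confirms new corona cases",
          "man eats bat soup to become immune, sources say"]
labels = [1, 0, 2]                                       # disinfo, real, satire

vec = TfidfVectorizer(ngram_range=(1, 1), sublinear_tf=True)   # unigrams; (1, 2) adds bigrams
X = vec.fit_transform(tweets)

k = min(30, X.shape[1])                    # FS = 30 when enough terms exist
X_sel = SelectKBest(chi2, k=k).fit_transform(X, labels)
print(X.shape, "->", X_sel.shape)
```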

Fig. 9 Generating n-grams: unigrams (top-left), bigrams (bottom-left), trigrams (top-right), and quad-grams (bottom-right)

4.3 Computational results using Iterative Feature Selection

Apart from the text data (now a TF-IDF sparse matrix), the other nine feature sets (containing 31 features in total) under the stylistic and social categories described in Table 2 have been explored in order to further improve the overall performance of the classification models. The handcrafted features are divided into nine categories, F2 to F10: quantity, complexity, uncertainty, sentiment, subjectivity, diversity, informality, additional and popularity. The feature sets F2-F10 have been run sequentially, adding the next feature set in each iteration, and the performance of the models has been checked over all the numeric features iteratively. Note that SVM1 denotes the SVM model with an RBF kernel, while SVM2 denotes a linear kernel. Finally, the TF-IDF frequency matrix (F1) is also combined with F2-F10; the DataFrameMapper functionality in Python has been used to combine the transformed text with the numeric features. Furthermore, to check the importance of features, the ANOVA (analysis of variance) test has been applied to each feature, as introduced in the problem statement [24]. Other tests such as the chi-square test are not suitable in this scenario since they work better with two categorical variables, and a t-test is also not suitable as there are more than two categories (labels). Essentially, the collected data has one categorical independent variable, i.e., the 'label' (disinformation, real and satire), and a set of quantitative dependent variables, i.e., the features listed in Table 7. ANOVA is used to find the association between the categorical variable and the other variables [54]. In the experiments, ANOVA has been applied separately to each feature, assuming that there is no interaction between classes, and the p-value is calculated for each dependent variable (one feature at a time). For a particular feature, if the test gives a p-value less than 0.05, there exists a strong association between the two variables [15, 32]; hence, the null hypothesis Ho (as stated in the problem statement) is rejected and the alternative hypothesis Ha is accepted, implying that the label type has a significant effect on the respective feature. The p-values obtained are shown in Table 8. For example, the p-value obtained for the feature '#characters' is 0.000004, which is much less than 0.05; thus, the feature is important for classification. The selected features are retweets, replies, word_count, unique_word_count, character_count, positive_count, negative_count, emotion_count, subjectivity and mean_word_length, since the obtained p-value is less than or equal to 0.05 (the lower, the better) for these features in the Fak_Cov dataset. A sketch of the per-feature test is given below.
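A minimal sketch of the per-feature one-way ANOVA on a hypothetical feature column is given below; p-values at or below 0.05 lead to the feature being retained.

```python
# Minimal sketch: one-way ANOVA on a hypothetical feature column. The feature
# values are grouped by label and f_oneway returns the F statistic and p-value;
# p <= 0.05 rejects the null hypothesis and the feature is retained.
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "label": np.repeat([0, 1, 2], 50),                    # real, disinfo, satire
    "char_count": np.concatenate([rng.normal(140, 20, 50),
                                  rng.normal(210, 25, 50),
                                  rng.normal(160, 30, 50)]),
})

groups = [g["char_count"].to_numpy() for _, g in df.groupby("label")]
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")
```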

Table 7 Variables in ANOVA test
Table 8 Feature Importance using ANOVA test

Tables 9, 10, 11 and 12 show the results obtained with the respective feature sets. The results show that:

  • Non-latent features outperformed latent ones: Latent textual features (text embeddings), which give a vector for each tweet, have been experimented with. Non-latent features (n-grams with TF-IDF) achieved a maximum accuracy of 98.24%, which is far better than the 86.76% accuracy achieved with the latent ones.

  • Combined features of multiple perspectives outperformed single-level features: Combining features across different sets enhances the performance compared to using each feature set separately. The results indicate that introducing F9, the additional feature set (the number of hashtags, mentions, stopwords and URLs), increased the accuracy and F1 score of the models strongly, from 0.74 to 0.90 with the Ensemble classifier. Finally, the performance of the models improved further when features from a different perspective (i.e., user engagement features) were added.

  • Relative frequency vector matrix of text performed better than other feature groups: The performance obtained using the frequency features is better than that of the other feature sets in terms of accuracy and F1 score. Thus, the relative frequency vector features played the strongest role in differentiating disinformation from satirical and real tweets.

  • Random forest performed acceptably well with a smaller number of features: Random Forest (RF) performed acceptably well throughout the experiments, even with a small number of features.

  • As the number of features increases, the proposed Ensemble classifier outperformed the other classifiers. Moreover, an ANOVA test was conducted to select important features, which further improved the performance.

The classification models RF, Ensemble and MLP are clearly best suited as predictive models. Notably, the sentiment feature set (F5) contains negative values; hence, MNB could not be used in further experiments. Furthermore, additional experiments have been conducted using CNN, BERT and RNN-LSTM on the corpus, with accuracies of 40.86%, 56.25% and 46.77% respectively. Evidently, the machine learning classifiers outperformed the deep learning models on the Fak_Cov corpus. Overall, Ensemble and MLP were found to be more suitable for this problem. Hence, these models are useful in automating the process of flagging data that has a high probability of being fake.

Table 9 Using TF-IDF
Table 10 Using iterative feature engineering
Table 11 Using TF-IDF with Selected features (Data Mapper)
Table 12 Using word embeddings

5 Benchmark observations and visualization

  • Ahmed et al. used machine learning with n-grams and achieved the highest accuracy of 92% using unigrams with a linear SVM classifier [1]. In comparison, our model achieved an accuracy of 98.24% on the multiclass Fak_Cov corpus with unigram TF-IDF using a linear SVM classifier.

  • Style-based features have mainly been trained on supervised machine learning classifiers. Rosas et al. relied on SVM to achieve an accuracy of 74% using combined features such as n-grams, LIWC, readability and punctuation [41]. We followed their methodology on our dataset, but these features did not perform well. In contrast, our model achieved 90% accuracy with the Ensemble model by considering further style-based feature sets such as quantity, complexity, uncertainty, sentiment and subjectivity.

  • Shu et al. proposed a tri-relationship fake news detection framework and achieved an accuracy of 89.3% on the PolitiFact and Buzzfeed datasets with publisher bias, news stance and user engagement features [50]. Their methodology showed similar results on our dataset, whereas the proposed model, using an integrated approach that combines ten feature sets with a total of 31 features, achieved the highest accuracy of 99% with the Ensemble.

  • Moreover, Reis et al. explored the importance of features for the automatic detection of false news using the chi-square test and achieved their best accuracy of 86% with XGB [45]. The authors explored style-based as well as user engagement features. We employed their methodology on our dataset, but better results were achieved with our proposed model, i.e., an accuracy of 94% using Ensemble and RF. When the selected style-based and social features were used along with TF-IDF n-grams, accuracies of 99% and 98% were obtained using Ensemble and MLP respectively.

  • Zhou et al. explored various non-latent style-based features in combination and obtained 84.5% accuracy [61]. The authors also explored latent features, namely word2vec and doc2vec, with accuracies of 68.8% and 69.8% respectively. In comparison, the proposed model in this paper achieved 86% accuracy with GloVe word embeddings over the Ensemble classifier, and likewise shows better results with non-latent features than with latent ones.

  • Khan et al. obtained 95% accuracy with GloVe embeddings using Bi-LSTM on a combined corpus of the LIAR and Fake or Real datasets [31]. Our dataset, by contrast, achieved much lower accuracy with GloVe using LSTM; the reason is the unexplored domain and the limited dataset. However, an accuracy of 86.67% was obtained using the proposed Ensemble model, and 88.76% by changing the classifiers in the Ensemble to further boost the accuracy.

In this paper, we applied the dimensionality reduction techniques PCA (principal component analysis) and t-SNE (t-distributed Stochastic Neighbor Embedding) to plot the observations in two dimensions [14]. Figure 10 shows the visualization of fake, real and satire vectors in two-dimensional space using the PCA and t-SNE techniques. Satire and disinformation are both kinds of deceptive content that differ only in intention; accordingly, the graphs show that satire and disinformation are not perfectly separated, while real news is well segregated from the other two categories. Overall, all features combined improve the accuracy of the framework, and the models XGB, Ensemble and MLP outperformed the others. Moreover, a slight increase in the accuracy of the classifiers is obtained after feature selection using the ANOVA test. The developed corpus analyzed in this paper is multi-class (disinfo, satire and real) and focuses on a current sensitive issue during the critical times of Covid-19; therefore, a number of experiments have been implemented in order to find a suitable model in such a scenario. The F1 score has also been computed since it is the harmonic mean of recall and precision, and Cohen's kappa score was used as a multi-class metric to measure the degree of agreement between the actual and the predicted values. In our experiments, the classifiers obtained good kappa scores (nearly equal to 1, which shows strong agreement). A sketch of the two-dimensional visualization is given below.
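A minimal sketch of this visualization, assuming a placeholder feature matrix and labels, is given below.

```python
# Minimal sketch: project placeholder tweet vectors to two dimensions with PCA
# and t-SNE and scatter-plot them coloured by class.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 31))                 # placeholder tweet feature vectors
y = rng.integers(0, 3, size=300)               # 0 = real, 1 = disinfo, 2 = satire

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=7).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (X_pca, X_tsne), ("PCA", "t-SNE")):
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap="viridis", s=10)
    ax.set_title(title)
plt.show()
```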

Fig. 10 Visualization of tweet articles from the Fak_Cov corpus (a) using PCA and (b) using t-SNE

Furthermore, the Ensemble model obtains the highest accuracy with the selected features (Table 10). In order to gain insights into the way the model works, we interpreted the Ensemble classifier. The confusion matrix (Table 13) of the Ensemble model shows that a total of 10 records were misclassified, of which 8 satirical news items were misclassified as disinformation (as shown in Fig. 11). As per the literature, satirical content is considerably similar to fake content. Hence, all misclassified satirical news items were manually analyzed to understand why they were misclassified. Each misclassified satire article consists of text accompanied by an image, but in this work only textual features have been explored. For example, the article “Pakistanis eat balls to counter Corona Virus which originated from Bats” looks fake, but the image shown in Fig. 12 reveals the humorous intent behind the news. These types of errors will always occur, since there can be articles that truly belong to two or more categories at the same time.

Table 13 Confusion Matrix
Fig. 11 Ten misclassified articles by Ensemble classifier

Fig. 12 Misclassified satirical news extracted manually from Twitter

5.1 Comprehensive model test

The model has been trained on the Fak_Cov corpus, which was collected over the period when the event happened, i.e., 29th March 2020 to 14th April 2020, using different related keywords and labelled as real, disinformation and satire. In addition to the training dataset, new articles may appear on the web. The classification model provides the conditional probability of an article being fake; therefore, we assign a score in the range 1 to 5 based on the obtained probability. A score of 1 indicates a very low possibility of being fake, while a score of 5 means a high possibility that the article is fake. Articles that are not related to such a sensitive issue as communal riots in a country will, however, tend to be misclassified; for an article clearly related to this sensitive national issue, the conditional probability of belonging to either the fake or the real class is correspondingly high. Three of the recent articles randomly scraped from Twitter are shown in Table 14, and their conditional probability vectors were examined. First, articles clearly related to this event were tested using the Ensemble classifier, and then articles that are not related to this particular event. The model predicted the first related article (as shown in Table 14) as disinformation with a high conditional probability, so a high score of 5 was assigned, whereas an article predicted as disinformation with a lower conditional probability was assigned a score of 3. This multilabel approach to scoring articles satisfies our contribution C5; a sketch of the scoring step is given below. The model does not perform well on data that is not related to this event, which indicates that the research should be treated as a foundational study for future work in this domain.
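A minimal sketch of the probability-to-score mapping is given below; the equal-width bin edges are an assumption, as the paper only states that the probability is scaled from 1 to 5.

```python
# Minimal sketch of the 1-5 scoring step: the classifier's conditional
# probability of the disinformation class is binned into five levels.
# The equal-width bin edges below are an assumption.
import numpy as np

def fakeness_score(p_disinfo: float) -> int:
    """Map P(disinformation | tweet) to a 1 (very unlikely) .. 5 (very likely) score."""
    edges = [0.2, 0.4, 0.6, 0.8]
    return int(np.digitize(p_disinfo, edges)) + 1

# e.g. probabilities taken from ensemble.predict_proba(X_new)[:, disinfo_column]
for p in (0.05, 0.35, 0.62, 0.93):
    print(p, "->", fakeness_score(p))
```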

Table 14 Random tweets tested on Ensemble using selected features

6 Discussion

6.1 Case study: Tablighi Jamaat narrative

Tablighi Jamaat is a global religious organization that encourages members of a specific community to practise their religion. The Tablighi Jamaat congregation held in mid-March at Delhi's Nizamuddin mosque contributed significantly to the increase in coronavirus cases in India. On 24th March 2020, a nationwide lockdown was imposed at the instance of Prime Minister Narendra Modi, and entering the premises of the mosque was strictly prohibited during the lockdown. However, on 30th March, the Tablighi Jamaat gathering at Delhi's Nizamuddin mosque was discovered, and more than 100 positive cases of coronavirus arose from it. That was not the end of it: while the whole world was suffering from Covid-19, India was also contending with two different communal narratives on social media platforms [19]. The first narrative presented the event as an irresponsible act by a group of people, whereas the second narrative framed it as propaganda by a specific religious community with the intent to spread coronavirus in India. Social media accounts peddling hate and fake news were behind the shift in the narrative, and it has been observed that western media also communalized the event to carry out cyber warfare. The hashtags in Table 15 were used to peddle the communal narrative, and Fig. 13 shows the popularity of these hashtags over time during the Tablighi Jamaat event. The hashtags used by social media users clearly implied that the event was not a mistake but rather a well-planned, hidden-agenda event carried out purposely to increase coronavirus cases in India. A few tweets that propagated disinformation are shown in Fig. 14; accounts like Rosy @rosyk01 have been suspended by Twitter for violating the Twitter Rules. Twitter also removed offensive content linking Covid-19 to communalism under the offensive hashtags #coronaJihad and #islamophobia after a public interest litigation (PIL) was filed in India. The actions taken by Twitter to control misinformation are welcome, because the platform is widely used as a source of information.

Fig. 13 Popular hashtags and interest over time

6.2 Limitation

In general, the performance of models is not dataset invariant, and it is quite hard to find a single model that works for all datasets. One of the main challenges of this study is the limited data size; we do not claim that our dataset is representative of the whole Twitter population. However, the strength of this analysis lies in the proposed framework, which is grounded in fundamental theories to gain a better understanding of human behaviour patterns. Furthermore, deep learning models such as CNN, RNN-LSTM and BERT have also been tested but did not perform well due to the limited dataset. In future, the current framework will be tested on a larger corpus related to national crises caused by disinformation in order to find the most acceptable model.

Table 15 Popular hashtags inciting hate on Twitter
Fig. 14 Screenshots of tweets spreading fake information to shift to an anti-Muslim narrative

7 Conclusion

This paper aims to provide a comprehensive model for a check-worthy topic that led to a cyberwar in the form of communal riots. We have developed a multiclass corpus named Fak_Cov by crawling data from Twitter while the event was unfolding. Expert annotation has been carried out, considering both authenticity and intention, to label the dataset into three classes: real, disinformation and satire. Essentially, the extracted features have been inspired by well-established social science theories, which encourages interdisciplinary research on fake news detection. An integrated approach using thirty-one features from multiple perspectives, namely style-based and propagation-based (particularly user engagement), has been proposed. Experimental results based on the developed real-world corpus indicate that: (1) style-based features perform acceptably well in distinguishing disinformation from satire and real news, with a best accuracy of 90% for the Ensemble classifier when all style-based features are combined iteratively; (2) the integrated features from multiple perspectives slightly improve the accuracy and F1 score of all the classifiers; and (3) the ANOVA test conducted for feature selection further improved the performance, with the highest accuracy of 94% obtained from the Ensemble and MLP models, so the selected features play a strong role in detecting disinformation, satire and real news in the corpus. Overall, the model achieved its best performance of 99% and 98% accuracy with the relative frequency vector features over the Ensemble and MLP models respectively. To systematically uncover further patterns in disinformation compared to real and satire content, one has to employ (1) more fundamental theories to extract more effective features, such as the spreader's information about followers, account details, etc.; (2) experimental analysis on larger datasets related to this domain; and (3) other forms of data apart from text, such as audio and video, since few studies have considered multimedia content in developing fake news detection models. Deep learning models have not performed well with the current dataset and will therefore be tested again on the larger corpus. All the aforementioned points will be part of our future work. Correspondingly, the domain of national crises caused by disinformation has not been covered yet; thus, addressing it will contribute to building a comprehensive cross-domain fake news detection model. Finally, a comprehensive fake news detection model trained on a large corpus can assist Twitter in flagging misleading tweets according to the probability score obtained.