1 Introduction

Various recent events, such as the COVID-19 pandemic or the 2019 European elections, have been marked by discussions about the potential consequences of the massive spread of misinformation, disinformation, and so-called “fake news.” Scholars and experts argue that fears of manipulated elections can undermine trust in democracy, increase polarization, and influence citizens’ attitudes and behaviors (Benkler et al. 2018; Tucker et al. 2018). This has led to an increase in scholarly work on disinformation, from fewer than 400 scientific articles per year before 2016 to about 1,500 articles in 2019.

One initial challenge for this field of research is the definition and conceptualization of the phenomenon. Researchers have discussed different terms, including misinformation (non-intentional deception), disinformation (intentional deception), and mal-information (harmful content) (Wardle and Derakhshan 2017). Research has often examined disinformation in particular, as it is relevant from the perspective of democratic theory and can have serious societal consequences. The term refers to fabricated news reports and decontextualized information, but is sometimes also used in connection with similar concepts, such as conspiracy theories, propaganda, or rumors (Freelon and Wells 2020; Tandoc et al. 2018). However, empirical research often fails to determine clearly whether false information was disseminated deliberately or inadvertently (Freelon and Wells 2020). Furthermore, it has been argued that the term “fake news” is politicized and used for different purposes both in scholarly articles and in the news (Quandt et al. 2019). Thus, Egelhofer and Lecheler (2019) suggest differentiating between the “fake news label,” used by politicians, for example, to discredit news media, and the “fake news genre” (e.g. fabricated news reports).

Recent communication research on disinformation has mainly dealt with online and social media environments. Researchers argue that although the phenomenon is not new, it is precisely in these environments that disinformation spreads massively, because it can easily be disseminated by users. Thus, a large audience can be reached and possibly manipulated (Miller and Vaccari 2020; Vosoughi et al. 2018).

2 Common Research Designs and Combination of Methods

The concept of disinformation is studied across various disciplines, e.g. the social sciences, computer science, medicine, and law. Within the social sciences, surveys and experiments have dominated in recent years, presumably because of the societal need to answer urgent questions regarding exposure (Allcott and Gentzkow 2017; Grinberg et al. 2019; Guess et al. 2020), concerns (European Commission 2018; Jang and Kim 2018), or digital literacy and the ability to recognize disinformation (Pennycook et al. 2018; Roozenbeek and van der Linden 2019; Vraga and Tully 2019). Moreover, scholars have frequently investigated connected concepts such as knowledge (Amazeen and Bucy 2019), traits and beliefs (Anthony and Moulding 2019; Petersen et al. 2018), or credibility of and trust in the news media and public actors (Newman et al. 2018; Zimmermann and Kohring 2020). Content analysis is used less frequently, and studies conducting content analyses mostly use automated approaches or mixed-methods designs (Amazeen et al. 2019; Chadwick et al. 2018; Grinberg et al. 2019; Guess et al. 2019). These studies link survey data to digital trace data in order to examine who is exposed to disinformation, which users interact with disinformation, and for what reasons. For example, Guess et al. (2019) link a representative survey to behavioral data on Facebook and identify age and political ideology as relevant factors explaining the willingness to share mis- and disinformation. Such approaches go beyond self-reported activities on social media but pose challenges in terms of handling and storing personal data. Beyond these methodological approaches, research frequently focuses on samples from the U.S. and analyzes social media platforms such as Twitter and Facebook (Allcott et al. 2019; Bovet and Makse 2019; Grinberg et al. 2019; Guess et al. 2019; Ross and Rivers 2018). It should be noted that these samples often result from the limited data access granted by platforms (Bruns 2019; Freelon 2018; Puschmann 2019) and that user data from other relevant channels, such as messenger services (e.g. WhatsApp), are difficult for researchers to access (Rossini et al. 2020). Moreover, research on disinformation mostly focuses on specific issues and real-world events, such as election campaigns or times of crisis. This poses challenges for the comparability and replication of studies.

3 Main Constructs

Despite the increasing interest in the subject, this is a rather young field of research. Therefore, the following research areas and analytical constructs should be understood as a snapshot of recent years, not as an overview of saturated research fields. The various approaches involving content analysis can be summarized in five areas, which are neither exhaustive nor mutually exclusive. In these studies, the identification and operationalization of disinformation is a crucial part that often poses challenges. The identification of disinformation is of great interest since it is relevant for research in two ways: the detection of disinformation for sampling purposes, and automated detection as an object of research in its own right. Since these two objectives may overlap in the future, a distinction is made between manual (see 1) and automated (see 2) identification of disinformation.

1. Manual detection of disinformation: To manually detect and operationalize disinformation, research mainly follows two approaches, which Li (2020, p. 126) labels as story-based or source-based. Studies focusing on sources are not primarily concerned with the falseness of information, but with the producers and publishers of false messages (Grinberg et al. 2019). In this context, several authors have argued that alternative right-wing media are potential disseminators of disinformation (Figenschou and Ihlebæk 2019; Post 2019). So far, research has predominantly focused on the U.S. (Guess et al. 2018; Lazer et al. 2018; Nelson and Taneja 2018). To foster comparative research from a territorial (national information environments) as well as a temporal (ephemerality of certain alternative media sites) perspective, key dimensions and typologies of alternative media need to be established (Frischlich et al. 2020; Holt et al. 2019).

The story-based approach to identifying disinformation uses single false stories or lists of false claims (Allcott and Gentzkow 2017; Humprecht 2019) published by fact-checking websites (e.g. snopes.com, politifact.com, factcheck.org) or news media (e.g. The Guardian, The Washington Post, BuzzFeed). Studies in this area (Al-Rawi et al. 2019; Ferrara 2020; Graham et al. 2020; Graham and Keller 2020; Hindman and Barash 2018; Metaxas and Finn 2019) analyze content on Twitter using hashtags referring to specific false claims (#pizzagate), events (#covid19), or a combination of both (#australianbushfires, #ArsonEmergency). A minimal sketch of how both sampling strategies can be operationalized follows below.
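To make the two sampling strategies concrete, the following Python sketch flags posts either because their publishing domain appears on a source list or because they reference hashtags tied to known false claims. The domain list, hashtag list, and example posts are invented placeholders, not data from the studies cited above; real projects would compile such lists from fact-checking organizations or prior research.

```python
from urllib.parse import urlparse

# Hypothetical source list; real studies compile such lists from fact-checking
# organizations or prior research (e.g. Grinberg et al. 2019).
LOW_CREDIBILITY_DOMAINS = {"example-fakenews.com", "made-up-daily.net"}

# Hypothetical hashtags tied to specific debunked claims or events.
FALSE_CLAIM_HASHTAGS = {"#pizzagate", "#arsonemergency"}

def source_based_match(url: str) -> bool:
    """Source-based approach: flag items published by a listed source."""
    domain = urlparse(url).netloc.lower()
    domain = domain.removeprefix("www.")  # requires Python 3.9+
    return domain in LOW_CREDIBILITY_DOMAINS

def story_based_match(text: str) -> bool:
    """Story-based approach: flag items referencing a known false claim."""
    tokens = {token.lower() for token in text.split()}
    return bool(tokens & FALSE_CLAIM_HASHTAGS)

posts = [
    {"url": "https://www.example-fakenews.com/article-1", "text": "Shocking!"},
    {"url": "https://news.example.org/report", "text": "A thread on #ArsonEmergency"},
]
for post in posts:
    flagged = source_based_match(post["url"]) or story_based_match(post["text"])
    print(post["url"], "->", "flagged" if flagged else "not flagged")
```

In practice, both matchers would require manual validation, since source lists age quickly and claim-related hashtags are also used by debunkers.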

2. Automated detection of disinformation: Besides the manual detection of disinformation, automated approaches are frequently applied. Automated detection approaches are superior to manual approaches in terms of capacity, veracity, and ephemerality (Zhang and Ghorbani 2020). They also allow for the identification of a multitude of possible actors as well as human and non-human distributors (e.g. social bots; Shao et al. 2018).

According to the extensive overview by Zhang and Ghorbani (2020, p. 11), state-of-the-art detection approaches are component-based (creator, content, and context analysis), data-mining-based (supervised learning: deep learning, machine learning; unsupervised learning), implementation-based (online, offline), and platform-based (social media, other online news platforms) methods. The authors point to current challenges and needed future work in the field of unsupervised learning, such as (i) cluster analysis to identify homogeneous content and authors, (ii) outlier analysis of abnormal object behavior, (iii) embedding technologies from natural language processing as an important component of the detection process (Word2vec, FastText, Sent2vec, Doc2vec), and (iv) semantic similarity analysis to detect near-duplicate content (Zhang and Ghorbani 2020, pp. 19–21). The latter is especially relevant for identifying decontextualized information; a sketch of such a similarity check follows below.
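As an illustration of point (iv), the sketch below flags near-duplicate texts. It uses TF-IDF character n-grams with cosine similarity as a simple, self-contained stand-in for the embedding models named above (Word2vec, Doc2vec, etc.); the toy corpus and the 0.8 threshold are assumptions for demonstration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: the second text is a lightly reworded near-duplicate of the
# first, as might result from decontextualized re-publication.
docs = [
    "Officials confirm the bridge was closed after the storm.",
    "The bridge was closed after the storm, officials confirm.",
    "Vaccination rates rose sharply in the capital last year.",
]

# Character n-grams are robust to small edits and word reordering.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
tfidf = vectorizer.fit_transform(docs)
sim = cosine_similarity(tfidf)

THRESHOLD = 0.8  # hypothetical cut-off; would need empirical validation

for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] >= THRESHOLD:
            print(f"near-duplicate pair ({i}, {j}), similarity {sim[i, j]:.2f}")
```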

3. Dissemination of disinformation: Exploring the dissemination and spread of false messages represents another strand of research, especially in the social sciences. In research on the dissemination of disinformation, three main foci can be distinguished: 1) diffusion, 2) amplification, and 3) strategy. (1) Research focusing on diffusion examines the spread of false messages and focuses on the amount of user interaction with “fake sources” or “fake stories” over time (Allcott et al. 2019); a minimal sketch of such a diffusion measurement follows this paragraph. In addition, research explores what types of content users share across different countries (Bradshaw et al. 2020; Marchal et al. 2020; Neudert et al. 2019). To further investigate the origin and variation of content, studies have compared the original sender (incl. embedded links, e.g. from alternative media sites) and modifications of the content of false and real stories with evolutionary tree analysis (Jang et al. 2018), or analyzed whether initial publications reappear and are modified over time using time series analysis and text similarity (Shin et al. 2018). Moreover, studies using content or network analysis frequently analyze the extent and reach of disinformation. These studies identify an increased potential exposure to disinformation among ideologically homogeneous and polarized communities (Bessi et al. 2016; Del Vicario et al. 2016; Hjorth and Adler-Nissen 2019; Schmidt et al. 2018; Shin et al. 2017; Walter et al. 2020). (2) Regarding amplification, research investigates how dissemination processes can be driven or amplified by the news media. Research seems to agree on the agenda-setting power of false messages and alternative media sites (Rojecki and Meraz 2016; Tsfati et al. 2020), although some studies also put this influence into perspective (Vargo et al. 2018). (3) Research on strategic coordination, for example, investigates how rumors are actively turned into disinformation campaigns by applying document-driven, multi-site trace ethnography (Krafft and Donovan 2020). Another study, by Keller et al. (2020), focuses on actors and the identification and validation of astroturfing agents. Lukito (2020) takes a different approach and investigates the temporal coordination of a disinformation campaign, namely IRA activities, with time series analysis (2015–2017) across several social media platforms (Facebook, Twitter, Reddit).
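The following sketch illustrates the diffusion-style measurement described under (1): aggregating user interactions with flagged stories into a time series and a cumulative diffusion curve per story. The interaction log is fabricated for illustration, and the column names and monthly resolution are assumptions rather than conventions from the cited studies.

```python
import pandas as pd

# Fabricated interaction log: one row per user interaction (e.g. a share)
# with a story previously flagged as false.
interactions = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-03", "2020-01-15", "2020-02-02",
        "2020-02-20", "2020-02-21", "2020-03-05",
    ]),
    "story_id": ["s1", "s1", "s2", "s1", "s2", "s2"],
})

# Diffusion curve: interactions with flagged stories per month.
monthly = (
    interactions.set_index("timestamp")
    .resample("MS")  # aggregate by month start
    .size()
    .rename("interactions")
)
print(monthly)

# Cumulative diffusion per story, the quantity typically plotted over time.
interactions = interactions.sort_values("timestamp")
interactions["cum_interactions"] = interactions.groupby("story_id").cumcount() + 1
print(interactions)
```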

4. Content of disinformation: Another strand of research focuses on the content of disinformation. Studies in this area have been conducted in various fields, including political communication and health and science communication (Brennen et al. 2020; Wang et al. 2019). The aim of these studies, of which there are so far only a few, is to identify different types of disinformation. Thus, this research area is of crucial importance for the conceptualization of disinformation. The two studies presented here both find that online disinformation is not only a technology-driven phenomenon but is additionally defined by partisanship, identity politics, and national information environments (Humprecht 2019; Mourão and Robertson 2019). Humprecht (2019) qualitatively identifies different types of online disinformation and finds cross-national differences by quantitatively analyzing the topics, speakers, and target objects of fact-checked disinformation articles; a sketch of one possible topic analysis follows below. Mourão and Robertson (2019) investigate sensationalism, bias, clickbait, and misinformation on “fake news” sites and analyze which articles and components triggered engagement on social media.
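As one possible operationalization of such a quantitative topic analysis (not the specific procedure used by Humprecht 2019), the sketch below fits a small topic model on a toy corpus standing in for fact-checked disinformation articles. The corpus, the choice of two topics, and all parameters are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented mini-corpus standing in for fact-checked disinformation articles.
articles = [
    "election fraud ballots rigged voting machines",
    "vaccine microchip conspiracy hidden by officials",
    "ballots missing election recount fraud claims",
    "vaccine side effects hidden from health officials",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(articles)

# Two topics purely for demonstration; real studies choose and validate k.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {k}:", ", ".join(top_terms))
```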

5. User participation: Digital interactions of users on online and social media platforms can be investigated not only in terms of which content attracts attention and spreads (for example, via the number of shares/retweets), but also in terms of how people respond to certain messages by liking or commenting. Barfar (2019) analyzed emotions, incivility, and cognitive thinking in the comments on Facebook posts with true and false claims by means of automated text analysis (using the Linguistic Inquiry and Word Count dictionary; see Pennebaker et al. 2007); a sketch of this kind of dictionary-based scoring follows below. Additionally, a study by Introne et al. (2018) examines online discussions within specific issue-related forums on the website “abovetopsecret.” The authors applied discourse, narrative, and content analysis in order to investigate how false narratives are constructed.
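The sketch below shows the general logic of dictionary-based scoring as used in LIWC-style analyses. Since LIWC itself is proprietary, the two mini-dictionaries here are invented stand-ins; real studies would use the validated LIWC categories.

```python
import re
from collections import Counter

# LIWC is proprietary; these mini-dictionaries are invented stand-ins for
# categories such as negative emotion and incivility.
DICTIONARY = {
    "negative_emotion": {"angry", "hate", "terrible", "awful"},
    "incivility": {"idiot", "stupid", "liar"},
}

def score_comment(text: str) -> dict:
    """Return the relative frequency of each dictionary category in a comment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return {
        category: sum(counts[word] for word in words) / total
        for category, words in DICTIONARY.items()
    }

comments = [
    "This is awful, the author is a liar!",
    "Interesting article, thanks for sharing.",
]
for comment in comments:
    print(score_comment(comment), "<-", comment)
```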

4 Research Desiderata

Research on disinformation has increased considerably in recent years, but significant gaps remain, e.g. in terms of comparative research, dimensions and typologies, and methods.

Starting from the 2016 presidential election in the U.S., many studies have focused on the role of disinformation in the context of this election campaign. As a result, findings were generalized without considering the different political and media opportunity structures of individual countries. However, some notable exceptions show that significant differences exist between countries regarding various aspects of disinformation (Humprecht et al. 2020; Neudert et al. 2019). To enable comparative research, established criteria for the sampling of disinformation sources are needed, e.g. of alternative media that potentially disseminate disinformation.

An important contribution to the state of research would also be the investigation of key features of disinformation by means of content analyses, e.g. types and forms of presentation of disinformation. This would both enable reproducibility and strengthen the theoretical discourse (Freelon and Wells 2020).

Moreover, research has largely neglected the role of the news media in the dissemination of disinformation (Tsfati et al. 2020). It has been argued that news media act as multipliers of disinformation, e.g. by republishing social media posts of political actors. Another promising but not yet sufficiently researched aspect is the use of the term “fake news” by political actors to discredit the media. This is an important aspect against the background of increasing polarization and mistrust in the news media in many countries (Egelhofer and Lecheler 2019).

Methodologically, the detection of disinformation is probably the greatest challenge for current research. To be able to compare the extent of the spread of disinformation between different platforms and countries, established dictionaries and identifiers are needed. More importantly, researchers need better access to the data of platform operators. For future research, it would be desirable that platform operators cooperate with researchers and make the data available in a transparent manner so that the production and dissemination of disinformation can be scientifically traced.

To sum up, the field of research on disinformation offers great potential for content analysis research. Moreover, combinations of automated and manual content analysis could be very fruitful with regard to the research gaps mentioned above.

Relevant Variables in DOCA—Database of Variables for Content Analysis