COCO: an annotated Twitter dataset of COVID-19 conspiracy theories

Langguth, Johannes; Schroeder, Daniel Thilo; Filkuková, Petra; Brenner, Stefan; Phillips, Jesper; Pogorelov, Konstantin

doi:10.1007/s42001-023-00200-3

Dataset/Software
Open access
Published: 04 April 2023

Volume 6, pages 443–484, (2023)

Journal of Computational Social Science

Johannes Langguth¹,²,
Daniel Thilo Schroeder¹,⁵ (ORCID: orcid.org/0000-0003-0125-5243),
Petra Filkuková¹,
Stefan Brenner³,
Jesper Phillips⁴ &
…
Konstantin Pogorelov¹

Abstract

The COVID-19 pandemic has been accompanied by a surge of misinformation on social media which covered a wide range of different topics and contained many competing narratives, including conspiracy theories. To study such conspiracy theories, we created a dataset of 3495 tweets with manual labeling of the stance of each tweet w.r.t. 12 different conspiracy topics. The dataset thus contains almost 42,000 labels, each of which was determined by majority vote among three expert annotators. The tweets were selected from COVID-19 related Twitter data spanning from January 2020 to June 2021 using a list of 54 keywords. The dataset can be used to train machine learning based classifiers for both stance and topic detection, either individually or simultaneously. BERT was used successfully for the combined task. The dataset can also be used to further study the prevalence of different conspiracy narratives. To this end we qualitatively analyze the tweets, discussing the structure of conspiracy narratives that are frequently found in the dataset. Furthermore, we illustrate the interconnection between the conspiracy categories as well as the keywords.

Introduction

The COVID-19 pandemic severely affected the entire world, and consequently it dominated world news and social media throughout 2020 and 2021. Along with this media attention, an abundance of misinformation has swept through social media [1]. The pandemic has demonstrated the crucial role that misinformation plays when societies are faced with unfamiliar circumstances, and how highly implausible claims can have a dramatic real-world impact. While initially there was a great deal of genuine uncertainty about the origin of the virus, its effects, and the vaccines, with different experts supporting different assertions, a large number of ideas that are scientifically impossible or highly implausible were promoted by non-experts on social media and other channels. Many of these ideas took the form of conspiracy theories which provided easy explanations for the complex medical and societal events occurring during the COVID-19 pandemic, usually in the form that events happen due to the hidden influence of some prominent individual or group [2].

We use the common term conspiracy theories for narratives that consist of disproved or unproven accusations against any individual or any group perceived as powerful to give an explanation for impactful economic, cultural, social, political, or other events by utilizing claims of clandestine malevolent schemes [3, 4]. While beliefs in paranormal powers, supernatural entities, or pseudoscience may also be a part of these narratives, we focused on clandestine malevolent schemes to cluster related conspiracy theories into categories. The spreading of conspiracy theories increased substantially during the COVID-19 pandemic [5, 6] and they were among the most prominent misinformation phenomena during that time. For that reason, our dataset focuses on conspiracy theories. The narrower focus allows a more precise characterization of the content that was being spread.

To a large extent, misinformation such as conspiracy theories is ultimately inconsequential, but some of it has the potential to cause real-world harm, and due to the massive amount of social media content, it is essentially impossible to find all harmful misinformation manually. Thus, conventional fact-checking can typically only counteract misinformation narratives after they have gained significant traction. To provide warnings in advance, automated systems are needed. However, the automatic detection of misinformation narratives is very challenging. The texts that spread misinformation may be short messages on Twitter, and they often transmit misinformation by relying on context and implication rather than by stating counterfactual information outright. Satirical messages complicate the issue further.

To train automated systems, several different misinformation datasets have been released in the past. However, most have only true/false annotations, such as the ISOT dataset [7], or annotations on a scale from true to blatantly false, which is the case for the LIAR dataset [8]. Training on these datasets will not enable a machine learning model to distinguish between different misinformation narratives. This distinction is important because during the COVID-19 pandemic, many different misinformation narratives were promoted on social networks, some of them being related and some contradicting each other. To train machine learning classifiers to distinguish between narratives, we created a quality-controlled, human-annotated dataset of 3495 annotated tweets which we present in this paper. We created a total of 12 categories of conspiracy theories and labeled each tweet as belonging to one of three classes in each category for a total of 41,940 labels.

Furthermore, we give a detailed qualitative and quantitative description of the contents of the dataset and the resulting conclusions on misinformation during the COVID-19 pandemic. Due to the complexity of the multi-category annotation, understanding the contents can be helpful for further evaluation of the results of natural language processing (NLP) systems. While the primary purpose of the dataset is to train NLP models capable of detecting stances and distinguishing topics, it can also be used for further quantitative and qualitative investigation of misinformation narratives.

Dataset creation

The dataset was created in a multi-stage process, starting with the raw dataset which was created by collecting a large number of tweets related to the COVID-19 pandemic from Twitter between January 17, 2020 and June 30, 2021. We used the Twitter search API via our custom distributed Twitter scraping framework called FACT [9] and targeted COVID-19 related keywords. The list of keywords is given in the Appendix. Note that this data collection is a long-running project. Therefore, the collection was not specifically geared towards the dataset described in this paper. The raw dataset contains approximately two billion statuses (i.e. tweets, retweets, quotes, and replies). We first removed retweets, quotes, and replies, leaving over 400 million tweets.

Tweet selection

Since conspiracy tweets are not particularly frequent, random sampling of the data would result in a very low number of such tweets as the number of tweets that can be labeled manually is limited. To avoid that, we use a list of keywords related to conspiracy theories and perform a text search. During the COVID-19 pandemic we observed misinformation trends and developed the list. Some keywords were chosen based on previous knowledge of conspiracy topics [10, 11], while others were added because they were widely discussed, and a few were discovered in other misinformation tweets and subsequently added to the list [12].

This second list of keywords is also given in the Appendix. By applying it as a filter, we narrowed the selection to slightly more than one million tweets. We then removed tweets that contain hyperlinks. This was done because the tweet set was used during the MediaEval multimedia evaluation challenge in 2021 [13], where using the links could distract from the goal of the challenge, i.e. natural language understanding via machine learning systems.

Hyperlinks can be very valuable for understanding the intent of a tweet, and this technique has frequently been used in previous work [14, 15]. However, our goal is to work towards the understanding of language rather than links. Furthermore, we feel that focusing on tweets containing links may introduce bias in the analysis since links generally represent information that the users saw elsewhere, while tweet texts represent information that the users formulated themselves, even though their ideas may have been influenced by other sources. Investigating such text allows a much clearer view of the evolution of narratives over time. About half of the selected tweets contained no links.
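The two filtering steps described above (keyword text search, then removal of tweets with hyperlinks) can be sketched as follows. This is a minimal illustration and not the FACT implementation: the three keywords are a hypothetical subset of the Appendix list, and both the case-insensitive substring match and the URL pattern are assumptions.

```python
import re

# Hypothetical subset of the conspiracy keyword list; the full list is in the Appendix.
KEYWORDS = ["plandemic", "population control", "qanon"]

# Simple hyperlink pattern; an assumption, not the rule used by the authors.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def matches_keyword(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def contains_link(text):
    """Return True if the text contains something that looks like a hyperlink."""
    return URL_PATTERN.search(text) is not None


def select_candidates(tweets, keywords=KEYWORDS):
    """Keep tweets that mention at least one keyword and contain no hyperlink."""
    return [t for t in tweets if matches_keyword(t, keywords) and not contains_link(t)]
```

Applied to a list of tweet texts, `select_candidates` returns only the keyword-matching tweets without links, which corresponds to the Contain no link stage of Table 1.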

Table 1 Number of Tweets in the different dataset preparation stages

For the remaining tweets, we attempted to resolve the self-reported location of the tweet authors. Location can be highly useful, especially since many tweets refer to the politics of the country of the author. We made use of a system to resolve locations from previous work [12]. The system is described in the Appendix. We then removed the tweets for which the location could not be resolved. This again cut the number of tweets approximately by half. Among those, we selected tweets that have a high number of characters, since inferring narratives from very short tweets is impossible. This left a set of about 100,000 tweets. Finally, we randomly selected 3495 tweets and performed the manual labeling. The selection was done in a way that ensures that a constant proportion of the tweets was selected from every day in the dataset, to ensure an even distribution and to account for the fact that the daily number of COVID-19 related tweets was much higher in Spring 2020 than during the later stages of the pandemic. Table 1 shows the exact numbers for each step.
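The final sampling step, which draws a constant proportion of tweets from every day, might look like the sketch below. The function name, the `(day, text)` input format, the fixed seed, and the use of `round` for the per-day sample size are all assumptions made for illustration; the paper does not specify these details.

```python
import random
from collections import defaultdict


def sample_constant_daily_fraction(tweets, fraction, seed=0):
    """Draw roughly `fraction` of the tweets from every day.

    tweets: list of (day, text) pairs, where day is any sortable key such as '2020-03-14'.
    Returns a list of (day, text) pairs.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(text)

    sample = []
    for day in sorted(by_day):
        pool = by_day[day]
        k = round(len(pool) * fraction)  # per-day sample size (assumed rounding rule)
        sample.extend((day, text) for text in rng.sample(pool, k))
    return sample
```

Because the sample size scales with the daily volume, busy periods such as Spring 2020 contribute proportionally more tweets than quieter ones.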

Manual labeling

We created 12 labels, one for each category of conspiracy theory. The categories are described in the section "Categories". The labeling was performed by a diverse group of staff scientists, postdocs, and graduate students in computer science, media studies, and psychology. Since many tweets constitute corner cases and are difficult to label, we ensured reliability of the dataset by having three separate annotators for each tweet. Annotators were issued an initial description of the categories, which is contained in the Appendix. The annotators also met regularly to discuss their understanding of the categories.

Each label is the result of a majority vote among the three annotators. In case of a triple disagreement, which can happen since there are three annotators and three classes, the project leader broke the tie. Thus, the dataset was created using 36 human annotations per tweet for a total of 125,820, with the final dataset having 41,940 consolidated annotations.
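The consolidation rule and the annotation counts above can be checked with a short sketch. The class names as strings and the `tie_breaker` argument (standing in for the project leader's decision) are illustrative assumptions, not part of the annotation tool.

```python
from collections import Counter

CLASSES = ("unrelated", "related", "conspiracy")


def consolidate(votes, tie_breaker=None):
    """Return the majority class among three votes.

    If all three annotators disagree, fall back to `tie_breaker`,
    which stands in for the project leader's decision.
    """
    if len(votes) != 3 or any(v not in CLASSES for v in votes):
        raise ValueError("expected exactly three votes from CLASSES")
    label, count = Counter(votes).most_common(1)[0]
    if count >= 2:
        return label
    if tie_breaker is None:
        raise ValueError("triple disagreement needs a tie-breaking decision")
    return tie_breaker


# Annotation counts stated in the text.
TWEETS, CATEGORIES, ANNOTATORS = 3495, 12, 3
raw_annotations = TWEETS * CATEGORIES * ANNOTATORS  # 36 annotations per tweet
consolidated_labels = TWEETS * CATEGORIES           # one label per tweet and category
```

With these numbers, `raw_annotations` is 125,820 and `consolidated_labels` is 41,940, matching the totals given above.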

The inter-annotator agreement was 92.27% on average, varying between 98.11% and 85.61% for all categories except for the catchall category Other conspiracy theory where it was 75.85%. Because there are twelve categories, disagreement on at least one of them was quite frequent, occurring in 55.68% of all tweets. Inter-annotator agreement for each category is listed in Table 2.

We used a custom web-based annotation tool to make the labeling as efficient as possible. The tool also handled multiple annotations and voting automatically. No additional information, e.g. other tweets by the same user, was taken into account during the labeling. The reason for this is to ensure the usability of the dataset to train NLP systems based on the available text and labels alone.

Classes

We created ten different categories of conspiracy topics, which are described in the section "Categories". In addition, we defined two unspecified categories to label other conspiracy theories and other misinformation. For each of the 12 categories, a tweet is labeled as belonging to one of the following three classes. Thus, every tweet has 12 separate labels which can be one of the following:

1. Unrelated: The tweet is not related to that particular category. Such tweets contain conspiracy related keywords, but use them in a completely different context.
2. Related (but not supporting): The tweet is related to that particular category, but does not actually promote the misinformation or conspiracy theory. Typically the authors of such tweets point out that others believe in the misinformation.
3. Conspiracy (related and supporting): The tweet is related to that particular category, and it is spreading the conspiracy theory. This requires that the author gives the impression of at least partially believing the presented ideas. This can be expressed as a statement of fact, but also in other ways such as by using suggestive questions. It also includes tweets that present claims which are impossible or highly unlikely, such as microchips contained in vaccines, as uncertain but possible or likely.

Since our focus lies on detecting intentions contained in the wording, we do not consider the Related (but not supporting) class to be spreading misinformation. Of course, based on the mere exposure effect [16, 17], which implies that even talking about misinformation can make it more likely for people to believe in it, a different definition is possible. Under that definition, the task of detecting spreaders of misinformation would be far easier for natural language processing systems, since intention would not be relevant to the classification. However, to identify e.g. spreaders of disinformation, intention is important and thus it is essential to distinguish between the Related and Conspiracy classes.

While each tweet has a label in each category, in the following, we also classify entire tweets as this allows better descriptive statistics of the dataset. We consider a tweet to be a conspiracy tweet if at least one of the categories was labeled as conspiracy for it. Tweets that have no conspiracy label are considered related if at least one of the categories was labeled as related. Thus, a tweet is classified as unrelated only if it was labeled as unrelated for all twelve categories.
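The tweet-level rule described above amounts to taking the strongest of the twelve per-category labels. A minimal sketch, assuming the labels of one tweet are available as a list of class-name strings (the string encoding is an assumption):

```python
def overall_class(labels):
    """Classify an entire tweet from its per-category labels.

    labels: iterable with one of 'unrelated', 'related', 'conspiracy' per category.
    """
    labels = list(labels)
    if "conspiracy" in labels:
        return "conspiracy"  # at least one category was labeled conspiracy
    if "related" in labels:
        return "related"     # no conspiracy label, but at least one related label
    return "unrelated"       # unrelated in every category
```

For example, a tweet with eleven unrelated labels and one related label is classified as related, while a single conspiracy label is enough to make the whole tweet a conspiracy tweet.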

Quantitative dataset description

Table 2 The number of times each label was assigned

In this section we give a quantitative overview of the dataset. We start with the number of times each label was assigned, which is given in Table 2. Overall refers to the classification of each entire tweet, as described in the section "Classes". Thus, it is the number of tweets that were assigned at least one conspiracy class label, at least one related class label but no conspiracy label, or only unrelated labels, respectively, and not the sum of the previous entries in each column. Naturally, most tweets are unrelated to most categories, but since a tweet is considered a conspiracy tweet if it promotes misinformation in any category, 1797 out of 3495, i.e. 51%, belong to this class.

Connections between keywords and categories

We give an overview of the connections between pairs of keywords, pairs of categories, and keyword-category pairs in several tables in the Appendix. Table 4 shows the number of times keywords are mentioned in the same tweet. Since this is based on text search alone and requires no manual annotation, we extend the search to the Contain no link set mentioned in Table 1. We restricted the table to the 36 keywords with a meaningful number of occurrences and co-occurrences. We observe that especially the QAnon-related keywords have a substantial number of co-mentions.
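Keyword co-mention counts of the kind reported in Table 4 can be computed with plain text search. The sketch below assumes case-insensitive substring matching and counts each unordered keyword pair at most once per tweet; both choices are assumptions, and the example keywords in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(tweets, keywords):
    """Count how often each unordered pair of keywords appears in the same tweet."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        # Sort so that each pair is always stored in the same order.
        present = sorted(k for k in keywords if k in lowered)
        counts.update(combinations(present, 2))
    return counts
```

Calling `keyword_cooccurrence(tweets, ["qanon", "deep state"])` returns a `Counter` keyed by pairs such as `("deep state", "qanon")`, from which a symmetric co-occurrence table can be filled in.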

Next, we show a similar statistic for the categories in Table 5. It illustrates which classes frequently occur together, such as Anti vaccination and Behavior control or Intentional pandemic, New world order, and Depopulation. In the section "Goal narratives" we discuss how these combined categories create specific conspiracy ideas.

Table 6 contains a combination of the above two statistics, showing how often conspiracy tweets from each category contain the different keywords. Some of these connections are obvious since the keywords are identical or almost identical to the category name, but others are more unexpected. For example, the word plandemic is used in both the Fake virus and Intentional pandemic categories, but it has a different meaning in each. The numbers can also be used to gauge how much the use of the keywords is correlated with tweets carrying misinformation.

Location analysis

As stated in the section "Tweet selection", all tweets contain a self-reported location which we transform to a country/state pair that can be evaluated automatically. The technique is based on querying the Google geocoding API. It was used and explained in previous work [12]. The left side of Fig. 1 shows the global results. As the keywords we used are predominantly based on misinformation narratives from the US, e.g. QAnon, it is expected that more than two thirds of the tweets come from there. Furthermore, since the keywords are in English, only English-speaking countries appear frequently among the locations. While the US has the most tweets per inhabitant (7.2 per million), Canada is not far behind with 5.8, followed by the UK and Ireland with 4.8, and Australia and New Zealand with 3.3. India, Nigeria, and South Africa have a far lower rate. This is also expected since these countries have lower Internet and Twitter usage, and they frequently use languages other than English. All other countries have even lower rates, and thus hardly affect the overall picture.
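The per-inhabitant rates quoted above are simply tweet counts normalized by population. The sketch below shows the arithmetic; the inputs in the usage note are illustrative assumptions (a US share of roughly 68% of the 3495 tweets and a US population of about 330 million), not figures taken from the paper's tables.

```python
def tweets_per_million(tweet_count, population):
    """Return the number of tweets per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return tweet_count / population * 1_000_000
```

Under those assumed inputs, `tweets_per_million(2377, 330_000_000)` gives approximately 7.2, which is consistent with the US rate stated in the text.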

Due to the strong US focus, we also analyze locations at the state level, as shown on the right side of Fig. 1. About 14% of the US users did not specify a state. For the rest, the number of tweets follows quite closely the population size of the state, with the two notable exceptions being Arizona and the District of Columbia, which has about 2% of the US tweets despite its small population (about 0.2% of the US total).

In Fig. 2 we provide the same statistics for the conspiracy tweets only. We observe that among these, the US is even more dominant (72 vs. 68%) while India, Nigeria, and South Africa are less represented. This is to be expected since the conspiracy narratives are focused on the US.

Among the larger US states, only Florida shows a meaningful difference compared to the overall numbers (7 vs. 6%), and South Carolina and Missouri make it to the top 12 instead of Illinois and Colorado, with South Carolina having the highest rate of conspiracy tweets (28 out of 34). Considering that most of the narratives are pro Trump/Republican and anti-Democrat, it is to be expected that Republican-leaning states have a higher rate of conspiracy tweets, but the data does not show a consistent effect here. More noticeable is the fact that while the percentages for the larger states are almost the same in Figs. 1 and 2, among the conspiracy tweet authors far fewer specify smaller states (27 vs 34%) and far more only specify the US (20 vs 14%). The total number of users covered here is 3094, which means that most users only wrote one tweet in the dataset.

Distribution over time

Finally, we show the distribution of the categories over time. Figure 3 shows the fraction for the conspiracy tweets on a monthly basis. Table 7 in the Appendix gives the corresponding absolute numbers.
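One plausible way to derive monthly fractions such as those in Fig. 3 from the absolute numbers is sketched below. The normalization (dividing by the number of conspiracy labels in the same month) and the `(month, category)` input format are assumptions; the figure may use a different denominator.

```python
from collections import Counter, defaultdict


def monthly_category_fractions(records):
    """Compute, per month, the share of each category among that month's labels.

    records: iterable of (month, category) pairs, one per conspiracy label,
    with month given as a string such as '2020-03'.
    """
    counts = defaultdict(Counter)
    for month, category in records:
        counts[month][category] += 1

    fractions = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        fractions[month] = {category: n / total for category, n in counter.items()}
    return fractions
```

Since the individual counts per month and category are small, fractions computed this way should be read as descriptive rather than statistically robust.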

We observe that Anti vaccination is by far the most prominent topic, and it remains relatively consistent in size. New world order is also quite large and consistent. Both topics had a sizeable presence before the COVID-19 pandemic. Therefore, it is not surprising that they are quite prominent in early 2020, before the pandemic fully arrived in the US. Other topics such as Depopulation or Intentional pandemic gain popularity during the pandemic. However, Depopulation seems to shrink in 2021.

We perform the same analysis for the related tweets. Figure 4 shows again the fraction on a monthly basis while Table 8 in the Appendix gives the corresponding numbers.

The picture is quite different here, with Intentional pandemic, Harmful radiation, and Depopulation being much more present than other categories. Clearly, there is a difference between the topics discussed by proponents of conspiracy theories and other Twitter users. For topics that vary widely over time, one might expect a time lag where tweets that are related to categories continue to appear long after the topic lost interest among conspiracy circles. Harmful radiation would be a good example since the topic quickly lost popularity among western Twitter users [12] in the second half of 2020.

However, since we did not include 5G as a keyword, this hypothesis cannot be confirmed from our dataset. Also, note that the individual numbers by month and category are very small and do not constitute a basis for statistically robust analysis.

Qualitative analysis of the narratives

The spread of many conspiracy ideas differs from a typical information cascade because the information mutates along the way. Thus, each of the categories in the section "Categories" which we use for quantitative study has a large number of variations to the exact narrative. It is not feasible to study these quantitatively, but they can be investigated qualitatively. Thus, the objective of this section is to provide more detail on the exact narratives that are frequently found in the tweets of this dataset.

An important feature of many categories of conspiracy narratives is that they can contain mutually exclusive narratives without seemingly weakening the impact of the category as a whole. This was observed for 9/11 conspiracy theories [3], as well as for 5G-COVID misinformation [12].

Conspiracy theories need a perpetrator, means, and a goal, although sometimes one of the components can be rather nondescript. Among the COVID-19 conspiracies, means were quite prominent, and the first six categories deal primarily with means. On the other hand, Depopulation and New World Order are goals, and typically some aspect of COVID-19 or the vaccines is the corresponding means. Satanism ostensibly identifies a perpetrator, i.e. satanists, although this carries little weight since anyone could secretly be a satanist. The category focuses at least as much on means, i.e. rituals involving harm or abuse of children. Of the remaining three categories, Other conspiracy theory collects previously known conspiracies which are a mix of means, e.g. chemtrails, perpetrators, e.g. deep state, and goals, e.g. great replacement. See Moffit et al. for a more detailed discussion on the structure of conspiracy theories [5]. The Esoteric misinformation category is unclear, i.e. it does not present identifiable common narratives, and Other misinformation does not follow this structure at all. Thus, for training machine learning models, we recommend excluding the last three categories since they do not provide identifiable narratives.

Goal narratives

Political goals

Among the most common suspected agendas is the idea that the pandemic serves to prevent Donald Trump from being reelected. This is typically paired with the Fake virus narrative, claiming that a nonexistent pandemic is used to create a state of emergency, and sometimes it is also paired with the Intentional pandemic narrative. Here the claim is that China, in collusion with the Deep State or individuals such as George Soros, released the virus to create the state of emergency. However, the opposite idea, i.e. that the state of emergency is actually a means to keep Donald Trump in power, was also present, although it disappeared later in the year 2020. Many political tweets make reference to QAnon and related terms, and some also to the Trump campaign slogans MAGA and KAG (Make America Great Again/Keep America Great).

A less concrete agenda appears in the New world order category. Here, the state of emergency and the control of the behavior of the population is intended to bring about the New World Order, which is sometimes described as socialist. Infrequently, it is referred to as one world agenda or great reset, as introduced by the World Economic Forum [26]. In cases where the alleged perpetrator is the Chinese leadership, the alleged goal is often to hurt the US or the western world.

As US users are the plurality of the authors of the investigated tweets, ideas concerning politics in other countries were less common. Thus, it is more difficult to establish recurring narratives. For example, the search term population control appears frequently in India, but there it refers to the population control bill [27] rather than a Depopulation conspiracy theory.

Depopulation goals

The primary goal besides the political ones is depopulation, for which we created the Depopulation category. The narrative is sometimes straightforward: the perpetrators created an Intentional pandemic with the goal of reducing the world population. A similar narrative relies on the Fake virus and Anti vaccination ideas, claiming that COVID-19 is either harmless or nonexistent, but that the public concern about it serves to pressure people to accept a vaccine which is deadly. The second version was more common, often with Bill Gates as the alleged perpetrator.

In developing countries, the idea of population growth control via infertility caused by a vaccine has been relatively common [28], especially with the goal of reducing population growth of specific ethnic groups. However, in developed countries population growth has all but stopped, making it much less of a public concern. Since the dataset is US/UK focused, infertility narratives were rare.

Financial goals

Some conspiracy theories claim financial motivations of the perpetrators, although they are far less frequent than political goals. They do appear regularly in Suppressed cures narratives, and sometimes as a motivation for an Intentional pandemic with the aim of earning money on vaccines with the alleged perpetrator usually being Bill Gates.

Imaginary goals

In addition to the above, some conspiracy theories suggest goals which are scientifically impossible. The most prominent are Mind Control narratives, which we sorted into the Behavior control category. Another impossible goal is contained in the Adrenochrome narrative, which claims that the substance is harvested from children to prolong the life of older members of the elite. We classified this narrative under Satanism as it resembles ritual murder narratives and the authors sometimes refer to it as satanic.

There is considerable speculation about the motivations for belief in fictitious conspiracy theories. One common interpretation is that such ideas are meant figuratively [29]. For example, from this viewpoint the Adrenochrome narrative could signify that older people benefit from anti-COVID measures such as lockdowns in the form of reduced risk of death from COVID-19, while the younger people predominantly pay the price in the form of lost school education or work income. However, understanding the motivations of the authors of such tweets is beyond the scope of this paper.

In the context of 5G-COVID, a substantial amount of Esoteric misinformation suggesting imaginary goals was found in previous work [12]. While some of the keywords we used here cover similar topics (especially the mind control narrative), such posts were exceedingly rare in this dataset.

Means narratives

Fearmongering

For the political goals described in the section "Goal narratives", the most common alleged means was the idea that the perpetrators create unfounded fear in the population to attain their goals, using narratives from the Fake virus category. Typically, they claim that there is no (SARS-CoV-2) virus, and that the perpetrators use fear of COVID-19 to make the population act according to their designs. Less often, conspiracy theories claim that fearmongering happens via an Intentional pandemic. Here, the authors do not doubt the COVID-19 fatalities, but claim that they are a side effect and that e.g. widespread lockdowns resulting from fear of COVID-19 are the intended effect.

A common but weaker version of the Fake virus narrative was false reporting of COVID-19 numbers. The authors of such tweets do not deny the existence of COVID-19, but claim that the number of victims is far lower than the official numbers suggest, either via direct manipulation by the government, or via alleged financial incentives for hospitals to misreport deaths as COVID-19 related. This is often combined with the claim that the remaining cases are caused by a seasonal flu rather than a pandemic.

Some related tweets containing counterstatements claimed that supporters of Donald Trump changed their message from denying the existence of COVID-19 as an independent pandemic to claiming that numbers were manipulated, which seems to be the case in our dataset.

Vaccines

Vaccines are the primary means for depopulation, financial, and many imaginary goals. Similar to 5G, which had substantial opposition prior to COVID-19 [12], opposition to vaccines was quite substantial before the pandemic [30]. Sometimes, conspiracy theories claim that Fearmongering or an Intentional Pandemic is used as a means to persuade people to accept the vaccine. In that case, people taking the vaccine becomes a goal rather than a means.

Suppressed cures

Tweets discussing Suppressed cures conspiracy theories were quite infrequent. We found two recurring narratives. The first deals with Hydroxychloroquine (HCQ), which was initially considered an effective treatment for COVID-19 in some countries [31] and popularized by Donald Trump [32]. Several countries that did use it ceased to do so after clinical trials showed high risks and low effectiveness [33]. The conspiracy theories claim that HCQ was abandoned either because it is not profitable for the pharmaceutical companies or to encourage people to accept dangerous vaccines instead. Thus, such narratives usually posit financial and sometimes depopulation goals. More rarely, they suggest imaginary mind control goals or support for fearmongering. The idea here is that by removing effective medications, people are more likely to be afraid of COVID-19. This narrative, however, is relatively rare.

In addition to HCQ, suppressed cures narratives for colloidal silver, which is an alternative medicine product, are being used to promote it as a "secret" miracle cure. Such narratives posit the same goals as other Suppressed cures tweets. However, the motivation of the tweet authors is likely to promote ineffective medications that they themselves are selling.

5G, magnetism, microchips, and tagging

Imaginary goals are generally accompanied by imaginary means, i.e. means which have no scientific basis for functioning. One of the most common narratives in the dataset is the idea that COVID-19 vaccines contain microchips (and sometimes nanochips or Smartdust [34]). These chips either allow the perpetrator to control the mind of the recipients or allow tracking them via radio frequency identification (RFID). This idea is sometimes connected to the ID2020 digital identity provider.

Note that imaginary means are different from Suppressed cures, since it is conceivable that existing medications are effective against COVID-19, but imaginary means have no conceivable way of working in reality.

The tracking idea is common enough to have spawned tweets containing counter-statements. Typically, they observe that ubiquitous smartphones already track their users' locations, which makes tracking via implanted chips obsolete.

For the outlandish idea that COVID-19 vaccines render the user magnetic, we only found counter-statements. It is likely that this idea commanded much more attention among users of mainstream media than among proponents of conspiracy theories.

Intentional pandemic

The main narrative in this class claims that COVID-19 is a bioweapon developed in Wuhan that was released intentionally, either to reach a political goal (usually with George Soros, Anthony Fauci, the Deep State, or the Chinese leadership as the perpetrator), or with the aim of depopulation (usually by Bill Gates). There is a substantial variety in the exact story, but due to its concrete focus on Wuhan, bioweapon, and recognizable perpetrators, this represents one of the most consistent narratives found in the dataset.

A somewhat weaker form of the Intentional pandemic narrative asserts a failure to act, either on the part of the Chinese leadership for not warning the world adequately of the developing pandemic or on the part of Donald Trump with respect to the US pandemic response. What makes these statements relevant in the context of conspiracy theories is that they assert malice on the part of the acting entity. Some of these tweets contain extreme statements, such as "Trump murdered 150,000 people".

Mark of the Beast

A frequent narrative involving the Satanism category is the Mark of the Beast, which refers to a passage from the Book of Revelation that reads: "He causes all, both small and great, rich and poor, free and slave, to receive a mark on their right hand or on their foreheads, and that no one may buy or sell except one who has the mark or the name of the beast, or the number of his name." (The Bible, Rev 13:16–17). The mark is associated with proof of vaccination, which in some countries was required to enter stores during lockdowns in 2021 [35]. Some conspiracy tweets refer to the mark as implanted chips rather than the proof of vaccination systems that were actually used. In either case, putting COVID measures into an eschatological context and calling them a tool of the devil provides a narrative that justifies opposition to the measures. The mark is always presented as a means for exerting control over the population.

Perpetrators

Deep State

The Deep State turned out to be one of the most frequent perpetrators. Many tweets that contain QAnon or related keywords mention it, usually with political goals using Fearmongering as a means. While the Deep State is generally not explained further within the tweets, it is often linked to, or even used as a proxy for, the Democratic Party in the US.

George Soros and globalists

George Soros is a frequent target of right-wing conspiracy theories [36]. In our dataset, he was mostly mentioned as the perpetrator of political goal conspiracies, such as plots to prevent the reelection of Donald Trump or to establish a New world order. He is usually mentioned along with the Deep State. The term globalists is often used in conjunction with Soros, or sometimes on its own to name the perpetrators of similar conspiracy narratives.

Anthony Fauci and Bill Gates

Both names appear frequently as perpetrators of the alleged conspiracies. While Soros was a keyword in our search, Bill Gates and Anthony Fauci were not, but they appeared frequently anyway. Bill Gates is usually the alleged perpetrator of Anti vaccination and Depopulation conspiracies. These conspiracy ideas are widely known [37]. While the focus on vaccines and depopulation in tweets mentioning Gates can be explained by the work of the Bill & Melinda Gates Foundation, the association with microchips is less obvious. We suspect that, among multiple narratives, it had greater fluency [38] due to the strong association between Gates and the word Microsoft.

Fauci is typically mentioned in connection with the Wuhan Institute of Virology, as the alleged sponsor or initiator of the development of SARS-COV-2 (usually referred to as a bioweapon in such tweets), usually acting on behalf of the Deep State.

Donald Trump

As mentioned in the section "Goal narratives", some conspiracy theories suspected that COVID-19 is a plot to ensure that Donald Trump would remain US president after 2020. More extreme statements claim that Donald Trump intentionally let COVID-19 spread, thereby intentionally letting a large number of US citizens die. These however are relatively rare. Conspiracy theories focusing on Donald Trump were the only recurring narrative among the rare pro-Democrat conspiracies.

China

China and the Chinese leadership are frequently mentioned as perpetrators in Intentional pandemic narratives. Sometimes the claim is that China was working together with Gates, Soros, or the Deep State. Other tweets claim that China was acting alone in order to damage the western world. Furthermore, tweets in the dataset frequently accuse China of having developed a bioweapon (i.e. COVID-19). However, we did not count such tweets as spreading an Intentional pandemic narrative unless they also claim that the bioweapon was released on purpose.

Powerful organizations

Sometimes groups or organizations that are perceived as influential or powerful appear as perpetrators. These include the Illuminati, Freemasons, the Rockefeller Foundation, the World Economic Forum, and the Rothschild family. They usually appear as perpetrators of Intentional pandemic or New world order conspiracies. However, in this dataset they are far less common than the alleged perpetrators mentioned above.

Aliens

Aliens are also commonly connected to conspiracy theories [39], and a small number of tweets (less than 1%) make reference to them. However, these tweets do not promote a unified and recognizable narrative.

Table 3 Number of conspiracy tweets by category mentioning the frequent alleged perpetrators

Counting perpetrator mentions

We count the number of conspiracy tweets mentioning the frequent perpetrators and show the numbers by category in Table 3. The names are based on the case-insensitive search string. While Trump appears most often, both Donald Trump and China are often mentioned in other contexts than being perpetrators of a conspiracy. Thus, Bill Gates is most often listed as a perpetrator. His sum/total ratio is also the highest, which means that he is frequently associated with multiple conspiracy theories, typically Anti vaccination, Behavior control, Depopulation, or Intentional pandemic.
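The counting procedure described above can be sketched as follows. This is a minimal illustration and not the authors' script: the search strings, the sample tweets, and the function names are assumptions, and the sum/total ratio is interpreted here as the sum of a perpetrator's per-category counts divided by the number of distinct tweets mentioning that perpetrator.

```python
from collections import defaultdict

# Hypothetical search strings; the exact strings used for Table 3 are not given here.
PERPETRATORS = ["gates", "soros", "fauci", "trump", "china", "deep state"]


def count_mentions(tweets):
    """Count case-insensitive perpetrator mentions per conspiracy category.

    tweets: iterable of (text, set_of_categories_labeled_as_conspiracy).
    Returns (per_category, total), where total counts each tweet once.
    """
    per_category = {p: defaultdict(int) for p in PERPETRATORS}
    total = {p: 0 for p in PERPETRATORS}
    for text, categories in tweets:
        lowered = text.lower()
        for p in PERPETRATORS:
            if p in lowered:
                total[p] += 1
                for category in categories:
                    per_category[p][category] += 1
    return per_category, total


def sum_total_ratio(per_category, total, name):
    """Average number of conspiracy categories per tweet mentioning `name`."""
    if total[name] == 0:
        return 0.0
    return sum(per_category[name].values()) / total[name]


# Invented sample tweets, for illustration only.
sample = [
    ("Bill Gates wants to chip us", {"Behavior control", "Anti vaccination"}),
    ("GATES depopulation plan", {"Depopulation"}),
    ("Soros and the deep state", {"New world order"}),
]
per_category, total = count_mentions(sample)
gates_ratio = sum_total_ratio(per_category, total, "gates")
```

A ratio above 1 indicates that tweets mentioning the name tend to carry more than one conspiracy category. Note that a plain substring search also counts mentions where the name is not the alleged perpetrator, which is the caveat raised above for Trump and China.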

Conspiracy detection

The presented dataset served as the basis for the MediaEval Challenge 2021. MediaEval is a benchmark that provides standardized task descriptions, data sets, and evaluation procedures for the multimedia research community. The benchmark aims to make possible systematic comparison of the performance of different approaches to problems of retrieving or analyzing multimedia content. The goal is to identify state-of-the-art algorithms and promote research progress. In the following, we summarize the most important results of the MediaEval FakeNews: Corona Virus and Conspiracies Task 2021 [13]. The task includes three subtasks.

The first subtask is text-based fake news detection. Here, participants are asked to build a multi-class classifier that can flag tweets that promote or support the presented conspiracy theories.

The second subtask is the detection of conspiracy theory topics, where the goal is to build a detector that can detect whether a text refers to any of the predefined conspiracy topics.

The third subtask is the combined misinformation and conspiracy detection, where the goal is to build a complex multi-labelling multi-class detector that can predict whether a tweet promotes or supports a particular topic from a list of predefined conspiracy topics.
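To make the difference between the three label formats concrete, the sketch below shows one plausible way to represent them. The class name UNRELATED, the four-topic subset, and the rules for deriving the subtask 1 and subtask 2 labels from the per-topic labels are assumptions made for illustration; they are not taken from the task definition.

```python
from enum import IntEnum


class Stance(IntEnum):
    UNRELATED = 0   # assumed name for tweets that do not touch the topic
    RELATED = 1     # discusses the topic without promoting it
    CONSPIRACY = 2  # promotes or supports the conspiracy narrative


# Illustrative subset of the conspiracy categories named in the text.
TOPICS = ["Suppressed cures", "Anti vaccination", "Depopulation", "Intentional pandemic"]


def topic_flags(stances):
    """Subtask 2 style label: one binary flag per topic (multi-label)."""
    return [int(s != Stance.UNRELATED) for s in stances]


def overall_stance(stances):
    """Subtask 1 style label: a single class per tweet (multi-class).

    Assumption: the tweet takes the strongest stance found across topics.
    """
    return max(stances)


# Subtask 3 style label: one stance per topic (multi-label and multi-class).
example = [Stance.UNRELATED, Stance.CONSPIRACY, Stance.RELATED, Stance.UNRELATED]
flags = topic_flags(example)
overall = overall_stance(example)
```

In this representation, the combined subtask carries strictly more information than the other two, which is why a detector for it has to predict both which topics are present and how the tweet relates to each of them.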

Despite a large number of promising results [40,41,42] and partly creative approaches [43], the transformer-based approaches [44,45,46], particularly CT-BERT [47], performed the best. In the following, we briefly summarize the results of the winning group [48]. The proceedings of the MediaEval Challenge 2021, including the work of all participants, are available at https://ceur-ws.org/Vol-3181/.

The authors evaluated three different approaches for each of the three subtasks. The first is a term frequency-inverse document frequency (TF-IDF) based approach in which the features are fed into different supervised learning algorithms. In subtask 1, the classifiers were used in a multi-class setting. In the multi-label case of subtask 2, the authors used a multi-output classifier.

Second, the authors leveraged pre-trained language models that had been fine-tuned on the task of Natural Language Inference. In other words, given two statements (a premise and a hypothesis), these models are trained to classify the logical relationship between them: entailment (agreement or support), contradiction (disagreement), or neutrality (undetermined).

Third, the authors proposed using transformer-based models, specifically RoBERTa and COVID-Twitter-BERT (CT-BERT), to perform classification with a weighted cross-entropy loss function.
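As an illustration of what a weighted cross-entropy loss computes, the following sketch implements it in plain Python. The normalization (dividing by the sum of the weights of the target classes) follows a common convention in deep learning frameworks, and the inverse-frequency weighting is an assumption: the text does not state how the class weights were chosen.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(batch_logits, targets, class_weights):
    """Weighted mean of -log p(target), with one weight per class."""
    numerator = 0.0
    denominator = 0.0
    for logits, y in zip(batch_logits, targets):
        p = softmax(logits)
        w = class_weights[y]
        numerator += -w * math.log(p[y])
        denominator += w
    return numerator / denominator


def inverse_frequency_weights(targets, n_classes):
    """Assumed heuristic: weight each class by n / (n_classes * class_count)."""
    counts = [0] * n_classes
    for y in targets:
        counts[y] += 1
    n = len(targets)
    return [n / (n_classes * c) if c else 0.0 for c in counts]


# Imbalanced toy batch: three examples of class 0, one of class 1.
targets = [0, 0, 0, 1]
weights = inverse_frequency_weights(targets, n_classes=2)

# A model that always favors the majority class.
majority_logits = [[2.0, 0.0]] * 4
plain_loss = weighted_cross_entropy(majority_logits, targets, [1.0, 1.0])
weighted_loss = weighted_cross_entropy(majority_logits, targets, weights)
```

With these weights, ignoring the rare class is penalized more heavily than under the unweighted loss, which is the usual motivation for weighting when some classes have few labeled tweets.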

All models were evaluated using stratified 5-fold cross-validation and then on the test set. Transformer-based approaches delivered the best results. Among them, CT-BERT was the most competitive, with a Matthews correlation coefficient of 0.720, 0.774, and 0.775 for subtasks 1, 2, and 3, respectively.
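For readers unfamiliar with the metric, the sketch below computes the Matthews correlation coefficient in its standard multi-class form from true and predicted labels. It is a generic illustration with invented toy labels. It is not the evaluation script of the challenge, and how the scores were aggregated across topics is not covered here.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true, y_pred):
    """Multi-class Matthews correlation coefficient.

    Uses c (correct predictions), s (number of samples), t_k (true count of
    class k), and p_k (predicted count of class k):
        MCC = (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2)(s^2 - sum_k t_k^2))
    Returns 0.0 when the denominator is zero (e.g. constant predictions).
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = Counter(y_true)
    p_k = Counter(y_pred)
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in labels)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in labels)
    denominator = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denominator if denominator else 0.0


perfect = matthews_corrcoef([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
constant = matthews_corrcoef([1, 1, 1, 0], [1, 1, 1, 1])
```

Unlike accuracy, the coefficient stays at zero for a classifier that always predicts the majority class, which makes it a reasonable choice for imbalanced label distributions.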

Related work

In the last four years, a significant body of work has proposed methods for automatic fake news detection. The work covers a wide range of approaches, including knowledge graphs and spreading models in addition to natural language processing.

Pérez-Rosas et al. [49] present a systematic approach for detecting fake news using natural language processing techniques. A key contribution of their work is the introduction of two novel datasets covering seven different news domains, which allows for a more comprehensive evaluation of their proposed methods. The authors introduce classification models that rely on a combination of lexical, syntactic, and semantic information, as well as features representing text readability properties. Experimental results show that the best-performing models reach accuracies comparable to the human ability to spot fake content.

Le et al. [50] address the question of what would happen if adversaries attempted to attack automated detection models for fake news. To this end, they introduce MALCOM, an end-to-end adversarial comment generation framework that allows for attacking such models. Through a comprehensive evaluation, the authors demonstrate that, on average, MALCOM can successfully mislead five of the latest neural detection models to always output targeted real and fake news labels approximately 94% and 93.5% of the time, respectively.

Cui et al. [51] propose a method for detecting misinformation in the healthcare domain. They introduce a knowledge-guided graph attention network called DETERRENT, which utilizes domain-specific knowledge and graph structure to improve the performance of misinformation detection for the medical sector.

De Beer and Matthee [52] conduct a systematic literature review of the main approaches for identifying fake news, covering the situations in which each approach can be applied, with examples, challenges, and the appropriate context of use. This work highlights the importance of tackling the problem of fake news, as it can have a range of consequences, from being annoying to influencing and misleading societies or even nations.

Giachanou et al. [53] present a method for detecting conspiracy theories in social media using a combination of natural language processing techniques and psycholinguistic characteristics. The authors used a dataset of tweets related to conspiracy theories to train a machine learning model that can identify conspiracy propagators based on specific linguistic patterns. The model outperformed other state-of-the-art baselines. The authors also highlight the advantage of using psycholinguistic characteristics for detecting conspiracy theorists, as they can provide more insight into the nature of conspiracy theories and the personalities of their propagators.

Rangel et al. [54, 55] present the results of the 8th International Author Profiling Shared Task at PAN 2020, which focused on identifying Twitter authors who spread fake news in English and Spanish. The participants used different features, including n-grams, stylistics, personality and emotions, and embeddings. They employed machine learning algorithms such as Support Vector Machines and Logistic Regression, and a few participants used deep learning techniques like Fully-Connected Neural Networks, CNN, LSTM, and Bi-LSTM with self-attention. The results showed that traditional machine learning approaches obtained higher accuracy than deep learning ones. The six top-performing teams used combinations of n-grams with traditional machine learning algorithms, and the best results were obtained in Spanish and English. The paper also highlights that the highest confusion in English is from Real News Spreaders to Fake News Spreaders, while in Spanish it is the other way around, from Fake News Spreaders to Real News Spreaders. The paper concludes that it is possible to automatically identify potential Fake News Spreaders on Twitter with high precision, but the high rate of false positives highlights the importance of careful error analysis.

These methods generally rely on labeled datasets. Consequently, a variety of misinformation datasets have been published in recent years.

Wang [8] presents LIAR, a publicly available dataset for fake news detection collected over the time span of a decade. The dataset includes approximately 12K manually labeled short statements in various contexts from politifact.com, which provides a detailed analysis report and links to source documents for each case.

Nabil et al. [56] present a Twitter dataset for Arabic language sentiment analysis, called ASTD. The dataset comprises around 10,000 tweets, categorized into four classes: objective, subjective positive, subjective negative and subjective mixed.

Salem et al. [57] created the first dataset focused on fake news surrounding the conflict in Syria. The authors also built fully supervised machine learning models for detecting fake news and tested them on news articles related to the Syrian war as well as on other fake news datasets.

Dai et al. [58] introduce a data set called FakeHealth that aims to facilitate research in the area of health fake news. The data repository contains two feature-rich datasets that include a large amount of news content, social engagements, and user-user social networks. The authors conduct exploratory analyses to show the characteristics of the datasets and identify potential patterns and challenges in detecting fake health news.

Shu et al. [59] present a data set called FakeNewsNet that aims to facilitate research in the area of fake news detection. The repository contains a large amount of data collected from news content, social context, and spatiotemporal information. The authors also conduct a preliminary exploration of the various features in FakeNewsNet and demonstrate its utility by using it in a fake news detection task against several state-of-the-art baselines.

A comprehensive overview of the different datasets was provided in recent work [60]. Furthermore, in a recent survey, fake news spreading was studied together with polarisation dynamics and bots [61].

As COVID-19 misinformation has attracted substantial attention from the research community, several datasets dealing specifically with this topic have been published recently [60, 62]. Darius and Urquhart [63] specifically study conspiracy theories related to COVID-19. However, unlike our dataset, they rely on hashtags rather than human annotation.

We also created a Twitter dataset dealing specifically with 5G-related COVID-19 misinformation, as well as the retweet graphs of such tweets [23, 24]. The dataset, which is called WICO (WIreless COnspiracy), was used in the MediaEval 2020 challenge on fake news detection [64]. It also served as the foundation of an analysis focusing on the 5G-COVID phenomenon [12]. The MediaEval 2020 fake news detection task closely resembles stance classification [65]. Furthermore, there are many competitions that provide datasets to evaluate language technology, e.g. CLEF [66, 67] and SemEval [68, 69].

COCO, our new dataset, distinguishes 12 categories of conspiracy narratives rather than focusing on 5G and COVID-19 alone. Due to the intense coverage of this misinformation category, we excluded 5G from the search terms in the new dataset. An earlier version containing parts of the new dataset was used in the MediaEval 2020 challenge on fake news detection [64], where the objective was to train and evaluate machine learning classifiers based on this data. Several participating teams achieved strong results [70]. Thus, our contribution resembles multi-narrative datasets such as Emergent [71].

Conclusion

We have presented a new human-labeled misinformation dataset connected to COVID-19 related conspiracy theories. Unlike many previous datasets which only differentiate between true and false information, we label the tweets to distinguish different conspiracy narratives, as well as tweets related to but not promoting such narratives.

This means that conspiracy and non-conspiracy tweets will often use similar words. Thus, obtaining high accuracy when training NLP models to distinguish between the two classes becomes harder. The models can no longer rely on differences in word frequency, which causes difficulties for methods such as TF-IDF [72]. Instead, they have to analyze the meaning. While BERT-based approaches [44] worked reasonably well in the MediaEval 2021 challenge, it was observed that BERT sometimes struggles with negations [73], which are common in the Related category.
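The difficulty for word-frequency methods can be illustrated with a small, self-contained sketch. The three example texts are invented for illustration and are not taken from the dataset, and the smoothed TF-IDF variant used here is one common formulation rather than the one used by any challenge participant.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors using a smoothed inverse document frequency."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + count)) + 1.0 for t, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "vaccines contain microchips",         # promotes the narrative
    "vaccines do not contain microchips",  # rejects the narrative
    "lockdown rules change next week",     # unrelated
]
vectors = tfidf_vectors(docs)
sim_negated = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
```

The promoting and the rejecting statement share most of their vocabulary, so their TF-IDF vectors are far more similar to each other than either is to the unrelated text, even though their stances are opposite. A classifier built on such features has to separate them using only the few function words that differ.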

In addition, the distinction between the Conspiracy and Related classes allows further analysis of the spread of conspiracy narratives. There is a meaningful difference between categories such as Anti vaccination, which have many tweets in the Related class and Depopulation, which has few, as shown in Table 2. This allows further investigation into the question whether publicly discussing conspiracy theories without promoting their contents nonetheless increases the number of people who believe in them.

The dataset is made publicly available. However, following Twitter’s terms of service, the text of the tweets is not contained in the dataset. In future work, we will use the dataset to train advanced machine learning classifiers and apply them to the entire set of tweets. In this manner we will gain a detailed picture about the spread of the different conspiracy narratives during the COVID-19 pandemic.

Availability of data and materials

The tweetIds, including their labels, are available at https://osf.io/qj7c3/?view_only=2df72913b52a4aa792d8391a06d5b7d3. To hydrate the tweetIds, we recommend using the script available at https://github.com/konstapo/2022-Fake-News-MediaEval-Task/blob/main/tools/twitter_downloader/download_tweets.py.

References

Ali, H. S., & Kurasawa, F. (2020). #COVID19: Social media both a blessing and a curse during coronavirus pandemic. https://bit.ly/3bjVQgQ
Ecker, U. K., Lewandowsky, S., Cook, J., Schmid, P., Fazio, L. K., Brashier, N., Kendeou, P., Vraga, E. K., & Amazeen, M. A. (2022). The psychological drivers of misinformation belief and its resistance to correction. Nature Reviews Psychology, 1(1), 13–29. https://doi.org/10.1038/s44159-021-00006-y
Douglas, K. M., Uscinski, J. E., Sutton, R. M., Cichocka, A., Nefes, T., Ang, C. S., & Deravi, F. (2019). Understanding conspiracy theories. Political Psychology, 40, 3–35. https://doi.org/10.1111/pops.12568
Hristov, T., McKenzie-McHarg, A., & Romero-Reche, A. (2020). Routledge handbook of conspiracy theories (pp. 11–15). London: Routledge.
Moffitt, J. D., King, C., & Carley, K. M. (2021). Hunting conspiracy theories during the COVID-19 pandemic. Social Media Society, 7(3), 20563051211043212. https://doi.org/10.1177/20563051211043212
Pertwee, E., Simas, C., & Larson, H. J. (2022). An epidemic of uncertainty: rumors, conspiracy theories and vaccine hesitancy. Nature Medicine, 28(3), 456–459. https://doi.org/10.1038/s41591-022-01728-z
Ahmed, H., Traoré, I., & Saad, S. (2017). Detection of online fake news using n-gram analysis and machine learning techniques. In: Traoré, I., Woungang, I., Awad, A. (Eds.) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments - First International Conference, ISDDC 2017, Vancouver, BC, Canada, October 26-28, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10618 (pp. 127–138). Springer, New York, USA. https://doi.org/10.1007/978-3-319-69155-8_9
Wang, W. Y. (2017). “liar, liar pants on fire”: A new benchmark dataset for fake news detection. In: Barzilay, R., Kan, M. (Eds.) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30–August 4, Volume 2: Short Papers (pp. 422–426). Association for Computational Linguistics, Stroudsburg, USA. https://doi.org/10.18653/v1/P17-2067
Schroeder, D. T., Pogorelov, K., & Langguth, J. (2019). FACT: a framework for analysis and capture of twitter graphs. In: Alsmirat, M. A., Jararweh, Y. (Eds.) Sixth International Conference on Social Networks Analysis, Management and Security, SNAMS 2019, Granada, Spain, October 22–25, 2019 (pp. 134–141). IEEE, New York, USA. https://doi.org/10.1109/SNAMS.2019.8931870
Bartoschek, S. (2020). Bekanntheit von und zustimmung zu verschwörungstheorien-eine empirische grundlagenarbeit. Hannover: jmb.
Butter, M., & Knight, P. (2020). Routledge handbook of conspiracy theories. London: Routledge.
Langguth, J., Filkuková, P., Brenner, S., Schroeder, D. T., & Pogorelov, K. (2022). COVID-19 and 5G conspiracy theories: long term observation of a digital wildfire. International Journal of Data Science and Analytics. https://doi.org/10.1007/s41060-022-00322-3
Pogorelov, K., Schroeder, D. T., Brenner, S., & Langguth, J. (2021). Fakenews: Corona virus and conspiracies multimedia analysis task at mediaeval 2021. In: Hicks, S., Pogorelov, K., Lommatzsch, A., de Herrera, A. G. S., Martin, P., Hassan, S. Z., Porter, A., Kasem, A., Andreadis, S., Lux, M., Ocaña, M. G., Liu, A., Larson, M. A. (Eds.) Working Notes Proceedings of the MediaEval 2021 Workshop, Online, 13-15 December 2021. CEUR Workshop Proceedings, vol. 3181. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-3181/paper56.pdf
Ribeiro, M. H., Calais, P. H., Almeida, V. A. F., & Meira Jr., W. (2017). “Everything I disagree with is #fakenews”: correlating political polarization and spread of misinformation. CoRR abs/1706.05924
Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A. A., Eckles, D., & Rand, D. G. (2021). Shifting attention to accuracy can reduce misinformation online. Nature, 592(7855), 590–595. https://doi.org/10.1038/s41586-021-03344-2
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2p2), 1. https://doi.org/10.1037/h0025848
Bornstein, R. F. (1989). Exposure and affect: overview and meta-analysis of research, 1968–1987. Psychological Bulletin, 106(2), 265. https://doi.org/10.1037/0033-2909.106.2.265
Yablokov, I. (2020). Conspiracy theories in Putin’s Russia: the case of the ‘New World Order’. Routledge Handbook of Conspiracy Theories (pp. 582–595). London: Routledge.
Bundesministerium für Gesundheit. (2021). Eine Impfpflicht wird es nicht geben. Nachrichten und Beiträge, die etwas anderes behaupten, sind falsch. https://twitter.com/bmg_bund/status/1347120866908372992
Deutscher Bundestag. (2022). Gesetzentwurf für allgemeine Impfpflicht ab 18 Jahren. https://www.bundestag.de/presse/hib/kurzmeldungen-883000
Republik Österreich Parlament. (2022). COVID-19-Impfpflichtgesetz. https://www.parlament.gv.at/PAKT/VHG/XXVII/A/A_02173/
Ridley, M., & Chan, A. (2021). Viral: the search for the origin of COVID-19. HarperCollins, New York, USA. https://books.google.no/books?id=o2ozEAAAQBAJ
Pogorelov, K., Schroeder, D. T., Filkukova, P., Brenner, S., & Langguth, J. (2021). WICO text: A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. In: Guidi, B., Michienzi, A., Ricci, L. (Eds.) OASIS@HT 2021: Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks, Virtual Event, Ireland, 30 August 2021 (pp. 21–25). ACM, New York, USA. https://doi.org/10.1145/3472720.3483617
Schroeder, D. T., Schaal, F., Filkukova, P., Pogorelov, K., & Langguth, J. (2021). WICO graph: a labeled dataset of twitter subgraphs based on conspiracy theory and 5G-corona misinformation tweets. In: Rocha, A. P., Steels, L., van den Herik, H. J. (Eds.) Proceedings of the 13th International Conference on Agents and Artificial Intelligence, ICAART 2021, Volume 2, Online Streaming, February 4–6, 2021 (pp. 257–266). SCITEPRESS, Setúbal, Portugal. https://doi.org/10.5220/0010262802570266
Spark, A. (2000). Conjuring order: the new world order and conspiracy theories of globalization. The Sociological Review, 48(2-suppl), 46–62. https://doi.org/10.1111/j.1467-954X.2000.tb03520.x
World economic forum: the great reset. (2020). https://www.weforum.org/great-reset. Accessed 3 Aug 2022
Qureshi, S. (2014). Govt working on formulating population control law: union minister Sanjeev Balyan. https://www.indiatoday.in/india/story/govt-working-on-formulating-population-control-law-union-minister-sanjeev-balyan-1619713-2019-11-16
Aroh, A., Asaolu, B., & Okafor, C. T. (2021). Myths and models: what’s driving vaccine hesitancy in Africa and how can we overcome it? https://www.africaportal.org/features/myths-and-models-whats-driving-vaccine-hesitancy-in-africa-and-how-can-we-overcome-it/. Accessed 3 Aug 2022
Dentith, M. (2014). The philosophy of conspiracy theories. London: Springer.
Whitehead, M., Taylor, N., Gough, A., Chambers, D., Jessop, M., & Hyde, P. (2019). The anti-vax phenomenon. The veterinary record, 184(24), 744.
Meo, S., Klonoff, D., & Akram, J. (2020). Efficacy of chloroquine and hydroxychloroquine in the treatment of COVID-19. European Review for Medical and Pharmacological Sciences, 24(8), 4539–4547.
Lovelace, Berkeley. (2020). Trump says he still thinks hydroxychloroquine works in treating early stage coronavirus. https://www.cnbc.com/2020/07/28/trump-says-he-still-thinks-hydroxychloroquine-works-in-treating-early-stage-coronavirus.html. Accessed 8 Aug 2022
Fiolet, T., Guihur, A., Rebeaud, M. E., Mulot, M., Peiffer-Smadja, N., & Mahamat-Saleh, Y. (2021). Effect of hydroxychloroquine with or without azithromycin on the mortality of coronavirus disease 2019 (COVID-19) patients: a systematic review and meta-analysis. Clinical Microbiology and Infection, 27(1), 19–27. https://doi.org/10.1016/j.cmi.2020.08.022
Ilyas, M., & Mahgoub, I. (2018). Smart dust: sensor network applications, architecture and design. London: CRC.
Hale, T., Angrist, N., Goldszmidt, R., Kira, B., Petherick, A., Phillips, T., Webster, S., Cameron-Blake, E., Hallas, L., Majumdar, S., & Tatlow, H. (2021). A global panel database of pandemic policies (Oxford COVID-19 government response tracker). Nature Human Behaviour, 5(4), 529–538. https://doi.org/10.1038/s41562-021-01079-8
Reuters Staff. (2020). Fact check: false claims about George Soros. https://www.reuters.com/article/uk-factcheck-false-george-soros-claims-idUSKBN23P2XJ
Reuters Fact Check. (2021). Fact check: false claims about George Soros. https://www.reuters.com/article/factcheck-gates-list-idUSL1N2LO230
Brashier, N. M., & Marsh, E. J. (2020). Judging truth. Annual Review of Psychology, 71(1), 499–515. https://doi.org/10.1146/annurev-psych-010419-050807. PMID: 31514579.
Uscinski, J., Enders, A., Klofstad, C., Seelig, M., Drochon, H., Premaratne, K., & Murthi, M. (2022). Have beliefs in conspiracy theories increased over time? PLOS ONE, 17(7), 1–19. https://doi.org/10.1371/journal.pone.0270429
Shebaro, M., Oliver, J., Olarewaju, T., & Tesic, J. (2021). DL-TXST fake news: Enhancing tweet content classification with adapted language models. In: Hicks, S., Pogorelov, K., Lommatzsch, A., de Herrera, A. G. S., Martin, P., Hassan, S. Z., Porter, A., Kasem, A., Andreadis, S., Lux, M., Ocaña, M. G., Liu, A., Larson, M. A. (Eds.) Working Notes Proceedings of the MediaEval 2021 Workshop, Online, 13–15 December 2021. CEUR Workshop Proceedings, vol. 3181. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-3181/paper62.pdf
Yanagi, Y., Orihara, R., Tahara, Y., Sei, Y., & Ohsuga, A. (2021). Classifying COVID-19 conspiracy tweets with word embedding and BERT. In: Hicks, S., Pogorelov, K., Lommatzsch, A., de Herrera, A. G. S., Martin, P., Hassan, S. Z., Porter, A., Kasem, A., Andreadis, S., Lux, M., Ocaña, M. G., Liu, A., Larson, M. A. (Eds.) Working Notes Proceedings of the MediaEval 2021 Workshop, Online, 13-15 December 2021. CEUR Workshop Proceedings, vol. 3181. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-3181/paper57.pdf
To, T., Nguyen, N., Vo, D., Le-Pham, N., Nguyen, H., & Tran, M. (2021). HCMUS mediaeval 2021: Multi-model decision method applied on data augmentation for COVID-19 conspiracy theories classification. In: Hicks, S., Pogorelov, K., Lommatzsch, A., de Herrera, A. G. S., Martin, P., Hassan, S. Z., Porter, A., Kasem, A., Andreadis, S., Lux, M., Ocaña, M. G., Liu, A., Larson, M. A. (Eds.) Working Notes Proceedings of the MediaEval 2021 Workshop, Online, 13-15 December 2021. CEUR Workshop Proceedings, vol. 3181. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-3181/paper63.pdf
Schröder, P. (2021). Don’t just drop them: Function words as features in COVID-19 related fake news classification on twitter. In: Hicks, S., Pogorelov, K., Lommatzsch, A., de Herrera, A. G. S., Martin, P., Hassan, S. Z., Porter, A., Kasem, A., Andreadis, S., Lux, M., Ocaña, M. G., Liu, A., Larson, M. A. (Eds.) Working Notes Proceedings of the MediaEval 2021 Workshop, Online, 13–15 December 2021. CEUR Workshop Proceedings, vol. 3181. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-3181/paper41.pdf
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (Eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers) (pp. 4171–4186). https://doi.org/10.18653/v1/n19-1423
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108
Müller, M., Salathé, M., & Kummervold, P. E. (2020). COVID-Twitter-BERT: a natural language processing model to analyse COVID-19 content on Twitter. CoRR abs/2005.07503
Peskine, Y., Alfarano, G., Harrando, I., Papotti, P., & Troncy, R. (2021). Detecting covid-19-related conspiracy theories in tweets. In: MediaEval 2021, MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop, 13–15 December 2021 (Online Event), p. 65
Pérez-Rosas, V., Kleinberg, B., Lefevre, A., & Mihalcea, R. (2018). Automatic detection of fake news. In: Bender, E. M., Derczynski, L., Isabelle, P. (Eds.) Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20–26, 2018 (pp. 3391–3401). https://aclanthology.org/C18-1287/
Le, T., Wang, S., & Lee, D. (2020). MALCOM: generating malicious comments to attack neural fake news detection models. In: Plant, C., Wang, H., Cuzzocrea, A., Zaniolo, C., Wu, X. (Eds.) 20th IEEE International Conference on Data Mining, ICDM 2020, Sorrento, Italy, November 17–20, 2020 (pp. 282–291). https://doi.org/10.1109/ICDM50108.2020.00037
Cui, L., Seo, H., Tabar, M., Ma, F., Wang, S., & Lee, D. (2020). Deterrent: knowledge guided graph attention network for detecting healthcare misinformation. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’20 (pp. 492–502). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3394486.3403092
de Beer, D., & Matthee, M. (2020). Approaches to identify fake news: a systematic literature review. In T. Antipova (Ed.), Integrated Science in Digital Age (pp. 13–22). Cham: Springer. https://doi.org/10.1007/978-3-030-49264-9_2
Giachanou, A., Ghanem, B., & Rosso, P. (2023). Detection of conspiracy propagators using psycho-linguistic characteristics. Journal of Information Science, 49(1), 3–17. https://doi.org/10.1177/0165551520985486
Pardo, F. M. R., Giachanou, A., Ghanem, B., & Rosso, P. (2020). Overview of the 8th author profiling task at PAN 2020: Profiling fake news spreaders on twitter. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (Eds.) Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22–25, 2020. CEUR Workshop Proceedings, vol. 2696. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-2696/paper_267.pdf
Bevendorff, J., Chulvi, B., la Peña Sarracén, G. L. D., Kestemont, M., Manjavacas, E., Markov, I., Mayerl, M., Potthast, M., Rangel, F., Rosso, P., Stamatatos, E., Stein, B., Wiegmann, M., Wolska, M., & Zangerle, E. (2021). Overview of PAN 2021: Authorship verification, profiling hate speech spreaders on twitter, and style change detection. In: Candan, K.S., Ionescu, B., Goeuriot, L., Larsen, B., Müller, H., Joly, A., Maistro, M., Piroi, F., Faggioli, G., Ferro, N. (Eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction - 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, September 21–24, 2021, Proceedings. Lecture Notes in Computer Science, vol. 12880 (pp. 419–431). Springer, New York, USA. https://doi.org/10.1007/978-3-030-85251-1_26
Nabil, M., Aly, M.A., & Atiya, A.F. (2015). ASTD: Arabic sentiment tweets dataset. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (Eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17–21, 2015 (pp. 2515–2519). The Association for Computational Linguistics, Stroudsburg, USA. https://doi.org/10.18653/v1/d15-1299
Salem, F. K. A., Feel, R.A., Elbassuoni, S., Jaber, M., & Farah, M. (2019). FA-KES: a fake news dataset around the Syrian war. In: Pfeffer, J., Budak, C., Lin, Y., Morstatter, F. (Eds.) Proceedings of the Thirteenth International Conference on Web and Social Media, ICWSM 2019, Munich, Germany, June 11–14, 2019 (pp. 573–582). AAAI Press, Palo Alto, USA. https://ojs.aaai.org/index.php/ICWSM/article/view/3254
Dai, E., Sun, Y., & Wang, S. (2020). Ginger cannot cure cancer: battling fake health news with a comprehensive data repository. In: Proceedings of the International AAAI Conference on Web and Social Media, 14(1), 853–862. https://doi.org/10.1609/icwsm.v14i1.7350
Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2020). Fakenewsnet: a data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data, 8(3), 171–188. https://doi.org/10.1089/big.2020.0062
Cui, L., & Lee, D. (2020). Coaid: COVID-19 healthcare misinformation dataset. CoRR abs/2006.00885
Ruffo, G., Semeraro, A., Giachanou, A., & Rosso, P. (2023). Studying fake news spreading, polarisation dynamics, and manipulation by bots: a tale of networks and language. Computer Science Review, 47, 100531. https://doi.org/10.1016/j.cosrev.2022.100531
Patwa, P., Sharma, S., Pykl, S., Guptha, V., Kumari, G., Akhtar, M. S., Ekbal, A., Das, A., & Chakraborty, T. (2021). Fighting an infodemic: COVID-19 fake news dataset. In T. Chakraborty, K. Shu, H. R. Bernard, H. Liu, & M. S. Akhtar (Eds.), Combating online hostile posts in regional languages during emergency situation (pp. 21–29). Cham: Springer. https://doi.org/10.1007/978-3-030-73696-5_3
Darius, P., & Urquhart, M. (2021). Disinformed social movements: a large-scale mapping of conspiracy narratives as online harms during the covid-19 pandemic. Online Social Networks and Media, 26, 100174.
Pogorelov, K., Schroeder, D. T., Burchard, L., Moe, J., Brenner, S., Filkukova, P., & Langguth, J. (2020). Fakenews: Corona virus and 5G conspiracy task at mediaeval 2020. In: Hicks, S., Jha, D., Pogorelov, K., de Herrera, A. G. S., Bogdanov, D., Martin, P., Andreadis, S., Dao, M., Liu, Z., Quiros, J. V., Kille, B., Larson, M. A. (Eds.) Working Notes Proceedings of the MediaEval 2020 Workshop, Online, 14–15 December 2020. CEUR Workshop Proceedings, vol. 2882. CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-2882/paper64.pdf
Bar-Haim, R., Bhattacharya, I., Dinuzzo, F., Saha, A., & Slonim, N. (2017). Stance classification of context-dependent claims. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (pp. 251–261). Association for Computational Linguistics, Valencia, Spain. https://www.aclweb.org/anthology/E17-1024
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., Alam, F., Haouari, F., Hasanain, M., Babulkov, N., Nikolov, A., Shahi, G. K., Struß, J. M., & Mandl, T. (2021). The clef-2021 checkthat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In D. Hiemstra, M.-F. Moens, J. Mothe, R. Perego, M. Potthast, & F. Sebastiani (Eds.), Advances in Information Retrieval (pp. 639–649). Cham: Springer. https://doi.org/10.1007/978-3-030-72240-1_75
Arampatzis, A., Kanoulas, E., Tsikrika, T., Vrochidis, S., Joho, H., Lioma, C., Eickhoff, C., Névéol, A., Cappellato, L., & Ferro, N. (Eds.) (2020). Experimental IR Meets Multilinguality, Multimodality, and Interaction - 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22–25, 2020, Proceedings. Lecture Notes in Computer Science, vol. 12260. Springer, New York, USA. https://doi.org/10.1007/978-3-030-58219-7
Emerson, G., Schluter, N., Stanovsky, G., Kumar, R., Palmer, A., Schneider, N., Singh, S., & Ratan, S. (Eds.) (2022). Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, Seattle, Washington, United States, July 14–15, 2022. Association for Computational Linguistics, United States. https://aclanthology.org/volumes/2022.semeval-1/
Fersini, E., Gasparini, F., Rizzi, G., Saibene, A., Chulvi, B., Rosso, P., Lees, A., & Sorensen, J. (2022) SemEval-2022 task 5: Multimedia automatic misogyny identification. In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 533–549). Association for Computational Linguistics, Seattle, United States. https://doi.org/10.18653/v1/2022.semeval-1.74
Alfarano, G. (2021–2022). Detecting fake news using natural language processing. Master’s Thesis, Politecnico di Torino
Ferreira, W., & Vlachos, A. (2016). Emergent: a novel data-set for stance classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1163–1168). Association for Computational Linguistics, San Diego, California. https://doi.org/10.18653/v1/N16-1138
Sammut, C., & Webb, G. I. (Eds.). (2010). TF–IDF (pp. 986–987). Boston, MA: Springer. https://doi.org/10.1007/978-0-387-30164-8_832
Ettinger, A. (2020). What BERT is not: lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics, 8, 34–48. https://doi.org/10.1162/tacl_a_00298
Google Maps Platform: Google Geocoding API. (2020). https://developers.google.com/maps/documentation/geocoding/overview. Accessed 12 Dec 2021


Acknowledgements

The authors acknowledge support from Michael Kreil in the collection of Twitter data.

Funding

Open access funding provided by OsloMet - Oslo Metropolitan University. This work was funded by the Norwegian Research Council under contracts #272019 and #303404. The research presented in this paper has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract #270053.

Author information

Authors and Affiliations

Simula Research Lab, Kristian Augusts Gate 23, Oslo, Norway
Johannes Langguth, Daniel Thilo Schroeder, Petra Filkuková & Konstantin Pogorelov
Norwegian Business School, Nydalsveien 37, Oslo, Norway
Johannes Langguth
Stuttgart Media University, Nobelstraße 10, Stuttgart, Germany
Stefan Brenner
Bates College, Andrews Rd 2, Lewiston, ME, USA
Jesper Phillips
Department of Journalism and Media Studies, Oslo Metropolitan University, Pilestredet Park 0890, 0176, Oslo, Norway
Daniel Thilo Schroeder

Authors

Johannes Langguth
Daniel Thilo Schroeder
Petra Filkuková
Stefan Brenner
Jesper Phillips
Konstantin Pogorelov

Corresponding author

Correspondence to Daniel Thilo Schroeder.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: COVID-19 search keywords

The list of English search keywords used to obtain the initial set of COVID-19 related tweets is as follows:

#corona, corona, covidiot, #covidiot, #coronaoutbreak, #coronarvirus, #coronavirus, #coronavirusde, #coronavirusoutbreak, #covid, #covid19, #covid2019, #covid_19, #covid-19, #wuhan, #wuhancoronavirus, #wuhancoronovirus, #wuhanvirus, coronarvirus, coronavirus, coronavirusde, coronavírus, covid, covid-19, covid19, covid2019, covid_19, covid-19, epidemic, pandemic, quarantine, quarantined, wuhan.

Appendix B: Misinformation search keywords

The search was performed concurrently with searches in other languages. However, only English tweets were included in the dataset. To select the candidate tweets for annotation, we used the following list of case-insensitive keywords. The last five entries are pairs of words connected by logical AND, i.e., only tweets containing both words, in either order, were selected.

aluminium salts, zinc salts, reptiloids, zeolite, ritual sacrifice, haarp, geoengineering, 60ghz, population reduction, planned pandemic, forced vaccination, chemtrails, mind control, magnetic, rfid, rothschild, antichrist, false flag, mark of the beast, adrenochrome, implant, population control, sheeple, microchip, new world order, id2020, soros, deep state, bioweapon, wwg1wga, spiritual, plandemic, qanon, nwo, freemasons, mms, there is no virus, depopulation, quantum, trust the plan, trusttheplan, lockstep, operationlockstep, orgone, exosomes & 5G, infertile & vaccine, child & ritual, wayfair & child, hcq & patent

Most keywords were chosen based on previous knowledge. Others were identified in the dataset and subsequently included.
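The selection rule described in this appendix (case-insensitive single keywords, plus word pairs joined by logical AND) can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `matched_keywords` is hypothetical, the keyword lists are abbreviated, and plain substring matching is an assumption, since the text does not state whether matching was done on substrings or on whole tokens.

```python
# Abbreviated keyword lists for illustration; the full lists are given above.
SINGLE_KEYWORDS = ["chemtrails", "new world order", "plandemic", "deep state"]
PAIR_KEYWORDS = [("infertile", "vaccine"), ("hcq", "patent")]


def matched_keywords(text, singles=SINGLE_KEYWORDS, pairs=PAIR_KEYWORDS):
    """Return the keywords (and keyword pairs) found in a tweet text.

    Assumption: matching is a case-insensitive substring test.
    A pair matches only if both words occur, in either order.
    """
    lowered = text.lower()
    hits = [k for k in singles if k in lowered]
    hits += [f"{a} & {b}" for a, b in pairs if a in lowered and b in lowered]
    return hits


def is_candidate(text):
    """A tweet is a candidate for annotation if at least one keyword matches."""
    return bool(matched_keywords(text))
```

For example, `is_candidate("The vaccine is safe")` is false under this sketch, because "vaccine" only counts in combination with "infertile". Substring matching would also select words that merely contain a short keyword such as "mms" or "nwo"; a token-based match would be stricter.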

Appendix C: Automated location analysis

We built a system to decode the self-reported locations of the Twitter users. Initially, we experimented with the tweet locations reported by Twitter, but only a small number of users enable this feature, and it is not clear whether this sample would be representative. On the other hand, about half the tweets come from users that have a meaningful self-reported location. While it is not possible for us to determine the accuracy of the locations, we assume that there is no systematic widespread misreporting, in accordance with accepted practice in the social sciences. However, decoding the locations automatically into data that can be evaluated by country poses an additional challenge.

We solve this problem in the following way: we first count the frequency of each self-reported location string. The count shows that fewer than 120,000 location strings appear more than once. It therefore becomes feasible to use the Google Geocoding API [74], which transforms a text string into a Country/State/City record. Calling the Geocoding API for every individual tweet or user would be possible, but prohibitively expensive, since Google charges for each individual request. We only consider countries and US states, and we ignore smaller and non-English speaking countries. In this manner, we obtain a valid location for about half the tweets. For the COCO dataset, we selected only tweets where the authors self-report at least the country.
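The cost-saving idea in this appendix, namely geocoding each distinct frequent location string once rather than once per tweet, can be sketched as below. This is a minimal sketch under stated assumptions: `geocode` is a caller-supplied function standing in for a real geocoding request, `fake_geocode` is a hypothetical offline stub used only for demonstration, and both the whitespace normalization and the threshold `min_count=2` (our reading of "appear more than once") are assumptions rather than details confirmed by the text.

```python
from collections import Counter


def geocode_frequent_locations(locations, geocode, min_count=2):
    """Geocode each distinct location string that occurs at least min_count times.

    locations: iterable of self-reported location strings (may contain empty ones).
    geocode:   function mapping a string to a (country, state) tuple, or None
               if the string cannot be resolved. One call per distinct string.
    Returns a dict from location string to its resolved (country, state).
    """
    counts = Counter(s.strip() for s in locations if s and s.strip())
    resolved = {}
    for string, n in counts.items():
        if n < min_count:
            continue  # rare strings are skipped to limit paid requests
        result = geocode(string)
        if result is not None:
            resolved[string] = result
    return resolved


def fake_geocode(string):
    """Hypothetical stand-in for a geocoding service; not a real API call."""
    table = {
        "Oslo, Norway": ("Norway", None),
        "NYC": ("United States", "New York"),
    }
    return table.get(string)
```

Each tweet can then be assigned a location by looking up its author's (stripped) location string in the returned dictionary; strings that are missing from it are treated as having no valid location.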

Table 4 Co-mentions of keywords in the tweet set after removing links with a total of 514,716 tweets


Table 5 The main diagonals show the number of times each label was assigned


Table 6 The number of conspiracy tweets containing each keyword per category


Table 7 The number of conspiracy tweets per category over time


Table 8 The number of related tweets per category over time


Guidelines provided to the annotators for each category:

1. Suppressed cures: Narratives which propose that effective medications for COVID-19 were available, but that their existence or effectiveness has been denied by authorities, either for financial gain by the vaccine producers or out of some other harmful intent.
2. Behavior control: Narratives containing the idea that the pandemic is being exploited to control the behavior of individuals, either directly through fear, through laws which are only accepted because of fear, or through techniques which are impossible with today's technology, such as mind control through microchips.
3. Anti vaccination: Narratives that suggest that the COVID-19 vaccines serve some hidden nefarious purpose. Examples include the injection of tracking devices or nanites, or an intentional infection with COVID-19, but not concerns about vaccine safety or efficacy, or concerns about the trustworthiness of the producers.
4. Fake virus: Narratives saying that there is no COVID-19 pandemic or that the pandemic is just an over-dramatization of the annual flu season. The alleged intent is, for example, to deceive the population in order to hide deaths from other causes, or to control the behavior of the population through irrational fear.
5. Intentional pandemic: Narratives claiming that the pandemic is the result of purposeful human action pursuing some illicit goal. This does not include asserting that COVID-19 is a bioweapon or discussing whether it was created in a laboratory, since this does not preclude the possibility that it was released accidentally.
6. Harmful radiation: Narratives that connect COVID-19 to wireless transmissions, especially from 5G equipment, claiming for example that 5G is deadly and that COVID-19 is a coverup, or that 5G allows mind control via microchips injected in the bloodstream.
7. Depopulation: Conspiracy theories on population reduction or population growth control suggest that either COVID-19 or the vaccines are being used to reduce population size, either by killing people or by rendering them infertile. In some cases, this is directed against specific ethnic groups.
8. New world order: New World Order (NWO) is a preexisting conspiracy theory which deals with a secretly emerging totalitarian world government. In the context of the pandemic, this usually means that COVID-19 is being used to bring about this world government through fear of the virus, by taking away civil liberties, or through other implausible means such as mind control.
9. Esoteric misinformation: Truly esoteric ideas concerning spiritual planes etc. Note that conventional faith-based statements such as "praying for the pandemic to end" do not fall into this category.
10. Satanism: Narratives in which the perpetrators are alleged to be some kind of satanists, perform objectionable rituals, or make use of occult ideas or symbols. May involve harm or sexual abuse of children, such as the idea that global elites harvest adrenochrome from children.
11. Other conspiracy theory: Catchall category for tweets that interpret other known conspiracy theories in the light of COVID-19 or connect some of the above categories to preexisting conspiracy theories.
12. Other misinformation: Catchall category for tweets containing substantial misinformation that does not fulfill the requirements of a conspiracy theory. Only include obvious misinformation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Langguth, J., Schroeder, D.T., Filkuková, P. et al. COCO: an annotated Twitter dataset of COVID-19 conspiracy theories. J Comput Soc Sc 6, 443–484 (2023). https://doi.org/10.1007/s42001-023-00200-3


Received: 26 September 2022
Accepted: 16 February 2023
Published: 04 April 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s42001-023-00200-3
