Characterizing the 2016 Russian IRA influence campaign

  • Adam Badawy
  • Aseel Addawood
  • Kristina Lerman
  • Emilio Ferrara
Original Article

Abstract

Until recently, social media were seen to promote democratic discourse on social and political issues. However, this powerful communication ecosystem has come under scrutiny for allowing hostile actors to exploit online discussions in an attempt to manipulate public opinion. A case in point is the ongoing U.S. Congress investigation of Russian interference in the 2016 U.S. election campaign, with Russia accused of, among other things, using trolls (malicious accounts created for the purpose of manipulation) and bots (automated accounts) to spread propaganda and politically biased information. In this study, we explore the effects of this manipulation campaign, taking a closer look at users who re-shared the posts produced on Twitter by the Russian troll accounts publicly disclosed by the U.S. Congress investigation. We collected a dataset of 13 million election-related posts shared on Twitter in 2016 by over a million distinct users. This dataset includes accounts associated with the identified Russian trolls as well as users sharing posts in the same period on a variety of topics around the 2016 elections. We use label propagation to infer the users’ ideology based on the news sources they share, and we are able to classify a large number of users as liberal or conservative with precision and recall above 84%. Conservative users who retweet Russian trolls produced significantly more tweets than liberal ones, about eight times as many. Additionally, the trolls’ position in the retweet network is stable over time, unlike the users who retweet them, who come to form the core of the election-related retweet network by the end of 2016. Using state-of-the-art bot detection techniques, we estimate that about 5% of liberal users and 11% of conservative users are bots. Text analysis of the content shared by trolls reveals that conservative trolls talk about refugees, terrorism, and Islam, while liberal trolls talk more about school shootings and the police. Although an ideologically broad swath of Twitter users were exposed to Russian trolls in the period leading up to the 2016 U.S. Presidential election, it is mainly conservatives who helped amplify their message.

Keywords

Social media manipulation · Russian trolls · Bots · Influence campaigns

1 Introduction

Social media have helped foster democratic conversation about social and political issues: from the Arab Spring (González-Bailón et al. 2011) to the Occupy Wall Street movement (Conover et al. 2013b, a) and other civil protests (González-Bailón et al. 2013; Varol et al. 2014; Stella et al. 2018), Twitter and other social media platforms appeared to play an instrumental role in involving the public in policy and political conversations by collectively framing the narratives related to particular social issues and coordinating online and off-line activities. The use of digital media for political discussion during presidential elections has been examined by many studies, covering the past four U.S. Presidential elections (Adamic and Glance 2005; Diakopoulos and Shamma 2010; Bekafigo and McBride 2013; Carlisle and Patton 2013; DiGrazia et al. 2013) as well as elections in other countries, such as Australia (Gibson and McAllister 2006; Bruns and Burgess 2011) and Norway (Enli and Skogerbø 2013). Findings that focus on the positive effects of social media, such as increasing voter turnout (Bond et al. 2012) or exposure to diverse political views (Bakshy et al. 2015), contribute to the general praise of these platforms as tools for promoting democracy and civic engagement (Shirky 2011; Loader and Mercea 2011; Effing et al. 2011; Tufekci and Wilson 2012; Tufekci 2014).

However, concerns regarding the possibility of manipulating public opinion and spreading political propaganda or fake news through social media were also raised early on (Howard 2006). These effects are documented by several studies (Ratkiewicz et al. 2011b; Conover et al. 2011b; El-Khalili 2013; Woolley and Howard 2016; Shorey and Howard 2016; Bessi and Ferrara 2016; Ferrara 2017; Fourney et al. 2017). Social media have proven to be an effective tool to influence individuals’ opinions and behaviors (Aral et al. 2009; Aral and Walker 2012; Bakshy et al. 2011; Centola 2011, 2010), and some studies even evaluate the current tools to combat misinformation (Pennycook and Rand 2017). Computational tools, like troll accounts and social bots, have been designed to perform this type of influence operation at scale, by cloning or emulating the activity of human users while operating at a much higher pace (e.g., automatically producing content following a scripted agenda) (Hwang et al. 2012; Messias et al. 2013; Ferrara et al. 2016a; Varol et al. 2017a; Ferrara 2018). However, it should be noted that bots have also been used, in some instances, for positive interventions (Savage et al. 2016; Monsted et al. 2017).

Early accounts of the adoption of bots to attempt to manipulate political communication with misinformation date back to 2010, during the U.S. midterm elections, when social bots were employed to support some candidates and smear others; in that instance, bots injected thousands of tweets pointing to Web sites with fake news (Ratkiewicz et al. 2011a). Similar cases were reported during the 2010 Massachusetts special election (Metaxas and Mustafaraj 2012); these campaigns are often referred to as Twitter bombs or political astroturf (Ferrara et al. 2016b; Varol et al. 2017b). Unfortunately, it has often been impossible to determine the actors behind these operations (Kollanyi et al. 2016; Ferrara et al. 2016a). Prior to this work, only a handful of other operations had been linked to specific actors (Woolley and Howard 2016), e.g., the alt-right attempt to smear a presidential candidate before the 2017 French election (Ferrara 2017). This is because governments, organizations, and other entities with sufficient resources can obtain the technological capabilities necessary to covertly deploy hundreds or thousands of accounts and use them to either support or attack a given political target. Reverse-engineering these strategies has proven a challenging research avenue (Freitas et al. 2015; Alarifi et al. 2016; Subrahmanian et al. 2016; Davis et al. 2016), but it can ultimately lead to techniques to identify the actors behind these operations.

One difficulty facing such studies is objectively determining what is fake news, as there is a range of untruthfulness from simple exaggeration to outright lies. Beyond factually wrong information, it is difficult to classify information as fake.

Rather than facing the conundrum of normative judgment and arbitrarily determining what is fake news and what is not, in this study we focus on user intents, specifically the intent to deceive, and their effects on the Twitter political conversation prior to the 2016 U.S. Presidential election.

Online accounts that are created and operated with the primary goal of manipulating public opinion (for example, promoting divisiveness or conflict on some social or political issue) are commonly known as Internet trolls (trolls, in short) (Buckels et al. 2014). To label some accounts or sources of information as trolls, a clear intent to deceive or create conflict has to be present. A malicious intent to harm the political process and cause distrust in the political system was evident in 2752 now-deactivated Twitter accounts that were later identified as being tied to Russia’s “Internet Research Agency” troll farm, which was also active on Facebook (Dutt et al. 2018). The U.S. Congress released a list of these accounts as part of the official investigation of Russian efforts to interfere in the 2016 U.S. Presidential election.

Since their intent is clearly malicious, the Russian troll accounts and their messages are the subject of our scrutiny: we study their spread on Twitter to understand the extent of the Russian interference effort and its effects on the election-related political discussion.

1.1 Research questions

In this paper, we aim to answer four crucial research questions regarding the effects of the interference operation carried out by Russian trolls:
  1. RQ1

    What is the role of the users’ political ideology? We investigate whether political ideology affects who engages with Russian trolls, and how that may have helped propagate the trolls’ content. If so, we determine whether the effect is more pronounced among liberals or conservatives, or evenly spread across the political spectrum.

  2. RQ2

    How central are the trolls in the information-spreading network over 2016 (before and after the U.S. Presidential election)? We analyze the position of trolls and of the users who spread their messages in the retweet network progressively in time, from the beginning to the end of 2016.

  3. RQ3

    What is the role of social bots? We characterize whether social bots play a role in spreading content produced by Russian trolls and, if so, where on the political spectrum the bots are situated.

  4. RQ4

    Do trolls succeed in specific areas of the US? We offer an extensive analysis of the geospatial dimension and how it affects the effectiveness of the Russian interference operation; we test whether users located within specific states participate in the consumption and propagation of trolls’ content more than others.

This paper improves upon our previous work (Badawy et al. 2018) by (1) extending the span of the data to the full year of 2016, before and after the US elections, rather than just the 2 months covered in the previous paper, and (2) using more sophisticated network analysis to understand the influence of malicious users across time. We collect Twitter data for the year leading into the election, obtaining a dataset of over 13 million tweets generated by over a million distinct users in 2016. We successfully determine the political ideology of most of the users using label propagation on the retweet network, with precision and recall exceeding 84%. Next, applying advanced machine learning techniques developed to discover social bots (Ferrara et al. 2016a; Subrahmanian et al. 2016; Davis et al. 2016) to the users who engage with Russian trolls, we find bots among both liberal and conservative users. We perform text analysis on the content Russian trolls disseminated and find that conservative trolls are concerned with particular causes, such as refugees, terrorism and Islam, while liberal trolls write about issues related to the police and school shootings. Additionally, we offer an extensive geospatial analysis of tweets across the USA, showing that tweet volume is mostly proportionate to each state's population size, as expected; however, a few outliers emerge for both liberals and conservatives.

1.2 Summary of contributions

Findings presented in this work can be summarized as:
  • We propose a novel way of measuring the consumption of manipulated content through the analysis of activities of Russian trolls on Twitter in the year of 2016.

  • Using network-based machine learning methods, we accurately determine the political ideology of most users in our dataset, with precision and recall above 84%.

  • We use network analysis to map the position of trolls and spreaders in the retweet network over the course of 2016. We define spreaders in this paper as users who retweet trolls at least once. Although retweeting a troll once does not make a user malicious, we use this term to focus on the act of spreading the trolls’ message, not the intention.

  • State-of-the-art bot detection on users who engage with Russian trolls shows that bots are engaged in both liberal and conservative domains.

  • Text analysis shows that conservative Russian trolls mostly promote conservative causes related to refugees, terrorism, and Islam, as well as talking about Trump, Clinton, and Obama. For liberal trolls, the discussion is focused on school shootings and the police, but Trump and Clinton are also among the top words used.

  • We offer a comprehensive geospatial analysis showing that certain states over-engaged with the production and diffusion of Russian trolls’ content.

2 Data collection

2.1 Twitter dataset

To collect Twitter data about the Russian trolls, we use a list of 2752 Twitter accounts identified as Russian trolls that was compiled and released by the U.S. Congress.1 To collect the tweets, we use Crimson Hexagon,2 a social media analytics platform that provides paid datastream access. This tool allows us to obtain the tweets and retweets produced by the trolls in 2016, including those that were subsequently deleted. Table 1 offers some descriptive statistics of the Russian troll accounts. Out of the accounts appearing on the list, 1148 exist in the dataset, and a little over a thousand of them produced more than half a million original tweets.
Table 1  Descriptive statistics of Russian trolls

  Statistic                                Value
  # Of Russian trolls                      2752
  # Of trolls in our data                  1148
  # Of trolls who wrote original tweets    1032
  # Of original tweets                     538,166

We also collect tweets from users that did not retweet any troll, since this helps us better understand trolls’ online behavior relative to normal users and how it affects the overall discourse on Twitter. We collect non-trolls’ tweets using two strategies. First, we collect tweets of such users using a list of hashtags and keywords that relate to the 2016 U.S. Presidential election. This list is crafted to contain a roughly equal number of hashtags and keywords associated with each major Presidential candidate: we select 23 terms, including five terms referring to the Republican Party nominee Donald J. Trump (#donaldtrump, #trump2016, #neverhillary, #trumppence16, #trump), four terms for Democratic Party nominee Hillary Clinton (#hillaryclinton, #imwithher, #nevertrump, #hillary), and several terms related to debates. To make sure our query list is comprehensive, we add a few keywords for the two third-party candidates, including Libertarian Party nominee Gary Johnson (one term) and Green Party nominee Jill Stein (two terms).  Our second strategy is to collect tweets from the same users that do not contain the key terms listed above.
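The exact matching procedure is not detailed in the text; the sketch below illustrates one simple way such a keyword filter could work. Only the terms explicitly named above are included, as the full 23-term query list is not reproduced here.

    # Hypothetical sketch of the keyword/hashtag filter described above.
    # ELECTION_TERMS contains only the terms named explicitly in the text;
    # the paper's full 23-term query list is not reproduced here.
    ELECTION_TERMS = [
        '#donaldtrump', '#trump2016', '#neverhillary', '#trumppence16', '#trump',
        '#hillaryclinton', '#imwithher', '#nevertrump', '#hillary',
    ]

    def is_election_related(tweet_text, terms=ELECTION_TERMS):
        """Return True if the tweet contains any election-related query term."""
        text = tweet_text.lower()
        return any(term in text for term in terms)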

Table 2 reports some aggregate statistics of the data. It shows that the numbers of retweets and of tweets with URLs are both quite high, each exceeding three-quarters of the dataset. Figure 1 shows the timeline of the volume of tweets and of the users who produced them in 2016, with a spike around the time of the election.
Fig. 1

Timeline of the volume of tweets (in blue) generated during our observation period and of the users who produced these tweets (in red) (colour figure online)

Table 2  Twitter data descriptive statistics

  Statistic                          Count
  # Of tweets                        13,631,266
  # Of retweets                      10,556,421
  # Of distinct users                1,089,974
  # Of tweets/retweets with a URL    10,621,071

2.2 Classification of media outlets

We classify users by their ideology based on the political leaning of the media outlets they share. The classification algorithm is described in Sect. 3.1. In this section, we describe the methodology for obtaining ground truth labels for the media outlets.

We use lists of partisan media outlets compiled by third-party organizations, such as AllSides3 and Media Bias/Fact Check.4 We combine liberal and liberal-center media outlets into one list and conservative and conservative-center outlets into another. The combined list includes 641 liberal and 398 conservative outlets. However, in order to cross-reference these media URLs with the URLs in the Twitter dataset, we need to resolve the long URLs for most of the links in the dataset, since most of them are shortened. As this process is quite time-consuming, we take the top 5000 URLs by count and retrieve the long version for those. These top 5000 URLs account for more than 2.1 million occurrences, or around one-fifth of the URLs in the dataset.
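The paper does not specify how the shortened URLs were expanded; the following is a minimal sketch of one way to resolve them by following HTTP redirects, assuming the requests library (expand_url and top_urls are illustrative helpers, not the authors' code).

    import requests
    from collections import Counter

    def top_urls(all_urls, k=5000):
        """Return the k most frequent (short) URLs in the dataset."""
        return [url for url, _ in Counter(all_urls).most_common(k)]

    def expand_url(short_url, timeout=10):
        """Follow redirects to recover the long form of a shortened URL.
        Falls back to the original URL if the request fails."""
        try:
            resp = requests.head(short_url, allow_redirects=True, timeout=timeout)
            return resp.url
        except requests.RequestException:
            return short_url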

After cross-referencing the 5000 long URLs with the media URLs, we observe that 7912 tweets in the dataset contain a URL that points to one of the liberal media outlets and 29,846 tweets contain a URL pointing to one of the conservative media outlets. Figure 2 shows the distributions of tweets with URLs from liberal and conservative outlets. As we can see in the figure, American Thinker dominates the URLs shared in the conservative sample under study, while in the liberal sample media outlets are more evenly represented. Table 3 shows the list of the left and right media outlets/domain names after removing left/right center from the list.
Fig. 2

Distribution of tweets with links to liberal (right) and conservative (left) media outlets

We use a majority rule to label Twitter users as liberal or conservative depending on the number of tweets they produce with links to liberal or conservative sources. In other words, if a user has more tweets with URLs pointing to liberal sources, he/she is labeled as liberal, and vice versa. Although the overwhelming majority of users include URLs that are either liberal or conservative, we remove any user with an equal number of tweets from each side. Our final set of labeled users includes 10,074 users.
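A minimal sketch of this majority rule is shown below, assuming a per-user list of shared URLs and two sets of partisan domains (all variable names are illustrative, not taken from the paper).

    from collections import Counter
    from urllib.parse import urlparse

    def label_user(shared_urls, liberal_domains, conservative_domains):
        """Majority-rule ideology label for one user.
        Returns 'liberal', 'conservative', or None for ties / no partisan URLs."""
        votes = Counter()
        for url in shared_urls:
            domain = urlparse(url).netloc.lower()
            if domain.startswith('www.'):
                domain = domain[4:]
            if domain in liberal_domains:
                votes['liberal'] += 1
            elif domain in conservative_domains:
                votes['conservative'] += 1
        if votes['liberal'] > votes['conservative']:
            return 'liberal'
        if votes['conservative'] > votes['liberal']:
            return 'conservative'
        return None  # ties and users with no partisan URLs are dropped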

3 Methods and data analysis

3.1 Label propagation

We use label propagation5 to classify Twitter accounts as liberal or conservative, similar to prior work (Conover et al. 2011a). In a network-based label propagation algorithm, each node is assigned a label, which is updated iteratively based on the labels of the node’s network neighbors. In label propagation, a node takes the most frequent label of its neighbors as its own new label. The algorithm updates labels iteratively and stops when the labels no longer change [see Raghavan et al. (2007) for more information]. The algorithm takes as parameters (i) edge weights, i.e., how many times node i retweets node j, and (ii) seeds, the list of labeled nodes. We fix the seeds’ labels so they do not change in the process, since this seed list also serves as our ground truth.
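A minimal sketch of the propagation step is shown below. It assumes the retweet graph is given as a dictionary of edge weights and treats retweet links as undirected for voting purposes; both choices are illustrative assumptions rather than details taken from the paper.

    from collections import defaultdict

    def propagate_labels(edge_weights, seeds, max_iter=100):
        """Weighted label propagation with fixed seed labels.

        edge_weights: dict mapping (i, j) -> number of times user i retweeted user j.
        seeds: dict mapping node -> 'liberal' or 'conservative' (never updated).
        """
        neighbors = defaultdict(list)
        for (i, j), w in edge_weights.items():
            neighbors[i].append((j, w))
            neighbors[j].append((i, w))  # assumption: vote over both link directions

        labels = dict(seeds)
        for _ in range(max_iter):
            changed = False
            for node, nbrs in neighbors.items():
                if node in seeds:            # seed labels are fixed ground truth
                    continue
                votes = defaultdict(float)
                for nbr, w in nbrs:
                    if nbr in labels:
                        votes[labels[nbr]] += w
                if votes:
                    best = max(votes, key=votes.get)
                    if labels.get(node) != best:
                        labels[node] = best
                        changed = True
            if not changed:                  # stop when labels no longer change
                break
        return labels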

We construct a retweet network, containing nodes (Twitter users) with a directed link between them if one user retweets a post of another. Table 4 shows the descriptive statistics of the retweet network. It is a sparse network of over 1.4 million distinct users with a giant component that includes 1,357,493 nodes. The number of distinct users in the retweet network is larger than the number of unique users in the dataset, because the retweet network also includes retweeted users extracted from the text of the tweets, whereas the distinct users in the dataset are only those for whom we collected tweets (both original tweets and retweets).
Table 4  Descriptive statistics of the retweet network

  Statistic         Count
  # Of nodes        1,407,190
  # Of edges        4,874,786
  Max in-degree     198,262
  Max out-degree    8458
  Density           2.46e−06

We use the more than 10,000 users mentioned in the media outlets section as seeds—those who mainly retweet messages from either the liberal or the conservative media outlets in Fig. 2—and label them accordingly. We then run label propagation to label the remaining nodes in the retweet network. To validate the results of the label propagation algorithm, we apply stratified fivefold cross-validation to the set of more than 10,000 seeds: we train the algorithm on 4/5 of the seed list and evaluate how it performs on the remaining 1/5. The average precision and recall scores across the five folds are around 0.84. Since we combine liberal and liberal-center outlets into one list (and likewise for conservatives), the algorithm is not only labeling the far liberal or far conservative users correctly, which is a relatively easier task, but is also performing well on the liberal/conservative center.
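The validation procedure can be sketched as follows, assuming the propagate_labels routine sketched in Sect. 3.1 and scikit-learn for the stratified splits and metrics (a hypothetical reconstruction, not the authors' code).

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import precision_score, recall_score

    def validate_label_propagation(edge_weights, seed_nodes, seed_labels, n_splits=5):
        """Hold out one fold of seeds, propagate from the rest, score the held-out fold."""
        seed_nodes = np.asarray(seed_nodes)
        seed_labels = np.asarray(seed_labels)
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
        precisions, recalls = [], []
        for train_idx, test_idx in skf.split(seed_nodes, seed_labels):
            train_seeds = dict(zip(seed_nodes[train_idx], seed_labels[train_idx]))
            predicted = propagate_labels(edge_weights, train_seeds)
            y_true = seed_labels[test_idx]
            y_pred = [predicted.get(node, 'unlabeled') for node in seed_nodes[test_idx]]
            precisions.append(precision_score(y_true, y_pred, average='macro',
                                              labels=['liberal', 'conservative'],
                                              zero_division=0))
            recalls.append(recall_score(y_true, y_pred, average='macro',
                                        labels=['liberal', 'conservative'],
                                        zero_division=0))
        return float(np.mean(precisions)), float(np.mean(recalls))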

3.2 Bot detection

Determining whether a social media account is controlled by a human or a bot has proven a very challenging task (Ferrara et al. 2016a; Subrahmanian et al. 2016; Kudugunta and Ferrara 2018). We use an openly accessible solution called Botometer (a.k.a. BotOrNot) (Davis et al. 2016), consisting of both a public Web site (https://botometer.iuni.iu.edu/) and a Python API (https://github.com/IUNetSci/botometer-python), which allows for making this determination with high accuracy. Botometer is a machine learning framework that extracts and analyses a set of over one thousand features, spanning six subclasses:
User:

Meta-data features that include the number of friends and followers, the number of tweets produced by the users, profile description and settings.

Friends:

Four types of links are considered here: retweeting, mentioning, being retweeted, and being mentioned. For each group separately, Botometer extracts features about language use, local time, popularity, etc.

Network:

Botometer reconstructs three types of networks: retweet, mention, and hashtag co-occurrence networks. All networks are weighted according to the frequency of interactions or co-occurrences.

Temporal:

Features related to user activity, including average rates of tweet production over various time periods and distributions of time intervals between events.

Content:

Statistics about the length and entropy of tweet text, plus Part-of-Speech (POS) tagging features, which identify different types of natural language components (POS tags).

Sentiment:

Features such as: arousal, valence and dominance scores (Warriner et al. 2013), happiness score (Kloumann et al. 2012), polarization and strength (Wilson et al. 2005), and emotion score (Agarwal et al. 2011).

Botometer is trained with thousands of instances of social bots, from simple to sophisticated, yielding an accuracy above 95% (Davis et al. 2016). Typically, Botometer returns likelihood scores above 50% only for accounts that look suspicious under scrupulous analysis. We adopt the Python Botometer API to systematically inspect the most active users in our dataset. The API queries the Twitter API to extract 300 recent tweets and publicly available account meta-data and feeds these features to an ensemble of machine learning classifiers, which produce a bot score. To label accounts as bots, we use the 50% threshold, which has proven effective in prior studies (Davis et al. 2016): an account is considered to be a bot if its overall Botometer score is above 0.5.
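A minimal sketch of querying Botometer through its Python API and applying the 0.5 threshold is shown below. The credential values are placeholders, and the result field names may differ across Botometer versions; this is an illustration, not the exact pipeline used in the paper.

    import botometer

    # Placeholder credentials; recent Botometer versions also require a RapidAPI key.
    twitter_app_auth = {
        'consumer_key': 'XXXX',
        'consumer_secret': 'XXXX',
        'access_token': 'XXXX',
        'access_token_secret': 'XXXX',
    }
    bom = botometer.Botometer(wait_on_ratelimit=True,
                              rapidapi_key='XXXX',
                              **twitter_app_auth)

    def bot_score(screen_name):
        """Return the overall Botometer score in [0, 1] for an account.
        The 'universal' score key is an assumption; some versions expose
        scores under different keys."""
        result = bom.check_account(screen_name)
        return result['scores']['universal']

    def is_bot(screen_name, threshold=0.5):
        """Apply the 0.5 threshold used in the paper."""
        return bot_score(screen_name) > threshold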

3.3 Geolocation

There are two ways to identify the location of tweets produced by users. One way is to collect the coordinates of the location the tweets were sent from; however, this is only possible if users enable the geolocation option on their Twitter accounts. The second way is to analyze the self-reported location text in users’ profiles. The latter includes substantially more noise, since many people report fictitious or imprecise locations—for example, they may identify the state and the country they reside in, but not the city.

In this paper, we use the location provided by Crimson Hexagon, which relies on two methodologies. First, it extracts geotagged locations, which are only available for a small portion of the Twitter data. For tweets that are not geotagged, Crimson Hexagon estimates the users’ countries, regions, and cities based on “various pieces of contextual information”, for example, their profile information as well as their time zones and languages. Using the country and state information provided by Crimson Hexagon, the dataset has over 9 million tweets with country location. As shown in Table 5, more than 7.5 million of these geolocated tweets come from the US, with the UK, Nigeria, and Russia trailing behind with 287k, 277k, and 192k tweets, respectively. There are more than 4.7 million US tweets with state information. The top four states are, as expected, California, Texas, New York, and Florida, with 742k, 580k, 383k, and 360k tweets.
Table 5  Distribution of tweets by country (top 10 countries by tweet count)

  Country      # Of tweets
  USA          7,548,463
  UK           287,031
  Nigeria      277,507
  Russia       192,305
  Canada       145,371
  India        91,464
  Australia    79,188
  Mexico       70,268
  Indonesia    37,716
  France       28,452

4 Results

Let us address the four research questions we sketched earlier:
  1. RQ1

    What is the role of the users’ political ideology?

  2. RQ2

    How central are trolls and the users who retweet them (spreaders)?

  3. RQ3

    What is the role of social bots?

  4. RQ4

    Did trolls especially succeed in specific areas of the US?

In Sect. 4.1, we analyze how political ideology affects engagement with content posted by Russian trolls. Section 4.2 shows how the positions of trolls and spreaders evolve over time in the retweet network. Section 4.3 focuses on social bots and how they contribute to the spreading of content produced by Russian trolls. Finally, in Sect. 4.4 we show how users contribute to the consumption and propagation of trolls’ content based on their location.

4.1 RQ1: political ideology

The label propagation method succeeds in labeling most of the users as either liberal or conservative; however, the algorithm is not able to label some users outside of the giant component. Table 6 shows the number of trolls by ideology: there are almost twice as many conservative trolls as liberal ones, both overall and in terms of the number of trolls who wrote original tweets. Although there are half as many liberal trolls, they produce more tweets than conservative trolls. The mean and standard deviation of the number of tweets are 978 and 3168 for liberal trolls, and 354 and 1202 for conservative trolls, respectively. It is hard to determine the shape of the distributions from these numbers alone, but looking at the distribution of the number of original tweets per troll, we can see that it is more even for liberal trolls than for conservative ones.
Table 6  Breakdown of the Russian trolls by political ideology, with the ratio of conservative to liberal trolls

  Statistic                       Liberal    Conservative    Ratio
  # Of trolls                     339        688             2
  # Of trolls w/ original tweets  306        608             1.98
  # Of original tweets            299,464    215,617         0.7

Table 7 shows descriptive statistics of spreaders. The table shows that only a few spreaders wrote original tweets and that more than half of their tweets are retweets of trolls. There are fewer conservative spreaders, but they write substantially more tweets than their liberal counterparts (see Table 8). Besides talking about the candidates, liberals talk about being black, women, and school shootings. Conservatives talk about being American, Obama, terrorism, refugees, and Muslims (see Table 9 for the top 20 words used by liberals and conservatives). Although the difference in language use between liberals and conservatives is important to understand, analyzing other aspects of their respective online behavior is crucial, and that is what the rest of the paper focuses on.
Table 7  Descriptive statistics of spreaders, i.e., users who retweeted Russian trolls

  Statistic                             Value
  # Of spreaders                        720,558
  # Of times retweeted trolls           3,540,717
  # Of spreaders with original tweets   21,338
  # Of original tweets                  319,565
  # Of original tweets and retweets     7,357,717

Table 8  All spreaders by political ideology; bot analysis for 115,396 spreaders (out of a 200k random sample of spreaders)

  Statistic            Liberal      Conservative    Ratio
  # Of spreaders       446,979      273,546         0.6
  # Of tweets          1,715,696    5,641,988       3.2
  # Of bots            3528         4896            1.4
  # Of tweets by bots  26,233       181,604         7

  Ratio: conservative/liberal

Table 9  Top 20 meaningful lemmatized words from the tweets of Russian trolls classified as Conservative and Liberal

  Conservative    Count     Liberal     Count
  Trump           10,362    Police      15,498
  Hillary         5494      Trump       13,999
  People          4479      Man         13,118
  Clinton         4295      Black       9942
  Obama           2593      Year        8627
  One             2447      State       7895
  American        2388      Woman       7748
  Woman           2167      Shooting    6564
  Day             2152      Killed      6407
  Donald          2148      People      5913
  Time            2105      Clinton     5606
  Refugee         2103      School      5143
  President       2079      Shot        5132
  Terrorist       1994      City        4754
  Country         1980      Win         4543
  Muslim          1963      Cop         4515
  Year            1893      Day         4478
  Need            1856      Fire        4408
  Think           1835      Officer     4397
  Al              1823      Death       4305

4.2 RQ2: temporal analysis

Analyzing the influence of trolls over time is one of the most important questions regarding the spread of political propaganda. We measure the influence of trolls by measuring where they are located in the retweet network. We choose the retweet network in particular because retweeting is the main vehicle for spreading information on Twitter. There are multiple ways to measure the position of a user in the network he/she is embedded in. We choose the k-core decomposition technique, because it captures the notion of who is in the core of the network versus the periphery, while giving an ordinal measure reflecting the number of connections.

The k-core of a graph is the maximal subgraph in which every node has degree at least k. In a directed network, a node's degree is the sum of its in-degree and out-degree. The k-core decomposition is a recursive approach that progressively trims the least connected nodes in a network (those with lower degree) in order to identify the most central ones (Barberá et al. 2015).

We measure the centrality of trolls over time by dividing the number of trolls by the total number of nodes in every k-core, for every snapshot of the network. Since we want to measure the evolution of the trolls’ importance, we construct monthly retweet networks, where every network contains only the nodes and edges observed in that month. We construct these 3-d plots sequentially to see how the trolls’ and spreaders’ positions in the retweet network evolve from month to month throughout 2016.
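A sketch of this computation using NetworkX is shown below; core_number on a directed graph uses the total (in + out) degree, matching the definition above. The fraction here is computed per core-number shell, which is one possible reading of "every core", and the data structure names are assumptions.

    import networkx as nx
    from collections import Counter

    def troll_fraction_by_core(monthly_edges, troll_ids):
        """For each monthly retweet network, compute the fraction of trolls
        among the nodes of each core-number shell.

        monthly_edges: dict mapping month -> iterable of (retweeter, retweeted) pairs.
        troll_ids: set of troll account ids.
        Returns {month: {k: fraction of trolls among nodes with core number k}}.
        """
        results = {}
        for month, edges in monthly_edges.items():
            G = nx.DiGraph()
            G.add_edges_from(edges)
            G.remove_edges_from(list(nx.selfloop_edges(G)))   # core_number forbids self-loops
            core = nx.core_number(G)                          # degree = in-degree + out-degree
            totals, trolls = Counter(), Counter()
            for node, k in core.items():
                totals[k] += 1
                if node in troll_ids:
                    trolls[k] += 1
            results[month] = {k: trolls[k] / totals[k] for k in totals}
        return results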

Figure 3 shows that trolls sit mostly in the middle or high k-cores throughout the time period of the dataset. We replicate the analysis for spreaders in Fig. 3 (right side), which shows that the spreaders’ proportion across most k-cores increases progressively over time, until they dominate the retweet network.
Fig. 3

Temporal k-core plots for trolls and spreaders; the z-axis shows the fraction of trolls/spreaders over the total per k-core for each month in the retweet network. For each month, the graph contains only the nodes and edges from that month

4.3 RQ3: social bots

Since there are many spreaders in this dataset, we take a random sample of 200,000 spreaders and use the approach explained in Sect. 3.2 to obtain bot scores. We use Botometer to obtain bot scores only for the sample and not for all of the spreaders, since doing so for that many accounts takes a considerably long time due to the Twitter API rate limits. We are able to obtain bot scores for 115,396 of the 200,000 sampled spreaders.

The number of users that have bot scores above 0.5, and can therefore be safely considered bots according to prior work (Varol et al. 2017a), stands at 8424 accounts. Out of the 115,396 spreaders with bot scores, 68,057 are liberal, and 3528 of them have bot scores above 0.5, about 5% of the total. As for the conservatives, there are 45,807 spreaders, 4896 of which have bot scores above 0.5, representing around 11% of the total. As the results summarized in Table 8 show, although the number of conservative spreaders is lower than that of liberal ones, there are more bots among conservatives, and they write considerably more tweets.

The top 100 liberal posters produce 341,940 tweets (including retweets). Out of the top 100 liberal accounts, 15 have bot scores above 0.5, and they post 14,815 tweets, only 4% of the total tweets produced by the top 100 liberal accounts. The top 100 conservative accounts produce 828,334 tweets; 25 of them have bot scores above 0.5 and produce 85,102 tweets, a bit more than 10% of the total tweets produced by the top 100 most productive conservative posters. It is evident that among the most productive posters, conservative ones produce more tweets, include more bots, and their bot accounts produce more tweets than their liberal counterparts.

Figure 4 shows the probability density of bot scores of the liberal and conservative spreaders. Again, putting aside the disproportionate number of liberals relative to conservatives, the mean bot score of the conservative spreaders (0.29) is higher than that of the liberal ones (0.18). We performed a two-sided t test for the null hypothesis that the two distributions have identical mean values; the p value is effectively zero, meaning that we can reject the null hypothesis. Furthermore, conservative spreaders have higher means on all of the Botometer subclass scores in comparison to their liberal counterparts.
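The test can be reproduced with SciPy as sketched below; the score arrays are assumed inputs, and the paper does not state whether the pooled or Welch variant of the test was used.

    from scipy import stats

    def compare_bot_scores(liberal_scores, conservative_scores):
        """Two-sided t test for equal means of the two bot-score distributions.
        liberal_scores / conservative_scores: assumed arrays of Botometer scores
        for the liberal and conservative spreader samples."""
        t_stat, p_value = stats.ttest_ind(conservative_scores, liberal_scores,
                                          equal_var=False)  # Welch variant (assumption)
        return t_stat, p_value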
Fig. 4

Overall bot score

4.4 RQ4: geospatial analysis

Figure 5 shows, for each state, the proportion of retweets of Russian troll content by liberal and conservative users (classified according to the label propagation algorithm), normalized by the state's total number of liberal and conservative tweets, respectively. The ratio \(\rho\) is computed as \(\rho = T_S / P_S\), where \(T_S\) is the total number of liberal/conservative retweets of liberal/conservative trolls from a given state \(S\), and \(P_S\) is the total number of tweets from that state.
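A minimal sketch of this per-state ratio is shown below, assuming per-tweet records with a state, an inferred ideology, and a flag marking retweets of troll content (all field names are hypothetical).

    from collections import Counter

    def state_ratios(tweets, ideology):
        """Compute rho = T_S / P_S for every state, for one ideology.

        tweets: iterable of dicts with assumed keys 'state', 'ideology',
                and 'retweets_troll' (True if the tweet retweets a Russian troll).
        """
        troll_retweets, total_tweets = Counter(), Counter()
        for t in tweets:
            if t.get('ideology') != ideology or not t.get('state'):
                continue
            total_tweets[t['state']] += 1          # P_S
            if t.get('retweets_troll'):
                troll_retweets[t['state']] += 1    # T_S
        return {s: troll_retweets[s] / total_tweets[s] for s in total_tweets}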
Fig. 5

Proportion of the number of retweets by conservative users of Russian trolls per each state normalized by the total number of conservative tweets by state (right); equivalent map for liberals on the left

We notice that a few states exhibit very high proportions of retweets per total number of tweets for both liberals and conservatives. We test the deviations using a two-tailed t test on the z-scores of each deviation, calculated on the distribution of ratios. For conservatives, the average is 0.34 and the standard deviation is 0.12, while for liberals, the average is 0.26 and the standard deviation is 0.13. For conservatives, Wyoming leads the ranking (\(\rho =3.65\), p value = 0.001). For liberals, Montana (\(\rho =0.54\), p value = 0.023) and Kansas (\(\rho =0.51\), p value = 0.046) lead the ranking and are the only states with statistically significant ratios. It is also interesting to note that a little less than half of the trolls report Russia as their location, while only a small number of spreaders and other users do (see Table 10).
Table 10  Users that reported Russia as their location

  Category                     Count
  # Of trolls from Russia      438
  # Of spreaders from Russia   1980
  # Of overall from Russia     3017

5 Conclusions

The dissemination of information and the mechanisms for democratic discussion have radically changed since the advent of digital media, especially social media. Platforms like Twitter are praised for their contribution to the democratization of public discourse on civic and political issues. However, many studies highlight the perils associated with the abuse of these platforms. The spread of deceptive, false and misleading information aimed at manipulating public opinion is among those risks.

In this work, we investigated the role and effects of trolls, using the content produced by Russian trolls on Twitter as a proxy for propaganda. We collected tweets posted during 2016 by Russian trolls, spreaders, and other users who tweeted in this period. We showed that propaganda (produced by Russian trolls) is shared more widely by conservatives than liberals on Twitter. Although liberal spreaders outnumber conservative ones by close to 2:1, the latter write about 3.2 times as many tweets as the liberal spreaders. Using a state-of-the-art bot detection method, we estimated that about 5% of the liberal users and 11% of the conservative users are bots. Conservative bot spreaders produce about seven times as many tweets as liberal ones.

The spread of propaganda by malicious actors can have severe negative consequences. It can amplify malicious information and polarize political conversation, causing confusion and social instability. Scientists are currently investigating the consequences of such phenomena (Woolley and Howard 2016; Shorey and Howard 2016). We plan to explore in detail how malicious information spreads via exposure and the role of peer effects. Concluding, it is important to stress that, although our analysis unveiled the state of the political debate and the agenda pushed by the Russian trolls who spread malicious information, it is impossible to account for all the malicious efforts aimed at manipulation during the last presidential election. State and non-state actors, local and foreign governments, political parties, private organizations, and even individuals with adequate resources (Kollanyi et al. 2016) could obtain operational capabilities and technical tools to construct propaganda campaigns and deploy armies of social bots to affect the direction of online conversations. Future efforts will be required from the social and computational science communities to study this issue in depth and to develop more sophisticated detection techniques capable of unmasking and fighting these malicious efforts.


Acknowledgements

The authors gratefully acknowledge support by the Air Force Office of Scientific Research (Award #FA9550-17-1-0327). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFOSR or the U.S. Government.

References

  1. Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery
  2. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau R (2011) Sentiment analysis of Twitter data. In: Proceedings of the workshop on languages in social media. Association for Computational Linguistics, pp 30–38
  3. Alarifi A, Alsaleh M, Al-Salman A (2016) Twitter turing test: identifying social machines. Inf Sci 372:332–346
  4. Aral S, Walker D (2012) Identifying influential and susceptible members of social networks. Science 337(6092):337–341
  5. Aral S, Muchnik L, Sundararajan A (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc Natl Acad Sci 106(51):21544–21549
  6. Badawy A, Ferrara E, Lerman K (2018) Analyzing the digital traces of political manipulation: the 2016 Russian interference Twitter campaign. In: ASONAM
  7. Bakshy E, Hofman J, Mason W, Watts D (2011) Everyone’s an influencer: quantifying influence on Twitter. In: 4th WSDM
  8. Bakshy E, Messing S, Adamic LA (2015) Exposure to ideologically diverse news and opinion on Facebook. Science 348(6239):1130–1132
  9. Barberá P, Wang N, Bonneau R, Jost JT, Nagler J, Tucker J, González-Bailón S (2015) The critical periphery in the growth of social protests. PLoS ONE 10(11):e0143611
  10. Bekafigo MA, McBride A (2013) Who tweets about politics? Political participation of Twitter users during the 2011 gubernatorial elections. Soc Sci Comput Rev 31(5):625–643
  11. Bessi A, Ferrara E (2016) Social bots distort the 2016 US presidential election online discussion. First Monday. https://doi.org/10.5210/fm.v21i11.7090
  12. Bond RM, Fariss CJ, Jones JJ, Kramer AD, Marlow C, Settle JE, Fowler JH (2012) A 61-million-person experiment in social influence and political mobilization. Nature 489(7415):295
  13. Bruns A, Burgess JE (2011) The use of Twitter hashtags in the formation of ad hoc publics. In: 6th ECPR general conference
  14. Buckels EE, Trapnell PD, Paulhus DL (2014) Trolls just want to have fun. Personal Individ Differ 67:97–102
  15. Carlisle JE, Patton RC (2013) Is social media changing how we understand political engagement? An analysis of Facebook and the 2008 presidential election. Political Res Q 66(4):883–895
  16. Centola D (2010) The spread of behavior in an online social network experiment. Science 329(5996):1194–1197
  17. Centola D (2011) An experimental study of homophily in the adoption of health behavior. Science 334(6060):1269–1272
  18. Conover M, Gonçalves B, Ratkiewicz J, Flammini A, Menczer F (2011a) Predicting the political alignment of Twitter users. In: Proceedings of 3rd IEEE conference on social computing, pp 192–199
  19. Conover M, Ratkiewicz J, Francisco MR, Gonçalves B, Menczer F, Flammini A (2011b) Political polarization on Twitter. ICWSM 133:89–96
  20. Conover MD, Davis C, Ferrara E, McKelvey K, Menczer F, Flammini A (2013a) The geospatial characteristics of a social movement communication network. PLoS ONE 8(3):e55957
  21. Conover MD, Ferrara E, Menczer F, Flammini A (2013b) The digital evolution of Occupy Wall Street. PLoS ONE 8(5):e64679
  22. Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal Complex Syst 1695(5):1–9
  23. Davis CA, Varol O, Ferrara E, Flammini A, Menczer F (2016) BotOrNot: a system to evaluate social bots. In: Proceedings of 25th international conference on world wide web, pp 273–274
  24. Diakopoulos NA, Shamma DA (2010) Characterizing debate performance via aggregated Twitter sentiment. In: SIGCHI conference
  25. DiGrazia J, McKelvey K, Bollen J, Rojas F (2013) More tweets, more votes: social media as a quantitative indicator of political behavior. PLoS ONE 8(11):e79449
  26. Dutt R, Deb A, Ferrara E (2018) ‘Senator, we sell ads’: analysis of the 2016 Russian Facebook ads campaign. In: Third international conference on intelligent information technologies (ICIIT 2018)
  27. Effing R, Van Hillegersberg J, Huibers T (2011) Social media and political participation: are Facebook, Twitter and YouTube democratizing our political systems? In: Electronic participation, pp 25–35
  28. El-Khalili S (2013) Social media as a government propaganda tool in post-revolutionary Egypt. First Monday. https://doi.org/10.5210/fm.v18i3.4620
  29. Enli GS, Skogerbø E (2013) Personalized campaigns in party-centred politics: Twitter and Facebook as arenas for political communication. Inf Commun Soc 16(5):757–774
  30. Ferrara E (2017) Disinformation and social bot operations in the run up to the 2017 French presidential election. First Monday. https://doi.org/10.5210/fm.v22i8.8005
  31. Ferrara E (2018) Measuring social spam and the effect of bots on information diffusion in social media. In: Lehmann S, Ahn YY (eds) Complex spreading phenomena in social systems. Springer, Cham, pp 229–255
  32. Ferrara E, Varol O, Davis C, Menczer F, Flammini A (2016a) The rise of social bots. Commun ACM 59(7):96–104
  33. Ferrara E, Varol O, Menczer F, Flammini A (2016b) Detection of promoted social media campaigns. In: Tenth international AAAI conference on web and social media, pp 563–566
  34. Fourney A, Racz MZ, Ranade G, Mobius M, Horvitz E (2017) Geographic and temporal trends in fake news consumption during the 2016 US presidential election. In: CIKM, vol 17
  35. Freitas C, Benevenuto F, Ghosh S, Veloso A (2015) Reverse engineering socialbot infiltration strategies in Twitter. In: 2015 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 25–32
  36. Gibson RK, McAllister I (2006) Does cyber-campaigning win votes? Online communication in the 2004 Australian election. J Elect Public Opin Parties 16(3):243–263
  37. González-Bailón S, Borge-Holthoefer J, Rivero A, Moreno Y (2011) The dynamics of protest recruitment through an online network. Sci Rep 1:197
  38. González-Bailón S, Borge-Holthoefer J, Moreno Y (2013) Broadcasters and hidden influentials in online protest diffusion. Am Behav Sci 57(7):943–965
  39. Howard P (2006) New media campaigns and the managed citizen. Cambridge University Press, Cambridge
  40. Hwang T, Pearce I, Nanis M (2012) Socialbots: voices from the fronts. Interactions 19(2):38–45
  41. Kloumann IM, Danforth CM, Harris KD, Bliss CA, Dodds PS (2012) Positivity of the English language. PLoS ONE 7(1):e29484
  42. Kollanyi B, Howard PN, Woolley SC (2016) Bots and automation over Twitter during the first US presidential debate. Political Bots
  43. Kudugunta S, Ferrara E (2018) Deep neural networks for bot detection. Inf Sci 467(October):312–322
  44. Loader BD, Mercea D (2011) Networking democracy? Social media innovations and participatory politics. Inf Commun Soc 14(6):757–769
  45. Messias J, Schmidt L, Oliveira R, Benevenuto F (2013) You followed my bot! Transforming robots into influential users in Twitter. First Monday. https://doi.org/10.5210/fm.v18i7.4217
  46. Metaxas PT, Mustafaraj E (2012) Social media and the elections. Science 338(6106):472–473
  47. Monsted B, Sapiezynski P, Ferrara E, Lehmann S (2017) Evidence of complex contagion of information in social media: an experiment using Twitter bots. PLoS ONE 12(9):1–12
  48. Pennycook G, Rand DG (2017) Assessing the effect of “disputed” warnings and source salience on perceptions of fake news accuracy. Social Science Research Network. https://papers.ssrn.com/sol3/papers.cfm
  49. Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106
  50. Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Patil S, Flammini A, Menczer F (2011a) Truthy: mapping the spread of astroturf in microblog streams. In: 20th WWW conference, pp 249–252
  51. Ratkiewicz J, Conover M, Meiss MR, Gonçalves B, Flammini A, Menczer F (2011b) Detecting and tracking political abuse in social media. In: ICWSM, vol 11, pp 297–304
  52. Savage S, Monroy-Hernandez A, Höllerer T (2016) Botivist: calling volunteers to action using online bots. In: 19th CSCW
  53. Shirky C (2011) The political power of social media: technology, the public sphere, and political change. Foreign Aff 90:28–41
  54. Shorey S, Howard PN (2016) Automation, algorithms, and politics: a research review. Int J Commun 10:5032–5055
  55. Stella M, Ferrara E, De Domenico M (2018) Bots increase exposure to negative and inflammatory content in online social systems. Proc Natl Acad Sci 115(49):12435–12440
  56. Subrahmanian V, Azaria A, Durst S, Kagan V, Galstyan A, Lerman K, Zhu L, Ferrara E, Flammini A, Menczer F (2016) The DARPA Twitter bot challenge. Computer 49(6):38–46
  57. Tufekci Z (2014) Big questions for social media big data: representativeness, validity and other methodological pitfalls. In: ICWSM
  58. Tufekci Z, Wilson C (2012) Social media and the decision to participate in political protest: observations from Tahrir Square. J Commun 62(2):363–379
  59. Varol O, Ferrara E, Ogan CL, Menczer F, Flammini A (2014) Evolution of online user behavior during a social upheaval. In: Proceedings of the 2014 ACM conference on web science
  60. Varol O, Ferrara E, Davis C, Menczer F, Flammini A (2017a) Online human–bot interactions: detection, estimation, and characterization. In: ICWSM, pp 280–289
  61. Varol O, Ferrara E, Menczer F, Flammini A (2017b) Early detection of promoted campaigns on social media. EPJ Data Sci 6(13):13
  62. Warriner AB, Kuperman V, Brysbaert M (2013) Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav Res Methods 45(4):1191–1207
  63. Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, pp 347–354
  64. Woolley SC, Howard PN (2016) Automation, algorithms, and politics: introduction. Int J Commun 10:9

Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  • Adam Badawy (1), email author
  • Aseel Addawood (2)
  • Kristina Lerman (1)
  • Emilio Ferrara (1)

  1. USC Information Sciences Institute, Los Angeles, USA
  2. University of Illinois at Urbana-Champaign, Champaign, USA
