1 Introduction

Online social networks in which users can be both consumers and producers of content, such as Twitter or Facebook, provide means to exchange information in an almost instantaneous, inexpensive, and unmediated way, forming a substrate for the spread of information with unprecedented capabilities. These new channels of communication have enormously altered the way in which we make decisions, form political opinions, take sides on different issues, or choose between different technological options [1]. Such online communication networks are orders of magnitude larger than those classically available in social sciences [2], making it possible to perform measurements and experiments that have led to the definition of a new computational social science [3].

One of the characteristic features of online communication networks is their marked degree of homophily. That is, individuals prefer to interact with others similar to them, or who share the same views and orientations [4,5,6]. Homophily leads to a natural polarization of societies into groups with different perspectives, which leave fingerprints in the online realm. These digital traces provide researchers with large-scale data sets for the study of polarization in different contexts, such as the US and French presidential elections [7], secular vs. Islamist discussions during the 2011 Egyptian revolution [8, 9], or the 15M movement of 2011 in Spain [10]. Political orientation, in particular, has been shown to drive the segregation of online communication networks into separated communities [11, 12]. The presence of these clusters, formed by users with homogeneous content production and diffusion, has been named echo chambers [13], referring to the situation in which one’s beliefs are reinforced due to repeated interactions with individuals sharing the same points of view [14]. Echo chambers have been shown to pervade the offline realm [15], and to be related to the spreading of misinformation [16, 17] and the development of ideological radicalism [18]. Recent studies, however, have challenged the impact of echo chambers and partisan segregation in communication networks over online social media [19, 20].

This novel debate calls for a quantitative analysis aimed at identifying the impact of polarization on the diffusion of information. In this paper, we contribute to this endeavor by quantifying the political leaning of users in the description of the associated echo chambers, and the effects on information spreading processes running on top of them. To this aim, we reconstruct a political communication (PC) network, in which individuals exchange messages related to the impeachment process of the former Brazilian President Dilma Rousseff, over the social microblogging platform Twitter. We collected over 12 million tweets from half a million users, in a time window of 9 months, covering the main events related to the impeachment process and related street protests. The political orientation of users was inferred by means of a hand-tagged analysis of the hashtags adopted in the messages, which were assigned anti-impeachment, pro-impeachment, or neutral leanings.

The topological analysis of the resulting static PC network reveals clusters of individuals sharing similar opinions, defining the presence of echo chambers. We gauge the impact of these echo chambers over information spreading by means of simple spreading models, characterizing the efficiency of single users to disseminate information, or spreading capacity. Differently from previous studies, we characterize information diffusion by taking into account the full temporal evolution of the social interactions, represented in terms of a temporal network [21, 22]. This ensures that the spreading process respects the communication dynamics. Our analysis shows that the spreading capacity of users is strongly correlated with their political orientation: information sent by pro-impeachment individuals spreads throughout the network much better than messages sent by other users. Furthermore, by analyzing the composition of the audience reached, we discover that users with larger spreading capacity are able to reach individuals with more diverse leanings, actually escaping their echo chamber.

2 Polarization in political communication networks

In Twitter, users post real-time short messages (tweets), sometimes annotated with hashtags indicating the topic of the message, that are broadcast to the community of their followers. A user can also transmit (retweet) messages from other users, forwarding them to his/her own followers, as a way to endorse their content. Analyses of retweets (RTs) have been used to study viral propagation of information in several contexts [23,24,25]. However, RTs do not involve an explicit effort of content production and do not convey a specific communication target. For this reason, here we discard RTs from our analysis and focus on tweets that include an explicit mention of another user, with the purpose of establishing or continuing a discussion on some topic, carrying even personal messages [26]. This choice allows us to single out only actual social interactions between users, and to reconstruct a communication network in which people actually exchange information, discuss, and form their opinions, reacting in real time to ongoing political events.

As an example of strongly polarized political discussion, we focus on the debate ensuing the impeachment process of the former Brazilian President Dilma Rousseff, taking place during 2016. Tweets related to the impeachment process were gathered by setting a specific filter for public tweets containing selected keywords. The keyword list was kept up to date as new trending topics continuously appeared on Twitter, see Supplementary Information (SI) (Additional file 1). Furthermore, the full dynamics of social interactions was taken into account by including the real timing of tweets in a temporal network representation [22]. This ensures that information diffusion over the resulting temporal PC network follows time-respecting paths, which are expected to have an effect in slowing down or speeding up the spreading dynamics [27]. From this temporal network representation, a static aggregated, directed, weighted network [28] was constructed, in which a directed link from node i to node j indicates a message sent from user i to user j. The associated weight \(W_{ij}\) represents the number of tweets from i to j. Note that, while we keep the temporal evolution of the social interactions when addressing information spreading dynamics, we measure polarization over the aggregated, static network representation.
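The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the code used for the paper: the event format `(time, sender, receiver)` and the function names `aggregate` and `average_out_degree` are assumptions introduced here.

```python
from collections import Counter


def aggregate(events):
    """Collapse a temporal network into a static, directed, weighted one.

    events: iterable of (time, sender, receiver) tuples (assumed format).
    Returns a Counter mapping each directed link (i, j) to W_ij,
    the number of tweets sent from i to j.
    """
    weights = Counter()
    for _time, sender, receiver in events:
        weights[(sender, receiver)] += 1
    return weights


def average_out_degree(weights):
    """Average out-degree: number of distinct directed links per node."""
    nodes = {user for link in weights for user in link}
    return len(weights) / len(nodes)


# Toy temporal sequence: "a" mentions "b" twice, "b" mentions "a" and "c" once.
events = [(1, "a", "b"), (5, "a", "b"), (7, "b", "a"), (9, "b", "c")]
W = aggregate(events)
k_out = average_out_degree(W)
```

The time stamps are dropped only in the aggregated view; the original event list is what a time-respecting spreading simulation would still use.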

Twitter is known to be populated by social bots, which contribute to the spreading of misinformation and poison political debate [29]. Recent studies revealed that while bots tend to interact with humans, e.g. by targeting influential users, the opposite behavior, interactions from humans toward bots, is far less frequent [30, 31]. Thus, once the aggregated network was reconstructed, we extracted its largest strongly connected component (SCC) [28] to possibly discard social bots and ensure that only real social interactions between users are considered. Our analysis is restricted only to the set of individuals composing this SCC. This choice comes at the cost of greatly reducing the network size (almost by 90%), but it ensures that each user can be both source and destination of information content. In this way, information transfer is in principle possible between any pair of users, and it is possible to single out the impact of the network’s dynamics. In Table 1 we present a summary of the main topological properties of the PC network and its SCC. See Methods and SI for a detailed explanation of the data set collection.
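To illustrate why the SCC filters out accounts that only send or only receive mentions, the following sketch extracts the largest strongly connected component with an iterative version of Kosaraju's algorithm. The choice of algorithm and the function name `largest_scc` are assumptions made for this example; the text does not state which implementation was used.

```python
from collections import defaultdict


def largest_scc(edges):
    """Largest strongly connected component of a directed graph.

    edges: iterable of directed links (i, j). Returns a set of nodes.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for i, j in edges:
        succ[i].append(j)
        pred[j].append(i)
        nodes.update((i, j))

    # Pass 1: depth-first search on the original graph, recording
    # nodes in order of completion (iterative to avoid recursion limits).
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in decreasing completion order;
    # each exploration collects exactly one strongly connected component.
    best, assigned = set(), set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    component.add(prv)
                    stack.append(prv)
        if len(component) > len(best):
            best = component
    return best


# "bot" only sends and "d" only receives, so both fall outside the SCC.
links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("bot", "a")]
core = largest_scc(links)
```

In this toy graph a node that mentions others but is never mentioned back (like the hypothetical `"bot"`) cannot belong to the component, which is the property the filtering relies on.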

Table 1 Main properties of aggregated PC network and its largest strongly connected component (SCC): number of users N, with overall positive (negative) values of political position \(N_{+}\) (\(N_{-}\)), total number of interactions W, and average out-degree \(\langle k _{\textrm{out}} \rangle\). See Table S8 in SI for the PC network obtained from a different hashtag classification

Tweets can carry different political leanings, which can be characterized by the hashtags used. We assign to each tweet t a leaning, \(l_{t} \in \{-1, 0, +1\}\), corresponding to a pro-impeachment, neutral, or anti-impeachment leaning, respectively, where a neutral leaning means that the hashtag can be used in either of the other two contexts. For a given user i, who has sent a number \(a_{i}\) of tweets (defined as his/her activity), we can associate a time-ordered set of leanings \(\mathcal{L}_{i} = \{ l_{1}, l_{2}, \ldots, l_{a _{i}-1}, l_{a_{i}} \}\), and define his/her average leaning, or political position \(P_{i}\), as

$$ P_{i} \equiv\frac{\sum_{t=1}^{a_{i}} l_{t}}{a_{i}}, $$
(1)

which is bounded in the interval \([-1, +1]\). This definition characterizes a user’s political position as a continuous variable, making it possible to discern different degrees of orientation, in contrast to the more common binary measures. Since this definition crucially depends on the hashtag classification, we checked the robustness of our results by also reconstructing a PC network based on a different classification of neutral hashtags. See Methods and SI for details.
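As a small worked example of definition (1), the sketch below computes the activity and political position of each user from a list of tweet leanings. The input format `(user, leaning)` and the function names are illustrative assumptions.

```python
from collections import defaultdict


def political_position(leanings):
    """Average leaning P_i of a user, following definition (1)."""
    if not leanings:
        raise ValueError("a user needs at least one classified tweet")
    return sum(leanings) / len(leanings)


def positions_by_user(tweets):
    """tweets: iterable of (user, leaning) with leaning in {-1, 0, +1}.

    Returns {user: (activity a_i, position P_i)}.
    """
    per_user = defaultdict(list)
    for user, leaning in tweets:
        per_user[user].append(leaning)
    return {u: (len(ls), political_position(ls)) for u, ls in per_user.items()}


# Sign convention of the text: -1 pro-impeachment, +1 anti-impeachment.
tweets = [("u1", -1), ("u1", -1), ("u1", 0), ("u1", -1), ("u2", +1), ("u2", +1)]
summary = positions_by_user(tweets)
```

Here `u1` sends three pro-impeachment tweets and one neutral tweet, giving \(P = -0.75\), while `u2` has the extreme position \(P = +1\).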

In Fig. 1(a) we plot the distribution of the political position of users, showing that they are clearly split into two groups with opposite orientations, while a few users show a neutral position (\(P \sim0\)). Interestingly, this distribution is strongly asymmetric with respect to \(P=0\): For \(P >0\) the great majority of users have an extreme position \(P \simeq+1\), while for \(P<0\) more users have milder values of P. The numbers of users with overall positive (\(N_{+}\)) and negative (\(N_{-}\)) values of political leaning are, however, similar, see Table 1. The average leaning of a user is inherently correlated with his/her activity. In a scenario in which users send tweets of opposite leanings with the same probability, the political leaning variable would follow a binomial distribution, and the typical magnitude of the average leaning would decrease with activity. Figure 1(b) shows that the correlation between average leaning and activity is far from being driven by a random process: more active users are also more extreme. Interestingly, pro-impeachment users with the largest average activity have \(P \sim-0.75\), while the activity of anti-impeachment users is almost constant for \(0 < P < 0.75\), and reaches a maximum for \(P\sim1\).
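The random scenario mentioned above can be made explicit with a short calculation. As an illustrative assumption (not a model stated in the text), suppose that each of the \(a_{i}\) tweets of user i independently carries leaning \(+1\) or \(-1\) with probability 1/2, ignoring neutral tweets. The sum of leanings is then a shifted binomial variable, and from definition (1)

$$ \langle P_{i} \rangle = 0, \qquad \mathrm{Var}(P_{i}) = \frac{1}{a_{i}^{2}} \sum_{t=1}^{a_{i}} \mathrm{Var}(l_{t}) = \frac{1}{a_{i}}, $$

so that the typical magnitude of the position scales as \(|P_{i}| \sim a_{i}^{-1/2}\). Under random tweeting, more active users would therefore look more neutral, which is the opposite of the trend in Fig. 1(b).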

Figure 1

(a) Number of users as a function of the political position P. (b) Average activity as a function of P. Only users with activity \(a \ge10\) in the SCC are considered for (a) and (b). (c) Visualization of the time-aggregated representation of the PC network, formed by \(N = 31{,}412\) users in the SCC. The size of nodes increases (non-linearly) with their degree. Colors represent political position, as defined by (1), blue for pro-, red for anti-impeachment, and white for neutral average leaning. (d) Community size and average political position of different communities identified by the Louvain algorithm

Figure 1(c) shows a visualization of the time-aggregated PC network, in which users are color-coded according to their average leaning. Two communities with opposite leanings are clearly visible in the network, while users with neutral position are found more frequently bridging the two groups. One can quantify this observation by identifying the community structure [32] as obtained by means of the Louvain algorithm [33]. In Fig. 1(d) we plot the average leaning and size of the different communities found. We show that the PC network is characterized by two large communities, both with approximately \(10^{4}\) users and opposite leaning with similar absolute values, \(P_{+}\approx0.82\) and \(P_{-}\approx-0.70\). However, pro-impeachment users also form other communities of relevant sizes with more moderate average leaning, while users with strong anti-impeachment leanings essentially belong to a single community. See SI for more details.

3 Topological evidence of echo chambers

One can quantify the presence of echo chambers by relating the political position of a user with the leaning of the tweets he/she receives, as well as with the leaning of his/her neighbors. In politics, echo chambers are characterized by users sharing similar opinions and exchanging messages with similar political views [13]. This translates, at the topological level, into a node i with a given political position \(P_{i}\) connected with nodes with a position close to \(P_{i}\), and receiving with higher probability messages with similar average leaning \(P_{i}\). In order to quantify these insights, we define, for each user i, the average position of incoming tweets, \(P_{i}^{\textrm{IN}}\), by applying (1) to the set of tweets from any user \(j \neq i\) mentioning user i. Analogously, the average position of the nearest neighbors, or successors, of user i, \(P_{i}^{\textrm{NN}}\), can be defined as \(P_{i}^{\textrm{NN}} \equiv\sum_{j} A_{ij} P_{j} / k_{\mathrm{out}, i}\), where \(A_{ij}\) is the adjacency matrix of the integrated PC network, \(A_{ij}=1\) if there is a link from node i to node j, \(A_{ij}=0\) otherwise, and \(k_{\mathrm{out}, i} = \sum_{j}A_{ij}\) is the out-degree of node i.
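The two quantities defined above can be computed directly from the list of mentions. The following is a minimal sketch under stated assumptions: tweets are given as `(sender, receiver, leaning)` triples, the positions \(P_{j}\) are precomputed with (1), and the function names are hypothetical.

```python
from collections import defaultdict


def incoming_position(tweets):
    """P_i^IN: definition (1) applied to tweets from any j != i mentioning i.

    tweets: iterable of (sender, receiver, leaning).
    """
    received = defaultdict(list)
    for sender, receiver, leaning in tweets:
        if sender != receiver:
            received[receiver].append(leaning)
    return {user: sum(ls) / len(ls) for user, ls in received.items()}


def neighbor_position(tweets, position):
    """P_i^NN: unweighted mean position of the successors of i.

    Each successor j counts once (A_ij = 1), however many tweets i sent to j.
    """
    successors = defaultdict(set)
    for sender, receiver, _leaning in tweets:
        if sender != receiver:
            successors[sender].add(receiver)
    return {
        user: sum(position[j] for j in nbrs) / len(nbrs)
        for user, nbrs in successors.items()
    }


tweets = [
    ("a", "b", -1), ("a", "b", -1), ("a", "c", 0),
    ("b", "a", -1), ("c", "a", +1),
]
# Positions from definition (1): a sent (-1, -1, 0), b sent (-1), c sent (+1).
position = {"a": -2 / 3, "b": -1.0, "c": 1.0}
p_in = incoming_position(tweets)
p_nn = neighbor_position(tweets, position)
```

In this toy example user `a` talks to one neighbor on each side, so both \(P^{\textrm{IN}}\) and \(P^{\textrm{NN}}\) vanish even though `a` leans pro-impeachment, which is the kind of user that would fall off the diagonal in a \((P, P^{\textrm{NN}})\) map.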

Figure 2 shows the correlation between the political position of a user i and (a) the position of his/her nearest neighbors, \(P_{i}^{\textrm{NN}}\), and (b) the average leaning of received tweets, \(P_{i}^{\textrm{IN}}\). Both plots are color-coded contour maps, representing the number of users in the phase space \((P,P^{\textrm{NN}})\) or \((P,P^{\textrm{IN}})\): the lighter the area in the map, the larger the density of users in that area. Figure 2 shows a strong correlation between the position of a user and the average position of both his/her nearest neighbors and the received tweets. Similar results are found for Fig. 2(a) when considering predecessors as nearest neighbors, see Fig. S11 in the SI. The Pearson correlation coefficient is \(r=0.89\) for \((P,P^{\textrm{NN}})\) and \(r=0.80\) for \((P,P^{\textrm{IN}})\), both statistically significant with a p-value \(p < 10^{-6}\). These topological properties of the PC network confirm the presence of echo chambers: users expressing both pro- and anti-impeachment leaning are more likely to send/receive messages to/from users that share their political opinion.

Figure 2

Contour maps for the (a) average political position of the nearest neighbors, \(P^{\text{NN}}\), and (b) average leaning of tweets received, \(P^{\text{IN}}\), against the average political position of a user P. Colors represent the density of users: the lighter the larger the number of users. The probability distributions of P, \(P^{\text{NN}}\), and \(P^{\text{IN}}\) are plotted along the axes. Only users with activity \(a\ge10\) (corresponding to 14,813 users) are considered

Figure 2, however, also reveals that the densities in both plots are not symmetric between anti- and pro-impeachment positions: for \(P > 0\), most users are concentrated in a small region of the \((P,P^{\textrm{NN}})\) and \((P,P^{\textrm{IN}})\) spaces, while for \(P<0\), they spread on a larger area. This means that users with extreme position \(P \simeq1\) are more likely to interact only with users that share the same extreme leaning, while users with \(P<0\) exchange information (send and receive tweets) also with peers that do not share their political opinion. These observations are also in consonance with the characterization of the community structure, as shown in Fig. 1(d), in which users with strong anti-impeachment leanings form a single, large community, and users with pro-impeachment leanings form several more heterogeneous communities.

The differences in the topological structure of the two communities can be related to the political context under study: while users characterized by anti-impeachment leanings refer to a more homogeneous political area (Partido dos Trabalhadores and small left-wing parties), pro-impeachment users share different political views, including center and right-wing positions, and show different levels of sympathy in favor of the impeachment. Another possible and important source of asymmetry is the constant release of content from other sources, in particular from the traditional media, broadcasting mostly content that stimulates pro-impeachment leanings, possibly reinforcing its dissemination to a more diversified audience.

4 Effects of political position on information spreading

The presence of echo chambers implies that users mainly exchange messages with other users sharing similar leanings. This fact can have an impact on the way in which information is transmitted through the whole PC network. A possible empirical way to gauge the effects of echo chambers on information spreading is to consider the number of RTs that a given user can achieve [23,24,25]. One can expect that more influential users, producing content that attracts more interest, will be rewarded by a larger number of RTs. Figure S12 (see SI) shows the number of RTs of a user as a function of both his/her activity and position. One can see that the number of times that a user is retweeted is strongly correlated with the activity of that user. With hindsight, this observation is to be expected, since a user who produces many tweets has a larger chance of being retweeted, under a homogeneous assumption of equal probability of retweeting. However, if we consider the number of RTs normalized by the total number of tweets sent, we observe a lack of evident correlations with users’ political position, as shown in Fig. S12(b) in SI.

In order to better understand the role of the network’s polarization in information propagation, we followed a different approach, by considering simple models of spreading dynamics. We have focused on the susceptible-infected-susceptible (SIS) and susceptible-infected-recovered (SIR) models [34], classical epidemic processes which have also been used to study the diffusion of information [35, 36]. In the SIS model, each agent can be in either of two states, susceptible or infectious. In the SIR model, an agent can also be in a recovered state in which it can neither be infected nor transmit the disease. Susceptible agents may become infectious upon contact with infected neighbors, with a certain transmission rate λ in both processes. Infectious agents can spontaneously heal with rate \(\tau^{-1}\), becoming susceptible again or recovered in SIS and SIR, respectively. Within an information diffusion framework, a susceptible node represents a user who is unaware of the circulating information (e.g. rumors, news, an ongoing street protest), while an infectious user is aware of it and can spread it further to his/her contacts. A recovered agent is aware but not willing to transmit the information.

We ran the SIS and SIR dynamics on the temporal PC network, using the real timing of connections between users as given by the time stamps of interactions, so as to ensure that the information diffusion follows time-respecting paths. In temporal networks characterized by an instantaneous duration of contacts, the infection process can be implemented by considering λ as a transmission probability, i.e. whenever a susceptible node i gets in contact with an infectious node j, node i will become infected with probability λ. The healing occurs spontaneously after a fixed time τ with respect to the moment of infection. We start the dynamics with only one node i infected, and stop it on the last interaction of the temporal sequence. The set of nodes that were infected at least once along the dynamics, started with i as source of infection, forms the set of influence of node i, \(\mathcal{I}_{i}\) [37]. The set of influence of a user thus represents the set of individuals that can be reached by a message sent by him/her, depending on the transmission probability λ and healing time τ.
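The procedure described above can be sketched as a single pass over the time-ordered contacts. This is an illustrative implementation under explicit assumptions that the text does not settle: information is taken to flow from the sender of a tweet to the mentioned user, the seed is taken to be infected at the time of the first contact in the sequence, and the function name `set_of_influence` and the event format are hypothetical.

```python
import random


def set_of_influence(events, seed, lam, tau, model="SIS", rng=None):
    """Nodes infected at least once when the dynamics starts from `seed`.

    events: list of (time, sender, receiver), sorted by time.
    lam: transmission probability per contact; tau: fixed healing time.
    model: "SIS" (healed nodes become susceptible) or "SIR" (they recover).
    """
    if model not in ("SIS", "SIR"):
        raise ValueError("model must be 'SIS' or 'SIR'")
    rng = rng or random.Random()
    reached = {seed}
    if not events:
        return reached
    infected_at = {seed: events[0][0]}  # assumption: seed infected at first contact
    recovered = set()
    for time, sender, receiver in events:
        # Lazy healing: a node's state only matters when it takes part in a contact.
        for node in (sender, receiver):
            if node in infected_at and time - infected_at[node] >= tau:
                del infected_at[node]
                if model == "SIR":
                    recovered.add(node)
        susceptible = receiver not in infected_at and receiver not in recovered
        # Assumption: transmission follows the direction sender -> receiver.
        if sender in infected_at and susceptible and rng.random() < lam:
            infected_at[receiver] = time
            reached.add(receiver)
    return reached


contacts = [(0, "a", "b"), (4, "b", "c"), (7, "c", "b"), (8, "b", "d")]
sis = set_of_influence(contacts, "a", lam=1.0, tau=5, model="SIS")
sir = set_of_influence(contacts, "a", lam=1.0, tau=5, model="SIR")
```

With \(\lambda = 1\) the toy run is deterministic: in the SIS case user `b` heals, is reinfected by `c` at time 7 and passes the message on to `d`, whereas in the SIR case `b` has recovered and `d` is never reached. Because contacts are processed in temporal order, a message can only travel along time-respecting paths.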

For different values of λ and τ, we measure the spreading capacity \(S_{i}\) of each user i, defined as the relative size of his/her set of influence, namely

$$ S_{i}(\lambda, \tau) \equiv \frac{ \vert \mathcal{I}_{i}(\lambda, \tau) \vert }{N}, $$
(2)

by running a SIS or SIR dynamics with node i as seed of the infection, averaged over several runs. In Fig. 3 we plot the average spreading capacity \(\langle S \rangle\) of users as a function of their political position P and activity a for the SIS model. As expected, the more active are the users, the larger their spreading capacity (darker colors of the plots). However, one can see that \(\langle S \rangle\) is not constant with respect to the users’ political orientation: the spreading capacity is clearly smaller for users with anti-impeachment leaning, while it is larger for users with \(P<0\), reaching a maximum for \(P \sim-0.5\). Different values of λ and τ for the SIS and SIR model (available in the SI) show similar behavior.
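Definition (2) and the averaging over users with similar position can be illustrated as follows. The sketch assumes that the sets of influence of each run are already available, and the helper names, the number of bins, and the toy values are all hypothetical.

```python
from collections import defaultdict


def spreading_capacity(influence_sets, n_users):
    """S_i from definition (2), averaged over independent runs.

    influence_sets: one set of influence per run, all seeded at the same user.
    n_users: total number of users N in the network.
    """
    sizes = [len(s) for s in influence_sets]
    return sum(sizes) / (len(sizes) * n_users)


def average_by_position(position, capacity, n_bins=8):
    """Average capacity of users grouped in equal-width bins of P in [-1, 1].

    Returns {bin center: mean capacity}, i.e. a discretized <S(P)> curve.
    """
    width = 2.0 / n_bins
    grouped = defaultdict(list)
    for user, p in position.items():
        index = min(int((p + 1.0) / width), n_bins - 1)  # P = +1 goes in last bin
        grouped[index].append(capacity[user])
    return {
        -1.0 + (k + 0.5) * width: sum(values) / len(values)
        for k, values in sorted(grouped.items())
    }


# Two runs seeded at the same user, in a network of N = 10 users.
runs = [{"a", "b"}, {"a", "b", "c", "d"}]
s_a = spreading_capacity(runs, n_users=10)

position = {"u": -1.0, "v": -0.9, "w": 1.0}
capacity = {"u": 0.2, "v": 0.4, "w": 0.1}
curve = average_by_position(position, capacity, n_bins=4)
```

Restricting `position` to users within a fixed activity range before binning would mirror the way activity is controlled for when comparing users with different positions.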

Figure 3

Heat map of the average spreading capacity \(\langle S \rangle\) of users, as a function of their political position P and activity a. The transmission probability of the SIS dynamics is \(\lambda= 0.5\) and \(\tau= 7\) days. Averages were performed over 100 runs

In order to disentangle the effect of the political position of users on spreading capacity from their activity, in Fig. 4 we plot the average spreading capacity of users as a function of their position, \(\langle S(P) \rangle\), for \(\lambda=0.2\) and \(\tau=7\) days. Other values are shown in the SI. Only users with activity bounded in the interval \(a \in[10,100]\) are considered, so as to ensure that the average activity of these users is relatively homogeneous with respect to their political position (as shown in the SI). Figure 4 shows that the average spreading capacity reaches a maximum for users with intermediate pro-impeachment position, \(P \simeq-0.5\), maximum that is up to four times larger than the value for anti-impeachment users. This striking difference is robust with respect to the values of the transmission probability λ and healing time τ. As shown in the SI, the shapes of the \(\langle S(P) \rangle\) curves are remarkably similar, even though significantly different values are reached. Analogous behavior is observed for the SIR model (see SI).

Figure 4

Average spreading capacity \(\langle S (P) \rangle \) (black curve, left axes) of users with political position P. Average diversity \(\langle\sigma(P) \rangle\) (red curve, right axes) and average political position \(\langle\mu(P) \rangle\) (bars, top panel) of the set of influence reached by users with position P. Transmission probability \(\lambda= 0.20\) and \(\tau= 7\) days. Only the 11,386 users with activity \(a \in[10,100]\) are considered. Different ranges of a and values of λ are shown in the SI. Results are averaged over 100 runs, error bars represent standard errors.

5 Diversity increases spreading capacity

The origin of the large spreading capacity of users with pro-impeachment position cannot be traced back to their numeric prevalence in the network, since users are split into two groups of similar size; see Table S8 in SI. Moreover, the great majority of users are characterized by an extreme position, \(|P| \simeq1\), yet they show a much smaller spreading capacity than users with intermediate pro-impeachment position, \(P \simeq-0.5\). One way to understand this difference is to look at the characteristics of the users reached by the spreading dynamics. One can analyze the political position of the set of influence \(\mathcal{I}_{i}\), by defining, for each user i, the average \(\mu_{i}\) and the variance \(\sigma_{i}\) of the political positions expressed by \(\mathcal{I}_{i}\), as

$$ \mu_{i}\equiv\sum_{j \in\mathcal{I}_{i}} \frac{ P_{j}}{ \vert \mathcal{I} _{i} \vert }, \qquad \sigma_{i} \equiv\sum _{j \in\mathcal{I}_{i}} \frac{(P_{j} - \mu_{i})^{2}}{ \vert \mathcal{I}_{i} \vert }. $$
(3)

The average \(\mu_{i}\) represents the average leaning of the users reached by i, while the variance \(\sigma_{i}\) represents how heterogeneously oriented \(\mathcal{I}_{i}\) is. A small variance \(\sigma_{i}\) indicates that the political position of \(\mathcal{I} _{i}\) is quite uniform and close to its average value, while a large value of \(\sigma_{i}\) shows that \(\mathcal{I}_{i}\) has heterogeneous political positions. Therefore, the variance \(\sigma _{i}\) quantifies the diversity of the users reached by i.
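A direct transcription of definition (3) makes the meaning of the two quantities concrete. The function name and the toy audiences below are assumptions made for illustration.

```python
def audience_mean_and_diversity(influence, position):
    """Mean mu_i and variance sigma_i of the positions in a set of influence.

    influence: set of users reached from user i (the set I_i).
    position: dict mapping every user to his/her political position P.
    """
    values = [position[j] for j in influence]
    mu = sum(values) / len(values)
    sigma = sum((p - mu) ** 2 for p in values) / len(values)
    return mu, sigma


position = {"a": -1.0, "b": -1.0, "c": -1.0, "d": 1.0}

# A homogeneous audience: every user reached shares the same leaning.
mu_echo, sigma_echo = audience_mean_and_diversity({"a", "b", "c"}, position)

# A mixed audience: one user from each side.
mu_mixed, sigma_mixed = audience_mean_and_diversity({"a", "d"}, position)
```

The homogeneous audience gives \(\mu = -1\) and \(\sigma = 0\), the signature of a message confined to an echo chamber, while the mixed audience gives \(\mu = 0\) and the largest possible variance, \(\sigma = 1\).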

In Fig. 4 (top panel) we plot the average political position \(\langle\mu(P) \rangle\) of the set of influence reached by users with position P, showing that users with pro-impeachment (neutral, anti-impeachment) leanings are more likely to reach, on average, users sharing the same pro-impeachment (neutral, anti-impeachment) leanings. This result (robust across different values of λ and τ, as shown in the SI) indicates that, given the strongly polarized structure of the network, information diffusion is biased toward individuals that share the same political opinion, quantifying the effect of echo chambers. The average \(\langle\mu(P) \rangle\), indeed, gauges the strength of the echo chambers: the closer \(\langle\mu(P) \rangle\) is to P, the stronger the echo chamber effect. Furthermore, one can note differences between pro- and anti-impeachment users: μ is almost constant for negative values of P, so echo chamber effects are weak, while μ grows almost linearly for positive P, indicating stronger echo chamber effects.

Even more interesting, Fig. 4 shows that the diversity \(\sigma_{i}\) of the users reached by i strongly depends on his/her political position \(P_{i}\). The curve of the average diversity as a function of the political position, \(\langle\sigma(P) \rangle\), follows a behavior remarkably similar to the average spreading capacity of users with position P, \(\langle S(P) \rangle\). The strict correlation observed between \(\langle\sigma(P) \rangle\) and \(\langle S(P) \rangle\) indicates that if a user is able to reach a diverse audience, formed by users that do not share his/her average leaning, then the size of his/her set of influence is much larger. That is, individuals with large spreading capacity are able to break their echo chambers. Note that this result is not trivial since the sizes of the echo chambers are much bigger than the number of users reached. Moreover, the value of \(\langle\sigma(P) \rangle\) is statistically significant and does not depend on the number of users considered in the average. For instance, there are many more users with extreme orientations (\(|P| \simeq1\)) than users with intermediate position (\(P \simeq-0.5\)), yet it holds that \(\langle\sigma(P \simeq-0.5 ) \rangle\gg\langle\sigma(|P| \simeq1) \rangle\). Furthermore, given the larger number of users considered, error bars for \(\langle\sigma(|P| \simeq1) \rangle\) are smaller than the ones for \(\langle\sigma(P \simeq-0.5 ) \rangle\).

6 Discussion

The effects of echo chambers on the openness of online political debate have been debated by the scientific community. Recently, it has been shown that echo chambers are expected to enhance the spreading of information in synthetic networks [38]. Their impact in real communication networks, however, remains poorly understood. The main contribution of this paper is twofold: (i) we quantify the presence of echo chambers in the Twitter discussion about the impeachment of the former Brazilian President Dilma Rousseff, showing that communities of users expressing opposite political positions emerge in the topological structure of the communication network, and (ii) we gauge the effects of such echo chambers on information spreading, showing that the capability of users to spread the content they produce depends on their political leaning. The use of spreading models allows us to characterize the internal structure of echo chambers, showing that users belonging to the same echo chamber, with different convictions (i.e., the intensity of their leaning on the impeachment issue), can have quite different spreading capabilities.

Our method to quantify echo chambers is built upon two main ingredients: (i) we reconstruct a communication network based on mentions, in which people can actually discuss and exchange information related to ongoing political events, and (ii) we define a continuous political leaning measure, by classifying the hashtags used in tweets as expressing a leaning in favor of or against the impeachment, which is independent of the network’s reconstruction. We then observe that anti- and pro-impeachment users clearly separate into different communities in the PC network. It is important to remark that, while it is well known that networks formed by RTs can have a strong partisan structure, since RTs generally imply endorsement, this observation is new for mention networks, in which users characterized by opposite leanings can easily interact [11].

These two clusters of users sharing similar opinions, or echo chambers, can be characterized by looking at the correlations between the in-flow and out-flow of leanings, as well as between the average leanings of an individual and his/her nearest neighbors. The topologies of the two echo chambers are not exactly equivalent. Users expressing anti-impeachment leanings tend to lean towards the extreme, achieving a position \(P \simeq+1\), while users with pro-impeachment leanings show milder tendencies, reflected into the presence of medium-sized communities with overall negative political position.

We have gauged the effects of echo chambers on information diffusion by running simple models of information spreading, observing that, on average, users are more likely to receive information from peers sharing the same average leanings. We then see that people with predominantly pro-impeachment leanings are able to broadcast their message to a potentially larger audience than other users. Furthermore, such audiences are characterized by a greater diversity of opinions, indicating that pro-impeachment leanings can spread to both pro- and anti-impeachment users, a signature that echo chambers can be broken. At the same time, our method is able to identify those users trapped in the echo chamber, who can be reached only by information shared by peers with similar political leaning, a feature that could be exploited to mitigate echo chamber effects in online social media.

An interesting question arising here is what makes pro-impeachment users better spreaders than users with an opposite view. Recent works [39, 40] have related spreading efficiency of users with their topological position in the integrated network, in particular with the degree and centrality of individuals as measured by the k-core index [41]. In Fig. S17 of the SI we show that the average position P of users is quite uncorrelated with both their average degree k and k-core index, indicating that users characterized by pro-impeachment leaning cannot be singled out by such simple topological features. However, topological differences between the two echo chambers can be seen in Fig. 1(d). While anti-impeachment users are clustered in a single, large community with strong average anti-impeachment leaning, pro-impeachment users are distributed over several smaller communities characterized by less extreme leaning.

It is important to highlight that our method for quantifying echo chamber effects by using epidemic processes has some limitations. A first issue is that only very large communication networks can be analyzed, due to the extraction of the strongly connected component, which greatly reduces the number of nodes. However, this procedure is essential to properly address the communication dynamics between users, and possibly avoid the presence of social bots. Furthermore, our definition of political position relies entirely on the hand-tagged hashtag classification. It is well known that hashtags can be hijacked [42], i.e. they can be used by some users with a different (or opposite) purpose than the one originally intended, thus invalidating the leaning inferred through them. However, our analysis is based on a large number of hashtags, and it is robust with respect to a significant change of the leaning classification; see results for the additional classification in the SI.

Future research on this topic should address three main points. Firstly, more sophisticated methods for detecting the political position of users, such as automatic sentiment analysis of tweet contents, could be considered. These methods are, however, not exempt from limitations [43, 44]. Secondly, more realistic models of information diffusion, such as complex contagions, independent cascade and linear threshold models [45,46,47,48], could be used to estimate individuals' spreading capacity. We have checked numerically that a modification of the classic Watts threshold model for complex contagion [45], adapted to the framework of temporal networks [49], leads to the same behavior observed in the SIR and SIS models; see Fig. S18 of the SI. Therefore, while we do not expect our results to depend qualitatively on the details of the propagation dynamics considered, interesting features may be added, such as a transmission probability that depends on the similarity between opinions. It would also be interesting to measure the evolution of users' political positions in time, as they are not expected to be constant over the whole temporal sequence. Finally, given that our conclusions are based on a single case study, it would be interesting to replicate our method on different data sets, to identify in a quantitative way the presence of echo chambers across differently polarized political contexts and over different social media.

7 Methods

Here we describe the empirical data used in the paper, available upon motivated request to the authors, how we reconstruct the network from it, and the results of the hashtag classification. For further details, see the SI.

7.1 Reconstruction of the PC networks

Our data set is composed of tweets collected daily from the public streaming of the Twitter API by specifying a list of 323 keywords (see Table S2 of the SI) related to the impeachment process of the former president of Brazil, Dilma Rousseff. Data were gathered between March 5th and December 31st of 2016. Only tweets including mentions of other users and at least one of the classified hashtags (see next Section) have been selected, while retweets have been discarded. Tweets containing hashtags of opposite leanings (\(l_{t}=+1\) and \(l_{t}=-1\)) account for less than 1% of the total and have been discarded. The timing of the interactions has been preserved, so that in the temporal PC network a directed link from node i to node j at time t is drawn if user i sends a tweet mentioning j at time t. Finally, the strongly connected component (SCC) of the time-aggregated version of the PC network has been extracted.
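The reconstruction steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline actually used for the paper: the tweet record fields (`user`, `mentions`, `hashtags`, `time`, `is_retweet`), the function names, and the choice of keeping the largest strongly connected component are all hypothetical.

```python
from collections import defaultdict

def build_temporal_edges(tweets, hashtag_leaning):
    """Return time-ordered (i, j, t) links: user i mentions user j at time t.

    `tweets` is a list of dicts with hypothetical keys (user, mentions,
    hashtags, time, is_retweet); `hashtag_leaning` maps a classified
    hashtag to its leaning (+1, 0 or -1).
    """
    edges = []
    for tw in tweets:
        if tw["is_retweet"] or not tw["mentions"]:
            continue  # keep only original tweets that mention someone
        leanings = {hashtag_leaning[h] for h in tw["hashtags"] if h in hashtag_leaning}
        if not leanings:
            continue  # no classified hashtag
        if +1 in leanings and -1 in leanings:
            continue  # hashtags of opposite leanings: discard
        edges.extend((tw["user"], target, tw["time"]) for target in tw["mentions"])
    return sorted(edges, key=lambda e: e[2])

def largest_scc(edges):
    """Largest strongly connected component of the time-aggregated graph
    (Kosaraju's algorithm, written iteratively to avoid deep recursion)."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for i, j, _ in edges:
        succ[i].add(j)
        pred[j].add(i)
        nodes.update((i, j))
    # First pass: record nodes in order of DFS completion on the graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                order.append(node)  # all successors explored
                stack.pop()
    # Second pass: explore the reversed graph in reverse completion order.
    best, assigned = set(), set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, todo = {start}, [start]
        assigned.add(start)
        while todo:
            node = todo.pop()
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    component.add(prv)
                    todo.append(prv)
        if len(component) > len(best):
            best = component
    return best

def restrict_to_nodes(edges, nodes):
    """Keep only the temporal links whose endpoints both lie in `nodes`."""
    return [(i, j, t) for i, j, t in edges if i in nodes and j in nodes]
```

Calling `restrict_to_nodes(edges, largest_scc(edges))` then yields the temporal links among users of the component. Aggregating over time before extracting the component, as the text describes, means that two users count as mutually reachable even when the links connecting them are not time-respecting.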

7.2 Hashtag classification

A list of the 495 most tweeted hashtags from the collected data has been classified through a manual annotation of the leanings (anti-impeachment, pro-impeachment, neutral, or not related to the issue) by four independent volunteers. Through an interactive webpage, the volunteers had the opportunity to browse Twitter to check tweets containing the selected hashtag within the time window of interest. The final classification of each hashtag has been determined by the majority (at least 3 of 4) of the opinions of the volunteers. A total of 321 (64.8%) hashtags had full agreement, while for 443 (89.5%) of them at least 3 of 4 volunteers agreed. A majority agreement was not reached for 52 (10.5%) hashtags, which have been excluded from the data set. Discrepancies between any pair of volunteers were less than 10%. A final list of 404 hashtags (see Tables S3 to S6 in the SI for the final classification) has been used to reconstruct the PC network. The numbers of interactions in the SCC with at least one of the classified hashtags were \(1{,}126{,}150\) for pro-impeachment, \(144{,}405\) for neutral, and \(756{,}498\) for anti-impeachment leanings, showing a slight tendency toward pro-impeachment hashtags. Fig. S4 in the SI shows the usage count of the 100 most popular hashtags.
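The majority rule used for the annotation can be illustrated with a short Python sketch. The label strings, function names, and the layout of the `annotations` dictionary are assumptions made for illustration only; the only element taken from the text is the requirement that at least 3 of the 4 volunteers agree.

```python
from collections import Counter

def majority_label(votes, min_agree=3):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None

def agreement_summary(annotations, min_agree=3):
    """Summarize agreement over a dict {hashtag: [label per annotator]}.

    Returns the counts of hashtags with full agreement, with a majority
    label, and without one, together with the retained classification.
    """
    full = sum(1 for votes in annotations.values() if len(set(votes)) == 1)
    kept = {}
    for hashtag, votes in annotations.items():
        label = majority_label(votes, min_agree)
        if label is not None:
            kept[hashtag] = label
    summary = {
        "total": len(annotations),
        "full_agreement": full,
        "majority": len(kept),
        "excluded": len(annotations) - len(kept),
    }
    return summary, kept

# Hypothetical annotations from four volunteers (labels are illustrative).
annotations = {
    "#tag1": ["pro", "pro", "pro", "pro"],
    "#tag2": ["anti", "anti", "anti", "neutral"],
    "#tag3": ["pro", "pro", "anti", "neutral"],
}
summary, kept = agreement_summary(annotations)
```

With four annotators and a threshold of three, at most one label can reach the majority, so ties need no special handling. The percentages reported in the text are consistent with the counts: 321/495, 443/495 and 52/495 round to 64.8%, 89.5% and 10.5%, respectively.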