The drivers of global news spreading patterns

Alipour, Shayan; Di Marco, Niccolò; Avalle, Michele; Etta, Gabriele; Cinelli, Matteo; Quattrociocchi, Walter

doi:10.1038/s41598-024-52076-6

The drivers of global news spreading patterns

Article
Open access
Published: 17 January 2024

Volume 14, article number 1519, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

The drivers of global news spreading patterns

Download PDF

Shayan Alipour¹,
Niccolò Di Marco²,
Michele Avalle¹,
Gabriele Etta¹,
Matteo Cinelli¹ &
…
Walter Quattrociocchi¹

2079 Accesses
8 Altmetric
Explore all metrics

Abstract

The web radically changed the dissemination of information and the global spread of news. In this study, we aim to reconstruct the connectivity patterns within nations shaping news propagation globally in 2022. We do this by analyzing a dataset of unprecedented size, containing 140 million news articles from 183 countries and related to 37,802 domains in the GDELT database. Unlike previous research, we focus on the sequential mention of events across various countries, thus incorporating a temporal dimension into the analysis of news dissemination networks. Our results show a significant imbalance in online news spreading. We identify news superspreaders forming a tightly interconnected rich club, exerting significant influence on the global news agenda. To further investigate the mechanisms underlying news dissemination and the shaping of global public opinion, we model countries’ interactions using a gravity model, incorporating economic, geographical, and cultural factors. Consistent with previous studies, we find that countries’ GDP is one of the main drivers to shape the worldwide news agenda.

Strength of Nations: A Case Study on Estimating the Influence of Leading Countries Using Social Media Analysis

Network News

Modeling Inter-country Connection from Geotagged News Reports: A Time-Series Analysis

Introduction

The rapid evolution of the web has profoundly transformed the global dissemination of news. Nowadays, news flows incessantly, with media outlets promptly reporting unfolding events, and anyone can access this real-time information ecosystem. Furthermore, the Internet has intensified competition among news providers^1,2 as they strive to capture users’ attention, contending with both traditional media and online news sources³. Advancements in data science have unlocked new possibilities for analyzing vast amounts of news content^4,5 with extensive research efforts devoted to studying the impact of social media platforms on news dissemination^6,7,8. A particular emphasis has been posed on their influence on misinformation^9,10 and polarization^11,12,13. In this dynamic environment, news spreads rapidly across the web, reaching numerous outlets in different countries and potentially shaping the news agendas of other nations^14,15. Nevertheless, the reach and influence of the global news network vary significantly from one country to another. Factors such as culture, government censorship, language barriers, and digital division can impact news exchange between nations¹⁶. Therefore, gaining a comprehensive understanding of how news circulates on the Internet yields valuable insights into the economic and cultural influences among countries. It allows for characterizing news spreading routes, providing insights into the interconnected relationships and interdependence between different nations. The study of news spreading and the factors that influence the patterns of information and citation flow between countries have been topics of interest for several decades^17,18,19,20. However, the dynamics of news dissemination across countries have evolved over time, especially with the rise of online media sources. In this landscape, some works emphasize the need for more comprehensive studies to capture these evolving trends²¹. Previous studies^22,23 have shed light on the key factors influencing news dissemination, including economic strength, geographic location, and cultural elements. These factors are pivotal in shaping the pathways through which news flows across countries. Additionally, some other research^{24,25,26,27,28} have revealed hierarchical structures and clustering patterns that are driven by economic growth, language, and political freedoms. Such findings emphasize the complex dynamics in the global news network, where certain countries or regions may have stronger connections due to shared economic interests, linguistic affinities, or similar political environments.

However, previous research on news dissemination patterns faced limitations due to the challenges of obtaining extensive and diverse datasets that reflect the global media landscape. Recent breakthroughs in big data analytics and natural language processing have given rise to new tools that facilitate collecting and examining large volumes of news content. One such innovation is the Global Database of Events, Language, and Tone (GDELT) which automatically compiles and encodes news content from thousands of news sources across multiple languages^29,30. In this study, we construct a comprehensive global network of news flows by leveraging GDELT data from 2022 to understand the peculiar characteristics of this network and what are the factors that determine its properties. Differently from the previous works in the field, our analysis focuses on a temporal approach³¹, i.e. the order in which news spreads within media outlets, thus moving beyond a simple examination of the frequency of countries’ names in news articles.

By studying the sequential mentions of events across countries, we aim to map the intricate interactions and reciprocal influences among nations in disseminating news. To achieve this result, our methodology involves identifying the initial source of each news event and tracing subsequent mentions of the event by other countries. By doing so, we can unravel the paths through which news travels across the global media landscape. This approach allows us to gain insights into how news events unfold, propagate, and are shared among countries, shedding light on the dynamics of news dissemination and the interconnectedness of nations in shaping the global information landscape. In addition to incorporating a temporal of news flow, we leverage a massive and unprecedented dataset of 140 million news articles from 37,802 domains across 183 countries sourced from GDELT. We find a significant skewness and inequality in the resulting graph, representing the interconnectedness of news dissemination. Specifically, we observe a small set of dominant countries that form a well-connected network, indicating their influential role in disseminating news globally and shaping the global public agenda. In pursuit of a deeper comprehension of the intricate dynamics underlying the patterns of news diffusion across countries, we use a gravity model. This methodological approach allows us to unravel the fundamental drivers of the observed skewed distribution. Our analysis reveals two primary determinants that significantly influence this phenomenon: the gross domestic product (GDP) and the geographic proximity of the countries involved. Notably, nations endowed with larger GDPs and closer geographical proximity demonstrate heightened interconnections and wield substantial influence within the global news network. Furthermore, we acknowledge the salient role common languages play, which actively shapes the flow of news across nations.

Results

News and countries

To assess the extent of news outlet activity across different countries, we rely on the Source-Country dataset, as detailed in the Methods section, which allows us to identify and analyze news outlets alongside their respective news articles. To visualize the relationship between the number of news outlets and the corresponding quantity of news articles published by each country, we present Fig. 1a. This graphical representation depicts the interplay between these two variables and shows that countries with more news outlets tend to publish more news articles, following approximately a power law.

We define the initial spreader(s) as the country or countries that first mention the event. Since the GDELT dataset has a time granularity of 15 min, multiple countries can be identified as the initial spreaders for a single event. This is because two or more news outlets may have posted an article within the same 15 min window.

For each event, we consider the number of countries in which that event was mentioned. We call the distribution of these measures the virality distribution. Moreover, we define the spreader score for a country i as the number of times i belongs to the set of initial spreaders of an event. Figure 1b depicts the spreader score of a specific country plotted against the maximum number of countries reached by events published by that country.

Figure 1b indicates that the United States (US) emerged as the initial top spreader in 2022, significantly surpassing other countries. The United Kingdom (UK) and India (IN) followed closely behind. Moreover, we deduce that countries with higher spreader scores are more likely to have their events go viral at least once. The inset plot in Fig. 1 illustrates the overall distribution of virality, suggesting that most events tend to circulate within their countries.

Structural analysis on news spreading

In this section, we aim to quantify the heterogeneity in the news-spreading process by means of a directed weighted network $G = (V, E)$ built using the GDELT dataset. The set of countries, denoted as V, represents the nodes in the network. A directed edge $(i, j) \in E$ is created if country j mentions an event successively after country i. The weight associated with each edge, denoted as $w_{ij}$, represents the number of times this sequential mentioning occurs. Please refer to the Methods section for more detailed information on this construction. The resulting network consists of 183 countries and 23,033 edges.

The network’s density, which indicates the proportion of existing connections compared to all possible connections, is 0.69. This value suggests that relatively few pairs of countries lack an exchange of information. Figure 3 illustrates the relationship between in-degree and out-degree (i.e., the number of in and out links attached to each node), as well as in-strength and out-strength (i.e., the sum of weights of in and out links attached to each node) within the network.

The degree distributions, both in-degree, and out-degree, reveal that most countries in the network have many connections. This indicates that countries tend to be mentioned by many other countries (in-degree) or mention many other countries (out-degree) at least once in the news-spreading process. Consistently with the high network density, the distribution on degrees suggests that countries are actively involved in disseminating news and have significant interactions with other countries.

The in-strength and out-strength distributions exhibit a similar pattern but on a logarithmic scale. This indicates that while most countries have a relatively high level of involvement in receiving or transmitting information, a few stand out with exceptionally high levels of strength in news dissemination. These countries play a prominent role in spreading the news, indicating a concentration of influence in the network.

An interesting observation is a high correlation between the in-degree and out-degree ($\rho = 0.99, p < 2.2 \times 10^{-16}$). This can be justified by the high reciprocity (i.e. proportion of links in both directions) exhibited by the network, with a value of 0.91. This surprisingly high reciprocity, together with high network density, indicates that subsequent mentions are usually mutual (i.e., both countries may mention an event in any order) and that connections among certain countries are more likely to be absent than unreciprocated. This suggests that some events are never mentioned in certain countries. Furthermore, when comparing the weights of mutual edges, they tend to have similar values. This implies that the extent of news diffusion between countries is well-balanced, as indicated by the similarity in the weights of reciprocal edges. Please refer to Supplementary Figure 1 in supplementary information for additional details and visualizations.

To examine the organizational principles of the network, we analyze its topology concerning the associated weights. In line with previous studies³², we compare the unweighted and weighted clustering coefficients to gain insights into the correlation between weights and network topology. Specifically, we define $C_w (s)$ as the weighted directed clustering coefficient³³ and C(s) as its unweighted counterpart, both averaged over nodes with a strength of s (see Methods for details about the coefficients). Both quantities aim at expanding the insight provided by the reciprocity measure, searching for triplets of nodes with reciprocated links that may be considered as the building blocks for highly cohesive groups of countries in the network of news flow.

Figure 4(a) shows the results of the comparison of $C_w (s)$ and C(s) as the strength increases.

The weighted directed clustering coefficient $C_w (s)$ is approximately equal to the unweighted clustering coefficient C(s) for low-strength nodes. However, $C_w (s)$ takes on higher values for high-strength nodes than C(s). This finding suggests that while higher-strength nodes tend to form fewer triangles in terms of network topology, the triangles they do form are more likely to consist of edges with higher weights. In other words, there is a core group of countries where information flows preferentially, contributing to higher weighted clustering coefficients. This indicates the presence of strong interconnectedness and concentrated information exchange within this core group of countries. This is also suggested by Supplementary Figure 2 in supplementary information. Moreover, to further validate the results, we compare the two distributions of clustering coefficients (weighted and unweighted) with that of two appropriate null models. The results show that only the weighted clustering coefficient distribution differs from the null model, suggesting the importance of the weighted structure of the network. The interested reader can find additional details in supplementary information.

The observed scenario in terms of clustering coefficient and the observed in and out-strength distributions form the basis for detecting a rich-club phenomenon³⁴. A rich-club refers to a group of prominent and tightly interconnected nodes that exert control over the flow of information in the network^32,35,36,37. Figure 4 compares the normalized unweighted and weighted rich-club coefficients (refer to the Methods section for further details). The normalized rich-club coefficient measures the extent to which high-degree nodes tend to be more interconnected with each other compared to what would be expected by chance taking into account an appropriate null model of network rewiring. Consistent with the previous findings, we observe that only a strong weighted rich-club ordering (with the coefficient significantly higher than 1 for high-degree nodes) is evident, confirming our earlier hypothesis. Specifically, the eight largest countries (according to their strength) within the rich-club are the United States (US), the United Kingdom (UK), Canada (CA), India (IN), Russia (RS), Germany (GM), France (FR), and Ukraine (UP). These countries play a prominent role in the weighted network, forming a core where information flow is significantly concentrated and interconnected. Contrary to expectations, it is noteworthy that Ukraine belongs to the rich-club. This is likely due to the extensive coverage of events concerning the war between Russia and Ukraine in 2022, that contributed to its central position in the network.

All of the previous analyses indicate that only a small set of countries is actively involved in news diffusion, while the participation of other countries is relatively minimal. However, it is also interesting to examine the individual role of each country in the network. To accomplish this, we employ the HITS algorithm³⁸ and compare the Authority and Hub scores of each node in Fig. 5, using the weights of the edges to represent the intensity of interaction. A node can be considered an Authority if it is linked by many Hubs, and, accordingly to this recursive procedure, a node can be considered a Hub if it points towards many Authorities. In our specific case, a Hub is a node that has many relevant out links and that is likely to be a source of news that are relevant for other prominent countries; an Authority, instead, is a country which immediately follows a Hub in the news flow and can be considered as a country for which events shared by Hubs are likely to be of high relevance.

The plot confirms that the United States (US), the United Kingdom (UK), Canada (CA), and India (IN) have a significant influence on news diffusion.

Specifically, US and UK emerge as the most influential countries of the network. The first has a Hub score close to one, indicating that it tends to be the source of many links, i.e. many countries follow them. On the other hand, the latter shows an Authority score close to one, indicating that it tends to follow many links, witnessing its importance in spreading information. Their relationship, with respect to this measure, reflects its interpretation as previously mentioned. In particular, US identifies itself as a news initiator that is promptly covered by the UK, which is likely to cover also other news by relevant countries such as Canada and India. Another noteworthy effect, related to the roles of US and UK, is their almost symmetrical relationship (which does not emerge in random instances of the model as shown in supplementary information) in the Hub and Authority diagram which may be considered an indicator of both an unbalanced, yet mutual, relationship and of a more general relevance in the interaction with other countries which underlines how these two countries play a central role in disseminating news on a global scale.

Heterogeneity of diffusion and gravity model

In the previous section, we demonstrated that only a small group of countries actively participate in the dissemination of information, indicating a high degree of heterogeneity in how news is spread. To further quantify this aspect, we proceed by computing transition probabilities associated with each edge., i.e.

$$\begin{aligned} p_{ij} = \frac{w_{ij}}{\sum _{k = 1}^{deg(i)} w_{ik}}. \end{aligned}$$

(1)

The transition probability $p_{ij}$ represents the proportion or percentage of information produced by country i that flows to the country j. We are primarily interested in understanding if the observed behavior could emerge randomly. In Fig. 5b, we calculate the disparity index $\gamma _i (k)$³⁹ (see Methods) and display it. The blue line indicates the maximum value that $\gamma _i (k)$ can assume. The values observed in Fig. 5b indicate that news diffusion heterogeneity increases with the node’s out-degree. This suggests that news diffusion does not occur randomly or uniformly across the network. If random, we would expect a more consistent behavior, resulting in $\gamma _i (k) \approx 1$ for any given k. The increasing values of $\gamma _i (k)$ concerning the out-degree imply that certain countries play a more influential role in news diffusion than others. This further result supports, from a slightly different perspective, the notion that news spreading is characterized by heterogeneity, where some countries have a more significant impact and influence in disseminating information than others.

We propose using a gravity model (GM) applied explicitly to news dissemination to investigate the factors influencing the observed behaviour in news diffusion. This approach has been employed in some prior studies, albeit with slight variations in network considerations²⁸. For our analysis, we employ the CEPII gravity dataset⁴⁰, which provides comprehensive economic data on international trade and various aspects of the global economy. However, since this dataset is updated only until 2020, we merge it with GDP data from 2022 obtained from the International Monetary Fund (IMF)⁴¹. Doing so, we aim to incorporate relevant economic factors into the gravity model for news diffusion. This approach will allow us to examine how economic variables, such as GDP, influence news flow between countries.

Our considered model is:

$$\begin{aligned} logit(p_{ij}) = \beta _0 + \beta _1 log(GDP_i) + \beta _2 log(GDP_j) + \beta _3 log(d_{ij}) + \beta _4 C_{ij} + \beta _5 L_{ij} + \beta _6 S_{ij}, \end{aligned}$$

(2)

where:

1.
$GDP_i$ is the GDP of country i measured in billions of dollars;
2.
$d_{ij}$ is the distance between the most populated cities of the two countries i and j, measured in km;
3.
$C_{ij}$ is a dummy variable equal to 1 when country i and j share a common border;
4.
$L_{ij}$ is a dummy variable equal to 1 when country i and j share a common language;
5.
$S_{ij}$ is a dummy variable equal to 1 when country i and j share a common language spoken by at least $9\%$ of the population;

Note that our dependent variables are the logit of $p_{ij}$, namely

$$\begin{aligned} logit(p_{ij}) = \log \left( \frac{p_{ij}}{1-p_{ij}}\right) , \end{aligned}$$

(3)

i.e. the log odds of i continuing the flow in country j. Considering the specified model, coefficients themselves might not have direct and intuitive interpretations in the original scale. In fact, a change in an independent variable induces a change in the logit-transformed dependent variable and, if exponentially transformed, in its odds (i.e., the ratio in the probability of news flow happening between countries i and j to the probability of it not occurring). For example, a unit change in $GDP_i$ changes the value of $\frac{p_{ij}}{1-p_{ij}}$ of a quantity $e^{\beta _1}$ and similarly for the other logged independent variables.

To ensure the assumptions of our regression procedure are satisfied and to account for the potential correlation of errors within countries or country pairs, we employ Ordinary Least Squares (OLS) with robust clustered standard errors. Although estimating the parameter $\beta _k$, where $k \in {1, \ldots , 6}$, using standard OLS is common, it may underestimate the actual variance of the parameters⁴². This underestimation is due to the potential correlation of errors within the country, such as the correlation between country pairs. To address this issue, we use OLS with robust clustered standard errors. This involves specifying a clustering variable (in our case $d_{ij}$) that independently identifies each country pair, regardless of the direction. By employing robust clustered standard errors, we can ensure that all assumptions are satisfied and obtain more reliable estimates of the parameters.

After merging the CEPII and IMF datasets, we obtain a network of 172 nodes and 20468 edges. Table 1 shows the estimated parameters’ values.

Table 1 Result of the GM fitting.

Full size table

The obtained results indicate that our model explains approximately 59% of the variance in the dependent variable ($R^2 = 0.59$).

The findings suggest that news flow tends to occur more frequently between countries that are geographically closer to each other. However, the coefficient for distance in the news diffusion model (approximately $-$ 0.54) is lower in absolute value compared to what is typically observed in real trade networks (which is approximately 1). This difference can be attributed to the fact that news diffusion does not have the same tangible cost associated with it as in trade. News can travel more easily and quickly across long distances without incurring significant transportation or logistical expenses. Thus, the impact of geographical distance on news diffusion is somewhat attenuated compared to its impact on trade flows.

Additionally, the results indicate that economic power plays a role in news diffusion. Countries with higher GDP are more likely to be the target of the news flow, while they are less likely to be the source of the flow, although this effect is relatively weak ($\beta _1 \approx -0.13$).

Regarding cultural factors, the study reveals that the presence of a shared language has a substantial influence on the transmission of news. This is substantiated by the positive and statistically significant coefficients assigned to the language variables ($L_{ij}$, $S_{ij}$) in the model.

In summary, the analysis indicates that several factors, including geographic proximity, economic power, and cultural aspects like language, play a pivotal role in shaping the dynamics of news diffusion among nations. These findings emphasize the significance of considering these factors when examining the patterns of information flow on an international scale.

Conclusion

In this paper we analyzed the network of global news diffusion in 2022. Our results depict a highly unbalanced situation in how news flow worldwide, with a rich-club of countries that act as superspreaders. Interestingly, we find that the countries’ GDP is one of the major factors that establish the role of countries in the news diffusion setting.

Our results were obtained using a network-based analysis and a gravity model to investigate the organizational structure and primary drivers of news dissemination in the digital realm. Moreover, in contrast to prior research, our study incorporates a temporal dimension into the analysis of news diffusion networks and uses a huge dataset provided by GDELT.

We highlight that possible limitations of our work are linked to the model of diffusion we used. In fact, an edge (i, j) between two countries does not represent in all cases a direct flow between country i and j, i.e. we are not sure that j mentions the event because i did. Moreover, the Multilingual Source-Country dataset provides only an approximation of the geographic country of origin of news outlets.

However, since we use a large amount of data and a time-aggregating approach, it is reasonable to suppose that the network provides a good approximation of how news flows between pairs of countries. This is also justifiable by the observed network heterogeneity and the departure of the network structure from random models (as shown in supplementary information).

Although our study uses a different dataset and modelling approach, and despite potential data limitations related to the ongoing Russia-Ukraine conflict, our findings align with prior research, shedding also new light on the dynamics that occur in the news diffusion environment.

Methods

Data collection and preprocessing

In our study, we explored global news dynamics by leveraging the power of big data to understand the trajectory and network structure of news articles. Our primary data sources include the GDELT 2.0 dataset and the Multilingual Source-Country dataset. These datasets offer a comprehensive view of news articles in multiple languages, enabling us to conduct a granular analysis of both Western and non-Western media narratives. The following sections detail our data collection methodology and the preprocessing steps we took.

GDELT dataset

We used the GDELT 2.0 to collect online news articles from January 1 to December 31, 2022, with a 15-minute update resolution. The Global Database of Events, Language, and Tone (GDELT) is an independent, non-profit project providing a vast, updated global news and events database in different languages. The dataset covers news articles written in English and 65 other languages, allowing us to delve deep into non-Western media. The data in GDELT 2.0 is organized into three main tables: Events, Mentions, and the Global Knowledge Graph (GKG). The Events table lists events with related general information. The Mentions table instead contains each news article referring to an event from the Events table. Finally, the GKG table provides detailed information about each article’s actors, emotions, and themes. We used the Mentions table to track a story’s trajectory and network structure as it flowed through the global media system from country to country. In the Mentions table, each mention (i.e. news article) of an event is given its entry in the dataset. This means an event mentioned in 100 news articles will be listed 100 times in the Mentions table. The dataset resulting from this collection, described in Table 2, consists of 140 million news articles referencing 51 million unique events. To ensure data accuracy, we excluded news articles from domains not present in the Source-Country dataset, resulting in 125 million news articles and their respective country of origin. We explain GDELT’s event identification process in the supplementary information.

Table 2 Dataset breakdown.

Full size table

Multilingual source-country dataset

This dataset (gdelt-bq.extra.sourcesbycountry) estimates the country of origin for major online news outlets monitored by GDELT, using the primary geographic focus of the outlet over the monitored time frame. GDELT’s documentation mentions challenges related to outlets with insufficient coverage volume or geographic emphasis and regional wire services with bureaus in particular countries⁴³. Despite the challenges, the dataset is considered a reasonable approximation of the geographic country of origin for these news outlets and is available on GDELT’s website.

Network construction

Starting from our collected data, for each event k we consider a directed weighted graph $G_k = (V_k,E_k)$. In particular, $V_k$ is the unique set of countries that mention k, while $(i,j) \in E_k$ if country j mentions k successively after country i. In more detail, consider two successive times $t_i, t_{i+1}$ (we have $t_{i+1}-t_i \ge 15$ min) and let $C_k(t_i),C_k(t_{i+1})$ be the set of countries that mention event k at time $t,t+1$ (respectively). If $T_k$ is the lifespan of event k, we have that $E_k = \bigcup _{i = 1}^{T_k-1} C_k(t_i)\times C_k(t_{i+1})$. Since it is possible to have multiple edges, we associate to each of them a weight $w_{i,j}^k$ that represents their multiplicity. This procedure results in a collection of graphs $G_k$, one for each event.

We obtain a final network $G = (V,E)$ defined as

$$\begin{aligned} G = \bigcup _{k} G_k \end{aligned}$$

and $w_{ij} = \sum _k w_{ij}^k$, i.e. we overlap all the networks associated with each event. Thus, G is a directed weighted network in which an edge between (i, j) means that j mentions an event successively after i a number of times equal to $w_{ij}$. Such network construction is akin to what is called time-aggregated weighted network in temporal network theory⁴⁴. To focus on the relationship among countries, we delete all the loops, i.e. edges that start and end in the same country, obtaining a simple graph.

To clarify the previous steps, we provide an example that describes how we construct the network for a single event $\epsilon _0$. Consider the following list of countries mentioning $\epsilon _0$ (by means of at least one of their news outlets).

Country	Event	Time stamp
US	$\epsilon _{0}$	$t_{0}$
CA	$\epsilon _{0}$	$t_{1}$
US	$\epsilon _0$	$t_2$
IT	$\epsilon _0$	$t_3$
UK	$\epsilon _0$	$t_4$
US	$\epsilon _0$	$t_5$
CA	$\epsilon _0$	$t_6$

This table can be turned into a chain of (direct) mentions, i.e.

$$\begin{aligned} US \rightarrow CA \rightarrow US \rightarrow IT \rightarrow UK \rightarrow US \rightarrow CA. \end{aligned}$$

Therefore, we obtain the following edges

$$\begin{aligned} E = \{(US,CA),(CA,US),(US,IT),(IT,UK),(UK,US),(US,CA)\}. \end{aligned}$$

In this simple example, all edges have a weight equal to 1, except for (US, CA) which appears twice and therefore has a weight equal to 2. Such edges can be used to build the graph $G_0$ related to the event $\epsilon _0$.

Measures on networks

Weighted clustering coefficient

Clustering coefficients are widely used in the network science literature, with the aim of studying the interconnectedness of nodes’ neighbours. However, fewer measures have been proposed for weighted directed networks.

Since one of our aims is to study the organization of news diffusion network, we decided to consider the Clustering Coefficient proposed by Clemente and Grassi³³, which can be defined as follows

$$\begin{aligned} C_i = \frac{0.5 \left[ (\textbf{W}+\textbf{W}^T)(\textbf{A}+\textbf{A}^T)^2\right] _{ii}}{s_i \left( d_i -1\right) - 2s_i^{\leftrightarrow }} \end{aligned}$$

(4)

where $\textbf{A},\textbf{W}$ are the unweighted and weighted adjacency matrices of the network, $s_i$ is the strength of node i and $s_i ^{\leftrightarrow }$ is defined as the strength of bilateral arcs, i.e.

$$\begin{aligned} s_i ^\leftrightarrow = \sum _{i \ne j} a_{ij} a_{ji} \frac{w_{ij} + w_{ji}}{2}. \end{aligned}$$

The numerator of (4) takes into account all directed triangles that node i actually forms with its neighbours, weighted with the average weight of the links connecting a node i to its adjacent j and k. The denominator counts the total number of directed triangles that it could form, taking weights properly. In our context, comparing unweighted and weighted clustering coefficients sheds light on the relations between the network topology and its weights. In such a way, we are able to quantify possible inequalities in how countries spread news.

Rich-club

Rich-club refers to a network subgraph made up of the most prominent nodes (from a topological or non-topological point of view) being highly interconnected with each other. Many studies report that these nodes tend to be important for the overall structure and function of the network^35,45,46,47. In more detail, given a weighted graph $G = (V,E)$ with binary adjacency matrix A and weighted adjacency matrix W ($W_{i,j} \in \left( 0, +\infty \right) )$, the topological (i.e. unweighted) measure that allows to detect a rich-club with respect to nodes of degree k is:

$$\begin{aligned} \phi (k) = \frac{2 m_{>k}}{n_{>k} (n_{>k} -1)}, \end{aligned}$$

(5)

where $m_{>k}$ and $n_{>k}$ are the number of links and nodes of the subgraph $G_{>k} \subseteq G$ inducted by nodes with degree higher than k.

To consider weighted networks, the presence of a rich-club must be detected with respect to a certain richness parameter r that allows ranking the nodes with respect to it. Common examples of richness parameters can be degree, strength or other measures of centrality^36,37.

Given such a measure, it is possible to define the weighted rich-club as:

$$\begin{aligned} \phi ^w (r) = \frac{W_{>r}}{\sum _{i = 1}^{E_{>r}} W_i^{rank}} \end{aligned}$$

(6)

where $w_i ^{rank} \ge w_{i+1} ^{rank}$, with $i = 1,2,\ldots , |E|$ are the weights of the links ranked in increasing order. Thus, (6) measures the fraction of weights shared by the richest nodes compared to the total amount they could share using the strongest links of the network.

However, the two definitions alone, don’t assure the existence of a rich-club effect. To measure that, a suitable null model that preserves some properties of the original graph must be considered⁴⁸. In this context, we are interested in maintaining the strength distribution, therefore we consider a set of randomized networks obtained from G by reshuffling locally the weights of the outgoing links, as proposed by Opsahl et al.³⁶.

We define $\phi _{null} ^{w} (r)$ as the value of (6) computed for the random null model. Thus, we detect a rich-club effect by computing the ratio

$$\begin{aligned} \rho ^w (r) = \frac{\phi ^w (r)}{{\bar{\phi }}_{null}^w (r)}, \end{aligned}$$

(7)

where ${\bar{\phi }}_{null}^w (r)$ is the mean across N generated null model. A value of $\rho ^w (r) > 1$ indicates that rich nodes concentrate their flow towards other rich nodes more than expected in a set (usually containing 100 randomised instances) of null models. Note that (7) can be applied with no problem to the case of unweighted networks. However, in this case, we consider a null model in which the degree sequence is preserved, using a rewiring algorithm⁴⁹. In the context of news diffusion networks, a rich-club is a subset of the countries in which information flows preferentially and more frequently.

HITS algorithm

HITS (Hyperlink-Induced Topic Search)³⁸ is a web page ranking algorithm used to evaluate web pages relevance based using the network structure, defined by a directed graph $G = (V,E)$. The basic idea is to consider a web page to be important if it is linked to other important pages, considering two types of scores associated with each page: Hub and Authority. In particular, a page with a high Hub score points to many good Authorities and, vice-versa, a high Authority score indicates that a page points to many good Hubs. Moreover, weights of the network can be used as connection strengths. In our context, a Hub is a country that tends to be the source of the news time mentions, i.e. many (Authority) countries follow it in the diffusion. On the other hand, an Authority is a country that tends to be the target of the news time mentions, i.e. it follows many (Hub) countries in the news diffusion and assumes a central role in how information is disseminated.

To give a mathematical description of the two measures, let’s denote with x(p) (y(p)) the Authority score (Hub score) of page p. The algorithm initially sets $x(p) = y(p) = 1$ and then updates the values according to the following simple rules:

1.
For each p, $x(p) = \sum _{\{q: (q,p) \in E\}} y(q)$;
2.
For each p, $y(p) = \sum _{\{q: (p,q) \in E\}} x(q)$;
3.
Normalize the Authority scores such that $\sum _{p \in V} x(p)^2 = 1$;
4.
Normalize the Hub scores such that $\sum _{p \in V} y(p)^2 = 1$;

Linear Algebra tools assure that the previous algorithm converges to fixed points $\bar{x}$ and $\bar{y}$. Moreover, experimental evidence indicates that this convergence is obtained with a limited number of steps.

Disparity index

The disparity index has been widely used in many fields like economics⁵⁰, ecology⁵¹, physics⁵² and network science^39,53. It measures the inhomogeneities in the weights at the local level of nodes and it is defined, for a node i of degree k, as

$$\begin{aligned} \gamma _i (k) = k \sum _{j} p_{ij} ^ 2, \end{aligned}$$

(8)

where $p_{ij}$ are the transition probabilities associated with each edge, as defined in (1). Its value ranges between 1 (perfect homogeneity, i.e. all the links have the same weight) and k (perfect heterogeneity, i.e. only a link carries the whole node strength). In the news diffusion landscape, (8) can be used to understand if the behaviour observed in the network occurs randomly or not. In fact, in the first case, we could expect $\gamma _i \approx 1$, i.e. a uniform behaviour.

Data availability

The datasets generated and/or analysed during the current study are available in the GDELT repository, https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/.

References

Ahmad, A. N. Is twitter a useful tool for journalists?. J. Med. Pract. 11, 145–155. https://doi.org/10.1386/jmpr.11.2.145_1 (2010).
Article Google Scholar
Gentzkow, M. & Shapiro, J. M. Competition and truth in the market for news. J. Econ. Perspect. 22, 133–154. https://doi.org/10.1257/jep.22.2.133 (2008).
Article Google Scholar
Chen, H. & Suen, W. Competition for attention and news quality. Am. Econ. J. Microecon. 15, 1–32 (2019).
Article Google Scholar
Leetaru, K. Data Mining Methods for the Content Analyst: An Introduction to the Computational Analysis of Content (Routledge, 2012).
Book Google Scholar
Sittar, A., Mladenic, D. & Erjavec, T. A dataset for information spreading over the news. In Proc. of the 23th International Multiconference Information Society SiKDD, vol. 100, 5–8 (2020).
Bodaghi, A. & Oliveira, J. The theater of fake news spreading, who plays which role? A study on real graphs of spreading on twitter. Expert Syst. Appl. 189, 116110 (2022).
Article Google Scholar
Abdullah, S. & Wu, X. An epidemic model for news spreading on twitter. In Proc. 2011 IEEE 23rd international conference on tools with artificial intelligence, 163–169 (IEEE, 2011).
Schmidt, A. L. et al. Anatomy of news consumption on Facebook. Proc. Natl. Acad. Sci. 114, 3035–3039 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Del Vicario, M. et al. The spreading of misinformation online. Proc. Natl. Acad. Sci. 113, 554–559 (2016).
Article ADS PubMed PubMed Central Google Scholar
Vosoughi, S., Roy, D. & Aral, S. The spread of true and false news online. Science 359, 1146–1151 (2018).
Article ADS CAS PubMed Google Scholar
Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W. & Starnini, M. The echo chamber effect on social media. Proc. Natl. Acad. Sci. 118, e2023301118 (2021).
Article CAS PubMed PubMed Central Google Scholar
Spohr, D. Fake news and ideological polarization: Filter bubbles and selective exposure on social media. Bus. Inf. Rev. 34, 150–160 (2017).
Google Scholar
Stroud, N. J. Polarization and partisan selective exposure. J. Commun. 60, 556–576 (2010).
Article Google Scholar
Wanta, W., Golan, G. & Lee, C. Agenda setting and international news: Media influence on public perceptions of foreign nations. Journal. Mass Commun. Q. 81, 364–377 (2004).
Article Google Scholar
Kiousis, S. & Wu, X. International agenda-building and agenda-setting: Exploring the influence of public relations counsel on us news media and public perceptions of foreign nations. Int. Commun. Gaz. 70, 58–75 (2008).
Article Google Scholar
Peter, J. & De Vreese, C. H. Agenda-rich, agenda-poor: A cross-national comparative investigation of nominal and thematic public agenda diversity. Int. J. Public Opin. Res. 15, 44–64 (2003).
Article Google Scholar
Galtung, J. & Ruge, M. H. The structure of foreign news: The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. J. Peace Res. 2, 64–90 (1965).
Article Google Scholar
McCombs, M. E. & Shaw, D. L. The agenda-setting function of mass media. Public Opin. Q. 36, 176–187 (1972).
Article Google Scholar
Hallin, D. C. & Mancini, P. Comparing media systems: Three models of media and politics (Cambridge University Press, 2004).
Book Google Scholar
Gitlin, T. The whole world is watching: Mass media in the making and unmaking of the new left (University of California Press, 2003).
Google Scholar
Guo, L. & Vargo, C. J. Global intermedia agenda setting: A big data analysis of international news flow. J. Commun. 67, 499–520 (2017).
Article Google Scholar
Wallerstein, I. The Modern World-System I: Capitalist Agriculture and the Origins of the European World-Economy in the Sixteenth Century 1st edn. (University of California Press, 2011).
Book Google Scholar
Chang, T.-K. All countries not created equal to be news: World system and international communication. Commun. Res. 25, 528–563. https://doi.org/10.1177/009365098025005004 (1998).
Article Google Scholar
Kim, K. & Barnett, G. A. The determinants of international news flow: A network analysis. Commun. Res. 23, 323–352. https://doi.org/10.1177/009365096023003004 (1996).
Article Google Scholar
Wu, H. D. Homogeneity around the world? comparing the systemic determinants of international news flow between developed and developing countries. Gazette 65, 9–24 (2003).
Article Google Scholar
Guo, L. & Vargo, C. J. Predictors of international news flow: Exploring a networked global media system. J. Broadcast. Electron. Med. 64, 418–437. https://doi.org/10.1080/08838151.2020.1796391 (2020).
Article Google Scholar
Grasland, C. International news flow theory revisited through a space-time interaction model: Application to a sample of 320,000 international news stories published through RSS flows by 31 daily newspapers in 2015. Int. Commun. Gaz. 82, 231–259 (2020).
Article Google Scholar
Fracasso, A., Grassano, N. & Marzetti, G. V. The gravity of foreign news coverage in the EU: Does the euro matter?. JCMS J. Common Market Stud. 53, 274–291. https://doi.org/10.1111/jcms.12182 (2014).
Article Google Scholar
Leetaru, K. & Schrodt, P. A. Gdelt: Global data on events, location, and tone, 1979–2012. In ISA Annual Convention, 4, 1–49 (Citeseer, 2013).
Leetaru, K. H. Mining libraries: Lessons learned from 20 years of massive computing on the world’s information. Inf. Serv. Use 35, 31–50 (2015).
Google Scholar
Ram, S. K., Nandan, S., Boulebnane, S. & Sornette, D. Synchronized bursts of productivity and success in individual careers. Sci. Rep. 12, 7637 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Barrat, A., Barthélemy, M., Pastor-Satorras, R. & Vespignani, A. The architecture of complex weighted networks. Proc. Natl. Acad. Sci. 101, 3747–3752. https://doi.org/10.1073/pnas.0400087101 (2004).
Article ADS CAS PubMed PubMed Central Google Scholar
Clemente, G. & Grassi, R. Directed clustering in weighted networks: A new perspective. Chaos Solitons & Fractals 107, 26–38. https://doi.org/10.1016/j.chaos.2017.12.007 (2018).
Article ADS MathSciNet Google Scholar
Xu, X.-K., Zhang, J. & Small, M. Rich-club connectivity dominates assortativity and transitivity of complex networks. Phys. Rev. E 82, 046117 (2010).
Article ADS Google Scholar
Zhou, S. & Mondragón, R. J. The rich-club phenomenon in the internet topology. IEEE Commun. Lett. 8, 180–182 (2004).
Article Google Scholar
Opsahl, T., Colizza, V., Panzarasa, P. & Ramasco, J. J. Prominence and control: The weighted rich-club effect. Phys. Rev. Lett. 101, 168702. https://doi.org/10.1103/PhysRevLett.101.168702 (2008).
Article ADS CAS PubMed Google Scholar
Cinelli, M. Generalized rich-club ordering in networks. J. Complex Netw. 7, 702–719 (2019).
Article MathSciNet Google Scholar
Kleinberg, J. M. Authoritative sources in a hyperlinked environment. J. ACM 46, 604–632. https://doi.org/10.1145/324133.324140 (1999).
Article MathSciNet Google Scholar
Serrano, M. Á., Boguñá, M. & Vespignani, A. Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. 106, 6483–6488. https://doi.org/10.1073/pnas.0808904106 (2009).
Article ADS PubMed PubMed Central Google Scholar
Conte, M. et al. The cepii gravity database (CEPII, Paris, France, 2021).
International monetary fund. https://www.imf.org/external/datamapper/NGDPD@WEO/OEMDC/ADVEC/WEOWORLD.
Moulton, B. R. An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Rev. Econ. Stat. 72, 334. https://doi.org/10.2307/2109724 (1990).
Article Google Scholar
GDELT. Announcing new source country cross-referencing dataset. https://blog.gdeltproject.org/announcing-new-source-country-crossreferencing-dataset/ (2014).
Barrat, A. & Cattuto, C. Data summaries and representations: definitions and practical use (2018).
Van Den Heuvel, M. P. & Sporns, O. Rich-club organization of the human connectome. J. Neurosci. 31, 15775–15786 (2011).
Article PubMed PubMed Central Google Scholar
Ma, A., Mondragón, R. J. & Latora, V. Anatomy of funded research in science. Proc. Natl. Acad. Sci. 112, 14760–14765 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Cinelli, M., Ferraro, G. & Iovanella, A. Rich-club ordering and the dyadic effect: Two interrelated phenomena. Phys. A 490, 808–818 (2018).
Article Google Scholar
Colizza, V., Flammini, A., Serrano, M. A. & Vespignani, A. Detecting rich-club ordering in complex networks. Nat. Phys. 2, 110–115 (2006).
Article CAS Google Scholar
Maslov, S. & Sneppen, K. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).
Article ADS CAS PubMed Google Scholar
Hirschman, A. O. The paternity of an index. Am. Econ. Rev. 54, 761–762 (1964).
Google Scholar
Simpson, E. H. Measurement of diversity. Nature 163, 688–688 (1949).
Article ADS Google Scholar
Derrida, B. & Flyvbjerg, H. Statistical properties of randomly broken objects and of multivalley structures in disordered systems. J. Phys. A Math. Gen. 20, 5273 (1987).
Article ADS MathSciNet Google Scholar
Barthelemy, M., Gondran, B. & Guichard, E. Spatial structure of the internet traffic. Phys. A 319, 633–642 (2003).
Article Google Scholar

Download references

Acknowledgements

The work is supported by IRIS Infodemic Coalition (UK government, Grant No. SCH-00001-3391), SERICS (PE00000014) under the NRRP MUR program funded by the European Union—NextGenerationEU, project CRESP from the Italian Ministry of Health under the program CCM 2022, and PON project “Ricerca e Innovazione” 2014–2020.

Author information

Authors and Affiliations

Sapienza University of Rome, Rome, Italy
Shayan Alipour, Michele Avalle, Gabriele Etta, Matteo Cinelli & Walter Quattrociocchi
University of Florence, Florence, Italy
Niccolò Di Marco

Authors

Shayan Alipour
View author publications
You can also search for this author in PubMed Google Scholar
Niccolò Di Marco
View author publications
You can also search for this author in PubMed Google Scholar
Michele Avalle
View author publications
You can also search for this author in PubMed Google Scholar
Gabriele Etta
View author publications
You can also search for this author in PubMed Google Scholar
Matteo Cinelli
View author publications
You can also search for this author in PubMed Google Scholar
Walter Quattrociocchi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

S.A. collected the data; N.D.M., M.C. and W.Q. conceived the experiments; S.A., N.D.M. conducted the experiments; S.A., N.D.M., M.C. and W.Q. analysed the results; All authors reviewed the manuscript.

Corresponding author

Correspondence to Walter Quattrociocchi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Alipour, S., Di Marco, N., Avalle, M. et al. The drivers of global news spreading patterns. Sci Rep 14, 1519 (2024). https://doi.org/10.1038/s41598-024-52076-6

Download citation

Received: 23 May 2023
Accepted: 12 January 2024
Published: 17 January 2024
DOI: https://doi.org/10.1038/s41598-024-52076-6
Springer Nature Limited

Country	Event	Time stamp
US	\(\epsilon _{0}\)	\(t_{0}\)
CA	\(\epsilon _{0}\)	\(t_{1}\)
US	\(\epsilon _0\)	\(t_2\)
IT	\(\epsilon _0\)	\(t_3\)
UK	\(\epsilon _0\)	\(t_4\)
US	\(\epsilon _0\)	\(t_5\)
CA	\(\epsilon _0\)	\(t_6\)

The drivers of global news spreading patterns

Abstract

Similar content being viewed by others