1 Introduction

Given a complex network, whether biological (e.g., neural interactions), technological (e.g., IoT systems), social (e.g., online social media platforms) or transportation-related (e.g., scheduled flights), the topology of the underlying graph can readily indicate node importance (Ghoshal et al. 2014). Peripheral nodes generally have little impact, whereas nodes positioned at the epicenter of the topology can effectively control several structural and functional behaviors of the network. Thus, identifying the most important nodes in a network brings us closer to understanding the complexity of network dynamics from multiple aspects. Node centrality measures are the metrics developed to determine these important nodes. Since node importance can be interpreted in many disparate ways, many coexisting centrality measures have been introduced and applied effectively in various domains (Zweig 2016; Das et al. 2018). Depending on the interpretation of node importance, these differently calculated measures can represent local (i.e., neighborhood-based) or global information, as well as static and dynamic properties of the network, thus providing relevant metrics for a diverse set of applications. The most commonly known and used centrality measures are degree (Shaw 1954; Nieminen 1973), closeness (Beauchamp 1965; Sabidussi 1966), eigenvector (Bonacich 1972), betweenness (Freeman 1977) and PageRank (Brin and Page 1998).

Several studies have investigated the stability of node centrality measures empirically. Recently, a formal definition of the stability of node centrality measures was given by Segarra and Ribeiro (2015), who proved that the degree, closeness and eigenvector centrality measures are stable, as opposed to betweenness centrality. In our work, three distinct perturbation procedures are studied, and the stability of the above-mentioned centrality measures is experimentally investigated on several real-world and synthetic datasets.

This paper is organized as follows. In Sect. 2, we discuss related work regarding the stability of network centrality measures. In Sect. 3, we describe the definition of stability for centrality measures and the main notations used in the paper. In Sect. 4, we discuss the three perturbation processes and outline the datasets used in our experiments. In Sect. 5, we present the results of our numerical experiments. In Sect. 6, we discuss our findings in detail and give insights into possible future work. In Sect. 7, we conclude our work.

2 Related work

Although the stability of node centrality measures has been investigated in the literature on complex networks and centrality measures (Costenbader and Valente 2003; Zemljič and Hlebec 2005; Borgatti et al. 2006; Boldi et al. 2013; Iyer et al. 2013; Niu et al. 2015; Segarra and Ribeiro 2015; Sarkar et al. 2018), considerably fewer papers discuss more than one graph perturbation method at a time. In the cited papers, empirical experiments were performed by comparing the initial graph or network with a version modified by some randomization process. Graph sampling, edge weight perturbation, and the removal or addition of edges are the perturbation methods considered in these studies.

Borgatti et al. (2006) and Sarkar et al. (2018) both investigated the robustness of centrality measures under conditions of imperfect data and incomplete networks. Sarkar et al. (2018) studied the effect of noise in complex networks and performed sensitivity-, robustness- and reliability-related analyses. Borgatti et al. (2006) added random error to different networks of varying sizes and densities. To simulate the occurrence of random error in the networks, four types of graph manipulation (edge deletion, node deletion, edge addition and node addition) were performed. Their research aimed at examining the accuracy of centrality measures in the case of incomplete data on random networks generated by the method of Erdős and Rényi (1959). The work was restricted to the degree, closeness, betweenness and eigenvector centrality measures. Centrality robustness was calculated based on the ranking of the most central nodes in the original graph compared to those in the perturbed graphs, besides other notions of robustness such as the square of the Pearson correlation between the measures.

Boldi et al. (2013) also experimented with network robustness against node removal, like Borgatti et al. (2006), by analyzing both web graphs and social graphs of various sizes and characteristics. Their main focus was on the alterations of the distance distribution of these networks under different node removal concepts.

Costenbader and Valente (2003) discussed a graph sampling-related approach in their work. They emphasized the relevance of the problem by introducing the issues of centrality stability for social network analysis. In their study, a bootstrap sampling method was used to determine how sampling would affect the stability of 11 network centrality measures. The performance of the centrality measures was compared in various networks at decreasing sampling levels. The simulation procedure consisted of an initial stage, where the centrality measures were calculated, followed by repeated random samples of the network at each of eight sampling proportions: starting at 80% and decreasing in steps of 10% down to 10%. At each sampling level, the values calculated on the sample were correlated with the original measures 25 times, and the average of these correlations was taken.

Zemljič and Hlebec (2005) also experimented with social networks and, like Costenbader and Valente (2003), focused on high school students. In their work, the reliability of measures of centrality and prominence is discussed through eight experiments. The reliability of in- and out-degree, in- and out-closeness, as well as betweenness and flow betweenness, was estimated by the Pearson correlation coefficient.

Attack robustness was investigated by Iyer et al. (2013). In their study, they raised a fundamental issue affecting complex networks, namely the robustness of the overall system to the failure of its constituent parts. They argued that the ability of a degraded system to function depends largely on the integrity of the underlying network. During their percolation-related experiments, they targeted vertices using a range of non-local measures of potential importance. In their study, the degree, eigenvector, closeness and betweenness centrality measures were analyzed on undirected simple networks. For each network under consideration, they determined the importance of the vertices by calculating the above-mentioned centrality measures. Then, they computed the effect on the size of the largest connected component of the network after removing a given fraction of the vertices by their rank.

A similar network manipulation was applied by Niu et al. (2015) and amended with some new ones. In their work, they focused on the robustness of classic centrality measures (degree, betweenness, closeness, eigenvector and k-shell) against network manipulation. Both artificial and real networks were used in their experiments. The introduced network manipulation techniques were the addition, removal and rewiring of links, in both random and biased ways. To assess the effect of manipulation on a node's centrality, they calculated the Spearman correlation coefficient between the measures on the original graph and on the perturbed graph.

The above-listed studies interpreted the robustness or stability of centrality measures as strongly related to correlation and ranking by the measures. A new stability concept was introduced by Segarra and Ribeiro (2015), which is elaborated in Sect. 3.2. Their numerical experiments were performed on randomly generated graphs with a node set of size \(n \ge 10\), where an undirected edge exists with probability \(q = 10/n\), aiming at analyzing the behavior of the centrality measures. The edge weights were picked randomly from a uniform distribution on [0.5, 1.5]. They also performed experiments on two real-world networks, one containing information about the air traffic between the most popular airports in the USA, while the second records interactions between sectors of the US economy. In their numerical experiments, two types of random noise were used to analyze the specified robustness indicators on node rankings.

Randomized, synthetic perturbation methods were used by Borgatti et al. (2006), Niu et al. (2015) and Segarra and Ribeiro (2015) to simulate imperfect data, network downtime or noise contamination. In our paper, besides the use of synthetic network manipulation techniques, like sampling (Costenbader and Valente 2003; Rezvanian and Meybodi 2016), we put the emphasis on perturbation methods provided naturally by real-life processes. Graphs were constructed from large, real-world datasets using similarity or correlation. The selected real-world processes involved changes in the daily closing prices of stocks or in the similarity between users based on rated movies. On the other hand, graph growth, with its underlying process of new nodes joining the network and new edges connecting already existing nodes, was studied. Trust networks, user–user interaction-based graphs and community–community interaction-based networks were taken into consideration. The stability of node centrality measures under these real-world process-based perturbations was analyzed in empirical experiments focusing on the stability concepts introduced by Segarra and Ribeiro (2015). Besides performing numerical experiments with reference to the stability concept, we also experimented with the ordinal association and the Jaccard similarity of the nodes. By combining the stability-, correlation- and node ranking-related experiments on the mentioned real-world datasets, we were able to analyze the performance of the centrality measures under various circumstances.

3 Preliminaries and notations

Let us consider a network represented by a graph \(G=(V, E)\), where V is the set of nodes and E is the set of edges (i.e., pairs of nodes) in the network. In the present paper, we consider directed and undirected, as well as weighted and unweighted graphs. A weighted graph is defined as \(G = (V,E,W)\), where W is the set of weights defined on the edges of G. Edge weights can represent similarities, i.e., connection strengths between the nodes, or dissimilarities, i.e., distances between the nodes, depending on the application. The adjacency matrix \(A \in \mathbb {R}^{n \times n}\) is an alternative representation of a graph. Two nodes, i and j, are adjacent if \((i,j) \in E\); moreover, in the case of undirected graphs, \((j,i) \in E\) whenever \((i,j) \in E\). In the case of binary and undirected graphs, \(A_{ij}=1\) if i and j are adjacent and \(A_{ij}=0\) otherwise, whereas in weighted graphs \(A_{ij} = W(i,j)\). Considering directed graphs, two nodes i and j are adjacent if there is an edge from node i to node j. An \(i-j\) path is a sequence of distinct adjacent vertices from vertex i to vertex j. The distance \(\ell (i,j)\) between i and j in graph G is the length of the shortest path joining them when such a path exists, and is set to “\(\infty\)” otherwise.

3.1 Studied centrality measures

Given a network represented by a graph G, centrality measures can be defined as real-valued functions \(C^G: V_{G} \rightarrow \mathbb {R}_{\ge 0}\), which assign a nonnegative number to each node of G. The considered centrality measures, which can also be found, e.g., in the studies by Das et al. (2018) and Segarra and Ribeiro (2015), are degree (Shaw 1954), closeness (Sabidussi 1966), betweenness (Freeman 1977), eigenvector (Bonacich 1972) and PageRank (Brin and Page 1998). There are two major categories of centrality metrics: neighborhood based and shortest path based. While degree, eigenvector and PageRank are considered to be neighborhood based, closeness and betweenness are representatives of the shortest path-based metrics.

Degree centrality is entirely restricted to local information, i.e., it relies only on the number and strength of immediate connections, whereas the other measures are calculated from global information. The spectral eigenvector and PageRank measures calculate the importance of a node by taking into consideration the importance of its neighbors. The two shortest path-based metrics also carry global information: closeness captures how fast information can travel from a node to every other node in the network, while betweenness focuses on a node's presence on the shortest paths between every pair of nodes in the network.

The neighborhood-based metrics treat edge weights as connection strengths or similarities, while for the shortest path-based measures the edge weights represent distances, or dissimilarities. It is important to note that the rankings assigned to a weighted graph by a centrality measure based on dissimilarities and by another based on similarities are not comparable. This is the reason why, in our numerical experiments, edge weights corresponding to connection strengths were converted to distances in the case of the shortest path-based metrics.
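As an illustration, the following minimal sketch (assuming the networkx library and a similarity-weighted graph with a positive "weight" attribute on every edge) computes the studied measures with the appropriate weight interpretation. The simple inversion 1/w is our own illustrative choice of conversion; Sect. 5.1 uses the transform of Mantegna (1999) for correlation weights.

```python
import networkx as nx

def centralities(G):
    """Compute the five studied measures on a similarity-weighted graph G."""
    # Neighborhood-based measures interpret weights as connection strengths.
    degree = {v: d for v, d in G.degree(weight="weight")}
    eigenvector = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
    pagerank = nx.pagerank(G, weight="weight")

    # Shortest path-based measures need distances: invert the similarities.
    H = G.copy()
    for u, v, data in H.edges(data=True):
        data["distance"] = 1.0 / data["weight"]  # similarity -> distance
    closeness = nx.closeness_centrality(H, distance="distance")
    betweenness = nx.betweenness_centrality(H, weight="distance")
    return degree, closeness, eigenvector, betweenness, pagerank
```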

We have briefly discussed several grouping options for centrality measures based on the information they use, as well as their calculation methods and theoretical backgrounds. We have seen that closeness and betweenness both operate on global, shortest path-based information. The following sections will demonstrate that another grouping can be established by the stability notion proposed by Segarra and Ribeiro (2015), which, interestingly, places these two measures into separate categories.

3.2 Stability of centrality measures

In our experiments, we use the notion of centrality measure stability defined by Segarra and Ribeiro (2015) as follows. A centrality measure C is said to be stable if

$$\begin{aligned} \vert C^G(x)-C^H(x)\vert \;\le \;K_G \cdot d(G,H) \end{aligned}$$
(1)

holds for every node \(x \in V\), where G and H are two graphs over the same node set V, \(K_G\) is a universal constant, and \(d(\cdot ,\cdot )\) is a distance function between graphs. The definition says that a node centrality measure is stable if the maximum change in the measure is bounded by a constant \(K_G\) multiplied by the distance between the two graphs. The value of \(K_G\) does not depend on whether the centrality measure is normalized. Furthermore, this value must be universal over all perturbed versions of the original graph. The similarity between the stability notion and Lipschitz continuity, applied in a discrete space, is apparent. A graph distance \(d\), mapping pairs of graphs over the same node set to \(\mathbb {R}_{\ge 0}\), is specified to make the stability inequality (1) meaningful as follows. Let

$$\begin{aligned} d(G,H)=\sum _{i,j}|A^G_{ij}-A^H_{ij}|, \end{aligned}$$
(2)

where \(A^G\) and \(A^H\) denote the (weighted) adjacency matrices of graphs G and H, respectively, and the two graphs share an identical node set V. When comparing graphs represented by adjacency matrices of different dimensions, we interpret (2) as the absolute value of the difference between the sums of the entries of the adjacency matrices. The mean stability value for a centrality measure C is calculated as

$$\begin{aligned} \frac{1}{|V|}\sum _{x \in V} |C^G(x) - C^H(x)|. \end{aligned}$$
(3)

Our numerical experiments showcase diverse methodologies that produce a graph H from a given initial graph G. It is also of empirical interest how these perturbation methodologies affect the constant \(K_G\) in formula (1).
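A minimal sketch of how the graph distance (2), the mean stability (3) and an empirical \(K_G\) for inequality (1) can be computed, assuming networkx and numpy and two graphs over a shared node list; the helper names are ours, not from Segarra and Ribeiro (2015).

```python
import numpy as np
import networkx as nx

def graph_distance(G, H, nodes):
    """Graph distance (2): entrywise L1 distance of the adjacency matrices."""
    A_G = nx.to_numpy_array(G, nodelist=nodes, weight="weight")
    A_H = nx.to_numpy_array(H, nodelist=nodes, weight="weight")
    return np.abs(A_G - A_H).sum()

def empirical_stability(C_G, C_H, d):
    """Empirical K for (1) and the mean stability (3) from centrality dicts."""
    diffs = np.abs(np.array([C_G[x] - C_H[x] for x in C_G]))
    return diffs.max() / d, diffs.mean()
```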

3.3 Theoretical values in stability concepts

In their study, Segarra and Ribeiro (2015) applied the stability notion (1) to the most commonly used and widely known centrality measures. By doing so, they proved that the connection strength-based degree and eigenvector centralities, as well as the dissimilarity-based closeness centrality, are stable, in contrast to betweenness centrality. For the latter, its unstable behavior was proven, and several undesired properties were shown during synthetic graph perturbation procedures.

In addition to introducing the stability notion (1), theoretical \(K_G\) bounds were derived for the stable node centrality measures. For degree centrality, the value \(K_G=1\) applies to any weighted graph G. This constant can be explained by the fact that the distance between the two adjacency matrices is at least as large as the maximum change in degree centrality. Moreover, for an undirected weighted graph G, the theoretical value can be reduced to 1/2 due to the symmetry of the adjacency matrices. An interesting aspect of degree centrality is that the theoretical bound \(K_G=1\) can be reached with the smallest overall change in a directed graph. For in- and out-degree, this means that only one node's connections are affected, modifying a single row or column, respectively, of the adjacency matrix. Thus, the maximal change in centrality equals the distance between the original and perturbed adjacency matrices.

In their paper, Segarra and Ribeiro worked with the decentrality version of closeness centrality (Freeman 1978), where a lower value corresponds to a more central node, and showed that for closeness centrality the theoretical bound \(K_G\) is equal to the number of nodes; hence, it is not a universal constant. It is also immediate that the ranking stability of closeness centrality and of its decentrality version are equivalent. The constant \(K_G\) for the stable eigenvector centrality is \(4/(\lambda _1 - \lambda _2)\), where \(\lambda _1\) and \(\lambda _2\) are the largest and second largest eigenvalues of the adjacency matrix of graph G, respectively.
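For the eigenvector bound, a short sketch of the spectral-gap computation, assuming numpy and a symmetric adjacency matrix A of an undirected graph:

```python
import numpy as np

def eigenvector_bound(A):
    """Theoretical K_G = 4 / (lambda_1 - lambda_2) from the spectral gap."""
    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]  # descending order
    return 4.0 / (eigvals[0] - eigvals[1])
```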

Although theoretical results for the constant \(K_G\) were given by Segarra and Ribeiro (2015), it is still interesting to analyze its actual value in real networks under natural perturbation scenarios. By selecting the stable degree, eigenvector and closeness measures, the gap between the theoretically worst possible case and the typical behavior of the studied networks could be investigated. In our numerical experiments, the \(K_G\) values were far below the theoretical bounds for the stable measures. Furthermore, we wanted to answer the question of whether a theoretically unstable measure can still provide useful insights when applied to various datasets. Our numerical experiments confirmed that, in spite of its unstable behavior, betweenness centrality can still give insight into overall network dynamics.

3.4 Similarity and correlation

Cosine similarity (Han et al. 2011) is a measure of similarity between two nonzero vectors of an inner product space, given by the cosine of the angle between them. It is mainly used in positive spaces, for information retrieval and text mining. The cosine similarity ranges from \(-1\), meaning exactly opposite, to 1, meaning exactly the same, with 0 indicating orthogonality or decorrelation; in-between values indicate intermediate similarity or dissimilarity.

Kendall's tau rank correlation coefficient (Kendall 1938) is used to measure the ordinal association between two measured quantities. The coefficient takes a high value when observations have similar ranks (i.e., relative positions within each variable: 1st, 2nd, 3rd, etc.) in the two variables.
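A minimal example of the two association measures as used in our experiments, assuming numpy and scipy; the rank vectors shown are purely illustrative:

```python
import numpy as np
from scipy.stats import kendalltau

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Illustrative node rankings on an original and a perturbed graph.
ranks_G = [1, 2, 3, 4, 5]
ranks_H = [1, 3, 2, 4, 5]
tau, p_value = kendalltau(ranks_G, ranks_H)  # high tau = similar orderings
```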

4 Datasets and methods

To analyze the stability of centrality measures, three scenarios have been considered. In this section, we discuss the motivation behind these perturbation scenarios by describing the real-world and synthetic datasets that naturally involve some of the mentioned perturbations. Moreover, we also give an algorithmic description of the methods used.

4.1 Edge weight perturbation

The first perturbation method relies on changing edge weights over fixed node and edge sets of graphs representing real-life processes.

S&P 500 and Mutual Funds Firstly, the well-known S&P 500 financial dataset (Lamoureux and Wansley 1987) was selected. By using stock data as the input for our experiments, the graph perturbation method was obtained directly from real-life processes. The daily closing prices of 330 leading US companies were collected from the Yahoo Finance portal, and the experiments were performed over the period 01/01/1995–31/12/2018. The companies were selected from the S&P 500 list as those having complete data over the considered interval. A rolling time window of 200 days was used to construct correlation matrices from the stock return time series, with starting points \(T_0=01/01/1995\) and \(T_k = T_0+k\Delta T\), where \(\Delta T=50\) and \(k=1,2,\dots ,S\). In every graph of this consecutive network list, edge weights represent the Pearson correlation coefficient between the nodes (representing assets) on the corresponding time window. Another correlation-based financial graph was constructed from the Mutual Funds dataset (Treynor and Mazuy 1966). We performed experiments on this dataset using the daily closing prices of 49 mutual funds in the period 01/10/2010–31/12/2018 and applied the same method as for the S&P 500 graph.
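The construction can be summarized by the following sketch, assuming pandas and networkx and a hypothetical DataFrame `prices` holding one daily closing-price column per asset; the parameter names are ours:

```python
import networkx as nx

def correlation_graphs(prices, window=200, step=50):
    """Rolling-window Pearson correlation graphs from daily closing prices."""
    returns = prices.pct_change().dropna()               # daily returns
    graphs = []
    for start in range(0, len(returns) - window + 1, step):
        corr = returns.iloc[start:start + window].corr() # Pearson by default
        G = nx.from_pandas_adjacency(corr)               # weighted graph
        G.remove_edges_from(nx.selfloop_edges(G))        # drop unit diagonal
        graphs.append(G)
    return graphs
```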

MovieLens As our second main dataset for edge weight perturbation, we used the MovieLens dataset collected by GroupLens Research (Harper and Konstan 2016). By selecting the most active users (\(\text {top}_N = 2500\)) and the most actively rated movies (\(\text {top}_M = 2000\)), we obtained a data frame containing the ratings in the period 15/09/1997–30/03/2015. In every iteration, we selected a block with starting point \(T_0 = 15/09/1997\) and block length \(T_k = k\Delta T\), where \(\Delta T=500\) and \(k=1,2,\dots\). From every block, we constructed a matrix M whose columns represent the \(\text {top}_N\) users and whose rows represent the \(\text {top}_M\) analyzed movies. The element \(M_{ij}\) of the matrix is the rating given by user j to movie i. Based on the matrix M, we constructed an adjacency matrix A in which the element \(A_{ij}\) is the cosine similarity between the rating vectors of users i and j. Consecutive graphs were constructed from these adjacency matrices, enabling the analysis of rating behavior over time.
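A compact sketch of this user–user similarity construction, assuming numpy and the rating matrix M described above, with zero entries for missing ratings:

```python
import numpy as np

def user_similarity_adjacency(M):
    """Pairwise cosine similarities between the user rating columns of M."""
    norms = np.linalg.norm(M, axis=0)        # one norm per user column
    norms[norms == 0] = 1.0                  # guard against all-zero columns
    A = (M.T @ M) / np.outer(norms, norms)   # cosine similarity matrix
    np.fill_diagonal(A, 0.0)                 # no self-loops
    return A
```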

4.2 Graph growth

The second studied scenario was graph perturbation caused by new nodes and edges joining the graph, i.e., processes that generate graphs growing over time. The perturbation underlying this approach relies on the fact that during these processes new nodes can connect to the initial graph by establishing one or more connections, and new connections can also occur between already existing nodes. It is an interesting question how these new connections affect the stability of centrality measures.

Cooper–Frieze graph process Synthetic graph growth-related experiments were performed with an implementation of the graph process proposed by Cooper and Frieze (2003), which relies on a general model of web graph growth. The starting point of the process is an initial graph \(G_0\) (at time \(t=0\)). In our experiments, this initial graph consisted of two nodes connected by an edge. The graph process evolves randomly by the addition of new directed edges between existing nodes or by connecting new vertices to the graph with one or more directed edges at each time step \(t=1,2,\dots ,Z\). With probability \(\alpha \in [0,1]\) a new node joins the network, and with probability \(1-\alpha\) an existing node generates edges. A new node creates i edges with probability \(p_i\), where \(p=(p_i: i\ge 1)\). For new nodes, the terminal node of a new edge is chosen uniformly at random with probability \(\beta \in [0,1]\) and according to degree (i.e., new edges are preferentially attached) with probability \(1-\beta\). If an existing node generates edges, their number governed by the distribution \(q=(q_i: i\ge 1)\), the initial node is selected uniformly with probability \(\delta\) and proportionally to its degree with probability \(1-\delta\). The parameter \(\gamma\) plays a similar role for existing nodes as \(\beta\) does for new nodes. We started the centrality stability-related measurements after the 100th iteration, in blocks of 10 iterations. Thus, at the end of each iteration block consisting of 10 time steps, a perturbed graph is produced with new nodes and/or edges compared to the graph from the previous iteration block.
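The following is a deliberately simplified sketch of one growth step, assuming networkx; for brevity it generates a single edge per event and omits the edge-count distributions p and q as well as the parameter \(\gamma\), so it illustrates only the \(\alpha\), \(\beta\) and \(\delta\) choices of the full model.

```python
import random
import networkx as nx

def growth_step(G, alpha=0.5, beta=0.5, delta=0.5):
    """One simplified growth event: a new node joins or an old node links."""
    nodes = list(G.nodes)

    def preferential(candidates):
        # Pick a node with probability proportional to its (total) degree.
        weights = [G.degree(v) + 1 for v in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]

    if random.random() < alpha:
        # A new node joins and creates one edge (distribution p omitted).
        new = G.number_of_nodes()
        head = random.choice(nodes) if random.random() < beta else preferential(nodes)
        G.add_edge(new, head)
    else:
        # An existing node generates one edge (q and gamma omitted); its tail
        # is uniform with probability delta, else degree-proportional.
        tail = random.choice(nodes) if random.random() < delta else preferential(nodes)
        G.add_edge(tail, preferential(nodes))

G = nx.DiGraph([(0, 1)])       # initial graph: two nodes joined by an edge
for t in range(1, 1001):       # time steps t = 1, 2, ..., Z
    growth_step(G)
```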

Temporal networks To perform our graph growth-related experiments not only on synthetic networks, we selected several real-life temporal networks as well. The main feature provided by a temporal network is a time stamp assigned to the appearance of each interaction between nodes. These interactions can be various user-to-user actions, like commenting on each other's posts or articles, or rating each other's behavior, which can eventually result in a trust network. The time-stamped temporal networks from the SNAP large dataset collection (Leskovec and Krevl 2014) are listed in Table 1, and the process of constructing and analyzing the graphs based on these datasets is described by Algorithm 1; a sketch of this procedure follows Table 1.

Algorithm 1
Table 1 Temporal network properties
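A minimal sketch of the block-by-block graph construction described above, assuming pandas and networkx and a hypothetical DataFrame `events` with columns src, dst and timestamp; this is our illustrative reading of the procedure, not Algorithm 1 verbatim:

```python
import networkx as nx

def temporal_snapshots(events, block_length, n_blocks):
    """Growing directed graphs from time-stamped edges, block by block."""
    t0 = events["timestamp"].min()
    snapshots = []
    for k in range(1, n_blocks + 1):
        block = events[events["timestamp"] < t0 + k * block_length]
        G = nx.from_pandas_edgelist(block, "src", "dst",
                                    create_using=nx.DiGraph)
        snapshots.append(G)
    return snapshots  # consecutive snapshots are then compared pairwise
```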

4.3 Sampling

The third analyzed perturbation method relies on the assumption that the whole network is not known; only a smaller part of it is available at a time. This situation often arises for large networks due to, for example, storage or bandwidth-related issues. The main question is how well the properties (here, centralities) of the original network can be approximated using only the given smaller parts.

Algorithm 2

The sampling process is described in Algorithm 2. The studied sample sizes range from 20 to 90% of the nodes, and the sample size increases by 10% in every iteration of the outer cycle. Taking only one random sample for each sample size would clearly not hold enough information to depict general behavior; thus, we performed 25 uniformly random selections of the nodes to obtain statistically interpretable results, as in the study by Costenbader and Valente (2003). For each individual sample, we calculated the stability- and ranking-related attributes and finally assigned their average to the currently analyzed sample size. For the two previously discussed perturbation methods, we performed our experiments on consecutive graphs, whereas during graph sampling we compared each sample to the initial graph. For the sampling-related experiments, a synthetic preferential attachment-based graph model (Albert and Barabási 2002) and a Facebook graph (Leskovec and Mcauley 2012) were used.
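A minimal sketch of this procedure, assuming networkx; `evaluate` stands for any hypothetical stability- or ranking-related statistic computed between the original graph and a sample:

```python
import random
import networkx as nx

def sampling_experiment(G, evaluate, repetitions=25):
    """Average a statistic over repeated uniform node samples per size."""
    nodes = list(G.nodes)
    results = {}
    for fraction in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2):
        k = int(fraction * len(nodes))
        scores = [evaluate(G, G.subgraph(random.sample(nodes, k)))
                  for _ in range(repetitions)]
        results[fraction] = sum(scores) / repetitions
    return results
```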

5 Experimental results

We developed a versatile simulation environment to perform our experiments and to handle various network data structures. The output of a simulation can be diverse plots, data tables and statistics, depending on the user-defined parameters. We performed a wide range of experiments to numerically analyze the stability of node centrality measures using the three network perturbation methods described in Sect. 4.

5.1 Stability against edge weight perturbation

S&P 500 and mutual funds Firstly, we discuss our results concerning the S&P 500 dataset. In the case of the correlation-based financial graph, we analyzed the behavior of the similarity-based measures (degree and eigenvector centralities) as well as that of the dissimilarity- or distance-based measures (closeness and betweenness centralities). To calculate the distance-based measures, the correlation coefficients were transformed to distances according to Mantegna (1999) and Valle et al. (2018). The behavior of the \(K_G\) constant over time is presented in Fig. 1a on a logarithmic scale for the analyzed measures. Moreover, separate figures for degree, closeness and betweenness are shown in Fig. 2a–c for deeper insight. As eigenvector centrality shows similar behavior to degree, our findings for degree apply to it as well. It can also be seen that degree and closeness follow a similar tendency; thus, in Fig. 2d we only list the maximal degree and betweenness values along with the matrix distances over time. In Fig. 2a and b, the reported \(K_G\) values show fewer and less radical changes than the betweenness-related values reported in Fig. 2c. This behavior can be associated with the unstable nature of betweenness. Nevertheless, one interesting observation applies to all the measures: the stability values show greater changes in periods of crisis. For the degree and closeness values, increases can be observed around 2004, 2008–2009, 2010–2011 and 2013. The betweenness values, on the one hand, show a similar increase around these periods; on the other hand, they also indicate quite different financial events that took place in 2000, 2012, 2014, 2015, 2016 and 2017. Later, in Sect. 6, we give a detailed explanation of all the dates marked on the figures and listed here. It is noteworthy that these actual \(K_G\) values are far lower than the theoretical bounds for the three stable centrality measures.
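For reference, the correlation-to-distance transform of Mantegna (1999) used above maps a Pearson correlation \(\rho \in [-1, 1]\) to a distance in \([0, 2]\):

```python
import numpy as np

def correlation_to_distance(rho):
    """Mantegna (1999): d = sqrt(2 * (1 - rho))."""
    return np.sqrt(2.0 * (1.0 - rho))
```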

Fig. 1 Stability values during the edge weight perturbation (S&P 500 graph)

Fig. 2 Empirical K values for centrality measures and the used components during the edge weight perturbation (S&P 500 graph)

Fig. 3 Correlation values during the edge weight perturbation (S&P 500 graph)

Fig. 4 Jaccard similarity values during the edge weight perturbation (S&P 500 and MovieLens graphs)

The other interesting approach is the ordering or ranking by the metrics introduced in Sect. 3.4, which made it possible to measure the ordinal association between the analyzed node ranking vectors during the graph perturbation procedure. The values of the Kendall's tau correlation coefficient are reported in Fig. 3b. In the case of the degree, eigenvector and closeness centralities, the results lie in the range 0.35–0.7, whereas the betweenness centrality values show less stable results in the range 0.1–0.6, and seemingly random higher values can also be noticed. These higher values occur in times of financial crisis, such as 2003–2004, 2008–2010 and around 2013.

Fig. 5 Empirical K values during the graph growth procedure (directed Cooper–Frieze graph)

Figure 4a gives an insight into how many of the top 30 most central stocks (measured by the different centrality measures) keep their importance over time. The reported Jaccard similarity values exhibit much more stable behavior for degree, eigenvector and closeness centrality compared to the radically changing betweenness centrality. These results support the values reported in Fig. 3b. It is also an interesting finding that, in the time period around 2009, we can notice high Jaccard similarity values for the three stable measures and for betweenness as well. However, in 2003–2004 the betweenness-related values increase, while the other three show a decreasing tendency. These findings highlight the differences between betweenness centrality and the other three measures. The Pearson correlation coefficient was calculated on the centrality values themselves and is shown in Fig. 3a. Similar results were produced by analyzing the same centrality measures on the Mutual Funds financial network. Delpini et al. (2019) investigated the bipartite network of US mutual fund portfolios and their assets during the 2007–2008 Global Financial Crisis in order to better understand the relationship between the similarity of individual investment strategies and systemic riskiness. Their results show that the similarity of portfolios (which can be considered a proxy of vulnerability) decreased during the crisis. The peak of the systemic damage caused by the financial crisis can be spotted approximately halfway between the years 2007 and 2008. Our results from 06/07/2007 also show higher \(K_G\) values for degree and eigenvector centralities, whereas closeness and betweenness centralities produce lower values compared to the results from the previous and subsequent time intervals.

Fig. 6 Stability, correlation and similarity values during the graph growth procedure (directed Cooper–Frieze graph)

MovieLens In the case of the MovieLens dataset, the numerical stability values show higher fluctuation, with values about one order of magnitude higher than for the S&P 500 network. Interestingly, the betweenness centrality values lie in a higher range than the other centrality measures, which tend to have quite similar values over time. The Jaccard similarity applied to the top 30 nodes by the different measures (Fig. 4b) shows a strongly decreasing tendency. We can also notice a faster decrease in the betweenness centrality-related results compared to the other measures. These results suggest that the fluctuation of betweenness centrality is higher in the top-ranking segment of the nodes, whereas its overall ordinal association is stronger than in the case of the other examined centrality measures.

5.2 Stability against graph growth

Directed Cooper–Frieze graph We performed experiments on both the directed and the undirected versions of the Cooper–Frieze graph process; however, we discuss our results for the directed version and only mention some of the differences between the two. In the directed case, we used the directed versions of betweenness and eigenvector centrality along with in-degree, out-degree, in-closeness, out-closeness and PageRank. The empirical \(K_G\) stability values of the measures are reported in Fig. 5d. The betweenness centrality stability values during the graph growth process stay in the range 1.65–461.93 (1.57–37.08 in the undirected case), with mean values in the range 0.28–5.6 (0.2–0.47). This suggests that the mean stability of betweenness centrality produces more stable values than when the measure is applied to an undirected graph. In the undirected version of the process, the stability values for eigenvector centrality show only slight changes, whereas seemingly radical changes over time can be noticed for the directed network. The empirical values range from \(1.6\times 10^{-4}\) to \(3\times 10^{-3}\) in the undirected case, while in the directed case several low values are shown in Fig. 5d.

Fig. 7 Empirical K values for centrality measures during the graph growth procedure (directed Cooper–Frieze graph)

Fig. 8 Empirical K values during the graph growth procedure (Bitcoin OTC network)

Fig. 9 Empirical K values during the graph growth procedure (Bitcoin OTC network)

As for the Pearson correlation coefficient and the Kendall's tau rank correlation coefficient, a slightly more intense convergence can be noticed in Fig. 6b and c compared to the one produced by the undirected Cooper–Frieze network. On the other hand, in the case of the directed graph, the directed betweenness centrality measure has more stable values in the range 0.6–1, although the out-closeness centrality produces some low values in the range 0.47–0.6. Comparing the directed and undirected versions of the Cooper–Frieze graph process, we can state that directed betweenness centrality performs better than the undirected measure. When analyzing the centrality measures separately, a decreasing behavior can be noticed in Figs. 5c and 7c, d, whereas directed betweenness centrality, presented in Fig. 5a, shows a slightly increasing tendency.

Temporal networks The convergence can also be identified in the case of temporal networks. Note that all four of the examined temporal networks are directed. For the Bitcoin OTC network, the results are shown in Figs. 8, 9, 10 and 11. In the case of temporal networks, the reported results for betweenness centrality stability are in the range 45.48–1246.27 for the StackOverflow, 159.39–1108.11 for the Reddit, 28.05–523.7 for the Bitcoin and 19.33–276.44 for the WikiTalk network. The range of the stability values for the applied measures is roughly the same for all of the examined temporal networks. Changes in the order only occur for the in-closeness and out-closeness measures. For the Reddit and StackOverflow networks, the in-closeness values are higher than the out-closeness stability values, whereas in the case of the Bitcoin OTC and WikiTalk networks the out-closeness values tend to be higher than the in-closeness ones. In the case of the directed Cooper–Frieze graph, the in-closeness values were higher. The range of the correlation coefficient measures on the Bitcoin dataset is 0.47–0.9, and interestingly, betweenness centrality tends to have higher values than the other centrality measures. In general, as for the Pearson correlation, the in-closeness and out-closeness centrality measures produce values in the lower ranges compared to the other measured properties. A general convergence can also be spotted when analyzing the similarity of the top 30 nodes. Interestingly, in the case of the Reddit, StackOverflow and WikiTalk networks, the lowest values are delivered by PageRank, and the measures altogether show a monotonically increasing tendency. In the case of the WikiTalk network, however, in contrast to the otherwise similar increasing behavior, the PageRank values decrease over time. A slightly different result is obtained by examining the Bitcoin OTC network, where the in-closeness and out-closeness measures have the lowest values, in the range 0.475–0.75.

Fig. 10 Empirical K and correlation values during the graph growth procedure (Bitcoin OTC network)

Fig. 11 Empirical correlation and similarity values during the graph growth procedure (Bitcoin OTC network)

5.3 Stability against sampling

The results of the sampling procedure are shown in Figs. 12, 13 and 14. The numerically calculated \(K_G\) values for each dataset support the unstable behavior of betweenness centrality, as shown in Figs. 12a and 13a. Major differences can be seen between the behavior of the centrality measures on the different datasets when the simulation reaches the smaller sample sizes.

Fig. 12 Empirical K and correlation values during the graph sampling procedure (Barabási–Albert graph)

Fig. 13 Empirical K and correlation values during the graph sampling procedure (Facebook graph)

Fig. 14 Jaccard similarity during the graph sampling procedure (Barabási–Albert and Facebook graphs)

As for the preferential attachment graph model, both Kendall's tau for the node rankings and the Pearson correlation coefficient for the centrality values themselves show an increasing tendency as the sample sizes become smaller. This can be explained by the structure of the preferential attachment graph model. This interesting aspect is visualized in Fig. 14a and b, showing the Jaccard similarity of the top 30 nodes. In the case of the Barabási–Albert preferential attachment graph, the Jaccard similarity tends to become higher as sample sizes become smaller. On the other hand, for the real-life Facebook dataset, the Jaccard similarity tends to decrease as sample sizes become smaller.

These features do not, however, affect the numerically calculated \(K_G\) stability values for the two different networks. Only slight changes in the behavior of closeness centrality can be noted in Figs. 12a and 13a. The stability results for betweenness centrality fall into the ranges 22.79–115.09 and 622.007–1028.36 for the Facebook graph and the preferential attachment graph, respectively. Differences in the calculated Pearson correlation coefficient can be noticed in Figs. 12d and 13d. The best performing measure, based on the Pearson correlation, is degree centrality for both graphs, followed by eigenvector centrality for the Facebook graph and by closeness in the case of the Barabási–Albert graph.

6 Discussion

The stability of centrality measures against three perturbation categories (edge weight perturbation, graph growth and sampling) was examined by implementing a versatile simulator to calculate the \(K_G\) constant values proposed by Segarra and Ribeiro (2015). The most commonly used centrality measures (degree, closeness, eigenvector, betweenness), along with their directed versions, as well as PageRank, were used in our numerical experiments. Various real-world datasets were selected, a web-graph-like Cooper–Frieze graph process was implemented, and the Barabási–Albert preferential attachment model was also used. Node rankings based on centrality measures were analyzed as well, besides the stability-related experiments. Our simulations yielded some remarkable findings. Under all three perturbation methods, betweenness centrality showed generally more unstable behavior than the other measures, but the experiments on real-world datasets demonstrated the measure's usefulness despite this characteristic. In the following, we discuss some deeper insights and conclusions concerning the stability properties found in the examined perturbation scenarios.

Numerical experiments on the S&P 500 correlation-based financial graph showed that the stability of centrality measures can indicate changes in real-life processes, like the effects of a crisis on the stock market. Important dates identified during our analysis are marked in Fig. 2a–c and listed in Sect. 5.1. For degree and closeness, increases around 2004, 2008–2009, 2010–2011 and 2013 can be observed. The increase in 2004 can be explained by the beginning of the Iraq war, the one between 2008 and 2009 may reflect the Global Financial Crisis and the Lehman Brothers failure, the rise between 2010 and 2011 can be associated with the Sovereign debt crisis, whereas the one in 2013 might represent the US government shutdown. The betweenness values show similar increases around these periods too; however, they also indicate other financial events taking place in 2000, 2012, 2014, 2015, 2016 and 2017. The additional events that can be identified from the extreme betweenness centrality values are the Dotcom stock crash (2000), the US housing bubble (2012), the Ebola outbreak (2014), oil skidding and generally lower earnings per share (2015), the US Fed rate increase by 0.25 (2016) and Brexit (2017). In the following, we give a detailed discussion of our insights and assumptions regarding these extreme changes. The higher \(K_G\) values in times of distress can be important on their own. What is more interesting is that the component responsible for the higher values can also indicate the type of the crisis.

Numerical \(K_G\) values were calculated by dividing the maximum change in the centrality vectors by the distance between the two consecutive networks; see formula (1). Thus, higher \(K_G\) values can occur if the distance (denominator) decreases or the maximum change (numerator) increases. More importantly, the produced values can be categorized by the component responsible for the increase, as sketched after the list below.

  • Our intuition for the numerator's increase in (1) is that it can be associated with events that affect only some sectors, not the whole market. For example, the beginning of the Iraq war in 2003 could plausibly result in higher centrality measures in the oil sector. However, this change might not affect, for example, the tech giants or other independent sectors. Such sectorial changes only appear as smaller or local changes in the graph and hence do not produce extreme changes in the distance between the two consecutive graphs.

  • When the distance, i.e., the denominator of (1), is the dominant component producing the higher \(K_G\) values, we can state that the whole market is affected at that time. It is a well-known stylized fact in finance that asset correlations increase in times of financial distress, and if this situation persists for a longer period, it results in lower distance values in the stability calculations. So a financial crisis affecting the whole market, like the one taking place in 2008, can lead to smaller distances between the consecutive graphs and thus to higher \(K_G\) values in our numerical experiments.
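A small sketch of how such a categorization could be automated; the comparison logic is our own illustration, not a procedure from the paper:

```python
def classify_K_increase(max_change, distance, prev_max_change, prev_distance):
    """Attribute a jump in the empirical K value to one of its components."""
    if max_change > prev_max_change and distance >= prev_distance:
        return "numerator-driven: local or sectorial event"
    if distance < prev_distance and max_change <= prev_max_change:
        return "denominator-driven: market-wide distress"
    return "mixed"
```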

As for the other datasets of this perturbation process, the Mutual Funds financial graph also shows some higher values in times of crisis, occasionally at important events to which the S&P 500 graph remains insensitive. Regarding the rating-based MovieLens dataset, an interesting phenomenon is that degree centrality almost reaches its theoretical bound of 1.

In the case of graph growth, simulations were performed on a synthetic graph process and on several real-world temporal networks. Concerning the Pearson correlations and the Kendall's tau rank correlations, a general convergence can be noted in both the synthetic and real-world scenarios. The stability values with respect to betweenness centrality show an increasing tendency, whereas a decreasing behavior of closeness and PageRank can be noticed across all the studied networks. Interestingly, degree and eigenvector centralities have stable values within given ranges, although in the case of the real-world networks their behavior indicates a decreasing tendency. When analyzing the graph distances for the Bitcoin OTC network, an interesting phenomenon emerges due to two remarkably large values, calculated in August and December of 2013. For deeper insight, we analyzed the ratings among users (values between \(-10\) and 10) batched in blocks of one month. The average rating was 1.01 over the analyzed time, and the monthly average ratings were in the range 0.9–1.7. In the aforementioned months, the average rating was \(-2.35\) and \(-1.07\), respectively, which made them the only months with a negative rating average. The number of negative or punitive ratings in the analyzed time period was 3,563, of which 2,413 were equal to \(-10\); 677 of these were received in 08/2013 and 187 in 12/2013. These punitive activities were quite frequent between the small and big bubbles of 2013, as stated by Ilaria et al. (2018). Around these two months, radical changes can be noticed in the behavior of both in-degree and out-degree, followed shortly by extreme changes in the centrality value vectors, resulting in higher \(K_G\) values in that particular time period.

Lastly, the stability performance of the centrality measures was studied under sampling by analyzing the behavior of the measures on the synthetic Barabási–Albert graph process and the real-life Facebook friendship graph. The correlation between the centrality values, and also the ordinal association of the ranking vectors, shows a decreasing tendency in general. In the case of the preferential attachment graph, the correlation values become higher when sample sizes become smaller, compared to the values on larger samples. This can be explained by the interesting fact that the preferential attachment-based graph retains some features of the initial graph when sample sizes are small. Note that this tendency cannot be observed in the analysis of the Facebook real-world dataset.

After discussing our perturbation method-specific findings, we formulate a general statement concerning the behavior of the measures that applies to all three above-mentioned scenarios. For all the methods, the empirical \(K_G\) values computed by our simulations are always smaller than the theoretical values proven by Segarra and Ribeiro (2015). More importantly, for the real-world datasets, the changes in the \(K_G\) values consistently signal the presence of important phenomena and behaviors. Obviously, this information cannot be deduced from the theoretical values alone. By analogy with the big O notation, the theoretical stability bounds limit the behavior of a measure, but in doing so they also have a general blurring effect, whereas in our empirical analysis we aimed at revealing even the smallest changes contributing to the overall network dynamics, and found in several cases a correlation between the changes in the stability values and real-world events. Although betweenness centrality shows unstable behavior on the real-world datasets too, changes in this measure can indicate completely different dynamics in the graph compared to the stable measures, which often behave similarly to one another. Thus, by revealing deeper insights, we claim that betweenness centrality can indeed expose useful aspects of the analyzed data, despite its theoretically proven unstable behavior.

Our study can influence further theoretical and application-related research on the stability of centralities. Developing novel concepts of stability, or proving theoretical bounds for additional measures under the concept used in this paper, can be an interesting direction. As for the application fields, the use of these stability concepts can be fundamental in practical data mining-related tasks. Before using one particular measure and jumping to a conclusion for time series or similar datasets, one could analyze the performance of the selected measure. By applying it to smaller examples and running several perturbation scenarios, the centrality measure suited to the exact problem could easily be selected by its stability values. Thus, helpful insights into stability could make crucial tasks, like time and resource allocation, much easier and more efficient.

7 Conclusion

The stability of centrality measures against various perturbation methods was analyzed based on the concept proposed by Segarra and Ribeiro (2015). A general statement was formulated with respect to changes in the numerical stability values signaling the presence of important phenomena. Overall network dynamics could be better explained by analyzing even the smallest changes in the stability values, which often showed a correlation with real-life events. The S&P 500 financial dataset, which naturally includes edge weight perturbation, demonstrated that changes in the stability values can readily indicate times of financial distress. The potential usefulness of betweenness centrality, despite its theoretically proven unstable behavior, was revealed. Concerning the graph growth process analyzed second, effects of real-life processes were likewise found in the analysis of the Bitcoin OTC dataset. The sampling-related experiments produced interesting results when reaching smaller sample sizes. Besides providing and discussing our numerical results in detail, we also outlined some possible future work and application-oriented aspects, highlighting their usefulness for data mining-related tasks and for studying the performance of different centrality indices.