Path-Based and Whole-Network Measures
Keywords
Undirected Graph, Social Network Analysis, Cluster Coefficient, Betweenness Centrality, Geodesic Distance
Glossary
 Betweenness centrality

A measure of the proportion of shortest paths in a network passing through a specific node or edge.
 Closeness centrality

A measure of how close a node is to all the other nodes of a network.
 Clustering coefficient

A measure of how much nodes tend to form groups in a network.
 Diameter

The maximum distance between two nodes.
 Direct connection

An edge between two nodes, usually indicating the existence of a specific relationship, e.g., a friendship between two individuals.
 Dyad

A group of two people.
 Geodesic distance (or distance)

Length of one of the shortest paths between two nodes.
 Indirect connection

A path between two nodes that are not directly connected through an edge.
 Node

An entity in a network, usually representing an individual.
 Path

A sequence of edges sharing common endpoints, e.g., an edge between n _{i} and n _{j} followed by an edge between n _{j} and n _{k}.
 Triangle

Three nodes with an edge between every pair of them.
Definition
Path-based measures associate a value to every node in a network according to its direct and indirect connections to other nodes. For example, given a node we can compute the maximum distance to all other nodes: this measure is called node eccentricity. Whole-network measures associate a value to an entire network, providing a summary of its structure. For example, the diameter of a network is the maximum eccentricity of its nodes and represents a global measure of the efficiency of information dissemination in that network. In this essay we cover the most popular path-based and whole-network measures.
Introduction
Graphs are a widely used abstract representation of the structure of social networks, where nodes represent individuals and edges indicate relationships between them, e.g., communication acts or friendship ties. While a graph representation hides many details of the original social network – the content of communication relationships, personal data about the individuals, and so on – the structure of the graph may highlight many relevant features. How fast does the information produced by a node reach other nodes? How important is a specific node in facilitating or slowing down information diffusion? Which one of two given networks is more active than the other? To answer these and similar questions, we need quantitative measures describing the network structure and efficient algorithms capable of computing these measures. In this essay we describe the main graph measures related to paths between nodes (eccentricity, closeness, betweenness, clustering coefficient) and those used to summarize whole networks (diameter, network closeness, network clustering coefficient, and density).
From this simple example, we can see how a graph induces a notion of distance between nodes: Rodrigo and Lucia are closer to each other than Renzo and Lucia. Distance is considered a good indicator of how easy or fast it is to send information from one node to another and is used to define several well-known single-node measures.
While the distance between pairs of nodes and the role of nodes in between are useful to characterize information propagation, triads (sets of three nodes) are important to study the tendency of members of a network to form groups. In real social networks, when a node is connected to two other nodes, we observe that those two nodes are often connected to each other as well. In our example, we may expect that Renzo and Lucia also exchange their email addresses after some communication mediated by Rodrigo, therefore allowing direct communication and forming a triangle (Fig. 1b). A specific measure called clustering coefficient indicates how frequently pairs of neighbors are neighbors themselves.
Single-node metrics can often be used to compute whole-network measures by taking into consideration the contribution of all nodes. Network closeness and network clustering coefficient can be defined as averages of the corresponding single-node measures. The diameter of a network corresponds to the maximum eccentricity and thus represents the maximum distance between nodes. Finally, the network density indicates how many of all possible connections actually exist.
Key Points

Graphs are a widely used abstraction of the structure of social networks; for this reason, properties of a social network are inferred from those of its graph representation.

Path-based and single-node metrics provide a quantitative measure of attributes or properties of individual nodes in a network, such as the level of “importance” of a node, whether a node is “well connected” to other nodes, and so on.

Whole-network measures summarize attributes of the whole graph.

In some cases, whole-network measures can be obtained by combining the values of single-node metrics over all nodes.

Analysis of large graphs can be computationally challenging. New algorithms for social network analysis need to be developed to take advantage of modern high-performance computing architectures.
Historical Background
The foundations of graph theory date back to year 1735 (Alexanderson 2006), when the Swiss mathematician Leonhard Euler solved a problem known as “the seven bridges of Königsberg.” The Prussian city of Königsberg (now Kaliningrad) was located on a bifurcation of the Pregel river that included an island. Seven bridges were placed across the banks formed by the river and the island. The problem was stated as follows: does there exist a path that allows one to cross all bridges exactly once? Euler proved that the problem has no solution.
The application of graphs to the study of social interactions is generally attributed to American psychiatrist of Romanian origins Jacob Moreno (Newman 2010). In his 1934 book Who shall survive (Moreno 1934), Moreno shows and discusses diagrams of human interactions that he calls sociograms as directed or undirected graphs. Since then, social network analysis (SNA) has been applied to the most diverse areas such as friendship patterns in communities; romantic and sexual relationships; collaboration of scientists, actors, or musicians; networks of terrorists; and food chains in ecological systems. We will briefly discuss some of them in section “Key Applications”; the interested reader is referred to Newman (2010) for a more complete list of references.
Measures
Preliminaries: Networks
In this essay, we will mostly consider the representation of a social network as an undirected, unweighted graph.
Definition 1 (Social Network)
A social network is an undirected graph G = (V, E) where V is a set of nodes (individuals) and E ⊆ V × V is a symmetric relation indicating social connections.
In the following, we denote with n the number of nodes and with m the number of edges of G. Given a graph G, a path p = 〈v _{0} ,..., v _{ k }〉 of length k from node u to node v is a sequence of nodes v _{0} ,..., v _{ k } such that v _{0} = u, v _{ k } = v, and every consecutive pair of nodes is connected by an edge: (v _{ i } , v _{ i+1}) ∈ E, for all i = 0,..., k − 1. Two nodes u and v are connected if there exists at least one path between them. Note that, since edges in undirected graphs can be traversed in both directions, a path from u to v can always be reversed to obtain a path from v to u.
Representing a social network as an undirected, unweighted graph is quite common; however, it is important to keep in mind that such representation is an abstraction based on a set of assumptions, which may or may not hold.
The first assumption is to consider undirected graphs, where edges can be traversed in both directions. Undirected graphs are an appropriate representation of social networks where relationships between individuals are symmetric. For example, “friendship” in Facebook is symmetric. However, the following/follower relation on the Twitter network is not symmetric; therefore, a better representation would be based on directed graphs: if Lucia is following Renzo on Twitter, Renzo is not necessarily following Lucia. Many concepts defined for undirected graphs still hold for the directed case, such as the definition of geodesic path used as a basis for most of the measures presented in this essay. However, in directed graphs there can be a path from node u to node v without any path from v to u. In this case the distance from v to u is infinite, which may require adjustments in those measures that are based on maximizing or averaging distances. Extensions for directed graphs are presented in White and Borgatti (1994) and can be found in SNA textbooks (Wasserman and Faust 1994; Newman 2010).
The second assumption is to consider unweighted graphs, where edges do not carry any “cost” or “weight” associated with them. Adding weights to edges could be useful to convey additional details on the underlying social network. For example, the weight of an edge may represent the strength of the social relationship it represents. This kind of weight is available in online social networks such as Google+ and can be computed for other networks by looking at the communication acts, e.g., by counting the number of messages exchanged between two nodes. Extensions of path-based measures to weighted graphs are discussed in Peay (1980) and Opsahl et al. (2010). It should be observed that these extensions are not yet widely adopted. One reason is that giving accurate estimates of edge weights requires a deep understanding of social relationships; this kind of information can be extremely difficult to infer and is often domain-specific, i.e., it depends on the type of social network. On the other hand, simple structural properties (who is connected to whom) are much easier to identify.
The third assumption is that the graph is fully connected, i.e., each pair of nodes is connected by at least one path. In fact, social networks tend to have a large connected component, which includes most of the nodes that are typically those of interest. Therefore, analysis of disconnected networks is usually carried out by first splitting the graph into its connected components and then analyzing each component separately.
We now introduce the definition of single-node measures, including for each one a brief description and, where appropriate, some usage hints or limitations. We then illustrate the corresponding whole-network measures, which are typically derived by aggregating the single-node values. It is important to observe that some network measures have been defined by different authors in different, incompatible ways, e.g., with and without normalization factors. Therefore, different network analysis tools may provide different results when computing the same measure, since they may be using different definitions.
Single-Node and Path-Based Metrics
In this section we describe the most commonly used path-based and single-node measures in SNA.
Shortest Paths and Geodesic Distance
Table 1  Geodesic distance d(u, v) between all pairs of nodes in the working example
        n _{0}  n _{1}  n _{2}  n _{3}  n _{4}  n _{5}  n _{6}  n _{7}  n _{8}
n _{0}  0       1       2       1       1       1       3       3       4
n _{1}  1       0       1       2       2       2       2       2       3
n _{2}  2       1       0       3       3       3       1       1       2
n _{3}  1       2       3       0       2       2       4       4       5
n _{4}  1       2       3       2       0       1       4       4       5
n _{5}  1       2       3       2       1       0       4       4       5
n _{6}  3       2       1       4       4       4       0       2       3
n _{7}  3       2       1       4       4       4       2       0       1
n _{8}  4       3       2       5       5       5       3       1       0
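The distance table above can be reproduced with one breadth-first visit per node. The following is a minimal Python sketch rather than a reference implementation. Since the figure of the working example is not reproduced here, the edge list `EDGES` is an assumption recovered from the table entries equal to 1, and the helper names are hypothetical.

```python
from collections import deque

# Assumed edge list of the working example (node i stands for n_i),
# read off from the pairs at distance 1 in the table above.
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9


def build_adjacency(edges, n):
    """Return adjacency lists for an undirected graph with nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic distance from source to every node (None if unreachable)."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = build_adjacency(EDGES, N)
distance_matrix = [bfs_distances(adj, u) for u in range(N)]
for row in distance_matrix:
    print(row)
```

Each printed row corresponds to one row of the table; because the graph is undirected, the resulting matrix is symmetric.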
Eccentricity
Table 2  Single-node measures for the working example (values approximated to the second decimal)
Node    Eccentricity  Closeness  Betweenness  Clustering coefficient
n _{0}  4             0.50       17.0         1/6
n _{1}  3             0.53       16.0         0.0
n _{2}  3             0.50       17.0         0.0
n _{3}  5             0.35       0.0          0.0
n _{4}  5             0.36       0.0          1.0
n _{5}  5             0.36       0.0          1.0
n _{6}  4             0.35       0.0          0.0
n _{7}  4             0.38       7.0          0.0
n _{8}  5             0.29       0.0          0.0
Closeness
The closeness of a node u is the inverse of its average distance to all other nodes:
\( C(u)=\frac{n-1}{\sum_{v\in V, v\ne u} d(u,v)} \) (2)
where n is the number of nodes of the graph G. Sometimes n is used as the normalization factor instead of n − 1. Moreover, 1/C(u) is used in SNA tools such as Gephi as a measure of closeness, although it would be more appropriate to consider it as a measure of farness since it increases when the average distance increases. However, all these alternative definitions return the same ranking of nodes (in inverse order, if the inverse of Eq. (2) is used), so they can typically be used interchangeably.
Note that the definition of closeness relies on the assumption that the graph is connected, so that there are n − 1 other nodes connected with u. If we consider Fig. 2, the closeness of node n _{0} can be computed by averaging the distances of all other nodes from n _{0}, as reported in Table 1, and inverting the result. Therefore, we get C(n _{0}) = 8/(1 + 2 + 1 + 1 + 1 + 3 + 3 + 4) = 0.5. The third column of Table 2 shows the closeness of each node of our sample graph.
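The computation just described (average the distances from a node, then invert) can be sketched in Python together with eccentricity, which is the maximum of the same distances. This is an illustrative sketch under the connectivity assumption stated above; the edge list `EDGES` is an assumption recovered from the distance-1 entries of Table 1, and the function names are hypothetical.

```python
from collections import deque

# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

adj = [[] for _ in range(N)]
for a, b in EDGES:
    adj[a].append(b)
    adj[b].append(a)


def bfs_distances(source):
    """Distances from source in a connected, unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity(u):
    """Maximum distance from u to any other node."""
    return max(bfs_distances(u).values())


def closeness(u):
    """(n - 1) divided by the sum of distances from u; d(u, u) = 0 adds nothing."""
    return (N - 1) / sum(bfs_distances(u).values())


eccentricities = [eccentricity(u) for u in range(N)]
closeness_values = [round(closeness(u), 2) for u in range(N)]
print(eccentricities)
print(closeness_values)
```

For n _{0} the sum of distances is 16, so the sketch returns 8/16 = 0.5, in agreement with the worked computation above.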
When closeness is used to compare different nodes or networks, it is important to consider that this measure usually spans a small range of values. In real social networks, the distance between nodes tends to be small, while the number of nodes can be very large. In addition, the definition of closeness according to Eq. (2) fails to take into consideration the fact that distant edges are less likely to spread information, since in Eq. (2) all edges in a shortest path contribute equally. For example, n _{0} may rely more on its neighbors n _{1}, n _{3}, n _{4}, and n _{5} to forward its messages than on n _{7}, which is not directly connected to it. As a consequence, alternative definitions of closeness may be considered, assigning different weights to specific edges depending on their distance from the node under examination.
Betweenness
Betweenness gives a measure of the load placed on a given node. Intuitively, if a node u has a large value of betweenness, then it tends to appear on many shortest paths. Since shortest paths are the most efficient way to route information, a node with high betweenness is more likely to play an important role in the dissemination of information (White and Borgatti 1994).
Considering our example, peripheral nodes like n _{8} do not belong to any geodesic path; therefore, their betweenness is 0. On the other hand, let us consider node n _{2}. If one of the nodes {n _{8}, n _{7}} wants to send a message to any of {n _{0}, n _{1}, n _{3}, n _{4}, n _{5}, n _{6}} through a geodesic path, then the message must pass through n _{2}. Looking at Fig. 2 it clearly appears how n _{2} plays an important role in allowing information to pass from one side of the network to the other side. The fourth column of Table 2 shows the betweenness of all nodes in the graph.
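The values discussed above can be checked directly from the definition: for every pair of nodes v, w, add to node u the fraction of shortest paths from v to w that pass through u. The Python sketch below is a deliberately simple cubic-time illustration, not an efficient algorithm. It assumes a connected graph, counts each unordered pair once (the convention that matches the values of Table 2), and uses an edge list `EDGES` recovered from the distance-1 entries of Table 1; all names are hypothetical.

```python
from collections import deque

# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

adj = [[] for _ in range(N)]
for a, b in EDGES:
    adj[a].append(b)
    adj[b].append(a)


def bfs_counts(source):
    """Return (dist, sigma): distances and number of shortest paths from source."""
    dist = [-1] * N
    sigma = [0] * N
    dist[source] = 0
    sigma[source] = 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:              # first time v is reached
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness_by_definition():
    """Sum sigma_vw(u) / sigma_vw over unordered pairs v < w, with u != v, w."""
    results = [bfs_counts(s) for s in range(N)]
    dist = [r[0] for r in results]
    sigma = [r[1] for r in results]
    scores = [0.0] * N
    for v in range(N):
        for w in range(v + 1, N):
            for u in range(N):
                if u == v or u == w:
                    continue
                # u lies on a shortest v-w path iff the distances add up.
                if dist[v][u] + dist[u][w] == dist[v][w]:
                    scores[u] += sigma[v][u] * sigma[u][w] / sigma[v][w]
    return scores


betweenness = betweenness_by_definition()
print(betweenness)
```

In the working example every pair of nodes is joined by a single shortest path, so each term of the sum is either 0 or 1 and the resulting values are integers.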
It is important to observe that the definition of betweenness centrality assumes that information always flows through geodesic paths. While this may not always be the case in real social graphs, we may consider shortest paths as the most likely information channels and thus use this definition of betweenness as an estimate of the real number of messages passing through a node. However, more complex versions of betweenness have been proposed, taking nonshortest paths into consideration (Newman 2005).
Clustering Coefficient
The clustering coefficient (Watts and Strogatz 1998) measures the tendency of the neighbors of a node to be connected to each other forming a fully connected subgraph (clique). The relevance of this measure comes from the fact that real social networks present a higher clustering coefficient than corresponding random networks, indicating a tendency to create triangles. Therefore, this measure discriminates between random and social networks and highlights nodes whose neighbors are well connected to each other.
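Let n(u) = |N(u)| be the number of neighbors of node u. We observe that the maximum number of edges that an undirected graph with n(u) nodes can have is n(u) × (n(u) − 1)/2; this number corresponds to the number of edges of the complete graph with n(u) nodes (in a complete graph, there is an edge connecting each pair of nodes). The clustering coefficient of u is the fraction of this maximum that is actually present among the neighbors of u: \( CC(u)=\frac{2\times |E_N(u)|}{n(u)\times (n(u)-1)} \), where E _{N}(u) is the set of edges in the neighborhood of u, i.e., those connecting two neighbors of u.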
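The ratio between the edges actually present among the neighbors of u and the maximum n(u) × (n(u) − 1)/2 can be sketched as follows. This is a minimal Python illustration, not a reference implementation: the edge list `EDGES` is an assumption recovered from the distance-1 entries of Table 1, the function name is hypothetical, and nodes with fewer than two neighbors are assigned 0, a convention assumed here because it matches the values reported in Table 2.

```python
# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

adj = [set() for _ in range(N)]
for a, b in EDGES:
    adj[a].add(b)
    adj[b].add(a)


def clustering_coefficient(u):
    """Fraction of pairs of neighbors of u that are themselves connected."""
    neighbors = adj[u]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbors exists
    links = sum(1 for v in neighbors for w in neighbors
                if v < w and w in adj[v])
    return 2 * links / (k * (k - 1))


cc = [clustering_coefficient(u) for u in range(N)]
print(cc)
print(sum(cc) / N)  # average over all nodes
```

Node n _{0} has four neighbors and a single edge among them (between n _{4} and n _{5}), which gives 2 × 1/(4 × 3) = 1/6.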
WholeNetwork Measures
Whole-network measures for the graph in Fig. 2
Measure  Value 

Diameter  5 
Closeness  0.32 
Betweenness  0.71 
Clustering coefficient  0.24 
Density  0.25 
Diameter
The diameter D(G) of a graph G is the maximum eccentricity of its nodes: \( D(G)=\max_{u\in V} e(u) \). Therefore, the diameter of G is the maximum distance between any pair of nodes. For the graph of Fig. 2, we observe from the data shown in the second column of Table 2 that the maximum eccentricity is 5; therefore, the graph diameter is 5.
The diameter of a graph is an indication of the efficiency of information transmission: information produced by a node may need to traverse D(G) edges to reach other specific nodes in the network. Note that information may also traverse more than D(G) edges if it does not follow shortest paths. In addition, for graphs with a dense core and a few distant nodes, the diameter is determined by those nodes and may assume a high value, whereas the majority of nodes are close to each other.
Closeness
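The closeness of a network G is defined as \( C(G)=\frac{\sum_{v\in V}\left(C_{\max}-C(v)\right)}{\left(n^2-3n+2\right)/\left(2n-3\right)} \), where C(v) is the closeness of node v and C _{max} = max_{ v∈V } C(v) is the maximum value of closeness over the whole graph. In our example we have C(G) = 0.32, as can be obtained from the values in the closeness column of Table 2.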
The feature emphasized by closeness is similar to those emphasized by eccentricity and diameter. However, a single node far away from the others would affect closeness to a more limited extent than eccentricity and diameter. Also in this case, a higher value of closeness corresponds to a node or network where information may tend to reach other nodes more quickly.
Betweenness
Clustering Coefficient
This measure has also been extended to weighted networks by Opsahl and Panzarasa (2009).
As said, when computed on a social network, the clustering coefficient measures the tendency of individuals to form triangles. In fact, it is often described as a measure of the so-called transitivity of a graph (the term transitivity is sometimes used as a synonym of whole-network clustering coefficient). More generally, the clustering coefficient of a network indicates a tendency to form dense subgraphs, also called communities or clusters depending on their interpretation.
Other related measures worth mentioning are the rich-club coefficient, indicating the tendency of nodes of high degree to be well connected to each other, and the degree correlation, i.e., the probability that an arbitrary edge connects two nodes of specific degrees. The more general concept of assortativity (also known as assortative mixing) is used to indicate the tendency of nodes to connect to similar nodes, and one well-known assortativity measure is modularity, used by several methods of community detection (Fortunato 2010). Additional details on these measures can be found in Costa et al. (2007).
Density
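The density of a graph G is the fraction of all possible edges that are actually present: \( \rho(G)=\frac{2m}{n(n-1)} \), where m = |E| is the number of edges of G.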
Density is always in the range [0, 1]. If ρ(G) = 0, then G has no edges and all nodes are isolated; if ρ(G) = 1 then G is a complete graph, where every pair of nodes is connected by an edge. Note that the minimum density for a connected graph with n nodes is 2/n, since any connected graph must have at least n − 1 edges. The converse is not necessarily true: a graph with n − 1 edges may be disconnected.
The density of social graphs is typically low, meaning that on average every individual is connected to a small number of other individuals. This can be observed on any online social network site, e.g., Facebook or Twitter, where most users are connected to tens or hundreds of other users out of about a billion total users. This feature can be explained by the existence of limits in cognitive activity preventing a person from managing more than a given number of stable relationships (Goncalves et al. 2011) and is fundamental for the design of efficient algorithms. For our graph in Fig. 2, we have n = 9 nodes and m = 9 edges, resulting in a density of ρ(G) = (2 × 9)/(9 × 8) = 1/4.
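The lower bound for connected graphs and the value for the working example follow from the same ratio of existing to possible edges. The short derivation below spells out both steps; the symbol \( \rho_{\min} \) is introduced here only for illustration.

```latex
% Minimum density of a connected graph: it has at least n - 1 edges.
\rho_{\min} = \frac{2\,(n-1)}{n\,(n-1)} = \frac{2}{n}

% Working example: n = 9 nodes, m = 9 edges.
\rho(G) = \frac{2 \times 9}{9 \times 8} = \frac{18}{72} = \frac{1}{4},
\qquad
\rho_{\min} = \frac{2}{9} \approx 0.22 \le 0.25
```

The working example is therefore only slightly denser than the sparsest connected graph on nine nodes.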
Summary
Notation summary
V  Set of graph nodes
E  Set of graph edges, E ⊆ V × V
n  Number of nodes
m  Number of edges
d(u, v)  Geodesic distance from node u to node v
σ _{vw}  Number of shortest paths from v to w
σ _{vw}(u)  Number of shortest paths from v to w passing through u
N(u)  Neighbors of node u
n(u)  Number of neighbors of node u, n(u) = |N(u)|
E _{N}(u)  Set of edges in the neighborhood of u
Single-node and whole-graph metrics
Eccentricity \( e(u)=\max_{v\in V} d(u,v) \)  Diameter \( D(G)=\max_{u\in V} e(u) \)
Closeness \( C(u)=\frac{n-1}{\sum_{v\in V, v\ne u} d(u,v)} \)  Closeness \( C(G)=\frac{\sum_{v\in V}\left(C_{\max}-C(v)\right)}{\left(n^2-3n+2\right)/\left(2n-3\right)} \)
Betweenness \( B(u)=\sum_{v,w\in V,\; v\ne w\ne u}\frac{\sigma_{vw}(u)}{\sigma_{vw}} \)  Betweenness \( B(G)=\frac{\sum_{v\in V}\left(B_{\max}-B(v)\right)}{\left(n-1\right)B_{\max}} \)
Clustering coefficient \( CC(u)=\frac{2\times \left|E_N(u)\right|}{n(u)\times \left(n(u)-1\right)} \)  Clustering coefficient \( CC(G)=\frac{\sum_{u\in V} CC(u)}{n} \)
Density \( \rho(G)=\frac{2m}{n\left(n-1\right)} \)
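As a cross-check of the whole-graph formulas, the sketch below evaluates them on the working example using exact fractions. It is only an illustration: the per-node inputs are taken from Table 1 (row sums of the distance table) and Table 2, and all variable names are hypothetical.

```python
from fractions import Fraction

n, m = 9, 9  # nodes and edges of the working example

# Per-node inputs, in the order n_0 ... n_8.
distance_sums = [16, 15, 16, 23, 22, 22, 23, 21, 28]  # row sums of Table 1
eccentricity = [4, 3, 3, 5, 5, 5, 4, 4, 5]
betweenness = [17, 16, 17, 0, 0, 0, 0, 7, 0]
clustering = [Fraction(1, 6), 0, 0, 0, 1, 1, 0, 0, 0]

# Single-node closeness C(u) = (n - 1) / sum of distances from u.
closeness = [Fraction(n - 1, s) for s in distance_sums]

# Whole-graph measures.
diameter = max(eccentricity)
c_max = max(closeness)
network_closeness = (sum(c_max - c for c in closeness)
                     / Fraction(n * n - 3 * n + 2, 2 * n - 3))
b_max = max(betweenness)
network_betweenness = Fraction(sum(b_max - b for b in betweenness),
                               (n - 1) * b_max)
network_clustering = sum(clustering) / n
density = Fraction(2 * m, n * (n - 1))

for name, value in [("Diameter", diameter),
                    ("Closeness", network_closeness),
                    ("Betweenness", network_betweenness),
                    ("Clustering coefficient", network_clustering),
                    ("Density", density)]:
    print(f"{name}: {float(value):.2f}")
```

Exact fractions are used because rounding the single-node closeness values to two decimals before aggregating them would shift the second decimal of the network closeness.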
Computational Aspects
In this section we describe the algorithms that can be used to compute some of the metrics described in the previous sections. We assume that the reader is familiar with the basic concepts of algorithm design and analysis, as can be found in introductory textbooks such as Cormen et al. (2009). However, in order to make this essay self-contained, we summarize the main points below.
An algorithm is a finite sequence of steps describing an effective method for calculating a function. A fundamental attribute of an algorithm is its efficiency, representing the amount of resources (e.g., CPU time or storage space) it needs to compute the result. The amount of resources depends on the input size: it is reasonable to expect that for larger inputs, the algorithm will require more CPU time and/or storage space to compute the result. The input size of most graph algorithms is the size of the input graph, which in turn is proportional to the number of nodes n and edges m.
The goal of algorithm analysis is to define a function mapping the input size to the number of elementary steps (time complexity) or storage locations (space complexity) required by the algorithm to compute the result. Since it is in general not possible to give a precise definition of the complexity function, it is common to estimate its asymptotic behavior, that is, the growth rate of the complexity function for arbitrarily large inputs.
The big O notation is used to concisely express the asymptotic growth rate of the cost function. Assume that an algorithm requires T(n) steps (or storage locations) to compute the result for an input of size n. We say that the asymptotic cost of the algorithm is O(f(n)) if there exist constants c > 0 and n _{0} > 0 such that, for all n ≥ n _{0}, T(n) ≤ c f(n). For example, we say that an algorithm requires time O(n ^{2}) to denote that its cost function grows no faster than the square of the input size, for sufficiently large inputs.
Graph Representation
We first introduce the data structures which can be used to represent graphs. A simple way to represent a (directed or undirected) unweighted graph G = (V, E) with n nodes is using an n × n adjacency matrix M, where M _{ij} = 1 if and only if (i, j) ∈ E.
If G is undirected, as in the case of Fig. 2, the adjacency matrix is symmetric since edges (u, v) and (v, u) are the same. Adjacency matrices support some graph operations efficiently: for example, it is possible to test for the existence of an edge (u, v), add a new edge, or delete an edge in constant time by accessing the appropriate element of the matrix. Unfortunately, the storage space required to encode a graph with n nodes is O(n ^{2}), since the matrix has n ^{2} elements. Therefore, the adjacency matrix is mostly used with small graphs or with dense graphs where ρ(G) ≈ 1.
A more space-efficient representation is based on adjacency lists. Here, the graph is represented as an array of n lists, where list u contains the neighbors of node u.
Computational costs of adjacency matrix and adjacency list graph representation
Operation            Adjacency matrix  Adjacency lists
Space                O(n ^{2})         O(n + m)
Checking adjacency   O(1)              O(max_{v} n(v))
List adjacent nodes  O(n)              O(max_{v} n(v))
Adding an edge       O(1)              O(1)
Deleting an edge     O(1)              O(max_{v} n(v))
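The two representations compared in the table can be sketched in Python as follows. This is a minimal illustration using plain lists; the edge list `EDGES` is an assumption recovered from the distance-1 entries of Table 1, and the helper names are hypothetical.

```python
# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

# Adjacency matrix: N x N entries regardless of the number of edges.
matrix = [[0] * N for _ in range(N)]
for u, v in EDGES:
    matrix[u][v] = 1
    matrix[v][u] = 1  # undirected graph: the matrix is symmetric

# Adjacency lists: one list per node, n + 2m stored items in total.
lists = [[] for _ in range(N)]
for u, v in EDGES:
    lists[u].append(v)
    lists[v].append(u)


def has_edge_matrix(u, v):
    """Constant-time adjacency test."""
    return matrix[u][v] == 1


def has_edge_lists(u, v):
    """Adjacency test by scanning the list of u: time proportional to n(u)."""
    return v in lists[u]


def neighbors_matrix(u):
    """Listing neighbors requires scanning a whole row: time O(n)."""
    return [v for v in range(N) if matrix[u][v] == 1]


print(has_edge_matrix(0, 1), has_edge_lists(0, 2))
print(neighbors_matrix(2), lists[2])
```

The matrix stores 81 entries for the nine-node example, while the lists store only 18 neighbor entries, which illustrates why lists are preferred for sparse graphs.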
Graph Algorithms
We now give an overview of some classic algorithms used to compute the measures considered in this essay. The algorithms considered here are not necessarily the most efficient ones available, but nevertheless are those which are most frequently implemented in graph analysis packages due to their simplicity.
Eccentricity, Closeness, and Diameter. To compute the eccentricity and closeness of a node u, we need to compute the geodesic distances from u to all other nodes. This is the well-known single-source shortest path (SSSP) problem on graphs (Festa and MGC 2006). For unweighted graphs, this problem reduces to performing a breadth-first visit of the graph starting from u, which requires time O(n + m) using adjacency lists. Dijkstra’s algorithm can solve the SSSP problem for directed graphs with nonnegative edge weights in time O((n + m) log n) using a priority queue implemented with a binary heap (Cormen et al. 2009); using more efficient priority queue implementations, the running time can be reduced to O(m + n log n). For directed graphs with arbitrary edge weights, the Bellman-Ford algorithm can be used to compute all shortest paths from a single source in time O(nm) (Cormen et al. 2009). Computing the graph diameter requires solving the all-pairs shortest path problem, that is, computing the geodesic distances between all pairs of nodes. For unweighted graphs this can be achieved in time O(n ^{2} + nm) by simply executing n breadth-first visits, one starting from each node. The Floyd-Warshall algorithm (Floyd 1962) can compute all geodesic distances for directed graphs with arbitrary edge weights in time O(n ^{3}). Johnson’s algorithm (Johnson 1977) achieves a running time of O(nm + n ^{2} log n), which is more efficient on graphs with low density.
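For weighted graphs, the binary-heap variant of Dijkstra's algorithm mentioned above can be sketched with Python's standard `heapq` module. The sketch uses lazy deletion of stale heap entries instead of a decrease-key operation, a common simplification that stays within the O((n + m) log n) bound. The edge list and the unit weights are assumptions chosen so that the output can be compared with Table 1; the function name is hypothetical.

```python
import heapq

# Assumed edge list of the working example, here with unit weights.
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

# Weighted adjacency lists: adj[u] holds (neighbor, weight) pairs.
adj = {u: [] for u in range(N)}
for a, b in EDGES:
    adj[a].append((b, 1))
    adj[b].append((a, 1))


def dijkstra(adj, source):
    """Single-source shortest paths for nonnegative edge weights."""
    dist = {u: float("inf") for u in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already settled
        for v, weight in adj[u]:
            candidate = d + weight
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


from_n0 = dijkstra(adj, 0)
print([from_n0[v] for v in range(N)])

# All-pairs distances by repeating the single-source computation.
diameter = max(max(dijkstra(adj, u).values()) for u in adj)
print(diameter)
```

With unit weights the result coincides with a breadth-first visit, so the first printed list should match the n _{0} row of Table 1; replacing the weights with arbitrary nonnegative values requires no change to `dijkstra`.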
Betweenness. The most efficient sequential algorithm for computing the betweenness centrality of graphs is due to Brandes (2001). Brandes’ algorithm requires O(n + m) space and runs in time O(nm) on unweighted graphs and time O(nm + n ^{2} log n) on weighted graphs.
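A compact Python sketch of the unweighted case of this algorithm is given below. It follows the usual two-phase scheme (a breadth-first visit that counts shortest paths, followed by a backward accumulation of dependencies) and should be read as an illustration rather than as the reference implementation. The edge list is an assumption recovered from Table 1, and the final division by two reflects the convention of counting each unordered pair once, as in Table 2.

```python
from collections import deque

# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

adj = [[] for _ in range(N)]
for a, b in EDGES:
    adj[a].append(b)
    adj[b].append(a)


def brandes_betweenness(adj):
    n = len(adj)
    scores = [0.0] * n
    for s in range(n):
        # Phase 1: breadth-first visit from s, counting shortest paths.
        order = []                        # nodes in nondecreasing distance
        preds = [[] for _ in range(n)]    # predecessors on shortest paths
        sigma = [0] * n
        dist = [-1] * n
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies from the farthest nodes back to s.
        delta = [0.0] * n
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return [x / 2 for x in scores]


betweenness = brandes_betweenness(adj)
print(betweenness)
```

Each source requires one visit of cost O(n + m), and the per-source arrays need O(n + m) space, which is consistent with the bounds stated above for unweighted graphs.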
Clustering Coefficient. Computation of the node clustering coefficient CC(u) can be done in time O(n ^{2}) in the worst case, by counting all edges incident to the neighbors of u. The network clustering coefficient CC(G) from Eq. (11) can be computed by counting all (closed) triplets. A brute-force approach is to examine each combination of nodes, which requires time O(n ^{3}). A better algorithm has been proposed by Latapy (2008), who demonstrated that it is possible to solve triplet finding, counting and node counting in O(n ^{2.376}) time and O(n ^{2}) space using fast matrix multiplication on the adjacency matrix representation of G.
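Counting triangles and triplets can be sketched without matrix multiplication by intersecting neighbor sets along each edge. The Python sketch below is a simple illustration and not the algorithm of Latapy (2008); the edge list is an assumption recovered from Table 1, and the function names are hypothetical. A triplet is taken here to be a path of length two centered at a node, which is an assumption about the terminology.

```python
# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
N = 9

adj = [set() for _ in range(N)]
for a, b in EDGES:
    adj[a].add(b)
    adj[b].add(a)


def count_triangles(adj):
    """Count each triangle {u, v, w} exactly once, with u < v < w."""
    total = 0
    for u in range(len(adj)):
        for v in adj[u]:
            if u < v:
                # Common neighbors of both endpoints close a triangle.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def count_triplets(adj):
    """Count paths of length two: one per pair of neighbors of each node."""
    return sum(len(nb) * (len(nb) - 1) // 2 for nb in adj)


triangles = count_triangles(adj)
triplets = count_triplets(adj)
print(triangles, triplets)
```

The ratio 3 × triangles / triplets is the triplet-based (transitivity) variant of the network clustering coefficient; in general it differs from the average of the node coefficients CC(u).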
Software
We conclude this part by mentioning some existing software packages which can be used to compute the measures described above and many others.
Gephi (Bastian et al. 2009) and NodeXL (2012) are interactive network visualization tools supporting visual network analysis. Gephi is a cross-platform and extensible environment, with a plugin mechanism to implement additional algorithms, while NodeXL is an extension of MS Excel, working only on Windows systems, more focused on ease of use by people without strong computer skills. Both tools provide algorithms to compute common SNA measures, including those addressed in this essay (NodeXL uses SNAP (Leskovec and Sosič 2016) as the underlying computation library). Igraph (Csardi and Nepusz 2006) is a software package written in C for creating and manipulating large graphs and can be used for statistical SNA, thanks to its version for the R statistical environment (R Core Team 2012).
Key Applications
The applications of social network analysis span the most diverse topics. Newman (2010) cites several examples, including the estimation of the number of people the average person knows (McCormick et al. 2010), the study of the collaboration pattern of scientists (Newman 2001) and movie actors (Watts and Strogatz 1998), the analysis of dating patterns among high school students (Bearman et al. 2004), and the analysis of networks of terrorists (Latora and Marchiori 2004). We briefly discuss some of these results.
A scientific collaboration network is an undirected graph G = (V, E) where nodes represent scientists, and there exists an edge (u, v) if and only if u and v wrote a paper together. Newman (2001) observed that the collaboration networks of several disciplines exhibit a small-world structure: two scientists picked at random are likely to be connected by a short path in G. Specifically, the average degree of separation in the analyzed dataset is about six, meaning that any scientist can be reached from any other scientist in the collaboration graph by following a path of average length 6.
The small-world property of collaboration networks was part of the folklore for a long time before being formally observed. Mathematicians defined the Erdős number as a tribute to Paul Erdős (1913–1996), probably the most prolific mathematician of all time. The Erdős number E(v) is the distance between Erdős and researcher v in the collaboration graph. Therefore, those who have written a paper with Erdős have E(v) = 1, their coauthors (that are not coauthors of Erdős) have E(v) = 2, and so on. Most mathematicians have a finite Erdős number: an estimate of the median Erdős number among mathematicians is 5, the mean is 4.65, and the standard deviation is 1.21 (Erdős Number Project 2006). Many nonmathematicians have a finite Erdős number as well, due to interdisciplinary research activities that resulted in joint publications at some point in time.
The idea of the Erdős number has been ported to the movie industry: the Bacon number is defined as the distance to Kevin Bacon in the graph G whose nodes are actors, and an edge (u, v) represents the fact that actors u and v appeared in the same movie. The average Bacon number for a randomly chosen actor with finite distance to Kevin Bacon is 3.02 (Bacon Oracle 2016), revealing that the movie industry is a small-world network. There are notable scientists that have both a finite Erdős number and a finite Bacon number: for example, physicist and Nobel laureate Richard Feynman has Erdős number 3 and Bacon number 3, the latter due to his appearance in the film Anti-Clock.
Humans are not the only species whose collaboration patterns take the form of a small-world network. Lusseau (2003) analyzed the social interaction graph within a community of 64 bottlenose dolphins (Tursiops truncatus). Nodes of the interaction graph correspond to individual dolphins, and an undirected edge (u, v) denotes that u and v were observed together more often than expected by random encounters alone. The resulting network is scale-free and highly clustered.
We conclude by describing a study where graph metrics have been used to probe the structure of a terrorist organization. Latora and Marchiori (2004) consider the connections among the hijackers involved in the September 2001 attacks, with the goal of identifying the terrorists to target if one wants to disrupt the organization. As a quantitative measure of such disruption, the authors use the global efficiency of the terrorist graph defined in Eq. (8); the most important individual in the criminal organization is the one whose removal produces the largest decrease in efficiency.
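The node-removal procedure described above can be sketched in Python. Since Eq. (8) is not reproduced in this section, the sketch assumes the usual definition of global efficiency, namely the average of 1/d(u, v) over all ordered pairs of distinct nodes, with disconnected pairs contributing 0. It is applied to the small working example of this essay (edge list assumed from Table 1), not to the network studied by Latora and Marchiori (2004), and all names are hypothetical.

```python
from collections import deque

# Assumed edge list of the working example (node i stands for n_i).
EDGES = [(0, 1), (0, 3), (0, 4), (0, 5), (4, 5), (1, 2), (2, 6), (2, 7), (7, 8)]
NODES = list(range(9))


def build_adjacency(edges, nodes):
    """Adjacency sets restricted to the given nodes."""
    adj = {u: set() for u in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def global_efficiency(adj):
    """Average of 1/d(u, v) over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    total = 0.0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


base = global_efficiency(build_adjacency(EDGES, NODES))

# Drop in efficiency caused by removing each node in turn.
drop = {}
for removed in NODES:
    remaining = [u for u in NODES if u != removed]
    drop[removed] = base - global_efficiency(build_adjacency(EDGES, remaining))

most_critical = max(drop, key=drop.get)
print(round(base, 3), most_critical)
```

In this toy example the efficiency after each removal is normalized by the number of remaining nodes; this is one possible choice, and other normalizations may be preferable depending on the application.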
Future Directions
The success of online social network sites as a mass phenomenon has dramatically increased the size of these networks. Analyzing them is computationally challenging, due to the prohibitive time and storage space requirements of traditional, sequential algorithms. In response to these challenges, new algorithms have been proposed, based on emerging computer architectures and different computing paradigms like approximate algorithms (Brandes and Pich 2007) and streaming computation (Becchetti et al. 2008; Guha and McGregor 2012). On-chip parallelism, such as that provided by modern multicore processors or by general purpose graphics processing units (GPUs), is fostering interest in the development of efficient parallel graph algorithms (Lambertini et al. 2014; Lumsdaine et al. 2007; Wang et al. 2016), where multiple independent execution units concurrently and cooperatively build the solution to specific graph problems.
However, it should be observed that more powerful algorithms and computing infrastructures are not always the correct solution to address the increase in network sizes. For example, while it may be interesting to compute centrality measures on the whole Facebook network, practical SNA tasks would often focus on a specific community, e.g., people subscribed to a product or company page or fans of a public figure. In this context community detection methods may become particularly relevant to filter portions of the network to analyze (Fortunato 2010).
Another aspect that has not yet been fully considered is the semantics of edges. Some decades ago the reconstruction of the social graph was a difficult task, usually performed by asking people to list their friends or by observing specific environments, e.g., a workplace. The advent of online social networks has made it much easier to construct social network graphs. However, the graph edges that can be easily inferred from online data may not be the best way to identify the social relations of interest, and this could lead to the construction of a wrong social graph. For example, many social network sites provide information about user contacts through their application programming interfaces (APIs), e.g., we can easily retrieve all Twitter followers of a specific user through the Twitter API. However, all the path-based measures presented in this essay assume that the network represents communication paths, while a large percentage of messages is in fact exchanged between users who are not directly connected; these messages therefore travel on a different network, which we may call the communication network (Rossi and Magnani 2012).
Things may get even more complicated when we consider multiple social networks. Individuals often use different services to communicate with different audiences, e.g., Facebook, Twitter, or LinkedIn. The social graphs built from Facebook, Twitter, or LinkedIn contacts may differ strongly, even if they are built over the same set of individuals. Computing centrality measures on each of these graphs in isolation may provide insights into how a specific user uses that social network, but may fail to draw a realistic picture of the information flow and of the role each user plays in cross-disseminating information over all networks.
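A small sketch makes the last point concrete. It compares which nodes a user can reach on each network in isolation with those reachable on the union of the two networks. The two layers, the node names, and the helper functions are hypothetical, and taking the plain union of the edge sets is only one simple way to combine networks, chosen here for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def merge_layers(*layers):
    """Union of the edge sets of several graphs over the same individuals."""
    merged = {}
    for layer in layers:
        for u, nbrs in layer.items():
            merged.setdefault(u, set()).update(nbrs)
    return merged


# Hypothetical layers (assumption): x and y are contacts on one service,
# y and z are contacts on another.
layer_a = {"x": {"y"}, "y": {"x"}, "z": set()}
layer_b = {"x": set(), "y": {"z"}, "z": {"y"}}

reach_a = bfs_distances(layer_a, "x")
reach_b = bfs_distances(layer_b, "x")
reach_merged = bfs_distances(merge_layers(layer_a, layer_b), "x")
```

In neither layer alone can x reach z, whereas in the merged graph z is at distance 2 from x through y, who acts as a bridge between the two services. Path-based measures computed per layer would miss this role.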
References
 Alexanderson GL (2006) About the cover: Euler and Königsberg’s bridges: a historical view. Bull Am Math Soc 43:567–573. doi:10.1090/S0273-0979-06-01130-X
 Anthonisse JM (1971) The rush in a directed graph. Technical report BN 9/71, Stichting Mathematisch Centrum, Amsterdam
 Bacon Oracle (2016) The Oracle of Bacon. https://oracleofbacon.org/. Accessed 11 Nov 2016
 Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154
 Bearman PS, Moody J, Stovel K (2004) Chains of affection: the structure of adolescent romantic and sexual networks. Am J Sociol 110(1):44–91. doi:10.1086/386272
 Becchetti L, Boldi P, Castillo C, Gionis A (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08. ACM, New York, pp 16–24. doi:10.1145/1401890.1401898
 Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. doi:10.1080/0022250X.2001.9990249
 Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurcation Chaos 17(07):2303–2318. doi:10.1142/S0218127407018403
 Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge, MA
 Costa LF, Rodrigues FA, Travieso G, Villas Boas PR (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56(1):167–242. doi:10.1080/00018730601170527
 Csardi G, Nepusz T (2006) The igraph software package for complex network research. Inter J Complex Syst 1695. http://igraph.org/
 Erdős Number Project (2006) The Erdős number project at Oakland University. https://oakland.edu/enp/. Accessed 26 Nov 2016
 Festa P (2006) Shortest path algorithms. In: Resende MGC, Pardalos PM (eds) Handbook of optimization in telecommunications. Springer, New York, pp 185–210. doi:10.1007/978-0-387-30165-5_8
 Floyd RW (1962) Algorithm 97: shortest path. Commun ACM 5(6):345. doi:10.1145/367766.368168
 Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174. doi:10.1016/j.physrep.2009.11.002
 Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41. doi:10.2307/3033543
 Freeman LC (1978) Centrality in social networks: conceptual clarification. Soc Networks 1(3):215–239. doi:10.1016/0378-8733(78)90021-7
 Goncalves B, Perra N, Vespignani A (2011) Modeling users’ activity on Twitter networks: validation of Dunbar’s number. PLoS ONE 6(8):e22656. doi:10.1371/journal.pone.0022656
 Guha S, McGregor A (2012) Graph synopses, sketches, and streams: a survey. Proc VLDB Endow 5(12):2030–2031. doi:10.14778/2367502.2367570
 Harary F (1969) Graph theory. Addison-Wesley, Reading
 Harary F, Norman RZ (1953) Graph theory as a mathematical model in the social sciences. Institute for Social Research, University of Michigan, Ann Arbor
 Johnson DB (1977) Efficient algorithms for shortest paths in sparse networks. J ACM 24(1):1–13. doi:10.1145/321992.321993
 Lambertini M, Magnani M, Marzolla M, Montesi D, Paolino C (2014) Large-scale social network analysis. In: Gkoulalas-Divanis A, Labbi A (eds) Large-scale data analytics. Springer, New York, pp 155–187. doi:10.1007/978-1-4614-9242-9_6
 Latapy M (2008) Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor Comput Sci 407(1):458–473. doi:10.1016/j.tcs.2008.07.017
 Latora V, Marchiori M (2001) Efficient behavior of small-world networks. Phys Rev Lett 87:198701. doi:10.1103/PhysRevLett.87.198701
 Latora V, Marchiori M (2004) How the science of complex networks can help developing strategies against terrorism. Chaos, Solitons Fractals 20(1):69–75. doi:10.1016/S0960-0779(03)00429-6
 Leskovec J, Sosič R (2016) SNAP: a general-purpose network analysis and graph-mining library. ACM Trans Intell Syst Technol 8(1):20. doi:10.1145/2898361
 Luce R, Perry A (1949) A method of matrix analysis of group structure. Psychometrika 14:95–116. doi:10.1007/BF02289146
 Lumsdaine A, Gregor D, Hendrickson B, Berry JW (2007) Challenges in parallel graph processing. Parallel Process Lett 17(1):5–20. doi:10.1142/S0129626407002843
 Lusseau D (2003) The emergent properties of a dolphin social network. Proc R Soc Lond B Biol Sci 270(Suppl 2):S186–S188. doi:10.1098/rsbl.2003.0057
 McCormick TH, Salganik MJ, Zheng T (2010) How many people do you know?: efficiently estimating personal network size. J Am Stat Assoc 105(489):59–70. doi:10.1198/jasa.2009.ap08518
 Moreno JL (1934) Who shall survive? A new approach to the problem of human interrelations. Nervous and Mental Disease Publishing Co., Washington, DC
 Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci U S A 98(2):404–409. doi:10.1073/pnas.98.2.404
 Newman MEJ (2005) A measure of betweenness centrality based on random walks. Soc Networks 27(1):39–54. doi:10.1016/j.socnet.2004.11.009
 Newman MEJ (2010) Networks: an introduction. Oxford University Press, Oxford
 NodeXL (2012) NodeXL, a graph visualization and manipulation software. http://nodexl.codeplex.com. Accessed 6 Dec 2016
 Opsahl T, Panzarasa P (2009) Clustering in weighted networks. Soc Networks 31(2):155–163. doi:10.1016/j.socnet.2009.02.002
 Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks 32(3):245–251. doi:10.1016/j.socnet.2010.03.006
 Peay ER (1980) Connectedness in a general model for valued networks. Soc Networks 2(4):385–410. doi:10.1016/0378-8733(80)90005-2
 R Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org. ISBN:3-900051-07-0
 Rossi L, Magnani M (2012) Conversation practices and network structure in Twitter. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4634
 Sabidussi G (1966) The centrality index of a graph. Psychometrika 31(4):581–603. doi:10.1007/BF02289527
 Wang Y, Davidson A, Pan Y, Wu Y, Riffel A, Owens JD (2016) Gunrock: a high-performance graph processing library on the GPU. In: Proceedings of the 21st ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP ’16. ACM, New York, pp 11:1–11:12. doi:10.1145/2851141.2851145
 Wasserman S, Faust K (1994) Social network analysis. Cambridge University Press, New York
 Watts DJ, Strogatz SH (1998) Collective dynamics of “small-world” networks. Nature 393:440–442. doi:10.1038/30918
 White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Networks 16(4):335–346. doi:10.1016/0378-8733(94)90015-9