1 Introduction

In the classic book of Wasserman and Faust (1994) it is stated that “one of the primary uses of graph theory in social network analysis is the identification of the “most important” actors in a social network”. Such identification is mainly done by means of the so-called “centrality measures”. A vertex centrality measure is a local vertex invariant, i.e., a property of vertices that is equal for pairs of vertices interchangeable by an automorphism (Watkins 2014). Therefore, the vertex degree is a centrality measure: the more connections a vertex has, the more important it is. Although the pioneering work of Bavelas (1948) on centrality was based on counting walks in the graph, the influential works of Freeman (1977, 1979) on “closeness” and “betweenness” centrality put a lot of emphasis on counting geodesics (shortest paths) in the graph. The shortest path (SP) between two vertices is the shortest (in terms of number of edges) of all sequences of distinct vertices/edges (paths) connecting the two vertices. In contrast, a walk allows for the repetition of both vertices and edges. The closeness centrality (CC) accounts for how close, in terms of SP, a vertex is to the rest of the vertices in the graph. The betweenness centrality (BC) refers to the importance that a vertex has in the SP communication between other pairs of vertices (see Estrada (2012a) for definitions and properties of these and other centrality measures).

What should it mean, for instance, that the peaks of two mountains are separated by a distance x? This information is very valuable for an eagle, which can fly from one peak to the other. For a mountaineer, this information is useless because she cannot fly, but has to go down from the first peak and climb the second one. The same happens if we say that the average SP distance in a network is y (see Burago et al. 2022). Traveling through the shortest paths is only possible when the sender has complete information about the topology of the network; if “items” do not travel in this way, the information about the SP is useless. It is not surprising, however, that measures based on SP, like the CC, carry some information about dynamical processes which do not take place through the SP but follow diffusive patterns of spreading. These are the cases, for instance, of epidemic spreading or synchronization of coupled oscillators, which are well-known diffusive processes. What happens is similar to the fact that the mountaineer can guess how much effort it will cost to go down the first mountain and climb the second one by using the distance between the two peaks. CC based on SP can “guess” how a diffusive process takes place (see for instance Janyasupab et al. 2021), but it does not provide precise information about such processes because this is outside its designed scope.

Nowadays centrality measures are used beyond social network analysis in a broad spectrum of computational analyses of networked systems. These include, for instance, dynamics on networks (synchronization, consensus, epidemics, information spreading), pattern recognition, community detection (Saxena and Iyengar 2020; Rodrigues 2019), dimensionality reduction (Chen et al. 2023) and isomorphism detection (Meghanathan 2015). The last two topics are of particular interest for the current work. The main idea of dimensionality reduction is to “reduce” a large-scale network to a smaller one while preserving some of its global structural characteristics. Chen et al. (2023) proposed to use centrality measures for this purpose: by removing the least central vertices they reduced the size of the networks while observing that certain structural features were preserved. The second application is that of determining whether two graphs are isomorphic, a problem for which no deterministic polynomial-time algorithm is known. The strategy is to rank the vertices of two test graphs using a given centrality (Meghanathan 2015). If the ranked sequences are not the same, then the two graphs are not isomorphic. If they are the same, then the graphs are declared to be potentially isomorphic and this is confirmed through additional heuristics. Both applications of centrality measures, dimensionality reduction and graph isomorphism detection, are heavily dependent on the discriminant power of the centrality measures used. For instance, consider a regular graph. In this case, the eigenvector centrality (EC), i.e., the entries of the eigenvector associated with the spectral radius of the adjacency matrix (see Estrada 2012a), is the same for every node. Therefore, there is no chance of ranking the nodes to start a pruning based on the least central ones. 
If you are testing whether two k-regular graphs are isomorphic using the EC, you will always end up resorting to additional heuristics because both graphs have the same list of EC values. The discriminant power of a centrality measure refers, in general, to the capacity of the index to differentiate pairs of vertices (Bao and Zhang 2021). A better definition should be given in terms of the capacity of a centrality index to differentiate nonidentical pairs of vertices, where identical vertices are those which can be swapped by an automorphism.
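The regular-graph case can be checked numerically. The following is a minimal sketch, assuming NumPy is available and taking the cycle \(C_{6}\) (a 2-regular graph) as an illustrative example that is not part of the analysis in this paper; the function names are hypothetical.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the cycle graph C_n (a 2-regular graph)."""
    A = np.zeros((n, n))
    for v in range(n):
        A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0
    return A

def eigenvector_centrality(A):
    """Entries of the eigenvector associated with the largest eigenvalue of A."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigenvalues in ascending order
    principal = eigenvectors[:, -1]
    return np.abs(principal)  # fix the arbitrary sign of the eigenvector

ec = eigenvector_centrality(cycle_adjacency(6))
print(np.round(ec, 6))  # every vertex receives the same value
```

Since all entries coincide, sorting the vertices by EC gives no information in this graph, which is the situation described above.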

The problem of graph automorphism is intimately related to the symmetry of graphs. This problem has been treated in the mathematical literature using several approaches (Chan and Godsil 1997; Kocay 2007). More recently, the problem of analyzing the size of the automorphism group in large real-world networks has been explored (MacArthur et al. 2008; Sánchez-García 2020; Garlaschelli et al. 2010; Xiao et al. 2008; MacArthur and Sanchez-Garcia 2009). The size of the automorphism group refers to the number of symmetry operations which permute the vertices of the graph leaving its adjacency matrix unaltered. Then, it is clear that a complete graph on four vertices \(K_{4}\) is more symmetric than a cycle graph of the same size \(C_{4}\). The first has \(\left| \text {Aut}\left( K_{4}\right) \right| =24\), as there are n! different symmetries in \(K_{n}\), while the second has \(\left| \text {Aut}\left( C_{4}\right) \right| =8\), as there are n rotational and n reflection symmetries in \(C_{n}\). However, from a vertex point of view both graphs have exactly the same number of identical pairs of vertices. There are many more ways of swapping the vertices in \(K_{n}\) than in \(C_{n}\), but what an “inspector” of these graphs “sees” is that both are formed by four identical vertices. This is the important problem of vertex similarity, or more appropriately vertex identity (Harary and Palmer 1966; Godsil and Kocay 1982; Lauri 1997; Albertson and Collins 1996; Sun et al. 2021). Symmetry groups have also been used as a proxy for the calculation of SP distance invariants in graphs (Koorepazan-Moftakhar and Ashrafi 2015).
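The two group sizes quoted for \(K_{4}\) and \(C_{4}\) can be reproduced by brute force. The sketch below uses only the Python standard library and a hypothetical helper `automorphisms`; it enumerates all vertex permutations, so it is only meant for very small graphs.

```python
from itertools import permutations

def automorphisms(n, edges):
    """Brute-force list of the automorphisms of a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(range(n)):
        image = {frozenset((perm[v], perm[w])) for v, w in edges}
        if image == edge_set:  # the permutation maps the edge set onto itself
            found.append(perm)
    return found

k4_edges = [(v, w) for v in range(4) for w in range(v + 1, 4)]  # complete graph K_4
c4_edges = [(v, (v + 1) % 4) for v in range(4)]                  # cycle graph C_4

k4_autos = automorphisms(4, k4_edges)
c4_autos = automorphisms(4, c4_edges)
print(len(k4_autos), len(c4_autos))  # 24 8

# Images of vertex 0 under all automorphisms: the whole vertex set in both graphs.
k4_orbit = {perm[0] for perm in k4_autos}
c4_orbit = {perm[0] for perm in c4_autos}
```

Although the groups differ in size, vertex 0 can be mapped onto every other vertex in both graphs, which is what the “inspector” sees.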

Here, we will propose a distance metric between vertices in a simple graph. We will prove that this distance, which appears naturally from the way in which diffusive objects navigate a graph, is Euclidean and spherical, i.e., it induces an embedding of the graph in a Euclidean hypersphere. We will then propose to use the Euclidean distance matrix (EDM) of communicability cosine distances (CCD) as a quantitative proxy to detect similarity between vertices and show some problems arising with a popular existing measure. Such drawbacks are corrected by the new method proposed here. Further, we will define and study a measure of closeness centrality based on CCD. This measure differentiates vertices very well in small graphs, distinguishing all nonidentical vertices in connected graphs of up to 9 vertices. We also study several identity (asymmetric) graphs with this measure, all of whose vertices are differentiated by the new closeness centrality. Real-world networks are also studied, both for the discriminating power of the new centrality on their vertices and for ranking their vertices. Several of the advantages of the communicability closeness centrality proposed here are analyzed throughout the paper.

2 Preliminaries

Here we consider simple (finite, undirected, unweighted and connected) graphs \(G=\left( V,E\right) \) with \(\#V=n\) and \(\#E=m\). As usual, A denotes the adjacency matrix of the graph G, which is symmetric for the kind of graphs we study here. We write \(A=U\varLambda U^\textrm{T}\) where \(\varLambda \) is a diagonal matrix of eigenvalues of A and U is an orthogonal matrix of eigenvectors.

A walk of length l in a graph is a sequence of (not necessarily different) vertices \(v_{1},v_{2},\ldots ,v_{l},v_{l+1}\) such that for each \(i=1,2,\ldots ,l\) there is an edge from \(v_{i}\) to \(v_{i+1}.\) The walk is known as closed if \(v_{l+1}=v_{1}.\) The number of walks of length l between \(v_{p}\) and \(v_{q}\) is given by \(\left( A^{l}\right) _{pq}.\) Hereafter we will designate the vertices directly by the subindex, such as p and q. A path is a walk in which neither vertices nor edges are repeated. We also denote by u an all-ones vector.
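The identity between walk counts and powers of the adjacency matrix can be illustrated on a small example. The sketch below, which assumes NumPy and uses the triangle \(K_{3}\) as an arbitrary test graph, compares \(\left( A^{3}\right) _{pq}\) with an explicit enumeration of walks; `count_walks` is a hypothetical helper written for this comparison.

```python
import numpy as np

def count_walks(A, p, q, length):
    """Count walks of the given length from p to q by explicit enumeration."""
    if length == 0:
        return 1 if p == q else 0
    # Extend the walk by one step to every neighbor w of p.
    return sum(count_walks(A, w, q, length - 1) for w in np.flatnonzero(A[p]))

# Triangle K_3 as a small illustrative graph.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

A3 = np.linalg.matrix_power(A, 3)
enumerated = np.array([[count_walks(A, p, q, 3) for q in range(3)]
                       for p in range(3)])
print(A3)
```

In this graph there are two closed walks of length 3 at each vertex (the triangle traversed in both directions) and three walks of length 3 between any two distinct vertices.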

Let us now consider a few definitions related to the communicability function of networks (Estrada and Hatano 2008; Estrada et al. 2012).

Definition 1

The communicability function between two nodes \(v,w\in V\) of G is given by

$$\begin{aligned} G_{vw}=\left( \exp \left( \beta A\right) \right) _{vw}, \end{aligned}$$
(2.1)

where \(\exp \left( \beta A\right) =I+\beta A+\dfrac{\left( \beta A\right) ^{2}}{2!}+\cdots +\dfrac{\left( \beta A\right) ^{k}}{k!}+\cdots \) is the matrix exponential of A and \(\beta \) is an empirical parameter which may be set to one for the sake of simplicity.

Obviously, due to the spectral decomposition of the adjacency matrix we have

$$\begin{aligned} G_{vw}=\sum _{j=1}^{n}\psi _{jv}\psi _{jw}\exp \left( \beta \lambda _{j}\right) , \end{aligned}$$
(2.2)

where \(\psi _{jv}\) is the vth entry of the eigenvector associated with the jth eigenvalue \(\lambda _{j}\) of A.
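As a numerical illustration of Eqs. (2.1) and (2.2), the following sketch computes the communicability matrix both from the spectral decomposition and from a truncated power series. It assumes NumPy; the function names, the number of series terms and the path graph used as input are illustrative choices.

```python
import numpy as np

def communicability_spectral(A, beta=1.0):
    """G = sum_j exp(beta*lambda_j) psi_j psi_j^T, as in Eq. (2.2)."""
    eigenvalues, U = np.linalg.eigh(A)           # columns of U are the eigenvectors
    return (U * np.exp(beta * eigenvalues)) @ U.T

def communicability_series(A, beta=1.0, terms=30):
    """Truncated power series of exp(beta*A), following the expansion in Definition 1."""
    n = A.shape[0]
    G = np.zeros((n, n))
    term = np.eye(n)                              # (beta*A)^0 / 0!
    for k in range(terms):
        G += term
        term = term @ (beta * A) / (k + 1)        # next term (beta*A)^(k+1) / (k+1)!
    return G

# Path graph on four vertices, used only as an illustration.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

G_spectral = communicability_spectral(A)
G_series = communicability_series(A)
print(np.max(np.abs(G_spectral - G_series)))     # difference at rounding-error level
```

The spectral route is usually preferable for symmetric matrices, since the series needs more terms as the spectral radius of \(\beta A\) grows.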

Definition 2

The term \(G_{vv}=\left( \exp \left( \beta A\right) \right) _{vv}\) is the subgraph centrality (Estrada and Rodriguez-Velazquez 2005) of the node v and it accounts for the importance of a node in terms of its participation in all subgraphs of the network giving more importance to the smaller than to the larger ones.

Definition 3

The terms (Estrada 2012b; Estrada and Hatano 2016)

$$\begin{aligned} {\xi }_{vw}:=G_{vv}+G_{ww}-2G_{vw}, \end{aligned}$$
(2.3)

and

$$\begin{aligned} \zeta _{vw}:=\dfrac{G_{vw}}{\sqrt{G_{vv}G_{ww}}}, \end{aligned}$$
(2.4)

account for the proximity between the nodes v and w in terms of the way in which they transmit “information” to each other. If \( {\xi }_{vw}\) is small or \(\zeta _{vw}\) is close to one, it means that most of the information sent from v (resp. w) to w (resp. v) arrives at its destination, in contrast to the information which is returned to the origin.

Lemma 1

The terms \( {\xi }_{vw}\) and \(\zeta _{vw}\) are a squared Euclidean distance between the vertices v and w and the cosine of the angle between the position vectors of both vertices in a Euclidean space, respectively.

Proof

First let us write

$$\begin{aligned} \begin{aligned} G&=e^{A}\\&=Ue^{\varLambda }U^\textrm{T}\\&=Ue^{\varLambda /2}e^{\varLambda /2}U^\textrm{T}\\&=\left( e^{\varLambda /2}U^\textrm{T}\right) ^\textrm{T}\left( e^{\varLambda /2}U^\textrm{T}\right) \\&=X^\textrm{T}X. \end{aligned} \end{aligned}$$
(2.5)

Therefore,

$$\begin{aligned} G_{vw}=x_{v}\cdot x_{w}, \end{aligned}$$
(2.6)

where \(x_{i}=e^{\varLambda /2}\varphi _{i}\) represents the position vector of the vertex i in the Euclidean space induced by the communicability and \(\varphi _{i}\) is the ith column of \(U^\textrm{T}.\) Thus, we have

$$\begin{aligned} \begin{aligned} {\xi }_{vw}&=x_{v}\cdot x_{v}+x_{w}\cdot x_{w}-2x_{v}\cdot x_{w}\\&=\left\| x_{v}-x_{w}\right\| ^{2}, \end{aligned} \end{aligned}$$
(2.7)

and

$$\begin{aligned} \zeta _{vw}=\dfrac{x_{v}\cdot x_{w}}{\left\| x_{v}\right\| \left\| x_{w}\right\| }=\cos \theta _{vw}. \end{aligned}$$
(2.8)

\(\square \)
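Lemma 1 can be verified numerically on any small graph. The following sketch assumes NumPy and \(\beta =1\), and uses the graph formed by a triangle and a pendant vertex as an arbitrary example; it builds the position vectors \(x_{v}\) of Eq. (2.5) and compares Eqs. (2.3) and (2.4) with the corresponding geometric quantities.

```python
import numpy as np

# Triangle 1-2-3 plus pendant vertex 0 attached to vertex 1 (illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

eigenvalues, U = np.linalg.eigh(A)
G = (U * np.exp(eigenvalues)) @ U.T               # communicability matrix, beta = 1
X = np.exp(eigenvalues / 2)[:, None] * U.T        # column v is the position vector x_v

# Eq. (2.3) versus squared Euclidean distances between position vectors.
g = np.diag(G)
xi = g[:, None] + g[None, :] - 2 * G
diff = X[:, :, None] - X[:, None, :]              # diff[:, v, w] = x_v - x_w
squared_dist = (diff ** 2).sum(axis=0)

# Eq. (2.4) versus cosines of the angles between position vectors.
zeta = G / np.sqrt(np.outer(g, g))
norms = np.linalg.norm(X, axis=0)
cosines = (X.T @ X) / np.outer(norms, norms)

print(np.allclose(xi, squared_dist), np.allclose(zeta, cosines))
```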

Remark 1

The communicability function \(G_{vw}\) can be derived from a nonconservative diffusive process as follows. Let us consider the nonconservative diffusion on the graph (see Estrada (2024) for details):

$$\begin{aligned} {\dot{y}}\left( t\right) =-{\mathcal {L}}_{\chi }y\left( t\right) , \end{aligned}$$
(2.9)

with initial condition \(y\left( 0\right) =z\) and where \({\mathcal {L}}_{\chi }:=\chi I-A\) is the Lerman–Ghosh Laplacian (Lerman and Ghosh 2012) with \(\chi \ge 0\). Then, when \(\chi =0\), the solution of this equation for a given vertex v of the graph is:

$$\begin{aligned} y_{v}\left( t\right) =\sum _{j}\left( e^{tA}\right) _{vj}z_{j}, \end{aligned}$$
(2.10)

such that if the initial concentration is totally located at the vertex v, \(z_{j}=\delta _{j,v}\), where \(\delta _{i,j}\) is the Kronecker delta, we have that the amount transferred from vertex v to vertex w is

$$\begin{aligned} y_{v,w}\left( t\right) =\left( e^{tA}\right) _{vw}, \end{aligned}$$
(2.11)

which is the communicability between the two vertices.
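For completeness, the solution of Eq. (2.9) with \(y\left( 0\right) =z\) can be written out for a general \(\chi \). Because \(\chi I\) commutes with A, the matrix exponential factorizes:

$$\begin{aligned} \begin{aligned} y\left( t\right)&=e^{-t{\mathcal {L}}_{\chi }}z\\&=e^{-\chi t}e^{tA}z, \end{aligned} \end{aligned}$$

so that \(\chi \) only contributes a global decay factor and, for \(\chi =0\), the vth entry of \(y\left( t\right) \) reduces to Eq. (2.10).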

Another known Euclidean distance between the vertices of a graph is the so-called resistance distance, which is formally defined below.

Definition 4

Let \(L=K-A\) be the Laplacian matrix of a graph, where K is the diagonal matrix of vertex degree and A its adjacency matrix. Then,

$$\begin{aligned} \varOmega _{vw}:=L_{vv}^{+}+L_{ww}^{+}-2L_{vw}^{+}, \end{aligned}$$
(2.12)

is the resistance distance (Klein and Randić 1993) between the vertices v and w in the graph, where \(L^{+}\) is the Moore–Penrose pseudoinverse of the Laplacian matrix.

Definition 5

Let \(d_{vw}\) and \(\varOmega _{vw}\) be the shortest-path and resistance distances between the vertices v and w in a graph. Then,

$$\begin{aligned} \hbox {CC}_{v}=\left( \sum _{w=1}^{n}d_{vw}\right) ^{-1}, \end{aligned}$$
(2.13)

and

$$\begin{aligned} \hbox {RCC}_{v}=\left( \sum _{w=1}^{n}\varOmega _{vw}\right) ^{-1}, \end{aligned}$$
(2.14)

are the closeness (Bavelas 1948) (see also Wasserman and Faust 1994; Estrada 2012a) and resistance-closeness (Bozzo and Franceschet 2013) centralities of the vertex v, respectively.
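Definitions 4 and 5 translate directly into a few lines of code. The sketch below assumes NumPy; the function names are hypothetical, and the path on three vertices and the triangle are used only as test cases. On a tree the resistance distance coincides with the shortest-path distance, so CC and RCC agree on the path, while they differ on the triangle.

```python
import numpy as np
from collections import deque

def shortest_path_distances(A):
    """All-pairs shortest-path distances by breadth-first search."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)
    for source in range(n):
        dist[source, source] = 0.0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in np.flatnonzero(A[v]):
                if np.isinf(dist[source, w]):
                    dist[source, w] = dist[source, v] + 1.0
                    queue.append(w)
    return dist

def resistance_distances(A):
    """Resistance distances from the Moore-Penrose pseudoinverse, Eq. (2.12)."""
    L = np.diag(A.sum(axis=1)) - A
    L_plus = np.linalg.pinv(L)
    diag = np.diag(L_plus)
    return diag[:, None] + diag[None, :] - 2.0 * L_plus

def closeness(distance_matrix):
    """Inverse of the row sums, as in Eqs. (2.13) and (2.14)."""
    return 1.0 / distance_matrix.sum(axis=1)

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)

for name, A in [("path", path), ("triangle", triangle)]:
    cc = closeness(shortest_path_distances(A))
    rcc = closeness(resistance_distances(A))
    print(name, np.round(cc, 4), np.round(rcc, 4))
```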

Other definitions needed in the current work are the following.

Definition 6

Let \(D=\left( d_{ij}\right) \) be an \(n\times n\) matrix. Then, D is called a Euclidean distance matrix (EDM) if there exist points \(p^{1},p^{2},\ldots ,p^{n}\) in some Euclidean space \({\mathbb {R}}^{r}\), such that

$$\begin{aligned} d_{ij}=\left\| p^{i}-p^{j}\right\| ^{2}, \end{aligned}$$
(2.15)

for all \(i,j=1,\ldots ,n\), where \(\Vert \, \Vert \) denotes the Euclidean norm. An EDM D is called a spherical EDM or circum-EDM if the points \(p^{1},p^{2},\ldots ,p^{n}\) that generate D lie on a hypersphere (Tarazaga et al. 1996; Alfakih 2006). The matrix having entries given by \( {\xi }_{vw}\) is a spherical EDM (Estrada et al. 2014).

Let us now consider some definitions on graph automorphism which will be used in this work (see Watkins 2014).

Definition 7

Let \(G=\left( V,E\right) \) be a simple graph. A permutation \(\alpha \) of V is an automorphism of G if

$$\begin{aligned} \left\{ v,w\right\} \in E\Longleftrightarrow \left\{ \alpha \left( v\right) ,\alpha \left( w\right) \right\} \in E, \end{aligned}$$

for all \(v,w\in V\).

Definition 8

The set of all automorphisms of G, together with the operation of composition of functions, forms a subgroup of the symmetric group of V,  which is called the automorphism group of G, and it is denoted by \(\text {Aut}\left( G\right) \).

Definition 9

Two vertices v and w in a graph G are identical (usually called similar (Harary and Palmer 1966) in graph theory) if some automorphism of G maps v onto w. If v and w are identical vertices in G then the graphs \(G-v\) and \(G-w\), obtained by deleting v and w, respectively, from G, are isomorphic. However, the converse is not true because it is possible for \(G-v\) and \(G-w\) to be isomorphic even when v and w are not identical. In this case the nodes are pseudoidentical (usually called pseudosimilar (Godsil and Kocay 1982; Lauri 1997) in graph theory).

Definition 10

A graph in which there is no pair of identical vertices is known as asymmetric or identity graph (Albertson and Collins 1996). Let \(\iota \) be the identity of any group of permutations. Then, the graph G is asymmetric if \(\iota \) is its only automorphism.
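Definitions 9 and 10 can be made concrete with a brute-force computation of the classes of identical vertices. The sketch below uses only the Python standard library and hypothetical helper names; the two test graphs (the path on three vertices and a tree with three branches of lengths 1, 2 and 3 attached to a central vertex) are illustrative choices and are not taken from the analysis in this paper. Enumerating all permutations is only feasible for very small graphs.

```python
from itertools import permutations

def automorphisms(n, edges):
    """Brute-force list of the automorphisms of a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return [perm for perm in permutations(range(n))
            if {frozenset((perm[v], perm[w])) for v, w in edges} == edge_set]

def identical_classes(n, edges):
    """Partition the vertices into classes of identical (similar) vertices."""
    autos = automorphisms(n, edges)
    # The class of v collects the images of v under every automorphism.
    classes = {frozenset(perm[v] for perm in autos) for v in range(n)}
    return sorted(sorted(c) for c in classes)

path_edges = [(0, 1), (1, 2)]
# Central vertex 0 with branches 1, 2-3 and 4-5-6 of different lengths.
spider_edges = [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5), (5, 6)]

print(identical_classes(3, path_edges))    # [[0, 2], [1]]
print(identical_classes(7, spider_edges))  # seven singleton classes
```

In the path the two end vertices are identical, whereas the second graph has the identity as its only automorphism and is therefore asymmetric.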

3 Communicability cosine distance

The communicability distance \(\xi _{vw}\) between two vertices v and w, previously defined and studied, is an unbounded magnitude. In contrast, the communicability angle \(\theta _{vw}\) is always bounded between \(0^\circ \) and \(90^\circ \) for unweighted graphs (Estrada and Hatano 2016). This boundedness may represent an interesting characteristic for comparing properties within and between graphs. Therefore, here we propose a transformation of the communicability angle into a communicability cosine distance which is bounded and displays a few mathematically interesting properties. Let us start with the following.

Definition 11

Let \(\cos \theta _{vw}\) be the cosine of the communicability angle. Then

$$\begin{aligned} D_{vw}=2-2\cos \theta _{vw}, \end{aligned}$$
(3.1)

is the squared communicability cosine distance (CCD) between the nodes v and w in G.

Definition 12

The CCD matrix D is the square, symmetric matrix whose vw entry is \(D_{vw}\).

Now, we will prove some of the mathematical properties of the CCD.

Lemma 2

The CCD is a squared Euclidean distance between the corresponding pair of nodes in G.

Proof

Let us write

$$\begin{aligned} D_{vw}=2-2\dfrac{x_{v}\cdot x_{w}}{\left\| x_{v}\right\| \left\| x_{w}\right\| }, \end{aligned}$$
(3.2)

which can be expressed as

$$\begin{aligned} \begin{aligned}D_{vw}&=\dfrac{x_{v}\cdot x_{v}}{\left\| x_{v}\right\| ^{2}}+\dfrac{x_{w}\cdot x_{w}}{\left\| x_{w}\right\| ^{2}}-2\dfrac{x_{v}\cdot x_{w}}{\left\| x_{v}\right\| \left\| x_{w}\right\| }\\&=\left\| \dfrac{x_{v}}{\left\| x_{v}\right\| }-\dfrac{x_{w}}{\left\| x_{w}\right\| }\right\| ^{2}, \end{aligned} \end{aligned}$$
(3.3)

which proves the result. \(\square \)
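Definition 11 and Lemma 2 can be illustrated with a short computation. The sketch below assumes NumPy; the function names are hypothetical and the star graph with three leaves is an arbitrary example. It builds the CCD matrix from the communicability matrix and compares it with the squared distances between the normalized position vectors of Eq. (3.3).

```python
import numpy as np

def ccd_matrix(A, beta=1.0):
    """Squared communicability cosine distances D_vw = 2 - 2 cos(theta_vw)."""
    eigenvalues, U = np.linalg.eigh(A)
    G = (U * np.exp(beta * eigenvalues)) @ U.T
    g = np.diag(G)
    D = 2.0 - 2.0 * G / np.sqrt(np.outer(g, g))
    np.fill_diagonal(D, 0.0)                      # remove rounding noise on the diagonal
    return D

def normalized_positions(A, beta=1.0):
    """Columns are the unit vectors x_v / ||x_v|| appearing in Eq. (3.3)."""
    eigenvalues, U = np.linalg.eigh(A)
    X = np.exp(beta * eigenvalues / 2)[:, None] * U.T
    return X / np.linalg.norm(X, axis=0)

# Star graph with center 0 and leaves 1, 2, 3 (illustrative choice).
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)

D = ccd_matrix(A)
Y = normalized_positions(A)
diff = Y[:, :, None] - Y[:, None, :]              # diff[:, v, w] = y_v - y_w
squared_dist = (diff ** 2).sum(axis=0)
print(np.allclose(D, squared_dist))
```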

Remark 2

The CCD is not ultrametric. A metric is called ultrametric if the strong triangle inequality holds, i.e., if \(d\left( x,z\right) \le \max \left[ d\left( x,y\right) ,d\left( y,z\right) \right] \) (Murtagh 2006). However, this condition is not fulfilled by the CCD, as can be seen in the following counterexample. Let G be the tadpole graph, i.e., the graph formed by a triangle and a pendant vertex. Let the pendant vertex be labeled as 1, the triangle vertex connected to it as 2, and the other two vertices as 3 and 4. Then, the CCD are: \(D_{1,3}\approx 2.771\), \(D_{1,2}\approx 0.790\), and \(D_{2,3}\approx 1.561\), which clearly obey the triangle inequality but not the strong triangle inequality. However, we should stress a few remarks here. First, it was proved by Lemin (1985) that every finite ultrametric space consisting of \(n+1\) (distinct) points can be isometrically embedded into an n-dimensional Euclidean space. Thus, it is an open problem to determine whether there is an ultrametric space which can give rise to the Euclidean space induced by CCD. Second, although the space induced by CCD is not ultrametric, there are ultrametric sets in such a space. An ultrametric set is a finite set S of points in a Euclidean space whose mutual distances satisfy the ultrametric inequality (Fiedler 1998). For instance, in the tadpole graph the triangle forms an ultrametric set. Therefore, the characterization of the properties of ultrametric sets of CCD in graphs is an open question, e.g., what is the size of the largest ultrametric set generated by CCD in specific types of graphs? Finally, Zubarev (2014) has proved that a matrix of Euclidean distances on a set of specially distributed random points in the n-dimensional Euclidean space \({\mathbb {R}}^{n}\) converges in probability to an ultrametric matrix as \(n\rightarrow \infty \). 
Thus, it should be interesting to know how fast the CCD converges to an ultrametric distance when the size of the graph increases.
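The failure of the strong triangle inequality in the tadpole graph can be checked with a small script. The sketch below assumes NumPy, \(\beta =1\) and the squared distance of Definition 11, with vertex labels shifted to start at 0; the entries of D depend on these conventions, so only the inequality itself is tested here. The helper names are hypothetical. The strong triangle inequality is unchanged by taking square roots, so testing it on D or on \(\sqrt{D}\) gives the same answer.

```python
import numpy as np
from itertools import permutations

def ccd_matrix(A, beta=1.0):
    """Squared communicability cosine distances D_vw = 2 - 2 cos(theta_vw)."""
    eigenvalues, U = np.linalg.eigh(A)
    G = (U * np.exp(beta * eigenvalues)) @ U.T
    g = np.diag(G)
    D = 2.0 - 2.0 * G / np.sqrt(np.outer(g, g))
    np.fill_diagonal(D, 0.0)
    return D

def ultrametric_violations(D, tol=1e-12):
    """Ordered triples (x, y, z) with D[x, z] > max(D[x, y], D[y, z])."""
    n = D.shape[0]
    return [(x, y, z) for x, y, z in permutations(range(n), 3)
            if D[x, z] > max(D[x, y], D[y, z]) + tol]

# Tadpole graph: pendant vertex 0, attached to triangle vertex 1,
# remaining triangle vertices 2 and 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

D = ccd_matrix(A)
violations = ultrametric_violations(D)
triangle = [1, 2, 3]
triangle_is_ultrametric = not ultrametric_violations(D[np.ix_(triangle, triangle)])
print(len(violations) > 0, triangle_is_ultrametric)
```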

Theorem 1

The CCD matrix is a spherical EDM.

Proof

First, let us write the matrix D as follows:

$$\begin{aligned} D=2\left( uu^\textrm{T}-S^{-1/2}e^{A}S^{-1/2}\right) , \end{aligned}$$
(3.4)

where S is the diagonal matrix with diagonal entries \(S_{vv}=\left( e^{A}\right) _{vv}\).

To prove that D is circum-Euclidean it is needed to prove that \(D^{-1}\) exists and \(u^\textrm{T}D^{-1}u>0\). Since \(\tfrac{1}{2}D=uu^\textrm{T}-S^{-1/2}e^{A}S^{-1/2}\) is a rank-one update of an invertible matrix, its inverse \(\left( \tfrac{1}{2}D\right) ^{-1}=2D^{-1}\) can be found using the Sherman–Morrison formula:

$$\begin{aligned} 2D^{-1}=-S^{1/2}e^{-A}S^{1/2}-\dfrac{S^{1/2}e^{-A}S^{1/2}uu^\textrm{T}S^{1/2}e^{-A}S^{1/2}}{1-u^\textrm{T}S^{1/2}e^{-A}S^{1/2}u}. \end{aligned}$$
(3.5)

To prove that \(D^{-1}\) exists it is only needed to prove that \(u^\textrm{T}S^{1/2}e^{-A}S^{1/2}u\ne 1\). Then, let \(J=uu^\textrm{T}\) and write

$$\begin{aligned} \begin{aligned}a&=u^\textrm{T}S^{1/2}e^{-A}S^{1/2}u\\&=\textrm{Tr}\left( S^{1/2}e^{-A}S^{1/2}J\right) \\&=\textrm{Tr}\left( e^{-A}S^{1/2}JS^{1/2}\right) \\&=\textrm{Tr}\left( e^{-A}Z\right) , \end{aligned} \end{aligned}$$
(3.6)

where \(Z=S^{1/2}JS^{1/2}\).

Now, we will use Ruhe’s trace inequality (Ruhe 1970) to find

$$\begin{aligned} \begin{aligned}\textrm{Tr}\left( e^{-A}Z\right)&\ge \sum _{j=1}^{n}\lambda _{j}\left( Z\right) \lambda _{n-j+1}\left( e^{-A}\right) \\&=\lambda _{1}\left( Z\right) \lambda _{n}\left( e^{-A}\right) , \end{aligned} \end{aligned}$$
(3.7)

where \(\lambda _{j}\left( Z\right) =\left\{ \lambda _{1}\left( Z\right) ,0,\ldots ,0\right\} \) are the eigenvalues of Z. It is straightforward to realize that \(\lambda _{1}\left( Z\right) =EE\left( G\right) =\textrm{Tr}\left( e^{A}\right) \), which is known as the Estrada index of the graph (see Estrada 2022), and \(\lambda _{n}\left( e^{-A}\right) =e^{-\lambda _{1}\left( A\right) }\), where \(\lambda _{1}\left( A\right) \) is the spectral radius of A. Therefore,

$$\begin{aligned} \begin{aligned} a=\hbox {Tr}\left( e^{-A}Z\right)&\ge e^{-\lambda _{1}\left( A\right) }\textrm{Tr}\left( e^{A}\right) =e^{-\lambda _{1}\left( A\right) }\sum _{j=1}^{n}e^{\lambda _{j}\left( A\right) }\\&=\sum _{j=1}^{n}e^{\lambda _{j}\left( A\right) -\lambda _{1}\left( A\right) }=1+\sum _{j=2}^{n}e^{\lambda _{j}\left( A\right) -\lambda _{1}\left( A\right) }>1, \end{aligned} \end{aligned}$$
(3.8)

with the last inequality due to the fact that the graph is not trivial.

Now, let us prove that \(u^\textrm{T}D^{-1}u>0\), for which we will start by writing

$$\begin{aligned} \begin{aligned} 2u^\textrm{T}D^{-1}u&=-u^\textrm{T}S^{1/2}e^{-A}S^{1/2}u-\dfrac{u^\textrm{T}S^{1/2}e^{-A}S^{1/2}uu^\textrm{T}S^{1/2}e^{-A}S^{1/2}u}{1-u^\textrm{T}S^{1/2}e^{-A}S^{1/2}u}\\&=-a-\dfrac{a^{2}}{1-a}=\dfrac{a}{a-1}>0, \end{aligned} \end{aligned}$$
(3.9)

where the last inequality is due to the fact that \(a>1\) as proved before. \(\square \)

Remark 3

The previous result indicates that the CCD induces an embedding of a graph into a Euclidean n-dimensional sphere of radius (see Estrada et al. 2014):

$$\begin{aligned} R=\sqrt{\dfrac{1}{2}\left( u^\textrm{T}D^{-1}u\right) ^{-1}}=\sqrt{\dfrac{a-1}{a}}. \end{aligned}$$
(3.10)

In a recent paper (Estrada 2023) we have proved that every circum-EDM is the effective resistance matrix of a graph with appropriate edge weights.
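The conditions used in the proof of Theorem 1 can be checked numerically on a small graph. The sketch below assumes NumPy and \(\beta =1\), and uses the graph formed by a triangle and a pendant vertex as an arbitrary example. It evaluates a and \(u^\textrm{T}D^{-1}u\), computes the radius from the first expression in Eq. (3.10), and compares it, as an independent geometric check, with the distance from every normalized position vector to the point of their affine hull closest to the origin (an assumption of this sketch about where the circumcenter lies, which holds because all points have unit norm).

```python
import numpy as np

# Triangle 1-2-3 plus pendant vertex 0 attached to vertex 1 (illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = A.shape[0]
u = np.ones(n)

eigenvalues, U = np.linalg.eigh(A)
G = (U * np.exp(eigenvalues)) @ U.T
s = np.sqrt(np.diag(G))                       # diagonal of S^{1/2}
M = G / np.outer(s, s)                        # S^{-1/2} e^{A} S^{-1/2}
D = 2.0 * (np.outer(u, u) - M)                # Eq. (3.4)

a = u @ np.linalg.solve(M, u)                 # a = u^T S^{1/2} e^{-A} S^{1/2} u
quad = u @ np.linalg.solve(D, u)              # u^T D^{-1} u
radius = np.sqrt(1.0 / (2.0 * quad))          # first expression in Eq. (3.10)

# Independent geometric check with the embedded points themselves.
Y = np.exp(eigenvalues / 2)[:, None] * U.T
Y = Y / np.linalg.norm(Y, axis=0)             # columns are the unit vectors x_v / ||x_v||
center = Y @ np.linalg.solve(M, u) / a        # point of the affine hull closest to the origin
distances = np.linalg.norm(Y - center[:, None], axis=0)
print(a > 1, quad > 0, np.allclose(distances, radius))
```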

4 Communicability cosine distance as vertex similarity

It is obvious that the EDM of communicability cosine distances represents a dissimilarity matrix whose off-diagonal entries represent the dissimilarity between the corresponding pairs of nodes. Such dissimilarity accounts for how well communicated a pair of nodes is in the network, based on the ratio of the number of weighted walks connecting them to the number of weighted closed walks that start and end at the corresponding nodes. Here, the weight is given by the inverse of the factorial of the length of the walk.

The use of different graph measures to account for vertex similarities is not new. In fact, the use of the so-called structural equivalence, where two vertices sharing many of the same network neighbors are considered structurally equivalent, dates back to the 1970s (Lorrain and White 1971). Further developments include the regular equivalence, in which vertices are said to be similar if they are connected to other vertices that are themselves similar (Borgatti and Everett 1993; Batagelj et al. 1992). More recent approaches include the use of algebraic methods, such as the method proposed by Blondel et al. (2004) as a generalization of Kleinberg’s “hub and authority” method (Kleinberg 1999), and the method proposed by Leicht, Holme and Newman (LHN) (Leicht et al. 2006) on the basis of the so-called Katz centrality index (Katz 1953). In this last approach two vertices are similar if their immediate neighbors in the network are themselves similar, which leads to a self-consistent matrix formulation of similarity that can be evaluated iteratively using only knowledge of the adjacency matrix of the network. LHN (Leicht et al. 2006) tested their measure, among other networks, for extracting sensible synonyms of words from a network representing the structure of Roget’s Thesaurus [55], similarly to what Blondel et al. (2004) had done before. In general, the authors considered that this new measure is capable of extracting useful information about vertex similarity based on network topology and that it displays some advantages in relation to previously defined measures. Therefore, we will concentrate here on the comparison between the CCD dissimilarity and that of LHN. More formally, the similarity matrix proposed by LHN (Leicht et al. 2006) is as follows.

Definition 13

Let G be a graph with adjacency matrix A and let K be the diagonal matrix of vertex degrees. The LHN similarity between two nodes v and w is given by the vw entry of the following matrix

$$\begin{aligned} {\mathcal {N}}=2m\lambda _{1}K^{-1}\left( I-\dfrac{\alpha }{\lambda _{1}}A\right) ^{-1}K^{-1}, \end{aligned}$$
(4.1)

where m is the number of edges, \(\lambda _{1}\) is the spectral radius of A, I is the identity matrix and \(0<\alpha <1\) is an empirical parameter.
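Equation (4.1) can be evaluated directly for small graphs. The following sketch assumes NumPy and uses a dense matrix inverse, which is only reasonable for small networks; the function name is hypothetical, and the default \(\alpha =0.97\) and the example graph (a triangle with a pendant vertex) are illustrative choices.

```python
import numpy as np

def lhn_similarity(A, alpha=0.97):
    """LHN similarity matrix of Eq. (4.1), computed with a dense inverse."""
    n = A.shape[0]
    degrees = A.sum(axis=1)                       # diagonal of K
    m = A.sum() / 2.0                             # number of edges
    lambda_1 = np.linalg.eigvalsh(A)[-1]          # spectral radius of A
    core = np.linalg.inv(np.eye(n) - (alpha / lambda_1) * A)
    # K^{-1} core K^{-1} divides entry (v, w) by k_v * k_w.
    return 2.0 * m * lambda_1 * core / np.outer(degrees, degrees)

# Triangle 1-2-3 plus pendant vertex 0 attached to vertex 1 (illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

N = lhn_similarity(A)
print(np.round(N, 3))
```

Because \(0<\alpha <1\), the matrix \(I-\tfrac{\alpha }{\lambda _{1}}A\) is invertible, so the dense inverse used here always exists.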

LHN (Leicht et al. 2006) performed a brief analysis of the parameter \(\alpha \) in their paper, determining empirically that a value of \(\alpha =0.97\) was the most appropriate for the analysis of vertex similarities in networks. Because \(0\le D_{vw}\le 2\), we will consider here \({\mathcal {D}}_{vw}=\tfrac{1}{2}D_{vw}\), which is then bounded as \(0\le {\mathcal {D}}_{vw}\le 1\), as required of any dissimilarity metric. We then apply the two measures \({\mathcal {D}}_{vw}\) and \({\mathcal {N}}_{vw}\) to the analysis of similarities between words in the Roget Thesaurus network. It should be noticed from Eq. (4.1) that the diagonal entries of \({\mathcal {N}}\) are not equal to one, as is common in similarity matrices. However, this should not be a problem if \({\mathcal {N}}_{vv}>{\mathcal {N}}_{vw}\) for all \(w\in V\) with \(w\ne v\). That is, a vertex is more similar to itself than to any other vertex in the graph. This is indeed observed for all the vertices in the network of the Roget Thesaurus. For the comparison of the two approaches we have considered the same words analyzed by LHN (Leicht et al. 2006) and the results are displayed in Table 1.

Table 1 Top five most similar words found by the LHN similarity measure, \({\mathcal {N}}_{vw}\), and by the communicability cosine dissimilarity, \({\mathcal {D}}_{vw}\), for the words “alarm”, “hell”, “mean” and “water”

As can be seen in Table 1 there are some differences between the rankings of words produced by the two approaches, but in general they are qualitatively very similar. For instance, for “alarm” both methods identify “warning” as the most similar word and then rank “omen” and “indication” in the top five. For “hell”, LHN identifies “heaven” as the most similar word while the new method identifies “pain”, which is ranked as the second most similar by LHN. For “mean” the words in the top five according to the two methods differ, but they are closely related words, which is understandable given the several meanings of the word “mean”. Finally, for “water” both methods coincide in “plunge” as the top-ranked similar word, and also coincide in placing “moisture”, “insertion” and “river” in the top five.

However, an important difficulty can be foreseen for the general use of the LHN method (Leicht et al. 2006) in real-world networks. The analysis of formula (4.1) indicates that if a network has a very large spectral radius of the adjacency matrix, i.e., \(\lambda _{1}\gg 1\), then \(\alpha /\lambda _{1}\rightarrow 0\), so that the influence of the whole topology of the network, accounted for by the adjacency matrix, disappears from the similarity matrix. That is, in a network in which \(\lambda _{1}\) is relatively large, it is expected that \({\mathcal {N}}_{vw}\sim \tfrac{2m\lambda _{1}}{k_{v}k_{w}}\), where \(k_{v}\) is the degree of the vertex v. In this case the similarity between the two vertices will depend only on their respective degrees. The Roget Thesaurus network has \(\lambda _{1}\approx 12.027\), which, according to the results previously obtained, still allows the LHN formula to capture the influence of the topology of the network. Let us then consider another network, for which \(\lambda _{1}\approx 44.303\). This network is a representation of the Online Dictionary for Library and Information Science (ODLIS) by Reitz (2004), i.e., a dictionary specialized in library and information science.

The first curious result obtained with the LHN method for this network is the following. Now, it is not always true that \({\mathcal {N}}_{vv}>{\mathcal {N}}_{vw}\). That is, there are some vertices for which other vertices are more similar to them than they are to themselves. This is certainly odd! For instance, for “homepage”, LHN finds that there are 22 other words more similar to it than the word “homepage” itself. As another example, LHN finds that the words “data” and “queue” are more similar than “data” is to itself. For the word “book” there are 1682 words more similar to it than the word “book” itself, and for the word “work” there are 1378 words in the same situation. Why is this happening? Simply because in this case \(\lambda _{1}\) is sufficiently large for \(\alpha /\lambda _{1}\rightarrow 0\), implying that \({\mathcal {N}}_{vw}\sim \tfrac{2m\lambda _{1}}{k_{v}k_{w}}\). Then, if the vertex v has a relatively large degree, it will have more similarity with those vertices w displaying a very low degree, e.g., \(k_{w}=1\). And this is exactly what happens for words like “work” (\(k=380\)), “text” (\(k=246\)) or “homepage” (\(k=150\)), just to give three examples. In Table 2 we give the top seven most similar words according to LHN and to the CCD dissimilarity.

Table 2 Top most similar words in ODLIS to the words “work”, “text”, “homepage” and “book” according to LHN method and the current approach

As can be seen in Table 2 there are five words which are repeated among the top seven found as the most similar ones for “work”, “text” and “homepage”. These words, according to their definitions in ODLIS, are completely unrelated to the target words, as can be seen in Table 3. What happens is that these five words have degree one in the ODLIS network. Then, they are found as “most similar” simply because the similarity is given here by \({\mathcal {N}}_{vw}\sim \tfrac{2m\lambda _{1}}{k_{v}k_{w}}\), such that a low degree of vertex w increases its similarity with vertex v. The words “juvenalia” and “potboiler” are among the top most similar ones to “work” because they have degree one and are connected to “work” in the network. The same happens for “DEMCO”, “Highsmith Inc.” and “Brodart” in relation to “homepage”. However, because the word “text” has no word of degree one connected to it, LHN selects any word with degree one as the most similar to it.

Table 3 Definitions given at ODLIS for the words ubiquitously found by LHN approach as most similar to several nonrelated words

On the contrary, all words found by the CCS approach are related to the target ones, and indeed in all cases the target word appears in the definition of the word found. For instance, the word “original” appears in ODLIS as: “In literature, a work as written by the author or in the author’s own words. In art, a finished work as completed by the artist and ready for reproduction”, and the phrase “direct edition” appears as: “An edition of a work for which the author provides the publisher with camera-ready copy produced on a computer with the aid of word processing software. Used mainly for works that cannot be produced economically from type.”

5 Cosine-distance closeness centrality

Here we propose an analogue of the closeness centrality index based on the CCD between a pair of nodes in a graph.

Definition 14

Let \(D_{vw}\) be the CCD between the nodes v and w. Then,

$$\begin{aligned} C_{v}=\left( \sum _{w=1}^{n}D_{vw}\right) ^{-1}, \end{aligned}$$
(5.1)

is the communicability closeness centrality (CCC) of the vertex v.
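As an illustration of Definition 14, the sketch below computes the CCC of every vertex from an adjacency matrix. It assumes, in line with the expressions used later in this section, that \(D_{vw}=2-2\cos \theta _{vw}\) with \(\cos \theta _{vw}=G_{vw}/\sqrt{G_{vv}G_{ww}}\) and \(G=\exp (A)\). The function names are illustrative, and the matrix exponential is obtained from the eigendecomposition of the symmetric matrix A rather than from a dedicated routine.

```python
import numpy as np

def communicability(A):
    """G = exp(A) for a symmetric adjacency matrix, via its eigendecomposition."""
    lam, V = np.linalg.eigh(A)
    return (V * np.exp(lam)) @ V.T

def cosine_distance(A):
    """D_vw = 2 - 2 cos(theta_vw), with cos(theta_vw) = G_vw / sqrt(G_vv G_ww)."""
    G = communicability(A)
    d = np.sqrt(np.diag(G))
    D = 2.0 - 2.0 * G / np.outer(d, d)
    np.fill_diagonal(D, 0.0)             # remove round-off on the diagonal
    return D

def ccc(A):
    """Communicability closeness centrality, Eq. (5.1)."""
    return 1.0 / cosine_distance(A).sum(axis=1)

# Path graph on three vertices as a small illustration.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
C = ccc(A)
print(C)
```

For this path the two end vertices receive the same value and the middle vertex the largest one, as expected from the symmetry of the graph.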

The closeness centrality measures, such as CC and RCC, are designed to account for the importance of a vertex in a graph in terms of its proximity to the rest of the vertices. These indices are therefore useful for comparing different vertices in the same graph. Here, however, we analyze the change of the centrality of a given vertex when the size of the graph grows. The main goal of this exercise is to show some significant differences between CCC and the “classical” closeness centrality measures. We start with the analysis of cycle graphs.

Lemma 3

Let \(C_{n}\) be a cycle graph with n nodes. Let \(I_{\nu }\left( z\right) \) be the modified Bessel function of the first kind, and let

$$\begin{aligned} {\tilde{C}}_{v}\left( C_{n}\right) =\dfrac{I_{0}\left( 2\right) }{2\left( nI_{0}\left( 2\right) -e^{2}\right) }. \end{aligned}$$
(5.2)

Then,

$$\begin{aligned} \lim _{n\rightarrow \infty }\dfrac{C_{v}\left( C_{n}\right) }{{\tilde{C}}_{v}\left( C_{n}\right) }=1. \end{aligned}$$
(5.3)

Proof

The communicability between a pair of nodes v and w and the subgraph centrality of a given node in \(C_{n}\) are, respectively

$$\begin{aligned} G_{vw}\left( C_{n}\right)= & {} \frac{1}{n}\sum _{j=0}^{n-1}\exp \left( 2\cos \left( \frac{2j\pi }{n}\right) \right) \cos \left( \frac{2j\pi \left( v-w\right) }{n}\right) , \end{aligned}$$
(5.4)
$$\begin{aligned} G_{vv}\left( C_{n}\right)= & {} \frac{1}{n}\sum _{j=0}^{n-1}\exp \left( 2\cos \left( \frac{2j\pi }{n}\right) \right) . \end{aligned}$$
(5.5)

Let

$$\begin{aligned} {\tilde{G}}_{vw}\left( C_{n}\right):= & {} \frac{1}{\pi }\int _{0}^{\pi }\exp \left( 2\cos \left( \phi \right) \right) \cos \left( \phi \left( v-w\right) \right) \textrm{d}\phi =I_{d_{v,w}}\left( 2\right) , \end{aligned}$$
(5.6)
$$\begin{aligned} {\tilde{G}}_{vv}\left( C_{n}\right):= & {} \frac{1}{\pi }\int _{0}^{\pi }\exp \left( 2\cos \left( \phi \right) \right) \textrm{d}\phi =I_{0}\left( 2\right) , \end{aligned}$$
(5.7)

where \(d_{v,w}\) is the shortest path distance between the two nodes and \(\phi =2\pi j/n\).

Then, we define

$$\begin{aligned} \cos {\tilde{\theta }}_{vw}\left( C_{n}\right) :=\dfrac{{\tilde{G}}_{vw}\left( C_{n}\right) }{\sqrt{{\tilde{G}}_{vv}\left( C_{n}\right) {\tilde{G}}_{ww}\left( C_{n}\right) }}=\dfrac{I_{d_{v,w}}\left( 2\right) }{I_{0}\left( 2\right) }. \end{aligned}$$
(5.8)

It is easy to see that \(\lim _{n\rightarrow \infty }{\tilde{G}}_{vw}\left( C_{n}\right) /G_{vw}\left( C_{n}\right) =1\) and \(\lim _{n\rightarrow \infty }{\tilde{G}}_{vv}\left( C_{n}\right) /G_{vv}\left( C_{n}\right) =1\), such that \(\lim _{n\rightarrow \infty }\cos {\tilde{\theta }}_{vw}\left( C_{n}\right) /\cos \theta _{vw}\left( C_{n}\right) =1\).

Therefore, let

$$\begin{aligned} {\tilde{C}}_{v}\left( C_{n}\right) =\dfrac{1}{\sum _{w\ne v}\left( 2-2\cos {\tilde{\theta }}_{vw}\left( C_{n}\right) \right) }=\dfrac{1}{2\left( n-1\right) -2\sum _{w\ne v}\cos {\tilde{\theta }}_{vw}\left( C_{n}\right) }. \end{aligned}$$
(5.9)

Let us consider again that the cycle is of even length, such that

$$\begin{aligned} \sum _{w\ne v}\cos {\tilde{\theta }}_{vw}\left( C_{n}\right) =\dfrac{2}{I_{0}\left( 2\right) }\sum _{k=1}^{n/2}I_{k}\left( 2\right) . \end{aligned}$$
(5.10)

Then, because \(e^{z}=I_{0}\left( z\right) +2\sum _{r=1}^{\infty }I_{r}\left( z\right) \), we have that

$$\begin{aligned} \lim _{n\rightarrow \infty }{\tilde{C}}_{v}\left( C_{n}\right) =\dfrac{I_{0}\left( 2\right) }{2\left( nI_{0}\left( 2\right) -e^{2}\right) }, \end{aligned}$$
(5.11)

which finally proves the result. \(\square \)

Example 1

Let us consider the cycles \(C_{n}\) for \(4\le n\le 12\) and obtain the value of \({\tilde{C}}_{v}\left( C_{n}\right) \) using the previous result as well as \(C_{v}\left( C_{n}\right) \) using the function ’expm(A)’ implemented in Matlab. The results are given in Table 4. First, it is observed that the values of \({\tilde{C}}_{v}\left( C_{n}\right) \) converge to those of \(C_{v}\left( C_{n}\right) \) when the size of the graph is still relatively small.

In this case the three closeness centrality indices, CC, RCC and CCC, drop as the graph size increases. The fastest drop is observed for CC, where \(\textrm{CC}_{v}\left( C_{n}\right) =\dfrac{4}{n^{2}}\) if n is even, or \(\textrm{CC}_{v}\left( C_{n}\right) =\dfrac{4}{n^{2}-1}\) if it is odd. It is followed by RCC, where \(\textrm{RCC}_{v}\left( C_{n}\right) =\dfrac{6}{n^{2}-1}\), and finally by CCC, where \({\tilde{C}}_{v}\left( C_{n}\right) \approx \dfrac{2.2796}{4.5592n-14.7781}\).
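The comparison of Example 1 can be sketched in Python instead of Matlab. The code below is an illustration under stated assumptions: \(I_{0}(2)\) is evaluated from its power series, the matrix exponential is obtained from the eigendecomposition of A, and all function names are hypothetical.

```python
import math
import numpy as np

def bessel_i0(z, terms=40):
    """Modified Bessel function I_0(z) from its power series."""
    return sum((z / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def cycle_adjacency(n):
    A = np.zeros((n, n))
    for v in range(n):
        A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0
    return A

def ccc(A):
    """CCC of Eq. (5.1) with D_vw = 2 - 2 G_vw / sqrt(G_vv G_ww), G = exp(A)."""
    lam, V = np.linalg.eigh(A)
    G = (V * np.exp(lam)) @ V.T
    d = np.sqrt(np.diag(G))
    D = 2.0 - 2.0 * G / np.outer(d, d)
    np.fill_diagonal(D, 0.0)
    return 1.0 / D.sum(axis=1)

def ccc_cycle_asymptotic(n):
    """Asymptotic expression of Lemma 3, Eq. (5.2)."""
    i0 = bessel_i0(2.0)
    return i0 / (2.0 * (n * i0 - math.exp(2.0)))

rows = [(n, ccc(cycle_adjacency(n))[0], ccc_cycle_asymptotic(n)) for n in range(4, 13)]
for n, exact, approx in rows:
    print(f"n={n:2d}  C_v={exact:.6f}  C~_v={approx:.6f}  ratio={exact / approx:.6f}")
```

The printed ratio approaches one quickly as n grows, which is the convergence stated in Lemma 3.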

Table 4 Values of the cosine distance closeness centrality obtained using Matlab function ’expm(A)’, \(C_{v}\left( C_{n}\right) \), as well as using the results of Lemma 3, \({\tilde{C}}_{v}\left( C_{n}\right) \)

Fig. 1 Plot of the change of the closeness centrality indices considered in this work with the number of vertices in complete graphs

Let us now consider the complete graphs with n vertices.

Lemma 4

Let \(K_{n}\) be the complete graph with n nodes. Then,

$$\begin{aligned} C_{v}\left( K_{n}\right) =\dfrac{e^{n}+n-1}{2n\left( n-1\right) }. \end{aligned}$$
(5.12)

Proof

The eigenvalues of the adjacency matrix of \(K_{n}\) are \(n-1\) with multiplicity 1 and \(-1\) with multiplicity \(n-1\). We thereby have

$$\begin{aligned} G_{vv}= & {} \dfrac{1}{ne}\left( e^{n}+n-1\right) , \end{aligned}$$
(5.13)
$$\begin{aligned} G_{vw}= & {} \dfrac{1}{ne}\left( e^{n}-1\right) , \end{aligned}$$
(5.14)

therefore, the cosine of the communicability angle between any pair of vertices is

$$\begin{aligned} \cos \theta _{vw}=\dfrac{e^{n}-1}{e^{n}+n-1}. \end{aligned}$$
(5.15)

Then the result follows by substitution in the formula for the CCC.\(\square \)

Example 2

Let us now consider all the complete graphs with \(3\le n\le 12\). In Fig. 1 we illustrate the values of CCC for a vertex of \(K_{n}\) as well as the values of CC and RCC. The classical closeness centrality takes values \(\hbox {CC}_{v}\left( K_{n}\right) =\left( n-1\right) ^{-1}\), which clearly decays to zero when \(n\rightarrow \infty \). The shortest path distance between two nodes in \(K_{n}\) is always one. In the case of the RCC there are two competing factors. When the size of the graph increases, the number of vertices that can be reached in one step from a given node also increases. However, due to the increase in the number of walks, the length of these edges drops as \(\Omega _{ij}\left( K_{n}\right) =2/n\). The resulting effect is a domination of the first of the two mentioned factors, which means that RCC drops with the increase of size, \(\hbox {RCC}_{v}\left( K_{n}\right) =n/\left( 2\left( n-1\right) \right) \), reaching the asymptotic value of 1/2 as the size grows. As a result of the second factor, RCC drops more slowly than CC, as can be seen in Fig. 1. Finally, in the case of CCC the competition between the two previously mentioned factors is won by the drop of the edge length, which decays exponentially in this case, \(D_{uv}=\dfrac{2n}{e^{n}+n-1}\). As a result, CCC increases monotonically with the size of the complete graphs. From an application point of view it means that a node in \(K_{n}\) becomes more central as n grows because it can reach more neighbors with relatively little effort due to the contraction of the edge length separating them. Such contraction of edge lengths is due to the fact that more walks exist to go from one vertex to another, which can be used as ways of communication between such pairs of vertices. Geometrically this means that the radius of the hypersphere in which \(K_{n}\) is embedded decays exponentially as n increases. 
Empirically we have found that \(R\approx 2.167\exp \left( -0.3856n\right) \) with a correlation coefficient of 0.9981 for \(4\le n\le 12\).
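The closed form of Lemma 4 and the trends discussed in Example 2 can be checked numerically. The following sketch assumes the same definition of CCC as above, with \(G=\exp (A)\) computed from the eigendecomposition of A; the values of CC and RCC are taken from the expressions given in the text, and the function names are illustrative.

```python
import math
import numpy as np

def ccc(A):
    """CCC of Eq. (5.1) with D_vw = 2 - 2 G_vw / sqrt(G_vv G_ww), G = exp(A)."""
    lam, V = np.linalg.eigh(A)
    G = (V * np.exp(lam)) @ V.T
    d = np.sqrt(np.diag(G))
    D = 2.0 - 2.0 * G / np.outer(d, d)
    np.fill_diagonal(D, 0.0)
    return 1.0 / D.sum(axis=1)

def complete_adjacency(n):
    return np.ones((n, n)) - np.eye(n)

def ccc_complete(n):
    """Closed form of Lemma 4, Eq. (5.12)."""
    return (math.exp(n) + n - 1) / (2.0 * n * (n - 1))

table = []
for n in range(3, 13):
    numeric = ccc(complete_adjacency(n))[0]
    cc = 1.0 / (n - 1)             # classical closeness in K_n
    rcc = n / (2.0 * (n - 1))      # resistance closeness in K_n
    table.append((n, numeric, ccc_complete(n), cc, rcc))

for n, numeric, closed, cc, rcc in table:
    print(f"n={n:2d}  CCC={numeric:.4f}  closed={closed:.4f}  CC={cc:.4f}  RCC={rcc:.4f}")
```

Over this range CC and RCC decrease with n while CCC increases, in agreement with the discussion of Fig. 1.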

Finally, we will illustrate the behavior of CCC on star graphs of different sizes.

Lemma 5

Let \(K_{1,n-1}\) be the star graph with n nodes and central node labeled by 1. Then,

$$\begin{aligned} C_{1}\left( K_{1,n-1}\right)= & {} \dfrac{\varUpsilon ^{1/2}\varSigma ^{1/2}}{\left( n-1\right) \left( 2\varSigma ^{1/2}\varUpsilon ^{1/2}-2\varPhi \right) }, \end{aligned}$$
(5.16)
$$\begin{aligned} C_{j}\left( K_{1,n-1}\right)= & {} \dfrac{\varUpsilon \varSigma ^{1/2}}{2n^{2}\varSigma ^{1/2}-4n\varSigma ^{1/2}+2\varSigma ^{3/2}-2\varPhi \varUpsilon ^{1/2}}, \end{aligned}$$
(5.17)

where

$$\begin{aligned} \Upsilon= & {} \cosh \left( \sqrt{n-1}\right) +n-2, \end{aligned}$$
(5.18)
$$\begin{aligned} \varPhi= & {} \sinh \left( \sqrt{n-1}\right) , \end{aligned}$$
(5.19)
$$\begin{aligned} \varSigma= & {} \cosh \left( \sqrt{n-1}\right) . \end{aligned}$$
(5.20)

Proof

The communicabilities between the different pairs of nodes in \(K_{1,n-1}\), where 1 is the central node and \(v\ne w\) are pendant nodes, are

$$\begin{aligned} G_{1w}\left( K_{1,n-1}\right)= & {} \dfrac{1}{\sqrt{n-1}}\sinh \left( \sqrt{n-1}\right) , \end{aligned}$$
(5.21)
$$\begin{aligned} G_{vw}\left( K_{1,n-1}\right)= & {} \dfrac{1}{n-1}\left( \cosh \left( \sqrt{n-1}\right) -1\right) , \end{aligned}$$
(5.22)

and the subgraph centralities of these vertices are

$$\begin{aligned} G_{11}\left( K_{1,n-1}\right)= & {} \cosh \left( \sqrt{n-1}\right) , \end{aligned}$$
(5.23)
$$\begin{aligned} G_{ww}\left( K_{1,n-1}\right)= & {} \dfrac{1}{n-1}\left( \cosh \left( \sqrt{n-1}\right) +n-2\right) . \end{aligned}$$
(5.24)

Therefore, we have

$$\begin{aligned} \cos \theta _{1w}= & {} \dfrac{\sinh \left( \sqrt{n-1}\right) }{\sqrt{\cosh \left( \sqrt{n-1}\right) }\sqrt{\cosh \left( \sqrt{n-1}\right) +n-2}}=\dfrac{\varPhi }{\Sigma ^{1/2}\varUpsilon ^{1/2}}, \end{aligned}$$
(5.25)
$$\begin{aligned} \cos \theta _{vw}= & {} \dfrac{\cosh \left( \sqrt{n-1}\right) -1}{\cosh \left( \sqrt{n-1}\right) +n-2}=\dfrac{\varSigma -1}{\varUpsilon }, \end{aligned}$$
(5.26)

from which the results follow by substitution into the definition of CCC. \(\square \)

Example 3

Here we consider star graphs with \(4\le n\le 100\) for which we calculate \(C_{1}\left( K_{1,n-1}\right) \) and \(C_{2}\left( K_{1,n-1}\right) \). The comparison with CC (notice that RCC is identical to CC because the graph is a tree) is straightforward because \(\textrm{CC}_{1}\left( K_{1,n-1}\right) =\left( n-1\right) ^{-1}\), and \(\textrm{CC}_{j}\left( K_{1,n-1}\right) =\left( 2n-3\right) ^{-1}\). That is, for CC the centrality of both nodes decays with the number of vertices. Here the edge length is always equal to one, such that the only effect observed is the fact that a walker at a vertex in a star encounters more vertices to visit as n increases. However, when we consider CCC we find again two competing factors: (i) the increase in the number of first and/or second neighbors, and (ii) the change of the edge length due to the increase in the number of walks connecting pairs of vertices. The resulting effect of both factors is nonmonotonic in the number of vertices, as can be seen in Fig. 2a, where CCC first drops with the number of vertices and then grows for \(n\ge 23\). The difference in the CCC of the central and pendant vertices also increases with the growth of the graph size. The reason for the nonmonotonic behavior observed resides in the fact that here the communicability distance between pairs of vertices still decays exponentially, but not as fast as in the case of complete graphs. Without loss of generality let me focus on the central node labeled as 1. Instead of using the more difficult to interpret expression for \(\cos \theta _{1w}\) that we have found before, let us proceed as follows. We can obtain an empirical relation between the communicability distance and n by finding the best nonlinear regression model. It turns out to be of the form: \(D_{1,j}\approx ae^{bn}+ce^{dn},\) where we have found that \(a\approx 0.8896\), \(b\approx -0.0472\), \(c\approx -0.5259\), and \(d\approx -0.2606\). 
Similar results are also obtained for \(D_{i,j}\) when \(i\ne j\ne 1\). The “exact” values of these distances and those of the best fit are given in Fig. 2b. Thus, what happens in star graphs is that for a relatively small number of vertices the contraction in the communicability distance is not enough to overcome the effect of the increase in the number of paths of length one and two. That is, the centrality of the vertex 1, for instance, decays when n increases in this region because the vertex has to make a greater “effort” to “contact” a larger number of nearest neighbors. However, after a certain size, due to the contraction of the edge lengths between this vertex and its neighbors, the number of nearest neighbors that can be visited from this vertex becomes bigger and bigger without increasing too much the “effort” that it takes to visit them.
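The nonmonotonic behavior described in Example 3 can be explored with a short script. This is a sketch under stated assumptions: the CCC of the central and pendant nodes is assembled directly from the cosines in Eqs. (5.25) and (5.26) together with \(D_{vw}=2-2\cos \theta _{vw}\), and is cross-checked against a numerical evaluation of \(\exp (A)\). Vertex 0 in the code plays the role of the central node 1, and the function names are illustrative.

```python
import math
import numpy as np

def ccc(A):
    """CCC of Eq. (5.1) with D_vw = 2 - 2 G_vw / sqrt(G_vv G_ww), G = exp(A)."""
    lam, V = np.linalg.eigh(A)
    G = (V * np.exp(lam)) @ V.T
    d = np.sqrt(np.diag(G))
    D = 2.0 - 2.0 * G / np.outer(d, d)
    np.fill_diagonal(D, 0.0)
    return 1.0 / D.sum(axis=1)

def star_adjacency(n):
    A = np.zeros((n, n))
    A[0, 1:] = A[1:, 0] = 1.0      # vertex 0 is the central node
    return A

def star_cosines(n):
    """cos(theta_1w) and cos(theta_vw) from Eqs. (5.25)-(5.26)."""
    s = math.sqrt(n - 1)
    Sigma, Phi = math.cosh(s), math.sinh(s)
    Upsilon = Sigma + n - 2
    return Phi / math.sqrt(Sigma * Upsilon), (Sigma - 1.0) / Upsilon

def star_ccc_closed(n):
    """CCC of the central node and of a pendant node, built from the cosines."""
    c_center, c_pendant = star_cosines(n)
    far_center = (n - 1) * (2 - 2 * c_center)
    far_pendant = (2 - 2 * c_center) + (n - 2) * (2 - 2 * c_pendant)
    return 1.0 / far_center, 1.0 / far_pendant

sizes = range(4, 101)
center = [star_ccc_closed(n)[0] for n in sizes]
n_min = sizes[int(np.argmin(center))]
print("CCC of the central node is smallest at n =", n_min)
```

The location of the minimum marks the size at which the contraction of the communicability distance starts to dominate over the growth in the number of neighbors.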

Fig. 2 a Plot of the communicability closeness centrality for the central node (continuous line) and the pendant node (broken line) of star graphs with n vertices. b Plot of the cosine communicability distance from the vertex 1 to a nearest neighbor (left) and between a pair of adjacent vertices \(i\ne j\ne 1\) (right)

In closing, we have seen that CCC behaves very differently for different types of graphs and that its behavior differs significantly from that observed for CC and RCC.

6 Discriminant power of CCC

Bao and Zhang (2021) have recently analyzed the discriminant power of several centrality measures, including CC. They quantify the discriminant power of a centrality as the number of pairs of vertices that it differentiates relative to the total number of pairs. Here we analyze the discriminant power of the CCC and of the analogous closeness centralities CC and RCC, based on their capacity to discriminate the nonidentical vertices in a graph. Let me start with the following.

Example 4

Let us consider the labeled graph illustrated in Fig. 3. As the graph has 7 vertices it has 21 pairs of vertices. Let us check for each of these 21 pairs of vertices whether there is a permutation matrix P that interchanges the two vertices, such that \(PAP^\textrm{T}=A\). We have found that there is only one such pair of vertices, namely the vertices labeled as 3 and 6, which are colored in black in Fig. 3. Therefore, there is at least one nontrivial automorphism of the graph that swaps these two vertices. These vertices are named “similar” in graph theory, but here we propose to call them “identical” in order to avoid confusion with the quantitative concept of vertex similarity used before.
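The search for identical vertices described in Example 4 can be sketched by brute force over all permutations, which is feasible only for very small graphs. The code below assumes the usual graph-theoretic notion of similar vertices, i.e., a pair is reported when some automorphism maps one vertex onto the other. The test graph is a hypothetical one (a triangle with a pendant vertex), not the graph of Fig. 3, and the function name is illustrative.

```python
from itertools import permutations
import numpy as np

def identical_pairs(A):
    """Pairs (v, w), v < w, mapped onto each other by some automorphism of A."""
    n = A.shape[0]
    pairs = set()
    for perm in permutations(range(n)):
        p = np.array(perm)
        # A[np.ix_(p, p)] is P A P^T for the permutation matrix P of perm
        if np.array_equal(A[np.ix_(p, p)], A):
            for v in range(n):
                w = perm[v]
                if v < w:
                    pairs.add((v, w))
    return sorted(pairs)

# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 0.
edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
A = np.zeros((4, 4))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

print(identical_pairs(A))
```

For graphs with 7 or 8 vertices this enumeration visits 5040 or 40,320 permutations, which is still affordable; larger graphs would call for a dedicated automorphism algorithm.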

Fig. 3 Illustration of the labeled graph used in Example 4

Let us now consider the values of the three closeness centrality measures studied here for the vertices in the graph represented in Fig. 3. The classical CC does not differentiate among the vertices labeled by numbers 3, 4 and 6 (see Table 5). That is, there are three pairs of indistinguishable vertices according to this centrality. The resistance closeness centrality identifies the pair 3, 6 as having the same centrality but it also identifies another pair of vertices as indistinguishable, which corresponds to the pair 4, 5. As we have seen, these vertices are not identical because there is no permutation matrix that interchanges them while leaving the adjacency matrix unchanged. As can be seen in the third column of Table 5, the CCC only identifies one pair of vertices as indistinguishable, and they coincide with the identical vertices previously found.

Table 5 Values of the communicability closeness centrality (CCC) as well as of the standard closeness centrality and of the resistance closeness centrality for the nodes in the graph illustrated in Fig. 3

Let us first designate \({\hat{x}}_{j}=\dfrac{x_{j}}{\left\| x_{j}\right\| }\), such that \(D_{vw}=\left\| {\hat{x}}_{v}-{\hat{x}}_{w}\right\| ^{2}\), and let us define the communicability cosine farness as

$$\begin{aligned} F_{v}=\sum _{w=1}^{n}D_{vw}. \end{aligned}$$
(6.1)

Then, we have the following.

Lemma 6

Let v and w be two different vertices of a graph G. Then, if \(F_{v}\left( G\right) =F_{w}\left( G\right) \),

$$\begin{aligned} \left( {\hat{x}}_{v}-{\hat{x}}_{w}\right) \cdot \left( {\hat{x}}_{1}+\cdots +{\hat{x}}_{n}\right) =0, \end{aligned}$$
(6.2)

meaning that either \({\hat{x}}_{v}={\hat{x}}_{w}\) or that \({\hat{x}}_{v}\cdot R={\hat{x}}_{w}\cdot R\), where \(R={\hat{x}}_{1}+\cdots +{\hat{x}}_{n}\) and \(a\cdot b\) indicates inner product.

Proof

We can write

$$\begin{aligned} \begin{aligned}F_{v}\left( G\right)&=F_{w}\left( G\right) \\ \left\| {\hat{x}}_{v}-{\hat{x}}_{1}\right\| ^{2}+\cdots +\left\| {\hat{x}}_{v}-{\hat{x}}_{n}\right\| ^{2}&=\left\| {\hat{x}}_{w}-{\hat{x}}_{1}\right\| ^{2}+\cdots +\left\| {\hat{x}}_{w}-{\hat{x}}_{n}\right\| ^{2}\\ {\hat{x}}_{v}\cdot {\hat{x}}_{1}+{\hat{x}}_{v}\cdot {\hat{x}}_{2}+\cdots +{\hat{x}}_{v}\cdot {\hat{x}}_{n}&={\hat{x}}_{w}\cdot {\hat{x}}_{1}+{\hat{x}}_{w}\cdot {\hat{x}}_{2}+\cdots +{\hat{x}}_{w}\cdot {\hat{x}}_{n}. \end{aligned} \end{aligned}$$
(6.3)

Because each squared norm expands as \(\left\| {\hat{x}}_{v}-{\hat{x}}_{j}\right\| ^{2}=2-2{\hat{x}}_{v}\cdot {\hat{x}}_{j}\), where we have used \({\hat{x}}_{v}\cdot {\hat{x}}_{v}=1\), the constant terms cancel on both sides, and \(F_{v}\left( G\right) -F_{w}\left( G\right) =0\) implies that

$$\begin{aligned} \left( {\hat{x}}_{v}-{\hat{x}}_{w}\right) \cdot \left( {\hat{x}}_{1}+\cdots +{\hat{x}}_{n}\right) =0. \end{aligned}$$
(6.4)

Because \(0\le M_{vw}\le 1\), the angle between every pair of position vectors is not larger than \(90^\circ \), which implies that \({\hat{x}}_{1}+\cdots +{\hat{x}}_{n}\ne 0\), which finally proves the result. \(\square \)
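The identity behind Lemma 6 can be verified numerically. The sketch below assumes that the position vectors are \(x_{v}=e^{\varLambda /2}\varphi _{v}\), so that \(x_{v}\cdot x_{w}=G_{vw}\), and checks that \(F_{v}=2n-2{\hat{x}}_{v}\cdot R\). The test graph (a triangle with a pendant vertex) and the function name are hypothetical.

```python
import numpy as np

def unit_position_vectors(A):
    """Rows are x_hat_v = x_v / ||x_v||, with x_v = exp(Lambda/2) phi_v."""
    lam, V = np.linalg.eigh(A)
    X = V * np.exp(lam / 2.0)            # X @ X.T equals exp(A)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 0.
edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

Xh = unit_position_vectors(A)
R = Xh.sum(axis=0)                       # R = x_hat_1 + ... + x_hat_n

# Farness computed directly from the squared distances, Eq. (6.1) ...
F_direct = np.array([sum(np.sum((Xh[v] - Xh[w]) ** 2) for w in range(n))
                     for v in range(n)])
# ... and from the expansion ||x_hat_v - x_hat_w||^2 = 2 - 2 x_hat_v . x_hat_w
F_identity = 2 * n - 2 * Xh @ R

print(F_direct, F_identity)
print("(x_hat_1 - x_hat_2) . R =", (Xh[1] - Xh[2]) @ R)
```

Vertices 1 and 2 of this graph are identical, so their farness values coincide and the inner product in Eq. (6.2) vanishes up to round-off, even though their unit position vectors are not equal.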

Remark 4

If CCC discriminates vertices up to automorphism it should be the case that \(C_{v}=C_{w}\) implies that \(AP=PA\) for a permutation matrix P that swaps the vertices v and w. This last condition is fulfilled if \(\varphi _{v}=\varphi _{w}\). The fact that \({\hat{x}}_{v}\cdot R={\hat{x}}_{w}\cdot R\) does not imply mathematically that \({\hat{x}}_{v}={\hat{x}}_{w}\) and consequently does not imply that \(\varphi _{v}=\varphi _{w}\). Therefore, this condition does not imply that if \(F_{v}\left( G\right) =F_{w}\left( G\right) \) there is a permutation that swaps the two vertices v and w while preserving the adjacency matrix. Let us then consider the other case, that is, when \({\hat{x}}_{v}={\hat{x}}_{w}\). First, let us consider that \(x_{v}=x_{w}\), which means that \(e^{\varLambda /2}\left( \varphi _{v}-\varphi _{w}\right) =0\). Therefore, it implies that \(\varphi _{v}=\varphi _{w}\). Also, because \(G_{vv}=x_{v}\cdot x_{v}\), the fact that \(x_{v}=x_{w}\) also implies that \(G_{vv}=G_{ww}\). In closing, if \({\hat{x}}_{v}={\hat{x}}_{w}\) we have that

$$\begin{aligned} e^{\varLambda /2}\left( \dfrac{\varphi _{v}}{\sqrt{G_{vv}}}-\dfrac{\varphi _{w}}{\sqrt{G_{ww}}}\right) =0, \end{aligned}$$
(6.5)

which necessarily implies that \(\varphi _{v}=\varphi _{w}\).

Because it is not necessarily true that \(C_{v}=C_{w}\) implies that the two vertices are identical, we then studied all 11,117 connected graphs with 8 vertices and identified the number of graphs with a given number of pairs of identical vertices. For instance, there are 3552 graphs with 8 vertices which have no pair of identical vertices (see next section). There are 2825 graphs with only one pair of identical vertices, 1913 with two pairs, and so on. In Table 6 we give the number of graphs with a given number of distinguishable pairs of vertices for the three closeness centrality indices studied here. For instance, there are only 12 graphs for which CC gives different values for all their vertices; this number increases up to 2823 for RCC, but only CCC identifies all graphs having no pair of identical vertices. As can be seen in Table 6, the CCC identifies all graphs with any number of pairs of identical vertices among the graphs with 8 vertices. This performance is not observed even for other indices based on the exponential of the adjacency matrix, such as the subgraph centrality (SC), for which we show the number of identical pairs of vertices identified by this centrality in Table 6. Both CC and RCC are far from discriminating all the pairs of nonidentical vertices in these graphs.
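The counting of indistinguishable pairs can be sketched as follows. This is a minimal illustration, not the procedure used to build Table 6: two values are considered equal when they agree within a relative tolerance (an assumption of this sketch), the discriminant power is taken as the fraction of differentiated pairs, and the test graph (a 4-cycle with a two-vertex tail) is hypothetical.

```python
from collections import deque
from itertools import combinations
import numpy as np

def closeness(A):
    """Classical CC: reciprocal of the sum of shortest-path distances (connected graph)."""
    n = A.shape[0]
    cc = np.zeros(n)
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:                         # breadth-first search from s
            u = queue.popleft()
            for w in np.flatnonzero(A[u]):
                w = int(w)
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        cc[s] = 1.0 / sum(dist.values())
    return cc

def ccc(A):
    """CCC of Eq. (5.1) with D_vw = 2 - 2 G_vw / sqrt(G_vv G_ww), G = exp(A)."""
    lam, V = np.linalg.eigh(A)
    G = (V * np.exp(lam)) @ V.T
    d = np.sqrt(np.diag(G))
    D = 2.0 - 2.0 * G / np.outer(d, d)
    np.fill_diagonal(D, 0.0)
    return 1.0 / D.sum(axis=1)

def tied_pairs(values, rtol=1e-9):
    """Pairs of vertices whose centrality values agree within the tolerance."""
    return [(v, w) for v, w in combinations(range(len(values)), 2)
            if np.isclose(values[v], values[w], rtol=rtol, atol=0.0)]

def discriminant_power(values, rtol=1e-9):
    """Fraction of vertex pairs that the centrality differentiates."""
    n = len(values)
    return 1.0 - len(tied_pairs(values, rtol)) / (n * (n - 1) // 2)

# Hypothetical graph: 4-cycle 0-1-2-3 with a tail 0-4-5.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

print("CC ties: ", tied_pairs(closeness(A)))
print("CCC ties:", tied_pairs(ccc(A)))
```

In this graph only vertices 1 and 3 are identical, yet CC also ties vertex 4 with both of them because the three vertices have the same sum of shortest-path distances.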

Table 6 Number of connected graphs with 8 vertices which are identified by the three closeness centrality studied here, plus the subgraph centrality, as having a given number of equivalent vertices

6.1 Graphs with no pair of identical vertices

Let me recall that a graph is called asymmetric or identity if it does not contain any pair of identical vertices (Albertson and Collins 1996). That is, if there is no nontrivial automorphism for the graph. Asymmetric graphs are of relevance for the study of graph controllability (Aguilar and Gharesifard 2014; Yoon 2014; Rahmani and Mesbahi 2007; Cvetković et al. 2011). For instance, it was proved that the class of essentially controllable graphs forms a strict subset of asymmetric graphs (Aguilar and Gharesifard 2014). Additionally, identity graphs have been proposed as “the mathematical structure of the World” in the “World as a graph” proposal of Dipert (1997) (see also Shackel 2011). The idea of “World as a graph” is that if the World is a graph, then it has to be asymmetric to avoid that two spatio-temporal points can be identical and therefore can be swapped.

There are no asymmetric graphs with \(2\le n<6\), but there are 8 asymmetric graphs with 6 vertices, which are illustrated in Fig. 4.

Fig. 4 Illustration of the eight asymmetric (identity) graphs with 6 vertices, i.e., those graphs with 6 vertices for which \(\text {Aut}\left( G\right) =1\)

We calculated the CCC for all the vertices in these 8 asymmetric graphs with \(n=6\) and found that this centrality distinguishes all their vertices. That is, let \({\mathcal {A}}_{n}\) be the set of all asymmetric graphs with n vertices, we have observed that if \(v\ne w\) then \(C_{v}\ne C_{w}\) for all \(v,w\in {\mathcal {A}}_{6}\). This is not the case for the standard CC, which does not distinguish all the vertices in any of these asymmetric graphs, nor for the RCC, which only finds all vertices different for 4 out of the 8 asymmetric graphs. Although not a closeness centrality we also compared the results with the SC, which distinguishes all vertices in these 8 asymmetric graphs with 6 vertices.

We now extend these calculations to all asymmetric graphs with \(7\le n\le 9\) and the results are displayed in Table 7. It is known that the cardinalities of the sets \({\mathcal {A}}_{n}\) are 8, 144, 3552, 131,452 for \(n=6,7,8,9\), respectively [see integer sequence A124059 in the On-Line Encyclopedia of Integer Sequences (Sloane 2018)]. As can be seen in Table 7, the CCC differentiates all the vertices in 100% of the asymmetric graphs with \(6\le n\le 9\). That is, if \(v\ne w\) then \(C_{v}\ne C_{w}\) for all \(v,w\in {\mathcal {A}}_{n\le 9}\). The standard CC differentiates very poorly the vertices of these asymmetric graphs, distinguishing only 2.1% of the asymmetric graphs with 7 vertices, 0.36% of those with 8 vertices and 0.09% of those with 9 vertices. The situation improves when we consider the resistance closeness centrality, which differentiates 93.75%, 79.53% and 96.33% of asymmetric graphs with 7, 8 and 9 vertices, respectively. Here again, the SC is the second best, after CCC, and differentiates 100%, 97.83% and 98.3% of asymmetric graphs with 7, 8 and 9 vertices, respectively.

Table 7 Number of asymmetric (identity) graphs whose vertices are differentiated by the different centrality measures for \(6\le n\le 9\)

A particularly interesting class of graphs is formed by those having at least one pair of pseudosimilar vertices (Godsil and Kocay 1982; Lauri 1997) (hereafter we will call them “pseudoidentical”). That is, these are graphs for which a pair of vertices v, w exists such that \(G-v\) is isomorphic to \(G-w\) but there is no automorphism transforming v onto w. From the perspective of the current work, these pairs of vertices could be seen as difficult ones to be distinguished by any centrality measure. Two examples of asymmetric graphs with a pair of pseudoidentical vertices are the graphs G and H illustrated in Fig. 5. Both graphs have no automorphism but the identity one. In graph G the vertices 4 and 6 are pseudoidentical, as are the vertices 5 and 7 in H.
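The definition of pseudoidentical vertices can be turned into a brute-force test for very small graphs. The following sketch is illustrative only: it enumerates all permutations, the function names are hypothetical, and the example is a path with four vertices, which contains no pseudoidentical pair (the graphs G and H of Fig. 5 are not reproduced here).

```python
from itertools import permutations
import numpy as np

def isomorphic(A, B):
    """Brute-force isomorphism test; only sensible for very small graphs."""
    n = A.shape[0]
    if B.shape[0] != n or A.sum() != B.sum():
        return False
    if sorted(A.sum(axis=1)) != sorted(B.sum(axis=1)):
        return False                         # degree sequences differ
    return any(np.array_equal(A[np.ix_(p, p)], B)
               for p in map(list, permutations(range(n))))

def delete_vertex(A, v):
    """Adjacency matrix of G - v."""
    keep = [u for u in range(A.shape[0]) if u != v]
    return A[np.ix_(keep, keep)]

def maps_onto(A, v, w):
    """True if some automorphism of A sends v to w."""
    n = A.shape[0]
    return any(p[v] == w and np.array_equal(A[np.ix_(p, p)], A)
               for p in map(list, permutations(range(n))))

def pseudoidentical(A, v, w):
    """G - v isomorphic to G - w, but no automorphism maps v onto w."""
    return (isomorphic(delete_vertex(A, v), delete_vertex(A, w))
            and not maps_onto(A, v, w))

# Path 0-1-2-3: the end vertices are identical, hence not pseudoidentical.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

print(pseudoidentical(A, 0, 3), pseudoidentical(A, 0, 1))
```

For graphs with 8 vertices the same functions still run in reasonable time, since each call inspects at most 40,320 permutations.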

Fig. 5 Illustration of two pseudoidentity graphs. In graph G the vertices 4 and 6 are those that if removed the resulting graphs \(G-4\) and \(G-6\) are isomorphic (see first line of the figure). For the graph H such vertices are 5 and 7 and the resulting isomorphic graphs are \(H-5\) and \(H-7\)

In Table 8 we give the values of CCC, RCC and SC for every vertex in the two graphs illustrated in Fig. 5. We do not display the results for the classical CC because it is highly degenerate in most of the cases, i.e., many pairs of nonidentical vertices have the same values of the CC. Although the RCC fails to differentiate all the vertices of these two graphs, the pairs of vertices for which RCC gives the same values do not coincide with the pseudoidentical ones. That is, in graph G the pseudoidentical vertices are labeled 4 and 6 while the RCC produces identical values for vertices 5 and 6. The same happens for graph H, where the RCC is degenerate for vertices 6 and 8 while the pseudoidentical ones are 5 and 7. This situation is repeated across the asymmetric graphs with 8 vertices, where we have identified 36 graphs having one pair of pseudoidentical vertices. The vertices of 22 of these 36 graphs are distinguished by the RCC, and in the remaining cases the pairs with the same values of this centrality do not necessarily coincide with the pseudoidentical ones. A different situation occurs with the SC, which has the same value for the pairs of pseudoidentical vertices in both graphs illustrated in Fig. 5 (see Table 8). However, the SC does not only produce degeneracy for these pairs of pseudoidentical vertices but occasionally also for other pairs, such as the pair 6,8 in graph H. This happened in 14 out of the 36 asymmetric graphs with one pair of pseudoidentical vertices.

Table 8 Values of centrality measures for the two pseudoidentity graphs illustrated in Fig. 5

We calculated the difference between the CCC of every pair of vertices in the two graphs illustrated in Fig. 5. In both cases it is found that the smallest difference is observed for the pair of pseudoidentical vertices. Unfortunately, this is not always true: for the 36 asymmetric graphs with a pair of pseudoidentical vertices, the smallest difference between the CCC of every pair of vertices coincides with the pseudoidentical vertices on 27 occasions, but it does not coincide on 9.

Let me now consider some other examples of known identity graphs, which have from 12 to 222 vertices and are illustrated in Fig. 6.

Fig. 6 Illustration of the structure of eight “classical” examples of identity graphs, i.e., those having no symmetry but the identity one

In Table 9 we give the number of pairs of vertices which CC, RCC and CCC identify as equivalent. As can be seen CCC differentiates all the vertices in the eight graphs. RCC identifies as equivalent a pair of vertices in the Frucht graph but in general distinguishes very well the vertices in these graphs. Finally, CC only gives different values for the vertices of the (10,3)-incidence graph illustrated.

Table 9 Number of pairs of vertices with the same value of the closeness centrality measures studied here for eight “classical” examples of identity graphs

Note 1 A note of caution should be raised here about numerical accuracy. In some cases, the values of the centrality can be very close for different pairs of vertices and the precision of the calculations should be increased. However, the accuracy of the numerical methods used to compute the corresponding centrality must also be taken into account. For instance, in the case of CCC in the Gardner graph we have to consider accuracy up to 15 decimal places and use \(\sum _{w=1}^{n}D_{vw}\) instead of its reciprocal in order to avoid problems with rounding errors. Therefore, we would like to remark that more numerical analysis is needed before giving conclusive statements about the true discriminant power of CCC.

6.2 Graphs with at least one pair of identical vertices

In Fig. 7 we illustrate eight graphs in which there are one (a), two (b), three (c), four (d), five (e), six (f), seven (g) and eight (h) pairs of identical vertices, respectively. The number of identical pairs of vertices was identified computationally by obtaining all possible permutations of the 8 vertices and checking which ones satisfy \(PAP^\textrm{T}=A\). The groups of identical nodes are identified in the figure with different colors. For instance, in graph b) there are two pairs of identical vertices, 1,2 colored in black and 6,7 in gray. For larger numbers of pairs it is possible that such pairs are grouped in clusters. For instance, in c), where there are three pairs, they are formed by the vertices 3,4,5 (3,4; 3,5; 4,5). In other cases the pairs may be isolated and not grouped in this way.

Fig. 7 Illustration of graphs with 8 vertices having one (a), two (b), three (c), four (d), five (e), six (f), seven (g) and eight (h) pairs of identical vertices, respectively. Every group of identical vertices is colored the same. For instance, in (h) there are four vertices in black, which form 6 pairs of identical vertices, plus one pair in dark gray and another pair in pale gray, summing 8 pairs of identical vertices

In Table 10 we give the values of CCC for every vertex in these eight graphs. The pairs of nodes for which \(C_{v}=C_{w}\) coincide in all cases with the identical nodes existing in these graphs and no other pair of nodes has the same value of CCC. This situation is very different for the other closeness centrality measures. For instance, for graph a) CC gives the same value \(\textrm{CC}_{j}=0.1,j=3,\ldots ,7\) for five vertices, such that there are 10 pairs of vertices with the same values of CC. The RCC identifies, apart from the pair 4,5 for which \(\textrm{RCC}_{4}=\textrm{RCC}_{5}\approx 0.2689\), also the pair 6,7 as having the same value of this centrality, \(\textrm{RCC}_{6}=\textrm{RCC}_{7}\approx 0.2647\). The SC identifies the vertices 4,5,7 with the same value, \(SC_{4}=SC_{5}=SC_{7}\approx 8.0135\). In the case of the graph b) the CC identifies 11 pairs of equivalent vertices (vertices 1–5 and 6,7), while the RCC identifies three pairs (1,2; 3,4 and 6,7), as does SC, which identifies 1,2; 3,5 and 6,7. In graph c) CC and RCC identify the same 10 pairs of equivalent vertices (3–7); they also coincide in identifying 15 pairs (2–7) for graph d) and 7 pairs (1–3; 4,5 and 6–8) for graph e). The graph f) is a tree, which is shown to illustrate that even in very simple graphs CC identifies a different set of vertices than CCC. In this case CC (RCC is identical to CC for trees) identifies vertex 7 as equivalent to vertices 1–4, which are the identical ones. In graphs g) and h) CC identifies 9 and 16 equivalent nodes, respectively, while RCC identifies correctly only the identical vertices in g) and h). However, in g) SC identifies 9 vertices which coincide with those identified by CC. If we consider the sum of the rows of the LHN matrix \({\mathcal {N}}\) (Eq. 4.1) as a centrality measure, then the graph h) is an example where it fails to identify only the identical vertices because it identifies 16 pairs of equivalent vertices (1,2 and 3–8) instead of 8. 
More examples exist among the graphs with 8 vertices.

Table 10 Values of CCC for the vertices in the graphs illustrated in Fig. 7

Remark 5

A note of caution is in order here. It is not the case that centrality measures in general should differentiate vertices up to automorphism. For instance, CC is designed to identify those vertices in a graph which are closest to the rest of the vertices. Therefore, two vertices with the same proximity to the rest of the vertices of the graph must have the same CC even if they are not identical.
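The point of this remark can be checked on a small example. The sketch below is only an illustration: the 6-vertex tree is a hypothetical graph chosen for this purpose (it is not one of the graphs in Fig. 7), and CC is taken as the reciprocal of the sum of shortest-path distances, which is an assumption consistent with the value \(\textrm{CC}_{j}=0.1\) quoted above. Vertices 0 and 3 have the same CC but different degrees, so no automorphism can map one onto the other.

```python
from collections import deque
from fractions import Fraction


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in a connected, unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """CC_v = 1 / (sum of shortest-path distances from v), kept as an exact fraction."""
    return {v: Fraction(1, sum(bfs_distances(adj, v).values())) for v in adj}


# Hypothetical tree: vertex 0 carries two leaves (1 and 2), vertex 3 starts
# the path 3-4-5, and the edge 0-3 splits the tree into two halves of size 3.
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
adj = {v: set() for v in range(6)}
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)

cc = closeness(adj)
degree = {v: len(adj[v]) for v in adj}

print(cc[0], cc[3])          # 1/8 1/8  -> same closeness
print(degree[0], degree[3])  # 3 2      -> different degrees, hence not identical
```

The tie arises because the edge 0-3 separates the tree into two parts with the same number of vertices, so moving from one endpoint to the other leaves the sum of distances unchanged.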

6.3 CCC in networks

Here we consider 14 networks representing complex systems in a variety of scenarios. They include a flip-flop electronic circuit, the neuronal network of the worm C. elegans, the networks of connections between regions of cat and macaque cortex, a network connecting pairs of human brain regions that coactivate, a network indicating whether the buyers of a given political book also buy another one, the Roget Thesaurus and the ODLIS, a network representing the flight connections between USA airports, a food web in Bridge Brooks, the protein-protein interaction network of yeast, a network indicating whether two drug users have exchanged needles in a given period of time, a network indicating whether two authors in computational geometry have published a paper together, and a network representing the Internet at the Autonomous System level (more details and references can be found in the Appendix of Estrada (2012a)).

6.3.1 Distinguishing vertices

For all 14 real-world networks previously described we calculated the number of pairs of identical vertices \(\#{\mathcal {A}}\), as well as the number of vertices which are differentiated by each of the three closeness centralities studied. The first interesting result is that all the neuronal systems considered here, which come from different species, are identity graphs. This is particularly interesting in the case of those neuronal systems coming from brains with bilateral symmetry, such as cat and macaque visual cortex as well as the human brain. That is, the bilateral symmetry, as well as any other kind of symmetry, is broken in these networks, possibly due to functional reasons. However, the study of the implications of this network asymmetry is outside the scope of this work. As can be seen in Table 11, in all these cases RCC and CCC distinguish all pairs of vertices.

The degree of vertex redundancy in a network can be calculated as

$$\begin{aligned} {\mathcal {R}}\left( G\right) :=\dfrac{2\#{\mathcal {A}}}{n\left( n-1\right) }, \end{aligned}$$
(6.6)

which is given as a percentage in Table 11. As can be seen, real-world networks have very few pairs of identical vertices in relation to the total possible number of pairs. This contrasts with the very high global symmetry that some of these graphs display according to the size of their automorphism group. For instance, \(\left| \text {Aut}\left( G\right) \right| =2.5916\cdot 10^{24}\) for the airport network in USA and \(\left| \text {Aut}\left( G\right) \right| =1.8994\cdot 10^{320}\) for the collaboration network in computational geometry (we have used exactly the same versions of the networks used in MacArthur et al. (2008)). This means that there are many more ways of transforming the 1436 pairs of identical vertices existing in the collaboration network than there are symmetry operations to transform the 135 pairs of identical vertices in the airport network. However, as we remarked in the Introduction with the example of \(K_{4}\) and \(C_{4}\), the size of the automorphism group does not indicate the number of pairs of vertices which are identical, which is what matters in many real-world problems, such as the problem of controllability.
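The distinction between \(\left| \text {Aut}\left( G\right) \right|\) and \(\#{\mathcal {A}}\) can be illustrated with a brute-force sketch for \(K_{4}\) and \(C_{4}\). It assumes that a pair of vertices is identical when some automorphism swaps them, enumerates all \(n!\) permutations (so it is only usable for very small graphs), and is not the procedure used to obtain Table 11; the function names are hypothetical.

```python
from itertools import combinations, permutations


def automorphisms(n, edges):
    """Yield every vertex permutation of {0..n-1} that maps edges onto edges."""
    edge_set = {frozenset(e) for e in edges}
    for p in permutations(range(n)):
        # A bijection on vertices that sends each edge to an edge is an
        # automorphism, because the edge set is finite.
        if all(frozenset((p[u], p[v])) in edge_set for u, v in edges):
            yield p


def identical_pairs(n, edges):
    """Pairs {v, w} swapped by some automorphism, and the size of Aut(G)."""
    autos = list(automorphisms(n, edges))
    pairs = {
        (v, w)
        for v, w in combinations(range(n), 2)
        if any(p[v] == w and p[w] == v for p in autos)
    }
    return pairs, len(autos)


def redundancy(n, num_pairs):
    """R(G) = 2 #A / (n (n - 1)), as in Eq. (6.6)."""
    return 2 * num_pairs / (n * (n - 1))


k4 = list(combinations(range(4), 2))      # complete graph on 4 vertices
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]     # cycle on 4 vertices

for name, edges in (("K4", k4), ("C4", c4)):
    pairs, aut_size = identical_pairs(4, edges)
    print(name, aut_size, len(pairs), redundancy(4, len(pairs)))
# K4 24 6 1.0
# C4 8 6 1.0
```

Both graphs have all six pairs of vertices identical, and hence \({\mathcal {R}}\left( G\right) =1\), although their automorphism groups differ in size by a factor of three.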

Table 11 Number of pairs of vertices which are differentiated by the three closeness centralities studied here in the 14 real-world networks analyzed

Let us now turn our analysis to the identical vertices in the Roget Thesaurus network. The complete enumeration of all pairs of vertices that can be swapped by a nontrivial automorphism identifies only three pairs, formed by the words “duality”-“bisection”, “celibacy”-“divorce”, and “man”-“woman”. The CC identifies 307 pairs of words with equivalent closeness centrality. In contrast, both CCC and RCC identify exactly the three existing pairs of identical words. The pairs “duality”-“bisection” and “celibacy”-“divorce” are trivially identical because they correspond to vertices of degree one which are connected to the same root vertex. In the first case the two words are connected to “duplication” and in the second they are connected to “marriage”. The words “man” and “woman” are connected to each other in the network and each of them is connected to three other words: “infant”, “adolescent” and “marriage”.

The second example is the network representing ODLIS, which is formed by 2898 vertices and 16,376 edges after the elimination of 5 self-loops (self-referenced words). In this case there are 121 identical pairs of words. The CCC and RCC correctly identify these 121 pairs of words, while CC identifies 1324 pairs of equivalent words. The large majority of the existing identical words correspond to trivial pairs, i.e., pairs of pendant vertices connected to the same vertex. For instance, 102 out of 121 pairs correspond to such cases, with groups of up to 8 pendant vertices connected to the same node. One of these groups is formed by the words “book cradle”, “bookrest”, “imbrication”, “lectionary”, “polaire” and “bibliotaph”, which are all connected to the term “bibliographic item”, which has appeared previously in the analysis of word similarity in this work. There are, however, 19 pairs of words forming nontrivial identical vertices, i.e., their vertices have degree larger than one. These 19 pairs of nontrivial identical vertices are formed only by isolated pairs, that is, there are no triples, quadruples, etc. Some examples are: “chef-d’oeuvre”-“masterpiece” of degree 2; “imprimatur”-“in press” of degree 3; “americanize”-“briticize”, “elegy”-“ode” and “burlesque”-“parody”, all of degree 4; “color plate”-“monochrome plate” of degree 5; “broader term (BT or B)”-“narrower term (NT or N)” of degree 12.

Fig. 8

Illustration of the degree (a), betweenness (b), closeness (c) and communicability closeness (d) centrality of the nodes in the network of a karate club, where the size and color of the vertices are proportional to the corresponding centrality

6.3.2 Ranking vertices

One of the most important applications of centrality measures is the ranking of vertices in decreasing order of the given centrality. The most central vertices are then expected to play some fundamental structural and/or dynamical role in the network. However, in most real-world networks, vertex centrality measures are highly correlated with the degree of the vertices, in particular those which depend on shortest paths. For instance, if a vertex has degree k there are up to \(k\left( k-1\right) /2\) shortest paths of length two that pass through this vertex, one for each pair of its neighbors that are not adjacent to each other. Therefore, if k is relatively large, there is a high chance that the betweenness centrality of this node is also high. Thus, such measures largely duplicate the information already contained in the degree. This happens, for instance, in the social network where a group of 34 members of a karate club express their friendship preferences. The nodes labeled 1 and 34 correspond to the trainer and the administrator of the club, and they are visibly the most connected ones, with degrees 16 and 17, respectively. A polarization is known to exist in this network as there are two factions in it: one following the administrator and the other following the trainer. In Fig. 8 we represent this network with vertices colored and with radii proportional to their degrees (a) and betweenness centrality (b). If we consider the CC (see Fig. 8c) we can see that it is also biased by the degree of the nodes. The ranking according to CC is: 1, 3, 34, 32, 33, which are the vertices with degrees 16, 10, 17, 6, 12. RCC identifies 34, 1, 3, 33, and 2 as the most central vertices, which have degrees 17, 16, 10, 12, and 9. Thus, it is even more biased by degree than CC. In contrast, CCC identifies the vertices 9, 3, 20, 32, 14 and 31, with degrees 5, 10, 3, 6, 5, and 4, which are placed in between the two groups in conflict in this network.
The most central vertices according to CCC are those for which there is a relatively large number of short walks connecting them to the rest of the vertices but a relatively small number of walks starting and ending at themselves. They are “good communicators” between the two factions existing in the network.
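The degree argument used at the start of this subsection can be made concrete with a short sketch. It counts the shortest paths of length two that have a given vertex in the middle, using the fact that a path u-v-w is a geodesic exactly when u and w are distinct neighbors of v that are not adjacent to each other. The helper names and the three test graphs are hypothetical and are not taken from the networks studied here.

```python
from itertools import combinations


def make_adjacency(n, edges):
    """Adjacency sets of a simple undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def two_step_geodesics_through(adj, v):
    """Number of shortest paths of length two whose middle vertex is v."""
    return sum(
        1 for u, w in combinations(sorted(adj[v]), 2) if w not in adj[u]
    )


k = 5
star = make_adjacency(k + 1, [(0, leaf) for leaf in range(1, k + 1)])
star_plus_edge = make_adjacency(
    k + 1, [(0, leaf) for leaf in range(1, k + 1)] + [(1, 2)]
)
clique = make_adjacency(k + 1, list(combinations(range(k + 1), 2)))

print(two_step_geodesics_through(star, 0))            # 10 = k(k-1)/2
print(two_step_geodesics_through(star_plus_edge, 0))  # 9
print(two_step_geodesics_through(clique, 0))          # 0
```

The count \(k\left( k-1\right) /2\) is reached when no two neighbors of the vertex are adjacent, as in the star, and it decreases by one for every edge among the neighbors, which is why degree and betweenness tend to move together in sparse networks.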

We can now identify the most central words according to CCC in the Roget Thesaurus network, which are given in Table 12 together with the degree, CC and RCC. As can be seen, CC and RCC identify as most central the most connected words in the thesaurus, while CCC again identifies words with relatively small degree but well connected to the rest of the words via relatively short walks.

Table 12 Most central words in Roget Thesaurus according to degree and the three closeness centralities studied here

These differences are even more marked in the case of the ODLIS network, as can be seen in Table 13, where CC and RCC identify the most connected vertices, while CCC identifies vertices of relatively small degree.

Table 13 Most central words in ODLIS according to degree and the three closeness centralities studied here
Fig. 9

Political books co-purchased by the same buyers. a A classification of books as conservative (brown), liberal (blue) and neutral (green). b–f Coloring of the vertices according to the degree (b), CC (c), RCC (d), betweenness (e) and CCC (f)

We now focus on a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com. The books are represented as vertices, and edges between books represent frequent co-purchasing by the same buyers. The books have been classified as “liberal”, “conservative” and “neutral”, as can be seen in Fig. 9a. The degree of the vertices identifies three conservative books as the most central ones, followed by five liberal books and then again three conservative ones, and so on (see Fig. 9b). In the case of closeness measures we would expect that books placed more or less equidistant from both polarized groups, i.e., liberal and conservative, would be among the most central ones. The CC identifies the books “The Price of Loyalty”, “Rise of the Vulcans”, “The Bushes”, “Ghost Wars” and “Bush Country” as the most central ones. The books at positions 2, 3 and 4 of the CC ranking are almost equidistant from both main political groups. “Rise of the Vulcans” is connected to 5 conservative books, 3 neutral and 4 liberal. “The Bushes” and “Ghost Wars” both have 4, 2, 2 connections to conservative, neutral and liberal books. However, “The Price of Loyalty” and “Bush Country” are selected due to the CC bias towards high degree nodes. The first has 2, 3, 15 connections, being clearly closer to liberal books, and the second has 14, 2, 0, being clearly closer to conservative ones.

RCC is highly biased towards degree, as can be seen in Fig. 9d. Indeed, in the top ten ranking of books by RCC we find the most connected books in the network, and there is a high correlation between the two centrality indices: \(\textrm{RCC}\sim k^{-\zeta }\) with a squared correlation coefficient of 0.97. The ranking produced by the betweenness centrality is also biased by the degree of the vertices. It ranks at the top the books “The Price of Loyalty”, “The Bushes”, “Bush Country”, “Off with Their Heads” and “American Dynasty”. The books at positions 1, 4 and 5 are among the most connected ones in the network: the first was already analyzed, “Off with Their Heads” has 22, 3, 0 connections and “American Dynasty” has 1, 2, 19 connections.

At this point we arrive at the ranking produced by CCC, where the top books are “The Bushes”, “Ghost Wars”, “Rise of the Vulcans”, “Sleeping With the Devil”, and “Why Courage Matters”, which have (4, 2, 2), (4, 1, 3), (5, 3, 4), (3, 3, 2) and (4, 1, 0) connections, respectively. As can be seen in Fig. 9f, all the most central books are in the central part of the network, almost equidistant from both major political wings, without any bias due to the vertex degree. This characteristic is not observed in any of the other centrality measures, and it illustrates one of the main features of the communicability cosine distance and derived indices.

6.3.3 On scalability

Before ending this work we would like to briefly mention the problem of scalability of the current approach. There are many algorithms available for computing the exponential of a matrix (see for instance Moler and Van Loan 2003), although we have used here the scaling and squaring one (Higham 2005) as implemented in Matlab. However, we recognize that for extremely large networks this method could be problematic in terms of computing time. Therefore, a possible solution is to consider an approximation to the spectral representation of the communicability function based on the k largest eigenvalues and their corresponding eigenvectors:

$$\begin{aligned} G_{vw}\left( k\right) =\sum _{j=1}^{k}\psi _{j,v}\psi _{j,w}e^{\lambda _{j}}, \end{aligned}$$
(6.7)

where \(\lambda _{1}\ge \cdots \ge \lambda _{k}\). In this case we can use the implicitly restarted Arnoldi (IRA) method (Lehoucq and Sorensen 1996; Lehoucq et al. 1998). To illustrate the method let us consider a network of \(n=10,000\) vertices generated with a random structure and \(m=60,000\) edges. The mean value of the CCD between every pair of vertices is \({\bar{D}}\approx 1.0358\). The approximate values of \({\bar{D}}\left( k\right) \) are given in Table 14. More importantly, we calculated the CCC using both the scaling and squaring method (Matlab ’expm’ function) and its approximation based on IRA for different values of k. Then, we obtained the correlation (Pearson, Spearman and Kendall) coefficients of the regression between the “exact” and approximate values, as illustrated in Table 14. In both cases it is seen that using roughly the top 20% of the eigenvalues and eigenvectors of the adjacency matrix gives acceptable results in the IRA approximation. It is clear that this is not an exhaustive numerical analysis of this particular problem, but it gives some hints about the utility of this or similar approximate methods for computing the matrix exponential of extremely large matrices.
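The truncated sum in Eq. (6.7) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the computation reported in Table 14: it uses a much smaller random graph, a dense symmetric eigensolver (`numpy.linalg.eigh`) in place of both the scaling and squaring method and the IRA method, and hypothetical function names. For large sparse matrices, an ARPACK-based routine such as `scipy.sparse.linalg.eigsh` would be the natural counterpart of IRA.

```python
import numpy as np


def random_adjacency(n, m, seed=0):
    """Adjacency matrix of a simple undirected random graph with n vertices, m edges."""
    rng = np.random.default_rng(seed)
    rows, cols = np.triu_indices(n, k=1)          # all candidate vertex pairs
    chosen = rng.choice(rows.size, size=m, replace=False)
    A = np.zeros((n, n))
    A[rows[chosen], cols[chosen]] = 1.0
    return A + A.T


def communicability_topk(lam, psi, k):
    """G(k) = sum over the k largest eigenvalues of psi_j psi_j^T exp(lambda_j)."""
    if not 1 <= k <= lam.size:
        raise ValueError("k must be between 1 and the number of eigenvalues")
    lam_k = lam[-k:]        # eigh returns eigenvalues in ascending order
    psi_k = psi[:, -k:]
    return (psi_k * np.exp(lam_k)) @ psi_k.T


def relative_error(G_ref, G_approx):
    """Relative error in the Frobenius norm."""
    return np.linalg.norm(G_ref - G_approx) / np.linalg.norm(G_ref)


if __name__ == "__main__":
    n, m = 200, 1200                              # same mean degree as n=10,000, m=60,000
    A = random_adjacency(n, m)
    lam, psi = np.linalg.eigh(A)
    G_full = communicability_topk(lam, psi, n)    # full spectral sum, i.e. exp(A)
    for k in (10, 20, 40, 100):
        err = relative_error(G_full, communicability_topk(lam, psi, k))
        print(f"k = {k:3d}  relative error = {err:.3e}")
```

Because every term of the sum is weighted by \(e^{\lambda _{j}}>0\), the error of the truncated sum can only decrease as k grows, and the largest eigenvalues dominate the result.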

Table 14 Values of the relative error (RE) for the mean of the CCD among every pair of vertices in a random graph with 10,000 vertices and 60,000 edges and the coefficients of Pearson (\(r_\textrm{P}\)), Spearman rank correlation (\(r_\textrm{S}\)) and Kendall \(\tau \) coefficient for the correlation of the CCC based on scaling and squaring method and based on IRA with different values of k