Nonbacktracking cycles: length spectrum theory and graph mining applications
Abstract
Graph distance and graph embedding are two fundamental tasks in graph mining. For graph distance, determining the structural dissimilarity between networks is an ill-defined problem, as there is no canonical way to compare two networks. Indeed, many of the existing approaches for network comparison differ in their heuristics, efficiency, interpretability, and theoretical soundness. Thus, having a notion of distance that is built on theoretically robust first principles and that is interpretable with respect to features ubiquitous in complex networks would allow for a meaningful comparison between different networks. For graph embedding, many of the popular methods are stochastic and depend on black-box models such as deep networks. Regardless of their high performance, this makes their results difficult to analyze, which hinders their usefulness in the development of a coherent theory of complex networks. Here we rely on the theory of the length spectrum function from algebraic topology, and its relationship to the nonbacktracking cycles of a graph, in order to introduce two new techniques: NonBacktracking Spectral Distance (NBD) for measuring the distance between undirected, unweighted graphs, and NonBacktracking Embedding Dimensions (NBED) for finding a graph embedding in low-dimensional space. Both techniques are interpretable in terms of features of complex networks such as presence of hubs, triangles, and communities. We showcase the ability of NBD to discriminate between networks in both real and synthetic data sets, as well as the potential of NBED to perform anomaly detection. By taking a topological interpretation of nonbacktracking cycles, this work presents a novel application of topological data analysis to the study of complex networks.
Keywords
Graph distance, Graph embedding, Algebraic topology, Length spectrum
Introduction
As the network science literature continues to expand and scientists compile more examples of real-life networked data sets coming from an ever growing range of domains (Clauset et al.; Kunegis 2013), there is a need to develop methods to compare complex networks, both within and across domains. Many such graph distance measures have been proposed (Soundarajan et al. 2014; Koutra et al. 2016; Bagrow and Bollt 2018; Bento and Ioannidis 2018; Onnela et al. 2012; Schieber et al. 2017; Chowdhury and Mémoli 2017; 2018; Berlingerio et al. 2013; Yaveroğlu et al. 2014), though they vary in the features they use for comparison, their interpretability in terms of structural features of complex networks, their computational costs, and the discriminatory power of the resulting distance. This reflects the fact that complex networks represent a wide variety of systems whose structure and dynamics are difficult to encapsulate in a single distance score. For the purpose of providing a principled, interpretable, efficient and effective notion of distance, we turn to the length spectrum function. The length spectrum function can be defined on a broad class of metric spaces that includes Riemannian manifolds and graphs. The discriminatory power of the length spectrum is well known in other contexts: it can distinguish certain one-dimensional metric spaces up to isometry (Constantine and Lafont 2018), and it determines the Laplacian spectrum in the case of closed hyperbolic surfaces (Leininger et al. 2007). However, it is not clear if this discriminatory power is also present in the case of complex networks. Accordingly, we present a study on the following question: is the length spectrum function useful for the comparison of complex networks?
We answer this question in the positive by introducing the NonBacktracking Spectral Distance (NBD): a principled, interpretable, efficient, and effective measure that quantifies the distance between two undirected, unweighted networks. NBD has several desirable properties. First, NBD is based on the theory of the length spectrum and the set of nonbacktracking cycles of a graph (a nonbacktracking cycle is a closed walk that does not retrace any edges immediately after traversing them); these provide the theoretical background of our method. Second, NBD is interpretable in terms of features of complex networks such as existence of hubs and triangles. This helps in the interpretation and visualization of distance scores yielded by NBD. Third, NBD is more computationally efficient than other comparable methods. Indeed, NBD depends only on the computation of a few of the largest eigenvalues of the nonbacktracking matrix of a graph, which requires only slightly more computational time than the spectral decomposition of the adjacency matrix. Fourth, we have extensive empirical evidence that demonstrates the effectiveness of NBD at distinguishing real and synthetic networks.
NBD is intimately related to the eigenvalues of the nonbacktracking matrix, which serve as a proxy for the length spectrum of a graph (see “Relaxed Length Spectrum” section). Motivated by the usefulness of the eigenvalues of the nonbacktracking matrix, we then turn our focus to its eigenvectors. We discuss the potential of using the eigenvectors as a graph embedding technique, which we refer to as NonBacktracking Embedding Dimensions (or NBED for short). This technique computes a low dimensional embedding of each directed edge in a graph. We show that the peculiar visualizations yielded by NBED are particularly apt at providing a rich visualization of the embedded network, and use the patterns found therein to perform anomaly detection on the Enron email corpus (Klimt and Yang 2004).
The rest of this paper is structured as follows. “Background” section provides necessary background information on the length spectrum, nonbacktracking cycles, and the nonbacktracking matrix. In “Related work” section we discuss previous investigations related to ours. “Operationalizing the length spectrum” section explains the connection between these objects and discusses the properties of the nonbacktracking matrix that make it relevant for the study of complex networks. It also presents our efficient algorithm to compute it. “NBD: Nonbacktracking distance” section presents our distance method NBD and provides experimental evidence of its performance by comparing it to other distance techniques. In “NBED: Nonbacktracking embedding dimensions” section we discuss our embedding method NBED based on the eigenvectors of the nonbacktracking matrix, provide extensive visual analysis of the resulting edge embeddings, and mention its shortcomings and necessary future lines of study. We conclude in “Discussion and conclusions” section with a discussion of limitations and future work.
Background
Notation used in this work
Symbol  Definition 

π_{1}(X,p)  The fundamental group of X with basepoint p 
[c]  Homotopy class of loop c 
\(\mathcal {L}, \mathcal{L'}\)  Length spectrum function, relaxed length spectrum 
Conv(X)  If X is a graph, Conv(X) is its 2-core 
G=(V,E)  An undirected graph with node set V and edge set E 
n,m  Number of nodes and number of edges of a graph G 
e, e^{−1}  A directed edge e=u→v and its inverse e^{−1}=v→u 
NBC  Nonbacktracking cycle, in which no edge is followed by its inverse 
B  2m×2m nonbacktracking matrix of a graph 
B^{′}  2n×2n matrix whose eigenvalues are the same as those of B, save for ±1 
λ_{k}=a_{k}+ib_{k}  kth largest eigenvalue, in magnitude, of B 
Re(λ),Im(λ)  Real and imaginary parts of the complex number λ 
P,Q  n×2m directed incidence matrices of a graph 
p_{k}  The fraction of a graph’s nodes with degree k 
γ  The degree exponent in the case when p_{k}∼k^{−γ} 
〈k^{i}〉  ith moment of the degree distribution p_{k} 
deg(u)  Degree of node u 
nnz(A)  Number of nonzero elements of binary matrix A 
NBD(G,H)  Nonbacktracking distance between graphs G and H 
d({λ_{k}},{μ_{k}})  Distance between eigenvalues λ_{k} and eigenvalues μ_{k} 
r  Number of eigenvalues computed from one graph 
\(\rho = \sqrt {\lambda _{1}}\)  Magnitude threshold for eigenvalue computation 
r_{0}  Number of eigenvalues whose magnitude is greater than ρ 
σ  Spread parameter of the RBF kernel 
Length Spectrum
The length spectrum \(\mathcal {L}\) is a function on π_{1}(X) which assigns to each homotopy class of loops the infimum length among all of the representatives in its conjugacy class.^{1} Note, importantly, that the definition of length of a homotopy class considers the length of those loops not only in the homotopy class itself, but in all other conjugate classes. In the case of compact geodesic spaces, such as finite metric graphs, this infimum is always achieved. For a finite graph where each edge has length one, the value of \(\mathcal {L}\) on a homotopy class then equals the number of edges contained in the length-minimizing loop. That is, for a graph G=(V,E), v∈V, if [c]∈π_{1}(G,v) and c achieves the minimum length k in all classes conjugate to [c], we define \(\mathcal {L}([c]) = k\).
Our interest in the length spectrum is supported by the two following facts. First, graphs are aspherical. More precisely, the underlying topological space of any graph is aspherical, i.e., all of its homotopy groups of dimension greater than one are trivial.^{2} Therefore, it is logical to study the only nontrivial homotopy group, the fundamental group π_{1}(G). Second, Constantine and Lafont (2018) showed that the length spectrum of a graph determines a certain subset of it up to isomorphism. Thus, we aim to determine when two graphs are close to each other by comparing their length spectra, relying on the main theorem of (Constantine and Lafont 2018). For completeness, we briefly state it here; it is known as marked length spectrum rigidity.
Theorem 1
(Constantine and Lafont 2018) For a metric space X define Conv(X) as the minimal set to which X retracts by deformation. Let X_{1},X_{2} be a pair of compact, noncontractible, geodesic spaces of topological dimension one. If the marked length spectra of X_{1} and X_{2} are the same, then Conv(X_{1}) is isometric to Conv(X_{2}).
When G_{1},G_{2} are graphs, Conv(G_{i}),i=1,2, corresponds to the subgraph resulting from iteratively removing nodes of degree 1 from G_{i}, which is precisely the 2-core of G_{i} (Batagelj and Zaversnik 2011). Equivalently, the 2-core of G is the maximal subgraph in which all vertices have degree at least 2. Thus, Theorem 1 states that when two graphs have the same length spectrum, their 2-cores are isomorphic.
Given these results, it is natural to use the length spectrum as the basis of a measure of graph distance. Concretely, given two graphs, we aim to efficiently quantify how far their 2-cores are from being isomorphic by measuring the distance between their length spectra. In “Relaxed Length Spectrum” section, we explain our approach to implementing a computationally feasible solution for this problem.
NonBacktracking Cycles
Consider an undirected, unweighted graph G=(V,E) and suppose |E|=m. For an undirected edge e={u,v}∈E, we write u→v and v→u for its two possible orientations, and define \(\vec{E}\) as the set of all 2m directed edges. Given a directed edge \(e = u \to v \in \vec{E}\), define e^{−1} as the same edge traversed in the inverse order, e^{−1}=v→u. A path in G is a sequence of directed edges e_{1}e_{2}...e_{k} such that if e_{i}=u_{i}→v_{i} then v_{i}=u_{i+1} for i=1,...,k−1. Here, k is called the length of the path. A path is closed if v_{k}=u_{1}. A path is called nonbacktracking when an edge is never followed by its inverse, \(e_{i+1} \neq e_{i}^{-1}\) for all i. A closed path is also called a cycle. A cycle is a nonbacktracking cycle (or NBC for short) when it is a closed nonbacktracking path and, in addition, \(e_{k} \neq e_{1}^{-1}\).^{3}
The nonbacktracking matrix B is the 2m×2m matrix whose rows and columns are indexed by the directed edges of G, with entries \(B_{k \to l, u \to v} = \delta _{vk} \left (1 - \delta _{ul}\right)\) (Eq. 1), where δ_{uv} is the Kronecker delta. Thus, there is a 1 in the entry indexed by row k→l and column u→v when u≠l and v=k, and a 0 otherwise. Intuitively, one can interpret the B matrix as the (unnormalized) transition matrix of a random walker that does not perform backtracks: the entry at row k→l and column u→v is positive if and only if a walker can move from node u to node v (which equals node k) and then to l, without going back to u.
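As a concrete illustration of the entry rule just described, the following minimal sketch builds B as a dense 0/1 matrix for a small graph. The function name `nonbacktracking_matrix`, the edge-list input, and the ordering of the directed edges are choices made here for illustration only; they are not part of the original method.

```python
def nonbacktracking_matrix(edges):
    """Dense 0/1 nonbacktracking matrix of an undirected simple graph.

    `edges` is a list of undirected pairs (u, v). Returns (B, directed),
    where directed[i] is the directed edge that indexes row/column i.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))   # orientation u -> v
        directed.append((v, u))   # inverse orientation v -> u
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for row, (k, l) in enumerate(directed):
        for col, (u, v) in enumerate(directed):
            # 1 when u -> v can be followed by k -> l:
            # the edges are consecutive (v == k) and do not backtrack (u != l).
            if v == k and u != l:
                B[row][col] = 1
    return B, directed


triangle = [("a", "b"), ("b", "c"), ("c", "a")]
B, directed = nonbacktracking_matrix(triangle)
```

In the triangle every node has degree 2, so each directed edge has exactly one nonbacktracking continuation and every row of B contains a single 1.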
Importantly, NBCs are topologically relevant because backtracking edges are homotopically trivial, that is, the length-2 cycle u→v,v→u can always be contracted to a point (Terras 2011). Indeed, the edges removed in Fig. 2 are precisely those that form backtracks.
Observe that the matrix B tracks each pair of incident edges that do not comprise a backtrack. As a consequence we have the following result.
Lemma 1
\(B^{p}_{k \to l, i \to j}\) equals the number of nonbacktracking paths of length p+1 that start with edge i→j and end with edge k→l. Moreover, tr(B^{p}) is proportional to the number of closed nonbacktracking paths (i.e., NBCs) of length p in G.
Proof 1
The first fact is reminiscent of the well-known fact that \(A^{p}_{uv}\) gives the number of paths of length p that start at node u and end at node v, for any u,v∈G, where A is the adjacency matrix of G. The proof is the same as the proof of this latter fact, so we omit the details for brevity. The only difference is that B^{p} gives the number of nonbacktracking paths of length p+1, while A^{p} gives the number of paths of length p. To see why, notice that if the entry B_{k→l,i→j} is positive, then j=k, so the length-2 path i→j→l must exist and it cannot contain any backtracks. Accordingly, if \(B^{2}_{k \to l, i \to j}\) is positive then the length-3 path i→j→k→l must be nonbacktracking.
It remains to show that tr(B^{p}) is proportional to the number of NBCs of length p, and not the number of NBCs of length p+1 as one may think from the previous paragraph. Suppose a→b→c→a is a valid closed nonbacktracking path, that is, the nodes a,b,c form a triangle. In this case, \(B^{2}_{c \to a, a \to b}\) must be positive, as must be \(B^{3}_{a \to b, a \to b}\). The latter represents the existence of some node c such that the path a→b→c→a→b exists and is nonbacktracking, or, equivalently, that a→b→c→a is a nonbacktracking cycle. Note that \(B^{3}_{b \to c, b \to c}\) and \(B^{3}_{c \to a, c \to a}\) are also positive and all signal the existence of the same triangle a,b,c. Thus the sum of all diagonal elements tr(B^{3}) counts every triangle exactly six times, since there are three edges each with two possible orientations. A proof by induction shows that the above argument holds for any p.
The second claim of Lemma 1 will be fundamental in our later exposition. We prefer to count NBCs of length p using tr(B^{p}), as opposed to the entries of B^{p−1}, because of the well-known fact that \(tr(B^{p}) = \sum _{i} \lambda _{i}^{p}\), for any square matrix B, and where each λ_{i} is an eigenvalue of B (Lang 2004). Also of importance is the fact that B is not symmetric, and hence its eigenvalues are, in general, complex numbers. Since B is a real matrix, its nonreal eigenvalues come in conjugate pairs: if a+ib is an eigenvalue then so is a−ib, where i is the imaginary unit. This implies that we can write the above equation as \(tr(B^{p}) = \sum _{i} \lambda _{i}^{p} = \sum _{i} \text {Re}\left (\lambda _{i}^{p}\right)\), where Re(z) is the real part of a complex number, since the imaginary parts of conjugate eigenvalues cancel out. In the rest of this work we refer to B’s eigenvalues as the nonbacktracking eigenvalues.
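The counting claim of Lemma 1 can be checked numerically on a small graph. The sketch below is only an illustration under stated assumptions: the helper names (`build_B`, `trace_of_power`, `count_closed_nb_sequences`) and the choice of the complete graph on four nodes are hypothetical and introduced here. It compares tr(B^{p}) with a brute-force count of closed edge sequences of length p that never backtrack, including at the wrap-around step.

```python
from itertools import product


def build_B(edges):
    """Dense B plus the list of directed edges that index it (illustrative helper)."""
    directed = list(edges) + [(v, u) for u, v in edges]
    B = [[1 if (v == k and u != l) else 0 for (u, v) in directed]
         for (k, l) in directed]
    return B, directed


def matmul(X, Y):
    size = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]


def trace_of_power(B, p):
    """Trace of B^p computed with exact integer arithmetic."""
    power = B
    for _ in range(p - 1):
        power = matmul(power, B)
    return sum(power[i][i] for i in range(len(B)))


def count_closed_nb_sequences(directed, p):
    """Brute force: closed edge sequences of length p with no backtrack,
    including the step from the last edge back to the first."""
    count = 0
    for seq in product(directed, repeat=p):
        valid = True
        for i in range(p):
            (u, v), (k, l) = seq[i], seq[(i + 1) % p]
            if v != k or u == l:   # not consecutive, or a backtrack
                valid = False
                break
        if valid:
            count += 1
    return count


# Complete graph on four nodes: 4 triangles and 3 four-cycles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
B, directed = build_B(k4_edges)
traces = {p: trace_of_power(B, p) for p in (3, 4)}
brute = {p: count_closed_nb_sequences(directed, p) for p in (3, 4)}
```

Here tr(B^{3}) is 24, that is, six times the four triangles, in agreement with the argument in the proof.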
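The eigenvalues of B other than ±1 are also the eigenvalues of the 2n×2n block matrix \(B' = \begin{pmatrix} A & I - D \\ I & 0 \end{pmatrix}\) (Eq. 2), where A is the adjacency matrix, D is the diagonal matrix with the node degrees, and I is the n×n identity matrix.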
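The relationship between the spectra of B and B^{′} can be illustrated numerically. The sketch below assumes the usual block form of B^{′}, with blocks A, I−D, I and 0, and uses NumPy; the function names and the tolerance used to discard eigenvalues equal to ±1 are choices made here, not part of the original text.

```python
import numpy as np


def nb_matrices(edges, n):
    """Return (B, B_prime) for an undirected simple graph on nodes 0..n-1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    D = np.diag(A.sum(axis=1))
    I = np.eye(n)
    # Assumed block form: [[A, I - D], [I, 0]].
    B_prime = np.block([[A, I - D], [I, np.zeros((n, n))]])
    directed = list(edges) + [(v, u) for u, v in edges]
    B = np.array([[1.0 if (v == k and u != l) else 0.0 for (u, v) in directed]
                  for (k, l) in directed])
    return B, B_prime


def drop_pm_one(eigs, tol=1e-6):
    """Discard eigenvalues equal to +1 or -1 and sort the rest deterministically."""
    kept = [complex(z) for z in eigs if abs(z * z - 1) > tol]
    return sorted(kept, key=lambda z: (round(z.real, 6), round(z.imag, 6)))


k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
B, B_prime = nb_matrices(k4_edges, 4)
eigs_B = drop_pm_one(np.linalg.eigvals(B))
eigs_B_prime = drop_pm_one(np.linalg.eigvals(B_prime))
```

For this graph B is 12×12 and B^{′} is 8×8, yet after discarding ±1 both leave the same seven eigenvalues.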
Related work
Hashimoto (1989) discussed the nonbacktracking cycles of a graph (and the associated nonbacktracking matrix) in relation to the theory of Zeta functions in graphs. Terras (2011) explained the relationship between nonbacktracking cycles and the free homotopy classes of a graph. More recently, the nonbacktracking matrix has been used in diverse applications such as node centrality (Martin et al. 2014; Grindrod et al. 2018) and community detection (Krzakala et al. 2013; Bordenave et al. 2015; Kawamoto 2016), and in the data mining tasks of clustering (Ren et al. 2011) and embedding (Jiang et al. 2018). In particular, the application to community detection is of special interest because it was shown that the nonbacktracking matrix performs better at spectral clustering than the Laplacian matrix in some cases (Krzakala et al. 2013). Hence, there is recent interest in describing the eigenvalue distribution of the nonbacktracking matrix in models such as the Erdős-Rényi random graph and the stochastic block model (Gulikers et al. 2017). Our work differs from other applied treatments of the nonbacktracking matrix in that we arrive at its eigenvalues from first principles, as a relaxed version of the length spectrum. Concretely, we use the eigenvalues to compare graphs because the spectral moments of the nonbacktracking matrix describe certain aspects of the length spectrum (see “Operationalizing the length spectrum” section). The spectral moments of other matrices (e.g., adjacency and Laplacian matrices) also describe structural features of networks (Estrada 1996; Preciado et al. 2013).
Many distance methods have been proposed recently (Soundarajan et al. 2014; Koutra et al. 2016; Bagrow and Bollt 2018; Bento and Ioannidis 2018; Onnela et al. 2012; Schieber et al. 2017; Chowdhury and Mémoli 2017; 2018; Berlingerio et al. 2013; Yaveroğlu et al. 2014). This proliferation is due to the fact that there is no definitive way of comparing two graphs, especially complex networks since they present a vast diversity of structural features. The methods that are most related to ours fall in two categories: combinatorial enumeration or estimation of different subgraphs (such as motifs or shortest paths), or those that depend on spectral properties of some matrix representation of the network, such as the method Lap from “Data sets and base lines” section.
The topic of embedding has also seen a sharp increase of activity in recent years, motivated by the ubiquity of machine learning and data mining applications to structured data sets; see (Goyal and Ferrara 2018; Hamilton et al. 2017) for recent reviews. Our method NBED differs from most others in that it yields an edge embedding that differentiates the two possible orientations of an edge. Our focus is in providing an interpretable visual analysis of the resulting embedding, and on its application to anomaly detection.
Operationalizing the length spectrum
Relaxed Length Spectrum
The first step of this procedure is to focus on the image of the length spectrum rather than the domain (i.e., focus on the collection of lengths of cycles). The second step is to aggregate these values by considering the size of the level sets of either length spectrum.
Concretely, when comparing two graphs G and H, instead of comparing \(\mathcal {L}_{G}\) and \(\mathcal {L}_{H}\) directly, we compare the number of cycles in G of length 3 with the number of cycles in H of the same length, as well as the number of cycles of length 4, of length 5, etc., thereby essentially comparing the histogram of values that each \(\mathcal {L}\) takes. Theoretically, focusing on this histogram provides a common ground to compare the two functions. In practice, this aggregation allows us to reduce the amount of memory needed to store the length spectra because we no longer keep track of the exact composition of each of the infinitely many (free) homotopy classes. Instead, we only keep track of the frequency of their lengths. According to this aggregation, we define the relaxed version of the length spectrum as the set of points \(\mathcal {L}' = \{(k, n(k)) : k=1,2,\ldots \}\), where n(k) is the number of conjugacy classes of π_{1} (i.e., free homotopy classes) of length k.
The major downside of removing focus from the underlying group structure and shifting it towards (the histogram of values in) the image is that we lose information about the combinatorial composition of each cycle. Indeed, π_{1}(G) holds information about the number of cycles of a certain length k in G; this information is also stored in \(\mathcal {L}'\). However, the group structure of π_{1}(G) also allows us to know how many of those cycles of length k are formed by the concatenation of two (three, four, etc.) cycles of different lengths. This information is lost when considering only the sizes of level sets of the image, i.e., when considering \(\mathcal {L}'\).
The next step makes use of the nonbacktracking cycles (NBCs). We rely on NBCs because the set of conjugacy classes of π_{1}(G) is in bijection with the set of NBCs of G (see, e.g., Terras (2011) and Hashimoto (1989)). In other words, to compute the set \(\mathcal {L}'\) we need only account for the lengths of all NBCs. Indeed, consider the nonbacktracking matrix B of G and recall from Lemma 1 that tr(B^{k}) is proportional to the number of NBCs of length k in the graph. Using these counts gives us the set \(\mathcal {L}' = \left \{\left (k, tr\left (B^{k}\right)\right)\right \}_{k=1}^{\infty }\). Recall that \(tr\left (B^{k}\right)=\sum _{i} \lambda _{i}^{k}\), where each λ_{i} is a nonbacktracking eigenvalue. Therefore, the eigenvalues of B contain all the information necessary to compute and compare \(\mathcal {L}'\). In this way, we can study the (eigenvalue) spectrum of B as a proxy for the (length) spectrum of π_{1}.
We have deviated from the original definition of the length spectrum in important ways. Fortunately, our experiments indicate that \(\mathcal {L}'\) contains enough discriminatory information to distinguish between real and synthetic graphs effectively; see “Clustering networks” and “Rewiring edges” sections. We discuss this limitation further in our concluding remarks in “Discussion and conclusions” section. Beyond experimental results, one may ask if there are theoretical guarantees that the relaxed version of the length spectrum will keep some of the discriminatory power of the original. Even though the rigidity result of (Constantine and Lafont 2018) that partially inspired this work concerns the original length spectrum, there are independent reasons to trust the eigenvalue spectrum of B when comparing graphs. On the one hand, the spectrum of B has been found to yield fewer isospectral graph pairs (i.e., nonisomorphic graphs with the same eigenvalues) than the adjacency and Laplacian matrices in the case of small graphs (Durfee and Martin 2015). On the other hand, B is tightly related to the theory of graph zeta functions (Hashimoto 1989), in particular the Ihara Zeta function, which is known to determine several graph properties such as girth, number of spanning trees, whether the graph is bipartite, regular, or a forest, etc. (Cooper 2009). Thus, both as a relaxed version of the original length spectrum and as an object of interest in itself, we find the eigenvalue spectrum of the nonbacktracking matrix B to be an effective means of determining the dissimilarity between two graphs. For the rest of this work we focus on the eigenvalues of B and on using them to compare complex networks.
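To make the set \(\mathcal {L}'\) concrete, the following minimal sketch lists its first few points (k, tr(B^{k})) for a small graph using exact integer matrix powers. The function name `relaxed_length_spectrum` and the truncation parameter `max_length` are hypothetical names introduced here for illustration; in practice one would work with eigenvalues rather than explicit powers.

```python
def relaxed_length_spectrum(edges, max_length):
    """Return [(k, tr(B^k)) for k = 1..max_length] for an undirected simple graph."""
    directed = list(edges) + [(v, u) for u, v in edges]
    size = len(directed)
    B = [[1 if (v == k and u != l) else 0 for (u, v) in directed]
         for (k, l) in directed]
    # Start from the identity and multiply by B once per length.
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    spectrum = []
    for k in range(1, max_length + 1):
        power = [[sum(power[i][t] * B[t][j] for t in range(size))
                  for j in range(size)] for i in range(size)]
        spectrum.append((k, sum(power[i][i] for i in range(size))))
    return spectrum


triangle = [("a", "b"), ("b", "c"), ("c", "a")]
spectrum = relaxed_length_spectrum(triangle, 6)
```

For the triangle the only nonzero counts occur at lengths 3 and 6, corresponding to going around the triangle once or twice in either direction.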
Computing B
As mentioned previously, the eigenvalues of B other than ±1 are also the eigenvalues of the 2n×2n block matrix B^{′} defined in Eq. 2. Computing its eigenvalues can then be done with standard computational techniques, and it will not incur a cost greater than computing the eigenvalues of the adjacency or Laplacian matrices, as B^{′} is precisely four times the size of A and it has only 2n more nonzero entries (see Eq. 2). This is what we will do in “NBD: Nonbacktracking distance” section in order to compare two relaxed length spectra. However, in “NBED: Nonbacktracking embedding dimensions” section we will need the eigenvectors of B, which cannot be computed from B^{′}. For this purpose, we now present an efficient algorithm for computing B. Computing its eigenvectors can then be done with standard techniques.
Note that an entry of B may be positive only when the corresponding entry of C is positive. Therefore, we can compute B in a single iteration over the nonzero entries of C.
Now, C has a positive entry for each pair of incident edges in the graph, from which we have the following result.
Lemma 2
Set nnz(C) to be the number of nonzero entries in C, and 〈k^{2}〉 the second moment of the degree distribution. Then, nnz(C)=n〈k^{2}〉.
Proof 2
This follows by direct computation: \(\sum _{k\to l, i\to j} \delta _{kj} = \sum _{k} \left (\sum _{l} a_{lk} \right) \left (\sum _{i} a_{ik}\right) = n \langle k^{2} \rangle \)
Computing P and Q takes O(m) time. Since computing the product of sparse matrices takes time proportional to the number of positive entries in the result, computing C takes O(n〈k^{2}〉). Thus we can compute B in time O(m+n〈k^{2}〉). In the case of a power-law degree distribution with exponent 2≤γ≤3, the runtime of our algorithm falls between O(m+n) and O(m+n^{2}). Note that if a graph is given in adjacency list format, one can build B directly from the adjacency list in time Θ(n〈k^{2}〉−n〈k〉) by generating a sparse matrix with the appropriate entries set to 1 in a single iteration over the adjacency list.
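The single pass over the adjacency list mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the graph is given as a dictionary of neighbour lists, the nonzero entries are stored as sets of (row, column) pairs instead of a sparse matrix, and the entry of C at row k→l and column i→j is taken to be positive exactly when j=k, as in the proof of Lemma 2.

```python
def nonzero_entries(adj):
    """Nonzero positions of C and B for an undirected simple graph.

    `adj` maps each node to the list of its neighbours. Entries are
    (row, column) pairs of directed edges, written as (k -> l, i -> k).
    """
    C_entries = set()
    B_entries = set()
    for k, neighbours in adj.items():    # k is the shared middle node
        for i in neighbours:             # incoming edge i -> k
            for l in neighbours:         # outgoing edge k -> l
                C_entries.add(((k, l), (i, k)))
                if i != l:               # drop the backtrack i -> k -> i
                    B_entries.add(((k, l), (i, k)))
    return C_entries, B_entries


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
C_entries, B_entries = nonzero_entries(adj)
sum_deg = sum(len(vs) for vs in adj.values())            # equals n<k>
sum_deg_sq = sum(len(vs) ** 2 for vs in adj.values())    # equals n<k^2>
```

The work done is proportional to the sum of squared degrees, and the resulting counts match n〈k^{2}〉 for C and n〈k^{2}〉−n〈k〉 for B.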
Spectral properties of B
In our experiments we have found that the eigenvalues of B behave nicely with respect to certain fundamental properties of complex networks such as degree distribution and triangles. If the theory of the length spectrum justifies the use of the nonbacktracking eigenvalues to compare graphs in general, the properties we present next justify their use for complex networks in particular.
Lemma 3
We have nnz(B)=n(〈k^{2}〉−〈k〉).
Proof 3
We directly compute \(\sum _{k \to l, i \to j} \delta _{kj} \left (1 - \delta _{il}\right) = \sum _{k} \left (\deg (k) - 1\right)\deg (k) = n\left (\langle k^{2} \rangle - \langle k \rangle \right)\).
In general, the eigenvalues of B with small absolute value tend to fall on a circle in the complex plane (Krzakala et al. 2013; Angel et al. 2015; Wood and Wang 2017). However, if \(\sum _{k} a_{k}^{2}\) is large and \(\sum _{k} b_{k}^{2}\) is small (implying a large number of triangles), the eigenvalues cannot all fall too close to a circle, since they will need to have different absolute values. Hence, the more triangles in the graph, the less marked the circular shape of the eigenvalues. Put differently, the more triangles a graph has, the larger and more positive the real parts of its eigenvalues tend to be.
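The link between triangles and the real and imaginary parts of the eigenvalues can be made explicit with a short computation. It uses only the identity \(tr(B^{p}) = \sum _{k} \text {Re}\left (\lambda _{k}^{p}\right)\), the notation λ_{k}=a_{k}+ib_{k}, and the expansion of a cube of a complex number:

\(tr\left (B^{3}\right) = \sum _{k} \text {Re}\left (\left (a_{k} + i b_{k}\right)^{3}\right) = \sum _{k} \left (a_{k}^{3} - 3 a_{k} b_{k}^{2}\right) = \sum _{k} a_{k} \left (a_{k}^{2} - 3 b_{k}^{2}\right)\)

Since tr(B^{3}) counts every triangle six times (Lemma 1), a graph with many triangles needs this sum to be large, which is favored by eigenvalues with large positive real parts a_{k} and small imaginary parts b_{k}.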
Eigenvalues: visualization
Data sets used in this work
data set                         network   n         〈k〉    \(\bar {c}\)

Online social (social)                     4k        43.69  0.61
                                 sdot8     77k       14.13  0.06
                                 sdot9     82k       14.18  0.06
                                 Epinions  76k       10.69  0.14
                                           81k       33.02  0.57
                                 Wiki      7k        28.32  0.14
Coauthorship (CA)                AstroPh   19k       21.11  0.63
                                 CondMat   23k       8.08   0.63
                                 GrQc      5k        5.53   0.53
                                 HepPh     12k       19.74  0.61
                                 HepTh     10k       5.26   0.47
Peer-to-peer file-sharing (P2P)  P2P04     11k       7.35   0.01
                                 P2P05     9k        7.20   0.01
                                 P2P06     9k        7.23   0.01
                                 P2P08     6k        6.59   0.01
                                 P2P09     8k        6.41   0.01
                                 P2P24     27k       4.93   0.01
                                 P2P25     23k       4.82   0.01
                                 P2P30     36k       4.82   0.01
                                 P2P31     63k       4.73   0.01
Autonomous systems (AS)          AS331     11k       4.12   0.30
                                 AS407     11k       4.10   0.29
                                 AS414     11k       4.16   0.30
                                 AS421     11k       4.19   0.30
                                 AS428     11k       4.13   0.29
                                 AS505     11k       4.13   0.29
                                 AS512     11k       4.12   0.29
                                 AS519     11k       4.11   0.29
                                 AS526     11k       4.19   0.30
                                 BA1k      1k        13.90  0.05
                                 CM1k      1k        10.20  0.17
                                 ER1k      1k        14.99  0.15
                                 HG1k      997±2     14.72  0.82
                                 KR1k      1021±1    14.54  0.03
                                 WS1k      1k        14.00  0.67
                                 BA5k      5k        13.98  0.02
                                 CM5k      5k        11.40  0.13
                                 ER5k      5k        15.00  0.00
                                 HG5k      4982±8    14.67  0.82
                                 KR5k      4089±3    18.27  0.01
                                 WS5k      5k        14.00  0.67
                                 BA10k     10k       13.99  0.01
                                 CM10k     10k       11.97  0.12
                                 ER10k     10k       14.99  0.00
                                 HG10k     9958±15   14.96  0.83
                                 KR10k     8170±11   18.32  0.01
                                 WS10k     10k       14.00  0.67
Each model generates eigenvalue distributions presenting different geometric patterns. As expected from the previous section, HG has a less marked circular shape around the origin since it is the model that generates graphs with the most triangles. Furthermore, HG also has the largest maximum degree, and therefore its eigenvalues have a greater spread along the imaginary axis. We also show in Fig. 1 a low-dimensional projection of the nonbacktracking eigenvalues using UMAP (McInnes et al. 2018). Notice how, even in two dimensions, the nonbacktracking eigenvalues are clustered according to the random graph model they come from. This provides experimental evidence that the nonbacktracking eigenvalues should be useful when comparing complex networks. For a more in-depth explanation of the parameters used for UMAP, see Appendix 1.
NBD: Nonbacktracking distance
Based on the previous discussion, we propose a method to compute the distance between two complex networks based on the nonbacktracking eigenvalues. In this section, we use d to refer to an arbitrary metric defined on subsets of \(\mathbb {R}^{2}\). That is, the distance between two graphs G,H is given by d({λ_{k}},{μ_{k}}), where λ_{k},μ_{k} are the eigenvalues of G,H, respectively, which we identify with points in \(\mathbb {R}^{2}\) by using their real and imaginary parts as coordinates. With these notations we are finally ready to propose the NonBacktracking Spectral Distance, or NBD for short.
Definition 1
Consider two graphs G,H, and let \(\{\lambda _{k}\}_{k=1}^{k=r}, \{\mu _{k}\}_{k=1}^{k=r}\), be the r nonbacktracking eigenvalues of largest magnitude of G and H, respectively. We define the NBD between G and H as NBD(G,H)=d({λ_{k}},{μ_{k}}).
Note that in Definition 1 we leave open two important degrees of freedom: the choice of d and the choice of r. We will discuss the optimal choices for these parameters in the next section. Regardless of these choices, however, we have the following results.
Proposition 1
If d is a metric over subsets of \(\mathbb {R}^{2}\), then NBD is a pseudometric over the set of graphs.
Proof 4
NBD inherits several desirable properties from d: nonnegativity, symmetry, and, importantly, the triangle inequality. However, the distance between two distinct graphs may be zero when they share all of their r largest eigenvalues. Thus, NBD is not a metric over the space of graphs but a pseudometric.
Computing NBD
Algorithm 1 presents our method to compute the NBD between two graphs. It makes use of the following fact, which simplifies the computation. It is known (see e.g. Durfee and Martin (2015)) that the multiplicity of 0 as an eigenvalue of B equals the number of edges outside of the 2-core of the graph. For example, a tree, whose 2-core is empty, has all its eigenvalues equal to 0. On the one hand, we could use this valuable information as part of our method to compare two graphs. On the other hand, the existence of zero eigenvalues does not change the value of tr(B^{k}),k≥1, and thus leaves the relaxed length spectrum \(\mathcal {L}' = \left \{\left (k,tr\left (B^{k}\right)\right)\right \}_{k}\) intact. Moreover, iteratively removing the nodes of degree one (a procedure called “shaving”) reduces the size of B (or the sparsity of B^{′}; see Eq. 2), which makes the computation of nonzero eigenvalues faster.
Given two graphs G,H, we first compute their 2-cores by shaving them, that is, iteratively removing the nodes of degree 1 until there are none; call the new graphs \(\tilde {G}\) and \(\tilde {H}\). Then, for each graph, we compute the B^{′} matrix using Eq. 2 and then its largest r eigenvalues. Lastly, we compute the distance d between the resulting sets of eigenvalues. We proceed to analyze the runtime complexity of each line of Algorithm 1.
Line 1: the shaving step has a worst-case runtime of O(n^{2}), where n is the number of nodes. Indeed, the shaving algorithm must iterate two steps until completion: first, identify all nodes of degree 1, and second, remove those nodes from the graph. For the first step, querying the degree of all nodes takes O(n) time, while the deletion step takes an amount of time that is linear in the number of nodes that are removed at that step. The worst case is attained by a path graph. At each iteration, the degrees of all nodes must be queried, and exactly two nodes are removed. The algorithm terminates after n/2 steps, at each of which it must query the degree of all remaining nodes, thus it takes O(n^{2}). In several classes of complex networks, however, the number of nodes removed by the algorithm (i.e. those outside the 2-core) will be a small fraction of n.
Line 2: the eigenvalue computation can be done in O(n^{3}) using, for example, power iteration methods, though its actual runtime will greatly depend on the number of eigenvalues and the structure of the matrix.
Line 3: finally, computing the distance between the computed eigenvalues depends on which distance is being used; see next section.
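The shaving step analyzed above can be sketched in a few lines. This is a minimal illustration of the simple iterate-and-remove procedure described in the text, not a reproduction of Algorithm 1; the function name `shave` and the dictionary-of-sets graph representation are assumptions made here. Nodes of degree 0, which can appear once their only neighbour is removed, are also dropped so that every remaining node has degree at least 2.

```python
def shave(adj):
    """Return the 2-core of an undirected graph given as {node: set(neighbours)}."""
    core = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    while True:
        # Step 1: query all degrees and collect nodes of degree 0 or 1.
        leaves = [u for u, vs in core.items() if len(vs) <= 1]
        if not leaves:
            return core
        # Step 2: remove them and update the neighbours' adjacency sets.
        for u in leaves:
            for v in core[u]:
                core[v].discard(u)
            del core[u]


# A path graph has an empty 2-core.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
path_core = shave(path)

# A triangle with one pendant node shaves down to the triangle.
lollipop = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
lollipop_core = shave(lollipop)
```

The path graph is the slow case discussed in the text: each pass removes only its two end nodes.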
Choice of distance and dependence on r
Previous works provide evidence that the majority of the nonbacktracking eigenvalues have magnitude less than \(\rho = \sqrt {\lambda _{1}}\), where λ_{1} is the largest eigenvalue (which is guaranteed to be real due to the Perron-Frobenius theorem since all of B’s entries are real and nonnegative). Indeed, Saade et al. (2014) show that, in the limit of large network size, nonbacktracking eigenvalues with magnitude larger than ρ occur with probability 0. Therefore, to compare two graphs, we propose to compute all the eigenvalues that have magnitude larger than ρ. Informally, this means we are comparing the outliers of the eigenvalue distribution, since they become less probable as the network grows. This provides a heuristic to choose the value of r.
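As a minimal sketch of this heuristic, assuming the eigenvalues have already been computed and are available as a list of complex numbers, the value of r is simply the number of eigenvalues lying outside the circle of radius ρ. The function name `choose_r` and the sample spectrum are hypothetical.

```python
import math


def choose_r(eigenvalues):
    """Count the eigenvalues whose magnitude exceeds rho = sqrt(lambda_1)."""
    lambda_1 = max(abs(z) for z in eigenvalues)  # magnitude of the leading eigenvalue
    rho = math.sqrt(lambda_1)
    return sum(1 for z in eigenvalues if abs(z) > rho)


# Hypothetical spectrum: lambda_1 = 9, so rho = 3.
sample_spectrum = [9 + 0j, 4 + 0j, -3.5 + 0j, 1 + 2j, 1 - 2j, 0.5 + 0j]
r0 = choose_r(sample_spectrum)
```

Here only the three eigenvalues 9, 4 and −3.5 have magnitude larger than ρ=3, so the heuristic selects r=3 for this toy spectrum.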

The Euclidean metric computes the Euclidean distance between the vectors whose elements are the eigenvalues sorted in order of magnitude: \(\text {Euclidean}\left (\{\lambda _{i}\}, \{\mu _{i}\}\right) = \sqrt {\sum _{i} |\lambda _{i} - \mu _{i}|^{2}}\), with |λ_{i}|≥|λ_{i+1}| and |μ_{i}|≥|μ_{i+1}| for all i. If two eigenvalues have the same magnitude, we use their real parts to break ties. Computing this distance takes O(r) computations, where r is the number of eigenvalues.

The EMD metric computes the Earth Mover Distance (a.k.a. Wasserstein distance) (Rubner et al. 1998) which is best understood as follows. Write λ_{k}=a_{k}+ib_{k} and μ_{k}=c_{k}+id_{k} for all k. Consider the sets \(\left \{(a_{k}, b_{k})\right \}_{k=1}^{r}\) and \(\left \{(c_{k}, d_{k})\right \}_{k=1}^{r}\) as subsets of \(\mathbb {R}^{2}\), and imagine a point mass with weight 1/r at each of these points. Intuitively, EMD measures the minimum amount of work needed to move all point masses at the points (a_{k},b_{k}) to the points (c_{k},d_{k}). This distance requires the computation of the (Euclidean) distance between every pair of eigenvalues and thus it takes O(r^{2}) runtime.
The Hausdorff metric is a standard distance between compact sets in a metric space (Munkres 2000); it is defined as $$ \text{Hausdorff}(\{\lambda_{i}\}, \{\mu_{i}\}) = \max\left(\max_{i} \min_{j} |\lambda_{i} - \mu_{j}|, \max_{j} \min_{i} |\lambda_{i} - \mu_{j}| \right). $$ (5)
Similarly as above, this computation takes O(r^{2}) runtime.
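The three candidate distances can be sketched on lists of complex eigenvalues as follows. This is an illustrative sketch under stated assumptions, not the authors' code: ties in magnitude are broken by larger real part first (the direction is not specified in the text) and then by imaginary part; both lists are assumed to have the same length r; and the EMD is computed by brute force over all matchings, which is exact for equal point masses of weight 1/r but only feasible for very small r. A practical implementation would use an optimal-transport or assignment solver instead.

```python
import math
from itertools import permutations


def sort_eigs(eigs):
    """Sort by decreasing magnitude; break ties by real, then imaginary part."""
    return sorted(eigs, key=lambda z: (-abs(z), -z.real, -z.imag))


def euclidean(lams, mus):
    """Euclidean distance between the two magnitude-sorted eigenvalue vectors."""
    pairs = zip(sort_eigs(lams), sort_eigs(mus))
    return math.sqrt(sum(abs(l - m) ** 2 for l, m in pairs))


def hausdorff(lams, mus):
    """Hausdorff distance between the two point sets in the complex plane."""
    d_lm = max(min(abs(l - m) for m in mus) for l in lams)
    d_ml = max(min(abs(l - m) for l in lams) for m in mus)
    return max(d_lm, d_ml)


def emd_bruteforce(lams, mus):
    """EMD for equal masses 1/r: cheapest one-to-one matching, found exhaustively."""
    r = len(lams)
    best = min(sum(abs(l - m) for l, m in zip(lams, perm))
               for perm in permutations(mus))
    return best / r


# Hypothetical eigenvalue sets that differ only in the leading eigenvalue.
lams = [2 + 0j, 1j, -1j]
mus = [3 + 0j, 1j, -1j]
distances = (euclidean(lams, mus), hausdorff(lams, mus), emd_bruteforce(lams, mus))
```

The Euclidean distance needs one pass over the sorted lists, whereas the other two inspect pairs of eigenvalues, in line with the O(r) and O(r^{2}) costs quoted above (the exhaustive matching used here for EMD is far more expensive and is only meant to make the definition concrete).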
In Fig. 5 we see that Hausdorff has high variance and a somewhat erratic behavior on random graphs. Euclidean has the least variance and most predictable behavior, continuing to increase as the value of r grows. In the KR data sets, Hausdorff presents an “elbow” near r_{0}, after which it levels off without much change for increasing values of r. Interestingly, the plots suggest that, in the case of EMD, the steepest decrease usually occurs before r_{0}. Also of note is the fact that on HG, all distances have a slight increase for large values of r.
Since Hausdorff does not appear to have a predictable behavior, and Euclidean continues to increase with no discernible inflection point even when using the largest r=1000 eigenvalues, we favor EMD when comparing nonbacktracking eigenvalues of two graphs. Furthermore, EMD has a predictable “elbow” behavior that happens near r_{0}. Thus we conclude that this is an appropriate choice for the number of eigenvalues to use. In the following we use EMD as the distance d and r_{0} as the value of r for all our experiments.^{5}
NBD: Experiments
Data sets and base lines
A description of the data sets we use throughout this article can be found in Table 2. We use random graphs obtained from six different models: Erdős-Rényi (ER) (Erdös and Rényi 1960; Bollobás 2001), Barabási-Albert (BA) (Barabási and Albert 1999), Stochastic Kronecker Graphs (KR) (Leskovec et al. 2010; Seshadhri et al. 2013), Configuration Model with power law degree distribution (CM) (Newman 2003), Watts-Strogatz (WS) (Watts and Strogatz 1998), and Hyperbolic Graphs (HG) (Krioukov et al. 2010; Aldecoa et al. 2015). We generated CM and HG with degree distribution p_{k}∼k^{−γ} with exponent γ=2.3, and WS with rewiring probability p=0.01. All graphs have approximate average degree 〈k〉=15. From each model, we generate three batches, each with 50 graphs: the first with n=1,000 nodes, the second with n=5,000, and the third with n=10,000. This yields 6×3 data sets that we refer to by model and graph size (see Table 2). We also use real networks, divided into four different data sets: social, CA, AS, and P2P. social contains six networks obtained from human online social networks (facebook, twitter, sdot8, sdot9, epinions, and wiki) (McAuley and Leskovec 2012; Leskovec et al. 2010; Leskovec et al. 2009; Richardson et al. 2003), CA contains five co-authorship networks (AstroPh, CondMat, GrQc, HepPh, HepTh) obtained from the arXiv preprint server (Leskovec et al. 2007), P2P contains nine snapshots of the peer-to-peer connections of a file-sharing network (Leskovec et al. 2007), and AS contains nine snapshots of the Internet autonomous systems infrastructure (Leskovec et al. 2005). Note that P2P and AS are temporal snapshots of the same network and thus we may assume that they have been generated using the same growth process. The same is not true for social and CA: even though they contain similar kinds of networks, their generating mechanisms may be different. The real networks were obtained through SNAP (Leskovec and Krevl 2014) and ICON (Clauset et al.).
In all, we have 929 different networks (900 random, 29 real), comprising 22 different data sets (6×3 random, 4 real).
Using each distance method in turn, we obtain a 929×929 distance matrix containing the distance between each pair of networks in our data sets. Using these distance matrices we perform the following experiments.
Clustering networks
Our main experiment is to evaluate the distance methods in a clustering context (Yaveroğlu et al. 2015). We use each distance matrix as the input to a spectral clustering algorithm. To form a proximity graph from the distance matrix, we use the RBF kernel. That is, if the distance between two graphs under the ith distance method is d, their proximity is given by \(\exp \left (-d^{2}/2\sigma _{i}^{2}\right)\). We choose σ_{i} as the mean within-data-set distance, that is, we take the average distance among all pairs of ER graphs, the average distance among all pairs of BA graphs, and so on for each data set of random graphs, and average all of them together (von Luxburg 2007). Note this yields a different σ for each distance method, and that we do not use graphs from real data sets to tune this parameter. We program the algorithm to find the exact number of ground-truth clusters. We report the results of this experiment using all graphs (22 clusters), only real graphs (4 clusters), and only random graphs (18 clusters).
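The construction of the proximity graph can be sketched as follows, assuming the pairwise distances are stored in a square list-of-lists `D` and each graph carries a data-set label. The function names and the toy numbers are hypothetical; the kernel is the standard Gaussian (RBF) form exp(−d²/2σ²).

```python
import math


def mean_within_dataset_distance(D, labels):
    """Average, over data sets, of the mean pairwise distance inside each one."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    means = []
    for members in groups.values():
        pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
        if pairs:  # skip data sets with a single graph
            means.append(sum(D[i][j] for i, j in pairs) / len(pairs))
    return sum(means) / len(means)


def rbf_affinity(D, sigma):
    """Turn a distance matrix into an RBF proximity (affinity) matrix."""
    return [[math.exp(-d * d / (2.0 * sigma ** 2)) for d in row] for row in D]


# Hypothetical distances between two ER graphs and two BA graphs.
toy_D = [[0.0, 1.0, 4.0, 4.0],
         [1.0, 0.0, 4.0, 4.0],
         [4.0, 4.0, 0.0, 3.0],
         [4.0, 4.0, 3.0, 0.0]]
toy_labels = ["ER", "ER", "BA", "BA"]

sigma = mean_within_dataset_distance(toy_D, toy_labels)
affinity = rbf_affinity(toy_D, sigma)
```

The resulting matrix can then be handed to any spectral clustering routine that accepts a precomputed affinity, for instance scikit-learn's `SpectralClustering(affinity="precomputed")`; which implementation was actually used is not stated here.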
Once we have obtained the clusters, we use as performance metrics homogeneity, completeness, and V-measure. Homogeneity measures the diversity of data sets within a cluster, and is maximized when all networks in each cluster come from the same data set (higher is better). Completeness is the converse; it measures the diversity of clusters assigned to the graphs in one data set (higher is better). The V-measure is the harmonic mean of homogeneity and completeness (Rosenberg and Hirschberg 2007). Concretely, let C be the ground-truth class of a network chosen uniformly at random, and let K be the cluster assigned to that network. Homogeneity is defined as 1−H(C|K)/H(C), where H(C) is the entropy of the class distribution, and H(C|K) is the conditional entropy of the class distribution given the cluster assignment. If the members of each cluster have the same class distribution as the total class distribution, then H(C|K)=H(C), and therefore homogeneity is 0. If each cluster contains elements of a single class, i.e., if the class of a network is fully determined by its cluster, then H(C|K)=0 and homogeneity is 1. If H(C)=0, then homogeneity is defined to be 1 by convention. Similarly, completeness is defined as 1−H(K|C)/H(K), and conventionally equal to 1 when H(K)=0. Note that homogeneity is analogous to precision, while completeness is analogous to recall, which are used in binary classification. Correspondingly, V-measure is analogous to the F1-score.
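These definitions translate directly into code. The sketch below computes the three scores from two parallel lists of labels using empirical entropies; the function names and the toy labels are assumptions for illustration, and the conventions for H(C)=0 and H(K)=0 follow the text above.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(X) of a list of labels (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """Empirical conditional entropy H(A | B) for two parallel label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    marginal_b = Counter(b)
    return -sum((c / n) * math.log(c / marginal_b[y]) for (x, y), c in joint.items())


def homogeneity_completeness_v(classes, clusters):
    """Return (homogeneity, completeness, V-measure)."""
    h_c, h_k = entropy(classes), entropy(clusters)
    hom = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    com = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


# Hypothetical ground truth with two data sets, and two candidate clusterings.
truth = ["ER", "ER", "BA", "BA"]
perfect = homogeneity_completeness_v(truth, [0, 0, 1, 1])
lumped = homogeneity_completeness_v(truth, [0, 0, 0, 0])
```

A clustering that recovers the data sets exactly scores 1 on all three metrics, while putting every graph into a single cluster is complete but not homogeneous, so its V-measure drops to 0.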
Interestingly, when clustering only the real networks, NBD300 performs more poorly than when using random graphs, while NBD ρ obtains the highest V-measure score across all our experiments (0.87). This indicates that choosing r=r_{0}, the number of eigenvalues with magnitude larger than \(\sqrt {\lambda _{1}}\), can be both more efficient and more informative than using a fixed constant value of, say, r=300.
We conclude from this experiment that (i) counting motifs locally (number of per-node occurrences) is better than counting them globally (number of occurrences in the whole graph), because GCD performs better than ESCAPE across data sets, (ii) comparing length distributions of paths or cycles seems to yield better performance than motif counting, because NBD and S perform better than others in general, and (iii) nonbacktracking eigenvalues provide slightly better performance at distinguishing between real and synthetic graphs than other combinatorial methods, as well as other spectral methods such as Lap. We highlight that NBD is in general also more efficient than the combinatorial methods, while being only slightly more time-consuming than the spectral method Lap; see next section.
Runtime comparison
In the “Computing NBD” section we included a complexity analysis of the runtime of NBD (Algorithm 1). Here, we include a direct comparison of the runtime of NBD and all other baseline algorithms. Observe that all five graph distance algorithms used have something in common: they all run in two steps. The first step is to compute a certain kind of summary statistic of each of the two graphs to be compared, and the second step is to compare those precomputed statistics. For NBD and Lap, the first step computes eigenvalues of the corresponding matrices, while GCD and ESCAPE compute motif counts, and S computes the shortest path distance between each pair of nodes in the graph. The bulk of the computational time of all methods is spent in the computation of these statistics, rather than in the comparison. Therefore, in this section we compare the runtime of this first step only. Another reason to focus on the first step is that it depends on a single graph (in all cases, the computation of the corresponding statistics is done on each graph in parallel), and thus we can investigate how the structure of a single graph affects computation time. All experiments were conducted on a computer with 16 cores and 100 GB of RAM with Intel Skylake chip architecture, with no other jobs running in the background. Note that all distance algorithms are implemented in Python, except for ESCAPE, which is implemented in C++. Thus we focus on comparing the four algorithms that are implemented in Python. For big-O complexity analyses we refer the interested reader to the corresponding references for each algorithm.
The left column of Fig. 9 shows evidence that the runtime scaling of NBD is very close to that of Lap, as claimed in the “Computing B” section, because the former uses the nonbacktracking matrix, which is always four times the size of the Laplacian matrix. However, their runtimes do not differ by only a constant factor in the cases of ER and HG. We hypothesize that this is because the nonbacktracking matrix has a more complicated structure than the Laplacian, depending on the structure of the underlying graph (see “Spectral properties of B” section and especially Lemma 3). In all other panels we see that NBD and Lap present similar scaling behavior, but NBD is usually slower. The middle column presents the effect of varying the number of nodes in the graph on all five distance methods. NBD is consistently the third fastest method for HG and BA graphs, after ESCAPE and Lap, while GCD is consistently the slowest, by two or three orders of magnitude in some cases. Note that the scaling behavior of all methods is fairly similar between BA and HG graphs, while for ER it is markedly different. When varying the number of nodes of ER graphs, for example, NBD and GCD seem to scale at the same rate and take about the same runtime, while S scales more poorly than all others. In all other panels, GCD seems to scale more poorly than other methods.
From the results presented in Fig. 9 we conclude that NBD is a viable alternative in the case of complex networks with heterogeneous degree distributions (modeled by BA and HG) when the application at hand requires comparison of counts of motifs or subgraphs such as nonbacktracking paths of arbitrary length.
Nearest neighbors
For the networks facebook, AstroPh, and twitter, we show the 10 networks closest to each, as determined by NBD ρ, and report the distance score in parentheses.
rank  facebook         AstroPh           twitter
1     epinions (4.08)  CondMat (11.33)   AstroPh (31.38)
2     HepPh (5.26)     epinions (13.72)  CondMat (40.11)
3     sdot9 (6.05)     facebook (14.18)  epinions (41.49)
4     sdot8 (6.24)     GrQc (14.71)      facebook (42.45)
5     wiki (6.56)      HepTh (14.84)     HepTh (42.60)
6     AS414 (8.98)     sdot9 (16.96)     HepPh (44.30)
7     AS505 (9.03)     HepPh (16.97)     GrQc (44.54)
8     AS512 (9.05)     sdot8 (17.37)     sdot9 (44.66)
9     AS421 (9.06)     AS407 (17.62)     wiki (45.64)
10    AS407 (9.16)     AS414 (17.68)     sdot8 (45.88)
From these observations we conclude that NBD is able to identify networks that are generated by similar mechanisms: all technological graphs – i.e., autonomous systems of the Internet (AS) and automatically generated peer-to-peer file sharing networks (P2P) – are perfectly clustered together, and all online social graphs are ranked near each other, even when some of these have vastly different elementary statistics such as number of nodes, number of edges, and average clustering coefficient.
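The ranking behind a nearest-neighbor listing of this kind can be sketched as follows, assuming a precomputed square distance matrix `D` whose rows and columns follow the order of a list of network names. The function name, the network names, and the distance values are placeholders invented for illustration and are not taken from the experiments.

```python
def nearest_networks(D, names, target, k=10):
    """Return the k networks closest to `target` as (name, distance) pairs."""
    t = names.index(target)
    ranked = sorted((D[t][j], names[j]) for j in range(len(names)) if j != t)
    return [(name, dist) for dist, name in ranked[:k]]


# Placeholder names and distances, for illustration only.
toy_names = ["net_a", "net_b", "net_c", "net_d"]
toy_dist = [[0.0, 2.5, 1.0, 7.0],
            [2.5, 0.0, 3.0, 6.0],
            [1.0, 3.0, 0.0, 8.0],
            [7.0, 6.0, 8.0, 0.0]]

closest_to_a = nearest_networks(toy_dist, toy_names, "net_a", k=2)
```

Because the target itself is excluded, the first entry of the result is the nearest distinct network, followed by the remaining ones in order of increasing distance.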
Rewiring edges
As expected, the NBD between a graph and its rewired versions grows steadily as the percentage of rewired edges increases. Note that AS331 and P2P09 seem to plateau when the percentage of rewired edges is 65% or larger, while facebook and HepPh have a slight jump close to 70%. Observe too that the variance of the NBD is kept fairly constant, which indicates that the distributions of nonbacktracking cycle lengths in all rewired versions of each graph are very similar to each other. This may be due to the fact that edge rewiring destroys short cycles more quickly than long cycles. Furthermore, observe that for high percentages of rewired edges, the rewired graph is essentially a configuration model graph with the same degree distribution as the original. Therefore, by comparing the NBD values from Figs. 10 and 11, we can conclude that the NBD between these four networks and a random graph with the same degree distribution is larger than the average distance to other networks in the same data set (cf. Fig. 10 and discussion in previous section). We conclude that NBD is detecting structural properties of the graph, stored in the distribution of nonbacktracking cycle lengths and encoded in the nonbacktracking eigenvalues, that are not solely determined by its degree distribution.
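The exact rewiring procedure is not spelled out in this passage. Since the rewired graphs are said to keep the degree distribution of the original, one scheme consistent with that description is the double edge swap, sketched below under that assumption: two edges a–b and c–d are replaced by a–d and c–b whenever this creates neither self-loops nor repeated edges. The function name `rewire`, its parameters, and the ring example are hypothetical.

```python
import random


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring by repeated double edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would create a repeated edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degree_sequence(edges):
    """Map each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Hypothetical example: a ring of 10 nodes, rewired with 5 swaps.
ring = [(i, (i + 1) % 10) for i in range(10)]
rewired_ring = rewire(ring, n_swaps=5, seed=1)
```

Every swap leaves the degree of each of the four nodes involved unchanged, which is why a heavily rewired graph behaves like a configuration model sample with the original degree sequence.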
NBED: Nonbacktracking embedding dimensions
In this section we discuss the potential of a second application of the eigendecomposition of the nonbacktracking matrix in the form of a new embedding technique. In this case, instead of the eigenvalues, we use the eigenvectors. We propose the Non-Backtracking Embedding Dimensions (or NBED for short) as an edge embedding technique that assigns to each undirected edge of the graph two points in \(\mathbb {R}^{d}\), one for each orientation, where d is the embedding dimension. We are able to find many interesting patterns that encode the structure of the network, which we then put to use for anomaly detection. While the visualizations found with NBED are certainly peculiar and rich in information, future work will prove essential in determining the full extent of possible applications of NBED.
where f is defined as \(f\left (v^{i}_{k \to l}\right) = \text {Re}\left (\lambda _{i}\right)\text {Re}\left (v^{i}_{k \to l}\right) - \text {Im}\left (\lambda _{i}\right)\text {Im}\left (v^{i}_{k \to l}\right)\) and Re, Im are the real and imaginary parts of a complex number, respectively. As mentioned previously, λ_{1}, the largest eigenvalue, is always guaranteed to be real and positive, and the entries of v^{1} are also real and positive. Thus, \(f\left (v^{1}_{k \to l}\right) = \lambda _{1} v^{1}_{k \to l}\), a positive rescaling of \(v^{1}_{k \to l}\), for every k→l. Whenever any λ_{i}, i=2,3,..., has nonzero imaginary part, the entries of v^{i} may also be complex numbers in general, and \(f\left (v^{i}_{k \to l}\right)\) is simply a linear combination of the real and imaginary parts.
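As a small illustration, the map f can be written directly on Python complex numbers. With the difference of products used here, f coincides with the real part of the product λ_{i}·v, which the sketch checks on a hypothetical eigenvalue and eigenvector entry; the function name and the sample values are assumptions.

```python
def f(lam, v):
    """Combine the real and imaginary parts of an eigenvector entry v,
    weighted by the real and imaginary parts of its eigenvalue lam."""
    return lam.real * v.real - lam.imag * v.imag


# Hypothetical eigenvalue and eigenvector entry.
lam_example = 2 + 1j
v_example = 3 - 4j

coordinate = f(lam_example, v_example)
real_part_of_product = (lam_example * v_example).real
```

Both quantities evaluate to the same real number, so each embedding coordinate is a single real value per directed edge even when the eigenpair is complex.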
In the following we use the 2-dimensional NBED of real and random graphs. The advantages of this 2-dimensional edge embedding are as follows. First, the first and second eigenvectors of the nonbacktracking matrix have been studied before and they can be interpreted in terms of edge centrality (Martin et al. 2014; Grindrod et al. 2018) and community detection (Krzakala et al. 2013; Bordenave et al. 2015; Kawamoto 2016), respectively. Second, NBED provides a deterministic graph embedding, as opposed to other methods that are stochastic due to their dependence on random sampling of subgraphs (Grover and Leskovec 2016; Perozzi et al. 2014) or due to stochastic optimization of some model (Wang et al. 2016; Cao et al. 2016). Third, NBED makes a distinction between the two possible orientations of an edge, thus providing more information about it. We conjecture that these properties make NBED more robust to the presence of noise in the graph when compared to embeddings generated by popular models such as deep networks, which are in general not interpretable, are stochastic, and usually work on nodes rather than directed edges. Further, the 2-dimensional version of NBED is particularly important because our experimental evidence shows that the second nonbacktracking eigenvalue of complex networks tends to be real and positive, which means that we can visualize these embeddings on the real plane without losing any information due to the linear combination f. We proceed to analyze the visual patterns we can identify in NBED visualizations; future research will be necessary for analytic characterizations of them.
Visualization
1. In all of the plots except for WS5k we see small sets of dots (red circles) that are markedly separated from the bulk of the dots. Every one of these sets is made up of the embeddings of the directed edges k→l for a fixed l, and the horizontal position of this set corresponds to the degree of l. That is, the larger the degree of a fixed node l, the more likely all the embeddings of edges of the form k→l are to cluster together separate from the rest, and the further right this cluster will be.
2. Two of these sets corresponding to the same graph seem to have similar structures. That is, the relative positions of the dots forming a cluster encircled in red are repeated for all such clusters inside each graph. For example, the sets of BA5k and KR5k are much more vertically spread than those of CM5k or HG5k. We conclude that each of these sets has a particular internal structure that correlates with global graph structure, as it is repeated across sets. In ER5k, the internal structure of each set is less well marked, owing to the fact that there are no global structural patterns in the ER model.
3. In BA5k, ER5k, HG5k, and KR5k, we find that the dots inhabit a certain region of the plane that is bounded by a roughly parabolic boundary (gray dashed lines). Furthermore, the cusp of this boundary always lies precisely at the origin. That is, there are never dots to the left of the origin, and the vertical spread of the dots grows at roughly the same rate both in the upper half-plane and the lower half-plane. Since the vertical axis corresponds to the entries of the second eigenvector v^{2}, which has been shown to describe community structure, we conjecture that the significance of this boundary is related to the existence of communities in the network. (See “Case Study: Enron emails” section for more discussion on this point.)
4. In Fig. 12, the color of the embedding of edge k→l corresponds to the degree of the source node k. In this way, we can use these plots to gather information about the degree distribution of the graph. For example, the plot for ER5k looks like a noisy point cloud because the degree distribution is Poisson centered around 〈k〉=15, and the plot for WS5k is very homogeneous in color because all nodes start with the same degree k=14 and only a few of them are rewired. Furthermore, if one compares the color distribution of the plots for BA5k, CM5k, and HG5k, one can see that even though all have heavy-tailed degree distributions, their exponents must be different. Indeed, for BA5k we have γ=3 while the other two were generated with γ=2.3.
5. From all previous considerations we can analyze the plot of KR5k as a case study. First, it presents small sets of dots that are markedly separate from the bulk, and whose structure is similar to those of BA5k. Second, its color distribution is closer to that of ER5k than to any of the others. Third, it exhibits the boundary effect that we see in both BA5k and ER5k. From these observations we conclude that the stochastic Kronecker graph model is a “mix” between the Barabási-Albert and Erdős-Rényi models, in the sense that its plot has elements from both of the plots for BA and ER. Consider this in light of Fig. 1, where the graphs of these three data sets are clustered together, even when using the largest r=200 nonbacktracking eigenvalues of each graph. This means that the eigendecomposition of the nonbacktracking matrix is detecting the similarities and differences between these three random graph models, which we can identify through a visual analysis of Figs. 1 and 12.
6. Lastly, we focus on the plot for WS5k in Fig. 12. This plot shows none of the characteristics of the others, and in fact some of its points clearly fall on a path determined by a continuous curve. We hypothesize that this is due to the strong symmetry of the ring lattice from which it is generated. Studying the properties of the NBED of highly symmetric graphs and their randomized versions is a task we leave for future work.
Case Study: Enron emails
In this section we use both NBD and NBED to perform an analysis of the well-known Enron email corpus (Klimt and Yang 2004). This data set comprises email communications of Enron corporation employees in the years 2000–2002. From it, we form two different data sets of networks. In all networks each node represents an email address, and an edge between two email addresses represents that at least one email was sent from one account to the other. We aggregate these networks on a weekly basis, as well as on a daily basis, and we proceed to apply our methods NBD and NBED to them.
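The aggregation step can be sketched as follows, assuming the corpus has already been parsed into (sender, recipient, timestamp) records. The record format, the function name `aggregate`, the use of ISO calendar weeks as the weekly bins, and the choice to drop messages sent to oneself are all assumptions made for illustration; the addresses are placeholders.

```python
from collections import defaultdict
from datetime import datetime


def aggregate(emails, period="week"):
    """Group emails into one undirected edge set per week or per day.

    Returns a dict mapping a period key to a set of edges, where each
    edge is a frozenset of two email addresses.
    """
    graphs = defaultdict(set)
    for sender, recipient, sent_at in emails:
        if sender == recipient:
            continue  # assumption: ignore self-addressed messages
        if period == "week":
            iso = sent_at.isocalendar()
            key = (iso[0], iso[1])  # (ISO year, ISO week number)
        else:
            key = sent_at.date()
        graphs[key].add(frozenset((sender, recipient)))
    return dict(graphs)


# Placeholder records: three emails in one week, one in the next.
toy_emails = [
    ("a@example.com", "b@example.com", datetime(2001, 1, 1, 9, 0)),
    ("b@example.com", "a@example.com", datetime(2001, 1, 3, 14, 30)),
    ("a@example.com", "c@example.com", datetime(2001, 1, 4, 8, 15)),
    ("c@example.com", "d@example.com", datetime(2001, 1, 8, 11, 45)),
]

weekly_graphs = aggregate(toy_emails, period="week")
daily_graphs = aggregate(toy_emails, period="day")
```

Because an edge only records that at least one email was exchanged, the two messages between the first pair of addresses collapse into a single edge in the weekly network.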
We have presented a visual analysis of the Enron corpus using NBD and NBED. Using these techniques, we are able to recover properties not only of the underlying network, but of the underlying data set as well, such as periodicity and temporal anomalies. Even though the analytical details of NBED require further consideration, we are still able to interpret the visualizations of Fig. 14 to mine important information about the underlying data set.
Discussion and conclusions
We have focused on the problem of deriving a notion of graph distance for complex networks based on the length spectrum function. We add to the repertoire of distance methods the Non-Backtracking Spectral Distance (NBD): a principled, interpretable, computationally efficient, and effective technique that takes advantage of the fact that one can interpret the nonbacktracking cycles of a graph as its free homotopy classes. NBD is principled because it is backed by the theory of the length spectrum, which characterizes the 2-core of a graph up to isomorphism. It is interpretable because we can study its behavior in the presence of structural features such as hubs and triangles, and we can use the resulting geometric features of the eigenvalue distribution to our advantage. It is efficient relative to other similar methods that depend on the combinatorial enumeration of different kinds of subgraphs. Lastly, we have presented extensive experimental evidence to show that it is effective at discriminating between complex networks in various contexts, including visualization, clustering, and anomaly detection. The performance of NBD is better than or comparable to that of other distance methods such as ESCAPE, S, GCD, and Lap; see “NBD: Experiments” section. We chose to compare against these methods because the first three depend on motif counts in one way or another, as NBD depends on nonbacktracking cycle counts, and Lap depends on the spectral decomposition of a matrix representation of a graph, as NBD depends on the nonbacktracking eigenvalues.
Motivated by the usefulness of NBD due to the connections with the homotopy of graphs and the spectrum of the nonbacktracking matrix, we also presented a new embedding technique, NonBacktracking Embedding Dimensions (or NBED for short) which provides a rich visualization full of interpretable patterns that describe the structural properties of a network. We have provided examples of these patterns as well as their application to anomaly detection. Further research will reveal the full potential of applications of NBED.
An implementation of NBD, NBED, and our algorithm for computing the nonbacktracking matrix is available at (Torres 2018).
Limitations. NBD relies on the assumption that the nonbacktracking cycles contain enough information about the network. Accordingly, the usefulness of the NBD will decay as the 2-core of the graph gets smaller. For example, trees have an empty 2-core, and all of their nonbacktracking eigenvalues are equal to zero. In order to compare trees, and more generally, those nodes outside the 2-core of the graph, the authors of (Durfee and Martin 2015) propose several different strategies, for example adding a “cone node” that connects to every other node in the graph. However, many real-world networks are not trees and we extensively showcased the utility of NBD on this class of networks.
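The cone-node strategy mentioned above is easy to illustrate. The sketch below only shows the graph transformation, under the assumption that the graph is stored as a dictionary from nodes to neighbor sets; it is not taken from Durfee and Martin (2015), and the function name and the default label `"cone"` are hypothetical.

```python
def add_cone_node(adj, cone="cone"):
    """Return a copy of the graph with one extra node joined to every node."""
    if cone in adj:
        raise ValueError("cone label already used by an existing node")
    coned = {u: set(vs) | {cone} for u, vs in adj.items()}
    coned[cone] = set(adj)
    return coned


# Hypothetical example: a path on three nodes, which is a tree.
small_tree = {0: {1}, 1: {0, 2}, 2: {1}}
coned_tree = add_cone_node(small_tree)
min_degree_after = min(len(vs) for vs in coned_tree.values())
```

After the transformation every node of a connected graph with at least two nodes has degree at least 2, so no node is removed by shaving and the nonbacktracking spectrum is no longer trivial.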
The greatest limitation of NBED is that we are not able to provide rigorous derivations for the patterns we identify in “Visualization” section. Without a formal theory of the relationship between the eigenvectors of the nonbacktracking matrix and the structural properties of graphs it is difficult to design algorithms that make use of these patterns automatically. Regardless, we have presented evidence for the usefulness of these patterns even with a visual analysis.
Future work. There are many other avenues to explore in relation to how to exploit the information stored in the length spectrum and the fundamental group of a graph. As mentioned in Sec. 4, the major downside of the relaxed length spectrum \(\mathcal {L}'\) is the fact that we lose information stored in the combinatorics of the fundamental group. That is, \(\mathcal {L}'\) stores information about the frequency of lengths of free homotopy classes, but no information on their concatenation, i.e., the group operation in π_{1}(G). One way to encapsulate this information is by taking into account not only the frequency of each possible length of nonbacktracking cycles, but also the number of nonbacktracking cycles of fixed lengths ℓ_{1} and ℓ_{2} that can be concatenated to form a nonbacktracking cycle of length ℓ_{3}. It remains an open question whether this information can be computed using the nonbacktracking matrix for all values of the parameters ℓ_{1}, ℓ_{2}, ℓ_{3}, and if so, how to do it efficiently. One alternative is to rely upon efficient motif counting (Pinar et al. 2017; Kolda et al. 2013).
A different research direction is to focus on the nonbacktracking eigenvalues themselves, independently of the length spectrum theory. One standing question is to characterize the behavior of the eigenvalues after the network has been rewired. “Rewiring edges” section only scratches the surface of what can be said in this regard. However, spectral analysis of the nonbacktracking matrix is exceedingly difficult due to the fact that it is asymmetric and nonnormal and therefore most of the usual tools for spectral analysis are not applicable.
In this work, we have focused on introducing and exploiting novel theoretical concepts such as the length spectrum and the fundamental group to the study of complex networks. We are confident this work will pave the road for more research in topological and geometric data analysis in network science.
Appendix 1: UMAP parameter settings
In order to understand the visualizations of the nonbacktracking eigenvalues in Fig. 1 we will now explain some of the features of the UMAP algorithm. For full details we refer the interested reader to (McInnes et al. 2018).
UMAP stands for Uniform Manifold Approximation and Projection, authored by Leland McInnes, John Healy, and James Melville. Data are first represented in a high dimensional Euclidean space using Laplacian Eigenmaps (Belkin and Niyogi 2003), then they are approximated uniformly using fuzzy simplicial sets and patched together to form a manifold, and finally this manifold is projected to \(\mathbb {R}^{2}\). This process reveals topological features of the data and provides flexibility for geometrical specifications to distinguish task-specific quantities of interest.

n_neighbors: A weighted k-nearest neighbor graph is constructed from the initial data. The number of neighbors is set with this parameter. We used a value of 75.

metric: Different metrics in the high dimensional space where the data are embedded can be specified. We found good overall results with the Canberra metric (shown in Fig. 1). The Chebyshev and Euclidean metrics cluster ER graphs together, for some specific values of the other parameters. However, they make the HG and CM clusters overlap. A clearly separated projection for the HG and CM graphs can be found using the Correlation metric.

n_epochs: Training optimization epochs. More epochs can provide better results, with the usual computational drawbacks. We use 1000 epochs.

min_dist: Alters the minimum distance between embedded points. Smaller values increase clustering, larger values present a more uniformly distributed visualization. We use a distance of 0.01.

repulsion_strength: Values above 1 increase the weight of negative samples, i.e., the importance of the distances between far-apart data points. We use a value of 10.

negative_sample_rate: Sets the number of negative samples to be selected per each positive sample in the optimization process. We use 50 negative samples per positive sample.
We were not able to find a specific set of parameters that improved upon the visualization shown in Fig. 1, though it is perhaps the interplay between min_dist and repulsion_strength that causes ER graphs to split into two different clusters.
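For reference, the settings listed above can be collected in a single configuration, sketched here for the `umap-learn` Python package. The dictionary mirrors the values reported in this appendix; `n_components=2` encodes the projection to the plane. The helper `project` and the shape of its `features` argument (one row of numerical features per graph) are assumptions, since the construction of the input vectors is not described in this appendix.

```python
# Parameter values as reported in this appendix.
UMAP_PARAMS = {
    "n_neighbors": 75,
    "metric": "canberra",
    "n_epochs": 1000,
    "min_dist": 0.01,
    "repulsion_strength": 10,
    "negative_sample_rate": 50,
    "n_components": 2,  # project to the plane
}


def project(features):
    """Project one feature row per graph to 2D with the settings above.

    Requires the third-party umap-learn package, imported lazily so that
    the configuration itself can be inspected without it.
    """
    import umap

    return umap.UMAP(**UMAP_PARAMS).fit_transform(features)
```

Swapping the value of `"metric"` for `"chebyshev"`, `"euclidean"`, or `"correlation"` gives the alternative projections discussed under the metric parameter.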
Appendix 2: Asymmetric NBED embeddings
As mentioned in “Case Study: Enron emails” section, the NBED of Enron graphs present a feature that we do not see in any of the random graphs we handled for this work. Namely, the NBED of Enron graphs are asymmetric with respect to the horizontal axis (Fig. 14), whereas the NBED of the random graphs (Fig. 12) are generally symmetric. In “Case Study: Enron emails” section we hypothesized that this is due to the fact that Enron graphs contain communities. Here, we present some evidence for this hypothesis.
Endnotes
^{1} The definition presented here is also known as the marked length spectrum. An alternative definition of the (unmarked) length spectrum does not depend on π_{1}; see, for example, Leininger et al. (2007).
^{2} This follows from G being homotopy equivalent to a bouquet of k circles, where k is the rank of the fundamental group of G. The universal covering of a bouquet of circles is contractible, which is equivalent to the space being aspherical; see Hatcher (2017).
^{3} Other authors reserve the use of the term cycle for special cases of closed paths such as the set of simple cycles, which are closed paths that do not intersect themselves. In this work we use cycle and closed path interchangeably.
^{4} In (Constantine and Lafont 2018), the authors need an isomorphism between the fundamental groups of the spaces that are being compared, which is also computationally prohibitive.
^{5} Here we choose both EMD and r=r_{0} based on the experimental evidence of Fig. 5. An early version of NBD used the Euclidean distance and a value of r set to a constant (Torres et al. 2018). Following that version, other authors have independently proposed using EMD (Mellor and Grusovin 2018).
^{6} https://www.ipam.ucla.edu/programs/longprograms/cultureanalytics/
Notes
Acknowledgements
We thank Evimaria Terzi for her contributions to an earlier version of this work. Torres and Eliassi-Rad were supported by NSF CNS-1314603 and NSF IIS-1741197. Suárez-Serrato was supported by UC-MEXUS (University of California Institute for Mexico and the United States) CN1643, DGAPA-UNAM PAPIIT IN102716, and the DGAPA-UNAM PASPA program. Part of this research was performed while Suárez-Serrato was visiting the Institute for Pure and Applied Mathematics (IPAM), which is supported by NSF DMS-1440415. Eliassi-Rad and Suárez-Serrato met during the IPAM Long Program on Culture Analytics in 2016,^{7} of which Eliassi-Rad was a co-organizer.
Authors’ contributions
This manuscript would not exist without LT; he was the lead contributor in all aspects of the manuscript. PSS introduced the mathematical concept of length spectra and topological data analysis to LT and TER. LT and PSS were the major contributors to the topological graph analysis and graph distance in the manuscript. LT and TER were the major contributors to the graph embedding and experimental design in the manuscript. LT developed the code and conducted all the experiments. LT was the lead contributor in writing the manuscript. All authors read and approved the final manuscript.
Funding
Torres and Eliassi-Rad were funded by the National Science Foundation grants NSF CNS-1314603 and NSF IIS-1741197. Suárez-Serrato was supported by University of California Institute for Mexico and the United States CN1643 and Universidad Nacional Autónoma de México DGAPA-UNAM PAPIIT IN102716, DGAPA-UNAM PASPA.
Competing interests
The authors declare that they have no competing interests.
References
Aldecoa, R, Orsini C, Krioukov D (2015) Hyperbolic graph generator. Comput Phys Commun 196:492–6.
Angel, O, Friedman J, Hoory S (2015) The nonbacktracking spectrum of the universal cover of a graph. Trans Amer Math Soc 367(6):4287–318.
Bagrow, JP, Bollt EM (2018) An information-theoretic, all-scales approach to comparing networks. Preprint, arXiv:1804.03665 [cs.SI].
Barabási, AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–12.
Bass, H (1992) The Ihara-Selberg zeta function of a tree lattice. Internat J Math 3(6):717–97.
Batagelj, V, Zaversnik M (2011) Fast algorithms for determining (generalized) core groups in social networks. Adv Data Anal Classif 5(2):129–45.
Belkin, M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–96.
Bento, J, Ioannidis S (2018) A family of tractable graph distances. In: Proceedings of the 2018 SIAM International Conference on Data Mining (SDM), 333–41. Society for Industrial and Applied Mathematics, San Diego, CA.
Berlingerio, M, Koutra D, Eliassi-Rad T, Faloutsos C (2013) Network similarity via multiple social theories. In: Advances in Social Networks Analysis and Mining (ASONAM), 1439–40. ACM, Niagara, ON.
Bollobás, B (2001) Random Graphs, 2nd edn. In: Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge; New York.
Bordenave, C, Lelarge M, Massoulié L (2015) Nonbacktracking spectrum of random graphs: community detection and nonregular Ramanujan graphs. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS) 2015, 1347–57. IEEE.
Cao, S, Lu W, Xu Q (2016) Deep neural networks for learning graph representations. In: Schuurmans D, Wellman MP (eds) Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12–17, 2016, 1145–52. AAAI, Phoenix.
Chowdhury, S, Mémoli F (2017) Distances and isomorphism between networks and the stability of network invariants. Preprint, arXiv:1708.04727 [cs.DM].
Chowdhury, S, Mémoli F (2018) The metric space of networks. Preprint, arXiv:1804.02820 [cs.DM].
Clauset, A, Tucker E, Sainz M. The Colorado Index of Complex Networks. https://icon.colorado.edu/. Accessed 19 June 2018.
Constantine, D, Lafont JF (2018) Marked length rigidity for one-dimensional spaces. J Topol Anal. https://doi.org/10.1142/S1793525319500250.
Cooper, Y (2009) Properties determined by the Ihara zeta function of a graph. Electron J Combin 16(1):14–84.
Durfee, C, Martin K (2015) Distinguishing graphs with zeta functions and generalized spectra. Linear Algebra Appl 481:54–82.
Erdös, P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17.
Estrada, E (1996) Spectral moments of the edge adjacency matrix in molecular graphs, 1. Definition and applications to the prediction of physical properties of alkanes. J Chem Inf Comp Sci 36(4):844–9.
Goyal, P, Ferrara E (2018) Graph embedding techniques, applications, and performance: A survey. Knowl-Based Syst 151:78–94.
Grindrod, P, Higham DJ, Noferini V (2018) The deformed graph Laplacian and its applications to network centrality analysis. SIAM J Matrix Anal Appl 39(1):310–41.
Grover, A, Leskovec J (2016) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In: Krishnapuram B, Shah M, Smola AJ, Aggarwal CC, Shen D, Rastogi R (eds), 855–64. ACM.
Gulikers, L, Lelarge M, Massoulié L (2017) Nonbacktracking spectrum of degree-corrected stochastic block models. In: 8th Innovations in Theoretical Computer Science (ITCS), 44:1–44:27. ITCS 2017, Berkeley, CA.
Hamilton, WL, Ying R, Leskovec J (2017) Representation learning on graphs: Methods and applications. IEEE Data Eng Bull 40(3):52–74.
Hashimoto, K (1989) Zeta functions of finite graphs and representations of p-adic groups. In: Automorphic Forms and Geometry of Arithmetic Varieties, 211–80.
Hatcher, A (2017) Algebraic Topology. Cambridge University Press, Cambridge; New York.
Jiang, F, He L, Zheng Y, Zhu E, Xu J, Yu PS (2018) On spectral graph embedding: A nonbacktracking perspective and graph approximation. In: Proceedings of the 2018 SIAM International Conference on Data Mining (SDM), 324–32. Society for Industrial and Applied Mathematics, San Diego, CA.
Kawamoto, T (2016) Localized eigenvectors of the nonbacktracking matrix. J Stat Mech Theory Exp 2:12.023404.
Klimt, B, Yang Y (2004) The Enron corpus: A new dataset for email classification research. In: European Conference on Machine Learning, 217–226. Springer, Berlin, Heidelberg.
Kolda, TG, Pinar A, Seshadhri C (2013) Triadic measures on graphs: The power of wedge sampling. In: Proceedings of the 13th SIAM International Conference on Data Mining (ICDM), 10–8. Society for Industrial and Applied Mathematics, Austin.
Koutra, D, Shah N, Vogelstein JT, Gallagher B, Faloutsos C (2016) DeltaCon: Principled massive-graph similarity function with attribution. TKDD 10(3):28:1–28:43.
Krioukov, D, Papadopoulos F, Kitsak M, Vahdat A, Boguñá M (2010) Hyperbolic geometry of complex networks. Phys Rev E 82:036106.
Krzakala, F, Moore C, Mossel E, Neeman J, Sly A, Zdeborová L, Zhang P (2013) Spectral redemption in clustering sparse networks. Proc Natl Acad Sci USA 110(52):20935–40.
Kunegis, J (2013) KONECT: The Koblenz network collection. In: 22nd International World Wide Web Conference, (WWW), 1343–50. ACM, Rio de Janeiro, Brazil.
Lang, S (2004) Linear Algebra, 3rd edn. Springer, New York.
Leininger, CJ, McReynolds DB, Neumann WD, Reid AW (2007) Length and eigenvalue equivalence. Int Math Res Not IMRN 2007(24):135.
Leskovec, J, Chakrabarti D, Kleinberg JM, Faloutsos C, Ghahramani Z (2010) Kronecker graphs: An approach to modeling networks. J Mach Learn Res 11:985–1042.
Leskovec, J, Huttenlocher DP, Kleinberg JM (2010) Proceedings of the 28th International Conference on Human Factors in Computing Systems. In: Mynatt ED, Schoner D, Fitzpatrick G, Hudson SE, Edwards WK, Rodden T (eds), 1361–70. CHI 2010, Atlanta, Georgia. April 10–15, 2010.
Leskovec, J, Kleinberg JM, Faloutsos C (2005) Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. In: Grossman R, Bayardo RJ, Bennett KP (eds), 177–87. ACM, Chicago, Illinois. August 21–24, 2005.
Leskovec, J, Kleinberg JM, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. TKDD 1(1):2.
Leskovec, J, Krevl A (2014) SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. Accessed 9 Feb 2019.
Leskovec, J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123.
Marks, R (2008) Enron Timeline. http://www.agsm.edu.au/bobm/teaching/BE/Enron/timeline.html. Accessed 2018-06-06.
Martin, T, Zhang X, Newman MEJ (2014) Localization and centrality in networks. Phys Rev E 90:052808.
McAuley, JJ, Leskovec J (2012) Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held December 3–6, 2012. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds), 548–56. Neural Information Processing Systems Foundation, Lake Tahoe, Nevada.
McInnes, L, Healy J, Melville J (2018) UMAP: Uniform manifold approximation and projection for dimension reduction. Preprint, arXiv:1802.03426.
Mellor, A, Grusovin A (2018) Graph comparison via the nonbacktracking spectrum. Preprint, arXiv:1812.05457.
Munkres, JR (2000) Topology, 2nd edn. Prentice Hall, Englewood Cliffs, NJ.
Newman, MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256.
Onnela, JP, Fenn DJ, Reid S, Porter MA, Mucha PJ, Fricker MD, Jones NS (2012) Taxonomies of networks from community structure. Phys Rev E 86:036104.
Perozzi, B, Al-Rfou R, Skiena S (2014) The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (KDD). In: Macskassy SA, Perlich C, Leskovec J, Wang W, Ghani R (eds), 701–10. ACM.
Pinar, A, Seshadhri C, Vishal V (2017) ESCAPE: efficiently counting all 5-vertex subgraphs. In: Proceedings of the 26th International Conference on World Wide Web (WWW) 2017, 1431–40. ACM, Perth. April 3–7, 2017.
Preciado, VM, Jadbabaie A, Verghese GC (2013) Structural analysis of Laplacian spectral properties of large-scale networks. IEEE Trans Automat Contr 58(9):2338–43.
Ren, P, Wilson RC, Hancock ER (2011) Graph characterization via Ihara coefficients. IEEE T Neural Networ 22(2):233–45.
Richardson, M, Agrawal R, Domingos PM (2003) Trust management for the semantic web. In: Fensel D, Sycara KP, Mylopoulos J (eds) The Semantic Web - ISWC 2003, Second International Semantic Web Conference, Sanibel Island, FL, USA, October 20–23, 2003, Proceedings. Lecture Notes in Computer Science, 351–68. Springer, Berlin, Heidelberg.
Rosenberg, A, Hirschberg J (2007) Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), June 28–30, 2007. In: Eisner J (ed), 410–20. Association for Computational Linguistics, Prague.
Rubner, Y, Tomasi C, Guibas LJ (1998) A metric for distributions with applications to image databases. In: ICCV, 59–66. IEEE.
Saade, A, Krzakala F, Zdeborová L (2014) Spectral density of the nonbacktracking operator on random graphs. EPL (Europhys Lett) 107(5):50005.
Schieber, TA, Carpi L, Díaz-Guilera A, Pardalos PM, Masoller C, Ravetti MG (2017) Quantification of network structural dissimilarities. Nat Commun 8:13928.
Seshadhri, C, Pinar A, Kolda TG (2013) An in-depth analysis of stochastic Kronecker graphs. J ACM 60(2):13:1–13:32.
Soundarajan, S, Eliassi-Rad T, Gallagher B (2014) A guide to selecting a network similarity method. In: Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), 1037–45. Society for Industrial and Applied Mathematics, Philadelphia, PA.
Terras, A (2011) Zeta Functions of Graphs: A Stroll Through the Garden. In: Cambridge Studies in Advanced Mathematics, 239. Cambridge University Press, Cambridge; New York.
The Guardian (2006) Timeline: Enron. https://www.theguardian.com/business/2006/jan/30/corporatefraud.enron. Accessed 2018-06-06.
The New York Times (2006) Timeline: A chronology of Enron Corp. https://www.nytimes.com/2006/01/18/business/worldbusiness/timelineachronologyofenroncorp.html. Accessed 2018-06-06.
Torres, L (2018) SuNBEaM: Spectral Non-Backtracking Embedding And pseudo-Metric. GitHub. https://github.com/leotrs/sunbeam. Accessed 5 Mar 2019.
Torres, L, Suarez-Serrato P, Eliassi-Rad T (2018) Graph distance from the topological view of nonbacktracking cycles. Preprint, arXiv:1807.09592.
von Luxburg, U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416.
Wang, D, Cui P, Zhu W (2016) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In: Krishnapuram B, Shah M, Smola AJ, Aggarwal CC, Shen D, Rastogi R (eds), 1225–34. ACM.
Watts, DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440.
Wood, PM, Wang K (2017) Limiting empirical spectral distribution for the nonbacktracking matrix of an Erdös-Rényi random graph. Preprint, arXiv:1710.11015 [math.PR].
Yaveroğlu, ÖN, Malod-Dognin N, Davis D, Levnajic Z, Janjic V, Karapandza R, Stojmirovic A, Pržulj N (2014) Revealing the hidden language of complex networks. Sci Rep 4:4547.
Yaveroğlu, ÖN, Milenković T, Pržulj N (2015) Proper evaluation of alignment-free network comparison methods. Bioinformatics 31(16):2697–704.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.