Networks and Cycles: A Persistent Homology Approach to Complex Networks
Persistent homology is an emerging tool to identify robust topological features underlying the structure of high-dimensional data and complex dynamical systems (such as brain dynamics, molecular folding, distributed sensing).
Its central device, the filtration, embodies this by casting the analysis of the system in terms of long-lived (persistent) topological properties under the change of a scale parameter.
In the classical case of data clouds in high-dimensional metric spaces, such filtration is uniquely defined by the metric structure of the point space. On networks instead, multiple ways exists to associate a filtration. Far from being a limit, this allows to tailor the construction to the specific analysis, providing multiple perspectives on the same system.
In this work, we introduce and discuss three kinds of network filtrations, based respectively on the intrinsic network metric structure, the hierarchical structure of its cliques and—for weighted networks—the topological properties of the link weights. We show that persistent homology is robust against different choices of network metrics. Moreover, the clique complex on its own turns out to contain little information content about the underlying network. For weighted networks we propose a filtration method based on a progressive thresholding on the link weights, showing that it uncovers a richer structure than the metrical and clique complex approaches.
KeywordsComplex networks Persistent homology Metrics Computational topology
Over the last decade complex networks have become one of the prominent tools in the study of social, technological and biological systems. By virtue of their sheer sizes and complex interactions, they cannot be meaningfully described and controlled through classical reductionist approaches.
Within this framework, the study of the topology of complex networks, and its implications for dynamical processes on them, has most often focused on the statistical properties of nodes and edges and therefore found a natural and effective description in terms of statistical mechanical models of graph ensembles [1, 2]. These models rely for their formulations on local interactions and become quickly hard to manage when higher correlations are included or one-step approximations are not sufficient, as Schaub et al.  pointed out for the case of community detection algorithms for example.
The last few years saw a new perspective emerge that focuses on the very geometry of complex network. It was promoted by a large availability of new (typically geosocial) data coming from spatial networks , but also by analytical and numerical results on the relations between geometrical properties and global features of complex networks, e.g. the hyperbolic embedding of the Internet with the resulting increased efficiency of greedy routing algorithms , stationarity conditions for chemical networks  and brain cortex dynamics .
In this work, we take on this perspective and study the geometrical properties of networks through the goggles of persistent homology, a technique originally introduced by [8, 9] to uncover robust topological information from noisy high-dimensional point clouds. Persistent homology works by extracting from a dataset a growing sequence of simplicial complexes (called filtration), indexed by a parameter ϵ, and studying the associated homology groups, which encode the geometrical information (for example, the holes of an n-torus). The robustness of each topological feature is then obtained from the persistence of the corresponding generator along the filtration,
For example, in the case of the torus, there will be two persistent generators associated to the two non-equivalent loops on its surface.
Persistent homology has received some attention in the context of networks , but there has been no systematic study on its efficiency and sensibility for networks yet. This is of particular importance since, in contrast with the unique natural metric available for point cloud datasets, networks allow various rules to generate the filtration.
Our results will show that the salient features of the homology do not change significantly under different metrics and that there exist a metric scale ϵ c at which the filtration displays the richest structure.
We will then study a second method to create the filtration, relying only on the network clique structure. Unfortunately, this will turn out to yield little additional information.
In the case of weighted networks it is possible to devise a refined filtration based on the clique structure of the network thresholded by ϵ, which yield a much richer picture than the simple clique complex method.
The rest of this work is organised as follows. In the next section a minimal introduction to homology and its persistent sister is given. The following section will present selected results of simulations and datasets under different choices of metrics for the network filtration.
We conclude then presenting the procedure for the filtration built with the link-thresholded clique structure and briefly discuss the results and implications for future research. In particular, we have discuss the possibility of expanding the method by considering multi-filtrations, that is filtrations indexed by more than one parameter.
Formally, homology is an algebraic invariant converting local geometric information of a space into a global descriptor. There are many homology theories, but simplicial homology is the most amenable for computational purposes thanks to its combinatorial structure.
This kind of homology is applied to simplicial complexes, that are combinations of vertices, segments, triangles and higher dimensional analogues, joined according to specific compatibility relations. As we will see in the following, simplicial complexes can be constructed from discrete spaces or networks. Low dimensional homology groups have an intuitive interpretation. Given a simplicial complex X, H 0(X) is the free group generated by the connected components of X, H 1(X) is the free group generated by the cycles in X, H 2(X) is the free group generated by voids—holes bounded by two-dimensional faces. The Betti numbers count the number of generators of such homology groups.
The standard tool to encode this information is the so-called barcode, which is a collection of intervals representing the lifespans of such generators. Long-lived topological features can be distinguished in this way from short-lived ones, which can be considered as topological noise. There are various ways of building persistence modules out of a given dataset. The most known are the Rips-Vietoris complex, the Cech complex and the clique complex . The first two require a metric space for the data and are generated by inflating spheres of the same radius around points (or nodes in a network) and associating set of points to simplices according to the overlap of the corresponding spheres. They can also be used to create a filtration out of general network, once a metrical structure is given on the network itself (shortest-path, commute time distance, etc). Besides these two methods, there exist a few methods pertaining to networks only , the best known being the clique complex, which is generated by associating to each maximal clique the simplex generated by the vertices of the clique.
15.3 Robustness Against Metric Change
Network metrics have been well studied, especially in the context of clustering algorithms  and Markov Chain models . In addition to the shortest path and commute time metrics, it is possible to define kernel matrices as functions of the network’s adjacency and Laplacian matrices. From such kernels one obtains a well-defined distance, which effectively turns the network into a metric space.
We analysed the metrics associated to: the shortest paths, the commute time between nodes, exponential diffusion  and exponential Laplacian diffusion [11, 14], which emerge as solutions of diffusion processes on the corresponding network, the von Neumann kernel ,which generalises the hub-authority measures, Markov diffusion  and random walks with restart.
For each metric, the filtration was generated and the persistent homology calculated. The analysis was repeated on a range of different networks, spanning different network topologies, sizes and origins (biological, social, technological).
In this respect, our results suggest the existence of a particular value ϵ c , a metric scale, at which one observes the most structure in the metrical representation of the network under study. The same behaviour was found in a number of other networks, ranging from the US air passenger network to the human gene regulatory one. Note moreover that, in general, ϵ c is different from the average distance between the nodes (in terms of the chosen metric) and therefore cannot be explained as a mere effect of the distances distribution. Moreover, if the appearance of non trivial higher homology groups was only due to the merging of small connected components into a giant component, one would expect to observe the same phenomenon also for the merging of smaller components. However, we did not see any of these signatures, supporting the existence of a characteristic scale ϵ c .
15.4 Clique Complex and Link Weights Thresholding
Finally, we can conclude that this method is able to identify mesoscopic and long-range structures which are present in networks, but would otherwise pass undetected with standard methods, and assigns to them also a measure of robustness in the form of the persistence intervals.
The authors acknowledge Mario Rasetti for insightful discussions and constant support.
- 13.Kondor R, Lafferty J (2002) Diffusion kernels on graphs and other discrete input spaces. In: Proceedings of the nineteenth international conference on machine learning (ICML’02), pp 315–322 Google Scholar
- 15.Kandola J, Shawe-Taylor J, Cristianini N (2002) Learning semantic similarity. Advances in neural information processing systems 15:657–666 Google Scholar