1 Introduction

A common theme in Geometry and in many of its applications (in some cases still in a non-deliberate manner) is the embedding, with least distortion, of a given structure or space into a familiar, larger ambient space. For instance, in classical Geometry and Topology this ambient space is usually \({\mathbb {R}}^n\), for some n large enough, whereas any metric space is isometrically embeddable in \(l^\infty \). Also, more recently, embeddings of networks into the Hyperbolic Plane \({\mathbb {H}}^2\) or Space \({\mathbb {H}}^3\) have become quite common (see, e.g. [1] and the references therein).

People have studied and made appeal to the coarse geometry of networks, constantly and for a long time now, whether they do it consciously (rarely) or not (in most cases). We shall demonstrate this shortly; however, to make this even clearer, let us specify what we, informally, mean by “Coarse Geometry”: the study of geometric (topological) properties without “looking at” the small scales. In other words, one does not distinguish between objects that, viewed from sufficiently far away, look the same. Perhaps the simplest and most immediate example of such a large-scale geometry (or, in John Lott’s suggestive words, “Mr. Magoo geometry”) is given by the integer grid in the Euclidean plane. (For more insights and technical definitions and results, see, e.g. [2,3,4], as well as the Appendix.)

The Complex Networks community has been exposed to this approach—and by now makes use of it—via the notion of Gromov hyperbolicity, itself stemming from Gromov’s seminal work on hyperbolic groups [5]. This—perhaps somewhat theoretical in the eyes of many practitioners—notion is connected to the notion of network backbone, underlying the long-distance relations between major network regions. Indeed, the connection between negative curvature and these “communication highways” in networks has been emphasized in [6].

It is therefore most natural to ask not only whether a coarse embedding of a weighted graph (viewed as a metric measure space) exists, but also whether there exists an “automatic” procedure to achieve such an embedding. The present paper shows that the answer to both these questions is positive.

The remainder of the paper is structured as follows: in Sect. 2, we recall the definitions of Forman’s Bochner–Laplacians and curvature functions on which our embedding kernels rest; in Sect. 3, we discuss coarse embeddings and their use in support-vector machines (SVMs), while in Sect. 4, we apply our kernels to network visualization, concluding in Sect. 5 with an overview of the paper and an outlook. For the benefit of the readers we have also added an appendix containing a very brief overview of coarse spaces.

2 The Forman–Laplacian and curvature functions

We give below the general formulas for the Forman–Laplacian and the curvature functions. For more background on CW complexes we refer the reader to Forman’s paper [8], as well as to [7].

Let M be a (positively) weighted quasiconvex regular CW complex, let \(\alpha = \alpha ^p \in M\) be a p-dimensional cell and let \(w(\alpha )\) denote its weight. While general weights are possible, making the combinatorial Ricci curvature extremely versatile, it suffices (cf. [8], Theorems 2.5 and 3.9) to restrict oneself to the so-called standard weights:

Definition 1

The set of weights \(\{w_\alpha \}\) is called a standard set of weights iff there exist \(w_1, w_2 > 0\) such that given a p-cell \(\alpha ^p\), the following holds:

$$\begin{aligned} w(\alpha ^p) = w_1\cdot w_2^p\,. \end{aligned}$$

(Note that the combinatorial weights \(w_\alpha \equiv 1\) represent a set of standard weights, with \(w_1 = w_2 = 1\).)

Using standard weights, we obtain the following formula for the Forman–Laplacian:

$$\begin{aligned} \Box _p(\alpha _1^p,\alpha _2^p) = \sum _{\begin{array}{c} \beta ^{p+1}>\alpha _1^p \\ \beta ^{p+1}>\alpha _2^p \end{array}}\epsilon _{\alpha _1\beta }\epsilon _{\alpha _2\beta }\frac{\sqrt{\omega (\alpha _1^p)\omega (\alpha _2^p)}}{\omega (\beta ^{p+1})} + \sum _{\begin{array}{c} \gamma ^{p-1}<\alpha _1^p \\ \gamma ^{p-1}<\alpha _2^p \end{array}}\epsilon _{\gamma \alpha _1}\epsilon _{\gamma \alpha _2}\frac{\omega (\gamma ^{p-1})}{\sqrt{\omega (\alpha _1^p)\omega (\alpha _2^p)}}\,, \end{aligned}$$
(1)

where, for instance, \(\alpha < \beta \) denotes that \(\alpha \) is a face of \(\beta \), and \(\epsilon _{\alpha _1\beta }, \epsilon _{\alpha _2\beta }, \epsilon _{\gamma \alpha _1}, \epsilon _{\gamma \alpha _2} \in \{-1,+1\}\) denote the incidence numbers of the cells \(\beta \) relative to \(\alpha _i\), and of \(\alpha _i\) relative to \(\gamma \), \(i = 1,2\) (see [7]).

In the very specific, limiting case of 1-dimensional complexes, i.e. graphs/networks, Formula (1) simply becomes [9]

$$\begin{aligned} \Box _1(e) = \Box _1(e,e) = \frac{w(v_1)}{w(e)} + \frac{w(v_2)}{w(e)}\,; \end{aligned}$$
(2)

where \(v_1,v_2\) are the end nodes of the edge e, and \(w(v_1),w(v_2),w(e)\) represent their respective weights.
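As a quick illustration, Formula (2) can be evaluated in a few lines of Python. The sketch below is ours, for exposition only; the dictionaries and function names are not part of any established library:

```python
# Minimal sketch (our notation): the edge-wise Forman-Laplacian of Formula (2).

def box_1(edge, node_weight, edge_weight):
    """Return Box_1(e) = (w(v1) + w(v2)) / w(e) for the edge e = (v1, v2)."""
    v1, v2 = edge
    return (node_weight[v1] + node_weight[v2]) / edge_weight[edge]

# Toy triangle graph with combinatorial (unit) node weights.
node_weight = {"a": 1.0, "b": 1.0, "c": 1.0}
edge_weight = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 0.5}

for e in edge_weight:
    print(e, box_1(e, node_weight, edge_weight))
# With unit node weights Box_1(e) = 2 / w(e): here 2.0, 1.0 and 4.0.
```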

Furthermore, we also obtain the formula for the curvature functions, namely

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_p(\alpha ^p)&= \omega (\alpha ^p)\Bigg [\left( \sum _{\beta ^{p+1}>\alpha ^p}\frac{\omega (\alpha ^p)}{\omega (\beta ^{p+1})}\; + \sum _{\gamma ^{p-1}<\alpha ^p}\frac{\omega (\gamma ^{p-1})}{\omega (\alpha ^p)}\right) \\&\quad -\sum _{\alpha _1^p\parallel \alpha ^p, \alpha _1^p \ne \alpha ^p}\Big |\sum _{\begin{array}{c} \beta ^{p+1}>\alpha _1^p \\ \beta ^{p+1}>\alpha ^p \end{array}}\frac{\sqrt{\omega (\alpha ^p)\omega (\alpha _1^p)}}{\omega (\beta ^{p+1})}\, - \sum _{\begin{array}{c} \gamma ^{p-1}<\alpha _1^p \\ \gamma ^{p-1}<\alpha ^p \end{array}}\frac{\omega (\gamma ^{p-1})}{\sqrt{\omega (\alpha ^p)\omega (\alpha _1^p)}}\Big |\,\;\Bigg ]\,; \end{aligned} \end{aligned}$$
(3)

where \(\alpha < \beta \) denotes that \(\alpha \) is a face of \(\beta \), and \(\alpha _1 \parallel \alpha _2\) denotes that \(\alpha _1\) and \(\alpha _2\) are parallel faces, the notion of parallelism being defined as follows:

Definition 2

Let \(\alpha _1 = \alpha _1^p\) and \(\alpha _2 = \alpha _2^p\) be two p-cells. \(\alpha _1\) and \(\alpha _2\) are said to be parallel (\(\alpha _1 \parallel \alpha _2\)) iff either (1) there exists \(\beta = \beta ^{p+1}\), such that \(\alpha _1, \alpha _2 < \beta \); or (2) there exists \(\gamma = \gamma ^{p-1}\), such that \(\alpha _1, \alpha _2 > \gamma \); but not both simultaneously.

The case \(p=1\) is especially important, since in the classical, i.e. smooth manifold case, \({\mathcal {F}}_1\) coincides with Ricci curvature. Therefore, by analogy with the classical case, one defines the discrete (weighted) Forman–Ricci curvature on \(\alpha = \alpha ^1\), i.e. 1-cells (edges), as

$$\begin{aligned} \mathrm{Ric_F}(\alpha )= {\mathcal {F}}_1(\alpha ). \end{aligned}$$
(4)

More generally, the operators \(\Box _p\) and \({\mathcal {F}}_p\) are interrelated via the defining discrete Bochner–Weitzenböck formula (see [8]), namely

$$\begin{aligned} \Box _p = B_p + F_p\,; \end{aligned}$$
(5)

where

$$\begin{aligned} {\mathcal {F}}_p(\alpha ^p) = \langle F_p\,\alpha ^p,\alpha ^p \rangle \,. \end{aligned}$$
(6)

(For further details see [8].)

In the dimensionally degenerate—yet essential in practice and applications—1-dimensional case, that is, of graphs/networks, the daunting Formula (3) reduces to the following elementary one:

$$\begin{aligned} {\textbf{F}}(e) = w(e) \left( \frac{w(v_1)}{w(e)} + \frac{w(v_2)}{w(e)} - \sum _{e(v_1)\,\sim \,e,\; e(v_2)\,\sim \,e} \left[ \frac{w(v_1)}{\sqrt{w(e)\, w(e(v_1))}} + \frac{w(v_2)}{\sqrt{w(e)\, w(e(v_2))}} \right] \right) \,; \end{aligned}$$
(7)

where

  • e denotes the edge under consideration between two nodes \(v_1\) and \(v_2\);

  • w(e) denotes the weight of the edge e under consideration;

  • \(w(v_1), w(v_2)\) denote the weights associated with the nodes \(v_1\) and \(v_2\), respectively;

  • \(e(v_1) \sim e\) and \(e(v_2) \sim e\) denote the set of edges incident on nodes \(v_1\) and \(v_2\), respectively, after excluding the edge e under consideration which connects the two nodes \(v_1\) and \(v_2\), i.e. \(e(v_1),e(v_2) \ne e\).

Here we denote by \({\textbf{F}}\) the Graph Forman-Ricci curvature for networks. (Notice the change of notation adopted for clarity reasons.)
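To make Formula (7) concrete, the following sketch (again ours, reusing the toy weighted triangle from the previous snippet) computes \({\textbf{F}}(e)\) directly from the node and edge weights:

```python
# Sketch: Graph Forman-Ricci curvature F(e) of Formula (7), in our notation.

def forman_curvature(edge, node_weight, edge_weight, incident):
    """F(e) for e = (v1, v2); incident[v] lists the edges touching node v."""
    v1, v2 = edge
    w_e = edge_weight[edge]
    # Edges at v1 other than e itself, i.e. e(v1) ~ e in the notation above.
    s1 = sum(node_weight[v1] / (w_e * edge_weight[f]) ** 0.5
             for f in incident[v1] if f != edge)
    # Edges at v2 other than e itself, i.e. e(v2) ~ e.
    s2 = sum(node_weight[v2] / (w_e * edge_weight[f]) ** 0.5
             for f in incident[v2] if f != edge)
    return w_e * ((node_weight[v1] + node_weight[v2]) / w_e - (s1 + s2))

incident = {"a": [("a", "b"), ("a", "c")],
            "b": [("a", "b"), ("b", "c")],
            "c": [("b", "c"), ("a", "c")]}

print(forman_curvature(("a", "b"), node_weight, edge_weight, incident))
# Approximately -0.1213 for this edge.
```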

We do not elaborate further here on the intuition behind these formulas, especially in the case of 1-dimensional complexes, which is the most important case herein, since we dwelt on them in detail in our original papers [9, 10], as well as in the quite recent article [11], which represents a companion paper to the present one.

3 Coarse embedding and an application to SVM

By a coarse embedding of a metric space \((X,d)\) into another metric space \((Y,\rho )\), we mean a map \(i: X \rightarrow Y\) for which there exist increasing, unbounded functions \(\eta _1,\eta _2:{\mathbb {R}} \rightarrow {\mathbb {R}}\), such that

$$\begin{aligned} \eta _1(d(x_1,x_2)) \le \rho (i(x_1),i(x_2)) \le \eta _2(d(x_1,x_2))\,, \end{aligned}$$
(8)

for any \(x_1,x_2 \in X\).
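Inequality (8) can be checked mechanically on finite examples. The following sketch (ours, purely illustrative) verifies it for a given map and a given pair of control functions:

```python
# Sketch: brute-force check of the coarse-embedding inequality (8)
# on a finite piece of a metric space.

from itertools import combinations

def is_coarse_embedding(points, d, rho, i, eta1, eta2):
    """Check eta1(d(x1,x2)) <= rho(i(x1),i(x2)) <= eta2(d(x1,x2)) on all pairs."""
    return all(eta1(d(x1, x2)) <= rho(i(x1), i(x2)) <= eta2(d(x1, x2))
               for x1, x2 in combinations(points, 2))

# Toy example: the identity on a segment of the integers, squeezed
# between the increasing, unbounded functions eta1(t) = t/2, eta2(t) = 2t.
pts = range(10)
dist = lambda x, y: abs(x - y)
print(is_coarse_embedding(pts, dist, dist, lambda x: x,
                          lambda t: t / 2, lambda t: 2 * t))   # True
```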

The “automatic” embedding method we alluded to above is one that is canonical in SVM techniques, and classical in Fourier Analysis and its various applications, in particular in Signal and Image Processing, namely that of reproducing kernels. Recall the following definition:

Definition 3

Given a set X, a symmetric kernel is a function \(k:X \times X \rightarrow {\mathbb {R}}\) that is symmetric, i.e. \(k(x,y) = k(y,x)\), for any \(x,y \in X\).

A kernel k is said to have

  1. positive type if the matrix \(K_m = \{k(x_i,x_j)\}_{i,j=1}^m\) is positive semidefinite for all \(m \in {\mathbb {N}}\); and

  2. negative type if the matrix \(K_m = \{k(x_i,x_j)\}_{i,j=1}^m\) is negative semidefinite for all \(m \in {\mathbb {N}}\).

Positive and negative type kernels are interrelated by the following classical result:

Proposition 1

(Schoenberg’s Lemma) Let k be a symmetric kernel on a set X. Then the following statements are equivalent:

  1. The kernel k is of negative type;

  2. The kernel \(\kappa = \exp (-tk)\) is of positive type, for each \(t > 0\).

(For a proof see, e.g. [3], Proposition 11.12. See also [13].)
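Numerically, the lemma is easy to witness. In the sketch below (ours; it uses the squared Euclidean distance on the line, the standard example of a kernel of negative type in Schoenberg's conditional sense), the matrices \(\exp (-tK)\) indeed turn out positive semidefinite for every tested \(t > 0\):

```python
# Sketch: Schoenberg's Lemma in action. K is the squared-distance kernel
# on a few points of the line; exp(-t*K) should be PSD for every t > 0.

import numpy as np

x = np.linspace(0.0, 1.0, 8)
K = (x[:, None] - x[None, :]) ** 2           # negative-type Gram matrix

for t in (0.1, 1.0, 10.0):
    eigs = np.linalg.eigvalsh(np.exp(-t * K))
    print(t, eigs.min() >= -1e-10)           # True: exp(-tK) is of positive type
```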

From Schoenberg’s Lemma and the fact that \(k^a\) can be written as

$$\begin{aligned} k^a(x,y) = C\int _0^\infty (1 - e^{-tk(x,y)})t^{-a - 1}dt\,, \end{aligned}$$

where \(C = C_a\) is a positive constant, it follows (cf. [3]) that the following corollary holds:

Corollary 2

Let \(k \ge 0\) be a kernel of negative type on X. Then \(\kappa = k^a\) is also of negative type, for any \(0< a < 1\).

(See, e.g. [3], Example 11.13.)
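Continuing the previous sketch, the corollary can likewise be tested numerically: by Schoenberg's Lemma, it suffices that \(\exp (-tK^a)\) remain positive semidefinite for \(t > 0\).

```python
# Sketch: numerical check of the corollary for the matrix K defined above.
# If k^a (0 < a < 1) is again of negative type, then exp(-t * K**a) must be
# positive semidefinite for every t > 0.

for a in (0.25, 0.5, 0.75):
    ok = all(np.linalg.eigvalsh(np.exp(-t * K ** a)).min() >= -1e-10
             for t in (0.1, 1.0, 10.0))
    print(a, ok)                             # True for each exponent a
```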

Remark 1

An (effectively coarse) embedding method for networks, as well as for higher dimensional spaces, was proposed in [12]. While the approach suggested therein applies to more general spaces than networks, and the embedding space is the familiar \({\mathbb {R}}^n\), it is still less than intuitive, since it makes appeal to a family of quasi-metrics. While that approach allows for the network to be studied at many scales, it is also more complicated than the one adopted here (which is based on the path degree metric). Furthermore, the method introduced herein also has the advantage of coming in conjunction with a reproducing kernel, which allows for the automation required in SVM-related applications. Therefore, the corollary above shows that a way of studying networks at many scales, akin to the one in [12], exists also in the coarse embedding (kernel) approach.

Since the operators \(\Box _1\) and \(B_1\) are symmetric as functions of the end nodes u, v of an edge \(e = (u,v)\), they define (in analogy with the classical Laplacian) reproducing kernels \(k_{\Box } = \Box _1\), \(k_B = B_1\) on any given network. Moreover, since by its very construction/definition \(B_p\) is positive semidefinite, it follows that \(k_B\) is also positive semidefinite (i.e. a “classical” reproducing kernel). By a direct application of a classical result (for a proof, see, e.g. [3], Theorem 11.15), we get the following theorem:

Theorem 3

Let k be a symmetric kernel on X. Then,

  1. If k is of positive type, then there exists a map \(\varPhi :X \rightarrow {\mathcal {H}}\), where \({\mathcal {H}}\) is a real Hilbert space, such that \(k(x,y) = \langle \varPhi (x),\varPhi (y)\rangle \), for any \(x,y \in X\).

  2. If k is of negative type and, furthermore, \(k(x,x) = 0\) for any \(x \in X\), then there exists a map \(\varPhi :X \rightarrow {\mathcal {H}}\), where \({\mathcal {H}}\) is a real Hilbert space, such that \(k(x,y) = \Vert \varPhi (x) - \varPhi (y)\Vert ^2\), for any \(x,y \in X\).
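For a finite set of points, the map \(\varPhi \) of part 1 can be written down explicitly from an eigendecomposition of the Gram matrix; the sketch below (ours, with a Gaussian kernel standing in for any positive-type kernel) recovers \(k(x,y) = \langle \varPhi (x),\varPhi (y)\rangle \):

```python
# Sketch: explicit feature map for a positive-type kernel on a finite set.
# If K = U diag(lam) U^T with lam >= 0, then Phi = U sqrt(diag(lam))
# satisfies K = Phi Phi^T, i.e. k(x_i, x_j) = <Phi(x_i), Phi(x_j)>.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
K = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)  # PSD Gram

lam, U = np.linalg.eigh(K)
Phi = U * np.sqrt(np.clip(lam, 0.0, None))   # row i is Phi(x_i)

print(np.allclose(Phi @ Phi.T, K))           # True
```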

Moreover, \(\textbf{F}\) and \(\textrm{Ric}_F\) are both symmetric functions (again viewed as acting on the pairs of nodes defining edges), thus kernels \(k_\textbf{F}\) and \(k_{\textrm{Ric}_F}\) can be defined. However, neither version of Forman’s Ricci curvature is positive; therefore a mapping into a real Hilbert space, as for the Laplacians, is not possible for these curvature-based kernels. In fact, \(\textbf{F}(e)\) fails to be negative only if both end nodes of e have degree 2; that is, only in the degenerate case of cycles and of “spurious” vertices of degree 2 (i.e. graph subdivisions homeomorphic to a “good” graph) can a graph have edges of non-negative curvature. It can, therefore, be surmised that proper networks have pure Forman curvature \(\textbf{F} < 0\). (Note that this does not hold for \(\textrm{Ric}_F\).)

Unfortunately, part 2 of the theorem above (the one regarding negative type operators) does not apply here, since it requires that \(k(x,x) = 0\), for all \(x \in X\), which, of course, holds for neither \(\Box _1\) nor \(B_1\). However, one can still map the kernels \(k_{\Box }, k_B\) to a Hilbert space precisely as above, after modifying them into fitting related positive operators, by putting \(k^*_{\Box } = e^{-k_{\Box }}\), \(k^*_B = e^{-k_B}\).

Moreover (and perhaps more importantly in our context), there exists a coarse embedding of any (finite) network into a (real) Hilbert space. This follows from the fact that, for instance, \(e^{-k_\textbf{F}}\) is a positive kernel, and from the fact that the kernel \(k_\textbf{F}\) is effective, i.e. the edges \(\{e = (x,y) \mid \; \mid \!k_{\textbf{F}}(e)\!\mid \;< K\}\), \(K > 0\), generate the coarse structure of the network (see [3] for technical details), which, in the case at hand of finite networks, can naturally be taken as the discrete coarse structure generated by sets that contain only a finite number of points (nodes) off the diagonal. Therefore we can apply the following result:

Theorem 4

[3, Theorem 11.16] Let X be a coarse space. Then the following statements are equivalent:

  1. X can be coarsely embedded into a Hilbert space.

  2. There exists an effective negative type kernel on X.

In fact, the observation above can be strengthened in more than one way. On the one hand, the result can be applied to any of the kernels above, not only to \(k_\textbf{F}\). Furthermore, due to finiteness, one can relax the effectiveness condition by dispensing with the absolute value in the defining inequality. On the other hand, the Forman–Ricci curvature kernels are effective even if one considers \(\varepsilon \)-nets obtained from the geometric sampling of non-compact Riemannian manifolds with Ricci curvature bounded from below, given that the sampling procedure is essentially the same as for compact manifolds (see [4] for the classical case and [14] for the generalization to metric measure spaces). The only difference between the compact case and the one at hand is that one has to relax somewhat the coarse isometry definition: the resulting \(\varepsilon \)-net, endowed with the combinatorial metric (i.e. with edge lengths \(\equiv 1\)), will be only roughly isometric to the given manifold, where rough isometry is defined as follows:

Definition 4

Let \((X,d)\) and \((Y,\delta )\) be two metric spaces, and let \(f:X \rightarrow Y\) be a map (not necessarily continuous). f is called a rough isometry iff

  1. There exist \(a \ge 1\) and \(b > 0\), such that

    $$\begin{aligned}\frac{1}{a}d(x_1,x_2) - b \le \delta (f(x_1),f(x_2)) \le ad(x_1,x_2) + b\,;\end{aligned}$$

  2. there exists \(\varepsilon _1 > 0\) such that

    $$\begin{aligned}\bigcup _{x \in X}{B(f(x),\varepsilon _1)}= Y\,;\end{aligned}$$

    (that is, f is \(\varepsilon _1\)-full).

It follows that all the types of kernels based on Forman’s discretization of the Bochner–Weitzenböck formula may be applied in the variety of SVM problems where kernels are usually employed, such as clustering and classification. In particular, one can compute the so-called kernel distance [15, 16]:

Fig. 1: Two views of the kernel-based embedding in \({\mathbb {R}}^3\) derived from the Graph Forman–Ricci curvature of the “Kangaroo” network, which reveal that the network essentially consists of one large cluster. The attained minimum of the cost function in this case is \(J = 23.841747\)

Fig. 2: Two views of the kernel-based embedding, derived from the Graph Forman–Ricci curvature, of the “Les Misérables” character network. Here the attained minimum of the cost function is \(J = 42.264625\). Note that the kernel-based embedding into \({\mathbb {R}}^3\) clearly distinguishes the clusters around each of the main characters of Hugo’s novel, connected by the relations between them

Fig. 3: Graph Forman–Ricci (above) vs. \(\square _1\) (below) kernel-based embeddings of the “C. elegans” network [20]. Note the far better separation properties of the curvature-based embedding. This ability comes at the penalty of a higher minimum of the cost function J, namely 404.342321 for the curvature-based kernel, as compared to 283.640003 for the Laplacian-based kernel

Definition 5

Given a similarity function (kernel) \(K:{\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), such that \(K(p,p) = 1\), for any \(p \in {\mathbb {R}}^d\), and given sets \(P,Q \subset {\mathbb {R}}^d\), the kernel distance \(D_K(P,Q)\) is defined as

$$\begin{aligned} D_K(P,Q) = \sqrt{\sum _{p \in P}\sum _{p' \in P}K(p,p') - 2\sum _{p \in P}\sum _{q \in Q}K(p,q) + \sum _{q \in Q}\sum _{q' \in Q}K(q,q')}\,. \end{aligned}$$
(9)

Remark 2

\(D_K(P,Q)\) is not a metric if K is not positive definite.

Note that in the special case \(P = \{p\}, Q = \{q\}\), we obtain \(D^2_K(P,Q) = 2(1 - K(p,q))\), thus \(1 - K(p,q)\) can be viewed as a proper squared distance, since \(1 - K(p,p) = 0\).

Remark 3

The quantity \(\kappa (P,Q) = \sum _{p \in P}\sum _{q \in Q}K(p,q)\) is called the cross-similarity (of P and Q); thus \(D_K(P,Q)\) can be expressed more simply in terms of the cross-similarity, rather than by means of the kernel, as \(D^2_K(P,Q) = \kappa (P,P) - 2\kappa (P,Q) + \kappa (Q,Q)\).
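A direct transcription of Definition 5 and Remark 3 (ours, illustrative; any similarity with \(K(p,p) = 1\) can be plugged in) reads:

```python
# Sketch: the kernel distance D_K of Definition 5, computed via the
# cross-similarity kappa(P, Q) of Remark 3.

import numpy as np

def cross_similarity(P, Q, K):
    return sum(K(p, q) for p in P for q in Q)

def kernel_distance(P, Q, K):
    d2 = (cross_similarity(P, P, K) - 2 * cross_similarity(P, Q, K)
          + cross_similarity(Q, Q, K))
    return np.sqrt(max(d2, 0.0))             # guard round-off below zero

# Toy usage with a Gaussian similarity (so K(p, p) = 1, as required).
K = lambda p, q: np.exp(-np.sum((np.asarray(p) - np.asarray(q)) ** 2))
P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 1.0), (1.0, 1.0)]
print(kernel_distance(P, Q, K))
```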

4 First results: network visualization

A first application of the kernel distance is in the visualization of kernel spaces and of data in general [15, 17]. The embedding method employed here is the so-called multidimensional scaling (MDS) (see, e.g. [18]). More precisely, given N points \(p_1,\ldots ,p_N\), we can approximate their position in a kernel space \({\mathcal {H}} \subset {\mathbb {R}}^d\) by \(y_1,\ldots ,y_N \in {\mathbb {R}}^d\), by minimizing the cost function

$$\begin{aligned} J = \sum _{i = 1}^N\sum _{j > i}^N(\vert \vert y_i - y_j \vert \vert - D_{ij})^2\,; \end{aligned}$$
(10)

where \(\Vert \cdot \Vert \) denotes the Euclidean norm on \({\mathbb {R}}^d\) and \(D_{ij} = D_K(p_i,p_j)\).
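A minimal way to minimize (10) is to hand J to a generic optimizer. The sketch below (ours) uses SciPy's L-BFGS-B in place of the Douglas–Rachford algorithm actually employed in our experiments (cf. the next paragraph), and is meant only to make the procedure concrete:

```python
# Sketch: MDS by direct minimization of the cost function J of (10).
# A generic quasi-Newton optimizer stands in for the Douglas-Rachford
# algorithm used in the experiments reported below.

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def mds_embed(D, dim=3, seed=0):
    """Place N points in R^dim so that pairwise norms approximate D."""
    N = D.shape[0]
    i, j = np.triu_indices(N, k=1)

    def J(y_flat):
        Y = y_flat.reshape(N, dim)
        diff = np.linalg.norm(Y[i] - Y[j], axis=1) - D[i, j]
        return np.sum(diff ** 2)

    y0 = np.random.default_rng(seed).normal(size=N * dim)
    res = minimize(J, y0, method="L-BFGS-B")
    return res.x.reshape(N, dim), res.fun

# Toy usage: a planar point set embeds (essentially) exactly into R^3,
# so the attained minimum of J is typically close to 0.
pts = np.random.default_rng(1).normal(size=(10, 2))
D = squareform(pdist(pts))
Y, J_min = mds_embed(D, dim=3)
print(J_min)
```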

We illustrate this application below, using the Graph Forman–Ricci curvature, on two real-life networks, namely the simple, illustrative “Kangaroo” network and the larger “Les Misérables” one. As the second example proves, this visualization method is quite efficient at distinguishing the various clusters in a large network, due to the high degree of separation that higher dimensional spaces afford. In these examples, to numerically determine the minimum of the cost function J we made appeal to the classical Douglas–Rachford algorithm [19]. These examples are illustrated in Figs. 1 and 2.

Furthermore, we have implemented the \(\square _1\)-based embedding. Perhaps contrary to common wisdom, and thus somewhat unexpectedly, the Graph Forman–Ricci curvature renders a far better separation into clusters than the considered Laplacian. See Fig. 3.

The kernel embedding method is not only useful for the visualization of networks; it also (and perhaps more importantly) allows us to embed large complex networks into the familiar Euclidean space. Consequently, one can employ well-established and intuitive sampling methods for data points and sets in the usual ambient space.

5 Conclusion and outlook

We have proposed a coarse geometric approach to Forman curvature and Bochner–Laplacian kernel-based network embedding, and we have illustrated these ideas on a number of representative small and medium size networks, as well as on test images. Even though the experiments were quite restricted in scope, functioning more as illustrative cases and a “proof of feasibility”, some first conclusions can already be drawn, as we have noted throughout the text:

  • One should experiment with and compare our proposed method against other embedding techniques, for instance the so-called locally linear embedding [21].

  • Besides the kernel embedding method embraced in the present paper, one should explore another, perhaps better established approach, namely the one using the eigenvalues and eigenvectors of the Laplacian for embedding and sampling, where the classical graph Laplacian is replaced by the Bochner and rough ones. Furthermore, given that these Laplacians exist for each and every dimension up to the maximal one, they suggest themselves as a potentially powerful method for studying hypernetworks/complexes at many scales. Additional and more systematic approaches can also be found in [22].

  • Perhaps the most rewarding direction of further study, from the applications viewpoint, is the employment of the approach introduced herein in the fields of Imaging and Graphics. Indeed, the motivation for the search for possible curvature-based kernels stems from these settings (as a natural counterpoint to curvature-based sampling and reconstruction [23]). Clearly, in this case the networks to be studied are the 1-skeleta of triangular or quadrilateral meshes, whereas the curvatures to be employed could be the ones considered herein, the metric curvatures [24], or the classical defect one (see, e.g. [25]).