ripple2vec: Node Embedding with Ripple Distance of Structures

A graph is a generic model of various networks in real-world applications, and graph embedding aims to represent nodes (edges or graphs) as low-dimensional vectors which can be fed into machine learning algorithms for downstream graph analysis tasks. However, existing random walk-based node embedding methods often map nodes with similar local structures to far vectors and nodes with dissimilar local structures to near vectors. To overcome this issue, this paper proposes to implement node embedding by constructing a context graph via a newly defined ripple distance over ripple vectors, whose components are the hitting times of fully condensed neighborhoods and thus characterize their structures as pure quantities. The distance is able to capture the (dis)similarities of nodes' local neighborhood structures and satisfies the triangular inequality. The neighbors of each node in the context graph are defined via the ripple distance, which makes the short random walks from a given node over the context graph visit only its similar nodes in the original graph. This property guarantees that the proposed method, named ripple2vec, is able to map similar nodes to near vectors and dissimilar nodes to far vectors. Experimental results on real datasets, where the labels are mainly related to the nodes' local structures, show that ripple2vec behaves better than state-of-the-art methods in node clustering and node classification, and is competitive with other methods in link prediction.


Introduction
A graph is a generic model of various networks in real-world applications, e.g., social network analysis [22], knowledge representation and inference [18], and chemoinformatics and computational biology [24]. Effective graph analyses, often time- and space-consuming tasks due to the nature of graph data, help users better understand the structure of societies [22], languages [28], molecules and many others, which benefits lots of meaningful operations such as node classification [30], item recommendation [21], information propagation [29] and performance prediction [2]. Many approaches have been proposed to perform such analyses [20-22, 25]. At the same time, recent years have witnessed a surge of successes in processing data with machine learning (ML) algorithms, which operate on structured data and often require symbolic data to be represented as numerical vectors. In order to analyze graph data with ML algorithms, people turn to graph embedding, i.e., representing nodes [4-7, 9] (edges [12] and graphs [11], resp.) in low-dimensional vector spaces such that the vectors of similar structures have small distances. This paper focuses on node embedding with direct embedding methods.
The goal of node embedding, given a graph G(V, E) and an integer d ∈ N, is to find a mapping f : V → R^d such that the distance between f(u) and f(v) reflects dis_G(u, v), the similarity between u and v. Usually, it is difficult to define dis_G so as to quantify precisely the semantic relationships and similarities between the nodes of G(V, E) [26]. For example, general dis_Gs are adopted in traditional algorithms such as multi-dimensional scaling (MDS) [33], IsoMap [32], Laplacian eigenmap [31] and locally linear embedding (LLE) [34], which suffer from high space and time costs. And, new ideas are developed in recent algorithms [4-9, 13, 15, 16].
Some algorithms follow the framework of random walks [4,6] or its variants [13,14,17] to measure the proximities between nodes by their co-occurrence on short random walks over the graph. However, such co-occurrences are based on the adjacency relationships instead of structural similarity, which makes it hard to map nodes with similar local structures to vectors with small distances.
Instead, people sample random walks from context graphs, which are constructed via different similarity measures of local structures. For example, [8] and [23] construct weighted motif adjacency graphs by counting the occurrences of a set of network motifs. Such methods require users to provide suitable network motifs, which is usually a nontrivial task. Also, [16] constructs a hierarchy of coarsened graphs by collapsing related nodes into "supernodes"; it still depends on the adjacency relationships of the original graph. Meanwhile, [7] applies the DTW algorithm on degree sequences to measure the similarity of local structures. Notice that the DTW algorithm partially ignores the effects of the connection patterns within neighborhoods. As a result, nodes with similar local structures may be mapped to far vectors. Therefore, there is still much space to improve the way the similarity of local neighborhoods is captured. For example, consider the first (second resp.)-order neighborhoods N_1(u), N_1(v) (N_2(u), N_2(v) resp.) of nodes u, v in Fig. 1. Obviously, there are edges in both N_1(v) and N_2(v), but no edges in either N_1(u) or N_2(u). That is, N_i(u) and N_i(v) (i = 1, 2) have different local structures, which cannot be captured by applying the DTW algorithm on degree sequences (as in [7]) because N_i(u) and N_i(v) have the same degree sequence. Besides, to the best of our knowledge, such differences cannot be easily captured explicitly either by predefined graphlets (as in [8]) or by other existing methods.
Based on this observation and several wonderful existing ideas [1,4,7,16], this paper proposes a metric (named the ripple distance) to capture the above differences and leverages it to build a context graph (for node embedding), which helps map nodes with similar local structures to near vectors and dissimilar ones to far vectors.
Specifically, we first describe the local neighborhood structures of each node u as a ripple vector r(u), where each component r_i(u) quantifies approximately the structure of the i-th-order neighborhood N_i(u) as the expected hitting time of a random walk in the Fully Condensed Abstract Neighborhood Graph F_i(u). The intuitive meaning of r_i(u) is the average number of steps the random walk needs, determined inherently by the structure of N_i(u), to escape from N_i(u). For the nodes u and v in Fig. 1, r(u) = ⟨3, 2.67, 7.2⟩ and r(v) = ⟨3, 4.0, 4.62⟩ (see Sect. 4.2 for their computation). As a whole, the ripple vector catches the structure of the nested local neighborhoods of each node, just like the shapes of water waves in diffusion sense the positions of the obstacles on the surface of the water.
Then, we propose to use the ripple distance r(u, v), a weighted sum of the component-wise differences of the ripple vectors, where the difference of the i-th components is based on the ratio min(r_i(u), r_i(v)) / max(r_i(u), r_i(v)), to measure the (dis)similarity of the nodes' local structures. The configurability of the weights brings flexibility to our method across different applications. And the ripple distance satisfies the triangular inequality, no matter how the weights are chosen. This lays a solid foundation for ripple2vec to map similar nodes to near vectors and dissimilar nodes to far vectors.
Finally, with the help of the ripple distance, we construct a multilayer context graph G_c and implement node embedding by sampling short random walks from it. Node u in each i-th layer of G_c has log |V| neighbors corresponding to log |V| similar nodes (measured with the i-th-order ripple distance) of u in the original graph, which makes the co-occurring nodes in short random walks highly similar. An efficient algorithm to compute the context graph is designed to guarantee scalability.
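To illustrate the sampling step, the following sketch draws weighted short random walks from an adjacency-list representation of a context graph. The representation, function name and parameters here are illustrative assumptions, not the paper's implementation (the real G_c also carries inter-layer edges).

```python
import random

def sample_walks(neighbors, p=10, l=8, seed=0):
    """Sample p short random walks of length at most l from every node.

    `neighbors` maps each node to a list of (neighbor, weight) pairs
    (a hypothetical flattened view of the multilayer context graph).
    """
    rng = random.Random(seed)
    walks = []
    for start in neighbors:
        for _ in range(p):
            walk, cur = [start], start
            for _ in range(l - 1):
                nbrs = neighbors.get(cur)
                if not nbrs:
                    break
                nodes, weights = zip(*nbrs)
                # Pick the next node proportionally to the edge weights.
                cur = rng.choices(nodes, weights=weights, k=1)[0]
                walk.append(cur)
            walks.append(walk)
    return walks
```

The sampled walks would then be fed as "sentences" to a Skip-Gram learner, as in the random walk framework the paper builds on.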
Our main contributions include:

• We quantify the local structures of nodes in (un)weighted graphs as ripple vectors and define the ripple distance over them to measure the (dis)similarity between nodes.

• We implement a node embedding (named ripple2vec) in the framework of [7] by adapting the ripple distance to define the context graphs. It helps map similar nodes to near vectors and dissimilar nodes to far vectors.

• Extensive experiments are conducted on real datasets to show the superiority of ripple2vec, i.e., it is stronger at capturing explicitly the similarity of local structures, scales well and benefits downstream applications.
Notations used in this paper are summarized in Table 1.
The remainder of this paper is organized as follows. Section 2 summarizes related work. Section 3 formalizes the problem and explains our motivation further. Our solution is given in Sect. 4. Sect. 5 reports the experimental results, followed by the conclusion in Sect. 6.

Related Work
There are different types of node embedding algorithms (see [3,26] for surveys and classification). Our method belongs to direct encoding algorithms.

Traditional Methods
Traditional dimension reduction algorithms, such as Laplacian eigenmap [31], MDS [33], LLE [34] and IsoMap [32], can be used as node embedding algorithms. All of them can be adapted to work with our ripple distance. For example, MDS and IsoMap can be applied directly on the graph G(V, E′), where E′ consists of all node pairs weighted by their ripple distances, while Laplacian eigenmap and LLE can be applied on the graph G(V, E′′), where E′′ ⊂ E′ only contains the edges between each node and its k-nearest neighbors under the ripple distance. However, such adaptations suffer from a high time cost of O(|V|²) and are not affordable.

Random Walk-Based Methods
Many recent successful direct embedding algorithms take advantage of the framework of random walks (or its variants), which was first proposed in [4]. Its innovation is to stochastically measure the proximity between nodes by their co-occurrence on short random walks over a graph. The method of [4] generates random walks in a DFS manner, while [6] does so in a BFS manner. Instead of sampling random walks over the original graph, [8] seizes the similarities of local structures by counting the occurrences of a set of network motifs (i.e., graphlets or orbits) to form a weighted motif adjacency graph and then applies random walk approaches (e.g., those of [4] and [6]) on it. Also, [16] realizes a node embedding by iteratively generating a hierarchy of coarsened graphs (by collapsing related nodes into "supernodes") and then applying random walk approaches on these coarsened graphs in inverse order. Notice that the idea of using coarsened supernodes to characterize local structures is also explored in the definition of the ripple vector. Likewise, [7] defines a multilayer graph with a recursively defined distance to formalize the similarities between the local structures of nodes in the original graph and then applies the idea of random walks on the multilayer graph such that a walk has more opportunities to visit nodes with similar local structures.
The method of [7] utilizes the DTW distance on the ordered degree sequences of N_k(u) and N_k(v) to define the similarity of the local structures of u and v, while the ripple distance uses the difference between particular hitting times of the coarsened neighborhoods. This integration of neighborhood collapsing and random walks makes the ripple distance more powerful in capturing the similarity of local structures. Moreover, the similar forms of the two distances make it possible to adapt the framework of [7] to the ripple distance directly. There are some other extensions of the random walk idea. For example, [13] learns embeddings by using random walks that "skip" or "hop" over multiple nodes at each step, resulting in a proximity measure similar to that of [14], while Chamberlain et al. [17] modify the inner product of node2vec to use a hyperbolic distance measure (rather than the Euclidean distance). Our method aims to leverage the local structures caught precisely by the ripple distances and thus is orthogonal to these methods. [5] expresses the likelihood of the first- and second-order proximities directly with the sigmoid function of the inner products of the target vectors and then learns the target vectors by optimizing a loss function derived from the KL divergence between the empirical distribution and the model distribution. However, its strong assumption of conditional independence (i.e., Pr[N(u)|u] = ∏_{v_i ∈ N(u)} Pr[v_i|u]) makes it hard to catch the dissimilarities between nodes with the same neighborhood but different structures, as the ripple distance does.

Other Methods
The methods of [9] and [10] are representatives of deep learning methods (see [3,26] for surveys). They associate each node with a neighborhood vector which contains the proximities between this node and the other nodes and learn node embeddings with deep neural networks. DNGR defines the vector according to the pointwise mutual information of two nodes co-occurring on random walks, similar to [4] and [6], while the other simply chooses the node's adjacency vector. Both vectors may be replaced with a ripple distance vector (left as our future work).

Table 1 Notations

G(V, E): The input graph with node set V and edge set E
d_max: The maximum degree of G
N_i(u): The i-th-order neighborhood of u ∈ V
F_i(u): The fully condensed abstract neighborhood graph of N_i(u)
T_i(u): The hitting time from source to sink in F_i(u)
r_i(u): The i-th-order ripple vector of u ∈ V
r_i(u, v): The i-th-order ripple distance between u and v
G_c(V_c, E_c): The multilayer context graph
Graph convolutional neural networks (GCNs) [30,36] obtain node embeddings by aggregating the features of neighboring nodes, either in the spectral domain or in the vertex domain. They leverage the local structures implicitly in downstream applications (e.g., [30]), while ripple2vec aims to exploit local structures explicitly. Graph kernel methods (see [19] for a recent survey) usually map graphs to vectors by counting the appearances of substructures such as paths and trees, which is orthogonal to ripple2vec.

Problem Statement and Motivation
Node embedding aims to represent the nodes of a graph in a low-dimensional Euclidean vector space. Formally, given a graph G(V, E) with node set V and edge set E, the goal of node embedding is to learn a mapping function f : V → R^d that explicitly preserves the similarity between the local structures of arbitrary u, v ∈ V.
The key point of node embedding is that the similarity between the local neighborhood structures of nodes must be preserved essentially. However, different understandings of the size and the structure of the local neighborhood result in different definitions of the similarity, which further have great impacts on the quality of node embedding. For example, [8] counts the occurrences of predefined graphlets to measure the similarity; [4] and [6] leverage the co-occurrences of nodes in short random walks; [9] uses the first- and second-order proximities; [7] aligns the sorted degree sequences; and so on. These popular methods have two main disadvantages: (1) their similarity measures do not satisfy the triangular inequality; (2) dissimilar nodes are often mapped to near vectors.
These issues motivate us to seek a new metric to capture the local neighborhood similarities such that (1) the dissimilarities (like those in Fig. 1) not caught by existing methods can be easily captured; and (2) it helps us realize a node embedding that maps similar nodes to near vectors and dissimilar nodes to far vectors with an acceptable computation cost. This can be accomplished by requiring the new metric to satisfy the triangular inequality. That is, nodes with more similar local structures have smaller distances, while nodes with more dissimilar local structures have larger distances. Since node embedding preserves similarities, a node embedding implemented with the new metric naturally maps similar nodes to near vectors and dissimilar nodes to far vectors. To obtain such an embedding, we construct a multilayer context graph such that each node is adjacent to its most similar nodes. Therefore, the pioneering random walk framework over the context graph generates for each node a context consisting of its most similar nodes and results in the expected node embedding.
In summary, we devote ourselves to solving the following problem:

Input A graph G(V, E) and a dimension d ∈ N.

Output A learned mapping function f : V → R^d such that nodes with similar local neighborhood structures are mapped to near vectors and nodes with dissimilar structures to far vectors.

Method Adapting the popular random walk framework to a context graph G_c(V_c, E_c) derived from a new similarity measure satisfying the triangular inequality.

The Proposed Model
This section presents our solution to the node embedding problem. The framework is presented in Subsection 4.1, and the key steps are expanded in Subsections 4.2 and 4.3.

Framework of ripple2vec

The framework of ripple2vec (see Alg. 1) follows the generic framework of [7]. It first computes, by invoking Alg. 2 (see Subsection 4.2), a (K+1)-dimensional ripple vector for each node to depict its Kth-order local neighborhood structure (Line 1). Usually, K = 4 is enough (see Sect. 5), because ripple vectors catch the local neighborhood structures well approximately, and higher orders of local structure are often meaningless, as shown in [6]. Then, it invokes Alg. 3 (see Subsection 4.3) to construct a weighted multilayer context graph G_c(V_c, E_c) (Line 2), in which all nodes in V appear in each layer. In each i-th layer, u has exactly log |V| neighbors, which are its nearest neighbors under the i-th-order ripple distance. Edge weights within each layer are defined in the same way as in [7]. After that, it generates the context for each node by sampling p random walks with length at most l (Lines 3-4). Notice that each random walk from u contains only u's similar nodes from G. Finally, it accomplishes the mapping with the well-known learning algorithm of [28] (Line 5).

Ripple Vectors and Ripple Distance

This subsection develops a metric, which is integrated into ripple2vec in the next subsection, to measure the similarity of the structures of the local neighborhoods of any pair of nodes. Our statement focuses on unweighted graphs, although it can be directly extended to weighted graphs. Subsection 4.2.1 introduces the ripple vectors, and Subsection 4.2.2 discusses the ripple distance.

Ripple Vector

Given a graph G(V, E), let diam(G) be the diameter of G, i.e., the maximum length of the shortest paths between two nodes of G. For ∀u ∈ V and 0 ≤ k ≤ diam(G), the collection of vertices exactly k hops away from u in G is called the k-th-order neighborhood of u and denoted as N_k(u). For example, consider the graph in Fig. 1. We have N_0(u) = {u}, N_1(u) = {u_11, u_12, u_13}, and N_2(u) = {u_21, u_22, u_23, u_24, u_25}. Intuitively, we intend to quantify the structure of N_k(u) as a single value T_k(u). All T_k(u)s, together with the degree d_G(u), constitute the ripple vector r(u) of u. Therefore, if N_k(u) is viewed as a ripple k hops away from u and T_k(u) quantifies the structure of N_k(u), then the ripple vector r(u) characterizes the whole structure of N_0(u) ∪ ⋯ ∪ N_diam(G)(u), just like water waves in diffusion sense the positions of the obstacles on the surface of the water.

The structure of each N_k(u) (0 < k < diam(G)) is determined by three kinds of edges, i.e., the edges between N_{k−1}(u) and N_k(u), the edges between N_k(u) and N_{k+1}(u), and the edges within N_k(u). In what follows, we first represent this structure approximately by constructing a 5-node weighted undirected graph F_k(u), called the Fully Condensed Abstract Neighborhood Graph (FCANG for short), and then quantify it as a single value.

F_k(u) has 5 nodes u_0, u_1, u_2, u_3, u_4, where each u_i is a subset of V. Edges in F_k(u) are added as below. If u_i ≠ ∅ (i = 1, 2, 3), then there is an edge (u_0, u_i). If there is an edge in G(V, E) with both endpoints from u_i, then there is a self-loop at u_i. Moreover, if u_2 ≠ ∅ (u_3 ≠ ∅ resp.), then edge (u_2, u_3) ((u_3, u_4) resp.) exists. Actually, each edge (u_i, u_j) in F_k(u) represents a type of connections near N_k(u). Therefore, it can be weighted as the number n_ij of edges in G(V, E) with one endpoint from u_i and the other from u_j. Figure 2 illustrates the first- and the second-order FCANGs of nodes u, v, z in Fig. 1, where the weights are not given explicitly.
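A small sketch of the FCANG construction might look as follows. Since the exact rule for splitting N_k(u) into u_1, u_2 and u_3 is not fully specified above, the split used here (u_3 = boundary nodes adjacent to N_{k+1}(u), u_2 = other nodes with an edge inside N_k(u), u_1 = the rest) is an assumption, and u_4 condenses only N_{k+1}(u); the paper's definition may differ.

```python
from collections import deque, defaultdict

def bfs_layers(adj, u):
    """Return {node: hop distance from u} by breadth-first search."""
    dist, q = {u: 0}, deque([u])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return dist

def fcang(adj, u, k):
    """Condense a simple graph around N_k(u) into edge weights n_ij
    of the 5-node graph F_k(u), keyed by (i, j) with i <= j.

    The u1/u2/u3 split below is an assumption (see the lead-in).
    """
    dist = bfs_layers(adj, u)
    layer = lambda d: {v for v, dv in dist.items() if dv == d}
    inner = {v for v, dv in dist.items() if dv < k}   # condensed into u0
    nk, outer = layer(k), layer(k + 1)                # N_k(u) and u4
    u3 = {v for v in nk if any(w in outer for w in adj[v])}
    u2 = {v for v in nk - u3 if any(w in nk for w in adj[v])}
    u1 = nk - u3 - u2
    part = {}
    for i, block in enumerate([inner, u1, u2, u3, outer]):
        for v in block:
            part[v] = i
    n = defaultdict(int)
    for x in adj:
        for y in adj[x]:
            if x < y and x in part and y in part:     # each edge once
                i, j = sorted((part[x], part[y]))
                n[(i, j)] += 1
    return dict(n)
```

Edges within one block u_i become the weight n_ii of its self-loop, matching the construction described above.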

The structure of the FCANG F_k(u) can be quantified in the mass as the expected hitting time of a random walk in F_k(u) from u_0 to u_4, which intuitively is the average number of random walk steps needed to escape from N_k(u). It is well known that the hitting time is inherently affected by the structure of F_k(u).

Definition 1

The hitting time of F_k(u), denoted as T_k(u), is defined as the expected hitting time of a random walk from u_0 to u_4 in F_k(u).

Definition 2
In graph G(V, E), for ∀u ∈ V and 1 ≤ k ≤ diam(G) , the k-th-order ripple vector of u, denoted as r k (u) , is defined as the vector ⟨d G (u), T 1 (u), ⋯ , T k (u)⟩.
T_k(u) (k > 0) can be computed with the closed formula derived in [1]. This can be done by first filling the state transition matrix of F_k(u). For example, for the FCANGs in Fig. 2, we count the edges in Fig. 1 and obtain n_01 = 3, n_02 = 1, n_03 = 3, n_11 = 0, n_22 = 0, n_32 = 2, n_33 = 0 and n_43 = 4, as shown in Fig. 2.
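The expected hitting time itself can be obtained by solving the standard linear system h(x) = 1 + Σ_y P(x, y) h(y) with h(target) = 0. The following self-contained sketch (not the closed formula of [1], and ignoring self-loop degree subtleties) does this by Gaussian elimination:

```python
def hitting_time(weights, source, target, nodes):
    """Expected hitting time from `source` to `target` in a weighted
    undirected graph.  `weights[(x, y)]` is the weight of edge (x, y);
    transition probabilities are weights normalized by weighted degree.
    """
    # Symmetrize the edge dictionary so both directions are present.
    sym = dict(weights)
    for (x, y), w in weights.items():
        sym.setdefault((y, x), w)
    unknown = [x for x in nodes if x != target]
    idx = {x: i for i, x in enumerate(unknown)}
    m = len(unknown)
    # Build (I - P restricted to non-target states) h = 1, augmented.
    a = [[0.0] * (m + 1) for _ in range(m)]
    for x in unknown:
        deg = sum(w for (p, _), w in sym.items() if p == x)
        i = idx[x]
        a[i][i] += 1.0
        a[i][m] = 1.0
        for (p, q), w in sym.items():
            if p == x and q != target:
                a[i][idx[q]] -= w / deg
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(m):
            if r != c and a[r][c]:
                f = a[r][c] / a[c][c]
                for k in range(c, m + 1):
                    a[r][k] -= f * a[c][k]
    h = {x: a[idx[x]][m] / a[idx[x]][idx[x]] for x in unknown}
    return h[source]
```

On the path 0-1-2 with unit weights, the expected hitting time from 0 to 2 is 4, the textbook value for a simple random walk on a 3-node path.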
Algorithm 2 computes the Kth-order ripple vectors for all nodes by extending the procedure above directly. It is noticeable that n_ii is increased once for each endpoint of an edge. Therefore, the factor 2 disappears in the formula to update B_i(u). Notice also that the inverse of B_i(u) can be computed in constant time because B_i(u) has a small fixed size.

Remark Ripple vectors could also be defined without splitting N_k(u) into u_1, u_2 and u_3, which would make matrix B_k(u) have rank |N_k(u)| and Alg. 2 prohibitively expensive to apply.
Remark (1) With the ripple distance rather than the Euclidean distance, the ripple vectors generated by Alg. 2 already realize a node embedding. (2) The ripple distances between all pairs u, v ∈ V form a dissimilarity matrix, on which the traditional dimension reduction algorithms can be applied to implement a node embedding in O(|V|²) time.

Ripple Distance
We first define the distance, then prove that it satisfies the triangular inequality, and present some implications. The ripple vector quantifies a node's hierarchical local structures in its components. Therefore, the difference between ripple vectors, defined as a weighted sum of the differences of their components, measures the (dis)similarity of the local structures of two nodes. To obtain a zero distance for identical vectors, the difference between corresponding components of two ripple vectors is based on the ratio of the minimal value to the maximal value.
The ripple distance can focus on the differences in one or several decomposed neighborhoods by adjusting the aggregation weights, which results in a great flexibility in its applications. We do not discuss how to set the weights and simply take w = ⟨100/(k+1), 100/(k+1), ⋯, 100/(k+1)⟩ for r_k(u, v) in all experiments.
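A sketch of the ripple distance, reconstructed from the description above (component difference 1 − min/max, aggregated with equal weights 100/(k+1)); the exact formula in the paper may differ in detail:

```python
def component_dist(a, b):
    """Distance between two positive components: 1 - min/max,
    i.e. |a - b| / max(a, b); zero iff a == b."""
    if a == b:
        return 0.0
    return 1.0 - min(a, b) / max(a, b)

def ripple_distance(ru, rv, weights=None):
    """Weighted sum of component distances between two ripple vectors.

    Defaults to equal weights 100/(k+1) per component, as used in the
    paper's experiments; the aggregation form is a reconstruction.
    """
    if weights is None:
        weights = [100.0 / len(ru)] * len(ru)
    return sum(w * component_dist(a, b)
               for w, a, b in zip(weights, ru, rv))
```

Each component distance |a − b| / max(a, b) is itself a metric on positive numbers, so a weighted sum of such terms satisfies the triangular inequality for any choice of nonnegative weights, consistent with the claim above.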

Multi-layer Context Graph
This subsection presents the algorithm that constructs for ripple2vec (see Subsection 4.1) a multilayer context graph G_c(V_c, E_c) such that (1) the number of layers is the same as the dimension of the ripple vectors (see Subsection 4.2); (2) each node u in G(V, E) has a counterpart node in each i-th layer with the log |V| nodes of smallest ripple distances r_i(u, ⋅) as its neighbors; and (3) counterpart nodes of the same u ∈ V in different layers are connected as in the context graph of [7]. So, the key is to find the neighborhood N^(i)(u) of each u ∈ V in each i-th layer.

A straightforward way to complete this task is to compute, for each i, r_i(u, v) for all vs and take the log |V| nearest nodes among them as u's neighbors in the i-th layer. However, this method takes a prohibitively high cost of O(|V|²).

Fortunately, the ripple distance is a "monotone" function in the sense that the more similar the local structures are, the smaller the distance between them is, which is implied by the triangular inequality. This monotonicity can be utilized to speed up the computation with the help of the threshold algorithm [35], which aims to find the t largest aggregation results from a list of sorted arrays by scanning them in parallel until a stop condition is satisfied. During the parallel scan, the aggregated results of all seen objects are obtained via random access of the data. These results are stored, and the highest ones are managed with a max-heap of a fixed size t. Once there are t seen objects whose scores in each array are higher than the current lowest thresholds of the same arrays, the scan terminates and the objects in the heap are the final results. We refer the readers to [35] for more details.

We adapt the threshold algorithm to compute G_c(V_c, E_c) by reorganizing each component of all ripple vectors into a sorted array and scanning all arrays in parallel and bidirectionally. Precisely, let R_{|V|×(K+1)} be the ripple vectors and L_i be the sorted array storing {⟨R_{u,i}, u⟩ | u ∈ V} for 0 ≤ i ≤ K. Additionally, L_i has some attributes to memorize scan information. For instance, L_i.tVal is the value which defines the starting position of the parallel scan in L_i. L_i.up (L_i.down resp.) is the position of the parallel scan in the upward (downward resp.) direction. L_i.uRatio and L_i.dRatio are the corresponding ratios at positions L_i.up and L_i.down. L.th memorizes the current lowest threshold of the scan in L. Each array provides a procedure to initialize the scan with a given value and a procedure to get the next tuple of the scan.

The algorithm is sketched in Alg. 3. It first transcribes the ripple vectors R_{|V|×(K+1)} into the sorted arrays L_0, ⋯, L_K (Lines 1-2). Then, it computes the neighborhood N^(i)(u) in each layer i for each u ∈ V (Lines 3-26). To do so, it initializes D to record the seen neighbors and Rounds to count the rounds of the parallel scan (Line 4). After that, it computes the log |V| nearest neighbors under r_i(⋅, ⋅) for i = 0, 1, ⋯, K sequentially (Lines 6-25) and takes the log₂ |V| nodes with the smallest distances to u as u's neighbors in layer i (Line 26). Finally, it adds edges between different layers in the same way as in [7]. When computing N^(i)(u), it first scans Rounds nodes in L_i (Lines 7-9), which guarantees that the same number of nodes has been scanned in each of L_0, ⋯, L_i. The newly met nodes are recorded in new. Then, it updates each (i−1)-th-order ripple distance in D to the i-th-order distance, rechecks whether the stop condition w.r.t. the current thresholds holds for each j ≤ i, and counts the number of seen nodes v which satisfy this condition (Lines 10-13). After that, it computes the i-th-order distances for all nodes in new, appends them to D and counts the number of nodes satisfying the same condition (Lines 14-17). If count ≥ log |V|, i.e., at least log |V| seen nodes satisfy the condition, then N^(i)(u) has been obtained. Otherwise, the algorithm scans all arrays L_0, ⋯, L_i in parallel until count ≥ log |V| (Lines 18-25). For each newly met node v, it appends ⟨r_i(u, v), v, N⟩ to D (Lines 19-21). After each round of the parallel scan, it checks for each unlabeled node in D whether it satisfies the condition and increases count if necessary (Lines 22-24).
Notice that the condition |D| < (i + 1) × log₂ |V| in Line 18 limits the number of rounds executed in the parallel scans. Therefore, the algorithm only finds for each node log |V| near neighbors rather than the exact log |V| nearest neighbors, which guarantees its scalability (see Sect. 5).
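To give the flavor of the bidirectional scan, here is a simplified single-array version that finds approximate nearest neighbors under one vector component. It is illustrative only: the full Alg. 3 scans K+1 arrays in parallel, uses ratio-based component distances, and aggregates them into ripple distances.

```python
def near_neighbors_1d(values, u, t):
    """Approximate t nearest neighbors of node u under one component,
    by scanning a sorted array bidirectionally from u's position.

    `values` maps node -> component value (an illustrative stand-in
    for one column L_i of the ripple vector matrix).
    """
    arr = sorted((val, node) for node, val in values.items())
    pos = arr.index((values[u], u))
    lo, hi, out = pos - 1, pos + 1, []
    while len(out) < t and (lo >= 0 or hi < len(arr)):
        # Distance to the next candidate in each scan direction.
        d_lo = values[u] - arr[lo][0] if lo >= 0 else float("inf")
        d_hi = arr[hi][0] - values[u] if hi < len(arr) else float("inf")
        if d_lo <= d_hi:
            out.append(arr[lo][1]); lo -= 1
        else:
            out.append(arr[hi][1]); hi += 1
    return out
```

Because the array is sorted, each step extends the scan toward the closer of the two frontiers, which is exactly the monotonicity that the threshold-style termination in Alg. 3 exploits.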

Example 3
Ignoring the second condition in Line 18, Fig. 3 illustrates the computation of the 2-nearest neighbors of node 2 (whose ripple vector and positions are marked in red) in layer 0 and layer 1 with Alg. 3 and the given data. The i-th components of all ripple vectors are sorted in array L_i (i = 0, 1).
Alg. 3 runs in O(K²|V| log²|V|) time in total, where K is the dimension of the ripple vectors. In fact, there are K layers in G_c, and each node in a layer has log |V| neighbors in that layer. The computation of these log |V| neighbors for counterpart nodes in different layers is shared according to the definition of the ripple distance (see Lines 5-25), where at most K log |V| Kth-order ripple distances are computed, each of which needs O(K log |V|) operations (|D| < (K + 1) log |V|). In total, Lines 1-26 spend O(K²|V| log²|V|) time. Line 27 adds edges across successive layers. Since there are 2K|V| such edges in total (see [7]) and the weight of each edge is determined in log |V| time, Line 27 needs O(K|V| log |V|) time.

Experiments
This section evaluates ripple2vec's performance in various settings. Subsection 5.1 evaluates its capability to capture the (dis)similarities of local structures. Subsection 5.2 evaluates its performance in classical tasks. Subsection 5.3 evaluates its performance in network alignment. Subsection 5.4 shows the effects of the dimension K of the ripple vectors and of the method of setting the weights. Subsection 5.5 evaluates its scalability.
We compare ripple2vec with 7 algorithms, i.e., those of [4], [6], [5], [9], [7], GCN [36] and [37]. To evaluate the role of Alg. 3, we adapt the ripple distance to the framework of [7] without any other changes. Additionally, a variant of ripple2vec with the second condition in Line 18 of Alg. 3 canceled is also considered. All algorithms are implemented in Python 3 and run with five threads on an Inspur server with an Intel Xeon 128×2.3 GHz CPU and 3 TB RAM running CentOS 7 Linux as its operating system.

Capability to Capture Similarities
We conduct experiments on bcspwr01 1 in Fig. 4(a) to evaluate ripple2vec's capability to capture the (dis)similarities of local neighborhood structures. Bcspwr01 depicts the skeleton of a fish with 39 vertices and 46 edges (see Fig. 4(a)). Since all these vertices play different roles on the body of the fish, most of them have different local neighborhood structures. We expect that nodes with similar neighborhood structures (e.g., 34 and 35) are mapped to near points in the plane, while dissimilar nodes are mapped to distinguishable points. Experimental results are shown in Fig. 4, from which we make the following two observations. Firstly, ripple2vec performs better than the state-of-the-art methods. In fact, each of the existing methods maps some nodes with similar local neighborhood structures to farther vectors and some dissimilar nodes to nearer vectors. For instance, nodes 38 and 39 have the same decomposed neighborhoods, which are different from those of nodes 16, 31 and 34 (see Fig. 4a). However, in the node embedding f shown in Fig. 4b, f(38) is closer to f(16), f(31) and f(34) than to f(39). A similar phenomenon can also be observed in Fig. 4e. In Fig. 4(c), f(38) and f(39) are almost the farthest vectors. Similarly, nodes 34 and 35, which have the same local structure, are mapped to far-apart vectors in Fig. 4(d). These weaker abilities stem from the fact that those methods are designed to draw the context of nodes without explicit consideration of structural similarity. On the contrary, [7] applies the DTW algorithm on the sorted degree sequences of the decomposed neighborhoods to catch the (dis)similarity of local structures. Thus, it scatters all nodes better than the previous methods (see Fig. 4f). However, some dissimilar nodes (such as 18 and 33) are mapped to nearer vectors, while some similar nodes (such as 38 and 39) are mapped to farther vectors. In contrast, ripple2vec uses the ripple distance over ripple vectors to capture the (dis)similarity of local structures.
It almost always maps similar nodes to near vectors and dissimilar nodes to far vectors, no matter what the value of K is (see Fig. 4h, i).
Secondly, we find that if we replace the distance in the framework of [7] with the ripple distance, then the resulting method has a weaker ability to capture the similarities of local neighborhood structures (see Fig. 4(g)) than ripple2vec. In this case, some similar nodes (e.g., 3 and 4, 38 and 39) are mapped to farther vectors than those produced by ripple2vec. This indicates that the context graph constructed in Sect. 4.3 is meaningful.
Both observations can be verified on Barbell-(2,10) and the mirrored karate network. Figures 5 and 6 present only partial results; for the full results, we refer the readers to our technical report. Barbell-(2,10) [7] is the graph obtained by connecting two 10-cliques with a path of 10 nodes (Fig. 5a). Both ripple2vec and the method of [7] can distinguish the differences in the local neighborhood structures (Figs. 5b and 5c). The mirrored karate network [7] is the graph obtained by connecting two isomorphic karate networks, on which ripple2vec behaves much better than [7]. For example, ripple2vec almost always maps any pair of mirrored nodes to the same vectors, while [7] does not (e.g., nodes 12 and 67, 30 and 36). Besides, some dissimilar nodes not captured by [7] are caught by ripple2vec (e.g., nodes 35 and 36, 38 and 39).

Performance in Classical Analysis Tasks
The capability of ripple2vec to map nodes with similar local structures to near vectors can be leveraged for node clustering when the clusters of nodes are more related to their local structures than to other features. To verify this, we conduct experiments on three air-traffic networks: the Brazilian, American and European air-traffic networks, which are unweighted, undirected networks whose nodes correspond to airports, whose edges indicate the existence of commercial flights, and whose node labels mark each airport's level of activity (measured in flights and passengers; four levels in total). Table 2 reports the statistics of each dataset (see [7] and its GitHub link for more details). We also conduct experiments to observe the behaviors of ripple2vec in node classification and link prediction. All algorithms map nodes to 128-dimensional vectors. For two of the baselines, we take their default parameters; for the walk-based methods and ripple2vec, we take 80 walks of length 40 for each node and a Skip-Gram window of 8 in word2vec. Both optimizations in [7] are considered. GCN takes the identity matrix as its input features. We fix K = 4 (see Subsection 5.4 for the effect of changing K). The embedded vectors of each dataset are partitioned into training data and test data in different ways by taking each i ∈ [0, 10] as a random seed. Each experiment is repeated 5 times, and the average behaviors are reported.
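The walk-based part of this setup (80 walks of length 40 per node, fed to word2vec with a Skip-Gram window of 8) can be sketched as below. This is a generic uniform random-walk generator over an adjacency map, not the exact sampling procedure of ripple2vec or of any particular baseline; for ripple2vec the walks would run over the context graph:

```python
import random

def random_walks(adj, num_walks=80, walk_len=40, seed=0):
    """Generate a walk corpus for Skip-Gram training.

    adj maps each node to its list of neighbors. Each node serves as the
    start of num_walks uniform random walks of length walk_len. Walks stop
    early at a node with no neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

The resulting corpus would then be passed to a Skip-Gram implementation (e.g., word2vec with window size 8) to obtain the 128-dimensional vectors.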
For node clustering, we run the k-means algorithm on the embedded vectors to cluster all nodes into 4 classes. The performance is evaluated with the adjusted Rand index (ARI), which ranges in [−1, 1], with larger values indicating stronger consistency between the clustering results and the actual clusters. Table 3 reports the experimental results. Notice that ripple2vec outperforms the other algorithms remarkably, which verifies that ripple2vec is better at mapping nodes with similar local structures to near vectors.
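For reference, the ARI can be computed as follows; this self-contained sketch follows the standard definition (in practice one would call sklearn.metrics.adjusted_rand_score, which gives the same value):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same n items."""
    n = len(labels_true)
    # Contingency counts: how many items fall in each (true, pred) pair.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_comb = sum(comb(c, 2) for c in contingency.values())
    a = sum(comb(c, 2) for c in Counter(labels_true).values())
    b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = a * b / comb(n, 2)
    max_index = (a + b) / 2
    if max_index == expected:  # degenerate labelings
        return 1.0
    return (sum_comb - expected) / (max_index - expected)
```

A perfect match yields 1, random labelings yield values near 0, and systematic disagreement yields negative values.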
For node classification, we take the labels in the training data as known and classify the nodes in the test data by finding their k-NN (k = 11) neighbors in the training data. The performance is evaluated with the average accuracy and the average macro F1 score. Table 4 reports the results. Notice that ripple2vec is competitive with the strongest baseline on all datasets: it is better than GCN on Brazil, competitive with GCN on the other datasets, and always outperforms all other algorithms. These observations further verify that ripple2vec is better at mapping nodes with similar local structures to near vectors.
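The k-NN classification step can be sketched as below, assuming Euclidean distance between embedded vectors (the paper does not state which distance is used for the k-NN search):

```python
from collections import Counter

def knn_classify(train_vecs, train_labels, query, k=11):
    """Classify `query` by majority vote among its k nearest training vectors.

    Distances are squared Euclidean (squaring preserves the ordering, so the
    square root can be skipped).
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in zip(train_vecs, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```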
For link prediction, we train a logistic regression model on the training datasets and use it to predict the edges in the corresponding test datasets. The performance is evaluated with the average accuracy and the average area under the curve (AUC). Table 5 reports the experimental results. Notice that ripple2vec is competitive with all other algorithms, although the nonlinear activation function of the logistic regression model eliminates some of the advantage of mapping nodes with similar local structures to near vectors.
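Before the logistic regression model can be trained, each candidate edge must be turned into a feature vector built from its two endpoint embeddings. The operator below is an assumption for illustration: the paper does not specify which of the standard binary operators it uses, so the sketch offers the common choices:

```python
def edge_features(f, u, v, op="hadamard"):
    """Combine the embeddings f[u] and f[v] into one edge feature vector.

    Which operator to use is a modeling choice; Hadamard (element-wise
    product), average, L1 and L2 are the operators commonly compared in the
    node-embedding literature.
    """
    fu, fv = f[u], f[v]
    if op == "hadamard":
        return [a * b for a, b in zip(fu, fv)]
    if op == "average":
        return [(a + b) / 2 for a, b in zip(fu, fv)]
    if op == "l1":
        return [abs(a - b) for a, b in zip(fu, fv)]
    if op == "l2":
        return [(a - b) ** 2 for a, b in zip(fu, fv)]
    raise ValueError(op)
```

The resulting feature vectors for existing edges (positive samples) and non-edges (negative samples) would then be fed to the logistic regression classifier.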
Moreover, in all three experiments above, we notice that the second condition in Line 17 of Alg. 3 has little impact on the performance (compare the results of ripple2vec with those of its variant without this condition). However, it affects the scalability of the algorithm greatly. On the contrary, adapting the ripple distance directly to the framework of struc2vec behaves worse than ripple2vec in both node clustering and node classification (compare the results of ripple2vec with those of struc2vec equipped with the ripple distance). Both observations demonstrate the effectiveness of the design in Sect. 4.3.

Performance in Network Alignment
In addition to node classification, node clustering and link prediction, ripple vectors can also be used for network alignment. To show this, we use ripple vectors to improve the algorithm of [37]; the improved algorithm is denoted RAlign. The method of [37] is a state-of-the-art unsupervised network alignment method whose alignment accuracy is no worse than that of supervised network alignment methods. In the RAlign algorithm, we denote the alignment matrix obtained by the algorithm of [37] through training on node attributes as S1. We then use the ripple vector of each node as an additional node attribute and apply the same algorithm to train a second alignment matrix S2 on these additional attributes. The weighted sum of S1 and S2 gives the alignment matrix S, which is finally used for network alignment. We use 4 real datasets: Allmovie-Imdb [37], Douban Online-Offline [41], ppi and blogCatalog. The Allmovie-Imdb dataset is a movie network built from the Rotten Tomatoes and Imdb websites; its nodes represent movies, and its edges connect two movies with at least one common actor. The Douban Online-Offline dataset comes from Chinese social networks; its nodes represent users, and its edges represent friendships. The ppi network is a protein-protein interaction network; its nodes represent proteins, and its edges represent interactions between two proteins. The blogCatalog dataset is a social network, with nodes representing users and edges representing social relationships (such as friendships). Table 6 summarizes the statistics of these datasets.
On all datasets, we generate network alignment test instances by randomly removing 20% of the edges and randomly modifying the attributes of 15% of the nodes of a single real network to generate different subnetworks.
We compare RAlign with four algorithms, i.e., those of [37], [38], [39] and [40]. Among them, IsoRank and FINAL are supervised algorithms; for these two algorithms, we use 20% of the ground truth as the training data. The performance is evaluated with Acc_1, Acc_5, Acc_10 and MAP [42], where Acc_q indicates whether the first q candidates contain a correct matching and MAP = mean(1/r_a), with r_a being the rank of the true anchor target in the sorted list of anchor candidates. The experimental results are reported in Table 7, which tells us that (1) on the artificially synthesized datasets, the network alignment metrics of RAlign are much higher than those of the other algorithms; and (2) on the two real datasets, the test results also meet our expectations: RAlign performs worse than some other algorithm in only three tests, and it ranks second in those three. This proves that the ripple vector captures the local structure of the network well, and that this ability is also effective in the field of network alignment.
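The evaluation metrics above can be computed directly from the 1-based ranks r_a of the true anchor targets in the sorted candidate lists:

```python
def acc_q(ranks, q):
    """Acc_q: fraction of anchors whose true target appears among the
    top-q candidates. `ranks` holds the 1-based rank of each true target."""
    return sum(r <= q for r in ranks) / len(ranks)

def mean_average_precision(ranks):
    """MAP = mean(1 / r_a) over the ranks of the true anchor targets."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```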

Effects of the Dimension and the Weights
Next, we show the effects of the dimension K and the weights, to illustrate how these parameters can be tuned.
Effects of K on the performance of ripple2vec. On all datasets in Subsection 5.2, with the same experimental settings, we vary the dimension K of the ripple vectors and observe its impact on the performance of ripple2vec in these analysis tasks. Figure 7 reports the average accuracies and the standard deviations. We find that the average accuracies increase slowly as the dimension grows from 2 to 4 and remain stable as it grows further, which is consistent with the fact that, in usual graph analysis tasks, the local structures near a node are more useful than those far from it.
Effects of the weights on the performance of ripple2vec. With the same settings as in Subsection 5.2, we use four different groups of weights in the ripple distance and observe their impacts on the classical analysis tasks. Figure 8 reports the results. We find that (1) the accuracies in the first two tasks change more dramatically than in the last one, because the sigmoid activation function in the logistic regression model weakens the impact of distances between embedded vectors; and (2) the impacts of the weights are dataset-specific, since no group of weights consistently outperforms the others in both node clustering and node classification.

Scalability
To check the scalability of ripple2vec, we fix K = 4 and run it on instances of the Erdös-Rényi random graph model. We take the number of nodes as 10^2, 10^3, 10^4, 10^5 and 10^6, respectively, and the other parameters as in Subsection 5.2. The results are reported in Fig. 9, which indicates that ripple2vec scales super-linearly but stays below the curve of |V| log^2 |V|. This is consistent with our analysis.

Conclusion
This paper proposes to improve node embedding by constructing a context graph with a newly defined ripple distance over ripple vectors, whose components are the hitting times of the fully condensed neighborhoods and thus characterize the structures of the neighborhoods as pure quantities. The ripple distance satisfies the triangular inequality and is able to capture the (dis)similarities of the local neighborhood structures. The neighbors of each node in the context graph are restricted to its nearest neighbors under the ripple distance, which guarantees that the short random walks from a node over the context graph only visit its similar nodes in the original graph; this makes ripple2vec map similar nodes to near vectors and dissimilar nodes to far vectors. Algorithms to compute the ripple vectors and the context graph are carefully designed, which makes ripple2vec scale well. As future work, we will integrate the structural similarity captured by ripple2vec with deep neural networks like GCN and with graph kernels to improve the performance of downstream graph analysis tasks.

Data availability All data generated or analyzed during this study are included in this published article.
Code availability The code and the datasets of this study have been deposited in: https://github.com/hitSongXiao/ripple2vec.

Declarations
Ethics approval and consent to participate Not applicable.

Competing interests Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.