
1 Introduction

Low-dimensional vector representations of nodes in large-scale networks have been widely applied to a variety of domains, such as social media [13], molecular structures [7], and transportation [9]. Previous approaches, e.g., DeepWalk [13], LINE [16], and SDNE [20], are designed to reduce the sparse structural information to a dense latent space for node classification [13], link prediction [16], and network visualization [21]. However, these embedding schemes were not designed for evolving networks. Popular networks tend to evolve over time, e.g., the average number of friends increased from 155 in 2016 to 338 in 2018 [8], and ephemeral social networks, like Snapchat for short-term conversations, may disappear within weeks. Retraining the whole embedding for each snapshot is computationally intensive for a massive network. Therefore, streaming network embedding is a desirable option to quickly update and generate new embeddings with minimum overhead.

Different from dynamic network embeddings [12, 21], which analyze a sequence of networks to capture temporal patterns, streaming network embeddingFootnote 1 aims to update the network embedding from the changed part of the network. Efficient streaming network embedding faces four main challenges. 1) Multi-type changes. Dynamic changes of networks with insertions and deletions of nodes and edges are usually frequent and complex; it is thus important to derive the new embedding in minimum time to timely reflect the new network status. 2) Evaluation of affected nodes. Updating the embeddings of only the nodes neighboring the changed part ignores the ripple effect on the remaining nodes; it is crucial to identify the nodes whose embeddings must be updated and to ensure that nodes with similar structures share similar embeddings. 3) Transduction. When a network changes significantly, it is difficult to keep the local proximity between the changed part and the remaining part of the network, and it is also important to reflect the change in the global structure. 4) Quality guarantee. For streaming embeddings based on neural networks (usually regarded as black boxes), it is challenging to provide theoretical guarantees on the embedding quality.

To effectively address the above challenges, this paper proposes a new representation learning approach, named Graph Memory Refreshing (GMR). GMR first derives the new embedding of the changed part by decomposing the loss function of Skip-Gram to support multi-type changes. It then carefully evaluates the ripple-effect area and ensures correctness with a globally structure-aware selection strategy, named hierarchical addressing, which efficiently identifies and updates the affected nodes and employs beam search to avoid the overfitting problem. To effectively support streaming data, we interpret the update of embeddings as a memory network with two controllers, a refreshing gate and a percolation gate, to tailor the embeddings from the structural aspect and maintain transduction. GMR then updates the embeddings according to the streaming information of the new network and the stored features (i.e., memory) of the current network, thereby avoiding recomputing the embedding of the whole network. Moreover, GMR aims to both preserve the global structural information and maintain the embeddings of isomorphic structures, i.e., to ensure that nodes with similar local structures share similar embeddings. This property is essential for the correctness of network analysis based on network embeddings [18]. We theoretically prove that GMR preserves the consistency of embeddings for isomorphic structures better than existing approaches. The contributions of this paper are summarized as follows.

  • GMR explores streaming network embedding with quality guarantees. The hierarchical addressing, refreshing gate, and percolation gate efficiently find and update the affected nodes under multi-type changes.

  • We prove that GMR embedding preserves isomorphic structures better than the existing approaches. According to our literature review, this is the first theoretical analysis for streaming network embedding.

  • Experimental results show that GMR outperforms the baselines by at least \(10.5\%\) for link prediction and node classification, with a much shorter running time.

2 Related Work

Static network embedding has attracted a wide range of attention. Laplacian Eigenmaps [1] and IsoMap [17] first constructed the adjacency matrix and then solved a matrix factorization, but these matrix-based approaches did not scale to massive networks. After Skip-Gram [11] was demonstrated to be powerful for representation learning, DeepWalk [13] and node2vec [5] employed random walks to learn network embeddings, while LINE [16] and SDNE [20] preserved the first-order and second-order proximity. GraphSAGE [6] and GAT [19] generated node representations in an inductive manner by mapping and aggregating node features from the neighborhood.

In addition, a recent line of research proposed to learn embeddings from a sequence of networks over time to find temporal behaviors [12, 21]. However, these approaches focused on capturing the temporal changes rather than on efficiency, since they recomputed the embeddings of the whole network instead of updating only the changed part. Another line of recent research studied dynamic embedding without retraining. However, the SVD-based approach [22] is difficult to scale to large networks according to [5]. Besides, [10] supports only edge insertion and ignores edge deletion, and the consistency of the embeddings for globally isomorphic structures is not ensured. Compared with the above research and [3], the proposed GMR is the only one that provides a theoretical guarantee on the embedding quality (detailed later), and it more accurately preserves both the global structural information and the consistency of the embeddings.

3 Problem Formulation

In this section, we present the definitions for streaming network embeddings.

Definition 1

(Streaming Networks). A dynamic network \(\mathcal {G}\) is a sequence of networks \(\mathcal {G} = \{G_1,\cdots , G_T \}\) over time, where \(G_t = (V_t , E_t )\) is the network snapshot at timestamp t. \(\varDelta G_t = (\varDelta V_t, \varDelta E_t )\) represents the streaming network, where the changed part consists of \(\varDelta V_t\) and \(\varDelta E_t\), the sets of vertices and edges inserted or deleted between t and \(t+1\).

Definition 2

(Streaming Network Embeddings). Let \(z_{i,t}\) denote the streaming network embedding that preserves the structural property of \(v_i \in G_t\) at timestamp t. The streaming network embeddings are derived by \(\varPhi ^s= (\phi ^s_1, \cdots ,\phi ^s_{t+1},\cdots , \phi ^s_T)\), where \(\phi ^s_{t+1}\) updates the node embedding \(z_{i,t+1}\) at timestamp \(t+1\) according to \(\mathbf{z} _t\) and \(\varDelta G_{t}\), i.e., \(z_{i,t+1}=\phi ^s_{t+1}(\mathbf{z} _t,\varDelta G_{t})\), where \(\mathbf{z} _t=\{z_{i,t}| \forall v_i\in V_t\}\).

In other words, the inputs of the streaming network function are the embeddings at the current time and the changed part of the network. In contrast, for [12, 21], given a dynamic network \(\mathcal {G}\), the embedding is derived by a sequence of functions \(\varPhi = (\phi _1, \cdots ,\phi _{t+1},\cdots , \phi _T)\), where \(\phi _{t+1}\) maps the node \(v_i\) to the d-dimensional embedding \(z_{i,t+1}\) at timestamp \(t+1\), i.e., \(z_{i,t+1} = \phi _{t+1}(v_i,G_{t+1})\). Therefore, the inputs are the whole networks at the current and next timestamps. In the following, we present the problem studied in this paper.
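To make the difference concrete, the following minimal Python sketch (with hypothetical names such as DeltaG and phi_streaming that are not from the paper) shows a streaming update consuming only the stored embeddings \(\mathbf{z} _t\) and the changed part \(\varDelta G_t\), in contrast to dynamic approaches that take the whole snapshot \(G_{t+1}\) as input.

```python
# A minimal sketch of the interface in Definitions 1-2; DeltaG and phi_streaming
# are hypothetical names introduced here for illustration only.
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

import numpy as np

Edge = Tuple[int, int]


@dataclass
class DeltaG:
    """Changed part of the network between t and t+1 (Definition 1)."""
    inserted_edges: Set[Edge] = field(default_factory=set)
    deleted_edges: Set[Edge] = field(default_factory=set)
    inserted_nodes: Set[int] = field(default_factory=set)
    deleted_nodes: Set[int] = field(default_factory=set)


def phi_streaming(z_t: Dict[int, np.ndarray], delta_g: DeltaG) -> Dict[int, np.ndarray]:
    """z_{t+1} = phi^s_{t+1}(z_t, Delta G_t): update only from the stored
    embeddings (memory) and the changed part, never the whole snapshot."""
    z_next = dict(z_t)                                # start from the current memory
    for v in delta_g.inserted_nodes:                  # new nodes get fresh embeddings
        z_next[v] = np.random.normal(scale=0.1, size=64)
    # ...refresh and percolate the affected nodes here (Sects. 4.2-4.3)...
    return z_next
```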

Definition 3

(Quality-aware Multi-type Streaming Network Embeddings). Given a streaming network with \(\varDelta V_t\) and \(\varDelta E_t\) as the sets of the vertices and edges inserted or deleted between t and \(t+1\), the goal is to find the streaming network embedding and derive the corresponding embedding quality to ensure that the nodes with similar structures share similar embeddings.

Later in Sect. 5, we formally present and theoretically analyze the quality of the embedding with a new metric, named the isomorphic retaining score. Moreover, we prove that the proposed GMR preserves the structures better than other state-of-the-art methods in Theorem 1.

4 Graph Memory Refreshing

In this section, we propose Graph Memory Refreshing (GMR) to support multi-type embedding updates, to identify the affected nodes required to update the embeddings by hierarchical addressing, and to ensure that the nodes with similar structures share similar embeddings. To effectively support streaming data, we leverage the controllers (refreshing and percolation gates) of memory networks [4] to refresh the memory (update the embedding) according to the current state (the current embedding) and new input (streaming network).

4.1 Multi-type Embedding Updating

For each node \(v_i\), the Skip-Gram model predicts the context nodes \(v_j \in N(v_i)\) and maximizes the log probability,

$$\begin{aligned} \sum _{v_i \in V} \sum _{v_j \in N(v_i)} \log p(v_j|v_i). \end{aligned}$$
(4.1)

However, it is computationally intensive to derive the above probabilities for all nodes. Therefore, the probabilities are approximated by negative sampling [11],

$$\begin{aligned} \sum _{(v_i,v_j) \in E} \sigma (z_i^T z_j) + \sum _{v_i \in V} \mathbbm {E}_{v_j \sim P_{N}(v_i)} [\sigma (-z_i^T z_j)], \end{aligned}$$
(4.2)

where \(\sigma (x) = 1/(1+e^{-x})\) is the sigmoid function, \(z_i\) and \(z_j\) are respectively the embedding vectors of \(v_i\) and \(v_j\), and \(P_{N}(v_i)\) is the noise distribution for negative sampling. The two terms respectively model the observed neighborhoods and the negative samples (i.e., node pairs without an edge) drawn from \(P_{N}(v_i)\). However, Eq. (4.2) supports only edge insertion. To also support edge deletion, the negative part of Eq. (4.2) is extended to distinguish the deleted edges from the unpaired negative samples as follows,

$$\begin{aligned} \sum _{(v_i,v_j) \in E} \sigma (z_i^T z_j) + \sum _{v_i \in V} \mathbbm {E}_{v_j \sim P_{N}(v_i)} [\sigma (-z_i^T z_j)] +\alpha \sum _{(v_i,v_j) \in D} \sigma (-z_i^T z_j), \end{aligned}$$
(4.3)

where D is the set of deleted edges, and \(\alpha \) is required to be greater than 1 because the samples from D usually provide more information than the unpaired negative samples drawn from \(P_{N}(v_i)\).Footnote 2 Note that node deletion is handled by removing all incident edges of the node, while adding a node with new edges is regarded as edge insertion.Footnote 3
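For illustration, the following Python sketch (our own helper names, not the paper's implementation) evaluates the extended objective in Eq. (4.3); in practice, the embeddings would be trained by gradient ascent on this objective, with the deleted edges in D weighted by \(\alpha \).

```python
# A numpy sketch of the multi-type objective in Eq. (4.3); all names are illustrative.
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def multi_type_objective(z, edges, negative_pairs, deleted_edges, alpha=1.5):
    """Observed edges + negative samples from P_N(v_i) + alpha-weighted deleted edges."""
    obs = sum(sigmoid(z[i] @ z[j]) for i, j in edges)             # inserted/observed edges
    neg = sum(sigmoid(-z[i] @ z[j]) for i, j in negative_pairs)   # unpaired negative samples
    dele = sum(sigmoid(-z[i] @ z[j]) for i, j in deleted_edges)   # deleted edges in D
    return obs + neg + alpha * dele


# Example: maximizing the objective pushes apart the endpoints of a deleted edge.
z = {0: np.array([0.4, 0.4]), 1: np.array([0.2, 0.8]), 2: np.array([0.8, 0.1])}
score = multi_type_objective(z, edges=[(0, 1)], negative_pairs=[(1, 2)],
                             deleted_edges=[(0, 2)], alpha=1.5)
```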

Fig. 1. Example of hierarchical addressing.

4.2 Hierarchical Addressing

For streaming network embedding, previous approaches [4] compare against the embeddings of all nodes by global addressing, which is computationally intensive. A more efficient way is to update only the neighboring nodes of the changed part with local addressing [10]. However, the ripple-effect area usually has an arbitrary shape (i.e., it includes not only the neighboring nodes). Therefore, instead of extracting the neighboring nodes with heuristics, hierarchical addressing systematically transforms the original network into a search tree that is aware of the global structure, enabling efficient identification of the affected nodes whose embeddings need to be updated.

Hierarchical addressing has the following advantages: 1) Efficient. It can be regarded as a series of binary classifications (on a tree), whereas global addressing and local addressing belong to multi-class classification (on the candidate list). Therefore, the time complexity to consider each node in \(\varDelta V_t\) is reduced from \(O(|V_t|)\) (i.e., pairwise comparison) to \(O(k\log (|V_t|))\), where k is the number of search beams (explained later). 2) Topology-aware. It carefully examines the graph structure to evaluate the proximity and maintain the isomorphic structure, i.e., ensuring that the nodes with similar structures share similar embeddings. This property is essential for the correctness of network analysis with network embeddings [18].

Specifically, hierarchical addressing first exploits graph coarsening to build an addressing tree for the efficient search of the affected nodes. Graph coarsening includes both first-hop and second-hop collapsing: first-hop collapsing preserves the first-order proximity by merging two adjacent nodes into a supernode; second-hop collapsing aggregates the nodes with a common neighbor into a supernode, where the embedding of the supernode is averaged from its child nodes [2]. Second-hop collapsing is prioritized because it can effectively compress the network into a smaller tree.
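The following Python sketch illustrates the coarsening step, assuming a supernode embedding is the average of its children [2]; the TreeNode and merge helpers and the numeric values are ours for illustration, loosely following Fig. 1.

```python
# A sketch of building the addressing tree by graph coarsening (illustrative helpers).
import numpy as np


class TreeNode:
    def __init__(self, embedding, children=None, node_id=None):
        self.embedding = embedding       # leaf: node embedding; internal: average
        self.children = children or []   # two children per collapse
        self.node_id = node_id           # original node id for leaves


def merge(a: TreeNode, b: TreeNode) -> TreeNode:
    """Collapse two (super)nodes; the supernode embedding is the average [2]."""
    return TreeNode(0.5 * (a.embedding + b.embedding), children=[a, b])


# Mirroring Fig. 1: second-hop collapsing merges v1 and v2 (common neighbor) into u12,
# then first-hop collapsing merges u12 with its neighbor v3 into the root u123.
v1 = TreeNode(np.array([0.4, 0.4]), node_id=1)
v2 = TreeNode(np.array([0.2, 0.8]), node_id=2)
v3 = TreeNode(np.array([0.8, 0.1]), node_id=3)   # placeholder embedding for v3
u12 = merge(v1, v2)                              # embedding (0.3, 0.6)
root = merge(u12, v3)                            # u123
```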

The network is accordingly transformed into an addressing tree with each node \(v \in V_t\) as a leaf node. Afterward, for each node \(v_i \in \varDelta V_t\), we search for the node \(v_j \in V_{t}\) sharing the highest similarity with \(v_i\) as the first affected node for \(v_i\) by comparing their cosine similarity [4] along the addressing tree. For each node in the tree, if the left child node shares a greater similarity with \(v_i\), the search continues on the left subtree; otherwise, it searches the right subtree. The similarity search ends when it reaches the leaf node with the highest similarity to \(v_i\), so any node in \(V_t\) (not only the neighbors of \(v_i\)) may be extracted. In other words, hierarchical addressing enables GMR to extract affected nodes located in different parts of the network (not necessarily close to \(v_i\)), whereas previous approaches [3, 10, 21] update only the neighboring nodes of \(v_i\). Afterward, hierarchical addressing extracts the top-1 result for every node in \(\varDelta V_t\) as the initially affected nodes (more will be included later), where the nodes with a similarity smaller than a threshold h are filtered out. To avoid overfitting to a local minimum, hierarchical addressing can also extract the top-k results at each iteration with beam search.Footnote 4
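Building on the TreeNode sketch above, the following hypothetical helper illustrates the search: descend the addressing tree keeping the k children most similar (by cosine similarity) to the query embedding, and return the top-k leaves passing the threshold h.

```python
# Beam search over the addressing tree (a sketch; reuses TreeNode from above).
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def hierarchical_addressing(root, z_query, k=3, h=0.8):
    beam, reached_leaves = [root], []
    while beam:
        reached_leaves += [n for n in beam if not n.children]     # leaves reached so far
        children = [c for n in beam for c in n.children]
        beam = sorted(children, key=lambda c: cosine(z_query, c.embedding),
                      reverse=True)[:k]                           # keep k best per level
    scored = [(cosine(z_query, leaf.embedding), leaf) for leaf in reached_leaves]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [leaf for sim, leaf in scored[:k] if sim >= h]         # filter by threshold h


# Fig. 1(b): the new node v4 with embedding (0.3, 0.2) queries the tree with k = 2.
affected = hierarchical_addressing(root, np.array([0.3, 0.2]), k=2, h=0.8)
```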

Figure 1 presents an example of hierarchical addressing with two-dimensional embeddings. At timestamp \(t=1\) (Fig. 1(a)), we construct the addressing tree by first merging nodes \(v_1\) and \(v_2\) into supernode \(u_{12}\) through second-hop collapsing. The embedding of \(u_{12}\) is \(0.5\cdot (0.4,0.4) + 0.5\cdot (0.2,0.8)=(0.3, 0.6)\). Afterward, \(v_3\) and \(u_{12}\) are merged into \(u_{123}\) through first-hop collapsing, and \(u_{123}\) is the root of the tree. At \(t=2\) (Fig. 1(b)), if a new node \(v_4\) with embedding (0.3, 0.2) is linked to \(v_1\), we identify the affected nodes with beam search (\(k=2\)), starting from the root \(u_{123}\). First, we insert \(v_3\) and \(u_{12}\) into the search queue of size 2 (since \(k=2\)) and compare their similarities with \(v_4\). Both \(u_{12}\) and \(v_3\) are then popped out of the queue because \(v_1\) and \(v_2\) have higher similarities, i.e., the top-2 results (0.78 and 0.98), compared with 0.73 for \(v_3\).

4.3 Refresh and Percolate

After identifying the nodes required to update the embeddings by hierarchical addressing, a simple approach is to update the embeddings of those affected nodes with a constant shift [6, 20]. However, a streaming network with a topology change on only a subset of nodes usually leads to different shifts for the nodes in distinct locations. Moreover, updating only the nodes extracted from hierarchical addressing is insufficient to ensure consistency of embeddings for the nodes with similar structures when the embeddings are tailored independently.

To effectively support streaming data, inspired by the gating mechanism in GRU [4], we parameterize the update of the embedding according to the current embedding and the incoming streaming network. Specifically, GMR decomposes the update procedure into two controller gates: a refreshing gate \(g_r\) and a percolation gate \(g_p\). For each node \(v_j\) selected by hierarchical addressing for each \(v_i \in \varDelta V_t\), the refreshing gate first updates the embedding of \(v_j\) according to the new embedding of \(v_i\), and the percolation gate then updates the embedding of every neighbor \(v_k\) of \(v_j\) from the new embedding of \(v_j\). The refreshing gate quantifies the embedding update of \(v_j\) from an incoming stream (i.e., a one-to-one update), while the percolation gate transduces the embedding of \(v_j\) to its neighborhood (i.e., a one-to-many update) to better preserve the local structure. The two gates are the cornerstones for maintaining the isomorphic structure, as proved later in Theorem 1.

To update the embedding of \(v_j\), i.e., updating \(z_{j,t}\) to \(z_{j,t+1}\), we first define a shared function \(a_r\) to find the refreshing coefficient \(\rho _r\), which represents the correlation between the embedding of \(v_j\) and the new embedding of \(v_i\), i.e., \(\rho _r = a_r(z_{i,t+1}, z_{j,t})\). The refreshing gate adopts the correlation function [19] as the shared function \(a_r\) to extract the residual relation [19] between the two embeddings, instead of directly adopting a constant shift as in previous work. Here \(\varvec{a_r} \in \mathbbm {R}^{2d}\) is a shift projection, and \(\rho _r\) is derived by \(\varvec{a_r}^T[z_{i,t+1} || z_{j,t}]\), where || is the vector concatenation operation. After this, we regulate the refreshing coefficient \(\rho _r\) into [0, 1] by a sigmoid function, \(g_r = \sigma (\rho _r)\), to provide a non-linear transformation. Therefore, \(g_r\) quantifies the extent to which \(z_{i,t+1}\) affects \(z_{j,t}\),

$$\begin{aligned} z_{j,t+1} \leftarrow g_r z_{i,t+1} + (1 - g_r) z_{j,t}. \end{aligned}$$
(4.4)

Thereafter, the percolation gate revises the embeddings of the neighbors of \(v_j\) to ensure the consistency of the embeddings for the nodes with similar structures. The percolation gate learns another sharable vector \(\varvec{a_p} \in \mathbbm {R}^{2d}\) and finds the percolation coefficient \(\rho _p = \varvec{a_p}^T [z_{j,t+1} || z_{k,t}]\) to quantify the extent to which \(v_j\) affects \(v_k\). Similarly, we regulate \(\rho _p\) by \(g_p = \sigma (\rho _p)\) to update \(z_{k,t}\) as follows,

$$\begin{aligned} z_{k,t+1} \leftarrow g_p z_{j,t+1} + (1-g_p) z_{k,t}. \end{aligned}$$
(4.5)

Therefore, when the refreshing and percolation gates are 0, the streaming network is ignored. In contrast, when both gates become 1, the embedding of the previous snapshot is dropped accordingly. In summary, the refreshing and percolation gates act as decision makers that learn the impact of the streaming network on different nodes. For the percolation gate, when node \(v_j\) is updated, the gate tailors the embedding of each \(v_k\in N_1(v_j)\),Footnote 5 by evaluating the similarity of \(v_j\) and \(v_k\) according to the embeddings \(z_k\) and \(z_j\). If \(v_j\) and \(v_k\) share many common neighbors, the percolation value of \((v_j, v_k)\) increases to draw \(z_k\) and \(z_j\) closer to each other. The idea is similar for the refreshing gate. Note that \(\varvec{a_r}\) and \(\varvec{a_p}\) are both differentiable and can be trained in an unsupervised setting by maximizing the objective in Eq. (4.3). The unsupervised loss can also be replaced or augmented by a task-oriented objective (e.g., cross-entropy loss) when labels are provided. We alternately update the embeddings (i.e., \(z_{i,t}\) and \(z_{j,t}\)) and the correlation parameters (i.e., \(\varvec{a_r}\) and \(\varvec{a_p}\)) to achieve better convergence.
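A minimal numpy sketch of Eqs. (4.4) and (4.5) is shown below; here \(\varvec{a_r}\) and \(\varvec{a_p}\) are fixed vectors purely for illustration, whereas GMR learns them by maximizing Eq. (4.3) alternately with the embeddings.

```python
# Refreshing (one-to-one) and percolation (one-to-many) updates, Eqs. (4.4)-(4.5).
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def refresh(z_i_new, z_j, a_r):
    """Refreshing gate: update the affected node v_j from the new embedding of v_i."""
    g_r = sigmoid(a_r @ np.concatenate([z_i_new, z_j]))
    return g_r * z_i_new + (1.0 - g_r) * z_j


def percolate(z_j_new, z_k, a_p):
    """Percolation gate: transduce the update of v_j to a first-hop neighbor v_k."""
    g_p = sigmoid(a_p @ np.concatenate([z_j_new, z_k]))
    return g_p * z_j_new + (1.0 - g_p) * z_k


# Usage with illustrative 2-dimensional embeddings and fixed gate parameters.
d = 2
a_r = a_p = 0.5 * np.ones(2 * d)
z_j_new = refresh(np.array([0.9, 0.1]), np.array([0.8, 0.1]), a_r)
z_k_new = percolate(z_j_new, np.array([0.4, 0.4]), a_p)
```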

Figure 2 illustrates an example of updating node \(v_3\). After the embedding of \(v_3\) is updated from (0.8, 0.1) to (0.9, 0.1), GMR uses the percolation gate to transduce the update to the neighboring nodes (i.e., \(v_1\), \(v_2\), and \(v_4\)) to preserve the local structure. Since \(v_1\) shares more common neighbors with \(v_3\) (i.e., \(v_4\)) than \(v_2\) does (none), the percolation gate values for \(v_1\) and \(v_2\) are 0.8 and 0.5, respectively. The embeddings of nodes \(v_1\) and \(v_2\) thus become \((0.80, 0.16) = 0.2\cdot (0.4, 0.4) + 0.8 \cdot (0.9, 0.1)\) and \((0.55, 0.45) = 0.5 \cdot (0.2, 0.8) + 0.5 \cdot (0.9, 0.1)\), respectively. Therefore, the relative relationship between \(\Vert z_3-z_2\Vert \) and \(\Vert z_3-z_1\Vert \) is maintained.

Fig. 2. Example of percolation gate.

5 Theoretical Analysis

The quality of a network embedding can be empirically evaluated through network analysis tasks, e.g., link prediction [16] and node classification [13], since network embedding is unsupervised learning without knowledge of the ground truth. In contrast, when the network analysis task is unknown a priori, it is important to theoretically analyze the quality of the network embedding. To achieve this goal, we first define isomorphic pairs and prove that the embeddings of an isomorphic pair are the same in GMR. This property has been regarded as an important criterion for evaluating the quality of network embedding [18], because nodes with similar structures should share similar embeddings. Moreover, the experimental results in Sect. 6 show that a higher quality leads to better performance on task-oriented metrics.

Definition 4

(Isomorphic Pair). Any two distinct nodes \(v_i\) and \(v_j\) form an isomorphic pair if the sets of their first-hop neighbors \(N_1(\cdot )\) are the same.

Lemma 1

If \((v_i, v_j)\) and \((v_j, v_k)\) are both isomorphic pairs, \((v_i, v_k)\) is also an isomorphic pair.

Proof:

According to Definition 4, \((v_i, v_j)\) and \((v_j, v_k)\) are both isomorphic pairs, indicating that \(N_1(v_i) = N_1(v_j)\) and \(N_1(v_j) = N_1(v_k)\). Therefore, \(N_1(v_i)\) is equal to \(N_1(v_k)\), and thus \((v_i, v_k)\) is also an isomorphic pair.   \(\square \)

Lemma 2

The embeddings \(z_i\) and \(z_j\) are the same after GMR converges if and only if (\(v_i\), \(v_j\)) is an isomorphic pair.

Proof:

We first prove the sufficient condition. If \((v_i, v_j)\) is an isomorphic pair with \(z_i \ne z_j\), the probability of \(v_i\) predicting its context nodes is not equal to that of \(v_j\) (Eq. (4.1)). Therefore, there exists a better solution that makes \(z_i\) and \(z_j\) equal, contradicting the condition that the algorithm has converged. For the necessary condition, if \(z_i=z_j\) but \((v_i, v_j)\) is not an isomorphic pair, then since the probabilities are equal and the algorithm has converged, \(N(v_i)\) must be identical to \(N(v_j)\) by Eq. (4.1), contradicting the assumption that \((v_i, v_j)\) is not an isomorphic pair. The lemma follows.    \(\square \)

As proved in [14], the network embedding algorithms can be unified into the factorization of the affinity matrix. Therefore, nodes with the same first-hop neighborhood have the same embedding when the decomposition ends.

Based on Lemma 2, we define the isomorphic retaining score as follows.

Definition 5

(Isomorphic Retaining Score). The isomorphic retaining score, denoted as \(S_t\), is the average cosine similarity over all isomorphic pairs in \(G_t\), with \(S_t \in [-1,1]\). Specifically,

$$\begin{aligned} S_{t} = \frac{1}{|\xi _t|}\sum _{(v_i, v_j) \in \xi _t} s_{ij,t}, \end{aligned}$$
(5.1)

where \(s_{ij,t}\) is the cosine similarity between \(z_{i,t}\) and \(z_{j,t}\), and \(\xi _t\) is the set of isomorphic pairs in \(G_t\). In other words, the embeddings of any two nodes \(v_i\) and \(v_j\) with the same structure are more consistent with each other if \(s_{ij,t}\) is close to 1 [18]. Experimental results in the next section show that higher isomorphic retaining scores lead to better performance in terms of 1) the AUC score for link prediction and 2) the Macro-F1 score for node classification.
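The following Python sketch (our own helper, assuming the first-hop neighborhoods are available as sets) computes \(S_t\) by enumerating isomorphic pairs according to Definition 4 and averaging their cosine similarities as in Eq. (5.1).

```python
# Isomorphic retaining score S_t of Eq. (5.1) (illustrative implementation).
from itertools import combinations

import numpy as np


def isomorphic_retaining_score(neighbors, z):
    """neighbors: dict node -> set of first-hop neighbors; z: dict node -> embedding."""
    pairs = [(i, j) for i, j in combinations(sorted(neighbors), 2)
             if neighbors[i] == neighbors[j]]          # isomorphic pairs xi_t (Def. 4)
    if not pairs:
        return None
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sum(cos(z[i], z[j]) for i, j in pairs) / len(pairs)


# Example: nodes 1 and 2 (and 3 and 4) have identical first-hop neighborhoods.
adj = {1: {3, 4}, 2: {3, 4}, 3: {1, 2}, 4: {1, 2}}
z = {1: np.array([0.4, 0.4]), 2: np.array([0.5, 0.3]),
     3: np.array([0.9, 0.1]), 4: np.array([0.8, 0.2])}
S_t = isomorphic_retaining_score(adj, z)
```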

The following theorem proves that GMR retains the isomorphic structure better than other Skip-Gram-based approaches, e.g., [5, 13, 16], under edge insertion. Afterward, the time complexity analysis is presented.

Theorem 1

GMR outperforms other Skip-Gram-based models regarding the isomorphic retaining score under edge insertion after each update by gradient descent.

Proof:

Due to the space constraint, Theorem 1 is proved in the online version.Footnote 6

   \(\square \)

Time Complexity. In GMR, the initialization of the addressing tree takes \(O(|V_1|)\) time. At each timestamp t, GMR first updates the embeddings of \(\varDelta V_t\) in \(O(|\varDelta V_t|\log (|\varDelta V_t|))\) time. After this, hierarchical addressing takes \(O(k|\varDelta V_t| \log (|V_t|))\) time to identify the affected nodes, and updating the addressing tree requires \(O(|\varDelta V_t|\log (|V_t|))\) time. To update the affected nodes, the refreshing and percolation gates respectively involve O(1) and \(O(d_{max})\) time per affected node, where \(d_{max}\) is the maximum node degree of the network, so updating all the affected nodes requires \(O(k d_{max} |\varDelta V_t|)\) time. Therefore, the overall time complexity of GMR is \(O(k d_{max}|\varDelta V_t| + k |\varDelta V_t|\log (|V_t|))\), while retraining the whole network requires \(O(|V_t| \log (|V_t|))\) time at each timestamp. Since k is a small constant, \(d_{max}\ll |V_t|\), and \(|\varDelta V_t|\ll |V_t|\), GMR is much faster than retraining.

6 Experiments

To evaluate the effectiveness and efficiency of GMR, we compare GMR with the state-of-the-art methods on two tasks, i.e., link prediction and node classification. The baselines are 1) Full, which updates the whole network with DeepWalk [13]; 2) Change [3], which only takes the changed part as the samples for DeepWalk;Footnote 7 3) GraphSAGE [6], which derives the embeddings by graph inductive learning; 4) SDNE [20], which extends the auto-encoder model to generate the embeddings of new nodes from the embeddings of their neighbors; 5) CTDNE [12], which performs a biased random walk on the dynamic network;Footnote 8 6) DNE [3], which updates only one affected node; 7) SLA [10], which handles only node/edge insertion; and 8) DHPE [22], an SVD-based method built on matrix perturbation theory. The default values of \(\alpha \), h, k, d, the batch size, and the learning rate are 1, 0.8, 3, 64, 16, and 0.001, respectively. Stochastic gradient descent (SGD) with Adagrad is adopted to optimize the loss function.

6.1 Link Prediction

For link prediction, three real datasets [15] for streaming networks are evaluated: Facebook (63,731 nodes, 1,269,502 edges, and 736,675 timestamps), Yahoo (100,001 nodes, 3,179,718 edges, and 1,498,868 timestamps), and Epinions (131,828 nodes, 841,372 edges, and 939 timestamps).Footnote 9 The concatenated embedding \([z_i || z_j]\) of pair \((v_i,v_j)\) is employed as the feature to predict the link by logistic regression.Footnote 10
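A simplified sketch of this protocol is given below, assuming positive pairs are held-out edges and negative pairs are sampled non-edges; the exact splits and negative-sampling ratio of the experiments are not reproduced here.

```python
# Link prediction from concatenated embeddings [z_i || z_j] with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def link_prediction_auc(z, train_pairs, train_labels, test_pairs, test_labels):
    """z: dict node -> embedding; labels are 1 for existing edges and 0 otherwise."""
    featurize = lambda pairs: np.stack([np.concatenate([z[i], z[j]]) for i, j in pairs])
    clf = LogisticRegression(max_iter=1000).fit(featurize(train_pairs), train_labels)
    scores = clf.predict_proba(featurize(test_pairs))[:, 1]
    return roc_auc_score(test_labels, scores)
```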

Table 1. Experiment results of link prediction.

Table 1 reports the AUC [5], the isomorphic retaining score S in Eq. (5.1), and the running time of the different methods.Footnote 11 The results show that the proposed GMR achieves the best AUC among all streaming network embedding algorithms. GMR also outperforms the other three state-of-the-art baselines in terms of AUC by at least \(17.1\%\), \(15.7\%\), and \(11.3\%\) on Facebook, Yahoo, and Epinions, respectively. Besides, the AUC of GMR is close to that of Full (\(1.7\%\) lower on Facebook, \(0.6\%\) higher on Yahoo, and \(2.2\%\) lower on Epinions), while its running time is only \(4.7\%\) of that of Full. Moreover, GraphSAGE shows relatively weak performance since it cannot preserve the structural information without node features. The running time of SDNE is \(2.1\times \) greater than that of GMR due to the processing of the deep structure, while the AUC of SDNE is at least \(12.5\%\) lower than that of GMR on all datasets.

Compared to the other streaming network embedding methods (e.g., DNE, SLA, and DHPE), GMR achieves at least a \(10.8\%\) improvement because their embeddings are updated without considering the global topology. In contrast, GMR selects the affected nodes by globally structure-aware hierarchical addressing, and the selected nodes are not restricted to nearby nodes. Furthermore, GMR outperforms the baselines regarding the isomorphic retaining score since it percolates the embeddings to preserve the structural information. Note that the isomorphic retaining score S is highly correlated with the AUC (correlation coefficient of 0.92), demonstrating that it is indeed crucial to ensure the embedding consistency for the nodes with similar structures.

Table 2. Experiment results of node classification.

6.2 Node Classification

For node classification, we compare the different approaches on BlogCatalog [16] (10,132 nodes, 333,983 edges, and 39 classes), Wiki [5] (2,405 nodes, 17,981 edges, and 19 classes), and DBLP [22] (101,253 nodes, 223,810 edges, 48 timestamps, and 4 classes). DBLP is a real streaming network extracted from the paper citation network of four research areas from 1970 to 2017. BlogCatalog and Wiki are adopted from previous research [3] to generate the streaming networks.Footnote 12 The learned embeddings are employed to classify the nodes according to their labels. Cross-entropy is adopted as the loss function for classification with logistic regression. We randomly sample \(20\%\) of the labels for training and \(80\%\) for testing, and the average results over 50 runs are reported.Footnote 13 Table 2 demonstrates that GMR outperforms Change by \(27.1\%\) regarding Macro-F1 [13], and it is close to Full but with a \(20.7\times \) speed-up. The Macro-F1 scores of GraphSAGE and SDNE are at least \(40\%\) worse than that of GMR, indicating that GraphSAGE and SDNE cannot adequately handle multi-type changes in dynamic networks. Moreover, GMR achieves a larger improvement on BlogCatalog than on DBLP because the density (i.e., the average degree) of BlogCatalog is higher, enabling the hierarchical addressing of GMR to exploit more structural information for updating multiple nodes. For DBLP, GMR also achieves performance close to Full.
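For reference, the sketch below (simplified to single-label classification) mirrors the described protocol: a \(20\%\)/\(80\%\) train/test split of the labeled nodes, logistic regression on the learned embeddings, and Macro-F1 as the metric.

```python
# Node classification on the learned embeddings with a 20%/80% split and Macro-F1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def node_classification_macro_f1(Z, y, seed=0):
    """Z: (num_nodes, d) embedding matrix; y: integer class labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        Z, y, train_size=0.2, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="macro")
```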

It is worth noting that the isomorphic retaining score S is also positively related to Macro-F1. We further investigate the percentage of isomorphic pairs with the same label on each dataset: \(88\%\), \(92\%\), and \(97\%\) of isomorphic pairs share the same labels on BlogCatalog, Wiki, and DBLP, respectively. Therefore, it is crucial to maintain the consistency between isomorphic pairs, since similar embeddings of isomorphic pairs are inclined to be classified with the same labels.

7 Conclusion

In this paper, we propose GMR for streaming network embeddings, featuring hierarchical addressing, a refreshing gate, and a percolation gate to preserve the structural information and consistency. We also prove that the embeddings generated by GMR are more consistent than those of current network embedding schemes under insertion. The experimental results demonstrate that GMR outperforms the state-of-the-art methods in link prediction and node classification. Moreover, multi-type updates with beam search improve GMR in both the task-oriented scores and the isomorphic retaining score. Our future work will extend GMR to support multi-relations in knowledge graphs.