Efficient maintenance of highway cover labelling for distance queries on large dynamic graphs

Graphs in real-world applications are typically dynamic which undergo rapid changes in their topological structure over time by either adding or deleting edges or vertices. However, it is challenging to design algorithms capable of supporting updates efficiently on dynamic graphs. In this article, we devise a parallel fully dynamic labelling method to reflect rapid changes on graphs when answering shortest-path distance queries, a fundamental problem in graph theory. At its core, our solution accelerates query processing through a fully dynamic distance labelling of a limited size, which provides a good approximation to bound online searches on dynamic graphs. Our parallel fully dynamic labelling method leverages two sources of efficiency gains: landmark parallelism and anchor parallelism. Furthermore, it can handle both incremental and decremental updates efficiently using a unified search approach and a bounded repairing inference mechanism. We theoretically analyze the correctness, labelling minimality, and time complexity of our method, and also conduct extensive experiments to empirically verify its efficiency and scalability on 10 real-world large networks.


Introduction
Given a graph G, a distance query on G is to answer the distance between any two vertices in the graph G.As a fundamental primitive, distance queries are widely applied in modern network-oriented systems, such as communication networks, context-aware search Figure 1 Performance overview of our proposed method PARDHL and the state-of-the-art methods DECM [8], DECPLL [9], DECFD [10] and DECHL [11], where the update time is calculated by processing 1,000 edge deletions over complex networks with sizes varying from 20 millions of edges to 3 billions of edges in web graphs [1,2], social network analysis [3,4], route-planning in road networks [5,6], management of resources in computer networks [7], and so on.
Traditionally, a distance query can be answered using Dijkstra's algorithm [12] on nonnegative weighted graphs or breadth-first search (BFS) algorithm on unweighted graphs.However, these algorithms may end up traversing the entire network when two vertices are far apart from each other, thus becoming too slow for applications that require low latency.To speed up query response time, a plethora of methods have been proposed in the past years [5,10,[13][14][15][16][17][18][19][20][21].Among these methods, precomputing a distance labelling is typically considered as a promising solution.However, most of existing distance labelling methods were designed for static networks.
Networks in the real-world are typically dynamic which undergo rapid changes, i.e. edge additions/deletions in their topological structure over time.For example, people become friend/unfriend or follow/unfollow others in social networks, web links become valid/invalid in web graphs, and communication networks may have faults being detected and recovered [7,22,23].It is imperative to design dynamic algorithms that can efficiently update distance labelling to reflect graph changes for fast and accurate responses to distance queries.So far, only limited attempts have been made on maintaining a distance labelling for dynamic graphs [8,10,[24][25][26].Among them, the methods considering incremental updates (i.e.edge additions) [10,24,25] are relatively efficient, e.g., an incremental update can be processed on graphs with billions of vertices in less than one second [25].Unfortunately, the methods considering decremental updates still suffer from long update time of a distance labelling [8][9][10].As shown in Figure 1, the average update time of one edge deletion on graphs of size around 20 million edges is 135 seconds for DECM [8] and 19 seconds for DECPLL [9], which are very inefficient.Moreover, these methods all consider the single-update setting, i.e., performing one single edge insertion or edge deletion at a time.Unlike existing works, in this article, we aim to explore the following research questions: Q1: Is it possible to design a dynamic labelling algorithm which can efficiently reflect both incremental and decremental updates on graphs for fast and accurate distance computation?
Figure 2 An illustration of our parallel framework, which dynamically maintains highway cover labelling for graphs undergoing rapid updates Q2: Can such a dynamic labelling algorithm handle multiple updates in parallel in order to offer performance gains over existing dynamic labelling algorithms in the single-update setting?
To answer these research questions, we propose a parallel solution for answering distance queries on dynamic graphs undergoing rapid changes in their topological structure.Our method is efficient both in time and space, and can scale to large graphs with billions of edges.There are several design considerations.First and foremost, we combine offline labelling and online searching in our proposed solution so as to leverage the advantages from both sides -accelerating query processing through a distance labelling that has a limited size but provides a good approximation to bound online searches.Then, we proceed to design a fully dynamic distance labelling algorithm, which dynamizes a distance labelling to efficiently reflect updates on the underlying graph.This algorithm consists of three stages: (1) Finding affected vertices -to precisely identify vertices that are affected by updates; (2) Finding boundary vertices -to bound the traversal space that is needed for repairing; (3) Repairing affected vertices -to change the labels of affected vertices via an inference process based on their new distances.Figure 2 illustrates the high-level overview of our solution.At its core, we abide by the following principles: Parallel searches: We exploit interactions between updates and design a novel parallel approach to find affected vertices, which involves both landmark parallelism and anchor parallelism.Bounded space: We bound search spaces for updates to only small portions of graphs that are affected, which are achieved by identifying boundary vertices with respect to updates.

Repair inference:
We develop a repairing approach that can efficiently infer the new distances of affected vertices to repair their labels through a level-by-level computation from boundary vertices.

Contributions
To summarise, the main contributions of this work are as follows: (a) We study the problem of answering exact distance queries on dynamic graphs by proposing an efficient solution comprising of a fully dynamic distance labelling and online sparsified searches.(b) Our fully dynamic labelling algorithm can uniformly handle both incremental and decremental updates efficiently using a novel parallel search strategy and a bounded repairing inference mechanism.(c) We prove the correctness of our method and show that it can preserve the minimality of distance labelling, as well as provide the complexity analysis.(d) We have evaluated our method on 10 real-world large networks to verify their efficiency, scalability and robustness.The results show that our algorithms significantly outperform the state-of-the-art methods.It can answer queries in the order of milliseconds and maintain very small labelling sizes, even for large graphs with billions of edges.To the best of our knowledge, this is the first study to develop a parallel fully dynamic labelling method for distance queries.

Outline
The rest of the article is organized as follows.In Section 2, we review existing works that are related to our work.In Section 3, we present preliminary notations and definitions used in this article along with problem formulation.In Section 4, we discuss our parallel fully dynamic framework.In Section 5, we present our proposed method that can maintain distance labelling in parallel to efficiently reflect graph changes for distance querying.A formal proof for showing that our method is correct and preserve the property of minimality of highway cover labelling, along with complexity analysis is detailed in Section 6.Then, in Section 7, we discuss our experimental results.Finally, we conclude the article and outline future research directions in Section 9.

Related work
We review previous works that have attempted to address the shortest-path distance query answering problem on dynamic graphs.In [13], Akiba et al. proposed the pruned landmark labelling (PLL) method which precomputes a 2-hop cover distance labelling [27] by performing a pruned breadth-first search (BFS) from every vertex.The idea is to prune vertices whose distance information can be obtained from the partially available 2-hop cover distance labelling constructed during previous BFSs.This work helps to lower construction cost and labelling size.Later on, Li et al. [20] developed a parallel method called parallel shortest distance labelling (PSL) for constructing PLL in parallel in order to increase scalability.These labelling-based methods serve as the state-of-the-art for answering exact distance queries.However, they are designed for static graphs whose topological structure remains unchanged over time.
Some early works on extending 2-hop cover distance labelling from static graphs to dynamic graphs have considered either incremental updates (i.e., edge insertions) or decremental updates (i.e., edge deletions).In [24], Akiba et al. studied the problem of maintaining a 2-hop cover distance labelling on graphs undergoing incremental updates.This work claims fast update time at the cost of increased labelling size as they do not remove outdated and redundant distance entries from the labels of affected vertices which may significantly affect query performance over time.They consider that removing outdated and redundant distance entries would be costly and may make the update operation slower.Another recently proposed method [25] studied incremental updates on graphs which solves the problem of eliminating outdated and redundant entries, thereby guaranteeing minimality of labelling.Note that, in the case of decremental updates, outdated and redundant distance entries must be removed; otherwise distance labelling cannot be correctly updated.To tackle the problem of maintaining 2-hop cover distance labelling for decremental updates, Qin et al. [8] proposed a method using the well-ordering property of 2-hop cover distance labelling.However, this method is inefficient to perform updates for even moderately large graphs and can only scale to graphs that have a few millions of vertices and edges.It has been shown in the experiments that the average update time of 1000 deletions on a network of size 19 millions edges is 135 seconds which is too high to be used in real-world scenarios.
Several methods [9-11, 28, 29] have been proposed to consider both incremental and decremental updates on graphs.Hayashi et al. [10] proposed a fully dynamic (FD) method to perform distance queries on dynamic graphs.The key idea is to select a small set of landmarks R and precompute shortest-path trees (SPTs) w.r.t. each r ∈ R.Then, a distance query traverses a sparsified graph under an upper distance bound being computed via the SPTs.To reflect graph changes, they maintain SPTs to ensure correct distances.However, this method cannot perform decremental updates efficiently, e.g., as shown in [10], it takes about 6 seconds on average to reflect one graph change (i.e., an edge deletion) on Twitter network.D'angelo et al. [9] maintains a 2-hop cover distance labelling for dynamic graphs.They first developed a method for reflecting decremental updates on a graph, which works in three phases, 1) identify the affected vertices, 2) remove the outdated entries, and 3) restore the 2-hop cover property.Then, they combined the work proposed in [24] for incremental updates with their method for decremental updates to form a fully dynamic algorithm.However, this fully dynamic algorithm has a very high complexity of performing decremental updates, e.g., on a network with 16 millions of edges, it takes 19 seconds, which would be impractical for many real-world applications.Another method by D'Emidio et al. [29] claims an improvement over the method proposed in [9] for decremental updates.However, this method is limited to graphs with few millions of nodes when updating labels is required to be in the order of seconds.A recent method [28] has also attempted to apply a parallel method (PSL) for constructing PLL [20] on dynamic graphs for fast distance computation which unfortunately can only accommodate million-scale graphs.Very recently, Farhan et al. [11] proposed a fully dynamic method that leverages the advantages of highway cover distance labelling proposed in [18] for fast processing of dynamic changes on graphs.Nonetheless, their method, similar to previous methods, processes graph updates in the single-update setting which may cause repeated computation w.r.t.multiple updates and thus is slow as shown in our experiments.In this article, we have adopted batch-update setting to reflect graph changes much more efficiently by exploiting different interaction between updates in a batch.

Problem formulation
Let G = (V , E) be a graph where V is a set of vertices and E is a set of edges.The distance between two vertices s and t in G, denoted as d G (s, t), is the length of a shortestpath between s and t.If there does not exist any path between two vertices s and t, then we consider d G (s, t) = ∞.We use P G (s, t) to denote the set of all shortest-paths between s and t in G, and N G (v) the set of neighbors of a vertex v ∈ V , i.e.N G (v) = {w ∈ V | (v, w) ∈ E}.Without loss of generality, we assume that G is undirected and unweighted and explain how this work can be extended to directed graphs and non-negative weighted graphs in Section 8.
In this work, we consider two fundamental types of updates on graphs, edge insertion and edge deletion.Given a graph G = (V , E), an edge insertion is to add an edge (a, b) into G where {a, b} ⊆ V and (a, b) / ∈ E. Conversely, an edge deletion is to delete an edge (a, b) from G where (a, b) ∈ E. We consider to perform multiple updates aggregated as a batch in parallel.Without loss of generality, we assume that there is no insertion and deletion on the same edge in a batch since these two operations cancel each other out.It is worth noting that node insertion or node deletion can be treated as a set of updates which contains only edge insertions or only edge deletions, respectively.
Let R ⊆ V be a small set of special vertices in a graph G, called landmarks.
and n ≤ |R|.The set of labels for all vertices in V , i.e., L = {L(v)} v∈V , is a distance labelling over G.The size of a distance labelling L is defined as size(L) = v∈V |L(v)|.In the literature, a distance labelling is often constructed by following the 2-hop cover property [27].For any two vertices (u, v) in a graph, the 2-hop cover property requires at least one common vertex w existing in both L(u) and L(v) and w must also be on one shortest-path between u and v. Definition 1 (2-hop cover distance labelling) A distance labelling L over a graph G = (V , E) is a 2-hop cover labeling if the following holds for any pair of vertices u, v ∈ V : ( In our work, we consider a distance labelling property based on the notion of highway, i.e., highway cover labelling [18], which has recently been shown to outperform the stateof-the-art methods in the single-update setting [11].A highway H = (R, δ H ) consists of a set R of landmarks and a distance decoding function δ

Definition 2 (Highway cover distance labelling) A highway cover labelling = (H, L) consists of a highway H and a distance labelling L satisfying that, for any
Intuitively, a highway cover labelling requires that, for every vertex v ∈ V , its label L(v) must contain a distance entry to each landmark r ∈ R unless there is another landmark occurring on a shortest-path between r and v.For a detailed explanation, please refer to [18], which has also provided some illustrative examples for highway and highway cover labelling.
In this work, we study the problem of answering distance queries on dynamic graphs that undergo rapid changes in their topological structures.Since both edge insertion and deletion are considered, we formulate the problem as a fully dynamic distance query answering problem.
Definition 3 (Fully dynamic distance query) Given a dynamic graph G that undergoes edge insertions and edge deletions, a fully dynamic distance query for any two vertices s and t in G is to compute their distance on G .

Parallel fully dynamic framework
In this section, we explore an efficient and scalable parallel fully dynamic framework that combines offline distance labelling and online searching to answer exact distance queries.Our proposed framework has two main components: (i) fully dynamic distance labelling, and (ii) sparsified searching.The key idea is to dynamically maintain a distance labelling for a graph G that undergoes updates, and then use such a fully dynamic distance labelling to guide online searches on a sparsified subgraph of G in order to answer fully dynamic distance queries efficiently.

Fully dynamic distance labelling
Let G be a graph that undergoes updates (edge insertions and deletions) and B be a sequence of updates occurring on G.We use G • B to indicate the graph that corresponds to applying updates B on G.A fully dynamic distance labelling on G • B is a highway cover distance labelling that is dynamically maintained to reflect updates B on G.That is, a fully dynamic distance labelling = (H, L) on G can be dynamically maintained to a fully dynamic distance labelling = (H , L ) on G • B for any sequence B of updates containing edge insertions and deletions.Note that the set of landmarks on G and G • B remains the same during the maintenance.
As discussed in [9], minimality is a highly desirable property to consider when designing a distance labelling over dynamic graphs.Otherwise, a distance labelling may have increasingly unnecessary entries in its labels and query performance deteriorates over time.

Definition 4 (Minimality) A fully dynamic distance labelling
Since the set of landmarks is unchanged, the highways H and H in the above definition always have the same size, i.e., only differing in distance values between landmarks.It has been shown in [18] that, given a graph and a set of landmarks on the graph, there exists a unique minimal highway cover labelling on the graph.

Fully dynamic distance querying
Given a fully dynamic distance labelling = (H, L) on a graph G, an upper bound on the distance between any pair of vertices s, t ∈ V \R in G is computed as follows: A fully dynamic distance query Q(s, t, ) on G using a fully dynamic distance labelling can be answered by conducting a bi-directional BFS search over a sparsified graph G[V \R] (i.e., removing all landmarks in R from G) under the upper bound d st such that: graph G in order to perform fully dynamic distance queries on graphs undergoing updates, particularly when such graphs are very large?"

Parallel maintenance of distance labelling
Below, we introduce a parallel fully dynamic method that can handle all updates including both edge insertions and edge deletions in parallel and reflect the effects of these updates into a distance labeling efficiently.Our proposed method, denoted as PARDHL (an abbreviation for Parallel Dynamic Highway Labelling), involves three main steps: finding affected vertices, finding boundary vertices, and repairing affected vertices.Algorithm 1 describes a high-level view of PARDHL.We discuss these three steps in detail.

Finding affected vertices
We start with defining "affected vertices" whose labels may need to be updated as a consequence of edge insertions and edge deletions.Let G = (V , E) be changed to G = (V , E ) by a sequence of updates B, i.e., G = G • B, R be a set of landmarks, and be a fully dynamic distance labelling on G.The following lemma states how affected vertices relate to a single update (edge insertion or edge deletion).Since a vertex v is affected w.r.t a landmark r iff P G (v, r) = P G (v, r) (cf.Definition 5), a "naive" way of finding all affected vertices is to conduct a BFS from each landmark r ∈ R on both graphs G and G , respectively, and then compare whether P G (v, r) = P G (v, r) For an edge insertion, this anchor distance indicates the new distance of the anchor vertex to the landmark r on the changed graph G • (a, b); for an edge deletion, this anchor distance indicates the old distance of the anchor vertex to the landmark r on the original graph G.In our work, regardless of edge insertions or edge deletions, we use anchor distances to devise a unified algorithm that can find vertices affected by updates in parallel.Concretely, given a sequence of updates B, there exists a set of anchor vertices corresponding to the updates in B. We can then perform parallel searches from multiple or all anchor vertices in this set to find affected vertices efficiently, for which we call anchor parallelism.Example 1 Consider the example graph in Figure 3(a) which has two landmarks 5 and 12.After applying a sequence of updates B = (4, 8), (10, 13) containing a deleted edge (4, 8) and an inserted edge (10,13), the set of all vertices affected by B w.r.t. the landmark 5 is {8, 12, 13, 14, 15, 16, 17, 18} because their sets of shortest-path(s) to landmark 5 have The following lemma states that we can start with an anchor vertex to traverse its local neighbourhood and then identify affected vertices based on their distances to the anchor vertex on G , the anchor distance on G and the distance labelling information on G.We can also easily see that this lemma allows us to parallelise searches from anchor vertices.
Proof Given a vertex v in G which satisfies the condition in (5), we know that there must exist at least one new shortest path between r and v that goes through an inserted edge (u , u) in G , or at least one old shortest path between r and u that goes through a deleted edge (u, u ).By Lemma 1, we thus know that v is affected.
Algorithm 2 shows the pseudo-code of our algorithm that finds affected vertices.Given a sequence of updates B on G and a highway cover distance labelling on G, we conduct a partial BFS search for each update (a, b) ∈ B w.r.t. a landmark r in parallel.Specifically, for each update (a, b) ∈ B, we start from the anchor vertex b with its new depth π = Q(r, a, ) + 1 (Lines 5-6).Then, for every (v, π ) ∈ Q, we examine the neighbors of v and enqueue the ones into Q that are affected based on Lemma 1 with their new depths π + 1 (Lines 9-10) and add v to A(r, B) as an affected vertex (Line 9).This process continues until Q is empty.

Finding boundary vertices
Now we define a special kind of affected vertices, called boundary vertices, which allows us to bound the update space so as to efficiently repair the labels of affected vertices.These boundary vertices lie at the boundary of affected and unaffected vertices.d G (r, v) must be upper-bounded by d G (r, w) + 1.It is worth noting that, if v is disconnected from the landmark r after applying the updates B, then the upper bound of v is undefined.In Algorithm 3, we treat d G (r, v) = ∞ (Line 2) in this case.Then, we define the distance bound of v, denoted as d * (r, v), by its unaffected neighbors such that: The following lemma states that the smallest distance bound of a boundary vertex v equals to the new distance from v to the landmark r on the changed graph G .
Proof We prove this by contradiction.Assume that d G (r, v) = d * (r, v).Since d * (r, v) is the minimum length of all paths between v and r that go through only unaffected vertices, it means that d G (r, v) < d * (r, v) and one shortest-path between v and r must go through at least one affected vertex v ∈ A(r, B).Then d G (r, v) > d G (r, v ) must hold.This contradicts with the assumption that v has the smallest distance bound and thus Note that, d G (r, v) = d * (r, v) does not generally hold for every boundary vertex v.If the distance bound of a boundary vertex v is not the smallest in comparison with the distance bounds of other boundary vertices in B(r, B), the distance d G (r, v) can be computed from its affected neighbors rather than its unaffected neighbors.Nonetheless, in such cases, the affected neighbors of v needs to find their new distances to the landmark r on the changed graph G before computing d G (r, v).
Example 3 Consider the graph in Figure 4(a), three boundary vertices with their distance bounds w.r.t. the landmark 5 are highlighted in dark green.Although the distance bound of vertex 14 is 4 through its unaffected neighbor 11, the distance d G (5, 14) = 3 can be obtained through its affected neighbor 13 as shown in Figure 4(b).

Algorithm 3 Finding boundary vertices.
Fig. 4 An illustration of our bounded repair algorithm w.r.t. the landmark 5. Affected vertices are highlighted in green color, boundary vertices are highlighted in dark green color, blue color denotes vertices that are being repaired, and red color denotes vertices that are being pruned.Their distance bounds are also provided next to these vertices Algorithm 3 describes our algorithm for finding boundary vertices.Given a changed graph G by a sequence of updates B and a set of affected vertices A(r, B), for each vertex v ∈ A(r, B), we compute the distance bound of v using the distance information of its unaffected neighbors (Lines 4-7).Additionally, to improve efficiency, we remove outdated distance information from the labels of all affected vertices because they will be repaired based on the changed graph G in the next section (Line 8).Example 4 Consider the graph in Figure 4(a), we start with eight affected vertices and find only three of them as boundary vertices {12, 13, 14} because of the presence of at least one unaffected vertex in their neighborhood and their bounded distances are being computed using the distance information of their unaffected neighbors.

Repairing affected vertices
In this section, we propose a repairing strategy to efficiently update the labels of affected vertices in order to reflect graph changes.The key idea is to conduct a BFS only on affected vertices bounded by the "boundary vertices".
Concretely, based on Lemma 3, we can conduct a BFS from boundary vertices with the smallest distance bound to infer the distances of their affected neighbors and repair their labels.After each iteration, we treat affected vertices with repaired labels as being unaffected and further process boundary vertices that have the smallest distance bound again.This process terminates only when the labels of all affected vertices are repaired.
To further improve efficiency, we develop a landmark pruning strategy based on the following lemma.
Lemma 4 A vertex v ∈ A(r, B) can be pruned from repairing (i.e., do not repair its label) if there exists a landmark r ∈ R\{r} lying on one shortest-path in P G (v, r).
Proof By Definition 2, if there exists such a landmark r , d G (r, v) can be computed using (3) based on the highway and label of v w.r.t. the landmark r .Thus, the entry in L(v) w.r.t. the landmark r is not needed.
From Lemma 4, we notice that, if a vertex v ∈ A(r, B) can be pruned, any vertex (v, v ) can also be pruned.Putting all together, we incorporate a landmark pruning strategy into our repairing algorithm.
Algorithm 4 describes our algorithm for repairing affected vertices.Given a graph G = G • B, a set of affected vertices A(r, B) and a set of boundary vertices B(r, B), we first sort vertices in B(r, B) w.r.t.their boundary distances and set π as the minimum boundary distance (Lines 2-3).We use two queues Q l and Q p to process vertices to be labeled and pruned, respectively.Then, we conduct a BFS w.r.t. a landmark r starting from vertices in B(r, B) with the smallest distance bound.We process vertices in B(r, B) that have distance π and enqueue to Q p or Q l based on whether they need to be labeled or have been pruned.For each vertex v ∈ Q l at distance π , we examine affected neighbors w of v.If w is pruned, and if w is a landmark, then we repair the highway (Line 12) and add w to Q pruned because it is pruned (Line 13).Otherwise, we repair the label of w i.e., add an entry for r with the new distance π + 1 in L(w) and enqueue w to Q l (Lines 14-15).After that, we remove w from A(r, B) because it has been repaired (line 17).Now we process vertices with distance π + 1 in B(r, B) and put them to respective queues before processing pruned vertices, because otherwise vertices in Q p may prune out some of them that should not have been pruned (Lines 19-20).Next, we process vertices Q p and for each (v, π ) ∈ Q p at depth π , we enqueue affected neighbors w of v to Q p and remove them from A(r, B) (Lines 22-24).We process these two queues, one after the other, until Q l is empty.Due to the highway cover property of distance labelling, we can also leverage the parallelism at the landmark level, and we call this landmark parallelism.As shown in Algorithm 1, all these three steps can be conducted in parallel with respect to each landmark r ∈ R.

Theoretical discussion
In this section, we discuss the correctness of our parallel fully dynamic method and show that it can preserve the minimality property of highway cover labelling.We also analyse the time and space complexity of our proposed method.

Proof of correctness
When a graph G is changed to a graph G after undergoing a sequence of updates B, our proposed method PARDHL can dynamically maintain a highway cover labelling over G to a highway cover labelling over G .Formally, PARDHL is correct iff, whenever Q(u, v, ) = d G (u, v) holds for any two vertices u and v in G, Q(u , v , ) = d G (u , v ) also holds for any two vertices u and v in G .We prove the following theorem to show the correctness of PARDHL.(r, v) does not contain any other landmark in the shortest-path from v to r; (2) δ H (r, r ) = d G (r, r ) for any r ∈ R\{r}.By Lemma 3, starting from boundary vertices with the smallest distance bound, the distances of affected vertices on G are iteratively inferred in A(r, B) and added into their labels via Q l if these affected vertices are not prunable (Lines 6, 15-16 and 19).If an affected vertex v is prunable, it is kept in Q p ; if v is also a landmark, δ H (r, v) in H is updated (Lines 5, 10-13, 18).Thus, every vertex v appearing in Q p has no distance entry of r in L (v), whereas every vertex v appearing in Q l must have (r, d G (r, v))) ∈ L (v).By Lemma 4, this proves both (1) and ( 2).
The following theorem states that the minimality of labelling can be preserved by PARDHL.r i , v) does not contain any other landmark on any shortest-path between r and v.This ensures the minimality for the labels of all affected vertices.For unaffected vertices, by Definition 5, their labels should remain unchanged.Thus, if is minimal, then obtained by updating the labels of all affected vertices must also be minimal.

Complexity analysis
Let m be the total number of affected vertices w.r.t. a landmark r, l be the average label size of a highway cover distance labelling (i.e., l = size(L)/|V |), and d be the average degree of affected vertices.
In practice, m is usually orders of magnitudes smaller than |V | and l is also significantly smaller than

Experiments
We implemented our proposed method PARDHL to answer the following questions: Q1.How efficiently can PARDHL process graph updates in comparison with the-state-ofthe-art dynamic algorithms?Q2.How does the number of landmarks affect the performance of PARDHL?Q3.What is the effect of the size of updates on the performance of PARDHL?Q4.Is there an upper bound on the size of updates, for which PARDHL is better than reconstruction (i.e., recomputing a labelling from scratch) and online search algorithms without using any labelling?

Experimental setup
We implemented our method in C++11 and compiled it then using gcc 5.5.0 with the -O3 option.We performed all the experiments using 28 threads on a Linux server (Intel Xeon W-2175 with 2.50GHz, 28 cores and 512GB of main memory).

Baseline methods
We compared our method with the following state-of-the-art algorithms: -FULFD: a fully dynamic algorithm proposed in [10] which combines a distance labelling with a graph traversal algorithm to answer distance queries.-FULHL: a fully dynamic labelling algorithm proposed in [11] which combines a highway cover labelling with graph traversal algorithm to answer distance queries.-INCHL + : an online incremental algorithm proposed in [25], which combines a highway cover labelling with a graph traversal algorithm for answering distance queries; -Opt-BiBFS: an online bidirectional BFS algorithm to answer distance queries, using an optimized strategy to expand searches from a direction with less vertices [10].
Besides these methods, there are several other methods for answering distance queries on dynamic graphs, such as FULPLL [9], DECM [6], and WPSL [28] which can only process one single update at a time.Since the experimental results of the previous works [10,11,28] have shown that FULFD and FULHL outperform FULPLL, and WPSL outperforms DECM and can only scale to graphs with millions of nodes and edges, we omit the comparison with these methods.To distinguish the parallelism power of landmark parallelism from anchor parallelism, we also consider another variant of our proposed method, called PARDHL − , which is obtained by removing landmark parallelism from PARDHL.The code of FULFD and FULHL was provided by their authors and implemented in C++.We used the same parameter settings as suggested by their authors unless otherwise stated.To have a fair comparison, we set the number of landmarks to 20.

Datasets
We used 10 real-world large complex networks to verify the efficiency of our algorithms.We treated these networks as undirected and unweighted graphs.The statistics of these datasets are summarized in Table 1.They can be easily downloaded though the links of the Stanford Network Analysis Project [30] and the Laboratory for web Algorithmics [31].

Test data generation
We applied the following principles to sample updates and queries in our experiments.
For updates, we considered three update settings: (1) fully dynamic -contains 50% edge insertions and 50% edge deletions, (2) incremental -contains only edge insertions, and (3) decremental -contains only edge deletions.For each update setting, we randomly generated 10 sequences of updates, where each sequence contains 1,000 updates.These settings enable us to explore the impacts of edge insertions and edge deletions respectively, in addition to their combined impact.
In Figure 5, we report the distance distribution for updates in the incremental setting before applying updates and the decremental setting after applying updates.For the incremental setting, the distances between vertices of updates mostly range from 2 to 8 in all datasets, except for only Indochina and UK which have 15%-30% of updates with distances larger than 8.For the decremental setting, the distances between vertices of updates in all datasets are small ranging from 1 to 6, which shows that the updates are selected from densely connected components of these networks allowing us to evaluate the methods more effectively.Further, only a small number of updates are disconnected (i.e., have distance ∞) in most of the datasets.
For queries, we randomly sampled 100,000 pairs of vertices in each dataset to evaluate the average querying time on graphs after being changed as a result of updates.We also report the average size of distance labellings after being changed as a result of updates.

Performance comparison
In the following, we compare our proposed method with the baseline methods in terms of update time, labelling size and query time.

Update time
Tables 2, 3 and 4 show the average update time of the proposed and baseline methods after performing updates in different settings.We also report the fraction of affected vertices (i.e., |A|  |V | ) for 1000 updates which is averaged over 10 sequences for our algorithms.From Table 2, we see that the average update time of our method PARDHL is considerably less than FULHL and significantly much less than FULFD on all the datasets in the fully dynamic setting.We can also see that PARDHL − is comparable with FULHL and significantly outperforms FULFD.Our proposed methods have promising results for large datasets.In particular, they process updates on networks with over billions of edges and on networks that have large fractions of affected vertices, much more efficiently than the baseline methods.
Table 3 shows that the average update time of PARDHL outperforms INCHL on all the datasets, and outperforms INCHL + and INCFD except on Twitter and Friendster in the incremental setting.This is because PARDHL leverages advantages from parallelism when a large fraction of vertices is affected and as we can see that the fraction is negligibly small for Twitter and Friendster which make it slower on these datasets as compared to sequential methods INCFD and INCHL + .We can also see that PARDHL − has comparable results We can also verify from Table 4 that the average update time of PARDHL is significantly faster than the state-of-the-art methods DECHL and DECFD on all the datasets in the decremental setting.Particularly, PARDHL has much improved results on networks with high average degree such as Twitter and Hollywood.Due to the inherent complexity of decremental operation on graphs (i.e., increasing distances), DECFD takes very long time in identifying and updating labels of affected vertices.We can also observe that PARDHL −

Labelling size
Table 5 shows that PARDHL has significantly smaller labelling sizes as compared to FULFD on all the datasets.We did not provide the labelling sizes of FULHL because it is also designed based on highway cover distance labelling and due to the minimality property has the same labelling sizes as PARDHL.The labelling size of FULFD remains unchanged at all times because they maintain fixed bit-parallel shortest-path trees.On the other hand, PARDHL stores pruned shortest-path trees; therefore, to preserve the property of minimality, labels need to be added or deleted as a result of graph updates.However, the labelling size of PARDHL remains stable in practice because the average label size is bounded by a constant, i.e. the number of landmarks.

Query time
Table 5 shows the average query time of PARDHL is comparable with FULFD.Again, we did not provide query time of FULHL because it has the same results as our method PARDHL.It has been previously shown [9] that the average query time is mainly dependent on labelling size.Since the dynamic operations do not considerably affect the labelling size for PARDHL and FULFD, their query time also remains stable.We compare the total time of querying and updating of our methods with the baseline methods in Figure 6.For a fair comparison, we take the sum of the total update time for randomly sampled updates of varying sizes i.e., 1 to 10,000 plus the query time of 1,000 queries after applying the updates as the total time of our methods, denoted as PARDHL+QT and PARDHL − +QT, and the baseline methods, denoted as FULHL+QT and FULFD+QT.For the baseline method Opt-BiBFS, we take only the query time of 1,000 queries after applying the updates.We see that, even adding the update time of maintaining the labelling under updates of varying sizes, the overall query performance of our methods is significantly better than the baseline methods on all the datasets.In particular, our methods show a promising query performance on large networks Twitter, Friendster and UK.

Impact of varying landmarks
We analyse the performance of our proposed method PARDHL with the baseline methods FULHL and FULFD under varying landmarks, i.e., |R| ∈ [10,20,30,40,50,150]. Figure 7 shows that the update time of our method PARDHL and the baseline methods FULHL and FULFD, after applying a sequence of 1000 updates in the fully dynamic setting, under varying landmarks.We can see from Figure 7 that our method PARDHL outperforms FULHL and FULFD on all the datasets against each setting of landmarks.This is due to parallel searches for finding affected vertices and the novel pruning approach for repairing labels.Particularly, our method PARDHL can perform much better than FULHL and FULFD on large datasets with over billions of edges when the number of landmarks is increased.For clarity of performance illustration, we present results for Friendster separately for FULHL and FULFD.

Impact of varying size of updates
We evaluate the performance of our methods PARDHL and PARDHL − against the increasing size of updates that are selected randomly.We start with 500 updates and then iteratively add 500 updates up to 10,000 updates in the fully dynamic setting.
Figure 8 shows the average update time after constructing a distance labelling from scratch, and updating the distance labelling using our fully dynamic algorithms after each increase.We observe from Figure 8 that our methods perform well on all the datasets i.e., the update time remains lower under the construction time for almost all the datasets.We can also observe that increasing the size of updates tends towards slower increase in update time which shows that parallel searches and efficient repairing under large sizes of updates is much more efficient in processing the graph updates.It is worth noticing that PARDHL is more efficient than PARDHL − which shows that landmarks parallelism in PARDHL can be better leveraged for larger sizes of updates particularly for Indochina, IT and UK.This is due to the updates in these networks have larger distances, as shown in their distance distributions in Figure 5 which may result in much more affected vertices and thus parallelism is better leveraged.

Extensions
We can easily extend our proposed method to directed or weighted graphs.
For directed graphs, more specifically, we can redefine d G (s, t) as the distance from vertex s to vertex t.We store two label sets for each vertex v ∈ V , namely forward label L f (v) and backward label L b (v), which contain pairs (r i , δ r i v ) after performing forward and backward pruned BFSs w.r.t. each landmark r i ∈ R, respectively.We also store two highways, namely forward highway H f = (R, δ H f ) and backward highway H b = (R, δ H b ) such that for any two landmarks r i , r j ∈ R, δ H f (r i , r j ) = d G (r i , r j ) and δ H b (r j , r i ) = d G (r j , r i ).
For maintaining a precomputed highway cover distance labelling as a result of graph changes, we run our method twice, one for fixing the forward labels and highway and the other for fixing the backward labels and highway.To repair the forward labels and highway, we first perform parallel BFSs using Algorithm 2 on the forward adjacency list of a changed graph to find the set of all affected vertices.Then, we find boundary vertices using Algorithm 3 by checking the neighbors of all affected vertices in the reverse adjacency list of a changed graph.Finally, we repair the forward labels and highway starting from the boundary vertices that have the minimum distance using the forward adjacency list of a changed graph and infer the distances of affected vertices on the changed graph via a level-by-level inference.Similarly, the backward labels and highway can be repaired in the same manner.For a given query pair (s, t), we can use L f (s) and L b (t) to compute the upper bound distance from s to t in the same way as described in (3).
We can also extend our proposed method to non-negative weighted graphs.In such cases, we use Dijkstra's algorithm in place of BFSs in order to compute and maintain a highway cover distance labelling for dynamic weighted graphs.

Conclusion and future work
In this article, we have proposed a novel parallel method for answering distance queries on dynamic graphs.Our proposed method exploits anchor parallelism by parallelising searches for multiple updates that can find affected vertices simultaneously.We have also introduced an efficient repairing mechanism based on the observation of boundary vertices, which can bound a search space to only affected vertices while repairing their labels.Our repairing mechanism uses a novel pruning strategy to further bound the search space of affected vertices for efficient maintenance of a highway cover distance labelling.We have analyzed the correctness and complexity of our method and showed that it preserves the labelling minimality.We have empirically verified the efficiency, scalability and robustness of our method on 10 real-world networks.
For future work, we plan to further investigate the following research directions: 1) it would be interesting to explore the opportunity to extend the proposed algorithms to road networks, and 2) re-positioning/selection of landmarks in a dynamic setting in order to reduce the size of the labelling and hence of the query time.Re-selection of highly central landmarks could also be required after a certain amount of changes occurring on the topological structure of a dynamic network.This would help optimize the size of a highway cover distance labelling and query performance.Therefore, it is also interesting to explore the problems such as: after how much changes on the topological structure of a dynamic graph, re-positioning of landmarks is required.
r), and unaffected otherwise.We use A(r, B) = {v ∈ V | P G (v, r) = P G (v, r)} to denote the set of all affected vertices by B w.r.t. a landmark r and A = r∈R A(r, B) refers to the set of all affected vertices.

Lemma 1 A
vertex v is affected w.r.t. a landmark r iff there exists a shortest-path between v and r that passes through an inserted edge (a, b) in G or a deleted edge (a, b) in G.By this lemma, we know that any update on an edge (a, b) satisfying d G (r, a) = d G (r, b) is trivial w.r.t. a landmark r, since such an update does not affect any vertices w.r.t. the landmark r.Without loss of generality, we assume that d G (r, b) > d G (r, a) for updates on an edge (a, b) in the rest of this article.

Algorithm 2
Finding affected vertices.changed that is also highlighted in Figure 3(d).We have two anchor vertices 8 and 13 for B w.r.t. 5.The simultaneous searches from these anchors are shown in Figure 3(b)-(c) which results in identifying all affected vertices.

Definition 8 (
Boundary vertex) A vertex v is a boundary vertex w.r.t. a landmark r ∈ R and a sequence of updates B if v is an affected vertex and has at least one unaffected neighbor w, i.e. w / ∈ A(r, B) ∧ w ∈ N G (v).We use B(r, B) to refer to the set of boundary vertices w.r.t. a landmark r and a sequence of updates B. Since a boundary vertex v has at least one unaffected neighbor w,

Example 5 Algorithm 4
In the graph of Figure4(a), we have three boundary vertices with their distance bounds.In the graph of Figure4(b), we start with the boundary vertex 13 which has the smallest distance bound 2 w.r.t. the landmark 5, repair its label and infer new distance bound 3 for its affected neighbors 12 and 14.Then, in the graph of Figure4(c), we repair the labels of boundary vertices 12 and 14 which have the smallest distance bound 3 and infer new distance bound 4 for their affected neighbors 16, 17 and 18.Note that vertex 12 is pruned because it is a landmark.Next, in the graph of Figure4(c), we only repair the label of boundary vertex 18 because the other two boundary vertices 16 and 17 are pruned due to the presence of landmark 12 in their shortest-paths to the root landmark 5. Finally, in the graph of Figure 4(d), boundary vertices 8 and 12 are also pruned based on Lemma 4. Repairing affected vertices.

Theorem 1
Let G = G • B and = PARDHL(G, , B, G ). Then is a highway cover labelling on G .Proof First, we prove that FindAffected finds the set of all affected vertices w.r.t. a landmark r by a sequence of updates B. By Lemma 1, we know that each affected vertex is enqueued into Q during a BFS search from b (Lines 4 and 7-8 in Algorithm 2).Thus, A(r, B) contains all affected vertices w.r.t.r and (a, b).Then, we show that FindBoundary finds the set of all boundary vertices w.r.t. a landmark r and updates B. Algorithm 3 finds boundary vertices B(r, B) from all affected vertices v ∈ A(r, B).It adds an affected vertex v ∈ A(r, B) into B(r, B) iff v has at least one unaffected neighbor (Lines 4-7).Algorithm 3 also removes the distance entries of r from all affected vertices (Line 8).Now, we prove that RepairAffected modifies

Algorithm 2
takes O((m • d • l)/t) time and O(|V |) space to find all affected vertices, where t is the total number of parallel processes.Algorithm 3 takes O(m • d) to compute distance bounds.Then, Algorithm 4 takes O(m • d 2 ) to repair the labels of all affected vertices because the pruning step may be checked for each affected vertex in the worst case.We omit l from O(m • d) for Algorithm 3 and from O(m • d 2 ) for Algorithm 4 because distances for all unaffected neighbors of affected vertices are stored in Algorithm 2. Therefore, PARDHL takes O

Fig. 5
Fig. 5 Distance distributions for updates: (a)-(b) in the incremental setting (only edge insertions), and (c)-(d) in the decremental setting (only edge deletions)

Fig. 7 Fig. 8
Fig. 7 Update time of PARDHL (in colored bars) and the baseline methods FULHL (in colored plus grey bars) and FULFD (in colored plus grey plus light grey bars) under 10-50 landmarks affected by B w.r.t. a landmark r if the following condition holds for at least one update (u , u) ∈ B:

Table 1
Summary of datasets

Table 2
Comparison of the update time between our methods and the baseline methods.The update time is reported for 1,000 updates, averaged over 10 sequences

Table 3
Comparison of the update time in the incremental setting between our methods and the baseline methods

Table 4
Comparison of the update time in the decremental setting between our methods and the baseline methods outperforms DECFD and is comparable with DECHL and again better exploits parallelism on datasets with large fraction of affected vertices w.r.t.dataset size.

Table 5
Comparison of the labelling size and query time between our proposed methods and the baseline methods Comparison between the total time of querying and updating w.r.t varying size of updates up to 10,000 updates of the proposed methods PARDHL − and PARDHL, and the baseline methods Opt-BiBFS, FULHL, and FULFD