Low-Congestion Shortcut and Graph Parameters

The concept of low-congestion shortcuts was introduced by Ghaffari and Haeupler [SODA2016] for designing CONGEST algorithms that run fast in restricted network topologies. Specifically, given a graph class $X$, an $f$-round algorithm that constructs shortcuts of quality $q$ for any instance in $X$ yields $\tilde{O}(q + f)$-round algorithms for several fundamental graph problems on $X$, such as minimum spanning tree and minimum cut. In this paper, we consider the relationship between the quality of low-congestion shortcuts and three major graph parameters: chordality, diameter, and clique-width. The main contribution of the paper is threefold: (1) We show an $O(1)$-round algorithm which constructs a low-congestion shortcut with quality $O(kD)$ for any $k$-chordal graph, and prove that the quality and running time of this construction are optimal up to polylogarithmic factors. (2) We present two algorithms: one constructs a low-congestion shortcut with quality $\tilde{O}(n^{1/4})$ in $\tilde{O}(n^{1/4})$ rounds for graphs of $D=3$, and the other constructs one with quality $\tilde{O}(n^{1/3})$ in $\tilde{O}(n^{1/3})$ rounds for graphs of $D=4$. These results immediately yield two MST algorithms running in $\tilde{O}(n^{1/4})$ and $\tilde{O}(n^{1/3})$ rounds for $D=3$ and $4$ respectively, which almost close the long-standing complexity gap of MST construction in small-diameter graphs originally posed by Lotker et al. [Distributed Computing 2006]. (3) We show that bounding clique-width does not help the construction of good shortcuts by presenting a network topology of clique-width six where the construction of MST is as expensive as in the general case.

Conversely, any time-complexity lower bound for one of the problems stated above also applies to partwise aggregation and low-congestion shortcuts (with respect to quality plus construction time). In fact, the $\tilde{\Omega}(\sqrt{n} + D)$-round lower bound of shortcuts for general graphs is deduced from the lower bound of MST. On the other hand, efficient low-congestion shortcuts (efficient in the sense of breaking the general lower bound), together with algorithms constructing them, are known for several major graph classes [20,14,17,18,11,13].

Our Result
In this paper, we study the relationship between several major graph parameters and the quality of low-congestion shortcuts. Specifically, we focus on three parameters: (1) chordality, (2) diameter, and (3) clique-width.
Footnote 1: $\tilde{O}(\cdot)$ is a notation which ignores polylog(n) factors from $O(\cdot)$.
Footnote 2: The statement of the weighted single-source shortest path problem is slightly simplified. See [19] for the details.
The precise statement of our result is as follows:
• There is an $O(1)$-round algorithm which constructs a low-congestion shortcut with quality $O(kD)$ for any $k$-chordal graph. When $k = O(1)$, its quality matches the universal $\Omega(D)$ lower bound.
• For $k \le D$ and $kD \le \sqrt{n}$, there exists a $k$-chordal graph where the construction of MST requires $\Omega(kD)$ rounds. It implies that the quality plus construction time of our algorithm is optimal up to polylogarithmic factors.
• There exists an algorithm constructing a low-congestion shortcut with quality $\tilde{O}(n^{1/4})$ in $\tilde{O}(n^{1/4})$ rounds for any graph of diameter three. In addition, there exists an algorithm constructing a low-congestion shortcut with quality $\tilde{O}(n^{1/3})$ in $\tilde{O}(n^{1/3})$ rounds for any graph of diameter four. These results almost close the long-standing complexity gap of MST construction in graphs with small diameters, which was originally posed by Lotker et al. [24].
• We present a negative instance certifying that bounded clique-width does not help the construction of good-quality shortcuts. Precisely, we give an instance of clique-width six where the construction of MST is as expensive as in the general case, i.e., $\tilde{\Omega}(\sqrt{n} + D)$ rounds.

Table 1 summarizes the state-of-the-art upper and lower bounds for low-congestion shortcuts. It should be noted that each parameter considered in this paper is independent of the other parameters known to admit good shortcuts when bounded (e.g., treewidth and genus), and thus none of the results above is a corollary of past results. For proving our upper bounds, we propose a new scheme of shortcut construction, called 1-hop extension, where each node in a part simply takes all of its incident edges as shortcut edges of its own part. Surprisingly, this very simple construction admits an optimal shortcut for any $k$-chordal graph. For graphs of diameter three or four, our algorithm is obtained by combining the 1-hop extension scheme with another algorithm that finds short low-congestion paths (i.e., paths of length one or two) connecting two moderately large subgraphs. These algorithms are still simple, but bounding the quality of the constructed shortcuts is far from trivial. The analytic part includes several (seemingly) new ideas and may be of independent interest.

Table 1:
[11] | $\tilde{O}(D)$ [11] | $\Omega(D)$ [11]
Genus-$g$ | $\tilde{O}(\sqrt{g}D)$ [18] | $\tilde{O}(\sqrt{g}D)$ [18] | $\Omega(\sqrt{g}D)$ [18]
Treewidth-$k$ | $\tilde{O}(kD)$ [18] | -
$D = 3$ | $\tilde{O}(n^{1/4})$ (this paper) | $\tilde{O}(n^{1/4})$ (this paper) | $\Omega(n^{1/4})$ [30, 24]
$D = 4$ | $\tilde{O}(n^{1/3})$ (this paper) | $\tilde{O}(n^{1/3})$ (this paper)
* $\tau$ is the mixing time of the network graph $G$.

Related Work
The MST problem is one of the most fundamental problems in distributed graph algorithms. It is not only important by itself, but also has many applications for solving other distributed tasks (e.g., detecting connected components, minimum cut, and so on). Hence many studies have addressed the design of efficient MST algorithms in the CONGEST model [7,22,8,27,28,15,12,16,21]. The round-complexity lower bound of MST construction is also a central topic in distributed complexity theory [29,30,24,25,5,6]. The inherent difficulty of MST construction lies in solving the partwise aggregation (minimum) problem efficiently. This viewpoint was first identified explicitly by Ghaffari and Haeupler [11], together with an efficient algorithm for solving it in planar graphs. The concept of low-congestion shortcuts was invented there to encapsulate the difficulty of partwise aggregation. Recently, several follow-up papers have extended the applicability of low-congestion shortcuts, breaking the known general lower bounds of several fundamental graph problems in specific graph classes: this line includes bounded-genus graphs [11,17], bounded-treewidth graphs [17], graphs with excluded minors [20], expander graphs [13,14], and so on (see Table 1). The application of low-congestion shortcuts is not limited to MST. As stated in Theorem 1, it also admits efficient solutions for approximate minimum cut and single-source shortest path. A few recently proposed algorithms utilize low-congestion shortcuts as an important building block, e.g., depth-first search in planar graphs [19] and approximate treewidth (with decomposition) [23]. Haeupler et al. [16] show a message-reduction scheme for shortcut-based algorithms, which reduces the total number of messages exchanged by the algorithm to $\tilde{O}(m)$, where $m$ is the number of links. On the negative side, it is known that the hardness of (approximate) diameter cannot be encapsulated by low-congestion shortcuts.
Abboud et al. [1] show a hard-core family of unweighted graphs with $O(\log n)$ treewidth where any diameter computation in the CONGEST model requires $\tilde{\Omega}(n)$ rounds. Since any graph with $O(\log n)$ treewidth admits a low-congestion shortcut of quality $\tilde{O}(D)$, this result implies that it is not possible to compute the diameter of graphs efficiently by using only the property of low-congestion shortcuts.
While our results exhibit a tight upper bound for graphs of diameter three or four, a more general lower bound is known for small-diameter graphs [30]. For any $\log n \ge D \ge 3$, it is proved that there exists a network topology which incurs $\tilde{\Omega}(n^{(D-2)/(2D-2)})$-round time complexity for any MST algorithm. In the more restricted cases of $D = 1$ and $D = 2$, Jurdzinski et al. [21] and Lotker et al. [24] respectively show $O(1)$-round and $O(\log n)$-round MST algorithms.

Outline of the Paper
The paper is organized as follows: In Section 2, we introduce the formal definitions of the CONGEST model, partwise aggregation, and low-congestion shortcuts, and other miscellaneous terminologies and notations. In Section 3, we show the upper and lower bounds for shortcuts and MST in k-chordal graphs. In Section 4, we present our shortcut algorithms for graphs of diameter three or four. In Section 5, we prove the hardness result for bounded clique-width graphs. The paper is concluded in Section 6.

CONGEST model
Throughout this paper, we denote by $[a, b]$ the set of integers at least $a$ and at most $b$. A distributed system is represented by a simple undirected connected graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Let $n$ and $m$ be the numbers of nodes and edges respectively, and $D$ be the diameter of $G$. Each node has an ID from $\mathbb{N}$ (which is represented with $O(\log n)$ bits). In the CONGEST model, the computation proceeds in synchronous rounds. In one round, each node sends messages to its neighbors, receives messages from its neighbors, and executes local computation. It is guaranteed that every message sent in a round is delivered to its destination within the same round. Each link can transfer $O(\log n)$ bits of information (bidirectionally) per round, and each node can inject different messages into its incident links. Each node has no prior knowledge of the network topology except for its neighbors' IDs.

Given a graph $H$ for which the node and link sets are not explicitly specified, we denote them by $V_H$ and $E_H$ respectively. Let $N(v)$ be the set of nodes that are adjacent to $v$, and $N^+(v) = N(v) \cup \{v\}$. We define $N(S) = \bigcup_{s \in S} N(s)$ and $N^+(S) = \bigcup_{s \in S} N^+(s)$ for any $S \subseteq V$. For two node subsets $X, Y \subseteq V$, we also define $E(X, Y) = \{(u, v) \in E \mid u \in X, v \in Y\}$. The distance (i.e., the number of edges in the shortest path) between two nodes $u$ and $v$ in $G$ is denoted by $\mathrm{dist}_G(u, v)$. Let $S$ be a path in $G$. With a small abuse of notation, we often treat $S$ as the sequence of nodes or edges representing the path, as the set of nodes or edges in the path, or as the subgraph of $G$ forming the path.

Partwise Aggregation
Partwise aggregation is a communication abstraction defined over a set $\mathcal{P} = \{P_1, P_2, \ldots, P_N\}$ of mutually disjoint and connected subgraphs called parts, and provides simultaneous fast communication among the nodes in each $P_i$. It is formally defined as follows:

Definition 1 (Partwise Aggregation (PA)). Let $\mathcal{P} = \{P_1, P_2, \ldots, P_N\}$ be a set of connected mutually disjoint subgraphs of $G$, and suppose that each node $v \in V_{P_i}$ maintains a variable $b^i_v$ storing an input value $x^i_v \in X$. The output of the partwise aggregation problem is to assign $\bigoplus_{w \in V_{P_i}} x^i_w$ to $b^i_v$ for every $v \in V_{P_i}$, where $\oplus$ is an arbitrary associative and commutative binary operation over $X$.
The straightforward solution of the partwise aggregation problem is to perform the convergecast and broadcast in each part P i independently. Specifically, we construct a BFS tree for each part P i (after the selection of the root by any leader election algorithm). The time complexity is proportional to the diameter of each part P i , which can be large (Ω(n) in the worst case) independently of the diameter of G.
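The straightforward solution can be sketched as a centralized simulation. This is a minimal sketch rather than a message-level CONGEST implementation: the function name `partwise_aggregate`, the adjacency-dictionary input, and the use of the smallest ID as the root (standing in for leader election) are assumptions made for illustration, and the round count is simply taken as one convergecast plus one broadcast over the deepest BFS tree.

```python
from collections import deque


def partwise_aggregate(adj, parts, values, op):
    """Naive partwise aggregation: per part, build a BFS tree restricted to the
    part, then account for one convergecast and one broadcast over that tree.

    adj:    dict node -> iterable of neighbors (the whole graph G)
    parts:  list of node lists, each inducing a connected subgraph
    values: dict node -> input value
    op:     associative and commutative binary operation
    Returns (dict node -> aggregate of its part, number of rounds charged).
    """
    result = {}
    max_depth = 0
    for part in parts:
        members = set(part)
        root = min(members)  # assumption: smallest ID plays the elected leader
        depth = {root: 0}
        order = [root]
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                # The BFS never leaves the part, so its depth can far exceed D.
                if v in members and v not in depth:
                    depth[v] = depth[u] + 1
                    order.append(v)
                    queue.append(v)
        aggregate = values[order[0]]
        for v in order[1:]:
            aggregate = op(aggregate, values[v])
        for v in order:
            result[v] = aggregate
        max_depth = max(max_depth, max(depth.values()))
    return result, 2 * max_depth
```

On a long path that forms a single part, the charged round count grows linearly with the part's length, which is the behavior the shortcut framework is meant to avoid.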

(d, c)-Shortcut
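As stated in the introduction, the notion of low-congestion shortcuts was introduced for quickly solving the partwise aggregation problem (for some specific graph classes). The formal definition of $(d, c)$-shortcuts is given as follows.

Definition 2 (Ghaffari and Haeupler [11]). Given a graph $G = (V, E)$ and a partition $\mathcal{P} = \{P_1, P_2, \ldots, P_N\}$ of $G$ into node-disjoint and connected subgraphs, we define a $(d, c)$-shortcut of $G$ and $\mathcal{P}$ as a set of subgraphs $\mathcal{H} = \{H_1, H_2, \ldots, H_N\}$ of $G$ such that:
1. For each $i \in [1, N]$, the diameter of $P_i + H_i$ is at most $d$.
2. For each edge $e \in E$, the number of subgraphs $P_i + H_i$ containing $e$ is at most $c$.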
The values $d$ and $c$ of a $(d, c)$-shortcut $\mathcal{H}$ are called the dilation and congestion of $\mathcal{H}$, respectively. In general, a $(d, c)$-shortcut constructed in $f$ rounds yields a solution of the partwise aggregation problem in $\tilde{O}(d + c + f)$ rounds [11,10]. Since $d + c$ asymptotically determines the performance of the application, we call the value $d + c$ the quality of a $(d, c)$-shortcut. A low-congestion shortcut with quality $q$ is simply called a $q$-shortcut.
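To make the two quantities concrete, the following sketch measures the dilation and congestion of a given shortcut. It assumes the usual reading of the definition (dilation is the largest diameter of $P_i + H_i$, congestion is the largest number of subgraphs $P_i + H_i$ sharing an edge); the function names and the edge-list input format are illustrative choices, not part of the paper.

```python
from collections import Counter, deque


def diameter(nodes, edges):
    """Diameter of the undirected graph (nodes, edges); raises if disconnected."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) != len(adj):
            raise ValueError("P_i + H_i must be connected")
        best = max(best, max(dist.values()))
    return best


def shortcut_quality(graph_edges, parts, shortcuts):
    """Return (dilation, congestion) of shortcuts H_1..H_N for parts P_1..P_N.

    graph_edges: iterable of (u, v) edges of G
    parts:       list of node lists
    shortcuts:   list of edge lists, shortcuts[i] being H_i
    """
    graph_edges = {frozenset(e) for e in graph_edges}
    load = Counter()
    dilation = 0
    for part, h in zip(parts, shortcuts):
        members = set(part)
        part_edges = {e for e in graph_edges if e <= members}
        combined = part_edges | {frozenset(e) for e in h}  # edges of P_i + H_i
        nodes = members.union(*combined)
        dilation = max(dilation, diameter(nodes, [tuple(e) for e in combined]))
        load.update(combined)  # each P_i + H_i loads each of its edges once
    congestion = max(load.values(), default=0)
    return dilation, congestion
```

For a path of five nodes inside a 6-cycle, adding the two remaining cycle edges as the shortcut lowers the dilation from 4 to 3 at congestion 1.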

The framework of the Lower Bound
To prove the lower bound of MST, we introduce a simplified version of the framework by Das Sarma et al. [30]. In this framework, we consider the graph class $\mathcal{G}(n, b, l, c)$ defined below. A vertex set $X \subseteq V$ is called connected if the subgraph induced by $X$ is connected.
Definition 3. For $n, b, c \ge 0$ and $l \ge 3$, the graph class $\mathcal{G}(n, b, l, c)$ is defined as the set of $n$-vertex graphs $G = (V, E)$ satisfying the following conditions: Figure 1 shows the vertex partition $X$ and $Q$ defined for the hard-core instances presented in the original proof by Das Sarma et al. [30]. This graph belongs to $\mathcal{G}(O(lb), b, l, O(\log n))$. For the class $\mathcal{G}(n, b, l, c)$, the following theorem holds, which is just a corollary of the result by Das Sarma et al. [30].
Theorem 2 (Das Sarma et al. [30]). For any graph $G \in \mathcal{G}(n, b, l, c)$ and any MST algorithm $A$, there exists an edge-weight function $w_{A,G} : E \to \mathbb{N}$ such that the execution of $A$ in $G$ requires $\tilde{\Omega}(\min\{b/c, l/2 - 1\})$ rounds. This bound holds with high probability even if $A$ is a randomized algorithm.
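As a quick sanity check of how the bound is used, one can plug in the class of the Figure 1 instance, $\mathcal{G}(O(lb), b, l, O(\log n))$. The parameter choice $b = l = \Theta(\sqrt{n})$ below is an assumption made for illustration (it keeps the vertex count $O(lb)$ at $O(n)$) and is not taken from the text above.

```latex
% Illustrative substitution (assumption): b = l = \Theta(\sqrt{n}), c = O(\log n).
\min\left\{\frac{b}{c},\ \frac{l}{2}-1\right\}
  = \min\left\{\Omega\!\left(\frac{\sqrt{n}}{\log n}\right),\ \Theta(\sqrt{n})\right\}
  = \Omega\!\left(\frac{\sqrt{n}}{\log n}\right)
  = \tilde{\Omega}(\sqrt{n}).
```

Under this choice the first term dominates, so the congestion parameter $c$ is what costs the logarithmic factor.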

k-Chordal Graph
A graph $G$ is $k$-chordal if and only if every cycle of length larger than $k$ has a chord (equivalently, $G$ contains no induced cycle of length larger than $k$). In particular, 3-chordal graphs are simply called chordal graphs, which are known to be closely related to various intersection graph families such as interval graphs [9,26]. Since $k$-chordal graphs can contain a clique of arbitrary size for any $k \ge 3$, the class of $k$-chordal graphs is not contained in any minor-excluded graph class. Thus no known shortcut algorithm is guaranteed to work for $k$-chordal graphs. The main results of this section are the following two theorems:

Proof of Theorem 3
We provide the proof of Theorem 3. The construction algorithm is very simple. It follows the 1-hop extension scheme: each node $v \in V_{P_i}$ adds all of its incident edges to $H_i$. Obviously, this algorithm terminates in one round. Since each node belongs to one part, the congestion of each edge is at most two. Therefore, the technical challenge in proving Theorem 3 is to bound the dilation of the constructed shortcut by $O(kD)$. In other words, the following lemma trivially deduces Theorem 3.
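The 1-hop extension and its congestion bound can be illustrated with a short centralized sketch. The function names and the adjacency-dictionary input are assumptions for illustration; the sketch only checks the congestion claim and says nothing about dilation.

```python
from collections import Counter


def one_hop_extension(adj, parts):
    """H_i consists of every edge incident to some node of part P_i."""
    return [
        {frozenset((u, v)) for u in part for v in adj[u]}
        for part in parts
    ]


def max_congestion(shortcuts):
    """Largest number of shortcut subgraphs sharing a single edge."""
    load = Counter(e for h in shortcuts for e in h)
    return max(load.values(), default=0)
```

Each edge has two endpoints and each endpoint lies in at most one part, so an edge can be claimed by at most two parts, which is what `max_congestion` reports on any input with disjoint parts.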

Lemma 1. Letting
By symmetry, we only consider the case of $x = 0$. The case of $x > 0$ is proved similarly. Let $S = (t_0 = s_0, s_1, \ldots, s_\ell = t_1)$ be the sub-path of $A$, and $S' = (t_0 = s'_0, s'_1, \ldots, s'_{\ell'} = t_1)$ be the sub-path of $B$. Given a sequence $X$, we denote by $X[i, j]$ its consecutive subsequence from the $i$-th element to the $j$-th one.

We prove that for any $0 \le j \le \ell$, there exists a node $s_{c(j)} \in S$ such that $c(j) \ge j$, $\mathrm{dist}_{G_i}(t_0, s_{c(j)}) \le kj$, and $N^+(s_{c(j)}) \cap S' \neq \emptyset$ hold. The lemma is obtained by setting $j = \ell$, because then $s_{c(j)} = s_\ell = t_1$ holds. The proof is by induction on $j$.

(Basis) If $j = 0$, the claim holds for $s_{c(0)} = s_0$.

(Inductive step) Suppose as the induction hypothesis that there exists a node $s_{c(j)}$ satisfying $c(j) \ge j$ and $\mathrm{dist}_{G_i}(t_0, s_{c(j)}) \le kj$. If $c(j) > j$, then $s_{c(j+1)} = s_{c(j)}$ obviously satisfies the case of $j + 1$. Thus, it suffices to consider the case of $c(j) = j$. Let $s'_h$ be the neighbor of $s_{c(j)}$ in $S'$ maximizing $h$, and $e = (s_{c(j)}, s'_h)$. We consider the cycle $C$ consisting of $S[c(j), \ell]$, $S'[h, \ell']$, and $e$. If the length of $C$ is at most $k$, obviously we have $\ell' - h \le k - 1$. Since $\mathrm{dist}_{G_i}(t_0, s_{c(j)}) \le kj$ holds by the induction hypothesis, $s_{c(j+1)} = s_\ell$ satisfies the condition. If the length of $C$ is larger than $k$, $C$ has a chord, which connects a node in $S$ and a node in $S'$ because both $S$ and $S'$ are shortest paths. Let $e' = (s_y, s'_{y'})$ be such a chord making the cycle $C'$ consisting of $e$, $e'$, $S[s_{c(j)}, s_y]$, and $S'[s'_h, s'_{y'}]$ chordless (see Figure 2). Since $h$ is the maximum, we have $y > c(j)$, because if $y = c(j)$ the edge $e'$ ($\neq e$) would be taken as $e$. Due to the property of $k$-chordality, the length of $C'$ is at most $k$, and thus the length … Letting $c(j + 1) = y$, we obtain the proof for $j + 1$. The lemma holds.

Proof of Theorem 4
We first introduce the instance mentioned in Theorem 4. Since it has two additional parameters $x \ge 0$ and $N \ge 2$ as well as $k$, we refer to that instance as $G(k, x, N) = (V(k, x, N), E(k, x, N))$ in the following argument. The parameters $x$ and $N$ are adjusted later for obtaining the claimed lower bound. Let $K = k/2 - 1$ for short. The vertex set and edge set of $G(k, x, N)$ are defined as follows: Figure 3 illustrates the graph $G(k, x, N)$. Checking that this graph is $k$-chordal is cumbersome but straightforward. One can show the following lemma.

Figure 2: Proof of Lemma 1.
Proof. For simplicity, we give some of the vertices a name $v_{x,y}$ as follows; • We define subsets of vertices called rows and columns. The $i$-th row $R_i$ is defined as $R_i = \{v_{i,j} \mid 0 \le j \le x\}$, and the $i$-th column $C_i$ is defined as … First, we consider the diameter of $G(k, x, N)$. For $2 \le i \le N$ and $0 \le j \le xK$, we have $\min_{0 \le k}$ …

We consider a cycle $X$ in $G(k, x, N)$. Let $l$ and $r$ be the minimum and maximum indices of the rows $X$ intersects. Similarly, let $t$ and $b$ be the minimum and maximum indices of the columns $X$ intersects. Let $m$ be the index maximizing $|C_m \cap X|$, and let $a_m = |C_m \cap X|$ for short. Any cycle $X$ falls into one of the following four cases. We show that Lemma 2 holds in all the cases (Figure 4 almost states the proof).
1. The case of $r - l \ge 2$: By the construction of $G(k, x, N)$, the $l$-$r$ path intersects the $(l + 1)$-th column at least twice. Let $u$ and $v$ be two intersections of $X$ and the $(l + 1)$-th column. Since $C_{l+1}$ is a clique, $u$ and $v$ are adjacent. Thus the edge $(u, v)$ is a chord of $X$.
2. The case of $a_m \ge 3$ and $r - l \neq 0$: There exist two vertices in $C_m$ which are not adjacent in $X$. Since $C_m$ is a clique, there exists an edge between them, and this edge is a chord of $X$.
3. The case of $r - l = 0$: The vertices of the cycle $X$ form a clique in $G$ and the lemma holds obviously.
4. The case of $r - l = 1$ and $a_m = 2$: The cycle consists of four vertices $v_{t,l}$, $v_{t,r}$, $v_{b,l}$, $v_{b,r}$ and two paths, that is, the path connecting $v_{t,l}$ with $v_{t,r}$ and the path connecting $v_{b,l}$ with $v_{b,r}$. It follows dist … Thus the length of $X$ is at most $k$.
The lemma is proved.
The proof of Theorem 4 follows the framework by Das Sarma et al. [30]. It suffices to show the following lemma. Theorem 4 is obtained by combining this lemma with Theorem 2. Proof. We define $X$ and $Q$ for $G(k, D - K, N)$ as follows: It is easy to check that (C1) and (C2) are satisfied. Thus we only show that (C3) is satisfied. We have $E(R_i, N(R_i) \setminus R_{i-1})$ and $E(L_i, N(L_i) \setminus L_i)$ as follows:

Centralized Construction
In the following argument, we use the term "whp. (with high probability)" to mean that the event considered occurs with probability $1 - n^{-\omega(1)}$ (or equivalently $1 - e^{-\omega(\log n)}$). For simplicity of the proof, we treat any whp. event as if it necessarily occurs (i.e., with probability one). Since the analysis below handles only a polynomially bounded number of whp. events, the standard union-bound argument guarantees that all of them occur simultaneously whp. That is, any consequence yielded by the analysis also occurs whp.
Since the proof is constructive, we first present the algorithms for $D = 3$ and $D = 4$. They are described as a (unified) centralized algorithm, and the distributed implementation is explained later. We say that a part is large if its diameter is more than $12\kappa_D \log^3 n$, and let $N'$ be the number of large parts. Assume without loss of generality that $P_1, P_2, \ldots, P_{N'}$ are large. Since each part $P_i$ ($1 \le i \le N'$) contains at least $\kappa_D$ nodes, $N' \le n/\kappa_D$ holds obviously. The proposed algorithm constructs the shortcut edges $H_i$ for each large part $P_i$ following the procedure below:
1. Each node $v \in V_{P_i}$ adds its incident edges to $H_i$ (i.e., computes the 1-hop extension).

This step adopts two
We show that this algorithm provides a low-congestion shortcut of quality $\tilde{O}(\kappa_D)$. First, we look at the bound for congestion. Let $H_i^1$ be the set of the edges added to $H_i$ in the first step, and $H_i^2$ be those added in the second step. Since the congestion of the 1-hop extension is negligibly small, it suffices to consider the congestion incurred by step 2. Intuitively, the $\tilde{O}(\kappa_D)$ congestion is plausible because the expected congestion of each edge is $\tilde{O}(\kappa_D)$: since the total number of large parts is at most $n/\kappa_D$, the expected congestion of each edge incurred in step 2 is $(n/\kappa_D) \cdot (1/n^{1/2}) = O(n^{1/4})$ for $D = 3$, and $(n/\kappa_D) \sum_{y \in Y} (1/y) \cdot (1/|Y|) \le (n/\kappa_D) \cdot (\log n/|Y|) = \tilde{O}(n^{1/3})$ for $D = 4$.
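The $D = 3$ estimate can be checked by direct substitution. The sketch below assumes $\kappa_3 = n^{1/4}$ (the value used in the proof of Lemma 4) and the sampling probability $n^{-1/2}$; the concrete value $n = 2^{20}$ is only an illustrative choice.

```latex
\frac{n}{\kappa_3}\cdot\frac{1}{n^{1/2}}
  = n^{\,1-\frac{1}{4}-\frac{1}{2}} = n^{1/4},
\qquad \text{e.g. } n = 2^{20}:\quad
\frac{2^{20}}{2^{5}}\cdot 2^{-10} = 2^{5} = 32 = n^{1/4}.
```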

Lemma 4. The congestion of the constructed shortcut is $\tilde{O}(\kappa_D)$ whp.
Proof. It suffices to show that the congestion of any edge $e = (u, v) \in E$ is $\tilde{O}(\kappa_D)$ whp. For simplicity of the proof, we see an undirected edge $e = (u, v)$ as two directed edges $(u, v)$ and $(v, u)$, and distinguish the event of adding $(u, v)$ to shortcuts by $u$ from that by $v$. That is, the former is recognized as adding $(u, v)$, and the latter as adding $(v, u)$. Obviously, the asymptotic bound holding for the directed edge $(u, v)$ also holds for the corresponding undirected edge $(u, v)$ actually existing in $G$ (whose congestion is at most twice the directed bound). Since the first step of the algorithm increases the congestion of each directed edge by at most one, it suffices to show that the congestion incurred by the second step is at most $\tilde{O}(\kappa_D)$.
Let $X_i$ be the indicator random variable for the event $(u, v) \in H_i^2$, and $X = \sum_i X_i$. The goal of the proof is to show that $X = \tilde{O}(\kappa_D)$ holds whp. The cases of $D = 3$ and $D = 4$ are proved separately.

($D = 3$) Since at most $n/\kappa_3$ large parts exist, we have $E[X] \le (n/\kappa_3) \cdot (1/n^{1/2}) = n^{1/4} = \kappa_3$. A straightforward application of the Chernoff bound to $X$ allows us to bound the congestion of $e$ by at most $2\kappa_3$ with probability $1 - e^{-\Omega(n^{1/4})}$.

($D = 4$) Let $\mathcal{P}$ be the subset of all large parts $P_j$ such that $u \in N^+(P_j)$ holds. Consider an arbitrary partition of $\mathcal{P}$ into several groups with size at least $(n^{1/3} \log^3 n)/2$ and at most $n^{1/3} \log^3 n$. Let $q$ be the number of groups. Each group is identified by a number $\ell \in [1, q]$. We refer to the $\ell$-th group as $\mathcal{P}^\ell$. Fixing $\ell$, we bound the number of parts in $\mathcal{P}^\ell$ using $e = (u, v)$ as a shortcut edge. Let $Y_i$ be the value of $h(u, i)$. For $P_i \in \mathcal{P}^\ell$, the probability that $X_i = 1$ is $\Pr[X_i = 1] = \sum_{y \in Y} \Pr[Y_i = y] \cdot (1/y) = \mathrm{Har}(|Y|)/|Y|$, where $\mathrm{Har}(x)$ is the harmonic number of $x$, i.e., $\sum_{1 \le i \le x} i^{-1}$. Letting $X^\ell = \sum_{j \in \mathcal{P}^\ell} X_j$, we have $E[X^\ell] = (|\mathcal{P}^\ell| \, \mathrm{Har}(|Y|))/|Y|$. Since $\mathrm{Har}(x) \le \log x$, we have $(|\mathcal{P}^\ell| \log n)/|Y| \ge E[X^\ell] \ge |\mathcal{P}^\ell|/|Y| = (\log^4 n)/2$. Since the hash function $h$ is $(n^{1/3} \log^3 n)$-wise independent, it is easy to check that $X_1, X_2, \ldots, X_p$ are independent. We apply the Chernoff bound to $X^\ell$, and obtain $\Pr[X^\ell \le 2E[X^\ell]] \ge 1 - e^{-\Omega(E[X^\ell])} = 1 - e^{-\Omega(\log^4 n)}$. It implies that for any $\ell$, at most $2E[X^\ell]$ parts in $\mathcal{P}^\ell$ use $(u, v)$ as their shortcut edge. The total congestion of $(u, v)$ is obtained by summing up $2E[X^\ell]$ for all $\ell \in [1, q]$, which results in at most $\sum_{\ell} 2|\mathcal{P}^\ell| \log n/|Y| = 2|\mathcal{P}| \log n/|Y| = \tilde{O}(n^{1/3})$. The lemma is proved.
For bounding dilation, we first introduce several preliminary notions and terminologies. Given a graph $G = (V, E)$, a subset $S \subset V$ is called an $(\alpha, \beta)$-ruling set if it satisfies that (1) for any $u, v \in S$, $\mathrm{dist}_G(u, v) \ge \alpha$ holds, and (2) for any node $v \in V$, there exists $u \in S$ such that $\mathrm{dist}_G(v, u) \le \beta$ holds. It is known that there exists an $(\alpha, \alpha + 1)$-ruling set for any graph $G$ [2]. Let $\hat{P}_i = P_i + H_i^1$ for short. For the analysis of $P_i$'s dilation, we first consider an $(\alpha, \alpha + 1)$-ruling set of $\hat{P}_i$ for $\alpha = 12\kappa_D \log^3 n$, which is denoted by $S = \{s_0, s_1, \ldots, s_z\}$. Note that this ruling set is introduced only for the analysis, and the algorithm does not actually construct it.
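Although the ruling set is used only in the analysis, a small sketch may help to see why one always exists. The greedy procedure below is a standard sequential construction, not the distributed algorithm of [2]; the function names are illustrative. It yields an $(\alpha, \alpha - 1)$-ruling set, which in particular is an $(\alpha, \alpha + 1)$-ruling set.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def greedy_ruling_set(adj, alpha):
    """Scan the nodes in ID order and select a node whenever it is at distance
    at least alpha from every node selected so far."""
    nearest = {v: float("inf") for v in adj}  # distance to closest selected node
    ruling = []
    for v in sorted(adj):
        if nearest[v] >= alpha:
            ruling.append(v)
            for w, d in bfs_distances(adj, v).items():
                nearest[w] = min(nearest[w], d)
    return ruling
```

Every skipped node was within distance $\alpha - 1$ of an already selected node at the moment it was scanned, which gives the covering property, and the selection test itself gives the separation property.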
The key observation of the proof is that for any $s_j$ ($1 \le j \le z$), $H_i$ contains a path of length $\tilde{O}(\kappa_D)$ from $s_0$ to $s_j$ whp. It follows that any two nodes $u, v \in V_{\hat{P}_i}$ are connected by a path of length $\tilde{O}(\kappa_D)$ in $P_i + H_i$, because any node in $V_{\hat{P}_i}$ has at least one ruling-set node within distance $\alpha + 1$ in $P_i + H_i^1$. To prove the claim above, we further introduce the notion of terminal sets.
We can show that such a set always exists. Proof. The proof is constructive. Let $c = 6\kappa_D \log^3 n$ for short. We take an arbitrary shortest path $Q = (s_j = u_0, u_1, u_2, \ldots, u_c)$ of length $c$ in $P_i + H_i^1$ starting from $s_j \in S$. Since no two nodes in $N^+(V_{P_i}) \setminus V_{P_i}$ are adjacent in $P_i + H_i^1$, $Q$ contains no two consecutive nodes which are both in $N^+(V_{P_i}) \setminus V_{P_i}$. It implies that at least half of the nodes in $Q$ belong to $V_{P_i}$. Let $q = (u'_0, u'_1, \ldots, u'_{c'})$ be the subsequence of $Q$ consisting of the nodes in $V_{P_i}$. Then we define $T_j = \{u'_0, u'_3, \ldots, u'_{3\lfloor c'/3 \rfloor}\}$, which satisfies the three properties of terminal sets: It is easy to check that the first and second properties hold. In addition, one can show that $\mathrm{dist}_G(u'_x, u'_{x+a}) \ge 3$ (which is equivalent to $N^+(u'_x) \cap N^+(u'_{x+a}) = \emptyset$) holds for any $a \ge 3$ and $x \in [1, c' - a]$: Suppose for contradiction that $\mathrm{dist}_G(u'_x, u'_{x+a}) \le 2$ holds for some $a \ge 3$ and $x \in [1, c' - a]$. A distance of at most two between $u'_x$ and $u'_{x+a}$ implies $N^+(u'_x) \cap N^+(u'_{x+a}) \neq \emptyset$, and thus $\mathrm{dist}_{\hat{P}_i}(u'_x, u'_{x+a}) \le 2$ holds. Then, bypassing the subpath from $u'_x$ to $u'_{x+a}$ in $Q$ through this distance-two path, we obtain a path from $s_j$ to $u_c$ shorter than $Q$. This contradicts the fact that $Q$ is a shortest path.
The second property of terminal sets and the following lemma imply that $\mathrm{dist}_{P_i + H_i}(s_0, s_j) = O(\kappa_D)$ holds for any $j \in [0, z]$.

Lemma 6. Let $S = \{s_0, s_1, \ldots, s_z\}$ be any $(\alpha, \alpha + 1)$-ruling set of $\hat{P}_i$ for $\alpha = 14\kappa_D \log^3 n$, and $\mathcal{T} = \{T_0, T_1, \ldots, T_z\}$ be a terminal set associated with $S$. For any $j \in [0, z]$, there exist $u \in T_0$ and $v \in T_j$ such that $\mathrm{dist}_{P_i + H_i}(u, v) = O(1)$ holds.
Proof. Since the distance between $s_0$ and $s_j$ is at least $14\kappa_D \log^3 n$, we have $N^+(T_0) \cap N^+(T_j) = \emptyset$. The proof is divided into the cases of $D = 3$ and $D = 4$.

($D = 3$) By the conditions $N^+(T_0) \cap N^+(T_j) = \emptyset$ and $D = 3$, there exists a path of length exactly three from any node $a \in T_0$ to any node $b \in T_j$. Letting $e_{a,b}$ be the second edge of that path, we define $F = \{e_{a,b} \mid a \in T_0, b \in T_j\}$. By the third property of terminal sets and the fact that $N^+(T_0) \cap N^+(T_j) = \emptyset$, for any two edges $(x_1, y_1), (x_2, y_2) \in F$, either $x_1 \neq x_2$ or $y_1 \neq y_2$ holds. That is, $e_{a_1,b_1} \neq e_{a_2,b_2}$ holds for any two distinct pairs $(a_1, b_1), (a_2, b_2)$ with $a_1, a_2 \in T_0$ and $b_1, b_2 \in T_j$. By the second property of terminal sets, it implies $|F| = |T_0||T_j| \ge (\kappa_D \log^3 n)^2$. Since each edge in $F$ is added to $H_i^2$ with probability $1/n^{1/2} = 1/\kappa_D^2$, the probability that no edge in $F$ is added to $H_i^2$ is at most $(1 - 1/\kappa_D^2)^{(\kappa_D \log^3 n)^2} \le e^{-\Omega(\log^6 n)}$. That is, some edge $e_{a,b}$ is added to $H_i$ whp., and then $\mathrm{dist}_{P_i + H_i}(a, b) \le 3$ holds.

($D = 4$) For any nodes $u \in T_0$ and $v \in T_j$, there exists a path from $u$ to $v$ of length three or four in $G$. That path necessarily contains a length-two sub-path $P_2(u, v) = (a_{uv}, b_{uv}, c_{uv})$ such that $a_{uv} \in N^+(u)$ and $c_{uv} \in N^+(v)$ hold (if $P_2(u, v)$ is not uniquely determined, an arbitrary one is chosen). We call $(a_{uv}, b_{uv})$ and $(b_{uv}, c_{uv})$ the first and second edges of $P_2(u, v)$ respectively. Let $\mathcal{P}_2 = \{P_2(u, v) \mid u \in T_0, v \in T_j\}$, $G'$ be the union of $P_2(u, v)$ for all $u \in T_0$ and $v \in T_j$, and $\mathcal{P}_2^e = \{P_2(u, v) \in \mathcal{P}_2 \mid e \in P_2(u, v)\}$ for any $e \in E_{G'}$. We first bound the size of $\mathcal{P}_2^e$. Assume that $e$ is the first edge of some path in $\mathcal{P}_2^e$. Let $e = (a, b)$ and $u \in T_0$ be the (unique) node such that $a \in N^+(u)$ holds. Since at most $|T_j|$ paths in $\mathcal{P}_2$ can start from a node in $N^+(u)$, the number of paths in $\mathcal{P}_2$ using $e$ as their first edge is at most $|T_j|$. Similarly, if $e$ is the second edge of some path in $\mathcal{P}_2^e$, at most $|T_0|$ paths in $\mathcal{P}_2$ can contain $e$ as their second edge. While some edge may be used as both a first and a second edge, the total number of paths using $e$ is bounded by $|T_0| + |T_j| = 2\kappa_D \log^3 n$. It implies that any path $P_2(u, v)$ can share edges with at most $4\kappa_D \log^3 n$ paths, and thus $\mathcal{P}_2$ contains at least $|T_0||T_j|/(4\kappa_D \log^3 n + 1) \ge \kappa_D \log^3 n/5$ edge-disjoint paths. Let $\mathcal{P}'_2 \subseteq \mathcal{P}_2$ be a maximum-cardinality subset of $\mathcal{P}_2$ such that any two paths $P_2(u_1, v_1), P_2(u_2, v_2) \in \mathcal{P}'_2$ are edge-disjoint. We define $B = \{b \mid (a, b, c) \in \mathcal{P}'_2\}$. Let $\Delta(b)$ be the number of paths in $\mathcal{P}'_2$ containing $b \in B$ as the center. Due to the edge disjointness of $\mathcal{P}'_2$, we have … , and $X_b$ be the indicator random variable that takes one if a path in $\mathcal{P}'_2$ which contains $b$ as the center is added to $H_i$, and zero otherwise. Let $X$ and $Y$ be the indicator random variables corresponding to the events of

Consequently, we have Pr
The lemma is proved.

Distributed Implementation
We explain below the implementation details of the algorithm stated above in the CONGEST model.
• (Preprocessing) In the algorithm stated above, the shortcut construction is performed only for large parts, which is crucial to bound the congestion of each edge. Thus, as a preprocessing task, each node has to know whether its own part is large (i.e., has a diameter larger than $\kappa_D$) or not. While the exact identification of the diameter is usually a hard task, an asymptotic identification is sufficient for achieving the shortcut quality stated above: parts of diameter $\omega(\kappa_D)$ must be identified as large and parts of diameter $o(\kappa_D)$ as small, but those of diameter $\Theta(\kappa_D)$ may be identified arbitrarily. This loose identification is easily implemented by a simple distance-bounded aggregation. The algorithm for part $P_i$ is as follows: (1) in the first round, each node in $P_i$ sends its ID to all its neighbors, and (2) in the following rounds, each node forwards the minimum ID it has received so far. The algorithm executes this message propagation for $\kappa_D$ rounds. If the diameter is (substantially) larger than $\kappa_D$, the minimum ID in $P_i$ does not reach all the nodes in $P_i$. Then there exists an edge whose endpoints hold different minimum IDs. One more round of propagation allows those endpoints to learn that the part is large. They then broadcast the signal "large" during the following $\kappa_D$ rounds. If the part is large, the signal "large" is invoked at several nodes in $P_i$, and the $\kappa_D$-round propagation guarantees that every node receives the signal. That is, every node in $P_i$ identifies that $P_i$ is large. The running time of this task is $O(\kappa_D)$ rounds.
• (Step 1) As we stated, the 1-hop extension is implemented in one round. In this step, each node v ∈ V P i tells all the neighbors if P i is large or not. Consequently, if part P i is identified as a large one, all the nodes in N + (P i ) know it after this step.
• (Step 2) The algorithm for $D = 3$ is trivial. For $D = 4$, there are two non-trivial matters. The first one is the preparation of the hash function $h$. We realize it by sharing a random seed of $O(n^{1/3} \log^3 n \log |Y|)$-bit length in advance. A standard construction by Wegman and Carter [31] allows each node to construct the same desired $h$. Sharing the random seed is implemented by the broadcast of one $O(n^{1/3} \log^3 n \log |Y|)$-bit message, which takes $\tilde{O}(\kappa_D)$ rounds. The second matter is that $u$ does not know whether $P_i$ is large or not, and/or whether $v$ belongs to $N^+(P_i)$ or not. This makes it difficult for $u$ to determine whether $(u, v)$ should be added to $H_i$. Instead, our algorithm simulates the task of $u$ by the nodes in $N(u)$. More precisely, each node $v \in N^+(V_{P_i})$ adds each incident edge $(u, v)$ to $H_i$ with probability $1/h(u, i)$. Since $v \in N^+(P_i)$, $v$ knows whether $P_i$ is large or not (as informed in step 1), and can also compute $h(u, i)$ locally. Thus the choice of $(u, v)$ is locally decidable at $v$. Since this simulation is completely equivalent to the centralized version, the analysis of the quality also applies.
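The shared hash function of Step 2 can be sketched with the classical random-polynomial construction in the spirit of Wegman and Carter: a polynomial of degree $t - 1$ with random coefficients over a prime field is $t$-wise independent on that field. Everything concrete below is an assumption for illustration: the factory name `make_kwise_hash`, the Mersenne prime $2^{61} - 1$ as the field size, the encoding of the pair $(u, i)$ as a single integer, and the final reduction into $[1, |Y|]$ (which makes the output only approximately uniform).

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # field size; inputs must stay below this value


def make_kwise_hash(k, range_size, num_parts, seed):
    """Return h(u, i) built from a random polynomial of degree k - 1.

    All nodes calling this with the same seed obtain the same function,
    which mirrors sharing one random seed by broadcast.
    """
    rng = random.Random(seed)
    coeffs = [rng.randrange(MERSENNE_PRIME) for _ in range(k)]

    def h(u, i):
        x = u * num_parts + i  # injective encoding for 0 <= i < num_parts
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo the prime
            acc = (acc * x + c) % MERSENNE_PRIME
        return acc % range_size + 1  # value in [1, range_size]

    return h
```

A node $v$ would then keep an incident edge $(u, v)$ for part $P_i$ with probability $1/h(u, i)$, using its own local randomness for that final coin flip.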
It is easy to check that the construction time of the distributed implementation above is $\tilde{O}(\kappa_D)$ in total.
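The preprocessing step (bounded minimum-ID propagation followed by propagation of the "large" signal) can be simulated round by round as follows. This is a minimal sketch for a single part; the function name `detect_large_part` and the dictionary-based state are illustrative, and `kappa` stands for $\kappa_D$.

```python
def detect_large_part(adj, part, kappa):
    """Simulate the loose large/small test for one part.

    Returns dict node -> True if the node ends up believing its part is large.
    """
    members = set(part)
    min_id = {v: v for v in members}
    # Phase 1: kappa synchronous rounds of minimum-ID forwarding inside the part.
    for _ in range(kappa):
        updated = dict(min_id)
        for v in members:
            for w in adj[v]:
                if w in members and min_id[w] < updated[v]:
                    updated[v] = min_id[w]
        min_id = updated
    # One more round: endpoints of an edge compare the minimum IDs they hold.
    large = {
        v: any(w in members and min_id[w] != min_id[v] for w in adj[v])
        for v in members
    }
    # Phase 2: kappa rounds of propagating the "large" signal.
    for _ in range(kappa):
        updated = dict(large)
        for v in members:
            if any(w in members and large[w] for w in adj[v]):
                updated[v] = True
        large = updated
    return large
```

On a path whose length is well above `kappa`, every node ends with the "large" flag, while on a path of diameter at most `kappa` no node raises it.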

Low-Congestion Shortcut for Bounded Clique-width Graphs
Let $G = (V, E)$ be a graph. A $k$-graph ($k \ge 1$) is a graph whose vertices are labeled by integers in $[1, k]$. Clique-width was first introduced as a parameter capturing the tractability of an easy subclass of high-treewidth graphs [4,3]. That is, the class of graphs of bounded clique-width can contain many graphs with high treewidth. In centralized settings, one can often obtain polynomial-time algorithms for many NP-complete problems under the assumption of bounded clique-width. The following negative result, however, states that bounding clique-width does not admit any good solution for the MST problem (and thus also for low-congestion shortcuts).
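To illustrate why bounded clique-width does not imply bounded treewidth, the sketch below builds the complete graph $K_n$, whose treewidth is $n - 1$, using only two labels. It assumes the standard clique-width operations (introduce a labeled vertex, join two labels, relabel), which this section refers to but does not list. The function names are hypothetical, and disjoint union is folded into `introduce` for brevity.

```python
def introduce(labels, lab):
    """Add one new vertex carrying label `lab` and return its id."""
    v = len(labels)
    labels[v] = lab
    return v


def join(labels, edges, a, b):
    """Connect every vertex labeled `a` to every vertex labeled `b` (a != b)."""
    for u, lu in labels.items():
        for v, lv in labels.items():
            if lu == a and lv == b:
                edges.add(frozenset((u, v)))


def relabel(labels, a, b):
    """Change every label `a` into label `b`."""
    for v in labels:
        if labels[v] == a:
            labels[v] = b


def build_clique(n):
    """Build K_n with labels 1 and 2 only."""
    labels, edges = {}, set()
    introduce(labels, 1)
    for _ in range(n - 1):
        introduce(labels, 2)       # the newcomer gets the spare label
        join(labels, edges, 1, 2)  # connect it to all earlier vertices
        relabel(labels, 2, 1)      # merge it into the old label class
    return labels, edges
```

For example, `build_clique(6)` yields 6 vertices and 15 edges while never using a label other than 1 and 2.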

Theorem 6.
There exists an unweighted $n$-vertex graph $G = (V, E)$ of clique-width six such that for any MST algorithm $A$ there exists an edge-weight function $w_A : E \to \mathbb{N}$ for which the running time of $A$ is $\Omega(\sqrt{n} + D)$ rounds.
We introduce the instance stated in this theorem, denoted by $G(\Gamma, p)$ ($\Gamma$ and $p$ are parameters fixed later), using the operations specified in the definition of clique-width. That is, the construction itself serves as the proof that the clique-width is six. Let $\mathcal{G}(\Gamma)$ be the set of 6-graphs that contain one node with label 1, $\Gamma$ nodes with label 2, and $\Gamma$ nodes with label 3, and whose other nodes are all labeled 4. We define the binary operation $\oplus$ over $\mathcal{G}(\Gamma)$. For any $G, H \in \mathcal{G}(\Gamma)$, the graph $G \oplus H$ is obtained by the following operations: (1) relabel 2 in $G$ with 5 and relabel 3 in $H$ with 6, (2) take the disjoint union $G \cup H$, (3) join labels 5 and 6, (4) relabel 5 and 6 with 4, and then 1 with 5, (5) add a node with label 1 by the operation introduce, (6) join labels 1 and 5, and (7) relabel 5 with 4. This process is illustrated in Figure 5.

Now we are ready to define $G(\Gamma, p)$. The construction is recursive. First, we define $G(\Gamma, 1)$ as follows: (1) Prepare a $(2\Gamma)$-vertex biclique $K_{\Gamma,\Gamma}$ where one side has label 2 and the other side has label 3. Note that two labels suffice to construct $K_{\Gamma,\Gamma}$. (2) Add three nodes with labels 1, 5, and 6 by the operation introduce. (3) Join labels 2 and 5, and labels 3 and 6. (4) Join labels 1 and 5, and labels 1 and 6. (5) Relabel 5 and 6 with 4. Then, we define $G(\Gamma, p) = G(\Gamma, p - 1) \oplus G(\Gamma, p - 1)$.

The instance claimed in Theorem 6 is $G(\sqrt{n}, \log n / 2)$, which is illustrated in Figure 6. This instance is very close to the standard hard-core instance used in prior work (e.g., [29,30]; see Figure 1). Thus it is not difficult to see that the $\tilde{\Omega}(\sqrt{n})$-round lower bound for the MST construction also applies to $G(\sqrt{n}, \log n / 2)$. It suffices to show the following lemma. Combined with Theorem 2, we obtain Theorem 6.

Lemma 7. $G(\Gamma, p) \in \mathcal{G}(O(\Gamma(2^p + 2)), \Gamma, 2^p + 2, 3p)$.
Proof. First, let us formally specify the graph $G(\Gamma, p)$, which is defined as follows (the vertex IDs introduced below are shown in Figure 6): We define $\mathcal{X}$ and $Q$ for the graph $G(\Gamma, p)$ as follows: $\mathcal{X} = \{X_1, X_2, \ldots, X_{2^p+2}\}$ s.t.
It is easy to check that (C1) and (C2) are satisfied. Thus we only show that (C3) is satisfied. Let $V_{R_i} = R_i \cap \bigcup_{j=1}^{\Gamma} V_j$. For $2 \le i \le (2^p + 2)/2$, we have $N(V_{R_i}) \setminus R_{i-1} = \emptyset$. For any $\ell$ and $1 \le i \le 2^{p-2}$, if $u^p_i$ is included in $R_\ell$, then the neighbors of $u^p_i$ are included in $R_\ell$. For any $\ell$, $1 \le i \le p$, and $0 \le j \le 2^i - 2$, if $u^i_j$ is included in $R_\ell$, then $u^i_{j+1}$ is included in $R_\ell$. Let $u^i(R_\ell)$ be the leftmost vertex at level $i$ of $T$ that is included in $R_\ell$. For any $\ell$, $1 \le i \le p$, and $0 \le j \le 2^i - 1$, if $u^i_j \ne u^i(R_\ell)$ and $u^i_j$ is included in $R_\ell$, then the parent of $u^i_j$ is included in $R_\ell$. Thus $N(R_\ell) \setminus R_{\ell-1}$ only includes neighbors of $u^i(R_\ell)$ for $1 \le i \le p$ and $2 \le \ell \le (2^p + 2)/2$. Since the tree $T$ is a binary tree, $u^i(R_\ell)$ has at most 3 neighbors in $T$. Therefore we have $|E(N(R_i) \setminus R_{i-1})| \le 3p$. Similarly, we have $|E(N(L_i) \setminus L_{i-1})| \le 3p$. Hence the graph $G(\Gamma, p)$ is included in $\mathcal{G}(O(\Gamma(2^p + 2)), \Gamma, 2^p + 2, 3p)$. By Theorem 2, the lower bound for constructing an MST in $\mathcal{G}(O(\Gamma(2^p + 2)), \Gamma, 2^p + 2, 3p)$ is $\tilde{\Omega}(\min\{\Gamma/3p, (2^p + 2)/2 - 1\})$. When $\Gamma = \Theta(\sqrt{n})$ and $2^p = \Theta(\sqrt{n})$, we have $p = \Theta(\log n)$, so both terms are $\tilde{\Omega}(\sqrt{n})$ and we obtain the $\tilde{\Omega}(\sqrt{n})$ lower bound.
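The recursive definition of $G(\Gamma, p)$ given before Lemma 7 can be replayed mechanically, which makes it easy to confirm that six labels suffice and that the result stays in the class on which $\oplus$ is defined. The code below is a minimal sketch: a graph is represented as a pair (label dictionary, edge set), and all function names (`introduce`, `join`, `relabel`, `union`, `base`, `oplus`, `build`, `label_counts`) are hypothetical. The operations follow steps (1) to (5) of $G(\Gamma, 1)$ and steps (1) to (7) of $\oplus$ as written in the text.

```python
import itertools
from collections import Counter

_fresh = itertools.count()  # global supply of unused vertex ids


def introduce(g, lab):
    """Add one new vertex with label `lab`."""
    g[0][next(_fresh)] = lab


def join(g, a, b):
    """Connect every vertex labeled `a` to every vertex labeled `b`."""
    labels, edges = g
    side_a = [v for v, l in labels.items() if l == a]
    side_b = [v for v, l in labels.items() if l == b]
    edges.update(frozenset((u, v)) for u in side_a for v in side_b)


def relabel(g, a, b):
    """Change every label `a` into label `b`."""
    labels = g[0]
    for v in labels:
        if labels[v] == a:
            labels[v] = b


def union(g, h):
    """Disjoint union (vertex ids are globally unique, so no clash)."""
    return ({**g[0], **h[0]}, g[1] | h[1])


def base(gamma):
    """G(gamma, 1): biclique plus three extra nodes."""
    g = ({}, set())
    for _ in range(gamma):
        introduce(g, 2)
    for _ in range(gamma):
        introduce(g, 3)
    join(g, 2, 3)                      # (1) K_{gamma,gamma}
    for lab in (1, 5, 6):
        introduce(g, lab)              # (2)
    join(g, 2, 5); join(g, 3, 6)       # (3)
    join(g, 1, 5); join(g, 1, 6)       # (4)
    relabel(g, 5, 4); relabel(g, 6, 4) # (5)
    return g


def oplus(g, h):
    """G (+) H, following steps (1)-(7) of the text."""
    relabel(g, 2, 5); relabel(h, 3, 6)  # (1)
    u = union(g, h)                     # (2)
    join(u, 5, 6)                       # (3)
    relabel(u, 5, 4); relabel(u, 6, 4)  # (4)
    relabel(u, 1, 5)
    introduce(u, 1)                     # (5)
    join(u, 1, 5)                       # (6)
    relabel(u, 5, 4)                    # (7)
    return u


def build(gamma, p):
    """G(gamma, p) for p >= 1; each recursive call creates a fresh copy."""
    if p == 1:
        return base(gamma)
    return oplus(build(gamma, p - 1), build(gamma, p - 1))


def label_counts(g):
    return Counter(g[0].values())
```

Since $G(\Gamma, 1)$ has $2\Gamma + 3$ vertices and each application of $\oplus$ doubles the vertex count and adds one, $G(\Gamma, p)$ has $2^p(\Gamma + 2) - 1$ vertices. For instance, `build(3, 3)` has 39 vertices, of which one carries label 1, three carry label 2, three carry label 3, and the remaining 32 carry label 4.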