Faster Cut Sparsification of Weighted Graphs

A cut sparsifier is a reweighted subgraph that maintains the weights of the cuts of the original graph up to a multiplicative factor of (1 ± ǫ). This paper considers computing cut sparsifiers of weighted graphs of size O(n log(n)/ǫ^2). Our algorithm computes such a sparsifier in time O(m · min(α(n) log(m/n), log(n))), both for graphs with polynomially bounded and unbounded integer weights, where α(·) is the functional inverse of Ackermann's function. This improves upon the state of the art by Benczúr and Karger (SICOMP, 2015), which takes O(m log^2(n)) time. For unbounded weights, this directly gives the best known result for cut sparsification.
Together with preprocessing by an algorithm of Fung et al. (SICOMP, 2019), this also gives the best known result for polynomially-weighted graphs. Consequently, this implies the fastest approximate min-cut algorithm, both for graphs with polynomial and unbounded weights. In particular, we show that it is possible to adapt the state of the art algorithm of Fung et al. for unweighted graphs to weighted graphs, by letting the partial maximum spanning forest (MSF) packing take the place of the Nagamochi–Ibaraki forest packing. MSF packings have previously been used by Abraham et al. (FOCS, 2016) in the dynamic setting, and are defined as follows: an M-partial MSF packing of G is a set F = {F_1, . . . , F_M}, where F_i is a maximum spanning forest in G \ ∪_{j=1}^{i−1} F_j. Our method for computing (a sufficient estimation of) the MSF packing is the bottleneck in the running time of our sparsification algorithm.


Introduction
In many applications, graphs become increasingly large, hence storing and working with such graphs becomes a challenging problem. One strategy to deal with this issue is graph sparsification, where we model the graph by a sparse set of (reweighted) edges that preserve certain properties. Especially because the aim is to work with large input graphs, this process should be efficient with respect to the graph size. Among the different types of graph sparsifiers, there are spanners (preserving distances, see e.g. [PS89, ADD+93, BS07, EN16]), resistance sparsifiers (preserving effective resistances, see e.g. [DKW15]), cut sparsifiers (preserving cuts, see e.g. [BK96, BK15, FHHP19]), and spectral sparsifiers (preserving Laplacian quadratic forms, see e.g. [ST11, SS11, KX16, LS17]). This paper focuses on cut sparsifiers, as first introduced by Benczúr and Karger in [BK96]. We say that a (reweighted) subgraph H ⊆ G is a (1 ± ǫ)-cut sparsifier for a weighted graph G if for every cut C, the total weight w_H(C) of the edges of the cut in H is within a multiplicative factor of 1 ± ǫ of the total weight w_G(C) of the edges of the cut in G.
The main approach to compute cut sparsifiers uses the process of edge compression: each edge e ∈ E is part of the sparsifier with some probability p_e, and if selected obtains weight w(e)/p_e. It is immediate that such a scheme gives a sparsifier in expectation, but it has to be shown that the result is also a sparsifier with high probability. The main line of research has been to select good connectivity estimators λ_e for each edge such that sampling with p_e ∼ 1/λ_e yields a good sparsifier. The simplest such result is by Karger [Kar99], where we sample uniformly with each λ_e equal to the weight of the min cut. Continuing along these lines are parameters such as: edge connectivity [FHHP19], strong connectivity [BK96, BK15], electrical conductance [SS11], and Nagamochi-Ibaraki (NI) indices [NI92a, NI92b, FHHP19]. The challenge within the approach of edge compression is to find a connectivity estimator that results in a sparse graph, but can be computed fast.
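As a minimal illustration of the edge-compression idea, the following Python sketch keeps each edge with a probability inversely proportional to its connectivity estimate and reweights survivors so that every cut is preserved in expectation. The function name, the `scale` parameter, and the dictionary of estimates `lam` are our own illustrative choices, not notation from the paper.

```python
import random

def sparsify_by_compression(edges, lam, scale=1.0, rng=random.Random(0)):
    """Hypothetical sketch: keep each edge e = (u, v, w) with probability
    p_e = min(1, scale / lam[e]) and reweight kept edges by w / p_e,
    so that every cut weight is preserved in expectation."""
    sparsifier = []
    for (u, v, w) in edges:
        p = min(1.0, scale / lam[(u, v, w)])
        if rng.random() < p:
            sparsifier.append((u, v, w / p))  # boost weight to stay unbiased
    return sparsifier
```

The tension described above is visible even in this toy: larger estimates `lam[e]` give sparser output, but computing good estimates quickly is the hard part.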
For weighted graphs, there are roughly three regimes for sparsification. The first regime consists of cut sparsifiers of size O(n log^2(n)/ǫ^2). Fung, Hariharan, Harvey, and Panigrahi [FHHP11, FHHP19] show that sparsifiers of this asymptotic size can be computed in linear time for polynomially-weighted graphs. For this they introduce a general framework of cut sparsification with a connectivity estimator, see Section 2.1. For unbounded weights, Hariharan and Panigrahi [HP10] give an algorithm to compute a sparsifier of size O(n log^2(n)/ǫ^2) in time O(m log^2(n)/ǫ^2).
The second regime consists of cut sparsifiers of size O(n log(n)/ǫ^2). Benczúr and Karger [BK96, BK15] show that these can be computed in time O(m log^2(n)) for polynomially-weighted graphs, and in time O(m log^3(n)) for graphs with unbounded weights. Note that these results can be optimized by preprocessing with the algorithms for the first regime.
A third regime consists of sparsifiers of size O(n/ǫ^2). The known constructions in this regime yield spectral sparsifiers, which are more general than cut sparsifiers. Spectral sparsification was first introduced by Spielman and Teng in [ST11]. It considers subgraphs that preserve Laplacian quadratic forms. Lee and Sun [LS17] give an algorithm for finding (1 ± ǫ)-spectral sparsifiers of size O(n/ǫ^2) in time O(m · poly(log(n), 1/ǫ)). Analyzing their results, we believe that the poly-logarithmic factor contributes at least a factor of log^10(n). While this size is optimal, both for spectral sparsifiers [BSS12] and cut sparsifiers [ACK+16], the running time is not.
In this paper, we improve on the results in the second regime, both for graphs with polynomially bounded and unbounded weights. For an overview of the previous best running times and our results, see Figure 1. We present our sparsification algorithm in Section 4, with the special treatment of unbounded weights in Section 5. Our algorithm improves on the algorithm of Benczúr and Karger [BK96, BK15] for bounded weights, which has been unchallenged for the last 25 years. It also improves on the algorithm of [HP10] for unbounded weights, which has been unchallenged for the last 10 years. We obtain the following theorem, where α(·) refers to the functional inverse of Ackermann's function; for a definition see e.g. [Tar75]. For any realistic value x, we have α(x) ≤ 4. Theorem 1.1. There exists an algorithm that, given a weighted graph G and a freely chosen parameter ǫ ∈ (0, 1), computes a graph G_ǫ, which is a (1 ± ǫ)-cut sparsifier for G with high probability. The running time of the algorithm is O(m · min(α(n) log(m/n), log(n))) and the number of edges of G_ǫ is O(n log(n)/ǫ^2).
Using preprocessing with a result from [FHHP19] (see Theorem 2.5), we obtain the following corollary for polynomially-weighted graphs. Corollary 1.2. There exists an algorithm that, given a polynomially-weighted graph G and a freely chosen parameter ǫ ∈ (0, 1), computes a graph G_ǫ, which is a (1 ± ǫ)-cut sparsifier for G with high probability. The running time of the algorithm is O(m + n log^2(n)/ǫ^2 · α(n) log(log(n)/ǫ)) and the number of edges of G_ǫ is O(n log(n)/ǫ^2).
Following Benczúr and Karger [BK15], the computation of cut sparsifiers of graphs with fractional or even real weights can be reduced to integer weights. For the reduction see Appendix B. Thus our algorithm also gives a speedup for such graphs. Since the integer case is the essential one, we follow prior works and only formulate our results for this particular case.
As a direct application of the cut sparsifier, we can use Theorem 1.1 and Corollary 1.2 to replace m by n log(n)/ǫ^2 in the time complexity of algorithms solving cut problems, at the cost of a (1 ± ǫ)-approximation. We detail the effects for the minimum cut problem. Recently, Gawrychowski, Mozes, and Weimann [GMW20] showed that one can compute the minimum cut of a weighted graph in O(m log^2(n)) time. Using sparsification [BK15, FHHP19] for preprocessing, the state of the art for (1 + ǫ)-approximate min-cut is O(m + n log^4(n)/ǫ^2). When we use our new sparsification results, we obtain faster (1 + ǫ)-approximate min-cut algorithms when m = Ω(n log(n)/ǫ^2). Corollary 1.3. There exists an algorithm that, given a polynomially-weighted graph G and a freely chosen parameter ǫ ∈ (0, 1), with high probability computes a (1 + ǫ)-approximation of the minimum cut in time O(m + n log^3(n)/ǫ^2).
For unweighted graphs, even faster minimum cut algorithms exist: Ghaffari, Nowicki, and Thorup [GNT20] show that we can find the minimum cut in O(min{m + n log^3(n), m log(n)}) time. Combining this with the linear-time cut sparsifier of Fung et al. [FHHP19], we get a (1 + ǫ)-approximate minimum cut in unweighted graphs in O(m + n log(n) · min{1/ǫ + log^2(n), log(n)/ǫ}) time.
The remainder of this article is organized as follows. The rest of the introduction consists of a technical overview of our algorithms. Section 2 contains a review of the general sparsification framework from [FHHP19] tailored to our needs, and can be skipped by readers that are already familiar with this work. We present our algorithm to compute the MSF indices in Section 3. This is used as a black box in our algorithm, which is presented and analyzed in Section 4. In Section 5, we show how the results of Section 4 generalize to graphs with unbounded weights.

Technical Overview
The high-level set-up of our sparsification algorithm is similar to the algorithm for unweighted graphs of Fung et al. [FHHP19]. Our main contribution consists of showing how to generalize this technique to weighted graphs, by using maximum spanning forest (MSF) indices instead of Nagamochi-Ibaraki (NI) indices. On a less significant note, we prove that by a tightening of the analysis one can show that the size and time bounds hold with high probability, and not only in expectation.
NI indices are defined by means of an NI forest packing: view a graph with integer weights as an unweighted multigraph, and repeatedly compute a spanning forest. The NI index of an edge is the (last) forest in which it appears (for details see Definition 2.4). The MSF index is also defined by a forest packing, but in this case the MSF packing: we say F = {F_1, . . . , F_M} is an M-partial maximum spanning forest packing of G if for all i = 1, . . . , M, F_i is a maximum spanning forest in G \ ∪_{j=1}^{i−1} F_j. Now, we say that an edge e has MSF index i (w.r.t. some (partial) MSF packing F) if e appears in the i-th forest F_i of the (partial) MSF packing F. The MSF index has been used previously in the context of dynamic graph sparsifiers (see Abraham et al. [ADK+16]). However, there it was only used because it rendered a faster running time; using NI indices in the corresponding static construction would have been possible as well.
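The definition above can be instantiated directly, if inefficiently: the sketch below builds an M-partial MSF packing by running Kruskal's algorithm on descending weights M times, peeling off one maximum spanning forest per round. This naive version is for illustration of the definition only; the efficient construction is the subject of Section 3.

```python
class DSU:
    """Minimal union-find, enough for one Kruskal pass."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # would close a cycle
        self.p[rb] = ra
        return True

def partial_msf_packing(n, edges, M):
    """Naive M-partial MSF packing, straight from the definition:
    F_i is a maximum spanning forest of G minus F_1, ..., F_{i-1}."""
    remaining = sorted(edges, key=lambda e: -e[2])  # heaviest first
    packing = []
    for _ in range(M):
        dsu, forest, rest = DSU(n), [], []
        for e in remaining:  # Kruskal on the remaining edges
            (forest if dsu.union(e[0], e[1]) else rest).append(e)
        packing.append(forest)
        remaining = rest
    return packing
```

The MSF index of an edge is then simply the round in which it was peeled off.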
In this paper, we use distinctive properties of the MSF index, and the NI index would not suffice. We show that using the MSF index, we can generalize the sparsification algorithm for unweighted graphs to an algorithm for weighted graphs, thereby demonstrating that the MSF index is a natural analogue of the NI index in the weighted setting. We provide an algorithm to compute an M-partial MSF packing in time O(m · min(α(n) log(M), log(n))) for polynomially-weighted graphs. We show that for unbounded weights we can compute a sufficient estimation, also in time O(m · min(α(n) log(M), log(n))).
An important distinction between the unweighted algorithm of Fung et al. and our weighted algorithm is that the use of contractions to keep running times low throughout the algorithm is no longer possible: edges of different weights have to be treated differently, hence cannot be contracted. By using multiple iterations with an exponentially decreasing precision parameter we can overcome this problem.
In the case of a polynomially-weighted input graph, the algorithm consists of two main phases. In the first phase, we compute sets F 0 , F 1 , . . . , F Γ ⊆ E, where edges satisfy some lower bound on the weight of any cut separating their endpoints. In the second phase, we sample edges from each set F i with a corresponding probability.
We set a parameter ρ = Θ(ln(n)/ǫ^2) and start by computing a 2ρ-partial maximum spanning forest packing for G. We define F_0 to be the union of these 2ρ forests. We add the edges of F_0 to G_ǫ, which will become our sparsifier. We sample each of the remaining edges E \ F_0 with probability 1/2 to construct X_1. To counterbalance the sampling, we boost the weight of each sampled edge by a factor 2. Now we continue along these lines, but in each iteration we let F_i consist of an exponentially growing number of spanning forests: F_i is defined as the union of the forests in a (2^{i+1} · ρ)-partial MSF packing of X_i. Then, X_{i+1} is sampled from the remaining edges X_i \ F_i, where again each edge is included with probability 1/2. We continue this process until there are sufficiently few edges left in X_{i+1}. We add these remaining edges to G_ǫ.
The second phase of the algorithm is to sample edges from the sets F_i and add these sampled edges to G_ǫ. Hereto, note that an edge e of F_i (for i ≥ 1) was not part of F_{i−1}, meaning it was not part of any spanning forest in a (2^i · ρ)-partial MSF packing of X_{i−1}. This implies that for an edge e ∈ F_i the weight of any cut C in X_{i−1} containing e is at least 2^i · ρ · w(e). Now we use the general framework for cut sparsification of Fung et al. [FHHP19], which boils down to the fact that this guarantee on the weights of cuts implies that we can sample edges from F_i with probability proportional to 1/(2^i w(e)). We show that this results in a sufficiently sparse graph.
Intuitively, it might seem redundant to sample edges from X_i \ F_i to form X_{i+1}. Indeed, this is not necessary to guarantee that the resulting graph is a sparsifier. However, it ensures that the number of iterations is limited, which leads to better bounds on the size of the sparsifier and the running time. Since we sample edges with probability 1/2 in each phase, we need to repeat the sampling O(log(m/m_0)) times to get the size of X_i down to O(m_0). As this number of steps depends on the initial number of edges m, we get better bounds for size and running time if m is already small. We exploit this by preprocessing the graph with an algorithm from [FHHP19] that gives a cut sparsifier of size O(n log^2(n)/ǫ^2) in linear time. Moreover, we can show that repeatedly calling our algorithm has no worse asymptotic time bound than calling it once, since the input graph becomes sparser very quickly. By doing so, we obtain a sparsifier of size O(n log(n)/ǫ^2).
Since we only use that the MSF index gives a guaranteed lower bound on the connectivity of an edge, one might wonder why the NI index does not work here. After all, the NI indices of a graph can be computed in linear time, which would result in a significant speed-up. However, when computing the NI index, the weight of an edge influences the number of forests necessary, while computing the MSF index only requires the comparison of weights. Moreover, the number of forests in an MSF packing is always bounded by n. We can use this to bound the number of edges in the created sparsifier. The same technique with NI indices would make the size of the sparsifier depend on the maximum weight in the original graph.
To show that the algorithm outputs a cut sparsifier, it needs to be proven that both the sampling in the first and the second phase preserve cuts. We follow the lines of the analysis of [FHHP19], which makes use of cut projections and Chernoff bounds. We show that by partitioning the edge sets according to their weight this method extends to weighted graphs.
One part of the algorithm has remained unaddressed: the computation of the maximum spanning forests. The approach we use here is related to Kruskal's algorithm for computing minimum spanning trees [Kru56]. We start by sketching the M-partial MSF packing algorithm for polynomial weights. We sort the edges according to their weights using radix sort in O(m) time. We create M empty forests on n vertices. Starting with the heaviest edge, we add each edge e to the first forest in which it does not create a cycle. We can find this forest using a binary search in O(log(M)) steps. By using a disjoint-set forest representation for the union-find data structures necessary to carry out these steps, we achieve a total time of O(mα(n) log(M)).
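The sketch above relies on connectivity being nested across the forests (if u and v are connected in F_i, they are also connected in F_{i−1}), which is what makes the binary search valid. A minimal Python rendering follows; Python's built-in sort stands in for radix sort, and union by rank with path halving stands in for the disjoint-set forest representation.

```python
class DSU:
    """Disjoint-set forest with union by rank and path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

def msf_indices(n, edges, M):
    """MSF indices w.r.t. an M-partial MSF packing; edges are distinct
    (u, v, w) tuples. Returns an index in 1..M per edge, or None if the
    edge lies outside the first M forests."""
    forests = [DSU(n) for _ in range(M)]
    index = {}
    for u, v, w in sorted(edges, key=lambda e: -e[2]):  # heaviest first
        # Nested connectivity: binary-search for the first forest in
        # which u and v are still separated.
        lo, hi = 0, M
        while lo < hi:
            mid = (lo + hi) // 2
            if forests[mid].find(u) != forests[mid].find(v):
                hi = mid
            else:
                lo = mid + 1
        if lo < M:
            forests[lo].union(u, v)
            index[(u, v, w)] = lo + 1
        else:
            index[(u, v, w)] = None
    return index
```

Each edge triggers one binary search of O(log M) find-pairs plus one union, matching the O(mα(n) log(M)) bound (after sorting).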
When working with unbounded weights, the bottleneck is the initial sorting of the edges. Radix sort is not guaranteed to be efficient for unbounded weights. Instead, we could use a comparison-based algorithm, such as merge sort, which takes time O(m log(n)). By employing a different data structure than before, we can then guarantee a total running time of O(m log(n)). However, we do not need the exact MSF indices for our sampling procedure; an estimate suffices. We can apply a 'windowing' technique from [BK15] to split the graph into subgraphs, where we can rescale the weights to polynomial weights and apply our previously mentioned algorithm. We then achieve a total running time of O(mα(n) log(M)), as before. For more details on this, we refer to Section 3.1. So in total we have running time O(m · min(α(n) log(m/n), log(n))).

Notation and Review
Throughout this paper, we consider G = (V, E) to be an undirected, integer-weighted graph on |V| = n vertices with |E| = m edges. We define a set of edges C ⊆ E to be a cut if there exists a partition of the vertices V into two non-empty subsets A and B, such that C consists of all edges with one endpoint in A and the other endpoint in B. The weight of the cut is the sum of the weights of the edges of the cut: w_G(C) = Σ_{e∈C} w_G(e). The minimum cut is defined as the cut with minimum weight. We say that a (reweighted) subgraph H ⊆ G is a (1 ± ǫ)-cut sparsifier for a weighted graph G if for every cut C, its weight w_H(C) in H is within a multiplicative factor of 1 ± ǫ of its weight w_G(C) in G. A key concept in the realm of cut sparsification is the connectivity of an edge.
Definition 2.1. Let G = (V, E) be a graph, possibly weighted. We define the connectivity of an edge e = (u, v) ∈ E to be the minimal weight of any cut separating u and v. We say that e is k-heavy if it has connectivity at least k. For a cut C, we define the k-projection of C to be the k-heavy edges of the cut C.
The following theorem from [FHHP19] bounds the number of distinct k-projections of a graph; it is a generalization of a preceding theorem by Karger, see [Kar93, KS96]. This result is useful when showing that cuts are preserved by a sampling scheme: while there may be exponentially many different cuts, the theorem shows that there are only polynomially many cut projections. Hence if one can reduce a claim for cuts to their k-projections, a high-probability bound can be obtained through the application of a Chernoff bound.
Theorem 2.2. For any k ≥ λ and any η ≥ 1, the number of distinct k-projections in cuts of weight at most ηk in a graph G is at most n^{2η}, where λ is the weight of a minimum cut in G.
Throughout this paper, we say a statement holds with high probability (w.h.p.) if it holds with probability at least 1 − n^{−c}, for some constant c > 0. This constant can be modified by adjusting the constants hidden in asymptotic notation.

A General Framework for Cut Sparsification
We review the general framework for cut sparsification as presented in [FHHP19]. This section does not contain new results, and can be skipped by readers that are only interested in our contribution.
The framework shows that edges can be sampled using different notions of connectivity estimators. Although this scheme provides one proof for the validity of multiple parameters, it might be worth noting that an analysis tailored to the used connectivity estimator might provide a better result. For example, when the framework is applied with 'edge strengths', it produces a sparsifier of size O(n log 2 (n)/ǫ 2 ), a log(n) factor denser than the edge strength-based sparsifier of Benczúr and Karger [BK15].
Let G = (V, E) be a graph with integer weights, and let ǫ ∈ (0, 1), c ≥ 1 be parameters. Suppose we are given a parameter γ (possibly depending on n) and an integer-valued parameter λ_e for each e ∈ E. We obtain G_ǫ from G by independently compressing each edge e with parameter p_e = min(1, 16(c + 7)γ ln(n)/(0.38 λ_e ǫ^2)).
Compressing an edge e with weight w(e) consists of sampling r_e from a binomial distribution with parameters w(e) and p_e. If r_e > 0, we include the edge in G_ǫ with weight r_e/p_e. In [FHHP19], a sufficient condition on the parameters γ and λ_e is given such that G_ǫ is a (1 ± ǫ)-cut sparsifier for G with probability at least 1 − 4/n^c. Hereto the edges are partitioned according to their value λ_e, and the parameters are required to satisfy two conditions, called Π-connectivity and γ-overlap. The following theorem shows that compressing with parameters adhering to these conditions gives a cut sparsifier with high probability. Theorem 2.3 ([FHHP19]). If there exists a partition Π such that G satisfies Π-connectivity and γ-overlap, then G_ǫ is a (1 ± ǫ)-cut sparsifier for G, with probability at least 1 − 4/n^c, where G_ǫ is obtained by edge compression using parameters γ and the λ_e's.
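A hedged sketch of the compression primitive, with the constants taken verbatim from the formula above. The binomial sample is drawn naively, trial by trial, which is fine for illustration but not for large w(e); the function names are ours, not the paper's.

```python
import math
import random

def compression_prob(gamma, lam_e, eps, n, c=1):
    # p_e = min(1, 16(c + 7) * gamma * ln(n) / (0.38 * lam_e * eps^2))
    return min(1.0, 16 * (c + 7) * gamma * math.log(n) / (0.38 * lam_e * eps ** 2))

def compress_edge(w_e, p_e, rng=random.random):
    # r_e ~ Binomial(w(e), p_e); keep the edge with weight r_e / p_e if
    # r_e > 0, so the kept edge's expected weight is exactly w(e)
    r = sum(rng() < p_e for _ in range(w_e))
    return r / p_e if r > 0 else None
```

Note that edges with small λ_e (weakly connected, hence important for some cut) get p_e = 1 and are kept deterministically.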

A First Application of the Framework
In this section, we review the application of the framework from the previous section with Nagamochi-Ibaraki (NI) indices as parameters, as presented in [FHHP19]. As the name suggests, NI indices were first introduced by Nagamochi and Ibaraki [NI92a,NI92b]. The algorithm they provide gives a graph partitioning into forests, and subsequently a corresponding index for each edge, called the NI index.
If G is a weighted graph, each edge e is contained in w(e) contiguous forests. We define the NI index of e, denoted by l_e, to be the index of the last forest in which e appears.
Nagamochi and Ibaraki show that the NI indices can be computed in linear time for unweighted graphs and in O(m + n log(n)) time for weighted graphs, see [NI92b,NI92a]. As is shown in [FHHP19], we can use the NI index as the connectivity estimator in the sparsification framework to obtain the following result.
The sampling itself takes at most O(m) time, as explained in Section 4.4. As the NI indices can be computed in O(m + n log(n)) time, this implies that the total running time is O(m + n log(n)). As a graph with m ≤ n log(n) is already sparse, we can assume m > n log(n). Thus, for our purposes, the total running time is simply O(m).
Next we provide a bound on the number of edges in the sparsifier G_ǫ. [FHHP19] proves this same bound in expectation; we provide a proof that the bound holds with high probability. Lemma 2.6. With high probability, the size of the graph G_ǫ in Theorem 2.5 is O(n log^2(n)/ǫ^2).
Proof. Fix a vertex v. For each neighbor u of v in G, we compress the edge e = (u, v) with parameter p_e = min(1, 224 ln(n)/(0.38 ǫ^2 l_e)), where l_e is the NI index of e. For each edge, the probability that it remains after compression is at most min(1, w(e) p_e). Let Y_e be the random variable that is 1 if e remains, and 0 otherwise. We note that E[Σ_{e: v∈e} Y_e] ≤ (224/0.38) ln^2(n)/ǫ^2. Now we apply a Chernoff bound (Theorem A.2) to obtain that the degree of v in G_ǫ is O(log^2(n)/ǫ^2) with high probability, and a union bound over all vertices gives the desired result. Consequently, we obtain that with high probability the number of edges of the sparsifier is at most O(n log^2(n)/ǫ^2).
The state of the art for polynomially-weighted graphs is achieved by postprocessing this result with the algorithm by Benczúr and Karger [BK15]. Thus our improvement on [BK15] leads to an overall improved result.

The Computational Model
If we have an input graph G = (V, E) with weights w : E → {1, . . . , W }, we assume our computational model has word size Θ(log(W ) + log(n)). Note that for polynomial weights, this comes down to a word size of Θ(log(n)). Moreover, we assume that basic operations on such words have uniform cost, i.e., they can be performed in constant time. In particular, these basic operations are addition, multiplication, inversion, logarithm, and sampling a random bit string of word size precision. Such assumptions are in line with previous work [BK15,FHHP19], where they are made implicitly.

A Maximum Spanning Forest Packing
An important primitive in our algorithm is the use of the maximum spanning forest (MSF) index. The concept is similar to the Nagamochi-Ibaraki index; the important difference is that an edge e with weight w(e) appears in w(e) different NI forests. This means that the number of NI forests depends on the numerical values of the edge weights, and thus can grow far beyond O(n). On the other hand, the number of maximum spanning forests in an MSF packing is bounded by the maximum degree in the graph, hence also by n. While this already has noteworthy implications for polynomially-weighted graphs, it is even more significant for superpolynomially-weighted graphs. We believe that this property might make MSF packings suitable for applications other than those presented here.
Note that we do not demand the F_i ∈ F to be non-empty, as this suits notation best in our applications. Also note that a (partial) MSF packing is fully determined by the MSF indices.
The following theorem states that computing the MSF indices up to M takes O(mα(n) log(M)) time for polynomially-weighted graphs. Theorem 3.2. Let G be a graph with integer weights bounded by n^c for some constant c. Then an M-partial MSF packing of G, and the corresponding MSF indices, can be computed in time O(m(α(n) log(M) + c)). Proof. The outline of the algorithm is as follows.
1. Sort the edges by weight in descending order using radix sort in base n.
2. Initialize M union-find data structures, one for each of the forests F_1, . . . , F_M.
3. For each edge e = (u, v), in descending order of weight: (a) find, by binary search, the smallest index i ≤ M such that u and v are not connected in F_i; (b) if such an index exists, add e to F_i.
We need at most M trees, since we only compute an M-partial MSF packing. By using radix sort, the initial sorting takes O(cm) time (for a time bound of radix sort, see e.g. [CLRS09]). We show that the remainder of the algorithm can be executed in O(mα(n) log(M)) time.
For every 1 ≤ i ≤ M we maintain the non-singleton components of F_i with a union-find data structure (supporting the three operations MakeSet_i, Union_i, and FindSet_i). To be precise, we use the disjoint-set forest representation of Tarjan [Tar75] (see e.g. [CLRS09, Chapter 21]). Additionally, for every node v ∈ V we maintain s(v), the smallest index i such that {v} is a singleton component of F_i.
For the binary search in Step 3a it is sufficient to first search over indices i < min{s(u), s(v)}. If this search is successful and we find such an index i < min{s(u), s(v)}, then we perform Union_i(u, v). Otherwise, we have learned that j := min{s(u), s(v)} is the smallest index such that u and v are not connected in F_j. The algorithm then proceeds as follows:
• If j = s(u), then we perform MakeSet_j(u) and increase s(u) by one.
• If j = s(v) (which could also be the case in addition to j = s(u)), we perform MakeSet_j(v) and increase s(v) by one.
• Finally, we perform Union_j(u, v).
Now let ϕ_i, χ_i, and ψ_i denote the number of MakeSet_i-, Union_i-, and FindSet_i-operations in the i-th union-find data structure, respectively. Since we use the disjoint-set forest representation, the operations in the i-th data structure take O((ϕ_i + χ_i + ψ_i) · α(n)) time in total (see [CLRS09, Theorem 21.14]). To obtain a bound on the total running time for all operations in the union-find data structures, we sum over all i. Observe that in total we perform at most two MakeSet-operations per edge (one for each of its endpoints), and thus Σ_{i=1}^{M} ϕ_i ≤ 2m. The number of Union-operations is bounded above by the number of MakeSet-operations. Each edge triggers a binary search of O(log(M)) steps, each of which performs a constant number of FindSet-operations, so Σ_{i=1}^{M} ψ_i = O(m log(M)). We therefore arrive at a total running time of O(cm + mα(n) log(M)) = O(m(α(n) log(M) + c)).
If we want to compute the full maximum spanning forest packing, it suffices to set M to be the maximum degree in the graph. When M is large, managing the data structures slightly differently yields a better result. Theorem 3.3. Given a graph G with arbitrary integer weights, an M-partial MSF packing of G, and the corresponding MSF indices, can be computed in time O(m log(n) + m log(M)). Proof. We use the same algorithm as in Theorem 3.2 with two simple changes. In Step 1, we use an optimal comparison-based sorting algorithm, such as merge sort, instead of radix sort. This takes time O(m log(n)). In Steps 2 and 3, we use a linked-list representation [CLRS09, Chapter 21] instead of the disjoint-set forest representation. To analyze the running time, recall that ϕ_i, χ_i, and ψ_i denote the number of MakeSet_i-, Union_i-, and FindSet_i-operations in the i-th union-find data structure, respectively. By [CLRS09, Theorem 21.1], the operations in the i-th data structure take O(χ_i + ψ_i + ϕ_i log(ϕ_i)) time in total. We sum over all i and, as before, use Σϕ_i ≤ 2m, Σχ_i ≤ 2m, and Σψ_i = O(m log(M)), together with ϕ_i ≤ 2n, to obtain a total running time of O(m log(n) + m log(M)). Note that if we do not have parallel edges, then M ≤ n, so the running time simplifies to O(m log(n)). Also note that the weights no longer need to be bounded for this result. In the next section, we consider an algorithm for sparse graphs with unbounded weights.

An Estimation for Unbounded Weights
For our purposes we do not need the exact MSF indices; an estimate suffices. The MSF index guarantees that if an edge e = (u, v) ∈ E has MSF index f_e, then there are at least f_e paths from u to v, where every edge on such a path has weight at least w(e). We relax this to the guarantee that if an edge e = (u, v) ∈ E has estimated MSF index f̂_e, then there are at least f̂_e paths from u to v, where every edge on such a path has weight at least (1 − 1/n)w(e). When we only compute estimates, we can do this faster than when we compute exact indices. The following lemma is inspired by the windowing technique of Benczúr and Karger [BK15], which shows that strong connectivities can be computed efficiently for graphs with unbounded weights by 'windowing' these weights. This means we divide the graph into subgraphs according to an estimate and compute the sought connectivity estimators in these subgraphs. Hereto, we first compute a single maximum spanning forest F for G. Now we define d(e) to be the minimum weight among the edges on the path from u to v in F, where e = (u, v). This can be done in total time O(m + n), see [Tho99]. Lemma 3.4. Given a graph G with arbitrary integer weights, a subset E′ ⊆ E, and a parameter M, we can compute estimated MSF indices f̂_e up to M for all e ∈ E′ in time O(mα(n) log(M)). Proof. We will split the graph G into graphs G^(D) for different values of D. In each G^(D) we compute the estimator f̂_e for some subset of edges from E′. We iteratively define D to be the highest value among the d(e) for which e ∈ E′ and f̂_e has not been computed yet. We look at the subgraph G^(D) consisting of the edges of weight in the window (D/n^2, D]. Next we compute an estimator of the MSF index in G^(D), by computing the MSF indices in a reweighted graph. We rescale the graph by multiplying all weights with n^3/D and rounding to the closest integer. This means that we have an error in the weight of at most D/n^3. For an edge with D/n^2 < w(e) ≤ D, this means that the error is at most w(e)/n.
So using Theorem 3.2, we can compute the MSF indices in this reweighted subgraph, whose edge weights are bounded by n³, in time O(m′α(n) log(M)), at the cost of a multiplicative error in the edge weights of at most (1 ± 1/n). Since each edge appears in at most two subgraphs, the total time is O(mα(n) log(M)).
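The rescaling step is simple enough to state concretely. The helper below is hypothetical (not from the paper's pseudocode) and only illustrates the rounding and its error bound: after multiplying by n³/D, rounding changes each weight by at most 1/2 in scaled units, i.e., by at most D/(2n³) in original units, which is a relative error of at most 1/n for any weight above D/n².

```python
def rescale_window(edges, D, n):
    """Rescale the weights of a window subgraph at level D.

    Multiplying by n**3 / D and rounding to the nearest integer bounds
    all weights by roughly n**3, while introducing an additive error of
    at most D / n**3 per edge; for an edge with w > D / n**2 this is a
    relative error of at most 1/n.
    """
    scale = n**3 / D
    return [(u, v, round(w * scale)) for (u, v, w) in edges]
```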

Cut Sparsification for Weighted Graphs
In this section, we present our algorithm for computing a (1 ± ǫ)-cut sparsifier G_ǫ for a weighted graph G. It makes use of the framework presented in Section 2.1 and the maximum spanning forest packing treated in Section 3. This section works towards proving the following theorem for polynomially-weighted graphs. In Section 5, we will generalize the techniques of this section to graphs with unbounded weights.
Theorem 4.1. There exists an algorithm that, given a weighted graph G = (V, E) and a freely chosen parameter ǫ > 0, computes a graph G_ǫ, which is a (1 ± ǫ)-cut sparsifier for G with high probability. The algorithm runs in time O(m · min(α(n) log(m/n), log(n))) and the number of edges of G_ǫ is O(n log(n)/ǫ² · log(m/(n log(n)/ǫ²))).
To be precise, we give an algorithm where the given bounds on both running time and size of the sparsifier hold with high probability. By simply halting when the running time exceeds the bound, and outputting an empty graph if we exceed the size bound, this gives the result above.
To achieve a better bound on the size of the sparsifier, we repeatedly apply this theorem to the input graph with an exponentially decreasing precision parameter. This yields an algorithm that, given a weighted graph G = (V, E) and a freely chosen parameter ǫ ∈ (0, 1), computes a graph G_ǫ, which is a (1 ± ǫ)-cut sparsifier for G with high probability. The algorithm runs in time O(m · min(α(n) log(m/n), log(n))) and the number of edges of G_ǫ is O(n log(n)/ǫ²).
Proof. We obtain this result by repeatedly applying the algorithm from Theorem 4.1, for a total of k := log*(m/(n log(n)/ǫ²)) times. In iteration i, we set ǫ_i := ǫ/2^{k−i+2} and denote the output of this iteration by G_i. This means that G_i is a (1 ± ǫ/2^{k−i+2})-cut sparsifier for G_{i−1}. In total, the accumulated error is at most ∏_{i=1}^{k} (1 ± ǫ/2^{k−i+2}) ⊆ (1 ± ǫ), since Σ_{i=1}^{k} ǫ/2^{k−i+2} ≤ ǫ/2. Since k = log*(m/(n log(n)/ǫ²)) = O(log*(n)), all bounds hold with high probability simultaneously, and thus the end result holds with high probability. Now for the size bound, Theorem 4.1 gives m_i ≤ C·n log(n)/ǫ_i² · log(m_{i−1}/(n log(n)/ǫ_i²)) for some constant C > 0, where we denote m_0 := m. We show by induction that m_i decreases rapidly, which means in particular that m_k = O(n log(n)/ǫ²). The claim for m_1 is immediate; supposing it holds for i − 1, it follows for i. For the running time, note that log*(x) = O(log log(x)), hence 4^{log*(x)} ≤ 4^{log log(x)} = log²(x), so 4^{log*(x)} log(x) = O(log³(x)) = O(x). Using this with x = m/(n log(n)/ǫ²) gives a total running time of O(m · min(α(n) log(m/n), log(n))).
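The iteration scheme in the proof can be sketched directly. The `sparsify_once` callable stands in for the algorithm of Theorem 4.1 and is an assumption of this sketch; the point is the choice of k and of the decreasing precisions ǫ_i, whose errors multiply out to at most (1 ± ǫ) because Σ ǫ_i ≤ ǫ/2.

```python
import math

def log_star(x):
    """Iterated logarithm: how often log2 must be applied until x <= 1."""
    k = 0
    while x > 1:
        x = math.log2(x)
        k += 1
    return k

def iterated_sparsify(G, m, n, eps, sparsify_once):
    """Apply a one-shot sparsifier k = log*(m / (n log(n) / eps^2))
    times, with precision eps_i = eps / 2**(k - i + 2) in iteration i,
    so that the accumulated multiplicative error stays within (1 +/- eps).
    """
    k = log_star(m / (n * math.log(n) / eps**2))
    for i in range(1, k + 1):
        G = sparsify_once(G, eps / 2**(k - i + 2))
    return G
```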

The Algorithm
To sparsify the graph, two methods of sampling are used. One is the framework presented in Section 2.1; however, instead of applying the framework to the graph directly, another sampling process precedes it. To simplify equations, let us set ρ := (7+c)·1352·ln(n)/(0.38ǫ²). If |E| ≤ 4ρn·log(m/(n log(n)/ǫ²)), we do nothing; that is, we return G_ǫ = G. If not, we start with an initialization step and continue with an iterative process, which ends when the remaining graph becomes sufficiently small.
In iteration i, we create X_{i+1} from Y_i by sampling each edge with probability 1/2. Next, we compute k_i := ρ·2^{i+1} maximum spanning forests T_1, ..., T_{k_i}. We define F_i := ∪_{j=1}^{k_i} T_j and Y_i := X_i \ F_i. We continue until Y_i has at most 2ρn edges, and set Γ to be the number of iterations. We retain all edges in F_0; in other words, we add each edge e ∈ F_0 to G_ǫ with weight w(e). The edges of Y_Γ are also retained, but they need to be scaled to counterbalance the Γ − 1 sampling steps: we add each edge e ∈ Y_Γ to G_ǫ with weight 2^{Γ−1}·w(e).
Any other edge e ∈ F_i is at least k_i·w(e)-heavy in X_{i−1}, as e ∉ F_{i−1}. We exploit this heaviness to sample from these edges using the framework. For each e ∈ F_i we:
• generate r_e from the binomial distribution with parameters n_e and p_e;
• if r_e is positive, add e to G_ǫ with weight r_e/p_e.
The factor 2^i in calling upon the binomial distribution can be seen as boosting the weight of the edge by a factor 2^i, which is needed to counterbalance the i sampling steps in creating F_i.
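The unbiasedness behind the reweighting r_e/p_e can be shown in a small sketch. The concrete choice n_e = 2^i · w(e) below is an illustrative assumption (the exact parameters n_e and p_e are set by the framework of Section 2.1); with it, E[r_e/p_e] = n_e = 2^i·w(e), i.e., the edge's weight boosted by the factor 2^i discussed above.

```python
import random

def compress_edge(w, i, p, rng):
    """One compression step for an edge e in F_i: draw
    r ~ Binomial(n_e, p) with n_e = 2**i * w 'unit copies' of the
    edge, and return the reweighted value r / p (0 means the edge
    is dropped). In expectation this returns n_e = 2**i * w.
    """
    n_e = (2**i) * w
    r = sum(rng.random() < p for _ in range(n_e))
    return r / p
```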
Up to the computation method of the MSF packing, the presented algorithm is the same for polynomially- and superpolynomially-weighted graphs. For the unbounded case, we use the MSF index estimator as presented in Section 3.1. In Section 5 we detail how this influences the correctness of the algorithm and the bounds on size and running time.

Correctness
We will prove that the graph G_ǫ constructed in Sparsify(V, E, w, ǫ, c) is a (1 ± ǫ)-cut sparsifier for G with probability at least 1 − 8/n^c. Following the proof structure of [FHHP19], we first define the sampled edge set S, where Γ is the maximum number such that F_Γ ≠ ∅, and we define G_S := (V, S). We then prove the following two lemmas, which together yield the desired result.
Consequently, e is also w(e)·k_i-heavy in G_i = (V, X_i).
Proof. Since e ∈ Y_i = X_i \ F_i, we know that e was not part of any maximum spanning forest in a k_i-partial MSF packing F_i of G_i. Hence, by definition of the maximum spanning forests, each of the forests in F_i has a path connecting the endpoints of e in which every edge has weight at least w(e). Thus any cut separating the endpoints of e in G′_i picks up a contribution of at least w(e) from each of the k_i paths. Hence the minimum cut in G′_i separating the endpoints of e has value at least w(e)·k_i; equivalently, e is w(e)·k_i-heavy in G′_i.
Next, we show in a general setting that certain ways of sampling preserve cuts. The following lemma is a generalization of Lemma 5.5 in [FHHP19].
We will show that for each j the statement of the lemma holds true with probability at least 1 − 2n^{(4−ζ)2^j}. Then the lemma follows from the union bound, since the error probabilities 2n^{(4−ζ)2^j} decay geometrically in j; here we use that n^{4−ζ} ≤ 1/2. Let C ∈ C_j. For every e ∈ R, define the random variable Y_e that takes value w(e) with probability p and 0 otherwise. We have Y_e ∈ [0, 1], E[Y_e] = p·w(e), and E[Σ_{e∈R} Y_e] = p·r^{(C)}. Now we apply Theorem A.1 with ǫ = δ·q^{(C)}/r^{(C)} and µ = p·r^{(C)} to obtain the concentration bound for this cut, where the last inequality holds by the choice of parameters. As every edge in R ∩ C is π-heavy in (V, Q), we can apply Theorem 2.2 to bound the number of distinct sets R ∩ C. Thus the union bound gives us that the statement of the lemma holds true for all cuts C ∈ C_j with probability at least 1 − 2n^{(4−ζ)2^j}.
We want to apply this lemma to our sampling procedure. We do this by considering different weight classes separately. We define X_{i,k} := {e ∈ X_i : 2^k ≤ w(e) ≤ 2^{k+1} − 1}, and write x^{(C)}_{i,k} for the weight of the edges of X_{i,k} crossing the cut C. We define Y_{i,k} and y^{(C)}_{i,k} analogously. Some rescaling is necessary to ensure that all weights lie in (0, 1], as Lemma 4.5 requires. For A ⊆ E and β > 0, we write βA to indicate that we multiply the weight of the edges by a factor of β.

Lemma 4.6. With probability at least 1 − 4/n^{4+c}, for every cut C in G_i, the weight of each class is preserved up to the required error. A closer look shows that we can take π = ρ·2^i, p = 1/2, and δ = ǫ/(13·2^{i/2+1}), and we check that δ²pπ = ǫ²ρ/1352 = (7+c)·ln(n)/0.38.
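The parameter check can be written out explicitly (a reconstruction consistent with the definition of ρ above):

```latex
\delta^2 p \pi
  = \left(\frac{\epsilon}{13\cdot 2^{i/2+1}}\right)^{\!2}\cdot\frac{1}{2}\cdot \rho\,2^{i}
  = \frac{\epsilon^{2}\rho}{169\cdot 2^{i+3}}\cdot 2^{i}
  = \frac{\epsilon^{2}\rho}{1352}
  = \frac{(7+c)\ln(n)}{0.38}.
```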

Figure 2: A visualization of the area covered by the double sum over k and k′.
So we can apply Lemma 4.5 with these settings to obtain the claimed bound, which holds for all cuts C with probability 1 − 4/n^{3+c}.
Now we look at the general case, for which we sum over all weight classes. To this end, we define x^{(C)}_i and y^{(C)}_i by summing the x^{(C)}_{i,k} and y^{(C)}_{i,k} over all weight classes k.

Proof. We rescale and sum over k for each of the weight classes in Lemma 4.6. Next, we want to interchange the sum over k with the sum over k′; a visual argument for the adjustment of the bounds can be found in Figure 2.
which holds simultaneously for all cuts C with probability at least 1 − 4/n^{1+c}. The reason is that at most m ≤ n² of the sets X_{i,k} ∩ C are non-empty, hence a union bound gives the desired bound on the probability.
We will repeatedly apply this lemma. To show that the accumulated error does not grow beyond ǫ/3, we use the following fact; for a proof we refer to [FHHP19].

Lemma 4.8. Let x ∈ (0, 1] be a parameter. Then for any k ≥ 0, the following inequality holds.

As a final step towards proving Lemma 4.2, we prove a lemma that focuses on the sparsification occurring in the last Γ − j + 1 iterative steps of our algorithm.
Note that setting j = 0 gives us Lemma 4.2. Although this lemma is a generalization of the corresponding case for unweighted graphs in [FHHP19], the proof for the weighted case will be exactly the same: all the work that needed to be done is contained in the previous lemmas. We include the proof here for completeness.
Proof of Lemma 4.9. Let C be a cut. We define s^{(C)}_j analogously. We will show that the weight of C in S_j is at most (1 + (ǫ/3)·2^{−j/2}) times the weight of C in G_j.
We repeat the last step Γ − j − 1 times to conclude the upper bound. The proof of the lower bound, with factor (1 − (ǫ/3)·2^{−j/2}), is analogous. As Γ ≤ n, we can use a union bound to conclude that Lemma 4.7 holds for all j simultaneously with probability at least 1 − 4/n^c, which concludes the proof.
To prove Lemma 4.3, we will invoke the framework from [FHHP19], as given in Section 2.1. More specifically, we will apply Theorem 2.3. We set the parameter γ := 64/3, and for each e ∈ F_i we set λ_e := ρ·4^i·w(e). This is in line with our choice for p_e: min(1, 16(c+7)γ ln(n)/(0.38·λ_e·ǫ²)) = min(1, 16(c+7)γ ln(n)/(0.38·ρ·4^i·w(e)·ǫ²)) = min(1, 384/(1521·4^i·w(e))). We have to provide a set of subgraphs G and a set of parameters Π such that Π-connectivity and γ-overlap are satisfied.
To explore the connectivity of edges in R_i := {e ∈ E : 2^i ≤ λ_e ≤ 2^{i+1} − 1}, we partition these sets into classes R_{j,k}, which we view in suitable subgraphs. Next, we want to replace X_{j−1} with S_{j−1}. To this end, we apply Lemma 4.6 with ǫ = 13·2^{i/2+1} (so that δ = 1), which shows that for each of the weight classes the cuts are preserved up to a factor 2. Hence we obtain that e is ρ·2^{2j+Λ−2}-heavy in E_{j,k}. Now let e′ ∈ R_{j,k} be any edge, and let C be a cut such that e′ ∈ C. We need to show that the weight of this cut in E_{j,k} is at least ρ·4^Γ·2^Λ. Let e := argmin_{e∈C} {j_e : e ∈ R_{j_e,k_e} for some k_e ≥ k} (in case e is not unique, pick any). By the above statement, e is ρ·2^{2j_e+Λ−2}-heavy in E_{j_e,k_e} ⊆ E_{j_e,k}. Thus e is ρ·4^Γ·2^Λ-heavy in 4^{Γ−j_e+1}·E_{j_e,k}, a reweighted copy of E_{j_e,k}, which in turn is a subgraph of E_{j,k}. Hence e is ρ·4^Γ·2^Λ-heavy in E_{j,k}, and thus C has weight at least ρ·4^Γ·2^Λ. Now we take all weight classes together to find the set of subgraphs G for which Π-connectivity is satisfied.
Proof. Note that e ∈ R_i satisfies 2^i ≤ ρ·2^{2j}·w(e) ≤ 2^{i+1} − 1 if e ∈ F_j. Hence e ∈ R_{j,k} with 2j + k = i. We only consider edges in F_j with 1 ≤ j ≤ Γ, thus R_i = ∪_{j=1}^{min(⌊i/2⌋,Γ)} R_{j,i−2j}, and the claim follows directly from Lemma 4.10.
It remains to show that γ-overlap is satisfied.
where e^{(C)} = Σ_{e∈C} w_{G_S}(e) and e^{(C)}_i = Σ_{e∈C∩E_i} w_{G_i}(e).
Proof. We add F_0 and Y_Γ to G_ǫ, so we do not need to be concerned about the intersection of the cut C with these sets. This means we only intersect a cut C with F_j where 1 ≤ j ≤ Γ; hence we start our sum with i = 2. We consider the sum we need to bound. Next, we want to interchange the sum over i and the sum over j and change the bounds accordingly; see Figure 3a for a visual argument.

Figure 3: Two visualizations of the area covered by a double sum.
Interchanging the sum over i and j′ does not change the bounds, as they are independent of each other. When interchanging the sum over i and the sum over k′, we have to be more careful; see Figure 3b for a visual argument.
Next, we want to interchange the sum over j with the sum over j′; a visual argument can be found in Figure 4. Note that this probability is equal for all e ∈ F_i. Since F_i is the union of k_i = ρ·2^{i+1} spanning forests, we know that |F_i| ≤ ρ·2^{i+1}·n. This bounds the expected size of F′_i, the set of sampled edges in F_i, and thereby the expected total number of sampled edges.

Size of the Sparsifier
Note that ρn ≥ cn·ln(n)/0.38. We have at most n² sets X_i, so we can conclude that with high probability |X_i| ≤ (2/3)·|X_{i−1}| in each step, and by induction |X_i| ≤ (2/3)^i·m. This compression process can also be seen as the sum of m independent random variables that take values in {0, 1}. We have just calculated that the expected value µ is at most B·cn·ln(n)·log(m/(cn log(n)/ǫ²))/ǫ², for some B > 0. Using this, we apply a Chernoff bound (Theorem A.2) to get an upper limit for the number of sampled edges: P[|F′| > 2B·cn·ln(n)·log(m/(cn log(n)/ǫ²))/ǫ²] ≤ exp(−0.38·B·cn·ln(n)·log(m/(cn log(n)/ǫ²))/ǫ²) = n^{−0.38·cnB·log(m/(cn log(n)/ǫ²))/ǫ²}.

Time Complexity
First off, if m ≤ 4ρn·log(m/(n log(n)/ǫ²)) = O(cn log(n)/ǫ² · log(m/(n log(n)/ǫ²))), the algorithm does nothing and returns the original graph. So for this analysis we can assume m > 4ρn·log(m/(n log(n)/ǫ²)). We analyze the time complexity of the algorithm in two phases: the first phase consists of computing the probabilities p_e for all e ∈ E; the second one is compressing edges, given these probabilities.
The first phase consists of the iterations of the while loop (lines 10–17). In iteration i, we sample edges from Y_i ⊆ X_i with probability 1/2 to form X_{i+1}; this takes time at most O(|X_i|). Next, we compute a maximum spanning forest packing of the graph G_{i+1} = (V, X_{i+1}). We know that we can compute an M-partial maximum spanning forest packing of a polynomially-weighted graph with n vertices and m_0 edges in O(m_0 · min(α(n) log(M), log(n))) time (see Theorem 3.2 and Theorem 3.3). So this iteration takes at most O(|X_{i+1}| · min(α(n) log(k_{i+1}), log(n))) time. As noted earlier, we have with high probability that |X_i| ≤ (2/3)^i·m. If mα(n) log(m/n) ≤ m log(n), we conclude w.h.p. that the first phase takes total time at most O(mα(n) log(m/n)); and if m log(n) < mα(n) log(m/n), w.h.p. the first phase takes total time at most O(m log(n)).
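The total over all iterations can be sketched as a geometric sum (a reconstruction under the assumptions above, namely |X_i| ≤ (2/3)^i·m and k_i = ρ·2^{i+1}):

```latex
\sum_{i\ge 0} O\!\Big(\big(\tfrac{2}{3}\big)^{i} m \cdot \alpha(n)\log(k_{i+1})\Big)
 = O\!\Big(m\,\alpha(n)\sum_{i\ge 0}\big(\tfrac{2}{3}\big)^{i}\big(\log\rho + i + 2\big)\Big)
 = O\big(m\,\alpha(n)\log\rho\big)
 = O\big(m\,\alpha(n)\log(m/n)\big),
```

where the last step uses that the assumption m > 4ρn·log(m/(n log(n)/ǫ²)) implies log(ρ) < log(m/n).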
In the second phase, we sample each edge e from the binomial distribution with parameters n_e and p_e. We will show this can be done with a process that takes T = O(m) time with high probability. To this end, we use an algorithm from [Dev80] for binomial sampling, for which the pseudocode is given in Algorithm 2.
Output: A random sample from the binomial distribution with parameters n and p.
Generate u ∼ U(0, 1), and set S ← S + ⌊log(u)/log(1 − p)⌋ + 1. For each edge e ∈ F_i we need to draw from the binomial distribution with parameters n_e and p_e. Write T_e for the time needed to sample e. By the above, E[T_e] = 1 + n_e·p_e. So the expected number of successes is at most Σ_i Σ_{e∈F_i} (1 + n_e·p_e) = Σ_i |F_i| + O(cn log(n)·log(m/(n log(n)/ǫ²))/ǫ²), as shown in Section 4.3. Let B > 0 be such that Σ_i Σ_{e∈F_i} n_e·p_e ≤ B·cn·ln(n)·log(m/(n log(n)/ǫ²))/ǫ². We can use a Chernoff bound (see Theorem A.2) on the sum of these Σ_i Σ_{e∈F_i} n_e random variables to obtain a failure probability of at most exp(−0.38·B·cn·ln(n)·log(m/(n log(n)/ǫ²))/ǫ²) = n^{−0.38·Bcn·log(m/(n log(n)/ǫ²))/ǫ²}.
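The geometric-skip sampler of [Dev80] can be written out in a few lines. This is a sketch of the idea rather than the paper's Algorithm 2 verbatim: each jump ⌊log(u)/log(1 − p)⌋ + 1 is a geometric random variable giving the position of the next success, so the loop executes once per success and the expected time is O(1 + np) instead of O(n).

```python
import math
import random

def binomial_sample(n, p, rng=random):
    """Sample from Binomial(n, p) by skipping over failures.

    Each iteration jumps directly to the next success among the n
    Bernoulli trials, so the expected running time is O(1 + n*p).
    """
    if p <= 0:
        return 0
    if p >= 1:
        return n
    successes = 0
    pos = 0
    while True:
        u = rng.random()
        pos += math.floor(math.log(u) / math.log(1.0 - p)) + 1
        if pos > n:
            return successes
        successes += 1
```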
So we can say that with high probability we need O(m) time for the sampling.

Adaptation to Unbounded Weights
In this section, we sketch how to adapt the algorithm of the previous section to sparse graphs with unbounded weights. The key is Lemma 3.4, which shows that for unbounded weights we might not be able to compute the MSF indices exactly, but we can find an estimate for edges e with w(e) > d(e)/n. Recall the definition of d(e): compute a single maximum spanning forest F for G and define d(e) to be the minimum weight among the edges on the path from u to v in F, where e = (u, v). The only adaptation for unbounded weights is that the first time we compute maximum spanning forests in Algorithm 1, we set aside any edges e ∈ E with w(e) ≤ d(e)/n. We show that we can sample efficiently from these edges, since they are well-connected by F_0, the initial MSF that remains in our sparsifier. We do this by sampling them with λ_e = ρ·d(e). Note that we only have to set aside edges the first time we compute an MSF packing; after this, the estimates d(e) in a new graph can only decrease, so if an edge satisfies w(e) ≤ d(e)/n in a certain subgraph, it also satisfied this in the initial graph.
For the remaining edges, we apply the algorithm as presented in the previous section. The only difference is that we use Lemma 3.4 to compute an estimate of the MSF indices. This means that if an edge e ∈ E obtains the estimated index f̂_e w.r.t. some graph E′, then e is at least f̂_e·w(e)·(1 − 1/n)-heavy in E′. For simplicity, we use 1 − 1/n ≥ 1/2. This impacts the analysis in the two places where the heaviness is used: Lemma 4.6 and Lemma 4.10.
Examining Lemma 4.6, we see that we apply Lemma 4.5 with δ²pπ ≥ ζ·ln(n)/0.38, for certain δ, p, π, and ζ. We now want to apply this lemma with π̂ = π/2, hence we set δ̂ = √2·δ. If we want to end up with the original result of Lemma 4.6, we set ǫ̂ = ǫ/√2. This constant-factor change gets absorbed in the asymptotic notation for the size and running time of the algorithm.
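The compensation can be checked directly: with π̂ = π/2 and δ̂ = √2·δ,

```latex
\hat{\delta}^{2}\, p\, \hat{\pi} \;=\; \big(\sqrt{2}\,\delta\big)^{2}\, p\,\frac{\pi}{2} \;=\; \delta^{2} p \pi \;\ge\; \frac{\zeta \ln(n)}{0.38},
```

so the hypothesis of Lemma 4.5 is unaffected.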
The second lemma we investigate is Lemma 4.10, which provides the Π-connectivity in the sampling. Here, there is an easy solution: we boost all edges in E_{j,k} by a factor two, which ensures the Π-connectivity as desired. Consequently, all edges in E_i are boosted by a factor two, which propagates to a factor two in e^{(C)}_i as denoted in Lemma 4.12, resulting in a γ-overlap with γ = 128/3 rather than 64/3. Summing up, our original analysis holds when we call the algorithm with ǫ̂ = ǫ/√2 and ρ̂ = (7+c)·2704·ln(n)/(0.38ǫ²), where the change in ρ is a direct consequence of the change in γ.
The last thing that remains is to show that, when we sample, Π-connectivity is also satisfied for the edges e ∈ E with w(e) ≤ d(e)/n. This is an extension of Corollary 4.11.
Lemma 5.1. Suppose e ∈ R_i and w(e) ≤ d(e)/n; then e is π = ρ·4^Γ·2^Λ-heavy in E_i.

Proof. We know that e is d(e)-heavy in F_0, so we look for the occurrence of F_0 in E_i, and examine the connectedness of e in this particular set. Note that w(e′) ≥ d(e) for any edge e′ on a path in F_0 from u to v, where e = (u, v), by definition of d(e). So we only need to consider e′ ∈ F_0 with ρ·w(e′) ≥ ρ·d(e) = λ_e ≥ 2^i, as e ∈ R_i. This means that e is d(e)-heavy in the corresponding subset of F_0. We can rescale this to exploit the weights fully: e is 2^Λ-heavy in ∪_{k′=i−2}^{∞} 2^{Λ−k′}·{e′ ∈ F_0 : 2^{k′} ≤ ρ·w(e′) ≤ 2^{k′+1} − 1}. Combining this with Equation 1 gives us that e is ρ·4^{Γ+1}·2^Λ-heavy in E_i, which is a factor four more than we needed to show.

Size and Time Complexity
For the size of the resulting graph G_ǫ, the upper bound of the previous section still holds for the edges that are sampled according to their MSF index. It remains to show that the contribution of the edges with w(e) ≤ d(e)/n is small. For these edges we have λ_e = ρ·d(e) ≥ ρ·n·w(e), and hence the expected contribution of each such edge is O(1/n).
As there can be at most n 2 edges with w(e) ≤ d(e)/n, we obtain that the expected number of edges in G ǫ originating from such edges is at most O(n). By the same arguments as given in Section 4.3, this holds not only in expectation, but also with high probability. Concerning the time complexity, we use Theorem 3.3 or Lemma 3.4 instead of Theorem 3.2. These run in time O(m log(n)) and O(mα(n) log(M )) respectively. Since the size of the sparsifier does not increase significantly, the time needed for sampling does not increase significantly either. Hence we obtain a total time of O(m · min(α(n) log(m/n), log(n))). This makes the algorithm the fastest cut sparsification algorithm known for graphs with unbounded weights.

Conclusion
In this paper, we presented a faster (1 ± ǫ)-cut sparsification algorithm for weighted graphs. We have shown how to compute sparsifiers of size O(n log(n)/ǫ 2 ) in O(m·min(α(n) log(m/n), log(n))) time, for integer weighted graphs. Both algorithms apply a sampling technique where the MSF index is used as a connectivity estimator.
We have shown that we can compute an M-partial MSF packing in O(mα(n) log(M)) time for polynomially-weighted graphs. For graphs with unbounded integer weights, we have shown that we can compute a complete MSF packing in O(m log(n)) time, and that a sufficient estimation of an M-partial MSF packing can be computed in time O(mα(n) log(M)). An open question is whether a more efficient computation is possible. This would improve our sparsification algorithm, but might also be advantageous in other applications. The NI index has been shown to be useful in various applications; we believe to have shown that the MSF index is a natural analogue.
To develop an algorithm to compute an MSF packing, one might be inclined to build upon one of the algorithms that compute a minimum spanning tree faster than Kruskal's algorithm, such as the celebrated linear-time algorithm of Karger, Klein, and Tarjan [KKT95]. However, this algorithm and many other fast minimum spanning tree algorithms make use of edge contractions. It is far from obvious how to generalize this to a packing: in that case, we need to work simultaneously on multiple trees, hence we cannot simply contract the input graph in favor of any single one. To make this work, a more meticulous use of data structures seems necessary.
Computation of the MSF indices in linear time would be an ultimate goal. However, for our application a slightly looser bound suffices: if we can reduce the running time for computing the MSF indices to O(m + n log(n)), then we obtain a time bound of O(m) for cut sparsification. Moreover, we do not need the exact MSF index; an estimate suffices. This can either be a constant-factor approximation of the MSF index for each edge, or an estimate in the weights used in the forests, as done for graphs with unbounded weights in Section 5.

A Tail bounds
To analyze the sampling methods used in Section 4, we make use of the well-known Chernoff bound to get a grasp on the tail of various distributions [Che52].

Proof. Let ǫ := (δ − 1)·µ′/µ. We have ǫ ≥ 1, so min(ǫ, ǫ²) = ǫ. The statement now follows directly from Theorem A.1.

B Reduction from Real to Integer Weights
In this section, we show how to reduce the computation of a cut sparsifier of a graph with non-negative real weights to the integer-weight case, formalizing the procedure sketched by Benczúr and Karger [BK15]. Let G = (V, E, w) be a weighted graph, where w : E → R, and denote W_min and W_max for the minimum and maximum weight, respectively. First, we show that the graph H is indeed a (1 + ǫ)-cut sparsifier of G. To this end, we note that for any cut C we have w_H(C) = 2^{−r}·w_Ĥ(C) ≤ 2^{−r}·(1 + ǫ/3)·w_Ĝ(C) = (1 + ǫ/3)·w_{G′}(C) ≤ (1 + ǫ)·w_G(C), where the last inequality holds as each weight w′(e) has an additive error of at most 2^{−r} ≤ (ǫ/2)·W_min ≤ (ǫ/2)·w(e) with respect to w(e), hence a multiplicative error of at most ǫ/2. Analogously, we obtain w_H(C) ≥ (1 − ǫ)·w_G(C).
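The scaling step can be sketched concretely. The helper below is a hypothetical illustration of the reduction (the function name and interface are not from [BK15]): it picks r with 2^{−r} ≤ (ǫ/2)·W_min, scales every weight by 2^r, and rounds, so each weight keeps a relative error of at most ǫ/2; the sparsifier computed on the integer-weighted graph is scaled back by 2^{−r} afterwards.

```python
import math

def to_integer_weights(edges, eps):
    """Reduce real edge weights to integers for sparsification.

    Chooses r = ceil(log2(2 / (eps * W_min))), so that 2**(-r) is at
    most (eps/2) * W_min, then scales all weights by 2**r and rounds.
    Returns the integer-weighted edges and the exponent r.
    """
    w_min = min(w for _, _, w in edges)
    r = max(0, math.ceil(math.log2(2.0 / (eps * w_min))))
    scaled = [(u, v, round(w * 2**r)) for (u, v, w) in edges]
    return scaled, r
```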
By construction, Ĝ has integer weights, which are bounded by O(W_max/(ǫ·W_min)). Steps 1, 2, 3, and 5 can be implemented in O(m) time. So indeed we have reduced the problem to finding a cut sparsifier of a graph with integer weights. Moreover, note that if G has polynomially bounded real weights, in the sense that W_max = O(poly(n)) and W_min = Ω(1/poly(n)), then the graph Ĝ has polynomially bounded integer weights. We can state this independently of ǫ, since for ǫ ≤ 1/m we can always output the entire input graph as a cut sparsifier of optimal size O(n/ǫ²) [ACK+16].