Correlation Clustering in Data Streams

Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as k-center, k-median, and k-means. Such algorithms need to be both time and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, $O(n \cdot \mathrm{polylog}\, n)$-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the "quality" of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in $O(n \cdot \mathrm{polylog}\, n)$ space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.


Introduction
The correlation clustering problem was first formulated as an optimization problem by Bansal, Blum, and Chawla [2004]. The input is a complete weighted graph G on n nodes, where each pair of nodes uv has weight $w_{uv} \in \mathbb{R}$. A positive-weight edge indicates that u and v should be in the same cluster, whereas a negative-weight edge indicates that u and v should be in different clusters. Given a node-partition $C = \{C_1, C_2, \ldots\}$, we say edge uv agrees with C, denoted by $uv \sim C$, if the relevant soft constraint is observed. The goal is to find the partition C that maximizes $\mathrm{agree}(G, C) := \sum_{uv \sim C} |w_{uv}|$ or, equivalently, that minimizes $\mathrm{disagree}(G, C) := \sum_{uv} |w_{uv}| - \mathrm{agree}(G, C)$. Solving this problem exactly is known to be NP-hard. A large body of work has been devoted to approximating $\text{max-agree}(G) = \max_C \mathrm{agree}(G, C)$ and $\text{min-disagree}(G) = \min_C \mathrm{disagree}(G, C)$, along with variants $\text{min-disagree}_k(G)$ and $\text{max-agree}_k(G)$, where we consider partitions with at most k clusters. In this paper, we focus on multiplicative approximation results. If all weights are ±1, there is a polynomial time approximation scheme (PTAS) for max-agree [Bansal et al., 2004, Giotis and Guruswami, 2006] and a 2.06-approximation [Chawla et al., 2015] for min-disagree. When there is an upper bound, k, on the number of clusters in C, and all weights are ±1, Giotis and Guruswami [2006] introduced a PTAS for both problems. Even k = 2 is interesting, with an efficient local-search approximation introduced by Coleman, Saunderson, and Wirth [2008].
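To make the two objectives concrete, the following minimal Python sketch evaluates agree(G, C) and disagree(G, C) for a given partition. The function name `agree_disagree` and the dictionary-based edge representation are illustrative assumptions, not notation from the paper.

```python
def agree_disagree(weights, clusters):
    """Return (agree, disagree) for a partition.

    weights:  dict mapping a node pair (u, v) to its real weight w_uv
              (hypothetical input format chosen for this sketch).
    clusters: list of disjoint sets of nodes covering every node.
    """
    # Map each node to the index of the cluster that contains it.
    label = {v: idx for idx, cluster in enumerate(clusters) for v in cluster}
    agree = 0.0
    disagree = 0.0
    for (u, v), w in weights.items():
        same = label[u] == label[v]
        # A positive edge agrees when uncut; a negative edge agrees when cut.
        if (w > 0 and same) or (w < 0 and not same):
            agree += abs(w)
        else:
            disagree += abs(w)
    return agree, disagree


# A triangle with two positive edges and one negative edge.
triangle = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): -1}
print(agree_disagree(triangle, [{"a", "b", "c"}]))
print(agree_disagree(triangle, [{"a"}, {"b"}, {"c"}]))
```

No partition of this triangle satisfies all three soft constraints, which is why the problem is posed as an optimization over partitions rather than as a feasibility question.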
If the weights are arbitrary, there is a 0.7666-approximation for max-agree [Charikar, Guruswami, and Wirth, 2005, Swamy, 2004] and an O(log n)-approximation for min-disagree [Charikar et al., 2005, Demaine et al., 2006]. These methods use convex programming: as originally described, this cannot be implemented in O(n polylog n) space, even when the input graph is sparse. This aspect is well known in practice, and Bagon and Galun [2011], Bonchi, Garcia-Soriano, and Liberty [2014], and Elsner and Schudy [2009] discuss the difficulty of scaling the convex programming approach.
Clustering and Graph Analysis in Data Streams. Given the importance of clustering as a basic tool for analyzing massive data sets, it is unsurprising that considerable effort has gone into designing clustering algorithms in the relevant computational models. In particular, in the data-stream model we are permitted a limited number of passes (ideally just one) over the data while using only limited memory. This model abstracts the challenges in traditional applications of stream processing such as network monitoring, and also leads to I/O-efficient external-memory algorithms. Naturally, in either context, an algorithm should also be fast, both in terms of the time to process each stream element and in returning the final answer.
Classical clustering problems including k-median [Charikar et al., 2003, Guha et al., 2000], k-means [Ailon, Jaiswal, and Monteleoni, 2009], and k-center [Charikar et al., 2004, Guha, 2009, McCutchen and Khuller, 2008] have all been studied in the data stream model, as surveyed by Silva et al. [2013]. Non-adaptive sampling algorithms for correlation clustering can be implemented in the data stream model, as applied by Ailon and Karnin [2012], to construct additive approximations. Chierichetti, Dalvi, and Kumar [2014] presented the first multiplicative approximation data stream algorithm: a polynomial-time (3 + ε)-approximation for min-disagree on ±1-weighted graphs using $O(\varepsilon^{-1} \log^2 n)$ passes and semi-streaming space, that is, a streaming algorithm using Θ(n polylog n) memory [Feigenbaum et al., 2005]. Pan et al. [2015] and Bonchi et al. [2014] discuss faster non-streaming implementations of related ideas, but the algorithm of Chierichetti, Dalvi, and Kumar [2014] remained the state-of-the-art data stream algorithm until our work. Using space roughly proportional to the number of nodes can be shown to be necessary for solving many natural graph problems including, it will turn out, correlation clustering. For a recent survey of semi-streaming algorithms and graph sketching, see McGregor [2014].
Computational Model. In the basic graph stream model, the input is a sequence of edges and their weights.
The available space to process the stream and perform any necessary post-processing is O(n polylog n) bits.
Our results also extend to the dynamic graph stream model where the stream consists of both insertions and deletions of edges; the weight of an edge is specified when the edge is inserted and deleted (if it is subsequently deleted). For simplicity, we assume that all weights are integral. We will consider three types of weighted graphs:
We note that many of our algorithms, such as those based on sparsification [Ahn and Guha, 2018], can also be implemented in MapReduce.

Our Results
We summarize our results in Table 1.
Min-Disagree. We show that any constant pass algorithm that can test whether min-disagree(G) = 0 in a single pass, for unit weights, must store Ω(n) bits (Theorem 31). For arbitrary weights, the lower bound increases to $\Omega(n + |E^-|)$ (Theorem 30) and to $\Omega(n^2)$ in the case where the graph of negative edges may be dense. We provide a single-pass algorithm that uses $s = \tilde{O}(n\varepsilon^{-2} + |E^-|)$ space and $\tilde{O}(s^2)$ time and provides an $O(\log |E^-|)$ approximation (Theorem 19). Since Demaine et al. [2006] and Charikar et al. [2005] provide approximation-preserving reductions from the "minimum multicut" problem to min-disagree with arbitrary weights, it is expected to be difficult to approximate the latter to better than a $\log |E^-|$ factor in polynomial time. For unit weights when min-disagree(G) ≤ t, we provide a single-pass polynomial time algorithm that uses Õ(n + t) space (Theorem 4). We provide an $\tilde{O}(n\varepsilon^{-2})$-space PTAS for $\text{min-disagree}_2$ for bounded weights (Theorem 10).
We also consider multiple-pass streaming algorithms. For unit weights, we present an O(log log n)-pass algorithm that mimics the algorithm of Ailon et al. [2008], and provides a 3-approximation in expectation (Theorem 28), improving on the result of Chierichetti et al. [2014]. For $\text{min-disagree}_k(G)$, on unit-weight graphs with k ≥ 3, we give a min(k − 1, O(log log n))-pass polynomial-time algorithm using $\tilde{O}(n\varepsilon^{-2})$ space (Theorem 29). This result is based on emulating an algorithm by Giotis and Guruswami [2006] in the data stream model.

Techniques and Roadmap
In Section 2, we present three basic data structures for the agree and disagree query problems where a partition C is specified at the end of the stream, and the goal is to return an approximation of agree(G, C) or disagree(G, C). They are based on linear sketches and incorporate ideas from work on constructing graph sparsifiers via linear sketches. These data structures can be constructed in the semi-streaming model and can be queried in Õ(n) time. As the algorithms rely on relatively simple matrix-vector operations, they can be implemented fairly easily in MapReduce.
In Sections 3 and 4, we introduce several new ideas for solving the LP and SDP for min-disagree and max-agree. In each case, the convex formulation must allow each candidate solution to be represented, verified, and updated in small space. But the key point made here is that the formulation plays an outsized role in terms of space efficiency, both from the perspective of the state required to compute and the operational perspective of efficiently updating that state. In the future, we expect the space efficiency of solving convex optimization to be increasingly important.
We discuss multipass algorithms for min-disagree in Section 5. Our results are based on adapting existing algorithms that, if implemented in the data stream model, may appear to take O(n) passes. However, with a more careful analysis we show that O(log log n) passes are sufficient. Finally, we present space lower bounds in Section 6. These are proved using reductions from communication complexity and establish that many of our algorithms are space-optimal.

Basic Data Structures and Applications
We introduce three basic data structures that can be constructed with a single pass over the input stream that defines the weighted graph G. Given a query partition C, these data structures return estimates of agree(G, C) or disagree(G, C). Solving the correlation clustering optimization problem with these structures directly would require exponential time or ω(n polylog n) space. Instead, we will exploit them carefully to design more efficient solutions. However, in this section, we will present a short application of each data structure that illustrates its utility.

First Data Structure: Bilinear Sketch
Consider a graph G with unit weights ($w_{ij} \in \{-1, 1\}$) and a clustering C. Our first data structure allows us to solve the query problem, which is, given G and C, to report (an approximation of) disagree(G, C). Define the matrices $M^G$ and $M^C$ where $M^G_{ij} = \max(0, w_{ij})$ and $M^C_{ij} = 1$ if i and j are in the same cluster of C and $M^C_{ij} = 0$ otherwise. An edge ij disagrees with C exactly when $M^G_{ij} \neq M^C_{ij}$. Hence, the (squared) matrix distance, induced by the Frobenius norm, gives exactly
$$\mathrm{disagree}(G, C) = \|M^G - M^C\|_F^2 = \sum_{ij} \left(M^G_{ij} - M^C_{ij}\right)^2 .$$
To efficiently estimate $\|M^G - M^C\|_F^2$ when C is not known a priori, we can repurpose the bilinear sketch approach of Indyk and McGregor [2008]. The basic sketch is as follows:
1. Let $\alpha \in \{-1, 1\}^n$ and $\beta \in \{-1, 1\}^n$ be independent random vectors whose entries are 4-wise independent; in a single pass over the input, compute
$$Y = \sum_{ij \in E^+} \alpha_i \beta_j .$$
Specifically, we maintain a counter that is initialized to 0; for each $ij \in E^+$ in the stream we add $\alpha_i \beta_j$ to the counter, and if $ij \in E^+$ is deleted we subtract $\alpha_i \beta_j$ from the counter. The final value of the counter equals Y. Note that α and β can be determined by a hash function that can be stored in Õ(1) space such that each entry can be constructed in Õ(1) time.
2. Given query partition $C = \{C_1, C_2, \ldots\}$, return
$$X = \Big( Y - \sum_{\ell} \Big( \sum_{i \in C_\ell} \alpha_i \Big) \Big( \sum_{j \in C_\ell} \beta_j \Big) \Big)^2 .$$
To analyze the algorithm we will need the following lemma due to Indyk and McGregor [2008] and Braverman et al. [2010].
Lemma 1. For each $\{f_{ij}\}_{i,j \in [n]}$,
$$E\Big[\Big(\sum_{i,j} \alpha_i \beta_j f_{ij}\Big)^2\Big] = \sum_{i,j} f_{ij}^2 \quad \text{and} \quad V\Big[\Big(\sum_{i,j} \alpha_i \beta_j f_{ij}\Big)^2\Big] \le 9 \Big(\sum_{i,j} f_{ij}^2\Big)^2 .$$
The following theorem will be proved by considering an algorithm that computes multiple independent copies of the above sketch and combines the estimates from each.
Theorem 2. For unit weights, there exists an $O(\varepsilon^{-2} \log \delta^{-1} \log n)$-space algorithm for the disagree query problem. Each positive edge is processed in $\tilde{O}(\varepsilon^{-2})$ time, while the query time is $\tilde{O}(\varepsilon^{-2} n)$.
Proof. We first observe that, given Y, the time to compute X is Õ(n). This follows because for a cluster $C_\ell \in C$, on $n_\ell$ nodes, we can compute $\sum_{i \in C_\ell} \alpha_i$ and $\sum_{i \in C_\ell} \beta_i$ in $\tilde{O}(n_\ell)$ time. Hence the total query time is $\tilde{O}(\sum_\ell n_\ell) = \tilde{O}(n)$ as claimed.
We next argue that repeating the above scheme a small number of times in parallel yields a good estimate of disagree(G, C). To do this, note that
$$X = \Big( \sum_{i,j} \alpha_i \beta_j \left(M^G_{ij} - M^C_{ij}\right) \Big)^2 .$$
We then apply Lemma 1 to $f_{ij} = M^G_{ij} - M^C_{ij}$ and deduce that
$$E[X] = \|M^G - M^C\|_F^2 = \mathrm{disagree}(G, C) \quad \text{and} \quad V[X] \le 9 \left(\mathrm{disagree}(G, C)\right)^2 .$$
Hence, running $O(\varepsilon^{-2} \log \delta^{-1})$ parallel repetitions of the scheme and averaging the results appropriately yields a (1 ± ε)-approximation for disagree(G, C) with probability at least 1 − δ. Specifically, we partition the estimates into $O(\log \delta^{-1})$ groups, each of size $O(\varepsilon^{-2})$. We can ensure that with probability at least 2/3, the mean of each group is within a 1 ± ε factor by an application of the Chebyshev bound; we then argue using the Chernoff bound that the median of the resulting group estimates is a 1 ± ε approximation with probability at least 1 − δ.
Remark. We note that by setting $\delta = 1/n^n$ in the above theorem, it follows that we may estimate disagree(G, C) for all partitions C using $\tilde{O}(\varepsilon^{-2} n)$ space. Hence, given exponential time, we can also (1 + ε)-approximate min-disagree(G). While this is near-optimal in terms of space, in this paper we focus on polynomial-time algorithms.
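The sketch and its median-of-means estimator can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions rather than the construction analyzed above: fully random sign vectors stand in for the 4-wise independent hash functions, the class name `BilinearSketch` and its parameters are hypothetical, and the matrices are taken to be symmetric with a unit diagonal, so that the squared Frobenius distance counts every disagreeing pair twice and the estimate is halved at the end.

```python
import numpy as np


class BilinearSketch:
    """Linear sketch of the positive edges supporting disagree queries."""

    def __init__(self, n, groups=9, group_size=400, seed=0):
        rng = np.random.default_rng(seed)
        reps = groups * group_size
        self.groups, self.group_size = groups, group_size
        # Assumption: fully random signs replace 4-wise independent hashing.
        self.alpha = rng.choice(np.array([-1, 1]), size=(reps, n))
        self.beta = rng.choice(np.array([-1, 1]), size=(reps, n))
        # Convention of this sketch: M^G has a unit diagonal.
        self.y = (self.alpha * self.beta).sum(axis=1)

    def update(self, i, j, sign=+1):
        """Insert (sign=+1) or delete (sign=-1) the positive edge ij."""
        a, b = self.alpha, self.beta
        # Both orientations are added because the matrices are symmetric.
        self.y += sign * (a[:, i] * b[:, j] + a[:, j] * b[:, i])

    def estimate_disagree(self, clusters):
        a, b = self.alpha, self.beta
        cluster_term = np.zeros_like(self.y)
        for cluster in clusters:
            idx = list(cluster)
            # (sum of alpha) * (sum of beta) over the cluster, per repetition.
            cluster_term += a[:, idx].sum(axis=1) * b[:, idx].sum(axis=1)
        x = (self.y - cluster_term).astype(float) ** 2
        # Median of group means, as in the Chebyshev/Chernoff argument.
        means = x.reshape(self.groups, self.group_size).mean(axis=1)
        return float(np.median(means)) / 2.0


sketch = BilinearSketch(n=6)
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    sketch.update(i, j)
print(sketch.estimate_disagree([{0, 1, 2}, {3, 4, 5}]))
```

Because the sketch is linear in the edge set, a deletion simply subtracts what the insertion added, which is what makes it usable in the dynamic stream model.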
Application to Cluster Repair. Consider the Cluster Repair problem [Gramm et al., 2005], in which, for some constant t, we are promised min-disagree(G) ≤ t and want to find the clustering $\arg\min_C \mathrm{disagree}(G, C)$.
We first argue that, given a spanning forest F of $(V, E^+)$, we can limit our attention to checking a polynomial number of possible clusterings. The spanning forest F can be constructed using an Õ(n)-space algorithm in the dynamic graph stream model [Ahn, Guha, and McGregor, 2012a]. Let $C_F$ be the clustering corresponding to the connected components of $E^+$. Let $F_1, F_2, \ldots, F_p$ be the forests that can be generated by adding $t_1$ edges to F and then removing $t_2$ edges, where $t_1 + t_2 \le t$. Let $C_{F_i}$ be the node-partition corresponding to the connected components of $F_i$.
Lemma 3. The optimal partition of G is $C_{F_i}$ for some $1 \le i \le p$. Furthermore, $p = O(n^{2t})$.

Proof. Let $E^+_*$ be the set of edges in the optimal clustering that are between nodes in the same cluster and suppose that $E^+_* = (E^+ \cup A) \setminus D$, i.e., A is the set of positive edges that need to be added and D is the set of edges that need to be deleted to transform $E^+$ into a collection of node-disjoint clusters. Since min-disagree(G) ≤ t, we know |A| + |D| ≤ t. It is possible to transform F into a spanning forest F′ of $E^+ \cup A$ by adding at most |A| edges. It is then possible to generate a spanning forest F′′ with the same connected components as $E^+_* = (E^+ \cup A) \setminus D$ by deleting at most |D| edges from F′. Hence, one of the forests $F_i$ considered has the same connected components as $E^+_*$. To bound p, we proceed as follows. There are fewer than $n^{2t_1}$ different forests that can result from adding at most $t_1$ edges to F. For each, there are at most $n^{t_2}$ forests that can be generated by deleting at most $t_2$ edges from the, at most n − 1, edges in F′. Hence,
$$p < \sum_{t_1, t_2 \,:\, 0 \le t_1 + t_2 \le t} n^{2t_1 + t_2} < t^2 n^{2t} .$$
The procedure is then to take advantage of this bounded number of partitions by computing each $C_{F_i}$ in turn, and estimating $\mathrm{disagree}(G, C_{F_i})$. We report the $C_{F_i}$ that minimizes the (estimated) repair cost. Consequently, setting δ = 1/(p poly(n)) in Theorem 2 yields the following theorem.

Second Data Structure: Sparsification
The next data structure is based on graph sparsification and works for arbitrarily weighted graphs. A sparsification of graph G is a weighted graph H such that the weight of every cut in H is within a 1 + ε factor of the weight of the corresponding cut in G. A celebrated result of Benczúr and Karger [1996] shows that it is always possible to ensure that the number of edges in H is $\tilde{O}(n\varepsilon^{-2})$. A subsequent result shows that this can be constructed in the dynamic graph stream model.
The next lemma establishes that a graph sparsifier can be used to approximate agree and disagree of a clustering.
Lemma 6. Let $H^+$ and $H^-$ be sparsifications of $G^+ = (V, E^+)$ and $G^- = (V, E^-)$ such that all cuts are preserved within factor (1 ± ε/6), and let H
Proof. The proofs for agree and disagree are symmetric, so we restrict our attention to agree. Let ε′ = ε/6.
The weight of edges in $E^-$ that are cut is within a 1 + ε′ factor in the sparsifier. Consider an arbitrary cluster C ∈ C, then letting w′(·) represent the weight in the sparsifier, where the third line follows because, for each u ∈ C, the weights of cuts ({u}, V \ {u}) and (C, V \ C) are approximately preserved. Summing over all clusters C ∈ C, the total additive error is (assuming ε ≤ 1), as required.
The last part of the lemma follows because $w(E^+) \le \text{max-agree}(G)$, by considering the trivial all-in-one-cluster partition.
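The reason a cut sparsifier is enough can be seen from a simple identity: a positive edge disagrees exactly when it is cut by the partition, a negative edge disagrees exactly when it is not cut, and every cut edge is counted once in each of the two clusters it leaves. The following Python sketch evaluates disagree purely from cut weights; the function names and the split into `pos_edges` and `neg_edges` (holding absolute weights) are assumptions made for illustration.

```python
def cut_weight(edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))


def disagree_from_cuts(pos_edges, neg_edges, clusters):
    """disagree(G, C) computed only from cut weights of E+ and E-.

    pos_edges, neg_edges: dicts (u, v) -> absolute weight (illustrative format).
    """
    # Each cut edge appears in the cuts of exactly two clusters, hence the / 2.
    pos_cut = sum(cut_weight(pos_edges, c) for c in clusters) / 2
    neg_cut = sum(cut_weight(neg_edges, c) for c in clusters) / 2
    # Positive edges disagree when cut; negative edges disagree when uncut.
    return pos_cut + (sum(neg_edges.values()) - neg_cut)


pos = {("a", "b"): 1, ("b", "c"): 1}
neg = {("a", "c"): 1}
print(disagree_from_cuts(pos, neg, [{"a", "b"}, {"c"}]))
```

Replacing `pos_edges` and `neg_edges` by sparsified edge sets changes each cut weight by at most the sparsifier's factor, which is the source of the multiplicative and additive error terms in the lemma.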
Application to max-agree with Bounded Weights. In Section 3, based on the sparsification construction, we develop a poly(n)-time streaming algorithm that returns a 0.766-approximation for max-agree when G has arbitrary weights. However, in the case of unit weights, a RAM-model PTAS for max-agree is known [Bansal et al., 2004, Giotis and Guruswami, 2006]. It would be unfortunate if, by approximating the unit-weight graph by a weighted sparsification, we lost the ability to return a 1 ± ε approximation in polynomial time.
We resolve this by emulating an algorithm by Giotis and Guruswami [2006] for $\text{max-agree}_k$ using a single pass over the stream. Their algorithm is as follows:
1. Let $\{V_i\}_{i \in [m]}$ be an arbitrary node-partition, where $m = \lceil 4/\varepsilon \rceil$ and $\lfloor n/m \rfloor \le |V_i| \le \lceil n/m \rceil$.

For each
3. For all possible k-partitions of each of $S^1, \ldots, S^m$:
• For each j, let $\{S^j_i\}_{i \in [k]}$ be the partition of $S^j$.
• Compute and record the cost of the clustering in which $v \in V_j$ is assigned to the ith cluster, where
4. For all the clusterings generated, return the clustering C that maximizes agree(G, C).
Giotis and Guruswami [2006] prove that the above algorithm achieves a 1 + ε approximation factor with high probability if all weights are {−1, +1}. We explain in Section A that their analysis actually extends to the case of bounded weights. The more important observation is that we can simulate this algorithm in conjunction with a graph sparsifier. Specifically, the sets $V_1, \ldots, V_m$ and $S^1, \ldots, S^m$ can be determined before the stream is observed. To emulate step 3, we just need to collect the rnm edges incident to each $S^i$ during the stream. If we simultaneously construct a sparsifier during the stream we can evaluate all of the possible clusterings that arise. This leads to the following theorem.

Third Data Structure: Node-Based Sketch
In this section, we develop a data structure that supports queries to disagree(G, C) for arbitrarily weighted graphs when C is restricted to be a 2-partition. For each node i, define the vector $a^i \in \mathbb{R}^{\binom{n}{2}}$, indexed over the $\binom{n}{2}$ edges, where the only non-zero entries are:
The result follows immediately from consideration of the different possible values for the {i, j}th coordinate of the vector $\sum_{\ell \in C_1} a^\ell - \sum_{\ell \in C_2} a^\ell$. The sum can be expanded as
We apply the $\ell_1$-sketching result of Kane, Nelson, and Woodruff [2010] to compute a random linear sketch of each $a^i$.
Theorem 9. For arbitrary weights, and for query partitions that contain two clusters, to solve the disagree query problem, there exists an
Unfortunately, for queries C where |C| > 2, $\Omega(n^2)$ space is necessary, as shown in Section 6.
Application to $\text{min-disagree}_2(G)$ with Bounded Weights. We apply the above node-based sketch in conjunction with another algorithm by Giotis and Guruswami [2006], this time for $\text{min-disagree}_2$. Their algorithm is as follows:
1. Sample r = poly(1/ε, k) · log n nodes S and for every possible k-partition $\{S_i\}_{i \in [k]}$ of S:
(a) Consider the clustering where v ∈ V \ S is assigned to the ith cluster where
2. For all the clusterings generated, return the clustering C that minimizes disagree(G, C).
As with the max-agreement case, Giotis and Guruswami [2006] prove that the above algorithm achieves a 1 + ε approximation factor with high probability if all weights are {−1, +1}. We explain in Section A that their analysis actually extends to the case of bounded weights. Again, note that we can easily emulate this algorithm for k = 2 in the data stream model in conjunction with the third data structure. The sampling of S and its incident edges can be performed using one pass and O(nr log n) space. We then find the best of these possible partitions in post-processing using the above node-based sketches.

Convex Programming in Small Space: min-disagree
In this section, we present a linear programming-based algorithm for min-disagree. At a high level, progress arises from new ideas and modifications needed to implement convex programs in small space. While the time required to solve convex programs has always been an issue, a relatively recent consideration is the restriction to small space [Ahn and Guha, 2013]. In this presentation, we pursue the Multiplicative Weight Update technique and its derivatives. This method has a rich history across many different communities [Arora et al., 2012], and has been extended to semi-definite programs [Arora and Kale, 2007]. In this section, we focus on linear programs in the context of min-disagree; we postpone the discussion of SDPs to Section 4. In all multiplicative weight approaches, the optimization problem is first reduced to a decision variant, involving a guess, α, of the objective value; we show later how to instantiate this guess. The LP system is
$$\text{MWM-LP:} \quad c^T y \ge \alpha \quad \text{s.t.} \quad Ay \le b, \; y \ge 0 .$$
To solve the MWM-LP approximately, the multiplicative-weight update algorithm proceeds iteratively. In each iteration, given the current solution, y, the procedure maintains a set of multipliers (one for each constraint) and computes a new candidate solution y′ which (approximately) satisfies the linear combination of the inequalities, as defined in Theorem 11.
Theorem 11 (Arora et al. [2012]). Suppose that $\delta \le \frac{1}{2}$ and in each iteration t, given a vector of non-negative multipliers u(t), a procedure (termed Oracle) provides a candidate y′(t) satisfying three admissibility conditions,
The computation of the new candidate depends on the specific LP being solved. The parameter ρ is called the width, and controls the speed of convergence. A small-width Oracle is typically a key component of an efficient solution, for example, to minimize running times, number of rounds, and so forth. However, the width parameter is inherently tied to the specific formulation chosen. Consider the standard LP relaxation for min-disagree, where variable $x_{ij}$ indicates edge ij being cut.
$$\min \sum_{ij \in E^+} w_{ij} x_{ij} + \sum_{ij \in E^-} |w_{ij}| (1 - x_{ij})$$
$$x_{ij} + x_{j\ell} \ge x_{i\ell} \quad \text{for all } i, j, \ell$$
$$x_{ij} \ge 0 \quad \text{for all } i, j$$
The triangle constraints state that if we cut one side of a triangle, we must also cut at least one of the other two sides. The size of the formulation is in $\Theta(n^3)$, where n is the size of the vertex set, irrespective of the number of nonzero entries in $E^+ \cup E^-$. Although we will rely on the sparsification of $E^+$, that does not in any way change the size of the above linear program. To achieve Õ(n) space, we need new formulations, and new algorithms to solve them. The first hurdle is the storage requirement. We cannot store all the edges/variables, of which there can be $\Omega(n^2)$. This is avoided by using a sparsifier and invoking (the last part of) Lemma 6. Let $H^+$ be the sparsification of $E^+$ with $m' = |H^+|$. For edge $sq \in H^+$ let $w^h_{sq}$ denote its weight after sparsification. For each pair $ij \in E^-$, let $P_{ij}(E')$ denote the set of all paths involving edges only in the set E′. Consider the following LP for min-disagree, similar to that of Wirth [2004], but in this sparsified setting:
The intuition of an integral (0/1) solution is that $z_{ij} = 1$ for all edges $ij \in E^-$ that are not cut, and $x_{sq} = 1$ for all $sq \in H^+$ that are cut. That is, the relevant variable is 1 whenever the edge disagrees with the input advice. By Lemma 6, the objective value of LP1 is at most (1 + ε) times the optimum value of min-disagree. However, LP1 now has exponential size, and it is unclear how we can maintain the multipliers and update them in small space. To overcome this major hurdle, we follow the approach below.

A Dual Primal Approach
Consider a primal minimization problem, for example, min-disagree, in the canonical form:
$$\text{Primal LP:} \quad \min b^T x \quad \text{s.t.} \quad A^T x \ge c, \; x \ge 0 .$$
The dual of the above problem for a guess, α, of the optimum solution (to the Primal) becomes
$$\text{Dual LP:} \quad c^T y \ge \alpha \quad \text{s.t.} \quad Ay \le b, \; y \ge 0 ,$$
which is the same as the decision version of MWM-LP as described earlier. We apply Theorem 11 to the Dual LP; however, we still want a solution to the Primal LP. Note that despite approximately solving the Dual LP, we do not have a Primal solution. Even if we had some optimal solution to the Dual LP, we might still require a lot of space or time to find a Primal solution, though we could at least rely on complementary slackness conditions. Unfortunately, similar general conditions do not exist for approximately optimum (or feasible) solutions. To circumvent this issue:
(b) The Oracle is modified to provide a y, subject to conditions (i)-(iii) of Theorem 11, or an x that, for some
Intuitively, the Oracle is asked to either make progress towards finding a feasible dual solution or provide an f-approximate primal solution in a single step.
(c) If the Oracle returns an x then we know that $c^T y > (b^T x)/f$ is not satisfiable. We can then consider smaller values of α, say α ← α/(1 + δ). We eventually find a sufficiently small α such that the Dual LP is (approximately) feasible and we have an x satisfying
Note that computations for larger α continue to remain valid for smaller α.
This idea, of applying the multiplicative-weight update method to a formulation with exponentially many variables (the Dual), and modifying the Oracle to provide a solution to the Primal (that has exponentially many constraints) in a single step, has also benefited solving MAXIMUM MATCHING in small space [Ahn and Guha, 2018]. However, in Ahn and Guha [2018], the constraint matrix was unchanging across iterations (although the objective function value did vary); here the constraint matrix varies across iterations, along with the value of the objective function. Clearly, such a result will not apply for arbitrary constraint matrices and the correct choice of a formulation is key.
One key insight is that the dual, in this case (and as a parallel with matching), has exponentially many variables, but fewer constraints. Such a constraint matrix is easier to satisfy approximately in a few iterations because there are many more degrees of freedom. This reduces the adaptive nature of the solution, and therefore we can make a lot of progress in satisfying many of the primal constraints in parallel. Other examples of this same phenomenon are the numerous dynamic connectivity/sparsification results in Guha et al. [2015], where the algorithm repeatedly finds edges in cuts (dual of connectivity) to demonstrate connectivity. In that example, the O(log n) seemingly adaptive iterations collapse into a single iteration.
Parts of the three steps, that is, (a)-(c) outlined above, have been used to speed up running times of SDP-based approximation algorithms [Arora and Kale, 2007]. In such cases, there was no increase to the number of constraints nor consideration of non-standard formulations. It is often thought, and as explicitly discussed by Arora and Kale [2007], that primal-dual approximation algorithms use a different set of techniques from the primal-dual approach of multiplicative-weight update methods. By switching the dual and the primal, in this paper, we align both sets of techniques and use them interchangeably.
The remainder of Section 3 is organized as follows. We first provide a generic Oracle construction algorithm for MWM-LP, in Section 3.2. As a warm-up example, we then apply this algorithm to the multicut problem in Section 3.3; the multicut problem is inherently related to min-disagree for arbitrary weights [Charikar et al., 2005, Demaine et al., 2006]. We then show how to combine all the ideas together to solve min-disagree in Section 3.4.

From Rounding Algorithms to Oracles
Recall the formulation MWM-LP, and Theorem 11. Algorithm 1 takes an f-approximation for the Primal LP and produces an Oracle for MWM-LP.
Algorithm 1 From a rounding algorithm to an Oracle.
1: Transform vector u(t) (a vector of weights for the constraints of Dual LP) into a vector of scaled primal variables x, thus: $x_i = \alpha\, u_i(t) / \sum_j b_j u_j(t)$, so that $\sum_i b_i x_i = \alpha$.
2: Perform a rounding algorithm for the Primal LP with x as the input fractional solution (as described in (b) previously). Either there is a subset of violated constraints in the Primal LP or (if no violated constraint exists) there is a solution with objective value at most f · α, where f is the approximation factor for the rounding algorithm. In case no violated constraint exists, return x.
3: Let $S = \{i_1, i_2, \ldots, i_k\}$ be (the indexation of) the set of violated constraints in the Primal LP and let $\Delta = \sum_{i \in S} c_i$.
4: Let $y_i = \alpha/\Delta$ for i ∈ S, and let $y_i = 0$ otherwise. Return y.
Note that the two return types differ according to whether progress was made in the primal or the dual direction.
The following lemma shows how to satisfy the first two conditions of Theorem 11; the width parameter has to be bounded separately for a particular problem.
Lemma 12. If $c_j > 0$ for each Primal constraint, and $\sum_i u(t)_i > 0$, then Algorithm 1 returns a candidate y that satisfies conditions (i) and (ii) of Theorem 11.
Proof. By construction, $c^T y = \alpha$, addressing condition (i). So we prove that $u(t)^T A y - u(t)^T b \le 0$. Since u(t) is a scaled version of x,
The inequality in the second line follows from $y_j$ only being positive if the corresponding Primal LP constraint is violated. Finally, by construction, $\sum_j y_j c_j = \alpha$ and $\sum_i b_i x_i = \alpha$; since we also assumed that $\sum_i u(t)_i > 0$, the lemma follows.

Warmup: Streaming MULTICUT Problem
The MINIMUM MULTICUT problem is defined as follows. Given a weighted undirected graph and κ pairs of vertices $(s_i, t_i)$, for i = 1, . . . , κ, the goal is to remove the lowest weight subset of edges such that, for every i, $s_i$ is disconnected from $t_i$.
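As a concrete reading of this definition, the following small Python sketch checks whether a candidate set of removed edges is a feasible multicut and reports its weight. The function names and the dictionary edge format are assumptions for illustration; connectivity is tested with a simple union-find structure.

```python
def is_multicut(n, edges, removed, pairs):
    """True if removing `removed` disconnects every (s, t) in `pairs`.

    n:       number of nodes, labeled 0..n-1.
    edges:   dict (u, v) -> weight (illustrative format).
    removed: set of edge keys taken out of the graph.
    pairs:   list of (s, t) terminal pairs.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the endpoints of every edge that survives the removal.
    for (u, v) in edges:
        if (u, v) not in removed:
            parent[find(u)] = find(v)
    return all(find(s) != find(t) for s, t in pairs)


def multicut_weight(edges, removed):
    """Total weight of the removed edges, the quantity being minimized."""
    return sum(edges[e] for e in removed)


path = {(0, 1): 5, (1, 2): 2}
print(is_multicut(3, path, {(1, 2)}, [(0, 2)]), multicut_weight(path, {(1, 2)}))
```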
In the streaming context, suppose that the weights of the edges are in the range [1, W] and the edges are ordered in an arbitrary order defining a dynamic data stream (with both insertions and deletions). We present an O(log κ)-approximation algorithm for the multicut problem that uses $\tilde{O}(n\varepsilon^{-2} \log W + \kappa)$ space and $\tilde{O}(n^2 \varepsilon^{-7} \log^2 W)$ time, excluding the time to construct a sparsifier. The $\tilde{O}(n^2)$ term dominates the time required for sparsifier construction; for more details regarding streaming sparsifiers, see Guha et al. [2015] and Kapralov et al. [2014]. The algorithm comprises the following steps; the parameter δ will eventually be set to O(ε).
MC1 Sparsify the graph defined by the dynamic data stream, preserving all cuts, and thus the optimum multicut, within a 1 ± δ factor. Let E′ be the edges in the sparsification and |E′| = m′, where $m' = O(n\delta^{-2} \log W)$, from the results of Ahn et al. [2012b]. Let $(w_{jq})$ refer to weights after the sparsification.
MC2 Given an edge set E′′ ⊆ E′, let P′(i, E′′) be the set of all $s_i$-$t_i$ paths in the edge set E′′. The LP that captures MULTICUT is best viewed as a relaxation of a 0/1 assignment. Variable $x_{jq}$ is an indicator of whether edge (j, q) is in the multicut. If we interpret $x_{jq}$ as an assignment of lengths, then for all i ∈ [κ], all p ∈ P′(i, E′) have length at least 1. The relaxation is therefore:
MC4 Following the dual-primal approach above, as α decreases (note that, with the initial $\alpha_0$ being high, we cannot hope to even approximately satisfy the dual), we consider the (slightly modified) dual. More specifically, we consider the following variation: given α, let E′(α) be the set of edges of weight at least δα/m′, and we seek:
MC5 We run the Oracle provided in Algorithm 2.
MC6 If we receive an x, we set α ← α/(1 + δ), as in (c) in Section 3.1. This step occurs at least once (Lemma 14). Note that reducing α corresponds to adding constraints as well as variables to LP4, due to the new edges in E′(α/(1 + δ)) − E′(α). We set u_{i′}(t + 1) = (1 − δ/ρ)^t for each new constraint i′ added, assuming that we have run the Oracle in step (MC5) a total of t times thus far. Lemma 16 shows that this transformation provides a u and a collection y(t) as if the multiplicative-weight algorithm for LP4 had been run for the current value of α = α_1.
MC7 If we have completed the number of iterations required by Theorem 11, we average the y returned; we then have an approximately feasible solution for LP4. This corresponds to a proof of (near) optimality. We return the x corresponding to the previous value of α (which was α(1 + δ)) as the solution. This is an f(1 + O(δ))-approximation (Lemma 14). If we have not completed the number of iterations, we return to (MC5).
Lemma 13. Consider introducing the edges of E′ from the largest weight to the smallest. Let w be the weight of the first edge whose introduction connects some pair (s_i, t_i). Then w ≤ α* ≤ n^2 w.
Proof. Note that w is a lower bound on α*; moreover, if we delete the edge with weight w and all subsequent edges in the ordering, we have a feasible multicut solution. Therefore α* ≤ n^2 w. The lemma follows.
Naively, this edge-addition process runs in Õ(m′κ) time, since connectivity needs to be checked for every pair. However, we can introduce the edges in groups, corresponding to weights in (2^{z−1}, 2^z], as z decreases; we check connectivity after introducing each group. This algorithm runs in time Õ(m′ + κ log W) and approximates w, i.e., overestimates w by a factor of at most 2, since we have a geometric sequence of group weights. The initial value of α can thus be set to (1 + 4δ) 2^z n^2.
Lemma 14. α is decreased, as in (MC6), at least once. The solution returned in (MC7) is an f(1 + O(δ))-approximation.
Proof. Using Theorem 11, once we are in (MC7), multiplying the average of the y_p by 1/(1 + 4δ) gives a feasible solution for LP4 for the edge set E′(α). Moreover, for all paths p containing any edge in E′ − E′(α), we have y_p = 0. Therefore this new solution is a feasible solution of LP3. Therefore α/(1 + 4δ) ≤ α* once we reach the required number of iterations in (MC7). This proves that we must decrease α at least once, because α_0 is larger than (1 + 4δ)α* (Lemma 13).
The solution x corresponds to f α(1 + δ ).Since α is bounded above by α * (1 + 4δ ), the second part of the lemma follows as well.
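The grouped edge-introduction procedure used to pick the initial α (described before Lemma 14) can be sketched as follows. This is a minimal illustration only: the union-find structure, the function names, and the toy input are assumptions made for the example, not part of the paper.

```python
import math


class DisjointSet:
    """Union-find over nodes 0..n-1 (illustrative helper)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def first_connecting_group(n, edges, pairs):
    """edges: (u, v, weight) with weight >= 1; pairs: (s_i, t_i).

    Adds edges group by group, group z holding weights in (2^(z-1), 2^z],
    for decreasing z, and returns the first z after which some pair is
    connected (None if no pair is ever connected).
    """
    groups = {}
    for u, v, w in edges:
        z = math.ceil(math.log2(w))
        groups.setdefault(z, []).append((u, v))
    dsu = DisjointSet(n)
    for z in sorted(groups, reverse=True):
        for u, v in groups[z]:
            dsu.union(u, v)
        # One connectivity check per pair per group, not per edge.
        if any(dsu.find(s) == dsu.find(t) for s, t in pairs):
            return z
    return None


def initial_alpha(n, z, delta):
    """Initial value (1 + 4*delta) * 2^z * n^2 from the text."""
    return (1 + 4 * delta) * (2 ** z) * n * n
```

Because connectivity is tested once per group rather than once per edge, the number of pair checks is proportional to κ times the number of weight groups, which is the source of the κ log W term.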
Proof. If we decrease α, then at some point line (7) of Algorithm 2 provides a solution ≪ α*, which is infeasible. Note that the solution would have value f α. But this has to be at least α*. Thus α cannot decrease arbitrarily. Combined with the upper bound in Lemma 13, the result follows.
Lemma 16. Algorithm 2 returns an admissible (as defined in Theorem 11) y for LP4 with width ρ = m′/δ and ℓ = 1. Moreover, the set of assignments of y_p (over the different iterations) that were admissible for α = α_2 remains admissible if α is lowered to α_1 < α_2 and u is updated as described in (MC6).
Proof. Using Lemma 12, Algorithm 2 returns a y that satisfies conditions (i) and (ii) of Theorem 11. By construction, in Algorithm 2, y_p = α and only one y_p has a non-zero value. Since we removed all the edges of weight less than δα/m′, the width parameter is bounded by α m′/(δα) = m′/δ. Observe that ℓ = 1.
If α_1 < α_2, then E′(α_1) ⊇ E′(α_2), and therefore P(i, E′(α_1)) ⊇ P(i, E′(α_2)). Therefore, for the formulation LP4, we are adding new variables corresponding to new paths as well as new constraints corresponding to the newly added edges. We can interpret the y for α_2 as having 0 values for the new variables. This immediately satisfies (i), and it satisfies (iii) for the old constraints as well. Condition (iii) is satisfied for the newly introduced constraints because the old paths p with y_p > 0 for α_2 do not contain any edge in E′(α_1) − E′(α_2). Thus A_i y(t) = 0 for the new constraints, b = 1, and −ρ ≤ −1 ≤ ρ.
Algorithm 2 Oracle for LP4
1: Given weights u^t_jq, for (j, q) ∈ E′(α), define x_jq = α u^t_jq / ∑_{(j,q)∈E′(α)} w_jq u^t_jq.
2: Define the shortest-path metric d_x(·, ·) with the x_jq representing edge lengths. Define B(ζ, r) = {v | d_x(ζ, v) ≤ r}, which corresponds to a family of balls/regions centered at ζ, each of radius r. Let cut(B(ζ, r)) be the total weight of edges in E′(α) that are cut by B(ζ, r), i.e., the edges with exactly one endpoint in B(ζ, r).
3: Find a collection of regions B(ζ_1, r_1), ..., B(ζ_g, r_g), ... such that every r_g ≤ 1/3, each s_i belongs to some region, and ∑_g cut(B(ζ_g, r_g)) ≤ 3α ln(κ + 1). Lemma 17 shows us how to achieve this.
4: if for some i both s_i and t_i belong to the same region then
5:    Find the corresponding path p, which is of length at most 2/3 and thus violates the constraint. Return y_p = α. Implicitly return y_p′ = 0 for all other paths that involve the s_i-t_i pair.
6: else
7:    Return the union of the cuts defined by the balls (this corresponds to x). The edges in E′(α) contribute at most 3α ln(κ + 1). The edges in E′ − E′(α) contribute at most δα. The total is (3 ln(κ + 1) + δ)α.
Note that the return types are different, as outlined in the dual-primal framework in (a)-(c) earlier.
, the first term in the left hand side remains unchanged.The left hand side decreases for every new constraint, and the right hand side increases for every new constraint.
The next lemma arises from a result of Garg et al. [1993]; in this context, Z = α.
Lemma 17 (Garg et al. [1993]). Let Z = ∑_{(u,v)} x_uv w_uv. For r ≥ 0, let B(u, r) = {v | d_x(u, v) ≤ r}, where d_x is the shortest-path distance based on the values x_uv. Let vol(B(u, r)) be the volume of the ball B(u, r), in the sense of Garg et al. [1993]. If C = 3 ln(κ + 1), the ball stops growing before the radius becomes 1/3. We start this process for ζ_1 = s_1. Repeatedly, if some s_j is not in a ball, then we remove B(ζ_i, r_i) (all edges inside and those being cut) and continue the process with ζ_{i+1} = s_j on the remainder of the graph. The collection B(ζ_1, r_1), ..., B(ζ_g, r_g), ... satisfies the condition that r_g ≤ 1/3 for all g and ∑_g cut(B(ζ_g, r_g)) ≤ CZ.
The proof follows from the fact that cut(B(ζ, r)) is the derivative of vol(B(ζ, r)) as r increases, and the volume cannot increase by more than a factor of κ + 1, because it is at least Z/κ and cannot exceed Z/κ + Z. For nonnegative x_jq, the above algorithm runs in time Õ(m′) using standard shortest-path algorithms.
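The basic quantities used by the Oracle (the lengths x_jq of line 1 of Algorithm 2, the balls B(ζ, r), and cut(B(ζ, r))) can be computed as in the following sketch. It deliberately does not implement the radius selection of Lemma 17; the function names and the dictionary-based graph representation are assumptions made for illustration.

```python
import heapq


def edge_lengths(alpha, w, u):
    """x_jq = alpha * u_jq / sum(w_jq * u_jq), so that sum(w_jq * x_jq) = alpha."""
    total = sum(w[e] * u[e] for e in w)
    return {e: alpha * u[e] / total for e in w}


def distances_from(source, x, radius_limit):
    """Dijkstra on undirected edges with lengths x.

    Nodes farther than radius_limit from the source are never explored.
    """
    adj = {}
    for (j, q), length in x.items():
        adj.setdefault(j, []).append((q, length))
        adj.setdefault(q, []).append((j, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for nxt, length in adj.get(v, []):
            nd = d + length
            if nd <= radius_limit and nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def ball(dist, r):
    """B(zeta, r): nodes within shortest-path distance r of the source."""
    return {v for v, d in dist.items() if d <= r}


def cut_weight(region, w):
    """Total weight of edges with exactly one endpoint inside the region."""
    return sum(wt for (j, q), wt in w.items() if (j in region) != (q in region))
```

Restricting the search to radius at most 1/3 means each ball is explored only locally, in line with the near-linear running time stated above.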
Theorem 11 bounds the total number of iterations needed in (MC7) for a particular α in terms of ρ and δ. Observe that the algorithm repeatedly constructs a set of balls with non-negative weights, which can be performed in O(m′ log n) time. In each of these balls, with m edges, we can find the shortest path in O(m log n) time (to find the violated pair s_i-t_i). Summed over the balls, each iteration can be performed in O(m′ log n) time. Coupled with the approximation introduced by a sparsifier, setting δ = O(ε), we get:
Theorem 18. There exists a single-pass O(log κ)-approximation algorithm for the multicut problem in the dynamic semi-streaming model that runs in Õ(n^2 ε^{−7} log^2 W) time and Õ(n ε^{−2} log W + κ) space.

min-disagree with Arbitrary Weights
In this section, we prove the following theorem:
Theorem 19. There is a 3(1 + ε) log |E^−|-approximation algorithm for min-disagree that requires Õ((nε space, and a single pass.
Consider the dual of LP1, where P = ∪_{ij∈E^−} P_ij(H^+).
We attempt to find an approximate feasible solution to LP6 for a large value of α. If the Oracle fails to make progress, then it provides a solution to LP1 of value f · α. In that case we set α ← α/(1 + δ) and try the Oracle again. Note that if we lower α, then the Oracle invocations for larger values of α continue to remain valid; if α_1 ≤ α_2, then P_ij(H^+(α_1)) ⊇ P_ij(H^+(α_2)), exactly along the lines of Lemma 16.
Eventually we lower α sufficiently that we have a feasible solution to LP6, and we can claim Theorem 19 exactly along the lines of Theorem 18. The Oracle is provided in Algorithm 3 and relies on the following lemma: Using the definition of d_x() and B() as in Lemma 17, let

Convex Programming in Small Space: max-agree
In this section we discuss an SDP-based algorithm for max-agree. We will build upon our intuition in Section 3, where we developed a linear-program-based algorithm for min-disagree. However, several steps, such as the switching of primals and duals, will not be necessary, because we will use a modified version of the multiplicative-weight update algorithm for SDPs as described by Steurer [2010]. As will become clear, the switch of primals and duals is already achieved in the internal working of Steurer [2010]. Consider:
Definition 1. For matrices X, Z, let X • Z denote ∑_{i,j} X_ij Z_ij, let X ⪰ 0 denote that X is positive semidefinite, and let X ⪰ Z denote X − Z ⪰ 0.
A semidefinite decision problem in canonical form is:
where C, X ∈ R^{n×n} and g ∈ R^q_+. Denote the set of feasible solutions by 𝒳. Typically we are interested in the Cholesky decomposition of X, a set of n vectors {x_i} such that X_ij = x_i^T x_j. Consider the following theorem:
Theorem 22 (Steurer [2010]). Let D be a fixed diagonal matrix with positive entries and assume 𝒳 is nonempty. Suppose there is an Oracle that for each positive semidefinite X either (a) tests and declares X to be approximately feasible -for all 1 ≤ i ≤ q, we have
The above theorem does not explicitly discuss maintaining a set of multipliers. But interestingly, the algorithm in Steurer [2010] that proves Theorem 22 can be viewed as a dual-primal algorithm. This algorithm collects separating hyperplanes to solve the dual of the SDP: on failure to provide such a hyperplane, the algorithm provides a primal feasible X. The candidate X generated by the algorithm is an exponential of the (suitably scaled) averages of the hyperplanes (A, b): this would be the case if we were applying the multiplicative-weight update paradigm to the dual of the SDP in canonical form! Therefore, along with maximum matching [Ahn and Guha, 2018] and min-disagree (Section 3), we have yet another example where switching the primal and the dual formulations helps. However, in all of these cases, we need to prove that we can produce a feasible primal solution in a space-efficient manner when the Oracle (for the dual) cannot produce a candidate.
We use Lemma 6 and edge set H = H^+ ∪ H^−. Let w^h_ij correspond to the weight of an edge ij ∈ H. Our SDP for max-agree is:
If two vertices, i and j, are in the same cluster, their corresponding vectors x_i and x_j will coincide, so X_ij = 1; on the other hand, if they are in different clusters, their vectors should be orthogonal, so X_ij = 0. Observe that under the restriction X_ii = X_jj = 1, the contribution of an ij ∈ H^− is X_ii + X_jj − 2X_ij = (1 − X_ij), as intended. However, this formulation helps prove that the width is small.
Definition 2. Define d_i = ∑_{j: ij∈H} |w^h_ij| and ∑_i d_i = 2W. Let D be the diagonal matrix with D_ii = d_i/2W.
A random partition of the graph provides a trivial 1/2-approximation for maximizing agreements. Letting W be the total weight of edges in H, the sparsified graph, we perform binary search for α ∈ [W/2, W], and stop when the interval is of size δW. This increases the running time by an O(log δ^{−1}) factor.
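As a small illustration of Definition 2 and of the binary search over α, the sketch below computes the weighted degrees d_i, the diagonal entries D_ii = d_i/2W, and the number of halving steps needed to shrink an interval of length W/2 to length δW. The function names and the edge-list representation are hypothetical, chosen only for this example.

```python
def degree_normalisation(weighted_edges, n):
    """Return (d, D_diag) with d_i = sum of |w_ij| over edges at i, D_ii = d_i / 2W."""
    d = [0.0] * n
    for i, j, w in weighted_edges:
        d[i] += abs(w)
        d[j] += abs(w)
    two_w = sum(d)  # every edge is counted at both endpoints, so this is 2W
    return d, [di / two_w for di in d]


def binary_search_steps(delta):
    """Halvings needed to shrink [W/2, W] to an interval of size delta * W."""
    length, steps = 0.5, 0  # interval length measured in units of W
    while length > delta:
        length /= 2
        steps += 1
    return steps
```

With this normalisation the diagonal entries of D sum to 1, and the step count grows as log(1/δ), matching the O(log δ^{−1}) overhead mentioned above.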
The diagonal matrix D specified in Definition 2 sets up the update algorithm of Steurer [2010]. The choice of D will be critical to our algorithm: typically, this D determines the "path" taken by the SDP solver, since D alters the projection to density matrices. Summarizing, Theorem 23 follows from the Oracle provided in Algorithm 4. The final solution only guarantees x_i • x_j ≥ −δ. Even though the standard rounding algorithm assumes X_ij ≥ 0, the fractional solution with X_ij ≥ −δ can be rounded efficiently. Ensuring x_i • x_j ≥ 0 appears to be difficult (or to require a substantially different oracle).
Algorithm 4 Oracle for SDP.
1: For the separating hyperplane, we only describe non-zero entries in A.
Recall that we have a candidate X where
On the other hand, for a feasible X′, ‖x′_i‖^2 = 1 for all i. Hence A • X′ = ∑_{i∈S_1} −d_i/Δ_1 = −1. This proves that the oracle is δ-separating when it returns from line 7. For lines 10 and 13, the proof is almost identical.
For line 17, we do not use the violated constraints; instead we use C′ to construct A, and show that C′ • X′ ≥ (1 − 3δ)α. We start from the fact that C • X′ ≥ α, since X′ is feasible for the SDP. By removing all nodes in S_1, we remove all edges incident on the removed nodes. The total weight of removed edges is bounded by Δ_1, which in this case is less than δα. Similarly, we lose at most δα for each of S_2 and S_3. Hence, the difference between C′ • X′ and C • X′ is bounded by 3δα, and so C′ • X′ ≥ (1 − 3δ)α.
Lemma 25. Algorithm 4 satisfies criterion (ii) of Theorem 22, i.e., ρD ⪰ A − bD ⪰ −ρD for some ρ = O(1/δ).
Proof. Since |b| ≤ 1, it suffices to show that for every positive semidefinite Y, |A • Y| ≤ ρ D • Y. For line 7, the proof is straightforward. To start, A is a diagonal matrix where
The proof is identical for line 10.
For lines 13 and 17, consider the decomposition of Y, i.e., {y_i} such that Y_ij = y_i • y_j. We use the fact that y_i • y_j ≤ ‖y_i‖^2 + ‖y_j‖^2 for every pair of vectors y_i and y_j. Therefore, for Y_ij = y_i • y_j, we have at line 13,
With H^+ and H^− as modified by line 15, Lemmas 24 and 25, in conjunction with Theorem 22, prove Theorem 23.
The update procedure of Steurer [2010] maintains (and defines) the candidate X implicitly. In particular, it uses matrices of dimension n × d, in which every entry is a (scaled) Gaussian random variable. The algorithm also uses a precision parameter r (the degree of the polynomial approximation used to represent matrix exponentials). Assuming that T_M is the time for a multiplication between a returned A and some vector, the update process computes the t-th X in time O(t · r · d · T_M), a quadratic dependence on t in total. We will ensure that any returned A has at most m′ nonzero entries, and therefore T_M = O(m′). The algorithm requires space that is sufficient to represent a linear combination of the matrices A that are returned in the different iterations. We can bound ρ = O(1/δ), and therefore the total number of iterations is Õ(δ^{−4}). For our purposes, in max-agree we will have d = O(δ^{−2} log n), r = O(log δ^{−1}), and T_M = O(m′), giving us an Õ(n δ^{−10})-time and Õ(n δ^{−2})-space algorithm. However, unlike the general X used in Steurer's approach, in our oracle the X is used in a very specific way. This leaves open the question of determining the exact space-versus-running-time tradeoff.
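To make the cost accounting O(r · d · T_M) per candidate concrete, the following sketch applies a degree-r polynomial approximation of a matrix exponential to d Gaussian vectors using only sparse matrix-vector products. It is an illustration under stated assumptions: a truncated Taylor series stands in for the polynomial approximation, the 1/√d scaling of the Gaussian entries is a common convention rather than something taken from the text, and all names are hypothetical. It is not Steurer's actual update procedure.

```python
import math
import random


def sparse_matvec(entries, v):
    """Multiply a sparse matrix {(i, j): a_ij} by a dense vector v."""
    out = [0.0] * len(v)
    for (i, j), a in entries.items():
        out[i] += a * v[j]
    return out


def exp_times_vector(entries, v, r):
    """Approximate exp(M) v by the Taylor polynomial of degree r.

    Uses r sparse matrix-vector products, i.e. O(r * T_M) time.
    """
    result = list(v)
    term = list(v)
    for k in range(1, r + 1):
        term = [t / k for t in sparse_matvec(entries, term)]  # M^k v / k!
        result = [a + b for a, b in zip(result, term)]
    return result


def gaussian_sketch(entries, n, d, r, seed=0):
    """Return d columns, each approximating exp(M) g for a Gaussian vector g."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        g = [rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(n)]
        columns.append(exp_times_vector(entries, g, r))
    return columns
```

Only the sparse matrix and the n × d sketch are stored here, which is the kind of implicit representation of the candidate that the space bound relies on.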
Rounding the Fractional Solution: Note that the solution of the SDP found above is only approximately feasible. Since the known rounding algorithms cannot be applied in a black-box fashion, the following lemma establishes the guarantee of the rounding algorithm.
Proof. We show that the rounding algorithm returns a clustering with at least 0.7666(1 − O(δ)) C′ • X agreements. Combined with the fact that C′ • X > (1 − 4δ)α (line 19), we obtain the desired result.
Since we deal with C′ instead of C, we can ignore all nodes and edges in S_1, S_2, and S_3. We first rescale the vectors in X to be unit vectors. Since all vectors that are not ignored (not in S_1 nor S_2) have length between 1 − O(δ) and 1 + O(δ) (since we take the square root), this only changes the objective value by O(δ w_ij) for each edge. Hence the total decrease is bounded by O(δW) = O(δα).
We then (1) first change the objective value of edges (i, j) with −δ < x i • x j < 0 by ignoring them, and only then (2) consider fixing the violated constraints x i • x j < 0 to produce a feasible or integral solution.
Step (1) decreases the objective value by at most δ|w_ij| for each negative edge. Again, the objective value decreases by at most O(δα). For step (2) we use Swamy's rounding algorithm [Swamy, 2004], which obtains a 0.7666 approximation factor. The constraint x_i • x_j ≥ 0 required by Swamy's algorithm is not satisfied for some edges. However, the rounding algorithm is based on random hyperplanes, and the probability that x_i and x_j are split by a hyperplane only increases as x_i • x_j decreases. For positive edges, we already accounted for this in step (1), when the value of the edge was made 0. For negative edges, the probability that i and j land in different clusters only increases by having negative x_i • x_j, but again, the contribution to the objective is still 0. Therefore, we obtain a clustering that has at least 0.7666(1 − O(δ)) C′ • X agreements.
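The monotonicity argument above concerns rounding by random hyperplanes. The sketch below shows generic random-hyperplane rounding, in which nodes are grouped by the side of each hyperplane on which their vectors fall. It is not Swamy's exact procedure (whose details are not given here); the function name and the choice of the number of hyperplanes are assumptions for illustration.

```python
import random


def hyperplane_clusters(vectors, num_hyperplanes, seed=0):
    """Cluster indices by the sign pattern of their vectors against random hyperplanes."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    normals = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(num_hyperplanes)]
    clusters = {}
    for idx, x in enumerate(vectors):
        # Side of each hyperplane on which x lies.
        signature = tuple(sum(a * b for a, b in zip(h, x)) >= 0 for h in normals)
        clusters.setdefault(signature, []).append(idx)
    return list(clusters.values())
```

A single random hyperplane separates two unit vectors with probability equal to their angle divided by π, so a smaller inner product can only make a split more likely, which is the property the proof uses.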

Multipass Algorithms
In this section, we present O(log log n)-pass algorithms for min-disagree on unit-weight graphs; these apply to both a fixed and an unrestricted number of clusters. In each pass over the data, the algorithm is presented with the same input, although not necessarily in the same order.

min-disagree with Unit Weights
Consider the 3-approximation algorithm for min-disagree on unit-weight graphs due to Ailon et al. [2008].
1: Let v_1, ..., v_n be a uniformly random ordering of V. Let U ← V be the set of "uncovered" nodes.
2: for i = 1 to n do
3:    if v_i ∈ U then define the cluster C_i ← {v_i} ∪ {v_j ∈ U : v_i v_j is a positive edge} and let U ← U \ C_i.
It may appear that emulating the above algorithm in the data stream model requires Ω(n) passes, since determining whether v_i should be chosen may depend on whether v_j is chosen for each j < i. However, we will show that O(log log n) passes suffice. This improves upon a result by Chierichetti et al. [2014], who developed a modification of the algorithm that used O(ε^{−1} log^2 n) streaming passes and returned a (3 + ε)-approximation, rather than a 3-approximation. Our improvement is based on the following lemma:
• After the (2j)-th pass we have simulated the first t_j iterations of Ailon et al.'s algorithm.
Since t_j ≥ n for j = 1 + log log n, our algorithm terminates after O(log log n) passes.
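The pass count can be checked numerically. The sketch below assumes t_j = (2n)^{1 − 1/2^j}; this form is an assumption, chosen because it agrees with the value t_1 = (2n)^{1−1/2} used in the proof of Theorem 28 and with the claim that t_j ≥ n once j = 1 + log log n. The function name is hypothetical.

```python
def passes_needed(n):
    """Smallest j with t_j >= n, assuming t_j = (2n)^(1 - 1/2^j).

    Returns (j, 2 * j); the second value is the number of passes if the
    first t_j iterations are simulated after the (2j)-th pass.
    """
    j = 1
    while (2 * n) ** (1 - 0.5 ** j) < n:
        j += 1
    return j, 2 * j
```

For n = 2^16 this gives j = 5 = 1 + log log n, so the number of simulated iterations reaches n after a number of passes that is doubly logarithmic in n.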
Theorem 28. On a unit-weight graph, there exists an O(log log n)-pass semi-streaming algorithm that returns, with high probability, a 3-approximation to min-disagree.
Proof. In the first pass, we need to store at most t_1^2 = ((2n)^{1−1/2})^2 = 2n edges. For the odd-numbered passes after the first pass, by Lemma 27, the space is at most 5 with high probability. The additional space used in the even-numbered passes is trivially bounded by O(n log n). The approximation factor follows from the analysis of Ailon et al. [2008].
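For reference, the algorithm of Ailon et al. [2008] that the multipass scheme simulates is usually stated as a random-pivot procedure. The following offline sketch follows that standard statement for unit weights (every pair that is not a positive edge is treated as negative); the function names and the set-of-frozensets edge representation are assumptions for this example, and the sketch makes no attempt to emulate the streaming passes.

```python
import random


def pivot_clustering(nodes, positive, seed=0):
    """Random-pivot clustering; positive is a set of frozenset({u, v}) edges."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)  # uniformly random ordering v_1, ..., v_n
    uncovered = set(nodes)
    clusters = []
    for v in order:
        if v not in uncovered:
            continue  # v was absorbed by an earlier pivot
        cluster = {v} | {u for u in uncovered if frozenset((u, v)) in positive}
        clusters.append(cluster)
        uncovered -= cluster
    return clusters


def disagreements(clusters, nodes, positive):
    """Count pairs whose sign disagrees with the clustering (unit weights)."""
    label = {v: c for c, cl in enumerate(clusters) for v in cl}
    nodes = list(nodes)
    count = 0
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            same = label[nodes[a]] == label[nodes[b]]
            is_positive = frozenset((nodes[a], nodes[b])) in positive
            if same != is_positive:
                count += 1
    return count
```

On an input consisting of disjoint positive cliques, every pivot recovers its whole clique, so the output has no disagreements regardless of the random order.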

min-disagree k with Unit Weights
Our result in this section is based on the following algorithm of Giotis and Guruswami [2006], which returns a (1 + ε)-approximation for min-disagree_k on unit-weight graphs. Their algorithm is as follows:
1. Sample r = poly(1/ε, k)
We first observe that the above algorithm can be emulated in min(k − 1, log n) passes in the data stream model. To emulate each recursive step in one pass, we simply choose S at the start of the stream and then collect all edges incident on S. We then use the disagree oracle developed in Section 2.1 to find the best possible partitions during post-processing. It is not hard to argue that this algorithm terminates in O(log n) rounds, independent of k: call clusters with fewer than n/(2k) nodes "small", and those with at least n/(2k) nodes "large". Observe that the number of nodes in small clusters halves in each round, since there are at most k − 1 small clusters and each has at most n/(2k) nodes. This would suggest a min(k − 1, log n)-pass data stream algorithm, with one pass to emulate each round of the offline algorithm. However, the next theorem shows that the algorithm can actually be emulated in min(k − 1, log log n) passes.
Proof. To design an O(log log n)-pass algorithm, we proceed as follows. At the start of the i-th pass, suppose we have k′ clusters still to determine and that V_i is the set of remaining nodes that have not yet been included in large clusters. We will pick k′ random sets of samples S_1, ..., S_k′ in parallel from V_i, each of size
is unknown to Alice. Any one-way protocol from Alice to Bob that allows Bob to learn x_{i,j} requires Ω(n^2) bits of communication [Ablayev, 1996].
Consider the protocol for INDEX where Alice creates a graph G over nodes V = {v_1, ..., v_n} and adds edges {{v_i, v_j} : x_{i,j} = 1}, each with weight −1. She runs a data stream algorithm on G and sends the state of the algorithm to Bob, who adds positive edges {u, v_i} and {u, v_j}, where u is a new node. All edges without a specified weight are treated as not present, or equivalently as having weight zero. Hence the set of weights used in this graph is {−1, 0, +1}. Now, if x_{i,j} = 0, then min-disagree(G) = 0: consider the partition containing {u, v_i, v_j}, with each other node forming a singleton cluster. Alternatively, x_{i,j} = 1 implies min-disagree(G) ≥ 1, since every clustering must disagree with one of the three edges on {u, v_i, v_j}. It follows that every data stream algorithm returning a multiplicative estimate of min-disagree(G) requires Ω(n^2) space.
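The reduction from INDEX can be verified by brute force on a tiny instance. The sketch below builds the graph described above and minimises disagreements over all partitions; the helper names, the node labels, and the use of an upper-triangular bit matrix are illustrative assumptions, and the enumeration is exponential, so it is only meant for a handful of nodes.

```python
def partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        for k in range(len(p)):
            yield p[:k] + [[first] + p[k]] + p[k + 1:]
        yield [[first]] + p


def disagree(weights, partition):
    """Total |weight| of edges that disagree with the partition."""
    label = {v: c for c, block in enumerate(partition) for v in block}
    total = 0
    for (a, b), w in weights.items():
        same = label[a] == label[b]
        if (w > 0 and not same) or (w < 0 and same):
            total += abs(w)
    return total


def index_instance(x, i, j):
    """Alice's negative edges for x[a][b] == 1 (a < b) plus Bob's two positive edges."""
    n = len(x)
    weights = {}
    for a in range(n):
        for b in range(a + 1, n):
            if x[a][b] == 1:
                weights[(f"v{a}", f"v{b}")] = -1
    weights[("u", f"v{i}")] = 1
    weights[("u", f"v{j}")] = 1
    nodes = ["u"] + [f"v{a}" for a in range(n)]
    return nodes, weights


def min_disagree(nodes, weights):
    return min(disagree(weights, p) for p in partitions(nodes))
```

Distinguishing the value 0 from a positive value is exactly what a multiplicative estimate must do, which is why Bob can recover the queried bit.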
Proof. The proof uses a reduction from the communication problem DISJ, where Alice and Bob have strings x, y ∈ {0, 1}^n and wish to determine whether there exists an i such that x_i = y_i = 1. Any p-round protocol between Alice and Bob requires Ω(n) bits of communication [Kalyanasundaram and Schnitger, 1992], and hence there must be a message of Ω(n/p) bits.
Consider the protocol for DISJ on a graph G with nodes V = {a_1, ..., a_n, b_1, ..., b_n, c_1, ..., c_n}. For each i ∈ [n], Alice adds an edge {a_i, b_i} with weight (−1)^{x_i+1}. She runs a data stream algorithm on G and sends the state of the algorithm to Bob. For each i ∈ [n], Bob adds an edge {b_i, c_i} of weight (−1)^{y_i+1}, along with negative edges {{a_i, c_i} : i ∈ [n]} ∪ {{u, v} : u ∈ {a_i, b_i, c_i}, v ∈ {a_j, b_j, c_j}, i ≠ j}.
Note that min-disagree(G) > 0 iff there exists i with x i = y i = 1.Were there no such i, the positive edges would all be isolated, whereas if x i = y i = 1 then every partition violates one of the edges on {a i , b i , c i }.
It follows that every p-pass data stream algorithm returning a multiplicative estimate of min-disagree(G) requires Ω(n/p) space.
Next we show a lower bound that applies when the number of negative-weight edges is bounded. This shows that our upper bound in Theorem 19 is essentially tight.
Theorem 32. A one-pass stream algorithm that tests whether min-disagree(G) = 0, with probability at least 9/10, requires Ω(n + |E^−|) bits if the permitted weights are {−1, 0, 1}.
Finally, we show that the data structure for evaluating 2-clusterings of arbitrarily weighted graphs (Section 2.3) cannot be extended to clusterings with more clusters.
Theorem 33.When |C| = 3, a data structure that returns a multiplicative estimate of disagree(G, C) with probability at least 9/10, requires Ω(n 2 ) space.
Proof. We show a reduction from the communication problem INDEX, where Alice has a string x ∈ {0, 1}^{n^2} indexed as [n] × [n] and Bob wants to learn x_{i,j} for some i, j ∈ [n] that is unknown to Alice. A one-way protocol from Alice to Bob that allows Bob to learn x_{i,j} requires Ω(n^2) bits of communication [Ablayev, 1996]. Consider the protocol for INDEX where Alice creates a graph G over nodes V = {a_1, ..., a_n, b_1, ..., b_n} and adds edges {a_u b_v : x_{u,v} = 1}, each with weight −1. She runs a data stream algorithm on G and sends the state of the algorithm to Bob, who then queries the partition C = {{a_i, b_j}, {a_ℓ : ℓ ≠ i}, {b_ℓ : ℓ ≠ j}}. Since disagree(G, C) = x_{i,j}, it follows that every data stream algorithm returning a multiplicative estimate of disagree(G, C) requires Ω(n^2) space.
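The identity disagree(G, C) = x_{i,j} for the queried 3-clustering can be checked directly. The sketch below is a small verification under illustrative assumptions: nodes are represented as tagged pairs ("a", u) and ("b", v), indices start at 0, and the function names are hypothetical.

```python
def alice_graph(x):
    """Negative edge a_u b_v for every x[u][v] == 1."""
    n = len(x)
    return {(("a", u), ("b", v)): -1
            for u in range(n) for v in range(n) if x[u][v] == 1}


def three_clustering(n, i, j):
    """C = {{a_i, b_j}, {a_l : l != i}, {b_l : l != j}}."""
    return [
        {("a", i), ("b", j)},
        {("a", l) for l in range(n) if l != i},
        {("b", l) for l in range(n) if l != j},
    ]


def disagree(weights, clusters):
    """Total |weight| of edges that disagree with the clustering."""
    label = {v: c for c, cluster in enumerate(clusters) for v in cluster}
    total = 0
    for (a, b), w in weights.items():
        same = label[a] == label[b]
        if (w > 0 and not same) or (w < 0 and same):
            total += abs(w)
    return total
```

The only pair consisting of one a-node and one b-node that shares a cluster is (a_i, b_j), so the only negative edge that can be violated is the one encoding x_{i,j}.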
(a) We apply the multiplicative-weight framework to the Dual LP and try to find an approximately feasible solution y such that c T y ≥ (1 − O(δ ))α and Ay ≤ b, y ≥ 0.
Proof. A lower bound of Ω(|E^−|) follows by considering the construction in Theorem 30 on |E^−| nodes. A lower bound of Ω(n) when n ≥ |E^−| follows by considering the construction in Theorem 31 without adding the negative edges {uv : u ∈ {a_i, b_i, c_i}, v ∈ {a_j, b_j, c_j}, i ≠ j}.

Table 1: Summary of approximation results in this paper.