1 Introduction

Given an undirected graph G(V, E), the Max Cut problem asks for a partition of the vertices of G into two sets, such that the number of edges with exactly one endpoint in each set of the partition is maximized. This problem can be naturally generalized to weighted (undirected) graphs. A weighted graph is denoted by \(G (V, E, {\textbf{W}})\), where V is the set of vertices, E is the set of edges and \({\textbf{W}}\) is a weight matrix, which specifies a weight \({\textbf{W}}_{i,j}\) for each pair of vertices i, j. In particular, we assume that \({\textbf{W}}_{i,j}=0\) for each pair \(\{i,j\} \notin E\).

Definition 1

(Weighted Max Cut) Given a weighted graph \(G (V, E, {\textbf{W}})\), find a partition of V into two (disjoint) subsets A, B, so as to maximize the cumulative weight of the edges of G having one endpoint in A and the other in B.

Weighted Max Cut is fundamental in theoretical computer science and is relevant to various graph layout and embedding problems [1]. Furthermore, it has many practical applications, including infrastructure cost and circuit layout optimization in network and VLSI design [2], minimizing the Hamiltonian of a spin glass model in statistical physics [3], and data clustering [4]. In the worst case, Max Cut (and hence also Weighted Max Cut) is APX-hard, meaning that, unless P = NP, there is no polynomial-time approximation scheme that finds a solution arbitrarily close to the optimum [5].

The average case analysis of Max Cut, namely the case where the input graph is chosen at random from a probabilistic space of graphs, is also of considerable interest and is further motivated by the desire to justify and understand why various graph partitioning heuristics work well in practical applications. In most research works the input graphs are drawn from the Erdős-Rényi random graphs model \({\mathcal G}_{n, m}\), i.e. random instances are drawn equiprobably from the set of simple undirected graphs on n vertices and m edges, where m is a linear function of n (see also [6, 7] for the average case analysis of Max Cut and its generalizations with respect to other random graph models). One of the earliest results in this area is that Max Cut undergoes a phase transition on \({\mathcal G}_{n, \gamma n}\) at \(\gamma =\frac{1}{2}\) [8], in that the difference between the number of edges of the graph and the Max-Cut size is O(1), for \(\gamma <\frac{1}{2}\), while it is \(\Omega (n)\), when \(\gamma > \frac{1}{2}\). For large values of \(\gamma \), it was proved in [9] that the maximum cut size of \({\mathcal G}_{n, \gamma n}\), normalized by the number of vertices n, converges in probability to a limit as \(n \rightarrow \infty \), but it was not until recently that this limit was established and expressed analytically in [10], using the interpolation method; in particular, the maximum cut size was shown to be asymptotically equal to \((\frac{\gamma }{2}+P_* \sqrt{\frac{\gamma }{2}})n\), where \(P_* \approx 0.7632\). We note, however, that these results are existential, and thus do not lead to an efficient approximation scheme for finding a tight approximation of the maximum cut with large enough probability when the input graph is drawn from \({\mathcal G}_{n, \gamma n}\). An efficient approximation scheme in this case was designed in [8], and it was proved that, with high probability, this scheme constructs a cut with at least \(\left( \frac{\gamma }{2} + 0.37613 \sqrt{\gamma }\right) n = (1+0.75226 \frac{1}{\sqrt{\gamma }}) \frac{\gamma }{2}n\) edges, noting that \(\frac{\gamma }{2}n\) is the expected size of a random cut (in which each vertex is placed independently and equiprobably in one of the two sets of the partition). Whether there exists an efficient approximation scheme that can close the gap between the approximation guarantee of [8] and the limit of [10] remains an open problem.

In this paper, we study the average case analysis of Weighted Max Cut when input graphs are drawn from the weighted random intersection graphs model (the unweighted version of the model was initially defined in [11]), which is defined below. In this model, each vertex is assigned a set of labels, two vertices are connected by an edge whenever their label sets intersect, and the weight of an edge equals the number of labels shared by its endpoints.

Definition 2

(Weighted random intersection graph) Consider a universe \({\mathcal M} = \{1, 2, \ldots , m\}\) of labels and a set of n vertices V. We define the \(m \times n\) representation matrix \({\textbf{R}}\) whose entries are independent Bernoulli random variables with probability of success p. For \(\ell \in {\mathcal M}\) and \(v \in V\), we say that vertex v has chosen label \(\ell \) iff \({\textbf{R}}_{\ell , v}=1\). Furthermore, we draw an edge with weight \([{\textbf{R}}^T {\textbf{R}}]_{v,u}\) between any two vertices u, v for which this weight is strictly larger than 0. The weighted graph \(G = (V, E, {\textbf{R}}^T {\textbf{R}})\) is then a random instance of the weighted random intersection graphs model \(\overline{\mathcal G}_{n, m, p}\).
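For concreteness, a random instance of \(\overline{\mathcal G}_{n, m, p}\) can be sampled directly from Definition 2; the following minimal sketch (in Python with NumPy) does so, where the function name sample_weighted_rig and the choice of parameters are ours and purely illustrative.

import numpy as np

def sample_weighted_rig(n, m, p, seed=None):
    # Sample the m x n representation matrix R with independent Bernoulli(p) entries
    # and return it together with the weight matrix W = R^T R (Definition 2).
    rng = np.random.default_rng(seed)
    R = (rng.random((m, n)) < p).astype(int)   # R[l, v] = 1 iff vertex v has chosen label l
    W = R.T @ R                                # W[u, v] = |S_u ∩ S_v|; u, v adjacent iff W[u, v] > 0 and u != v
    return R, W

# Example: n = 10 vertices, m = 10 labels, p = 2/n.
R, W = sample_weighted_rig(n=10, m=10, p=0.2, seed=0)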

Random intersection graphs capture social networking scenarios quite naturally: vertices are the individual actors and labels correspond to specific types of interdependency. Other applications include oblivious resource sharing in a (general) distributed setting, efficient and secure communication in sensor networks [12], interactions of mobile agents traversing the web, etc. (see e.g. the survey papers [13, 14] for further motivation and recent research related to random intersection graphs). In all these settings, weighted random intersection graphs additionally capture the strength of connections between actors (e.g. in a social network, individuals having several characteristics in common have closer relationships than those sharing only a few common characteristics). One of the most celebrated results in this area is the equivalence (measured in terms of total variation distance) of random intersection graphs and Erdős-Rényi random graphs when the number of labels satisfies \(m = n^{\alpha }, \alpha >6\) [15]. This bound on the number of labels was improved in [16], where it was proved that the total variation distance between the two models tends to 0 when \(m = n^{\alpha }, \alpha >4\). Furthermore, [17] proved the equivalence of sharp threshold functions between the two models for \(\alpha \ge 3\). Similarity of the two models has been proved even for smaller values of \(\alpha \) (e.g. for any \(\alpha > 1\)) in the form of various translation results (see e.g. Theorem 1 in [18]), suggesting that some algorithmic ideas developed for Erdős-Rényi random graphs also work for random intersection graphs (and also weighted random intersection graphs).

In view of this, in the present paper we study the average case analysis of Weighted Max Cut under the weighted random intersection graphs model, for the range \(m=n^{\alpha }, \alpha \le 1\) for two main reasons: First, the average case analysis of Max Cut has not been considered in the literature so far when the input is drawn from the random intersection graphs model, and thus the asymptotic behaviour of the maximum cut remains unknown especially for the range of values where random intersection graphs and Erdős-Rényi random graphs differ the most. Furthermore, studying a model where we can implicitly control its intersection number (indeed m is an obvious upper bound on the number of cliques that can cover all edges of the graph) may help understand algorithmic bottlenecks for finding maximum cuts in Erdős-Rényi random graphs.

Second, we note that the representation matrix \({\textbf{R}}\) of a weighted random intersection graph can be used to define a random set system \(\Sigma \) consisting of m sets \(\Sigma =\{L_1, \ldots , L_m\}\), where \(L_{\ell }\) is the set of vertices that have chosen label \(\ell \); we say that \({\textbf{R}}\) is the incidence matrix of \(\Sigma \). Therefore, there is a natural connection between Weighted Max Cut and the discrepancy of such random set systems, which we formalize in this paper. In particular, given a set system \(\Sigma \) with incidence matrix \({\textbf{R}}\), its discrepancy is defined as \(\text {disc}(\Sigma ) = \min _{{\textbf{x}} \in \{\pm 1\}^n} \max _{L \in \Sigma } \left|\sum _{v \in L} x_v \right|= \min _{{\textbf{x}} \in \{\pm 1\}^n}\Vert {\textbf{R}} {\textbf{x}} \Vert _{\infty }\), i.e. it is the minimum, over all 2-colorings \({\textbf{x}}\), of the maximum imbalance among the sets in \(\Sigma \). Recent work on the discrepancy of random rectangular matrices defined as above [19] has shown that, when the number of labels (sets) m satisfies \(n \ge 0.73 m \log {m}\), the discrepancy of \(\Sigma \) is at most 1 with high probability. The proof of the main result in [19] is based on a conditional second moment method combined with Stein’s method of exchangeable pairs, and improves upon a Fourier analytic result of [20], and also upon previous results in [21, 22]. The design of an efficient algorithm that can find a 2-coloring having discrepancy O(1) in this range still remains an open problem. Approximation algorithms for a similar model of random set systems were designed and analyzed in [23]; however, the algorithmic ideas there do not apply in our case.

1.1 Our Contribution

In this paper, we introduce the model of weighted random intersection graphs and we study the average case analysis of Weighted Max Cut through the prism of Discrepancy of random set systems. We formalize the connection between these two combinatorial problems for the case of arbitrary weighted intersection graphs in Corollary 1. We prove that, given a weighted intersection graph \(G = (V,E,{\textbf{R}}^T {\textbf{R}})\) with representation matrix \({\textbf{R}}\), and a set system \(\Sigma \) with incidence matrix \({\textbf{R}}\), such that \(\text {disc}(\Sigma ) \le 1\), a 2-coloring has maximum cut weight in G if and only if it achieves minimum discrepancy in \(\Sigma \). In particular, Corollary 1 applies in the range of values considered in [19] (i.e. \(n \ge 0.73\,m \log {m}\)), and thus any algorithm that finds a maximum cut in \(G(V,E,{\textbf{R}}^T {\textbf{R}})\) with large enough probability can also be used to find a 2-coloring with minimum discrepancy in a set system \(\Sigma \) with incidence matrix \({\textbf{R}}\), with the same probability of success.

We then consider weighted random intersection graphs in the case \(m = n^{\alpha }, \alpha \le 1\), and we prove that the maximum cut weight of a random instance \(G(V,E,{\textbf{R}}^T {\textbf{R}})\) of \(\overline{{\mathcal G}}_{n, m, p}\) concentrates around its expected value (see Theorem 2). In particular, with high probability over the choices of \({\textbf{R}}\), \(\texttt {Max-Cut}(G) \sim \mathbb {E}_{{\textbf{R}}}[\texttt {Max-Cut}(G)]\), where \(\mathbb {E}_{{\textbf{R}}}\) denotes expectation with respect to \({\textbf{R}}\). The proof is based on the Efron-Stein inequality for upper bounding the variance of the maximum cut. As a consequence of our concentration result, we prove in Theorem 3 that, in the case \(\alpha <1\), a random 2-coloring (i.e. bipartition) \({\textbf{x}}^{(rand)}\), in which each vertex chooses its color independently and equiprobably, has cut weight asymptotically equal to \(\texttt {Max-Cut}(G)\), with high probability over the choices of \({\textbf{x}}^{(rand)}\) and \({\textbf{R}}\).

The latter result on random cuts allows us to focus the analysis of our randomized algorithms of Sect. 4 on the case \(m=n\) (i.e. \(\alpha =1\)), and \(p = \frac{c}{n}\), for some constant c (see also the discussion at the end of Sect. 3.1), where the assumptions of Theorem 3 do not hold. It is worth noting that, in this range of values, the expected weight of a fixed pair of vertices in a weighted random intersection graph is equal to \(mp^2 = \Theta (1/n)\), and thus we hope that our work here will serve as an intermediate step towards understanding when algorithmic bottlenecks for Max Cut appear in sparse random graphs (especially Erdős-Rényi random graphs) with respect to the intersection number. In particular, in Sect. 4.1, we analyze the Majority Cut Algorithm, which extends the algorithmic idea of [8] to weighted intersection graphs as follows: vertices are colored sequentially (each color \(+1\) or \(-1\) corresponding to a different set in the partition of the vertices), and the t-th vertex is colored opposite to the sign of \(\sum _{i \in [t-1]} [{\textbf{R}}^T {\textbf{R}}]_{i,t} x_i\), namely the signed total weight of its edges towards the already colored vertices. Our average case analysis of the Majority Cut Algorithm shows that, when \(m=n\) and \(p = \frac{c}{n}\), for large constant c, with high probability over the choices of \({\textbf{R}}\), the expected weight of the constructed cut is at least \(1+\beta \) times the expected weight of a random cut, for any constant \(\beta = \beta (c) \le \sqrt{\frac{8}{27 \pi c^3}} - o(1)\). The fact that the lower bound on \(\beta \) is inversely proportional to \(c^{3/2}\) was to be expected, because, as p increases, the approximation of the maximum cut that we get from the weight of a random cut improves (see also the discussion at the end of Sect. 3.1).

In Sect. 4.2 we propose a framework for finding maximum cuts in weighted random intersection graphs for \(m=n\) and \(p = \frac{c}{n}\), for constant c, by exploiting the connection between Weighted Max Cut and the problem of discrepancy minimization in random set systems. In particular, we design the Weak Bipartization Algorithm, which takes as input an intersection graph with representation matrix \({\textbf{R}}\) and outputs a subgraph that is “almost” bipartite. In fact, the input intersection graph is treated as a multigraph composed of overlapping cliques formed by the label sets \(L_{\ell } = \{v: {\textbf{R}}_{\ell , v}=1\}, \ell \in {\mathcal M}\). The algorithm attempts to destroy all odd cycles of the input (except for odd cycles formed by labels with only two vertices) by replacing each clique induced by some label set \(L_{\ell }\) by a random maximal matching. In Theorem 5 we prove that, with high probability over the choices of \({\textbf{R}}\), if the Weak Bipartization Algorithm terminates, then its output can be used to construct a 2-coloring that has minimum discrepancy in a set system with incidence matrix \({\textbf{R}}\), which also gives a maximum cut in \(G(V,E,{\textbf{R}}^T {\textbf{R}})\). It is worth noting that this does not follow from Corollary 1, because a random set system with incidence matrix \({\textbf{R}}\) has discrepancy larger than 1 with (at least) constant probability when \(m=n\) and \(p = \frac{c}{n}\). Our proof relies on a structural property of closed 0-strong vertex-label sequences (loosely defined as closed walks of edges formed by distinct labels) in the weighted random intersection graph \(G(V, E, {\textbf{R}}^T {\textbf{R}})\) (Lemma 1). Finally, in Theorem 6, we prove that our Weak Bipartization Algorithm terminates in polynomial time, with high probability, if the constant c is strictly less than 1. Therefore, there is a polynomial time algorithm for finding weighted maximum cuts, with high probability, when the input is drawn from \(\overline{{\mathcal G}}_{n, n, \frac{c}{n}}\), with \(c<1\). We believe that this part of our work may also be of interest regarding the design of efficient algorithms for finding minimum discrepancy colorings in random set systems.

A preliminary version of this paper appeared in the Proceedings of the 32nd International Symposium on Algorithms and Computation (ISAAC) [24].

2 Notation and Preliminary Results

We denote weighted undirected graphs by \(G(V, E, {\textbf{W}})\); in particular, \(V=V(G)\) (resp. \(E=E(G)\)) is the set of vertices (resp. set of edges) and \({\textbf{W}} = {\textbf{W}}(G)\) is the weight matrix, i.e. \({\textbf{W}}_{i, j}\) is the weight of (undirected) edge \(\{i,j\} \in E\). We allow \({\textbf{W}}\) to have non-zero diagonal entries, as these do not affect cut weights. We also denote the number of vertices by n, and we use the notation \([n] = \{1,2,\ldots ,n\}\). We also use this notation to define parts of matrices, for example \({\textbf{W}}_{[n], 1}\) denotes the first column of the weight matrix.

A bipartition of the set of vertices is a partition of V into two nonempty sets AB, such that \(A \cap B = \emptyset \) and \(A \cup B = V\). Bipartitions correspond to 2-colorings, which we denote by vectors \({\textbf{x}}\) such that \(x_i=+1\) if \(i \in A\) and \(x_i=-1\) if \(i \in B\).

Given a weighted graph \(G(V, E, {\textbf{W}})\), we denote by \(\texttt {Cut}(G, {\textbf{x}})\) the weight of a cut defined by a bipartition \({\textbf{x}}\), namely \(\texttt {Cut}(G, {\textbf{x}}) = \sum _{\{i, j\} \in E: i \in A, j \in B} {\textbf{W}}_{i,j} = \frac{1}{4} \sum _{\{i, j\} \in E} {\textbf{W}}_{i,j} (x_i-x_j)^2\). The maximum cut of G is \(\texttt {Max-Cut}(G) = \max _{{\textbf{x}} \in \{-1, +1\}^n} \texttt {Cut}(G, {\textbf{x}})\).

For a weighted random intersection graph \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) with representation matrix \({\textbf{R}}\), we denote by \(S_v\) the set of labels chosen by vertex \(v \in V\), i.e. \(S_v = \{\ell : {\textbf{R}}_{\ell , v}=1\}\). Furthermore, we denote by \(L_{\ell }\) the set of vertices having chosen label \(\ell \), i.e. \(L_{\ell }=\{v:{\textbf{R}}_{\ell , v}=1\}\). Using this notation, the weight of an edge \(\{v, u\} \in E\) is \(\left|S_v \cap S_u \right|\); notice also that this is equal to 0 when \(\{v, u\} \notin E\). We also note here that we may think of a weighted random intersection graph as a multigraph where, for any pair of vertices v, u, there are \(\left|S_v \cap S_u \right|\) parallel simple edges between them.

A set system \(\Sigma \) defined on a set V is a family of sets \(\Sigma = \{L_1, L_2, \ldots , L_m\}\), where \(L_\ell \subseteq V, \ell \in [m]\). The incidence matrix of \(\Sigma \) is an \(m \times n\) matrix \({\textbf{R}} = {\textbf{R}}(\Sigma )\), where for any \(\ell \in [m], v \in [n]\), \({\textbf{R}}_{\ell , v} = 1\) if \(v \in L_{\ell }\) and 0 otherwise. The discrepancy of \(\Sigma \) with respect to a 2-coloring \({\textbf{x}}\) of the vertices in V is \(\text {disc}(\Sigma , {\textbf{x}}) = \max _{\ell \in [m]} \left|\sum _{v \in V} {\textbf{R}}_{\ell , v} x_v \right|= \Vert {\textbf{R}} {\textbf{x}} \Vert _{\infty }\). The discrepancy of \(\Sigma \) is \(\text {disc}(\Sigma ) = \min _{{\textbf{x}} \in \{-1, +1\}^n} \text {disc}(\Sigma , {\textbf{x}})\).
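As a small illustration of these definitions (ours, not part of the original text), the discrepancy of a set system with incidence matrix \({\textbf{R}}\) can be computed by exhaustive search over all 2-colorings when n is small; the function names below are our own.

import itertools
import numpy as np

def disc_of_coloring(R, x):
    # disc(Sigma, x) = || R x ||_infinity
    return int(np.max(np.abs(R @ x)))

def disc(R):
    # disc(Sigma) = min over x in {-1,+1}^n of disc(Sigma, x); brute force, small n only
    n = R.shape[1]
    return min(disc_of_coloring(R, np.array(x)) for x in itertools.product((-1, 1), repeat=n))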

It is well-known that the cut size of a bipartition of the set of vertices of a graph G(V, E) into sets A and B is given by \(\frac{1}{4} \sum _{\{i,j\} \in E} (x_i-x_j)^2\), where \(x_i=+1\) if \(i \in A\) and \(x_i=-1\) if \(i \in B\). This can be naturally generalized to multigraphs and also to weighted graphs. In particular, the Max-Cut size of a weighted graph \(G(V, E, {\textbf{W}})\) is given by

$$\begin{aligned} \texttt {Max-Cut}(G) = \max _{{\textbf{x}} \in \{-1, +1\}^n} \frac{1}{4} \sum _{\{i,j\} \in E} {\textbf{W}}_{i,j} (x_i-x_j)^2. \end{aligned}$$
(1)

In particular, we get the following Proposition:

Proposition 1

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a weighted intersection graph with representation matrix \({\textbf{R}}\). Then, for any \({\textbf{x}} \in \{-1, +1\}^n\),

$$\begin{aligned} \texttt {Cut}(G, {\textbf{x}}) = \frac{1}{4} \left( \sum _{i,j \in [n]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \left\| {\textbf{R}} {\textbf{x}} \right\| ^2 \right) \end{aligned}$$
(2)

and so

$$\begin{aligned} \texttt {Max-Cut}(G) = \frac{1}{4} \left( \sum _{i,j \in [n]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \min _{{\textbf{x}} \in \{-1, +1\}^n} \left\| {\textbf{R}} {\textbf{x}} \right\| ^2 \right) , \end{aligned}$$
(3)

where \(\Vert \cdot \Vert \) denotes the 2-norm. In particular, the expectation of the weight of a random cut, where each entry of \({\textbf{x}}\) is independently and equiprobably either +1 or -1, is equal to \(\mathbb {E}_{{\textbf{x}}}\left[ \texttt {Cut}(G, {\textbf{x}})\right] = \frac{1}{4} \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j}\), where \(\mathbb {E}_{{\textbf{x}}}\) denotes expectation with respect to \({\textbf{x}}\).

Proof

We first note that, by straightforward calculation, for any weighted graph \(G(V, E, {\textbf{W}})\), where \({\textbf{W}}\) is symmetric and \({\textbf{W}}_{i,j} = 0\) if \(\{i,j\} \notin E\), and any \({\textbf{x}} \in \{-1,+1\}^n\), we have

$$\begin{aligned} \sum _{i,j \in [n]^2} {\textbf{W}}_{i,j} - {\textbf{x}}^T {\textbf{W}} {\textbf{x}}= & {} \sum _{i,j \in [n]^2} {\textbf{W}}_{i,j} - \sum _{i,j \in [n]^2} {\textbf{W}}_{i,j} x_i x_j \\= & {} \frac{1}{2} \sum _{i,j \in [n]^2} {\textbf{W}}_{i,j} \left( x_i^2+x_j^2 - 2x_i x_j\right) \\= & {} \frac{1}{2} \sum _{i,j \in [n]^2} {\textbf{W}}_{i,j} \left( x_i-x_j\right) ^2 \\= & {} \sum _{\{i,j\} \in E} {\textbf{W}}_{i,j} \left( x_i-x_j\right) ^2. \end{aligned}$$

Noting that \(\texttt {Cut}(G, {\textbf{x}}) = \frac{1}{4} \sum _{\{i,j\} \in E} {\textbf{W}}_{i,j} (x_i-x_j)^2\), the above settles Eq. (2), by taking \({\textbf{W}} = {\textbf{R}}^T {\textbf{R}}\). Similarly, by Eq. (1), and since the term \(\sum _{i,j \in [n]^2} {\textbf{W}}_{i,j}\) is independent of \({\textbf{x}}\), we have

$$\begin{aligned} \texttt {Max-Cut}(G) = \frac{1}{4} \left( \sum _{i,j \in [n]^2} {\textbf{W}}_{i,j} - \min _{{\textbf{x}} \in \{-1, +1\}^n} {\textbf{x}}^T {\textbf{W}} {\textbf{x}} \right) , \end{aligned}$$
(4)

which settles equation (3), by taking \({\textbf{W}} = {\textbf{R}}^T {\textbf{R}}\).

For the last part of the Proposition, notice that diagonal entries of the weight matrix in (4) cancel out, and so, for any \({\textbf{x}} \in \{-1, +1\}^n\), setting \({\textbf{W}} = {\textbf{R}}^T {\textbf{R}}\), we have

$$\begin{aligned} \sum _{i,j \in [n]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \left\| {\textbf{R}} {\textbf{x}} \right\| ^2 = \sum _{i\ne j, i,j \in [n]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \sum _{i\ne j, i,j \in [n]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} x_i x_j. \end{aligned}$$

Taking expectations with respect to \({\textbf{x}}\), the contribution of the second sum in the above expression equals 0, which completes the proof. \(\square \)
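Proposition 1 can be checked numerically on small instances; the sketch below (ours, in Python with NumPy) compares a direct computation of \(\texttt {Cut}(G, {\textbf{x}})\) with the right-hand side of Eq. (2).

import numpy as np

rng = np.random.default_rng(1)
n, m, p = 8, 8, 0.3
R = (rng.random((m, n)) < p).astype(int)    # representation matrix
W = R.T @ R                                 # weight matrix
x = rng.choice([-1, 1], size=n)             # an arbitrary 2-coloring

# Direct cut weight: sum of W[i, j] over pairs i < j with x_i != x_j.
cut_direct = sum(W[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

# Right-hand side of Eq. (2): (sum_{i,j} [R^T R]_{i,j} - ||R x||^2) / 4.
cut_formula = (W.sum() - np.dot(R @ x, R @ x)) / 4

assert cut_direct == cut_formula            # the two expressions agree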

Since \(\sum _{i,j \in [n]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j}\) is fixed for any given representation matrix \({\textbf{R}}\), the above Proposition implies that, to find a bipartition of the vertex set V that corresponds to a maximum cut, we need to find an n-dimensional vector in \(\arg \min _{{\textbf{x}} \in \{-1, +1\}^n} \left\| {\textbf{R}} {\textbf{x}} \right\| ^2\). We thus get the following:

Corollary 1

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a weighted intersection graph with representation matrix \({\textbf{R}}\) and \(\Sigma \) a set system with incidence matrix \({\textbf{R}}\). If \(\text {disc}(\Sigma ) \le 1\), then \({\textbf{x}}^* \in \arg \min _{{\textbf{x}} \in \{-1, +1\}^n} \left\| {\textbf{R}} {\textbf{x}} \right\| ^2\) if and only if \({\textbf{x}}^* \in \arg \min _{{\textbf{x}} \in \{-1, +1\}^n} \text {disc}(\Sigma , {\textbf{x}})\). In particular, if the minimum discrepancy of \(\Sigma \) is at most 1, a bipartition corresponds to a maximum cut iff it achieves minimum discrepancy.

Proof

Let \({\textbf{x}}^*\) be any 2-coloring with \(\text {disc}(\Sigma , {\textbf{x}}^*) \le 1\); such a coloring exists, since \(\text {disc}(\Sigma ) \le 1\). Then each component of \({\textbf{R}}{\textbf{x}}^*\) has absolute value either 0 or 1. In fact, since every element of \({\textbf{R}}\) is either 0 or 1, for any 2-coloring \({\textbf{x}}\) and any \(\ell \in [m]\), the parity of \(\left[ {\textbf{R}}{\textbf{x}}\right] _{\ell }\) equals the parity of the number of ones in the \(\ell \)-th row of \({\textbf{R}}\); in particular, \(\left[ {\textbf{R}}{\textbf{x}}^*\right] _{\ell }\) is equal to 0 if and only if the number of ones in the \(\ell \)-th row is even, and it has absolute value 1 otherwise. This is the best one can hope for in every component, since sets with an odd number of elements can never have discrepancy less than 1. Therefore, \(\Vert {\textbf{R}} {\textbf{x}}^*\Vert \) is also the minimum possible. In particular, this implies that, in the case \(\text {disc}(\Sigma ) \le 1\), any 2-coloring that achieves minimum discrepancy gives a bipartition that corresponds to a maximum cut and vice versa. \(\square \)

Notice that the above result is not necessarily true when \(\text {disc}(\Sigma ) > 1\), since the minimum of \(\Vert {\textbf{R}} {\textbf{x}} \Vert \) could be achieved by 2-colorings with larger discrepancy than the optimal.
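Corollary 1 can also be illustrated by brute force on small instances: whenever \(\text {disc}(\Sigma ) \le 1\), the 2-colorings minimizing \(\Vert {\textbf{R}} {\textbf{x}} \Vert ^2\) are exactly those minimizing \(\text {disc}(\Sigma , {\textbf{x}})\). The sketch below (ours; feasible only for small n) performs this check.

import itertools
import numpy as np

def check_corollary_1(R):
    # Returns True/False for the equivalence of the two argmin sets,
    # or None when disc(Sigma) > 1 (in which case Corollary 1 makes no claim).
    n = R.shape[1]
    colorings = [np.array(x) for x in itertools.product((-1, 1), repeat=n)]
    norms = [int(np.dot(R @ x, R @ x)) for x in colorings]        # ||R x||^2
    discs = [int(np.max(np.abs(R @ x))) for x in colorings]       # disc(Sigma, x)
    if min(discs) > 1:
        return None
    argmin_norm = {i for i, v in enumerate(norms) if v == min(norms)}
    argmin_disc = {i for i, v in enumerate(discs) if v == min(discs)}
    return argmin_norm == argmin_disc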

2.1 Range of Values for Selection Probability

Concerning the success probability p, we note that, when \(n,m \rightarrow \infty \), and \(p = o\left( \sqrt{\frac{1}{nm}} \right) \), direct application of the results of [25] suggests that \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) is chordal with high probability, but in fact the same proofs reveal that a stronger property holds, namely that there is no closed vertex-label sequence (refer to the precise definition in Sect. 4.2) having distinct labels. Therefore, in this case, finding a bipartition with maximum cut weight is straightforward: indeed, one way to construct a maximum cut is to run our Weak Bipartization Algorithm from Sect. 4.2, and then to apply Theorem 5 (noting that the algorithm’s termination condition trivially holds, since the set \({\mathcal C}_{odd}(G^{(b)})\) defined in Sect. 4.2 is empty). In view of this, in Sect. 3, we will assume that \(p \ge C_1 \sqrt{\frac{1}{nm}}\), for an arbitrary positive constant \(C_1\) that can be as small as desired; this implies that each vertex selects \(\Omega \left( \sqrt{\frac{m}{n}} \right) \) labels on expectation. On the other hand, in view of our results in Sect. 3.1 regarding the near optimality of the weight of a random cut, in the analysis of our randomized algorithms in Sect. 4, we assume \(n=m\) and \(p = \Theta \left( \frac{1}{n} \right) \); this range of values gives sparse graph instances, but the corresponding distribution of weighted random intersection graphs is different from the distribution of sparse Erdős-Rényi random graphs, even without taking weights into account (please refer to the end of Sect. 3.1 for a more technical justification for the latter assumption).

3 Concentration of Max-Cut

In this section, we prove that the size of the maximum cut in a weighted random intersection graph concentrates around its expected value. We note, however, that the following Theorem does not provide an explicit formula for the expected value of the maximum cut.

Theorem 2

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model with \(m=n^{\alpha }, \alpha \le 1\), and \(p \ge C_1 \sqrt{\frac{1}{nm}}\), for arbitrary positive constant \(C_1\), and let \({\textbf{R}}\) be its representation matrix. Then \(\texttt {Max-Cut}(G) = (1 \pm o(1)) \mathbb {E}_{{\textbf{R}}}[\texttt {Max-Cut}(G)]\) with high probability, as \(n \rightarrow \infty \), where \(\mathbb {E}_{{\textbf{R}}}\) denotes expectation with respect to \({\textbf{R}}\), i.e. \(\texttt {Max-Cut}(G)\) concentrates around its expected value.

Proof

Let \(G=G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a weighted random intersection graph, and let \({\textbf{D}}\) denote the (random) diagonal matrix containing all diagonal elements of \({\textbf{R}}^T{\textbf{R}}\). In particular, Eq. (3) of Proposition 1 can be written as

$$\begin{aligned} \texttt {Max-Cut}(G) = \frac{1}{4} \left( \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \min _{{\textbf{x}} \in \{-1, +1\}^n} {\textbf{x}}^T \left( {\textbf{R}}^T {\textbf{R}} -{\textbf{D}}\right) {\textbf{x}} \right) . \end{aligned}$$

Furthermore, for any given \({\textbf{R}}\), notice that, if we select each element of \({\textbf{x}}\) independently and equiprobably from \(\{-1, +1\}\), then \(\mathbb {E}_{{\textbf{x}}}[{\textbf{x}}^T \left( {\textbf{R}}^T {\textbf{R}} -{\textbf{D}}\right) {\textbf{x}}]=0\), where \(\mathbb {E}_{{\textbf{x}}}\) denotes expectation with respect to \({\textbf{x}}\). By the probabilistic method, we thus have \(\min _{{\textbf{x}} \in \{-1, +1\}^n} {\textbf{x}}^T \left( {\textbf{R}}^T {\textbf{R}} -{\textbf{D}}\right) {\textbf{x}} \le 0\), implying the following bound:

$$\begin{aligned} \frac{1}{4} \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} \le \texttt {Max-Cut}(G) \le \frac{1}{2} \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j}, \end{aligned}$$
(5)

where the second inequality follows trivially by observing that \(\frac{1}{2} \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j}\) equals the sum of the weights of all edges.

By linearity of expectation, we have \(\mathbb {E}_{{\textbf{R}}}\left[ \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} \right] = \mathbb {E}_{{\textbf{R}}}\left[ \sum _{i\ne j, i,j \in [n]} \sum _{\ell \in [m]} {\textbf{R}}_{\ell ,i} {\textbf{R}}_{\ell , j} \right] = n(n-1)mp^2 = \Theta (n^2mp^2)\), which is \(\Omega (n)\) in the range of parameters that we consider. In particular, by (5), we have

$$\begin{aligned} \mathbb {E}_{{\textbf{R}}}[\texttt {Max-Cut}(G)] = \Theta (n^2mp^2). \end{aligned}$$
(6)

By Chebyshev’s inequality, for any \(\epsilon >0\), we have

$$\begin{aligned} \Pr \left( \left|\texttt {Max-Cut}(G)-\mathbb {E}_{{\textbf{R}}}[\texttt {Max-Cut}(G)]\right|\ge \epsilon n^2mp^2 \right) \le \frac{\text {Var}_{{\textbf{R}}}(\texttt {Max-Cut}(G))}{\epsilon ^2 n^4m^2p^4}, \end{aligned}$$
(7)

where \(\text {Var}_{{\textbf{R}}}\) denotes variance with respect to \({\textbf{R}}\). To bound the variance on the right hand side of the above inequality, we use the Efron-Stein inequality. In particular, we write \(\texttt {Max-Cut}(G):= f({\textbf{R}})\), i.e. we view \(\texttt {Max-Cut}(G)\) as a function of the label choices. For \(\ell \in [m], i \in [n]\), we also write \({\textbf{R}}^{(\ell , i)}\) for the matrix \({\textbf{R}}\) where entry \((\ell , i)\) has been replaced by an independent, identically distributed (i.i.d.) copy of \({\textbf{R}}_{\ell , i}\), which we denote by \({\textbf{R}}_{\ell , i}'\). By the Efron-Stein inequality, we now have

$$\begin{aligned} \text {Var}_{{\textbf{R}}}(\texttt {Max-Cut}(G)) \le \frac{1}{2} \sum _{\ell \in [m], i \in [n]} \mathbb {E}\left[ \left( f({\textbf{R}}) - f\left( {\textbf{R}}^{(\ell , i)} \right) \right) ^2 \right] . \end{aligned}$$
(8)

Notice now that, given all entries of \({\textbf{R}}\) except \({\textbf{R}}_{\ell , i}\), the probability that \(f({\textbf{R}})\) is different from \(f\left( {\textbf{R}}^{(\ell , i)} \right) \) is at most \(\Pr ({\textbf{R}}_{\ell , i} \ne {\textbf{R}}_{\ell , i}') = 2p(1-p)\). Furthermore, if \(L_{\ell } \backslash \{i\}\) is the set of vertices different from i which have selected \(\ell \), we then have that \(\left( f({\textbf{R}}) - f\left( {\textbf{R}}^{(\ell , i)} \right) \right) ^2 \le \left|L_{\ell } \backslash \{i\} \right|^2\), because the intersection graph with representation matrix \({\textbf{R}}\) differs by at most \(\left|L_{\ell } \backslash \{i\} \right|\) edges from the intersection graph with representation matrix \({\textbf{R}}^{(\ell , i)}\). Notice now that, by definition, \(\left|L_{\ell } \backslash \{i\} \right|\) follows the Binomial distribution \({\mathcal B}(n-1, p)\). In particular, \(\mathbb {E} \left[ \left|L_{\ell } \backslash \{i\} \right|^2 \right] = (n-1)p(np-2p+1)\), implying \(\mathbb {E}\left[ \left( f({\textbf{R}}) - f\left( {\textbf{R}}^{(\ell , i)} \right) \right) ^2 \right] \le 2p(1-p) (n-1)p(np-2p+1)\), for any fixed \(\ell \in [m], i \in [n]\).

Putting this all together, (8) becomes

$$\begin{aligned} \text {Var}_{{\textbf{R}}}(\texttt {Max-Cut}(G))\le & {} \frac{1}{2} \sum _{\ell \in [m], i \in [n]} 2p(1-p) (n-1)p(np-2p+1) \nonumber \\= & {} nm p(1-p) (n-1)p(np-2p+1) = O(n^3mp^3), \end{aligned}$$
(9)

where the last equation comes from the fact that, in the range of values that we consider, we have \(np = \Omega (1)\). Therefore, by (7), we get

$$\begin{aligned}{} & {} \Pr \left( \left|\texttt {Max-Cut}(G)-\mathbb {E}_{{\textbf{R}}}[\texttt {Max-Cut}(G)]\right|\ge \epsilon n^2mp^2 \right) \\{} & {} \quad \le \frac{O(n^3mp^3)}{\epsilon ^2 n^4m^2p^4} = O\left( \frac{1}{\epsilon ^2 nmp}\right) , \end{aligned}$$

which goes to 0 in the range of values that we consider. Together with (6), the above bound proves that \(\texttt {Max-Cut}(G)\) is concentrated around its expected value, and the proof is completed. \(\square \)
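Although \(\texttt {Max-Cut}(G)\) itself has no closed form here, the sandwich bound (5) underlying the proof can be sanity-checked by exhaustive search on small instances; the sketch below (ours, brute force, small n only) does so.

import itertools
import numpy as np

def max_cut_brute_force(W):
    # Exhaustive Max-Cut of a weighted graph with symmetric weight matrix W (small n only),
    # using Cut(G, x) = (sum_{i,j} W_{i,j} - x^T W x) / 4 (the diagonal cancels out).
    n = W.shape[0]
    return max((W.sum() - x @ W @ x) / 4
               for x in map(np.array, itertools.product((-1, 1), repeat=n)))

rng = np.random.default_rng(2)
n = m = 10
R = (rng.random((m, n)) < 3.0 / n).astype(int)
W = R.T @ R
S = W.sum() - np.trace(W)                   # S = sum_{i != j} [R^T R]_{i,j}
mc = max_cut_brute_force(W)
assert S / 4 <= mc <= S / 2                 # bound (5)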

3.1 Max-Cut for Small Number of Labels

Using Theorem 2, we can now show that, in the case \(m = n^{\alpha }, \alpha <1\), and \(p = \Omega \left( \sqrt{\frac{1}{nm}}\right) \), a random cut has asymptotically the same weight as \(\texttt {Max-Cut}(G)\), where \(G=G(V,E, {\textbf{R}}^T {\textbf{R}})\) is a random instance of \(\overline{\mathcal G}_{n, m, p}\). In particular, let \({\textbf{x}}^{(rand)}\) be constructed as follows: for each \(i \in [n]\), set \(x^{(rand)}_{i} = -1\) independently with probability \(\frac{1}{2}\), and \(x^{(rand)}_{i} = +1\) otherwise. In view of Eq. (3), the main idea for the proof of the following Theorem is to show that, with high probability over random \({\textbf{x}}\) and \({\textbf{R}}\), \(\Vert {\textbf{R}} {\textbf{x}}\Vert ^2\) is asymptotically smaller than the expectation of the weight of the cut defined by \({\textbf{x}}^{(rand)}\). The result then follows by concentration of \(\texttt {Max-Cut}(G)\) around its expected value, and straightforward bounds on \(\texttt {Max-Cut}(G)\).

Theorem 3

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model with \(m=n^{\alpha }, \alpha < 1\), and \(p \ge C_1 \sqrt{\frac{1}{nm}}\), for arbitrary positive constant \(C_1\), and let \({\textbf{R}}\) be its representation matrix. Then the cut weight of the random 2-coloring \({\textbf{x}}^{(rand)}\) satisfies \(\texttt {Cut}(G, {\textbf{x}}^{(rand)}) = (1-o(1)) \texttt {Max-Cut}(G)\) with high probability over the choices of \({\textbf{x}}^{(rand)}\), \({\textbf{R}}\).

Proof

Let \(G=G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a weighted random intersection graph. By Eq. (2) of Proposition 1, for any \({\textbf{x}} \in \{-1, +1\}^n\), we have:

$$\begin{aligned} \texttt {Cut}(G, {\textbf{x}}) = \frac{1}{4} \left( \sum _{i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \Vert {\textbf{R}} {\textbf{x}}\Vert ^2 \right) . \end{aligned}$$
(10)

Taking expectations with respect to random \({\textbf{x}}\) and \({\textbf{R}}\), we get

$$\begin{aligned} \mathbb {E}_{{\textbf{x}}, {\textbf{R}}}[\texttt {Cut}(G, {\textbf{x}})]= & {} \frac{1}{4} \cdot \mathbb {E}_{{\textbf{R}}}\left[ \sum _{i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \sum _{i \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,i} \right] \nonumber \\= & {} \frac{1}{4} \cdot \mathbb {E}_{{\textbf{R}}}\left[ \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} \right] = \frac{1}{4} n(n-1)mp^2. \end{aligned}$$
(11)

To prove Theorem 3, we will show that, with high probability over random \({\textbf{x}}\) and \({\textbf{R}}\), we have \(\Vert {\textbf{R}} {\textbf{x}}\Vert ^2 = o\left( \mathbb {E}_{{\textbf{R}}}\left[ \frac{1}{4} \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} \right] \right) = o(n^2mp^2)\), in which case the theorem follows by concentration of \(\texttt {Max-Cut}(G)\) around its expected value (Theorem 2), and the fact that \(\texttt {Max-Cut}(G) \ge \frac{1}{4} \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j}\) (see Eq. (5)). Indeed, by Eq. (10) and the lower bound on \(\texttt {Max-Cut}(G)\), we get that \(\texttt {Max-Cut}(G) - \Vert {\textbf{R}} {\textbf{x}}\Vert ^2 \le \texttt {Cut}(G, {\textbf{x}}) \le \texttt {Max-Cut}(G)\). Furthermore, by concentration of \(\texttt {Max-Cut}(G)\) around its expected value and the fact that \(\mathbb {E}_{{\textbf{R}}}[\texttt {Max-Cut}(G)] = \Theta (n^2mp^2)\) (Eq. (6)), we get that \(\texttt {Max-Cut}(G) = \Theta (n^2mp^2)\), with high probability. Therefore, having \(\Vert {\textbf{R}} {\textbf{x}}\Vert ^2 = o(n^2mp^2)\) implies \(\texttt {Max-Cut}(G) - o(\texttt {Max-Cut}(G)) \le \texttt {Cut}(G, {\textbf{x}}) \le \texttt {Max-Cut}(G)\), as needed.

To this end, fix \(\ell \in [m]\) and consider the random variable counting the number of ones in the \(\ell \)-th row of \({\textbf{R}}\), namely \(Y_{\ell } = \sum _{i \in [n]} {\textbf{R}}_{\ell , i}\). By the multiplicative Chernoff bound, for any \(\delta >0\),

$$\begin{aligned} \Pr (Y_{\ell }> (1+\delta ) np ) \le \left( \frac{e^{\delta }}{(1+\delta )^{1+\delta }} \right) ^{np}. \end{aligned}$$

Since \(np \ge C_1 \sqrt{\frac{n}{m}} = C_1 n^{\frac{1-\alpha }{2}}\), taking any \(\delta \ge 2\), we get

$$\begin{aligned} \Pr (Y_{\ell }> 3 np ) \le \left( \frac{e^{2}}{27} \right) ^{np} = o\left( \frac{1}{m} \right) . \end{aligned}$$
(12)

Therefore, by the union bound,

$$\begin{aligned} \Pr (\exists \ell \in [m]: Y_{\ell }> 3 np ) = o(1), \end{aligned}$$
(13)

implying that all rows of \({\textbf{R}}\) have at most 3np non-zero elements, with high probability.

Fix now \(\ell \) and consider the random variable corresponding to the \(\ell \)-th entry of \({\textbf{R}} {\textbf{x}}\), namely \(Z_{\ell } = \sum _{i \in [n]} {\textbf{R}}_{\ell , i} x_i\). In particular, given \(Y_{\ell }\), notice that \(Z_{\ell }\) is equal to the sum of \(Y_{\ell }\) independent random variables \(x_i \in \{-1, +1\}\), for i such that \({\textbf{R}}_{\ell , i}=1\). Therefore, since \(\mathbb {E}_{{\textbf{x}}}[Z_{\ell }] = \mathbb {E}_{{\textbf{x}}}[Z_{\ell } |Y_{\ell }]=0\), by Hoeffding’s inequality, for any \(\lambda \ge 0\),

$$\begin{aligned} \Pr (\left|Z_{\ell } \right|>\lambda |Y_{\ell }) \le 2 e^{-\frac{\lambda ^2}{2Y_{\ell }}}. \end{aligned}$$

Therefore, by the union bound, and taking \(\lambda \ge \sqrt{6 np \ln {n}}\),

$$\begin{aligned} \Pr (\exists \ell \in [m]:\left|Z_{\ell } \right|>\lambda )\le & {} \Pr (\exists \ell \in [m]: Y_{\ell }> 3 np ) + 2 m e^{-\frac{\lambda ^2}{6np}} \nonumber \\= & {} o(1)+ \frac{2m}{n} = o(1), \end{aligned}$$
(14)

implying that all entries of \({\textbf{R}} {\textbf{x}}\) have absolute value at most \(\sqrt{6 np \ln {n}}\) with high probability over the choices of \({\textbf{x}}\) and \({\textbf{R}}\). Consequently, with high probability over the choices of \({\textbf{x}}\) and \({\textbf{R}}\), we have \(\Vert {\textbf{R}} {\textbf{x}}\Vert ^2 \le 6mnp \ln {n}\), which is \(o(n^2mp^2)\), since \(\ln {n} = o(np)\) in the range of parameters considered in this theorem. This completes the proof. \(\square \)

We note that the same analysis also holds when \(n=m\) and p is sufficiently large (e.g. \(\ln {n} = o(np)\)). In particular, similar probability bounds hold in Eqs. (12), (13) and (14), for the same choices of \(\delta \ge 2\) and \(\lambda \ge \sqrt{7 np \ln {n}}\), implying that \(\Vert {\textbf{R}} {\textbf{x}}\Vert ^2 \le 7mnp \ln {n} = o(n^2mp^2)\) with high probability. In view of this, in the following sections we will only assume \(m=n\) (i.e. \(\alpha =1\)) and also \(p = \frac{c}{n}\), for some positive constant c (note that we no longer have \(\ln {n} = o(np)\), as p is much smaller, and so the above proof idea does not apply in this case).
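The proof idea of Theorem 3 is easy to observe empirically: for \(\alpha < 1\) and \(p \ge C_1\sqrt{1/(nm)}\), the term \(\Vert {\textbf{R}} {\textbf{x}}\Vert ^2\) of a random 2-coloring is negligible compared to \(n^2 m p^2\), so the random cut essentially matches the lower bound of Eq. (5). The following sketch (ours; the parameter choices are illustrative) avoids forming the \(n \times n\) weight matrix by using \(\sum _{i \ne j} [{\textbf{R}}^T {\textbf{R}}]_{i,j} = \sum _{\ell } |L_{\ell }|(|L_{\ell }|-1)\).

import numpy as np

rng = np.random.default_rng(3)
n = 20000
m = int(round(n ** 0.5))                 # m = n^alpha with alpha = 1/2 < 1
p = 2.0 / np.sqrt(n * m)                 # p = C_1 * sqrt(1/(nm)) with C_1 = 2 (our choice)
R = (rng.random((m, n)) < p).astype(int)
x = rng.choice([-1, 1], size=n)          # the random 2-coloring x^(rand)

d = R.sum(axis=1)                        # d[l] = |L_l|
S = int(np.sum(d * (d - 1)))             # S = sum_{i != j} [R^T R]_{i,j}
norm_sq = int(np.dot(R @ x, R @ x))      # ||R x||^2
random_cut = (S + d.sum() - norm_sq) / 4 # Cut(G, x) by Eq. (10), since sum_{i,j}[R^T R]_{i,j} = S + sum_l |L_l|

print(norm_sq / (n * n * m * p * p))     # vanishes as n grows
print(random_cut / (S / 4))              # close to 1, while Max-Cut(G) >= S/4 by Eq. (5)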

4 Algorithmic Results (Randomized Algorithms)

4.1 The Majority Cut Algorithm

In the following algorithm, the 2-coloring representing the bipartition of a cut is constructed as follows: initially, a small constant fraction \(\epsilon \) of the vertices is randomly placed in the two parts, and then, in each subsequent step, one of the remaining vertices is placed in the part that maximizes the total weight of its incident edges with endpoints in the opposite part.

[Figure: pseudocode of the Majority Cut Algorithm]
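As a concrete reading of the above description, the following minimal sketch (ours, in Python with NumPy) first colors an \(\epsilon \)-fraction of the vertices at random and then applies the majority rule of Sect. 1.1; the function name majority_cut, the tie-breaking rule and the parameter eps are our own choices and are not taken from the pseudocode figure.

import numpy as np

def majority_cut(R, eps=0.1, seed=None):
    # Color the first ceil(eps*n) vertices uniformly at random; then color each
    # remaining vertex t opposite to the sign of sum_{i<t} [R^T R]_{i,t} x_i,
    # i.e. place it so as to maximize the weight of edges crossing the cut.
    rng = np.random.default_rng(seed)
    m, n = R.shape
    W = R.T @ R                                  # edge weights [R^T R]_{i,j}
    x = np.zeros(n, dtype=int)
    t0 = int(np.ceil(eps * n))
    x[:t0] = rng.choice([-1, 1], size=t0)        # random initial colors
    for t in range(t0, n):
        s = int(np.dot(W[:t, t], x[:t]))         # signed weight towards already colored vertices
        x[t] = -np.sign(s) if s != 0 else rng.choice([-1, 1])
    return x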

Clearly, the Majority Cut Algorithm runs in time polynomial in n and m. Furthermore, the following Theorem provides a lower bound on the expected weight of the cut constructed by the algorithm in the case \(m=n\), \(p = \frac{c}{n}\), for large constant c, and \(\epsilon \rightarrow 0\). For the proof, we first express the weight increase of the constructed cut due to the coloring of the t-th vertex, in the subgraph induced by the colored vertices, through the absolute value of a random variable \(Z_t\). Then, given the colors and label choices of all previously colored vertices (namely vertices \(v_1, \ldots , v_{t-1}\)), we lower bound the conditional expectation of \(|Z_t |\) by the mean absolute difference \(\textrm{MD}(Z^B_t)\) of a certain binomial random variable \(Z_t^B\). Finally, we lower bound \(\textrm{MD}(Z^B_t)\) by using the Berry-Esseen Theorem for Gaussian approximation, which is stated below.

Theorem

(Berry-Esseen Theorem [26]) Let \(X_1, X_2, \ldots ,\) be independent, identically distributed random variables, with \(\mathbb {E}[X_i]=0, \mathbb {E}[X_i^2] = \sigma ^2>0\), and \(\mathbb {E}[|X_i|^3] = \rho < \infty \). For \(N>0\), let \(F_N(\cdot )\) be the cumulative distribution function of \(\frac{X_1+\cdots +X_N}{\sigma \sqrt{N}}\), and let \(\Phi (\cdot )\) be the cumulative distribution function of the standard normal distribution. Then, \(\sup _{x \in \mathbb {R}} |F_N(x)-\Phi (x) |\le \frac{0.4748 \rho }{\sigma ^3 \sqrt{N}}\).

We now state and prove the main theorem in this section.

Theorem 4

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(m=n\), and \(p = \frac{c}{n}\), for large positive constant c, and let \({\textbf{R}}\) be its representation matrix. Then, with high probability over the choices of \({\textbf{R}}\), the Majority Cut Algorithm constructs a cut with expected weight at least \((1+\beta ) \frac{1}{4} \mathbb {E}\left[ \sum _{i\ne j, i,j \in [n]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} \right] \), where \(\beta = \beta (c) \le \sqrt{\frac{8}{27 \pi c^3}} - o(1)\) is a constant, i.e. at least \(1+\beta \) times the expected weight of a random cut.

Proof

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) (i.e. the input to the Majority Cut Algorithm) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(m=n\), and \(p = \frac{c}{n}\), for some large enough constant c. For \(t \ge \epsilon n+1\), let \(M_t\) denote the weight of the constructed cut just after the coloring of vertex \(v_t\). In particular, by Eq. (2) applied with n replaced by t, reasoning as in the derivation of Eq. (3), and since the values \(x_1, \ldots , x_{t-1}\) are already decided in previous steps, we have

$$\begin{aligned} M_t= & {} \frac{1}{4} \left( \sum _{i,j \in [t]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} - \min _{x_t \in \{-1, +1\}} \left\| {\textbf{R}}_{[m], [t]} {\textbf{x}}_{[t]} \right\| ^2 \right) \end{aligned}$$
(15)

The first of the above terms is

$$\begin{aligned} \frac{1}{4} \sum _{i,j \in [t]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} = \frac{1}{4} \left( \sum _{i,j \in [t-1]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} + 2 \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} + \left[ {\textbf{R}}^T {\textbf{R}} \right] _{t,t}\right) \nonumber \\ \end{aligned}$$
(16)

and the second term is

$$\begin{aligned}{} & {} -\frac{1}{4} \min _{x_t \in \{-1, +1\}} \left\| {\textbf{R}}_{[m], [t]} {\textbf{x}}_{[t]} \right\| ^2 \nonumber \\{} & {} \quad = -\frac{1}{4} \min _{x_t \in \{-1, +1\}} \left\| {\textbf{R}}_{[m],t} x_t + \sum _{i \in [t-1]} {\textbf{R}}_{[m],i} x_i \right\| ^2 \nonumber \\{} & {} \quad = -\frac{1}{4} \min _{x_t \in \{-1, +1\}} \sum _{i,j \in [t]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} x_i x_j \nonumber \\{} & {} \quad = -\frac{1}{4} \left( \sum _{i,j \in [t-1]^2} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,j} x_i x_j {+} 2 \min _{x_t \in \{-1, +1\}}{ \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} x_i x_t}{+} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{t,t} \right) \nonumber \\ \end{aligned}$$
(17)

By (15), (16) and (17), we have

$$\begin{aligned} M_t= & {} M_{t-1} + \frac{1}{2} \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} - \frac{1}{2} \min _{x_t \in \{-1, +1\}}{ \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} x_i x_t} \nonumber \\= & {} M_{t-1} + \frac{1}{2} \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} + \frac{1}{2} \left|\sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} x_i\right| \end{aligned}$$
(18)

Define now the random variable

$$\begin{aligned} Z_t = Z_t({\textbf{x}}, {\textbf{R}}) = \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} x_i = \sum _{\ell \in [m]} {\textbf{R}}_{\ell , t} \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i, \end{aligned}$$

where \({\textbf{x}}=(x_1, \ldots , x_n) \in \{-1,+1\}^n\) is the 2-coloring constructed by the Majority Cut Algorithm (in fact only the first \(t-1\) entries of \({\textbf{x}}\) are needed for \(Z_t\)), so that \(M_t = M_{t-1} + \frac{1}{2} \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} + \frac{1}{2} \left|Z_t\right|\). Observe that, in the latter recursive equation, the term \(\frac{1}{2} \sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t}\) corresponds to the expected increment of the constructed cut if the t-th vertex chose its color uniformly at random. Therefore, lower bounding the expectation of \(\frac{1}{2}\left|Z_t\right|\) will tell us how much better the Majority Cut Algorithm does when considering the t-th vertex.

Towards this end, we first note that, given \({\textbf{x}}_{[t-1]} = \{x_i, i \in [t-1] \}\), and \({\textbf{R}}_{[m], [t-1]}=\{ {\textbf{R}}_{\ell , i}, \ell \in [m], i \in [t-1]\}\), \(Z_t\) is the sum of m independent random variables, since the Bernoulli random variables \({\textbf{R}}_{\ell ,t}, \ell \in [m],\) are independent, for any given t (note that the conditioning is essential for independence, otherwise the inner sums in the definition of \(Z_t\) would also depend on the \(x_i\)’s, which, for \(i \ge \epsilon n+1\), are functions of \(x_1, \ldots , x_{i-1}\), and of the entries of \({\textbf{R}}\)). Furthermore, \(\mathbb {E}[Z_t |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]}] = p \sum _{\ell \in [m]} \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i\) and \(\text {Var}(Z_t |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]}) = p(1-p) \sum _{\ell \in [m]} \left( \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i \right) ^2\). Given \({\textbf{x}}_{[t-1]}\) and \({\textbf{R}}_{[m], [t-1]}\), define the sets \(A^+_t = \{\ell \in [m]: \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i > 0\}\) and \(A^-_t = \{\ell \in [m]: \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i < 0\}\). In particular, given \({\textbf{x}}_{[t-1]} = \{x_i, i \in [t-1] \}\), and \({\textbf{R}}_{[m], [t-1]}=\{ {\textbf{R}}_{\ell , i}, \ell \in [m], i \in [t-1]\}\), \(Z_t\) can be written as

$$\begin{aligned} Z_t = \sum _{\ell \in A^+_t} {\textbf{R}}_{\ell , t} \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i - \sum _{\ell \in A^-_t} {\textbf{R}}_{\ell , t} \left|\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i \right|, \end{aligned}$$
(19)

where \({\textbf{R}}_{\ell , t}, \ell \in A^+_t \cup A^-_t\) are independent Bernoulli random variables with success probability p.

Note that \(\mathbb {E}[|Z_t |\big |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]}]\) does not increase if we replace \(\sum _{\ell \in A^+_t} {\textbf{R}}_{\ell , t} \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i\) and \(\sum _{\ell \in A^-_t} {\textbf{R}}_{\ell , t} \left|\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i \right|\) in the expression (19) for \(Z_t\) by independent binomial random variables \(Z_t^+ \sim {\mathcal B}\left( \sum _{\ell \in A^+_t} \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i, p\right) \) and \(Z_t^- \sim {\mathcal B}\left( \sum _{\ell \in A^-_t} \left|\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i \right|, p\right) \), respectively (see Footnote 1). In particular, if \(Z_t^{'+}\) and \(Z_t^{'-}\) follow the same distribution as \(Z_t^+\) and \(Z_t^-\), respectively, and \(Z_t^+, Z_t^{'+}, Z_t^-, Z_t^{'-}\) are stochastically independent, then

$$\begin{aligned} 2 \cdot \mathbb {E}\left[ |Z_t |\big |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]} \right]\ge & {} \mathbb {E}\left[ \left|Z_t^+ - Z_t^- \right|+ \left|Z_t^{'-} - Z_t^{'+} \right|\big |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]} \right] \\\ge & {} \mathbb {E}\left[ \left|\left( Z_t^+ + Z_t^{'-} \right) - \left( Z_t^- + Z_t^{'+} \right) \right|\big |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]} \right] \end{aligned}$$

In view of the above, if \(Z^B_t\) is a random variable which, given \({\textbf{x}}_{[t-1]} = \{x_i, i \in [t-1] \}\), and \({\textbf{R}}_{[m], [t-1]}=\{ {\textbf{R}}_{\ell , i}, \ell \in [m], i \in [t-1]\}\), follows the Binomial distribution \({\mathcal B}\left( N_t, p\right) \), where

$$\begin{aligned} N_t{\mathop {=}\limits ^{\text {def}}} \sum _{\ell \in A^+_t} \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i + \sum _{\ell \in A^-_t} \left|\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i \right|, \end{aligned}$$
(20)

then

$$\begin{aligned} \mathbb {E}[|Z_t |\big |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]}] \ge \frac{1}{2} \cdot \textrm{MD}(Z^B_t), \end{aligned}$$
(21)

where \(\textrm{MD}(\cdot )\) is the mean absolute difference of (two independent copies of) \(Z^B_t\). In particular, \(\textrm{MD}(Z^B_t) = \mathbb {E}[\left|Z^B_t - Z'^B_t \right|]\), where \(Z^B_t, Z'^B_t\) are independent random variables following \({\mathcal B}\left( N_t, p\right) \). Unfortunately, we are aware of no simple closed formula for \(\textrm{MD}(Z^B_t)\), and so we resort to Gaussian approximation through the Berry-Esseen Theorem: we write \(Z^B_t = \sum _{i=1}^{N_t} Z^B_{t,i}\), \(Z'^B_t = \sum _{i=1}^{N_t} Z'^B_{t,i}\), and set \(X_i = Z^B_{t,i} - Z'^B_{t,i}\), where \(Z^B_{t,i}, Z'^B_{t,i}\) are independent Bernoulli random variables with success probability p, for any \(i \in [N_t]\). In particular, we have \(\mathbb {E}[X_i]=0\), \(\mathbb {E}[X_i^2] = \mathbb {E}[|X_i |^3] = 2p(1-p)\). Therefore, by the Berry-Esseen Theorem, given \({\textbf{x}}_{[t-1]} = \{x_i, i \in [t-1] \}\), and \({\textbf{R}}_{[m], [t-1]}=\{ {\textbf{R}}_{\ell , i}, \ell \in [m], i \in [t-1]\}\),the distribution of \(Z^B_t - Z'^B_t\) is approximately Normal \({\mathcal N}(0, 2p(1-p)N_t)\), with approximation error \(\frac{0.4748}{\sqrt{2p(1-p) N_t}}\).

Notice that the latter approximation error bound becomes o(1) if \(N_t = \Theta (n), p = \frac{c}{n}\) and c is large enough. Therefore, we next show that, with high probability over the choices of \({\textbf{R}}\), \(N_t = \Theta (n)\), for any \(t \ge \epsilon n+1\), where \(\epsilon \) is the constant used in the Majority Algorithm. In particular, even though we cannot control the variables \(x_i \in \{-1,+1\}, i \in [t-1]\), in the definition of \(N_t\), we will find a lower bound that holds with high probability, by using the random variable

$$\begin{aligned} Y_t = Y_t({\textbf{R}}, {\textbf{x}}) {\mathop {=}\limits ^{\text {def}}} \left|\left\{ \ell \in [m]: \sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} \text { is odd} \right\} \right|, \end{aligned}$$

and employing the following inequality

$$\begin{aligned} N_t \ge Y_t. \end{aligned}$$
(22)

Indeed, (22) holds because, for any \(\ell \in [m]\), if \(\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i}\) is odd, then \(\left|\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} x_i \right|\ge 1\), no matter what values the \(x_i\)’s have. Therefore, such a label \(\ell \) contributes at least 1 to the right-hand side of (20), and thus (22) follows.

Notice now that, for any fixed \(\ell \in [m]\) and \(t \ge \epsilon n+1\), we have \(\Pr (\sum _{i \in [t-1]} {\textbf{R}}_{\ell ,i} \text { is odd}) = \sum _{j \text { odd}} \left( {\begin{array}{c}t-1\\ j\end{array}}\right) p^j (1-p)^{t-1-j} = \frac{1}{2} \left( 1 - (1-2p)^{t-1}\right) \ge \frac{1}{2} \left( 1 - e^{-2p(t-1)}\right) \ge \frac{1}{2} \left( 1 - e^{-2c\epsilon }\right) \), where the closed form for the parity probability follows by comparing the binomial expansions of \(((1-p)+p)^{t-1}\) and \(((1-p)-p)^{t-1}\), and in the last inequality we set \(p = \frac{c}{n}\). Taking \(c \rightarrow \infty \), the latter bound becomes \(\frac{1}{2} - o(1)\); in particular, for c large enough, it is at least \(\frac{1}{3}\). Therefore, by independence of the entries of \({\textbf{R}}\) (and since \(m = n \ge t-1\)), \(Y_t\) stochastically dominates a binomial random variable \({\mathcal B}(t-1, \frac{1}{3})\). Furthermore, by the multiplicative Chernoff (upper) bound, for any \(\delta >0\),

$$\begin{aligned} \Pr \left( Y_t<(1-\delta ) \frac{t-1}{3} \right) < \left( \frac{e^{-\delta }}{(1-\delta )^{1-\delta }}\right) ^{\frac{t-1}{3}}. \end{aligned}$$

Taking \(\delta = \frac{1}{2}\) and noting that \(t \ge \epsilon n +1\), we have

$$\begin{aligned} \Pr \left( Y_t<\frac{t-1}{6} \right) < \left( \frac{e}{2}\right) ^{-\frac{\epsilon n}{6}}, \end{aligned}$$

which is o(1/n), for any constant \(\epsilon >0\). By the union bound,

$$\begin{aligned} \Pr \left( \exists t: t \ge \epsilon n+1, Y_t<\frac{t-1}{6} \right) =o(1). \end{aligned}$$

By inequality (22), we thus have that, with high probability over the choices of \({\textbf{R}}\), \(N_t \ge \frac{t-1}{6} \ge \frac{\epsilon n}{6}\), for all \(t \ge \epsilon n+1\), as needed.

Combining the above, by the Berry-Esseen Theorem, given \({\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]}\), the distribution of \(Z_t^B-Z'^B_t\) is approximately Normal \({\mathcal N}(0, 2p(1-p)N_t)\) with approximation error o(1) as \(c \rightarrow \infty \), with high probability over the choices of \({\textbf{R}}\). In particular, given \({\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]}\), \(|Z_t^B-Z'^B_t |\) follows approximately (i.e. with the same approximation error o(1)) the folded normal distribution with mean value (at least) \(\sqrt{\frac{2}{\pi } \text {Var}(Z_t^B-Z'^B_t |{\textbf{x}}_{[t-1]}, {\textbf{R}}_{[m], [t-1]})} = \sqrt{\frac{4}{\pi } p(1-p) N_t}\). Since \(N_t \ge \frac{t-1}{6} \ge \frac{\epsilon n}{6}\) with high probability, and also \(p = \frac{c}{n}\), we get that \(p(1-p)N_t \ge \frac{c (t-1)}{6n} -o(1)\), with high probability, where the o(1) includes the approximation error given by the Berry-Esseen Theorem. Consequently, by inequality (21), with high probability over the choices of \({\textbf{R}}\) (which is \(1- o(1)\)),

$$\begin{aligned} \mathbb {E}\left[ |Z_t |\right] = \mathbb {E}\left[ \left|\sum _{i \in [t-1]} \left[ {\textbf{R}}^T {\textbf{R}} \right] _{i,t} x_i\right|\right] \ge \sqrt{\frac{c (t-1)}{6 \pi n}} -o(1). \end{aligned}$$

Summing over all \(t \ge \epsilon n+1\), we get

$$\begin{aligned} \sum _{t \ge \epsilon n+1} \mathbb {E}\left[ |Z_t |\right] \ge \sqrt{\frac{c}{6 \pi n}} \sum _{t = \epsilon n}^{n} \sqrt{t} -o(n) = \sqrt{\frac{c}{6 \pi n}} \left( \sum _{t = 1}^{n} \sqrt{t} - \epsilon n \sqrt{\epsilon n} \right) -o(n). \end{aligned}$$

Using the fact that \(\sum _{t = 1}^{n} \sqrt{t} = \frac{2}{3} n^{3/2} +O(\sqrt{n})\), we thus have that

$$\begin{aligned} \sum _{t \ge \epsilon n+1} \mathbb {E}\left[ |Z_t |\right] \ge \sqrt{\frac{c}{6 \pi }} \left( \frac{2}{3} - \epsilon ^{3/2}\right) n -o(n). \end{aligned}$$

On the other hand, we have that the expected weight of a random cut is equal to \(\frac{1}{4} n(n-1)mp^2 = \frac{c^2}{4}n + o(n)\) (see e.g. Eq. (11)). The proof is completed by taking \(\epsilon \rightarrow 0\). \(\square \)

It is worth noting that the dependence of the lower bound for \(\beta \) on the constant c is to be expected; indeed, our results in Sect. 3.1 suggest that, when the label selection probability p becomes large enough, the weight of a random cut is asymptotically optimal.

4.2 Intersection Graph (Weak) Bipartization

Notice that we can view a weighted intersection graph \(G(V, E, {\textbf{R}}^T{\textbf{R}})\) as a multigraph, composed of m (possibly) overlapping cliques corresponding to the sets of vertices having chosen a certain label, namely \(L_{\ell } = \{v: {\textbf{R}}_{\ell , v}=1\}, \ell \in [m]\). In particular, let \(K^{(\ell )}\) denote the clique induced by label \(\ell \). Then \(G = \cup ^+_{\ell \in [m]} K^{(\ell )}\), where \(\cup ^+\) denotes union that keeps multiple edges and also retains label information for each edge (e.g., edges within clique \(K^{(\ell )}\) are formed by label \(\ell \)). In this section, we present an algorithm that takes as input an intersection graph G given as a union of overlapping cliques and outputs a subgraph that is “almost” bipartite.
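For illustration, the clique decomposition just described, together with the clique-to-matching replacement step outlined in Sect. 1.1, can be sketched as follows (ours; the function names and the exact bookkeeping are assumptions and not the paper's pseudocode).

import numpy as np

def label_cliques(R):
    # L_l = {v : R[l, v] = 1}; the multigraph is the union (with multiplicities) of the cliques K^(l)
    return [np.flatnonzero(R[l]) for l in range(R.shape[0])]

def replace_cliques_by_matchings(R, seed=None):
    # For every strong label l (|L_l| >= 3), keep only a random maximal matching of K^(l);
    # weak labels with |L_l| = 2 keep their single edge.  Returns labeled edges (l, u, v).
    rng = np.random.default_rng(seed)
    edges = []
    for l, L in enumerate(label_cliques(R)):
        if len(L) == 2:
            edges.append((l, int(L[0]), int(L[1])))
        elif len(L) >= 3:
            perm = rng.permutation(L)
            for i in range(0, len(perm) - 1, 2):   # pairing a random permutation gives a maximal matching of K^(l)
                edges.append((l, int(perm[i]), int(perm[i + 1])))
    return edges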

To facilitate the presentation of our algorithm, we first give some useful definitions. A closed vertex-label sequence is a sequence of alternating vertices and labels starting and ending at the same vertex, namely \(\sigma := v_1, \ell _1, v_2, \ell _2, \cdots , v_k, \ell _{k}, v_{k+1}=v_1\), where \(v_i \in V\), \(\ell _i \in {\mathcal M}\), and \(\{v_i, v_{i+1}\} \subseteq L_{\ell _i}\), for all \(i \in [k]\) (i.e. \(v_i\) is connected to \(v_{i+1}\) in the intersection graph; see Fig. 1). The size of the closed vertex-label sequence, denoted by \(|\sigma |\), is the number of its labels, i.e., \(|\sigma |=k\). We will also say that label \(\ell \) is strong if \(|L_{\ell } |\ge 3\), otherwise it is weak. For a given closed vertex-label sequence \(\sigma \), and any integer \(\lambda \in [|\sigma |]\), we will say that \(\sigma \) is \(\lambda \)-strong if \(|L_{\ell _i} |\ge 3\) for exactly \(\lambda \) indices \(i \in [|\sigma |]\). The structural Lemma below is useful for our analysis (see Footnote 2).

Fig. 1: Weighted intersection graph as a multigraph composed of 3 overlapping cliques \(K^{(l_1)} \cup ^+ K^{(l_2)} \cup ^+ K^{(l_3)}\) (left), and the graph \(G^{(b)}\) constructed by the Weak Bipartization Algorithm, consisting of a closed vertex-label sequence \(\sigma ^{(\text {odd})} = x, l_2, v, l_1, u, l_3, x\) (right)

Lemma 1

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(m=n\), and \(p = \frac{c}{n}\), for some constant \(c>0\). With high probability over the choices of \({\textbf{R}}\), 0-strong closed vertex-label sequences in G do not have labels in common.

Proof

We will use the first moment method and so we need to prove that the expectation of the number of pairs of distinct 0-strong closed vertex-label sequences in G that have at least one label in common goes to 0. To this end, for \(j \in [\min (k, k')-1]\), let \(A_j(k, k')\) denote the number of such sequences \(\sigma , \sigma '\), with \(k=|\sigma |, k' = |\sigma ' |\), that have j labels in common. In particular, for integers \(k, k'\), let \( \sigma :=v_1, \ell _1, v_2, \ell _2, \cdots , v_k, \ell _{k}, v_{k+1}=v_1\), and let \(\sigma ':=v'_1, \ell '_1, v'_2, \ell '_2, \cdots , v'_{k'}, \ell '_{k'}, v'_{k'+1}=v'_1\). Notice that any such fixed pair \(\sigma , \sigma '\) has the same probability to appear, namely \(p^{2(k+k'-j)} (1-p)^{(n-2)(k+k'-j)}\); indeed, \(p^{2k} (1-p)^{(n-2)k}\) is the probability that \(\sigma \) appears (recall that \(\sigma \) has k labels and it is 0-strong, i.e. each label is only selected by two vertices) and \(p^{2(k'-j)} (1-p)^{(n-2)(k'-j)}\) is the probability that \(\sigma '\) appears given that \(\sigma \) has appeared. Furthermore, the number of such pairs of sequences is dominated by the number of sequences that overlap in j consecutive labels (e.g. the first j), which is at most \(n^k m^k n^{k'-j-1} m^{k'-j}\) (notice that j common labels implies that there are at least \(j+1\) common vertices). Overall, since \(n=m\) and \(p = \frac{c}{n}\), we have

$$\begin{aligned} \mathbb {E}[A_j(k, k')]\le & {} (1+o(1)) \frac{1}{n} (np)^{2(k+k'-j)} (1-p)^{(n-2)(k+k'-j)} \\= & {} (1+o(1)) \frac{1}{n} \left( c^2 (1-p)^{n-2} \right) ^{k+k'-j}. \end{aligned}$$

Since \(n \rightarrow \infty \) and \(p = \frac{c}{n}\), by elementary calculus we have that \(c^2 (1-p)^{n-2}\) is bounded by a constant (which depends only on c) strictly less than 1. Therefore, the above expectation is at most \(e^{-\ln {n} - \Theta (1) (k+k'-j)}\). Hence, summing over all choices of \(k, k' \in [n]\) and \(j \in [\min (k, k')-1]\), we get that the expected number of pairs of distinct 0-strong closed vertex-label sequences that have at least one label in common is at most

$$\begin{aligned} \sum _{k, k' \in [n]} \sum _{j \in [\min (k, k')-1]} e^{-\ln {n} - \Theta (1) (k+k'-j)} = o(1), \end{aligned}$$

and the proof is completed by Markov’s inequality. \(\square \)

The following definition is essential for the presentation of our algorithm.

Definition 3

Given a weighted intersection graph \(G=G(V,E, {\textbf{R}}^T {\textbf{R}})\) and a subgraph \(G^{(b)} \subseteq G\), let \({\mathcal C}_{odd}(G^{(b)})\) be the set of odd length closed vertex-label sequences \(\sigma := v_1, \ell _1, v_2, \ell _2, \cdots , v_k, \ell _{k}, v_{k+1}=v_1\) that additionally satisfy the following:

  (a) \(\sigma \) has distinct vertices (except the first and the last) and distinct labels.

  (b) \(v_i\) is connected to \(v_{i+1}\) in \(G^{(b)}\), for all \(i \in [|\sigma |]\).

  (c) \(\sigma \) is \(\lambda \)-strong, for some \(\lambda > 0\).
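
The conditions of Definition 3 can be checked mechanically; the sketch below (ours, purely illustrative) takes a closed vertex-label sequence encoded as an alternating list \([v_1, \ell _1, v_2, \ldots , v_k, \ell _k, v_1]\), the label classes, and the edge set of a candidate subgraph \(G^{(b)}\), and tests membership in \({\mathcal C}_{odd}(G^{(b)})\).

```python
def in_C_odd(sigma, classes, G_b_edges):
    """Illustrative membership test for C_odd(G^(b)) (Definition 3).  sigma is a
    closed vertex-label sequence [v_1, l_1, v_2, l_2, ..., v_k, l_k, v_1];
    classes[ell] is the label class L_ell; G_b_edges is the edge set of G^(b),
    given as a set of frozensets {u, v}."""
    vertices = sigma[0:-1:2]                     # v_1, ..., v_k (the final entry repeats v_1)
    labels = sigma[1::2]                         # l_1, ..., l_k
    k = len(labels)
    if k % 2 == 0:                               # only odd length sequences qualify
        return False
    # (a) distinct vertices (except the first and the last) and distinct labels
    if len(set(vertices)) != k or len(set(labels)) != k:
        return False
    # (b) consecutive vertices of sigma are connected in G^(b)
    for i in range(k):
        if frozenset((sigma[2 * i], sigma[2 * i + 2])) not in G_b_edges:
            return False
    # (c) lambda-strong for some lambda > 0, i.e. at least one strong label
    return any(len(classes[ell]) >= 3 for ell in labels)
```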

Our Weak Bipartization Algorithm initially replaces each clique \(K^{(\ell )}\) by a random maximal matching \(M^{(\ell )}\), and thus gets a subgraph \(G^{(b)} \subseteq G\) (see Fig. 1). If \({\mathcal C}_{odd}(G^{(b)})\) is not empty, then the algorithm selects \(\sigma \in {\mathcal C}_{odd}(G^{(b)})\) and a strong label \(\ell \in \sigma \), and then replaces \(M^{(\ell )}\) in \(G^{(b)}\) by a new random maximal matching of \(K^{(\ell )}\). The algorithm repeats until \({\mathcal C}_{odd}(G^{(b)})\) becomes empty (or runs forever trying to do so).

Weak Bipartization Algorithm (pseudocode listing)
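
A heavily simplified Python sketch of the loop just described (ours, not the paper's pseudocode; the function names such as find_odd_cycle and the treatment of 0-strong cycles are our assumptions): it builds random maximal matchings \(M^{(\ell )}\), looks for an odd cycle by parity-based graph search, re-matches a strong label on the cycle when one exists, and sets aside 0-strong odd cycles, which by Lemma 2 are the only odd cycles that may remain.

```python
import random
from collections import defaultdict

def random_maximal_matching(L, rng):
    """Random maximal matching of the clique K^(ell) on vertex set L: shuffle and pair up."""
    verts = list(L)
    rng.shuffle(verts)
    return [tuple(verts[i:i + 2]) for i in range(0, len(verts) - 1, 2)]

def find_odd_cycle(adj, skip_edges):
    """Parity 2-coloring over a spanning forest; returns an odd cycle as a list of
    (u, label, v) triples, or None.  Edges listed in skip_edges are ignored."""
    color, parent = {}, {}
    for s in adj:
        if s in color:
            continue
        color[s], parent[s] = 0, None
        stack = [s]
        while stack:
            u = stack.pop()
            for v, ell in adj[u]:
                if (u, v, ell) in skip_edges or (v, u, ell) in skip_edges:
                    continue
                if v not in color:
                    color[v], parent[v] = 1 - color[u], (u, ell)
                    stack.append(v)
                elif color[v] == color[u]:          # same parity: odd cycle found
                    def path_to_root(x):
                        p = []
                        while parent[x] is not None:
                            y, lab = parent[x]
                            p.append((y, lab, x))
                            x = y
                        return p
                    pu, pv = path_to_root(u), path_to_root(v)
                    common = set(pu) & set(pv)      # tree edges above the least common ancestor
                    return ([e for e in pu if e not in common]
                            + [e for e in pv if e not in common] + [(u, ell, v)])
    return None

def weak_bipartization(classes, seed=0):
    """Simplified sketch: while some odd cycle carries a strong label, re-match that label;
    0-strong odd cycles are set aside, since they cannot be destroyed."""
    rng = random.Random(seed)
    matchings = {ell: random_maximal_matching(L, rng) for ell, L in enumerate(classes)}
    frozen = set()                                  # edges of identified 0-strong odd cycles
    while True:
        adj = defaultdict(list)
        for ell, M in matchings.items():
            for u, v in M:
                adj[u].append((v, ell))
                adj[v].append((u, ell))
        cyc = find_odd_cycle(adj, frozen)
        if cyc is None:
            return matchings                        # G^(b) is the union of these matchings
        strong = [ell for (_, ell, _) in cyc if len(classes[ell]) >= 3]
        if strong:
            ell = rng.choice(strong)
            matchings[ell] = random_maximal_matching(classes[ell], rng)
        else:
            frozen.update((u, v, ell) for (u, ell, v) in cyc)
```

By Lemma 1, in the sparse regime considered here the 0-strong odd cycles are label-disjoint with high probability, so setting their edges aside does not interfere with the re-matching of strong labels.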

The following results are the main technical tools that justify the use of the Weak Bipartization Algorithm for Weighted Max Cut.

Lemma 2

If \({\mathcal C}_{odd}(G^{(b)})\) is empty, then \(G^{(b)}\) may only have 0-strong odd cycles.

Proof

For the sake of contradiction, assume \({\mathcal C}_{odd}(G^{(b)}) = \emptyset \), but \(G^{(b)} = \cup ^+_{\ell \in [m]} M^{(\ell )}\) has an odd cycle \(C_k\) that is not 0-strong and has minimum length. Notice that \(C_k\) corresponds to a closed vertex-label sequence, say \(\sigma := v_1, \ell _1, v_2, \ell _2, \cdots , v_k, \ell _{k}, v_{k+1}=v_1\), where \(\{v_i, v_{i+1}\} \in M^{(\ell _i)}\), for all \(i \in [k]\). Furthermore, by assumption, conditions (b) and (c) of Definition 3 are satisfied by \(\sigma \) (indeed \(\{v_i, v_{i+1}\} \in M^{(\ell _i)}\), for all \(i \in [k]\), and \(\sigma \) is \(\lambda \)-strong, for some \(\lambda >0\)). Therefore, the only reason for which \(\sigma \) does not belong to \({\mathcal C}_{odd}(G^{(b)})\) is that condition (a) of Definition 3 is not satisfied, i.e. there are distinct indices \(i < i' \in [k]\) such that \(\ell _i = \ell _{i'}\). Clearly, such indices are not consecutive (i.e. \(i' \ne i+1\)), because \(\ell _i\) is strong and step 6 of our algorithm implies that \(M^{(\ell _i)}\) is a matching of \(K^{(\ell _i)}\). But then consider the vertex-label sequences \(v_1, \ldots , v_i, \ell _i, v_{i'+1}, \ell _{i'+1}, v_{i'+2}, \ldots , v_{k+1} = v_1\) and \(v_{i+1}, \ell _{i+1}, v_{i+2}, \ldots , v_{i'}, \ell _{i}, v_{i+1}\): their sizes sum to k, so one of them has odd length, and both contain the strong label \(\ell _i\). Hence one of them corresponds to a shorter odd cycle that is not 0-strong, which contradicts the minimality of \(C_k\). \(\square \)

Theorem 5

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(n=m\) and \(p = \frac{c}{n}\), where \(c>0\) is a constant, and let \({\textbf{R}}\) be its representation matrix. Let also \(\Sigma \) be a set system with incidence matrix \({\textbf{R}}\). With high probability over the choices of \({\textbf{R}}\), if the Weak Bipartization Algorithm terminates on input G, its output can be used to construct a 2-coloring \({\textbf{x}}^{(\text {disc})} \in \arg \min _{{\textbf{x}} \in \{\pm 1\}^n} \text {disc}(\Sigma , {\textbf{x}})\), which also gives a maximum cut in G, i.e. \({\textbf{x}}^{(\text {disc})} \in \arg \max _{{\textbf{x}} \in \{\pm 1\}^n} \text {Cut}(G, {\textbf{x}})\).

Proof

By construction, the output of the Weak Bipartization Algorithm, namely \(G^{(b)}\), has only 0-strong odd cycles (upon termination \({\mathcal C}_{odd}(G^{(b)})\) is empty, so this follows from Lemma 2). Furthermore, by Lemma 1 these cycles correspond to vertex-label sequences that are label-disjoint. Let H denote the subgraph of \(G^{(b)}\) in which we have destroyed all 0-strong odd cycles by deleting a single (arbitrary) edge \(e_C\) from each 0-strong odd cycle C (keeping all other edges intact), and notice that \(e_C\) corresponds to a weak label. In particular, H is a bipartite multi-graph and thus its vertices can be partitioned into two independent sets A, B constructed as follows: In each connected component of H, start with an arbitrary vertex v and include in A (resp. in B) the set of vertices reachable from v that are at an even (resp. odd) distance from v. Since H is bipartite, it does not have odd cycles, and thus this construction is well-defined, i.e. no vertex can be placed in both A and B.

We now define \({\textbf{x}}^{(disc)}\) by setting \(x^{(disc)}_i = +1\) if \(i \in A\) and \(x^{(disc)}_i = -1\) if \(i \in B\). Let \({\mathcal M}_0\) denote the set of weak labels corresponding to the edges removed from \(G^{(b)}\) in the construction of H. We first note that, for each \(\ell _C \in {\mathcal M}_0\) corresponding to the removal of an edge \(e_C\), we have \(\left|\sum _{i \in L_{\ell _C}} x^{(disc)}_i \right|=2\). Indeed, since \(e_C\) belongs to an odd cycle in \(G^{(b)}\), its endpoints are at even distance in H, which means that either they both belong to A or they both belong to B. Therefore, their corresponding entries of \({\textbf{x}}^{(disc)}\) have the same sign, and so (taking into account that the endpoints of \(e_C\) are the only vertices in \(L_{\ell _C}\)), we have \(\left|\sum _{i \in L_{\ell _C}} x^{(disc)}_i \right|=2\). Second, we show that, for all the other labels \(\ell \in [m] \backslash {\mathcal M}_0\), \(\left|\sum _{i \in L_{\ell }} x^{(disc)}_i \right|\) will be equal to 1 if \(|L_{\ell } |\) is odd and 0 otherwise. For any label \(\ell \in [m] \backslash {\mathcal M}_0\), let \(M^{(\ell )}\) denote the part of \(G^{(b)}\) corresponding to a maximal matching of \(K^{(\ell )}\), and note that all edges of \(M^{(\ell )}\) are contained in H. Since H is bipartite, no edge in \(M^{(\ell )}\) can have both its endpoints in either A or B. Therefore, by construction, the contribution of entries of \({\textbf{x}}^{(disc)}\) corresponding to endpoints of edges in \(M^{(\ell )}\) to the sum \(\sum _{i \in L_{\ell }} x^{(disc)}_i\) is 0. In particular, if \(|L_{\ell } |\) is even, then \(M^{(\ell )}\) is a perfect matching and \(\left|\sum _{i \in L_{\ell }} x^{(disc)}_i \right|= 0\), otherwise (i.e. if \(|L_{\ell } |\) is odd) there is a single vertex not matched in \(M^{(\ell )}\) and \(\left|\sum _{i \in L_{\ell }} x^{(disc)}_i \right|= 1\).

To complete the proof of the theorem, we need to show that \(\text {Cut}(G, {\textbf{x}}^{(disc)})\) is maximum. By Proposition 1, this is equivalent to proving that \(\Vert {\textbf{R}} {\textbf{x}}^{(disc)}\Vert \le \Vert {\textbf{R}} {\textbf{x}}\Vert \) for all \({\textbf{x}} \in \{-1,+1\}^n\). Suppose that there is some \({\textbf{x}}^{(min)} \in \{-1,+1\}^n\) such that \(\Vert {\textbf{R}} {\textbf{x}}^{(disc)}\Vert > \Vert {\textbf{R}} {\textbf{x}}^{(min)}\Vert \). As mentioned above, for all \(\ell \in [m] \backslash {\mathcal M}_0\), we have \(|[{\textbf{R}} {\textbf{x}}^{(disc)}]_{\ell } |\le 1\), and so \(|[{\textbf{R}} {\textbf{x}}^{(disc)}]_{\ell } |\le |[{\textbf{R}} {\textbf{x}}^{(min)}]_{\ell } |\) (since \([{\textbf{R}} {\textbf{x}}]_{\ell }\) has the same parity as \(|L_{\ell } |\), for every \({\textbf{x}} \in \{-1,+1\}^n\)). Therefore, the only labels where \({\textbf{x}}^{(min)}\) could do better are those corresponding to edges \(e_C\) that are removed from \(G^{(b)}\) in the construction of H, i.e. \(\ell _C \in {\mathcal M}_0\), for which we have \(|[{\textbf{R}} {\textbf{x}}^{(disc)}]_{\ell _C} |=2\). However, any such edge \(e_C\) belongs to an odd cycle C, and thus any 2-coloring of the vertices of C will force at least one of the weak labels corresponding to edges of C to be monochromatic. Taking into account the fact that, by Lemma 1, with high probability over the choices of \({\textbf{R}}\), all 0-strong odd cycles correspond to vertex-label sequences that are label-disjoint, we conclude that \(\Vert {\textbf{R}} {\textbf{x}}^{(disc)}\Vert \le \Vert {\textbf{R}} {\textbf{x}}^{(min)}\Vert \), which contradicts our assumption and completes the proof. \(\square \)
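
The coloring step in the proof above is a plain breadth-first 2-coloring of the bipartite multigraph H; here is a minimal sketch (ours, assuming H is given as an adjacency list over vertices \(0, \ldots , n-1\)).

```python
from collections import deque, defaultdict

def two_coloring(H_adj, n):
    """Assign x_i = +1 to vertices at even distance from their component's root
    and x_i = -1 to vertices at odd distance (well defined because H is bipartite)."""
    x = [0] * n
    for s in range(n):
        if x[s] != 0:
            continue
        x[s] = +1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in H_adj[u]:
                if x[v] == 0:
                    x[v] = -x[u]
                    queue.append(v)
    return x

# Toy bipartite H on 5 vertices (a path 0-1-2-3 plus the isolated vertex 4):
H_adj = defaultdict(list, {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
print(two_coloring(H_adj, 5))   # -> [1, -1, 1, -1, 1]
```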

The fact that Theorem 5 is not an immediate consequence of Corollary 1 follows from the observation that a random set system with incidence matrix \({\textbf{R}}\) has discrepancy larger than 1 with (at least) constant probability when \(m=n\) and \(p = \frac{c}{n}\). Indeed, by a straightforward counting argument, we can see that the expected number of 0-strong odd cycles is at least a constant. Furthermore, in any 2-coloring of the vertices, at least one of the weak labels forming edges in a 0-strong odd cycle will be monochromatic. Therefore, with at least constant probability, for any \({\textbf{x}} \in \{-1,+1\}^n\), there exists a weak label \(\ell \) such that \(x_i x_j=1\), where \(i, j\) are the two vertices of \(L_{\ell }\), implying that \(\text {disc}(L_{\ell })=2\).

We close this section with a result indicating that the conditional statement of Theorem 5 is not void, namely that there is a range of values for c where the Weak Bipartization Algorithm terminates in polynomial time.

Theorem 6

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(n=m\) and \(p = \frac{c}{n}\), where \(0<c<1\) is a constant, and let \({\textbf{R}}\) be its representation matrix. With high probability over the choices of \({\textbf{R}}\), the Weak Bipartization Algorithm terminates on input G in \(O\left( (n+\sum _{\ell \in [m]} |L_{\ell } |) \cdot \log {n} \right) \) time, which is polynomial in n.

Before presenting the proof of the Theorem, we first prove the following structural Lemma regarding the expected number of closed vertex-label sequences.

Lemma 3

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model. Let also \(C_k\) denote the number of distinct closed vertex-label sequences of size k in G. Then

$$\begin{aligned} \mathbb {E}[C_k] = \frac{1}{k} \frac{n!}{(n-k)!} \frac{m!}{(m-k)!} p^{2k}. \end{aligned}$$
(23)

In particular, when \(m=n \rightarrow \infty \), \(p = \frac{c}{n}, c>0\), and \(k \ge 3\), we have \(\mathbb {E}[C_k] \le \frac{e}{2\pi } c^{2k}\).

Proof

Notice that there are \(\frac{1}{k} \frac{n!}{(n-k)!}\) ways to arrange k out of n vertices in a cycle. Furthermore, in each such arrangement, there are \(\frac{m!}{(m-k)!}\) ways to place k out of m labels so that there is exactly one label between each pair of consecutive vertices. Since each label in a given arrangement must be selected by both of its adjacent vertices, which happens independently with probability \(p^2\) per label, (23) follows by linearity of expectation.

Setting \(m=n\) and \(p = \frac{c}{n}\), and using the inequalities \(\sqrt{2 \pi } n^{n+\frac{1}{2}}e^{-n} \le n! \le e n^{n+\frac{1}{2}}e^{-n}\), we get

$$\begin{aligned} \mathbb {E}[C_k]= & {} \frac{1}{k} \left( \frac{n!}{(n-k)!} \right) ^2 \left( \frac{c}{n}\right) ^{2k} \\\le & {} \frac{1}{k} \frac{e^2 n^{2n+1} e^{-2n}}{2\pi (n-k)^{2n-2k+1} e^{2k-2n}} \left( \frac{c}{n}\right) ^{2k} = \frac{1}{k} \frac{e^2}{2\pi } \left( \frac{n}{n-k}\right) ^{2n-2k+1} \left( \frac{c}{e} \right) ^{2k} \\\le & {} \frac{e^2}{2\pi } \frac{n}{k (n-k)} e^{\frac{k}{n-k} (2n-2k)} \left( \frac{c}{e} \right) ^{2k} = \frac{e^2}{2\pi } \frac{n}{k (n-k)} c^{2k}. \end{aligned}$$

When n goes to \(\infty \) and \(k \ge 3\), the above is at most \(\frac{e}{2\pi } c^{2k}\), as needed. \(\square \)
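
For illustration, the bound can be checked numerically against the exact expression (23) (a small sketch of ours; the chosen values of n, c and k are arbitrary):

```python
import math

def expected_Ck(n, m, p, k):
    """Exact E[C_k] from Eq. (23): (1/k) * n!/(n-k)! * m!/(m-k)! * p^(2k)."""
    falling_n = math.prod(range(n - k + 1, n + 1))   # n!/(n-k)!
    falling_m = math.prod(range(m - k + 1, m + 1))   # m!/(m-k)!
    return falling_n * falling_m * p ** (2 * k) / k

n = m = 10_000
c = 0.9
p = c / n
for k in (3, 5, 7):
    exact = expected_Ck(n, m, p, k)
    bound = math.e / (2 * math.pi) * c ** (2 * k)
    print(k, exact, bound)   # the exact value should stay below the bound for k >= 3
```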

We are now ready for the proof of the Theorem.

Proof of Theorem 6

We will prove that, when \(m=n \rightarrow \infty \) and \(p = \frac{c}{n}\) with \(c<1\), with high probability, no two distinct closed vertex-label sequences in \({\mathcal C}_{odd}(G^{(b)})\) have a label in common. To this end, recalling Definition 3 for \({\mathcal C}_{odd}(G^{(b)})\), we provide upper bounds on the probabilities of the following events: \(A {\mathop {=}\limits ^{\text {def}}} \{\exists k \ge \log {n}: C_k \ge 1\}\), \(B {\mathop {=}\limits ^{\text {def}}} \{|{\mathcal C}_{odd}(G^{(b)})|\ge \log {n}\}\) and \(C {\mathop {=}\limits ^{\text {def}}} \{\exists \sigma \ne \sigma ' \in {\mathcal C}_{odd}(G^{(b)}): \exists \ell \in \sigma , \ell \in \sigma '\}\).

By the union bound, Markov’s inequality and Lemma 3, we get that, with high probability, all closed vertex-label sequences have less than \(\log {n}\) labels:

$$\begin{aligned} \Pr \left( A \right) \le \sum _{k \ge \log {n}} \mathbb {E}[C_k] \le \sum _{k \ge \log {n}} \frac{e}{2\pi } c^{2k} = \frac{e}{2 \pi } \frac{c^{2 \log {n}}}{1-c^2} = O\left( c^{2 \log {n}} \right) = o(1), \end{aligned}$$

where the last equality follows since \(c<1\) is a constant. Furthermore, by Markov’s inequality and Lemma 3, and noting that any closed vertex-label sequence in \({\mathcal C}_{odd}(G^{(b)})\) must have at least \(k \ge 3\) labels, we get that, with high probability, there are less than \(\log {n}\) closed vertex-label sequences in \({\mathcal C}_{odd}(G^{(b)})\):

$$\begin{aligned} \Pr \left( B \right)\le & {} \frac{1}{\log {n}} \sum _{k \ge 3} \mathbb {E}[C_k] \le \frac{1}{\log {n}} \sum _{k \ge 3} \frac{e}{2\pi } c^{2k} \nonumber \\= & {} \frac{1}{\log {n}} \frac{e}{2 \pi } \frac{c^{6}}{1-c^2} = O\left( \frac{1}{\log {n}} \right) . \end{aligned}$$
(24)

To bound \(\Pr (C)\), fix a closed vertex-label sequence \(\sigma \), and let \(|\sigma |\ge 3\) be the number of its labels. Notice that the existence of another closed vertex-label sequence that has labels in common with \(\sigma \) implies the existence of a vertex-label sequence \(\breve{\sigma }\) that starts with either a vertex or a label from \(\sigma \), ends with either a vertex or a label from \(\sigma \), and has at least one label or at least one vertex that does not belong to \(\sigma \). Let \(|\breve{\sigma }|\) denote the number of labels of \(\breve{\sigma }\) that do not belong to \(\sigma \). Then the number of different vertex-label sequences \(\breve{\sigma }\) that start and end in labels from \(\sigma \) is at most \(|\sigma |^2 n^{|\breve{\sigma }|+1} m^{|\breve{\sigma }|}\); indeed \(\breve{\sigma }\) in this case has \(|\breve{\sigma }|\) labels and \(|\breve{\sigma }|+1\) vertices that do not belong to \(\sigma \). Therefore, by independence, each such sequence \(\breve{\sigma }\) has probability \(p^{2|\breve{\sigma }|+2}\) to appear. Similarly, the number of different vertex-label sequences \(\breve{\sigma }\) that start and end in vertices from \(\sigma \) is at most \(|\sigma |^2 n^{|\breve{\sigma }|-1} m^{|\breve{\sigma }|}\) and each one has probability \(p^{2|\breve{\sigma }|}\) to appear. Finally, the number of different vertex-label sequences \(\breve{\sigma }\) that start in a vertex from \(\sigma \) and end in a label from \(\sigma \) (notice that this also covers the case where \(\breve{\sigma }\) starts in a label from \(\sigma \) and ends in a vertex from \(\sigma \)) is at most \(|\sigma |^2 n^{|\breve{\sigma }|} m^{|\breve{\sigma }|}\) and each one has probability \(p^{2|\breve{\sigma }|+1}\) to appear. Overall, for a given sequence \(\sigma \), the expected number of sequences \(\breve{\sigma }\) described above that additionally satisfy \(|\breve{\sigma }|< \log {n}\) is at most

$$\begin{aligned}{} & {} \sum _{k=0}^{\log {n}-1} |\sigma |^2 n^{k+1} m^{k} p^{2k+2} + \sum _{k=1}^{\log {n}-1} |\sigma |^2 n^{k-1} m^{k} p^{2k} + \sum _{k=1}^{\log {n}-1} |\sigma |^2 n^{k} m^{k} p^{2k+1}\nonumber \\{} & {} \quad \le c |\sigma |^2 \frac{\log {n}}{n}, \end{aligned}$$
(25)

where in the last inequality we used the fact that \(m=n, p = \frac{c}{n}\) and \(c<1\). Since the existence of a sequence \(\breve{\sigma }\) for \(\sigma \) that additionally satisfies \(|\breve{\sigma }|\ge \log {n}\) implies event A, and, on the other hand, the existence of more than \(\log {n}\) different sequences \(\sigma \in {\mathcal C}_{odd}(G^{(b)})\) implies event B, by Markov’s inequality and (25), we get

$$\begin{aligned} \Pr (C)\le & {} \Pr (A) + \Pr (B) + c \frac{(\log {n})^4}{n} \nonumber \\= & {} O\left( c^{2 \log {n}} \right) + O\left( \frac{1}{\log {n}} \right) + O\left( \frac{(\log {n})^4}{n} \right) = O\left( \frac{1}{\log {n}} \right) . \end{aligned}$$
(26)

We have thus proved that, with high probability over the choices of \({\textbf{R}}\), closed vertex-label sequences in \({\mathcal C}_{odd}(G^{(b)})\) are label-disjoint, as needed.

In view of this, the proof of the Theorem follows by noting that, since closed vertex-label sequences in \({\mathcal C}_{odd}(G^{(b)})\) are label-disjoint, steps 5 and 6 within the while loop of the Weak Bipartization Algorithm will be executed exactly once for each sequence in \({\mathcal C}_{odd}(G^{(b)})\), where \(G^{(b)}\) is defined in step 3 of the algorithm; indeed, once a closed vertex-label sequence \(\sigma \in {\mathcal C}_{odd}(G^{(b)})\) is destroyed in step 6, no new closed vertex-label sequence is created. In fact, once \(\sigma \) is destroyed we can remove the corresponding labels and edges from \(G^{(b)}\), as these will no longer belong to other closed vertex-label sequences. Furthermore, to find a closed vertex-label sequence in \({\mathcal C}_{odd}(G^{(b)})\), it suffices to find an odd cycle in \(G^{(b)}\), which can be done by running DFS, requiring \(O(n+\sum _{\ell \in [m]} |L_{\ell }|)\) time, because \(G^{(b)}\) has at most \(\sum _{\ell \in [m]} |L_{\ell }|\) edges. Finally, by (24), we have \(|{\mathcal C}_{odd}(G^{(b)})|< \log {n}\) with high probability, and so the running time of the Weak Bipartization Algorithm is \(O((n+\sum _{\ell \in [m]} |L_{\ell }|) \log {n})\), which concludes the proof of Theorem 6. \(\square \)

5 Discussion and Some Open Problems

In this paper, we introduced the model of weighted random intersection graphs and we studied the average case analysis of Weighted Max Cut through the prism of discrepancy of random set systems. In particular, in the first part of the paper, we proved concentration of the weight of a maximum cut of \(G(V, E, {\textbf{R}}^T {\textbf{R}})\) around its expected value, and we used it to show that, with high probability, the weight of a random cut is asymptotically equal to the maximum cut weight of the input graph, when \(m = n^{\alpha }, \alpha <1\). On the other hand, in the case where the number of labels is equal to the number of vertices (i.e. \(m=n\)), we proved that a majority algorithm gives a cut with weight that is larger than the weight of a random cut by at least a constant factor, when \(p = \frac{c}{n}\) and c is large.

In the second part of the paper, we highlighted a connection between Weighted Max Cut of sparse weighted random intersection graphs and Discrepancy of sparse random set systems, formalized through our Weak Bipartization Algorithm and its analysis. We demonstrated how our proposed framework can be used to find optimal solutions for these problems, with high probability, in special cases of sparse inputs (\(m=n, p=\frac{c}{n}, c<1\)).

One of the main problems left open in our work concerns the termination of our Weak Bipartization Algorithm for large values of c. We conjecture the following:

Conjecture 1

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(m=n\), and \(p = \frac{c}{n}\), for some constant \(c \ge 1\). With high probability over the choices of \({\textbf{R}}\), on input G, the Weak Bipartization Algorithm terminates in polynomial time.

We also leave the problem of determining whether the Weak Bipartization Algorithm terminates in polynomial time, in the case \(m=n\) and \(p = \omega (1/n)\), as an open question for future research.

Towards strengthening the connection between Weighted Max Cut under the \(\overline{{\mathcal G}}_{n, m, p}\) model, and Discrepancy in random set systems, we conjecture the following:

Conjecture 2

Let \(G(V,E, {\textbf{R}}^T {\textbf{R}})\) be a random instance of the \(\overline{{\mathcal G}}_{n, m, p}\) model, with \(m=n, p = \frac{c}{n}\), for some positive constant c, and let \({\textbf{R}}\) be its representation matrix. Let also \(\Sigma \) be a set system with incidence matrix \({\textbf{R}}\). Then, with high probability over the choices of \({\textbf{R}}\), there exists \({\textbf{x}}^{\text {disc}} \in \arg \min _{{\textbf{x}} \in \{-1, +1\}^n} \text {disc}(\Sigma , {\textbf{x}})\), such that \( \texttt {Cut}(G, {\textbf{x}}^{\text {disc}})\) is asymptotically equal to \(\texttt {Max-Cut}(G)\).