Online Clique Clustering

Clique clustering is the problem of partitioning the vertices of a graph into disjoint clusters, where each cluster forms a clique in the graph, while optimizing some objective function. In online clustering, the input graph is given one vertex at a time, and any vertices that have previously been clustered together are not allowed to be separated. The goal is to maintain a clustering with an objective value close to the optimal solution. For the variant where we want to maximize the number of edges in the clusters, we propose an online algorithm based on the doubling technique. It has an asymptotic competitive ratio at most 15.646 and a strict competitive ratio at most 22.641. We also show that no deterministic algorithm can have an asymptotic competitive ratio better than 6. For the variant where we want to minimize the number of edges between clusters, we show that the deterministic competitive ratio of the problem is $n-\omega(1)$, where n is the number of vertices in the graph.


Introduction
The correlation clustering problem and its different variants have been extensively studied over the past decades; see e.g. [1,5,11]. An instance of correlation clustering consists of a graph whose vertices represent some objects and whose edges represent their similarity. The objective is to find a partitioning of the graph into disjoint subsets called clusters that is optimal, or at least near-optimal, with respect to some objective function. Several objective functions are used in the literature, e.g., maximizing the number of edges within the clusters plus the number of non-edges between clusters (maximizing agreements), or minimizing the number of non-edges inside the clusters plus the number of edges outside them (minimizing disagreements). Unlike more conventional approaches to clustering, typically involving some parameter that controls the number or the size of clusters, correlation clustering is parameter-free: the structure of the computed clustering conforms naturally to the similarity function. Bansal et al. [1] show that both the minimization of disagreement edges and the maximization of agreement edges versions are NP-hard. However, from the point of view of approximation the two versions differ. In the case of maximizing agreements, this problem admits a PTAS, whereas in the case of minimizing disagreements it is APX-hard. Several efficient constant factor approximation algorithms are proposed for minimizing disagreements [1,5,11] and maximizing agreements [5].
Another approach to developing parameter-free clustering models is by imposing restrictions on the structure of clusters. We study the variant, called clique clustering, where the clusters are required to form disjoint cliques in the underlying graph G = (V, E). Here, we can maximize the number of edges inside the clusters or minimize the number of edges outside the clusters. These measures give rise to the maximum and minimum clique clustering problems, respectively. The computational complexity and approximability of these problems have attracted attention recently [12,15,18], and they have several applications within the areas of gene expression profiling and rDNA clone classification [2,14,18-20]. In the context of rDNA clone classification, for example, a collection of unknown rDNA clones from some environment (bacteria from gut or fungi from soil) are subjected to a sequence of hybridization experiments with appropriately designed primers (short DNA sequences). The signals from these experiments produce a fingerprint vector associated with each clone. Very similar fingerprint vectors typically represent rDNA clones of closely related organisms, so clustering of such fingerprint vectors allows one to estimate the level of diversity of a bacterial or fungal community and help with their taxonomic classification. The use of parameter-free clustering is necessitated by the lack of prior information about the environments from which the samples are taken, and clustering into cliques, rather than correlation clustering, reduces the likelihood of "false positives", namely classifying unrelated organisms into one category.
In this paper, we focus on the online variant of clique clustering, where the input graph G is not known in advance. The vertices of G arrive one at a time. Let $v_t$ denote the vertex that arrives at time t, for $t = 1, 2, \ldots$. When $v_t$ arrives, its edges to all preceding vertices $v_1, \ldots, v_{t-1}$ are revealed as well. In other words, after step t, the subgraph of G induced by $v_1, v_2, \ldots, v_t$ is known, but no other information about G is available. In fact, we assume that even the number n of vertices is not known upfront.
Our objective is to design an online algorithm, namely one that constructs a clustering incrementally, step by step, based on the information acquired up to the current step. Specifically, when $v_t$ arrives at step t, the algorithm first creates a singleton clique $\{v_t\}$. Then it is allowed to merge any number of cliques (possibly none) in its current partitioning into larger cliques. No other modifications of the clustering are allowed. The merge operation in this online setting is irreversible; once vertices are clustered together, they will remain so, and hence, a bad decision may have significant impact on the final solution. This online model was proposed by Charikar et al. [4].
With only limited information about the input sequence and the restrictions on allowed operations, an online clique clustering algorithm cannot be guaranteed to always compute an optimal solution. This is a common feature of most online problems, where information about the input appears gradually over time, and the online algorithm is required to build its solution incrementally, at all times maintaining a valid solution to the already revealed input sequence. As is common in the area of online algorithms, we will measure the performance of an online algorithm by its competitive ratio, which represents the ratio between the optimal solution and the solution produced by the algorithm. We distinguish between the strict competitive ratio, which is the worst-case such ratio over all possible inputs, and the asymptotic competitive ratio, which is (roughly) the limit of such ratios when the optimum value grows to infinity. We define these concepts formally in Sect. 2. We refer the reader to [3] for more background on online problems and competitive analysis.
We emphasize that we place no limits on the computational power of our algorithms. This approach allows us to focus specifically on the limits posed by the lack of complete information about the input. Similar settings have been studied in previous work on online computation, for example for online medians [8,9,16], minimum-latency tours [6], and several other online optimization problems [10], where algorithms with unlimited computational power were studied.
Our approach to online clustering is closely related to that of Mathieu et al. [17], who study online correlation clustering. Also in their version, vertices arrive one at a time and clusters need to be built incrementally. They prove that for minimizing disagreements the optimal competitive ratio is $\Theta(n)$, and that it is achieved by a simple greedy algorithm. For maximizing agreements they show that the greedy algorithm is 2-competitive, that a slightly smaller competitive ratio can be achieved with a more sophisticated algorithm, but that no online algorithm can have ratio smaller than 1.199. Interestingly, while the values of the ratios are significantly different, these results parallel those for clique clustering presented in this paper; see below.
Our results. We investigate the online clique clustering problem and provide upper and lower bounds for the competitive ratios for its maximization and minimization versions, that we denote MaxCC and MinCC, respectively. Section 3 is devoted to the study of MaxCC. We first observe that the competitive ratio of the natural greedy algorithm is linear in n. We then give a constant competitive algorithm for MaxCC, with asymptotic competitive ratio at most 15.646 and strict competitive ratio at most 22.641. The algorithm is based on the doubling technique often used in online algorithms. We show that the doubling approach cannot give a competitive ratio smaller than 10.927. We also give a general lower bound, proving that there is no online algorithm for MaxCC with competitive ratio smaller than 6. Both these lower bounds apply also to asymptotic ratios.
In Sect. 4 we study online algorithms for MinCC. We prove that no online algorithm can have a competitive ratio of $n - \omega(1)$. We then show that the competitive ratio of the greedy algorithm is n − 2, matching this lower bound.

Preliminaries
We begin with some notation and basic definitions of the MaxCC and MinCC clustering problems. They are defined on an input graph G = (V , E), with vertex set V and edge set E. We wish to find a partitioning of the vertices in V into clusters so that each cluster induces a clique in G. In addition, we want to optimize some objective function associated with the clustering. In the MaxCC case this objective function is to maximize the total number of edges inside the clusters, whereas in the MinCC case we want to minimize the number of edges outside the clusters.
We adapt the incremental model that was proposed by Charikar et al. [4] and Mathieu et al. [17] for the online correlation clustering problem. Throughout the paper we will implicitly assume that any graph G has its vertices ordered $v_1, v_2, \ldots, v_n$. These vertices arrive one at a time; that is, at step t vertex $v_t$ and all edges between $v_t$ and the previous vertices $v_1, v_2, \ldots, v_{t-1}$ are revealed. At each step the arriving vertex is placed into a new singleton cluster. The resulting clustering can then be updated using any number of merge operations, where merge(C, C′) merges two existing clusters C, C′ into one, provided that the resulting cluster induces a clique in G. This means that once two vertices are clustered together, they cannot be later separated.
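The incremental model can be made concrete with a small data structure. The following is only an illustrative sketch in Python; the class name `OnlineCliqueClustering` and its methods are hypothetical and not part of the paper. It supports the two operations of the model: a vertex arrives as a singleton, and two clusters may be merged if their union induces a clique. There is deliberately no operation that splits a cluster.

```python
from itertools import combinations


class OnlineCliqueClustering:
    """Incremental clustering state: vertices arrive one at a time, clusters only merge."""

    def __init__(self):
        self.adj = {}         # vertex -> set of neighbours revealed so far
        self.cluster_of = {}  # vertex -> frozenset holding its current cluster

    def arrive(self, v, neighbours):
        """Reveal vertex v with its edges to earlier vertices; v starts as a singleton."""
        neighbours = set(neighbours)
        if v in self.adj or any(u not in self.adj for u in neighbours):
            raise ValueError("v must be new and its neighbours must have arrived earlier")
        self.adj[v] = neighbours
        for u in neighbours:
            self.adj[u].add(v)
        self.cluster_of[v] = frozenset([v])

    def is_clique(self, vertices):
        return all(b in self.adj[a] for a, b in combinations(vertices, 2))

    def merge(self, u, v):
        """Merge the clusters containing u and v, provided the union induces a clique."""
        merged = self.cluster_of[u] | self.cluster_of[v]
        if not self.is_clique(merged):
            raise ValueError("merged cluster would not induce a clique")
        for w in merged:
            self.cluster_of[w] = merged

    def clusters(self):
        return set(self.cluster_of.values())
```

An online algorithm in this model is any rule that, after each call to `arrive`, chooses which `merge` calls to make.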
For MaxCC, we define the profit of a clustering $\mathcal{C} = \{C_1, \ldots, C_k\}$ of a given graph G = (V, E) to be the total number of edges in its cliques, that is
$$\sum_{i=1}^{k} \binom{|C_i|}{2} = \tfrac{1}{2}\sum_{i=1}^{k} |C_i|\,(|C_i| - 1).$$
Similarly, for MinCC, we define the cost of $\mathcal{C}$ to be the total number of edges outside its cliques, that is
$$|E| - \sum_{i=1}^{k} \binom{|C_i|}{2}.$$
For a graph G, we denote the optimal profit or cost for MaxCC and MinCC, respectively, by $\mathrm{profit}_{OPT}(G)$ and $\mathrm{cost}_{OPT}(G)$.
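As a small illustration of these two objective functions, the following Python sketch computes the profit and the cost of a given clustering. The function names `profit` and `cost` are chosen here for illustration only, and the code assumes (without checking) that every cluster is a clique of the graph.

```python
from math import comb


def profit(clustering):
    """Number of edges inside the clusters, assuming every cluster is a clique."""
    return sum(comb(len(cluster), 2) for cluster in clustering)


def cost(num_edges, clustering):
    """Number of edges of the graph that are not inside any cluster."""
    return num_edges - profit(clustering)
```

For the path on vertices 1, 2, 3 (two edges), the clustering {1, 2}, {3} has profit 1 and cost 1, while a single cluster of size 4 in a complete graph on four vertices has profit 6 and cost 0.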
As mentioned earlier, we will measure the performance of an online algorithm by its competitive ratio. This ratio is defined as the worst case ratio between the profit/cost of the online algorithm and the profit/cost of an offline optimal algorithm, one that knows the complete input sequence in advance. More formally, for an online algorithm S, we define $\mathrm{profit}_S(G)$ to be the profit of S when the input graph is G = (V, E) and, similarly, let $\mathrm{cost}_S(G) \stackrel{\mathrm{def}}{=} |E| - \mathrm{profit}_S(G)$ be the cost of S on G. We say that an online algorithm S is R-competitive for MaxCC if there is a constant β such that, for any input graph G, we have
$$\mathrm{profit}_{OPT}(G) \le R \cdot \mathrm{profit}_S(G) + \beta. \tag{1}$$
Similarly, S is R-competitive for MinCC if there is a constant β such that, for any input graph G, we have
$$\mathrm{cost}_S(G) \le R \cdot \mathrm{cost}_{OPT}(G) + \beta. \tag{2}$$
The reason for defining the competitive ratio differently for maximization and minimization problems is to have all ratios being at least 1. The smallest R for which an online algorithm S is R-competitive is called the (asymptotic) competitive ratio of S. The smallest R for which S is R-competitive with β = 0 is called the strict competitive ratio of S. (If it so happens that these minimum values do not exist, in both cases the competitive ratio is actually defined by the corresponding infimum.)

Note that an online algorithm does not know when the last vertex arrives and, as a consequence, in order to be R-competitive, it needs to ensure that the corresponding bound, (1) or (2), is valid after each step. To be more precise, for any given step t, inequalities (1) and (2) need to hold for the graph $G = G_t$ induced by vertices $v_1, v_2, \ldots, v_t$. We stress here that we place no restrictions on the optimal solution (in particular, it does not need to be incremental); that is, for any such $G_t$, the values of $\mathrm{profit}_{OPT}(G_t)$ and $\mathrm{cost}_{OPT}(G_t)$ are simply offline optimal solutions computed for the input graph $G = G_t$.

Online Maximum Clique Clustering
In this section, we study online MaxCC, the clique clustering problem where the objective is to maximize the number of edges within the cliques. The main results here are upper and lower bounds for the competitive ratio. For the upper bound, we give an algorithm that uses a doubling technique to achieve a competitive ratio of at most 15.646. For the lower bound, we show that no online algorithm has a competitive ratio smaller than 6. Additional results include a competitive analysis of the greedy algorithm and a lower bound for doubling based algorithms.

The Greedy Algorithm for Online MAXCC
Greedy, the greedy algorithm for MaxCC, merges each input vertex with the largest current cluster that maintains the clique property. This maximizes the increase in profit at this step. If no such merging is possible the vertex remains in its singleton cluster. Greedy algorithms are commonly used as heuristics for a variety of online problems and in some cases they produce near-optimal solutions. We show that for MaxCC the solution of Greedy can be far from optimal.
We start with an observation that examines the ratio of Greedy for small values of n, and we follow it with lower and upper bounds for arbitrary values of n.

Observation 1 For n ≤ 3, Greedy always computes an optimal clustering, so its competitive ratio is 1.

Theorem 1 For all n ≥ 2, Greedy's strict competitive ratio for MaxCC is at least $\lfloor n/2 \rfloor$, and the same bound holds for its asymptotic ratio.

Proof We first give the proof for the strict ratio, and then extend it to the asymptotic ratio. By Observation 1, for n = 2, 3, the ratio is 1 and the theorem holds. So we can assume that n ≥ 4. Consider an adversary that provides input to the algorithm to make it behave as badly as possible. Our adversary creates an instance with n vertices, numbered from 1 to n. The odd vertices are connected to form a clique, and similarly the even vertices are connected to form a clique. In addition each vertex of the form 2i, for $i = 1, \ldots, \lfloor (n-1)/2 \rfloor$, is connected to vertex 2i − 1; see Fig. 2.
Greedy clusters the vertices as odd/even pairs, leaving the vertex 2k − 1 as a singleton if n = 2k − 1 is odd, and leaving both vertices 2k − 1 and 2k as singletons if n = 2k is even. This generates a clustering of profit $\mathrm{profit}_{GDY}(G) = k - 1$. An optimal algorithm clusters the odd vertices in one clique of size k and the even vertices in another clique of size k − 1 or k, depending on whether n is odd or even. The profit for the optimal solution is $\mathrm{profit}_{OPT}(G) = (k-1)^2$ if n is odd, and $\mathrm{profit}_{OPT}(G) = k(k-1)$ if n is even. Hence, the ratio between the optimum and the greedy solution is $k - 1 = (n-1)/2 = \lfloor n/2 \rfloor$ if n is odd, and $k = n/2 = \lfloor n/2 \rfloor$ if n is even; therefore the worst case strict competitive ratio of the greedy algorithm is at least $\lfloor n/2 \rfloor$.
To obtain the same lower bound on the asymptotic ratio, it suffices to notice that, if we follow the above adversary algorithm, then for any $R < \lfloor n/2 \rfloor$ and any constant β > 0, we can find sufficiently large n for which inequality (1) will be false.
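The adversarial instance from this proof is easy to simulate. The sketch below, in Python, builds the graph (odd clique, even clique, and the edges between 2i − 1 and 2i) and runs a straightforward implementation of Greedy on it. The function names are illustrative, and ties between equally large clusters are broken by taking the earliest one, which is an assumption not fixed by the text.

```python
from itertools import combinations
from math import comb


def adversary_graph(n):
    """Adjacency sets on vertices 1..n: odd clique, even clique, and
    edges {2i-1, 2i} for i = 1, ..., floor((n-1)/2)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in combinations(range(1, n + 1), 2):  # always u < v
        same_parity = (v - u) % 2 == 0
        paired = v == u + 1 and u % 2 == 1 and v // 2 <= (n - 1) // 2
        if same_parity or paired:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_profit(adj):
    """Run Greedy with arrival order 1, 2, ..., n and return its profit."""
    clusters = []
    for v in sorted(adj):
        # Existing clusters that remain cliques after adding v.
        fits = [c for c in clusters if c <= adj[v]]
        if fits:
            max(fits, key=len).add(v)  # join a largest feasible cluster
        else:
            clusters.append({v})       # stay a singleton
    return sum(comb(len(c), 2) for c in clusters)


def parity_clustering_profit(n):
    """Profit of clustering all odd vertices together and all even vertices together."""
    return comb((n + 1) // 2, 2) + comb(n // 2, 2)
```

For example, with n = 5 Greedy obtains profit 2 while the odd/even clustering has profit 4, and with n = 4 the two values are 1 and 2; in both cases the ratio is $\lfloor n/2 \rfloor$.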
Next, we look at the upper bound for the greedy algorithm.
Theorem 2 For all n ≥ 2, Greedy's strict competitive ratio for MaxCC is at most $\lfloor n/2 \rfloor$.
Proof By Observation 1, for n = 2, 3, the ratio is 1 and the theorem holds. So in the rest of the proof we can assume that n ≥ 4.
Fix an optimal clustering of G that we denote OPT(G). Assume this clustering consists of p non-singleton clusters of sizes $c_1, \ldots, c_p$. The profit of OPT(G) is $\sum_{i=1}^{p} \binom{c_i}{2} = \tfrac12 \sum_{i=1}^{p} c_i(c_i - 1)$. Let $k = \max_i c_i$ be the size of the maximum cluster of OPT(G).

Case 1: $k \le \lfloor n/2 \rfloor$. In this case, we can distribute the profit of each cluster of Greedy equally among the participating vertices; that is, if a vertex belongs to a Greedy cluster of size c, it will be assigned a profit of $\tfrac12(c - 1)$. We refer to this quantity as charged profit. We now note that at most one vertex in each cluster of OPT(G) can be a singleton cluster in Greedy's clustering, since otherwise Greedy would cluster any two such vertices together. This gives us that each vertex in a non-singleton cluster of OPT(G), except possibly for one, has charged profit at least $\tfrac12$. So the total profit charged to the vertices of an OPT(G) cluster of size $c_i$ is at least $\tfrac12(c_i - 1)$. Therefore the profit ratio for this clique of OPT(G), namely the ratio between its optimal profit and Greedy's charged profit, is at most
$$\binom{c_i}{2} \Big/ \tfrac12 (c_i - 1) = c_i.$$
From this bound and the case assumption, all cliques of OPT(G) have profit ratio at most $k \le \lfloor n/2 \rfloor$, so the competitive ratio is also at most $\lfloor n/2 \rfloor$.

Case 2: $k \ge \lfloor n/2 \rfloor + 1$. In this case there is a unique cluster Q in OPT(G) of size k. The optimum profit is maximized if the graph has one other clique of size n − k, so
$$\mathrm{profit}_{OPT}(G) \le \binom{k}{2} + \binom{n-k}{2}. \tag{3}$$
We now consider two sub-cases.

Case 2.1:
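Greedy's profit is at least k. In this case, using (3) and $k \ge \lfloor n/2 \rfloor + 1 \ge \tfrac12(n + 1)$, the competitive ratio is at most
$$\frac{1}{k}\left[\binom{k}{2} + \binom{n-k}{2}\right] = k - n + \frac{n(n-1)}{2k} \le \frac{n-1}{2} \le \lfloor n/2 \rfloor,$$
where the first inequality holds because the middle expression is convex in k, so on the range $\tfrac12(n+1) \le k \le n$ it is maximized at one of the endpoints, where it equals $\frac{(n-1)^2}{2(n+1)}$ and $\frac{n-1}{2}$, respectively.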

Case 2.2:
Greedy's profit is at most k − 1. We show that in this case the profit of Greedy is in fact equal to k − 1, and that Greedy's clustering has a special form.
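To prove this claim, consider those clusters of Greedy that intersect Q. For i ≥ 1 and j ≥ 0, let $d_{ij}$ be the number of these clusters that have i vertices in Q and j outside Q. Note that at most one cluster of Greedy can be wholly contained in Q, as otherwise Greedy would merge such clusters. Denote by α the size of this cluster of Greedy contained in Q (if it exists; if not, let α = 0). Let also $\beta = d_{11}$ and $\gamma = \sum d_{ij}$, where this sum (and the sums below) ranges over all pairs i, j with j ≥ 1 and i + j ≥ 3. Counting the vertices of Q gives $k = \alpha + \beta + \sum i\, d_{ij}$. A cluster with i + j ≥ 3 vertices, of which j ≥ 1 are outside Q, contains $\binom{i+j}{2} \ge i + j \ge i + 1$ edges, so
$$\mathrm{profit}_{GDY}(G) \ge \binom{\alpha}{2} + \beta + \sum (i+1)\, d_{ij} = \binom{\alpha}{2} + (k - \alpha) + \gamma = k + \gamma + \tfrac12 \alpha(\alpha - 3) \ge k - 1 + \gamma.$$
The last inequality holds because, for integer values of α, the expression α(α − 3) is minimized for α ∈ {1, 2}. Combined with the case assumption that Greedy's profit is at most k − 1, we can now conclude that Greedy's profit is indeed equal to k − 1 and, in addition, we have that γ = 0 and α ∈ {1, 2}.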
So, for α = 1, Greedy's clustering consists of k − 1 disjoint edges, each with exactly one endpoint in Q, plus a singleton vertex in Q. Thus n ≥ 2k − 1. As $k \ge \lfloor n/2 \rfloor + 1$, this is possible only when n = 2k − 1. By (3), the optimal profit in this case is at most $(k-1)^2$, so the ratio is at most $k - 1 = \lfloor n/2 \rfloor$.
For α = 2, Greedy's clustering consists of k − 1 edges, of which one is contained in Q and the remaining ones have exactly one endpoint in Q. So n ≥ 2k − 2. If n is odd, this and the bound $k \ge \lfloor n/2 \rfloor + 1$ would force n = 2k − 1, in which case the argument from the paragraph above applies. On the other hand, if n is even, then these bounds will force n = 2k − 2. Then, by (3), the optimal profit is at most $k^2 - 3k + 3$, so the competitive ratio is at most $(k^2 - 3k + 3)/(k - 1) = k - 2 + 1/(k-1) \le k - 1 = \lfloor n/2 \rfloor$, for k ≥ 2, concluding the proof.

A Constant Competitive Algorithm for MAXCC
In this section, we give our competitive online algorithm OCC. Roughly, the algorithm works in phases. In each phase we consider the "batch" of nodes that have not yet been clustered with other nodes, compute an optimal clustering for this batch, and add these new clusters to the algorithm's clustering. The phases are defined so that the profit for consecutive phases increases exponentially.
The overall idea can be thought of as an application of the "doubling" algorithm (see [10], for example), but in our case a subtle modification is required. Unlike other doubling approaches, the phases are not completely independent in our algorithm: the clustering computed in each phase, in addition to the new vertices, needs to include the singleton vertices from earlier phases as well. This is needed, because in our objective function singleton clusters do not bring any profit.
We remark that one could alternatively consider using profit value $k^2/2$ for a clique of size k, which is a very close approximation to our function if k is large. This would lead to a simpler algorithm and much simpler analysis. However, this function is a bad approximation when the clustering involves many small cliques. This is, in fact, the most challenging scenario in the analysis of our algorithm, and instances with this property are also used in the lower bound proof in Sect. 3.4.

Algorithm OCC
We now describe our algorithm. Fix some constant parameter γ > 1. The algorithm works in phases, starting with phase j = 0. At any moment the clustering maintained by the algorithm contains a set U of singleton clusters. During phase j, each arriving vertex is added into U. As soon as there is a clustering of U of profit at least $\gamma^j$, the algorithm clusters U according to this clustering and adds these new (non-singleton) clusters to its current clustering. The vertices that still form singleton clusters remain in U and then phase j + 1 starts.
Note that phase 0 ends as soon as one edge is revealed, since then it is possible for OCC to create a clustering with $\gamma^0 = 1$ edge. The last phase may not be complete; as a result all nodes released in this phase will be clustered as singletons. Observe also that the algorithm never merges non-singleton cliques produced in different phases.
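The phase structure of OCC can be sketched as follows. This Python code is only an illustration under stated assumptions: the optimal clustering of U is found by exhaustive search (exponential time, in line with the unlimited computational power assumed in this paper), it is recomputed from scratch after every arrival, and the names `best_clustering` and `occ` as well as the input format are hypothetical.

```python
from itertools import combinations
from math import comb


def best_clustering(vertices, adj):
    """Exhaustively find a clique partition of `vertices` of maximum profit.
    Returns (profit, list of clusters); exponential time, small inputs only."""
    vertices = sorted(vertices)
    if not vertices:
        return 0, []
    v, rest = vertices[0], vertices[1:]
    candidates = [u for u in rest if u in adj[v]]
    best = (-1, None)
    for r in range(len(candidates) + 1):
        for mates in combinations(candidates, r):
            if all(b in adj[a] for a, b in combinations(mates, 2)):
                cluster = {v, *mates}  # a clique containing v
                sub_profit, sub_clusters = best_clustering(set(rest) - cluster, adj)
                total = comb(len(cluster), 2) + sub_profit
                if total > best[0]:
                    best = (total, [cluster] + sub_clusters)
    return best


def occ(arrivals, gamma):
    """arrivals: iterable of (vertex, list of earlier neighbours).
    Returns (non-singleton clusters created so far, set U of singletons)."""
    adj, U, clusters, phase = {}, set(), [], 0
    for v, nbrs in arrivals:
        adj[v] = set(nbrs)
        for u in nbrs:
            adj[u].add(v)
        U.add(v)
        value, parts = best_clustering(U, adj)
        if value >= gamma ** phase:       # phase ends: commit the clustering of U
            for c in parts:
                if len(c) > 1:
                    clusters.append(c)
                    U -= c                # singletons of this clustering stay in U
            phase += 1
    return clusters, U
```

For instance, with γ = 2 and arrivals `[(1, []), (2, [1]), (3, []), (4, [3]), (5, [3, 4]), (6, [])]`, phase 0 ends when the edge {1, 2} appears, phase 1 ends when the triangle {3, 4, 5} is complete (profit 3 ≥ 2), and vertex 6 remains in U.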

Asymptotic Analysis of OCC
For the purpose of the analysis it is convenient to consider (without loss of generality) only infinite ordered graphs H, whose vertices arrive one at a time in some order $v_1, v_2, \ldots$, and we consider the ratios between the optimum profit and OCC's profit after each step. Furthermore, to make certain that all phases are well-defined, we will assume that the optimum profit for the whole graph H is unbounded. Any finite instance can be converted into an infinite instance with this property by appending to it an infinite sequence of disjoint edges, without decreasing the worst-case profit ratio.

For a phase j ≥ 0, let $O_j(H)$ denote the optimal profit of the subgraph of H induced by the vertices released in phases 0, 1, ..., j. Similarly, $S_j(H)$ denotes the total profit of Algorithm OCC at the end of phase j (including the incremental clustering produced in phase j). During phase 0 the graph is empty, and at the end of phase 0 it consists of only one edge, so $S_0(H) = O_0(H) = 1$. For any phase j > 0, the profit of OCC is equal to $S_{j-1}(H)$ throughout the phase, except right after the very last step, when new non-singleton clusters are created. At the same time, the optimum profit can only increase. Thus the maximum ratio in phase j is at most $O_j(H)/S_{j-1}(H)$. We can then conclude that, to estimate the competitive ratio of our algorithm OCC, it is sufficient to establish an asymptotic upper bound on numbers $R_j$, for j = 1, 2, ..., defined by
$$R_j = \max_H \frac{O_j(H)}{S_{j-1}(H)}, \tag{4}$$
where the maximum is taken over all infinite ordered graphs H. (While not immediately obvious, the maximum is well-defined. There are infinitely many prefixes of H on which OCC will execute j phases, due to the presence of singleton clusters. However, since these singletons induce an independent set after j phases, only finitely many graphs H need to be considered in this maximum.)

Deriving a recurrence for $R_j$'s. Our objective now is to derive a recurrence relation for the sequence $R_1, R_2, \ldots$. The value of $R_1$ is some constant whose exact value is not important here since we are interested in the asymptotic ratio. (We will, however, estimate $R_1$ later, when we bound the strict competitive ratio in Sect. 3.2.3.) So now, fix some j ≥ 2 and assume that ratios $R_1, R_2, \ldots, R_{j-1}$ are given. We want to bound $R_j$ in terms of $R_1, R_2, \ldots, R_{j-1}$. To this end, let $H^*$ be some infinite graph that attains the maximum in (4), that is, $R_j = O_j(H^*)/S_{j-1}(H^*)$. With $H^*$ fixed, to avoid clutter, we will omit it in our notation, writing $O_i$ and $S_i$ for $O_i(H^*)$ and $S_i(H^*)$.

We claim that, without loss of generality, we can assume that in the computation on $H^*$, the incremental clusterings of Algorithm OCC in each phase 1, 2, ..., j − 1 do not contain any singleton clusters. (The clustering in phase j, however, is allowed to contain singletons.) We will refer to this property as the No-Singletons Assumption.
To prove this claim, we modify the ordering of H * as follows: if there is a phase i < j such that the incremental clustering of U in phase i clusters some vertex v from U as a singleton, then delay the release of v to the beginning of phase i + 1. Postponing a release of a vertex that was clustered as a singleton in some phase i < j to the beginning of phase i +1 does not affect the computation and profit of OCC, because vertices from singleton clusters remain in U , and thus are available for clustering in phase i + 1. In particular, the value of S j−1 will not change. This modification also does not change the value of O j , because the subgraph of H * induced by the first j phases is the same, only the ordering of the vertices has been changed. We can thus repeat this process until the No-Singletons Assumption is eventually satisfied. This proves the claim.
After this modification of $H^*$ we still have $O_j/S_{j-1} = R_j$, and of course also $O_{j-1}/S_{j-2} \le R_{j-1}$. With the No-Singletons Assumption, the set U is empty at the beginning of each phase 0, 1, ..., j. We can thus divide the vertices of $H^*$ released in phases 0, 1, ..., j into disjoint batches, where batch $B_i$ contains the vertices released in phase i, for i = 0, 1, ..., j. (At the end of phase i, right before the clustering is updated, we will have $B_i = U$.) For each such i, denote by $\Delta_i$ the maximum profit of a clustering of $B_i$. Then the total profit after i phases is $S_i = \Delta_0 + \cdots + \Delta_i$, and, by the definition of OCC, $\Delta_i \ge \gamma^i$.

For i = 0, 1, ..., j, let $\bar B_i = B_0 \cup \cdots \cup B_i$ be the set of all vertices of $H^*$ released in phases 0, ..., i. Consider the optimal clustering of $\bar B_j$. In this clustering, every cluster has some number a of nodes in $\bar B_{j-1}$ and some number b of nodes in $B_j$. For any a, b ≥ 0, let $k_{a,b}$ be the number of clusters of this form in the optimal clustering of $\bar B_j$. Then we have the following bounds, where the sums range over all integers a, b ≥ 0:
$$O_j = \sum \binom{a+b}{2} k_{a,b} \tag{5}$$
$$O_{j-1} \ge \sum \binom{a}{2} k_{a,b} \tag{6}$$
$$\Delta_j \ge \sum \binom{b}{2} k_{a,b} \tag{7}$$
$$S_{j-1} \ge \tfrac12 \sum a\, k_{a,b} \tag{8}$$
Equality (5) is the definition of $O_j$. Inequality (6) holds because the right hand side represents the profit of the optimal clustering of $\bar B_j$ restricted to $\bar B_{j-1}$, so it cannot exceed the optimal profit $O_{j-1}$ for $\bar B_{j-1}$. Similarly, inequality (7) holds because the right hand side is the profit of the optimal clustering of $\bar B_j$ restricted to $B_j$, while $\Delta_j$ is the optimal profit of $B_j$. The last bound (8) follows from the fact that (as a consequence of the No-Singletons Assumption) our algorithm does not have any singleton clusters in $\bar B_{j-1}$. This means that in OCC's clustering of $\bar B_{j-1}$ (which has $\sum a\, k_{a,b}$ vertices) each vertex has an edge included in some cluster, so the number of these edges must be at least $\tfrac12 \sum a\, k_{a,b}$.

We can also bound the algorithm's profit values $\Delta_i$ and $S_i$, for 0 ≤ i ≤ j, from above. (Why we need upper bounds for the algorithm's profit will be seen shortly.)
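For the profit $\Delta_i$ of the clustering that OCC creates in phase i, we have $\Delta_0 = 1$ and for each phase i ≥ 1,
$$\Delta_i \le \gamma^i + \tfrac12\left(\sqrt{8\gamma^i + 1} + 1\right) \le \gamma^i + 2\gamma^{i/2}. \tag{9}$$
To show (9), suppose that phase i ends at step t (that is, right after $v_t$ is revealed). Consider the optimal partitioning P of $B_i$, and let the cluster c containing $v_t$ in P have size p + 1. If we remove $v_t$ from this partitioning, we obtain a partitioning P′ of the batch after step t − 1, whose profit must be strictly smaller than $\gamma^i$. So the profit of P is smaller than $\gamma^i + p$. In partitioning P′, the cluster $c - \{v_t\}$ has size p. We thus obtain that $\binom{p}{2} < \gamma^i$, because, in the worst case, P′ consists only of cluster $c - \{v_t\}$. This gives us $p < \tfrac12(\sqrt{8\gamma^i + 1} + 1)$. The second inequality in (9) follows by routine calculation, using $\gamma^i \ge 1$.
From (9), by adding up all profits from phases 0, ..., i, we obtain an upper bound on the total profit $S_i = \Delta_0 + \cdots + \Delta_i$ of the algorithm:
$$S_i \le \sum_{i'=0}^{i} \left(\gamma^{i'} + 2\gamma^{i'/2}\right) = \frac{\gamma^{i+1} - 1}{\gamma - 1} + \frac{2\left(\gamma^{(i+1)/2} - 1\right)}{\gamma^{1/2} - 1}. \tag{10}$$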

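Lemma 3.1 For any pair of non-negative integers a and b, the inequality
$$\binom{a+b}{2} \le (x+1)\binom{a}{2} + \frac{x+1}{x}\binom{b}{2} + a$$
holds for any 0 < x ≤ 1.

Proof Define the function
$$F(a, b, x) = a(a-1)x^2 - 2a(b-1)x + b(b-1),$$
i.e., 2x times the difference between the right hand side and the left hand side of the inequality above. It is sufficient to show that F(a, b, x) is non-negative for integers a, b ≥ 0 and 0 < x ≤ 1.

Consider first the cases when a ∈ {0, 1} or b ∈ {0, 1}. $F(0, b, x) = b(b-1) \ge 0$, for any non-negative integer b and any x. $F(a, 0, x) = ax(ax - x + 2) \ge ax(ax + 1) > 0$, for any positive integer a and 0 < x ≤ 1. $F(a, 1, x) = x^2 a(a-1) \ge 0$, for any positive integer a and any x. $F(1, 2, x) = 2 - 2x \ge 0$, for 0 < x ≤ 1, and $F(1, b, x) = (b-1)(b - 2x) > 0$, for any integer b ≥ 3 and 0 < x ≤ 1.

Thus, it only remains to show that F(a, b, x) is non-negative when both a ≥ 2 and b ≥ 2. The function F(a, b, x) is quadratic in x and hence has one local minimum at $x_0 = \frac{b-1}{a-1}$, as can be easily verified by differentiating F in x. Therefore, in the case when a ≤ b, we have $x_0 \ge 1$, so F is non-increasing in x on (0, 1] and $F(a, b, x) \ge F(a, b, 1) = (b-a)(b-a-1) \ge 0$. In the case when a > b, we have $F(a, b, x) \ge F(a, b, x_0) = \frac{(b-1)(a-b)}{a-1} > 0$, which completes the proof.

We now combine all estimates derived above to establish our recurrence. Fix some parameter x, 0 < x < 1, whose value we will determine later. Using Lemma 3.1, the bounds (5)-(8), and the definition of $R_{j-1}$, we obtain
$$O_j = \sum \binom{a+b}{2} k_{a,b} \le (x+1)\sum \binom{a}{2} k_{a,b} + \frac{x+1}{x}\sum \binom{b}{2} k_{a,b} + \sum a\, k_{a,b} \le (x+1)\,O_{j-1} + \frac{x+1}{x}\,\Delta_j + 2 S_{j-1} \le (x+1)\,R_{j-1} S_{j-2} + \frac{x+1}{x}\,\Delta_j + 2 S_{j-1}. \tag{11}$$
(Recall that j ≥ 2.) Thus $R_j = O_j/S_{j-1}$ satisfies the inequality
$$R_j \le (x+1)\,\frac{S_{j-2}}{S_{j-1}}\,R_{j-1} + \frac{x+1}{x}\cdot\frac{\Delta_j}{S_{j-1}} + 2. \tag{12}$$
From inequalities (9) and (10), together with $\Delta_i \ge \gamma^i$, we have
$$\Delta_i = \gamma^i\,(1 + o(1)) \quad\text{and}\quad S_i = \frac{\gamma^{i+1}}{\gamma - 1}\,(1 + o(1))$$
for all i = 0, 1, ..., j. Above, we use the notation o(1) to denote any function that tends to 0 as the phase index i goes to infinity (with x and γ assumed to be some fixed constants, still to be determined). Substituting into inequality (12), we obtain our recurrence for the numbers $R_j$:
$$R_j \le \frac{(x+1)(1 + o(1))}{\gamma}\,R_{j-1} + \frac{(x+1)(\gamma-1)}{x} + 2 + o(1). \tag{13}$$

Solving recurrence (13). Define
$$R = \frac{\gamma\,[(x+1)(\gamma-1) + 2x]}{x\,(\gamma - x - 1)}.$$

Lemma 3.2 If x + 1 < γ, then $R_j \le R + o(1)$.

Proof The proof is by routine calculus, so we only provide a sketch. For all j ≥ 1 let $\rho_j = R_j - R$. Then, substituting this into (13) and simplifying, we obtain that the $\rho_j$'s satisfy the recurrence
$$\rho_j \le \frac{(x+1)(1 + o(1))}{\gamma}\,\rho_{j-1} + o(1).$$
Since x + 1 < γ, this implies that $\rho_j = o(1)$, and the lemma follows.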
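The inequality of Lemma 3.1 can also be spot-checked numerically. The Python sketch below assumes that the lemma has the form $\binom{a+b}{2} \le (x+1)\binom{a}{2} + \frac{x+1}{x}\binom{b}{2} + a$ and that $F(a,b,x) = a(a-1)x^2 - 2a(b-1)x + b(b-1)$; both forms are inferred from the special values of F listed in the proof, and the function names are illustrative. Exact rational arithmetic is used so that the comparison is not affected by rounding. Such a check covers only finitely many cases and does not replace the proof.

```python
from fractions import Fraction
from math import comb


def F(a, b, x):
    """Assumed closed form of 2x * (right hand side - left hand side)."""
    return a * (a - 1) * x * x - 2 * a * (b - 1) * x + b * (b - 1)


def lemma_gap(a, b, x):
    """2x * (RHS - LHS), computed directly from the assumed inequality."""
    rhs = (x + 1) * comb(a, 2) + (x + 1) / x * comb(b, 2) + a
    return 2 * x * (rhs - comb(a + b, 2))


def check_lemma(max_size=30, steps=20):
    """Check F == lemma_gap and F >= 0 for 0 <= a, b < max_size and x = 1/steps, ..., 1."""
    for a in range(max_size):
        for b in range(max_size):
            for i in range(1, steps + 1):
                x = Fraction(i, steps)
                if F(a, b, x) != lemma_gap(a, b, x) or F(a, b, x) < 0:
                    return False
    return True
```

The special values quoted in the proof, such as F(1, 2, x) = 2 − 2x and F(a, 1, x) = x²a(a − 1), can be read off directly from `F`.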
Lemma 3.2 gives us (essentially) a bound of R on the asymptotic competitive ratio of Algorithm OCC, for fixed values of parameters γ (of the algorithm) and x (of the analysis). We can now choose γ and x to make R as small as possible. R is minimized for parameters $x = \tfrac12(5 - \sqrt{13})$ and $\gamma = \tfrac12(3 + \sqrt{13})$, yielding $R = \tfrac16(47 + 13\sqrt{13}) < 15.646$. Summarizing, we obtain the following theorem.

Theorem 3
The asymptotic competitive ratio of Algorithm OCC is at most 15.646.
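The constant in Theorem 3 can be reproduced numerically. The Python sketch below assumes that the limit bound has the form $R = \gamma[(x+1)(\gamma-1) + 2x] / (x(\gamma - x - 1))$, that is, the fixed point of a recurrence of the form $R_j \le \frac{x+1}{\gamma} R_{j-1} + \frac{(x+1)(\gamma-1)}{x} + 2$; this form is an assumption reconstructed from the analysis, and the function name `limit_ratio` is illustrative.

```python
from math import sqrt


def limit_ratio(gamma, x):
    """Fixed point R of R = (x+1)/gamma * R + (x+1)(gamma-1)/x + 2 (assumed form)."""
    if not (0 < x <= 1 and gamma > x + 1):
        raise ValueError("need 0 < x <= 1 and gamma > x + 1")
    return gamma * ((x + 1) * (gamma - 1) + 2 * x) / (x * (gamma - x - 1))


# Parameter values stated in the text for the asymptotic bound.
GAMMA_ASYMPTOTIC = (3 + sqrt(13)) / 2   # about 3.303
X_ASYMPTOTIC = (5 - sqrt(13)) / 2       # about 0.697
R_ASYMPTOTIC = limit_ratio(GAMMA_ASYMPTOTIC, X_ASYMPTOTIC)
```

With these parameters `R_ASYMPTOTIC` evaluates to a value just below 15.646, in agreement with the theorem. The same function can be evaluated at other parameter pairs, for example those chosen to trade a slightly larger asymptotic ratio for a smaller strict ratio.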

Strict Competitive Ratio
In fact, for $\gamma = \tfrac12(3 + \sqrt{13})$, Algorithm OCC has a low strict competitive ratio as well. We show that this ratio is at most 22.641. The argument uses the same value of parameter $x = \tfrac12(5 - \sqrt{13})$, but requires a more refined analysis. When phase 0 ends, the competitive ratio is 1. For j ≥ 1, let $O_j$ now denote the optimal profit right before phase j ends, that is before the last vertex of phase j is released. (Earlier we used $O_j$ to upper bound this value, but this is a loose bound because it also includes the profit for the last step of phase j.) It remains to show that for phases j ≥ 1 the ratios $R_j = O_j/S_{j-1}$ are at most 22.641.

The outline of the argument is as follows. By exhaustively analyzing the behavior of Algorithm OCC in phase 1, taking into account that γ ≈ 3.303 > 3, we can establish that $R_1 \le 10$. We will then bound the remaining ratios using a refined version of recurrence (12).
We start by establishing the maximum value of $R_1 = O_1/S_0 = O_1$. Let t be the last step of phase 1. By the construction of the algorithm, since γ ≈ 3.303, after step t − 1 the profit of the vertices released in phase 1 is at most 3. We can assume that phase 0 has only two vertices $v_1, v_2$ connected by an edge. Let $H_t$ be the subgraph of H induced by $v_1, \ldots, v_{t-1}$ and $H'_t$ be its subgraph induced by $v_3, \ldots, v_{t-1}$. Thus $R_1$ is equal to the maximum value (over all choices of H) of the optimal profit of $H_t$ under the assumption that the optimal profit of $H'_t$ is at most 3. We now proceed to bound this value.
Denote by $K_i$ the clique with i vertices. The optimal clustering of $H'_t$ cannot include a $K_4$, and either
1. $H'_t$ has no $K_3$, and it has at most three $K_2$'s, or
2. $H'_t$ has a $K_3$, with each edge of $H'_t$ having at least one endpoint in this $K_3$.
In Case 1, $H_t$ cannot contain a $K_5$. If a clustering of $H_t$ includes a $K_4$ then this $K_4$ contains $v_1$, $v_2$, and two vertices from phase 1. So, in addition to this $K_4$ it can at best include two $K_2$'s, for a total profit of at most 8. In Case 2, if a clustering of $H_t$ includes a $K_5$, then it cannot include any cluster except this $K_5$, so its profit is 10. If a clustering of $H_t$ includes a $K_4$, then this $K_4$ must contain at least one of $v_1$ and $v_2$, and it may include at most one other clique of type $K_2$. This will give a total profit of at most 7. Summarizing, in each case the profit of $H_t$ is at most 10, giving us $R_1 \le 10$, as claimed.

For phases j ≥ 2, we can tabulate upper bounds for $R_j$ by explicitly computing the ratios $R_j = O_j/S_{j-1}$ using a modification (16) of recurrence (12), in which we use the more exact bounds obtained by rounding the bounds $\Delta_i \ge \gamma^i$ and (9), which we can do because the profit $\Delta_i$ of the clustering created in phase i is integral. From the definition $S_i \stackrel{\mathrm{def}}{=} 1 + \sum_{i'=1}^{i} \Delta_{i'}$, we compute the first few estimates as shown in Table 1.
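To bound the sequence $\{R_j\}_{j \ge 9}$ we rewrite recurrence (16) in the linear form $R_j \le \alpha_j R_{j-1} + \beta_j$ and bound $\alpha_j$ and $\beta_j$ using (9) and (10). With routine calculations, we can establish the bounds $\alpha_j < \frac{3}{5}$ and $\beta_j < 8$, for $j \ge 8$. Therefore $R_j \le \hat{R}_j$, where $\hat{R}_j$ satisfies $\hat{R}_j = \frac{3}{5}\hat{R}_{j-1} + 8$, that is, $\hat{R}_j = 20 - a\left(\frac{3}{5}\right)^j$ for $j \ge 8$ and some positive constant $a$. The sequence $\{R_j\}_{j \ge 9}$ is thus bounded above by a monotonically growing function of $j$ having limit 20, and hence $R_j \le 20$ for every $j \ge 9$.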
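The limit 20 is the fixed point $8 / (1 - \frac{3}{5})$ of a linear recurrence with coefficients $\frac{3}{5}$ and 8. The sketch below iterates such a recurrence numerically. It assumes that the rewritten recurrence has the form $R_j \le \alpha_j R_{j-1} + \beta_j$; the starting value is a made-up placeholder, since the entries of Table 1 are not reproduced here.

```python
def iterate_bound(start, alpha=3 / 5, beta=8.0, steps=40):
    """Iterate r -> alpha * r + beta and return all values visited.

    `start` is a hypothetical starting bound (any value below 20)."""
    values = [start]
    for _ in range(steps):
        values.append(alpha * values[-1] + beta)
    return values

values = iterate_bound(15.0)
print(values[:4], values[-1])
```

Any starting value below 20 gives an increasing sequence that stays below 20, which is the behavior the argument relies on.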
Combining this with the bounds estimated in Table 1, we see that the largest bound on R j is 22.641 given for j = 5. We can thus conclude that the strict competitive ratio of OCC is at most 22.641.
We can improve on the strict competitive ratio by choosing different values for γ and x that allow the asymptotic competitive ratio to increase slightly. The optimal values can be found empirically (using mathematical software) to be γ = 4.02323428 and x = 0.823889, giving asymptotic competitive ratio 15.902 and strict competitive ratio 20.017.

A Lower Bound for Algorithm OCC
In this section we will show that, for any choice of γ , the worst-case ratio of Algorithm OCC is at least 10.927.
Denote by $B_j$ the $j$-th batch, that is, the vertices released in phase $j$. We will use notation $S_j$ for the profit of OCC and $O_j$ for the optimal profit on the sub-instance consisting of the first $j$ batches. To avoid clutter we will omit lower order terms in our calculations. In particular, we focus on $j$ being large enough, treating $\gamma^j$ as integer, and all estimates for $S_j$ and $O_j$ given below are meant to hold within a factor of $1 \pm o(1)$. (The asymptotic notation is with respect to the phase index $j$ tending to $\infty$.)
We start with a simpler construction that shows a lower bound of 9; then we will explain how to improve it to 10.927. In the instance we construct, all batches will be disjoint, with the $j$-th batch $B_j$ having $2\gamma^j$ vertices connected by $\gamma^j$ disjoint edges (that is, a perfect matching). We will refer to these edges as batch edges. The edges between any two batches $B_i$ and $B_j$, for $i < j$, form a complete bipartite graph. These edges will be called cross edges; see Fig. 3.
At the end of each phase $j$, the algorithm will collect all $\gamma^j$ edges inside $B_j$. Therefore, by summing up the geometric sequence, right before the end of phase $j$ (before the algorithm adds the new edges from $B_j$ to its clustering), the algorithm's profit is
$$\sum_{i=0}^{j-1} \gamma^i = \frac{\gamma^j}{\gamma - 1}.$$
After the first $j$ phases, the adversary's clustering consists of cliques $C_p$, $p = 0, 1, \ldots, \gamma^j - 1$, where $C_p$ contains the $p$-th edge (that is, both of its endpoints) from each batch $B_i$ for $i = p, p+1, \ldots, j$; see Fig. 3. We claim that the adversary gain after $j$ phases satisfies
$$O_j - O_{j-1} = \sum_{i=0}^{j-1} 4\gamma^i + \gamma^j. \tag{17}$$
(Recall that all equalities and inequalities in this section are assumed to hold only within a factor of $1 \pm o(1)$.) We now justify this bound. The second term $\gamma^j$ is simply the number of batch edges in $B_j$. To see where each term $4\gamma^i$ comes from, consider the $p$-th batch edge from $B_i$, for $i < j$. When we add $B_j$ after phase $j$, the adversary can add to $C_p$ the 4 cross edges connecting this edge's endpoints to the endpoints of the $p$-th batch edge in $B_j$. Overall, this will add $4\gamma^i$ cross edges between $B_i$ and $B_j$ to the existing adversary's cliques. From recurrence (17), by simple summation, we get
$$O_j = \frac{\gamma^{j+1}(\gamma + 3)}{(\gamma - 1)^2}.$$
Dividing it by OCC's profit of at most $\gamma^j/(\gamma - 1)$, we obtain that the ratio is at least $\gamma(\gamma + 3)/(\gamma - 1)$, which, by routine calculus, is at least 9.
We now outline an argument showing how to improve this lower bound to 10.927. The new construction is almost identical to the previous one, except that we change the very last batch $B_j$. As before, each batch $B_i$, for $i < j$, has $\gamma^i$ disjoint edges. Batch $B_j$ will also have $\gamma^j$ edges, but they will be grouped into $q = \frac{1}{3}\gamma^j$ disjoint triangles. (So $B_j$ has $\gamma^j$ vertices.) For $p = 0, 1, \ldots, q - 1$, we add the $p$-th triangle to clique $C_p$. (If $q > \gamma^{j-1}$, the last $q - \gamma^{j-1}$ triangles will form new cliques.) This modification will preserve the number of edges in $B_j$ and thus it will not affect the algorithm's profit. But now, for each $i = 0, 1, \ldots, j - 1$ and each $p = 0, 1, \ldots, \min(q, \gamma^i) - 1$, we can connect the two vertices in $B_i \cap C_p$ to three vertices in $B_j$, instead of two. This creates two new cross edges that will be called extra edges. It should be intuitively clear that the number of these extra edges is $\Omega(\gamma^j)$, which means that this new construction gives a ratio strictly larger than 9.
Specifically, to estimate the ratio, we will distinguish three cases, depending on the value of $\gamma$. Suppose first that $\gamma \ge 3$. Then $q \ge \gamma^{j-1}$, so the number of extra edges is $2\sum_{i=0}^{j-1} \gamma^i = 2\gamma^j/(\gamma - 1)$, because each vertex in $B_0 \cup B_1 \cup \cdots \cup B_{j-1}$ is now connected to three vertices in $B_j$, not two. Thus the new optimal profit is
$$\frac{\gamma^{j+1}(\gamma + 3)}{(\gamma - 1)^2} + \frac{2\gamma^j}{\gamma - 1}.$$
Dividing by OCC's profit, the ratio is at least $\frac{\gamma^2 + 5\gamma - 2}{\gamma - 1}$, which is at least 11 for $\gamma \ge 3$.
The second case is when $\sqrt{3} \le \gamma < 3$, so that $\gamma^{j-2} \le q < \gamma^{j-1}$. In this case all vertices in $B_0 \cup B_1 \cup \cdots \cup B_{j-2}$ and $\frac{2}{3}\gamma^j$ vertices in $B_{j-1}$ get an extra edge, so the number of extra edges is $2\gamma^{j-1}/(\gamma - 1) + \frac{2}{3}\gamma^j$. Therefore the new adversary profit is
$$\frac{\gamma^{j+1}(\gamma + 3)}{(\gamma - 1)^2} + \frac{2\gamma^{j-1}}{\gamma - 1} + \frac{2}{3}\gamma^j.$$
We thus have that the ratio is at least $\frac{5\gamma^3 + 5\gamma^2 + 8\gamma - 6}{3\gamma(\gamma - 1)}$. Minimizing this quantity, we obtain that the ratio is at least 10.927.
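The constants 9, 11 and 10.927 can be reproduced numerically. The following sketch minimizes the three ratio expressions by ternary search. The first expression, $\gamma(\gamma+3)/(\gamma-1)$, is reconstructed here from recurrence (17) and should be read as an assumption; the other two are the expressions stated in the text. The range $\sqrt{3} \le \gamma < 3$ used for the second case is likewise inferred from the condition $\gamma^{j-2} \le q < \gamma^{j-1}$. All function names are illustrative.

```python
import math

def ratio_matching(g):
    """Ratio for the perfect-matching construction (reconstructed expression)."""
    return g * (g + 3) / (g - 1)

def ratio_triangles_large(g):
    """Ratio for the triangle construction when gamma >= 3."""
    return (g * g + 5 * g - 2) / (g - 1)

def ratio_triangles_mid(g):
    """Ratio for the triangle construction when sqrt(3) <= gamma < 3."""
    return (5 * g**3 + 5 * g**2 + 8 * g - 6) / (3 * g * (g - 1))

def minimize(f, lo, hi, iters=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    x = (lo + hi) / 2
    return x, f(x)

print(minimize(ratio_matching, 1.5, 10.0))
print(ratio_triangles_large(3.0))
print(minimize(ratio_triangles_mid, math.sqrt(3), 3.0))
```

Ternary search is adequate here because each expression is a sum of convex terms on the relevant range, so it has a single local minimum.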

A Lower Bound of 6 for MAXCC
We now prove that any deterministic online algorithm S for the clique clustering problem has competitive ratio at least 6. We present the proof for the strict competitive ratio and explain later how to extend it to the asymptotic ratio. The lower bound is established by showing, for any constant R < 6, an adversary algorithm for constructing an input graph G on which profit OPT (G) ≥ R · profit S (G), that is the optimal profit is at least R times the profit of S.
Skeleton trees. Fix some non-negative integer D. (Later we will make the value of D depend on R.) It is convenient to describe the graph constructed by the adversary in terms of its underlying skeleton tree T , which is a rooted binary tree. The root of T will be denoted by r . For a node v ∈ T , define the depth of v to be the number of edges on the simple path from v to r . The adversary will only use skeleton trees of the following special form: each non-leaf node at depths 0, 1, . . . , D − 1 has two children, and each non-leaf node at levels at least D has one child. Such a tree T can be thought of as consisting of its core subtree, which is the subtree of T induced by the nodes of depth up to D, with paths attached to its leaves at level D. The nodes of T at depth D are the leaves of the core subtree. If v is a leaf of the core subtree of T then the path extending from v down to a leaf of T is called a tentacle-see Fig. 4.
(Thus v belongs both to the core subtree and to the tentacle attached to v.) The length of a tentacle is the number of its edges. The nodes in the tentacles are all considered to be left children of their parents.
Skeleton-tree graphs. The graph represented by a skeleton tree $T$ will be denoted by $G$. We differentiate between the nodes of $T$ and the vertices of $G$. The relation between $T$ and $G$ is illustrated in Fig. 4. The graph $G$ is obtained from the tree $T$ as follows:
• For each node $u \in T$ we create two vertices $u^L$ and $u^R$ in $G$, with an edge between them. This edge $(u^L, u^R)$ is called the cross edge corresponding to $u$.
• Suppose that $u, v \in T$ and $u$ is in the left subtree of $v$. Then $G$ has edges $(u^L, v^L)$ and $(u^R, v^L)$. If $u$ is in the right subtree of $v$, then $G$ has edges $(u^L, v^R)$ and $(u^R, v^R)$. These edges are called upward edges.
• If $u \in T$ is a node in a tentacle of $T$ and is not a leaf of $T$, then $G$ has a vertex $u^D$ with edge $(u^D, u^R)$. This edge is called a whisker.
Fig. 4 On the left, an example of a skeleton tree $T$. The core subtree of $T$ has depth 2 and two tentacles, one of length 2 and one of length 1. On the right, the corresponding graph $G$.
The adversary algorithm. The adversary constructs T and G gradually, in response to algorithm S's choices. Initially, T is a single node r , and thus G is a single edge (r L , r R ). At this time, profit S (T ) = 0 and profit OPT (T ) = 1, so S is forced to collect this edge (that is, it creates a 2-clique {r L , r R }), since otherwise the adversary can immediately stop with unbounded strict competitive ratio. In general, the invariant of the construction is that, at each step, the only nonsingleton cliques that S can add to its clustering are cross edges that correspond to the current leaves of T . Suppose that, at some step, S collects a cross edge (u L , u R ), corresponding to node u of T . (S may collect more cross edges in one step; if so, the adversary applies its algorithm to each such edge independently.) If u is at depth less than D, the adversary extends T by adding two children of u. If u is at depth at least D, the adversary only adds the left child of u, thus extending the tentacle ending at u. In terms of G, the first move appends two triangles to u L and u R , with all corresponding upward edges. The second move appends a triangle to u L and a whisker to u R (see Fig. 5). In the case when S decides not to collect any cross edges at some step, the adversary stops the process.
Thus, the adversary will be building the core binary skeleton tree down to depth $D$, and from then on, if the game still continues, it will extend the tree with tentacles. Our objective is to prove that, in each step, right after the adversary extends the graph but before $S$ updates its clustering, we have
$$\mathrm{profit}_{\mathrm{OPT}}(G) \ge (6 - \epsilon_D) \cdot \mathrm{profit}_S(G), \tag{18}$$
where $\epsilon_D \to 0$ when $D \to \infty$. This is enough to prove the lower bound of $6 - \epsilon_D$ on the strict ratio. The reason is this: If $S$ does not collect any edges at some step, the game stops, the ratio is at least $6 - \epsilon_D$, and we are done. Otherwise, the adversary will stop the game after $2^{D+1} + M$ steps, where $M$ is some large integer. Then the profit of $S$ is bounded by $2^{D+1} + M$ (the number of steps) plus the number of remaining cross edges, and there are at most $2^D$ of those, so $S$'s profit is at most $2^{D+2} + M$. At that time, $T$ will have at least $M$ nodes in tentacles and at most $2^D$ tentacles, so there is at least one tentacle of length $M/2^D$, and this tentacle contributes $\Omega((M/2^D)^2)$ edges to the optimum. Thus for $M$ large enough, the ratio between the optimal profit and the profit of $S$ will be larger than 6 (or any constant, in fact). Once we establish (18), the lower bound of 6 will follow, because for any fixed $R < 6$ we can take $D$ to be large enough to achieve a lower bound of $6 - \epsilon_D \ge R$.
Computing the adversary's profit. We now explain how to estimate the adversary's profit for $G$. To this end, we provide a specific recipe for computing a clique clustering of $G$. We do not claim that this particular clustering is actually optimal, but it is a lower bound on the optimum profit, and thus it is sufficient for our purpose.
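The stopping argument can be made concrete with a small calculation. The sketch below compares the upper bound $2^{D+2} + M$ on the profit of $S$ with a lower bound on the optimum coming from a single long tentacle. It assumes, as in the base case of Lemma 3.4, that a tentacle of length $s$ yields an adversary clique on $s + 2$ vertices; the function name `forced_ratio` is hypothetical.

```python
from math import ceil, comb

def forced_ratio(depth, extra_steps):
    """Lower bound on OPT / S once the adversary stops after 2^(D+1) + M steps.

    depth       -- the core depth D
    extra_steps -- the integer M from the text
    """
    online_upper = 2 ** (depth + 2) + extra_steps      # bound on S's profit
    longest = ceil(extra_steps / 2 ** depth)           # pigeonhole over <= 2^D tentacles
    adversary_lower = comb(longest + 2, 2)             # edges of one (s+2)-clique
    return adversary_lower / online_upper

for m in (10, 100, 1_000, 10_000):
    print(m, forced_ratio(3, m))
```

For fixed $D$ the numerator grows quadratically in $M$ while the denominator grows linearly, so the bound eventually exceeds any constant.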
For any node $v \in T$ that is not a leaf, denote by $P^L(v)$ the longest path from $v$ to a leaf of $T$ that goes through the left child of $v$. If $v$ is a non-leaf in the core tree, and thus has a right child, then $P^R(v)$ is the longest path from $v$ to a leaf of $T$ that goes through this right child. In both cases, ties are broken arbitrarily but consistently, for example in favor of the leftmost leaves. If $v$ is in a tentacle (so it does not have the right child), then we let $P^R(v) = \{v\}$. Let $P^L(v) = (v_1, v_2, \ldots, v_m)$, where $v_1 = v$. Since $v$ is not a leaf, the definition of $T$ implies that $m \ge 2$. We now define the clique $C^L(v)$ in $G$ that corresponds to $P^L(v)$. Intuitively, for each $v_i$ we add to $C^L(v)$ one of the corresponding vertices, $v^L_i$ or $v^R_i$, depending on whether $v_{i+1}$ is the left or the right child of $v_i$. Formally, in a top-down fashion: for each $i < m$, $C^L(v)$ contains $v^L_i$ if $v_{i+1}$ is the left child of $v_i$ and $v^R_i$ otherwise, and for the leaf $v_m$ it contains both $v^L_m$ and $v^R_m$. We define $C^R(v)$ analogously to $C^L(v)$, but with two differences. First, we use $P^R(v)$ instead of $P^L(v)$ and second, if $v$ is in a tentacle then we let $C^R(v) = \{v^R, v^D\}$. In other words, the whiskers form 2-cliques.
Observe that except cliques C R (v) corresponding to the whiskers (that is, when v is in a tentacle), all cliques C σ (v) have cardinality at least 3.
We now define a clique partitioning $C^*$ of $G$, as follows: First we include cliques $C^L(r)$ and $C^R(r)$ in $C^*$. We then proceed recursively: choose any node $v$ such that exactly one of $v^L$, $v^R$ is already covered by some clique of $C^*$, and add to $C^*$ the clique $C^\sigma(v)$, $\sigma \in \{L, R\}$, that contains the uncovered one.
Analysis. Denote by $T_v$ the subtree of $T$ rooted at $v$. By $G_v$ we denote the subgraph of $G$ induced by the vertices that correspond to the nodes in $T_v$. Each clique in $C^*$ that intersects $G_v$ induces a clique in $G_v$, and the partitioning $C^*$ induces a partitioning $C^*_v$ of $G_v$ into cliques. We will use notation $O_v$ for the profit of partitioning $C^*_v$. Note that $C^*_v$ can be obtained with the same top-down process as $C^*$, but starting from $v$ as the root instead of $r$ (Fig. 6).
Fig. 6 On the left, an example of a path $P^L(r) = (r, x, y, p, q)$ in $T$. In this example, $D = 3$. The corresponding clique $C^L(r)$ is shown on the right (darker shape). The figure on the right also shows the adversary clique partitioning of $G$. To avoid clutter, upward edges are not shown.
We denote algorithm $S$'s profit (the number of cross edges) within $G_v$ by $S_v$. In particular, we have $\mathrm{profit}_S(G) = S_r$ and $\mathrm{profit}_{\mathrm{OPT}}(G) \ge O_r$. Thus, to show (18), it is sufficient to prove that
$$O_r \ge (6 - \epsilon_D) \cdot S_r, \tag{19}$$
where $\epsilon_D \to 0$ when $D \to \infty$. We will in fact prove an analogue of inequality (19) for all subtrees $T_v$. To this end, we distinguish between two types of subtrees $T_v$. If $T_v$ ends at depth $D$ of $T$ or less (in other words, if $T_v$ is inside the core of $T$), we call $T_v$ shallow. If $T_v$ ends at depth $D + 1$ or more, we call it deep. So deep subtrees are those that contain some tentacles of $T$.

Lemma 3.3 If $T_v$ is shallow, then $O_v \ge 6 \cdot S_v$.
Proof This can be shown by induction on the depth of T v . If this depth is 0, that is T v = {v}, then O v = 1 and S v = 0, so the ratio is actually infinite. To jump-start the induction we also need to analyze the case when the depth of T v is 1. This means that S collected only edge (v L , v R ) from T v . When this happened, the adversary generated vertices corresponding to the two children of v in T and its clustering consists of two triangles. So now O v = 6 and S v = 1, and the lemma holds.
Inductively, suppose that the depth of T v is at least two, let y, z be the left and right children of v in T , and assume that the lemma holds for T y and T z . Naturally, we have S v = S y + S z + 1. Regarding the adversary profit, since the depth of T v is at least two, cluster C L (v) contains exactly one of y L , y R ; say it contains y L . Thus C L (v) is obtained from C L (y) by adding v L . By the definition of clustering C * , the depth of T y is at least 1, which means that adding v L will add at least three new edges. By a similar argument, we will also add at least three edges from v R . This implies that O v ≥ O y + O z + 6 ≥ 6 · S y + 6 · S z + 6 ≥ 6 · S v , completing the inductive step.
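For a perfectly balanced shallow subtree the quantities in Lemma 3.3 can be computed exactly. The sketch below assumes that in a balanced subtree of depth $d$ each of $v^L$ and $v^R$ joins a clique that already has $d + 1$ vertices (one vertex per internal node on the path plus both vertices of the leaf), so it contributes $d + 1$ new edges. That count is an inference from the construction, and the function name is illustrative.

```python
def balanced_core_profits(depth):
    """Return (adversary profit, online profit) for a complete binary
    skeleton subtree of the given depth, all inside the core."""
    adversary, online = 1, 0          # single node: one cross edge, not yet collected
    for d in range(1, depth + 1):
        # two child subtrees, plus v^L and v^R each joining a (d+1)-vertex clique
        adversary = 2 * adversary + 2 * (d + 1)
        # S holds the cross edges of both subtrees plus the one at the root
        online = 2 * online + 1
    return adversary, online

for d in range(1, 6):
    o, s = balanced_core_profits(d)
    print(d, o, s, o / s)
```

Under these assumptions the ratio is exactly 6 for depths 1 and 2 and strictly larger afterwards, so the lemma is tight only for the smallest trees.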
From Lemma 3.3 we obtain that, in particular, if $T$ itself is shallow then $O_r \ge 6 \cdot S_r$, which is even stronger than inequality (19) that we are in the process of justifying. Thus, for the rest of the proof, we can restrict our attention to skeleton trees $T$ that are deep.

Lemma 3.4 If $T_v$ is a deep subtree with core depth $h$ and maximum tentacle length $s$, then $O_v + 2(h + s) \ge 6 \cdot S_v$.
Before proving the lemma, let us argue first that this lemma is sufficient to establish our lower bound. Indeed, since we are now considering the case when $T$ is a deep subtree itself, the lemma implies that $O_r + 2(D + s) \ge 6 \cdot S_r$, where $s$ is the maximum tentacle length of $T$. But $O_r$ is at least quadratic in $D + s$. So for large $D$ the ratio $O_r/S_r$ approaches 6.
Proof To prove Lemma 3.4, we use induction on $h$, the core depth of $T_v$. Consider first the base case, for $h = 0$ (when $T_v$ is just a tentacle). In his clustering $C^*_v$, the adversary has one clique of $s + 2$ vertices, namely all $x^L$ vertices in the tentacle (there are $s + 1$ of these), plus one $z^R$ vertex for the leaf $z$. He also has $s$ whiskers, so his profit for $T_v$ is $\binom{s+2}{2} + s = \frac{1}{2}(s^2 + 5s + 2)$. $S$ collects only $s$ edges, namely all cross edges in $T_v$ except the last. (See Fig. 7.) Solving the quadratic inequality and using the integrality of $s$, we get $O_v + 2s \ge 6s = 6 \cdot S_v$. Note that this inequality is in fact tight for $s = 1$ and $2$.
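The base case reduces to the inequality $(s - 1)(s - 2) \ge 0$, which holds for every integer $s$. A short check of the stated profits, with an illustrative function name:

```python
from math import comb

def tentacle_profits(s):
    """Profits on a single tentacle of length s (base case h = 0)."""
    adversary = comb(s + 2, 2) + s    # one (s+2)-clique plus s whiskers
    online = s                        # all cross edges except the last
    return adversary, online

for s in range(1, 8):
    o, on = tentacle_profits(s)
    print(s, o, on, o + 2 * s - 6 * on)   # slack in O_v + 2s >= 6 * S_v
```

The slack printed in the last column is zero exactly for $s = 1$ and $s = 2$.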
In the inductive step, consider a deep subtree T v . Let y and z be the left and right children of v. Without loss of generality, we can assume that T y is a deep tree with core depth h − 1 and the same maximum tentacle length s as T v , while T z is either shallow (that is, it has no tentacles), or it is a deep tree with maximum tentacle length at most s (Fig. 8).
By the inductive assumption, we have $O_y + 2(h - 1 + s) \ge 6 \cdot S_y$. Regarding $z$, if $T_z$ is shallow then from Lemma 3.3 we get $O_z \ge 6 \cdot S_z$, and if $T_z$ is deep (necessarily of core depth $h - 1$) then $O_z + 2(h - 1 + s') \ge 6 \cdot S_z$, where $s'$ is $T_z$'s maximum tentacle length, such that $1 \le s' \le s$.
Consider first the case when $T_z$ is shallow. Note that
$$S_v = S_y + S_z + 1 \quad \text{and} \quad O_v \ge O_y + O_z + (h + s + 1) + 3.$$
The first equation is trivial, because the profit of $S$ in $G_v$ consists of all cross edges in $G_y$ and $G_z$, plus one more cross edge $(v^L, v^R)$. The second inequality holds because the adversary clustering $C^*_v$ is obtained by adding $v^L$ to $G_y$'s cluster with $(h - 1) + s + 2 = h + s + 1$ vertices, and $v^R$ can be added to $G_z$'s cluster that has at least 3 vertices. We get
$$O_v + 2(h + s) \ge O_y + O_z + 3(h + s) + 4 \ge 6 \cdot S_y + 6 \cdot S_z + (h + s) + 6 \ge 6 \cdot (S_y + S_z + 1) = 6 \cdot S_v.$$
The second case is when $T_z$ is a deep tree (of the same core depth $h - 1$ as $T_y$) with maximum tentacle length $s'$, where $1 \le s' \le s$. As before, we have $S_v = S_y + S_z + 1$. The optimum profit satisfies (by a similar argument as before, applied to both $T_y$ and $T_z$)
$$O_v \ge O_y + O_z + (h + s + 1) + (h + s' + 1).$$
We obtain (using $s \ge s'$)
$$O_v + 2(h + s) \ge 6 \cdot S_y + 6 \cdot S_z + 6 + (s - s') \ge 6 \cdot (S_y + S_z + 1) = 6 \cdot S_v.$$
This completes the proof of Lemma 3.4.
The asymptotic ratio. Lemma 3.4 implies (as explained right after the lemma) that there is no deterministic algorithm for the clique clustering problem with strict competitive ratio smaller than 6. We still need to explain how to extend our proof so that it also applies to the asymptotic competitive ratio. This is quite simple: Choose some large constant K . The adversary will create K instances of the above game, playing each one independently. Our construction above uses the fact that at each step the algorithm is forced to collect one of the pending cross edges, otherwise its competitive ratio exceeds ratio R (where R is arbitrarily close to 6). Now, for K sufficiently large, the algorithm is forced to collect cross edges in all except for some finite number of copies of the game, where this number depends on the additive constant in the competitiveness bound.
Summarizing this section, we have just proved the following lower bound.

Theorem 4
There is no online deterministic algorithm for MaxCC clustering with competitive ratio smaller than 6.
Note: Our construction is very tight, in the following sense. Suppose that S maintains T as balanced as possible. Then the ratio is exactly 6 when the depth of T is 1 or 2. Furthermore, suppose that D is very large and the algorithm constructs T to have depth D or more, that is, it starts growing tentacles (but still maintaining T balanced.) Then the ratio is 6 − o(1) for tentacle lengths s = 1 and s = 2. The intuition is that when the adversary plays optimally, he will only allow the online algorithm to collect isolated edges (cliques of size 2). For this reason, we conjecture that 6 is the optimal competitive ratio.

Online MINCC Clustering
In this section, we study the clique clustering problem with a different measure of optimality that we call MinCC. For MinCC, we define the cost of a clustering $C$ to be the total number of non-cluster edges. Specifically, if the cliques in $C$ are $C_1, C_2, \ldots, C_k$ then the cost of $C$ is
$$|E| - \sum_{i=1}^{k} \binom{|C_i|}{2},$$
where $E$ is the edge set of the graph. The objective is to construct a clustering that minimizes this cost.
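Because every cluster is a clique, the number of non-cluster edges equals the total number of edges minus the number of edges inside the clusters. A minimal sketch of this cost computation, with a hypothetical function name, is:

```python
from math import comb

def mincc_cost(num_edges, clusters):
    """MinCC cost of a clique clustering.

    num_edges -- total number of edges in the graph
    clusters  -- iterable of vertex collections, each assumed to be a clique
    """
    return num_edges - sum(comb(len(c), 2) for c in clusters)

# Triangle {1, 2, 3} with a pendant edge (3, 4): four edges in total.
print(mincc_cost(4, [{1, 2, 3}, {4}]))
print(mincc_cost(4, [{1, 2}, {3, 4}]))
```

The function does not verify that the clusters are cliques; it relies on the caller for that.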

A Lower Bound for Online MINCC Clustering
In this section, we present a lower bound for deterministic MinCC clustering.
Proof (a) Consider an algorithm $S$ with competitive ratio $R_n = n - \omega(1)$. Thus, according to the definition (2) of the competitive ratio, there is a constant $\beta$ such that
$$\mathrm{cost}_S(G) \le R_n \cdot \mathrm{cost}_{\mathrm{OPT}}(G) + \beta.$$
We can assume that $\beta$ is a positive integer. The adversary first produces a graph of $2\beta + 2$ vertices connected by $\beta + 1$ disjoint edges $(v_{2i-1}, v_{2i})$, for $i = 1, 2, \ldots, \beta + 1$. At this point, $S$ must have added at least one pair $v_{2j-1}, v_{2j}$ to its clustering, because otherwise, since $\mathrm{cost}_{\mathrm{OPT}}(G) = 0$, inequality (2) would be violated. The adversary then chooses some large $n$ and adds $n - 2\beta - 2$ new vertices $v_{2\beta+3}, \ldots, v_n$ that together with $v_{2j}$ form a clique of size $n - 2\beta - 1$; see Fig. 9. All edges from $v_{2j}$ to these new vertices are non-cluster edges for $S$ and the optimum solution has only one non-cluster edge $(v_{2j-1}, v_{2j})$. Thus
$$n - 2\beta - 2 \le \mathrm{cost}_S(G) \le R_n \cdot \mathrm{cost}_{\mathrm{OPT}}(G) + \beta = n - \omega(1) + \beta,$$
giving us a contradiction for sufficiently large $n$.
(b) The proof of this part is a straightforward modification of the proof for (a): the adversary starts by releasing just one edge (v 1 , v 2 ), and the online algorithm is forced to cluster v 1 and v 2 together, because now β = 0. Then the adversary adds n − 2 vertices that together with v 2 form a clique. This clique will be its only cluster, so cost OPT (G) = 1. For the online algorithm all edges between v 2 and the other vertices in this clique will be outside its clusters, so cost S (G) ≥ n − 2, proving part (b).
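The instance from part (b) is easy to build and check explicitly. The sketch below constructs the graph, counts non-cluster edges directly, and compares the clustering an online algorithm is forced into with the offline optimum. Vertex labels 1 to $n$ and the function names are illustrative choices.

```python
from itertools import combinations

def mincc_cost_direct(edges, clusters):
    """Count edges whose endpoints lie in different clusters."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])

def adversary_instance(n):
    """Edge (1, 2), followed by vertices 3..n forming a clique with vertex 2."""
    edges = [(1, 2)]
    edges += list(combinations(range(2, n + 1), 2))
    return edges

n = 10
edges = adversary_instance(n)
online = [{1, 2}, set(range(3, n + 1))]      # 1 and 2 were clustered before 3..n arrived
optimal = [{1}, set(range(2, n + 1))]        # keep the big clique whole
print(mincc_cost_direct(edges, online), mincc_cost_direct(edges, optimal))
```

The online clustering shown is the best the algorithm can still achieve once vertices 1 and 2 share a cluster, and it pays for every edge between vertex 2 and the later vertices.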

The Greedy Algorithm for Online MINCC Clustering
We continue the study of online MinCC clustering, and we prove that Greedy, the greedy algorithm presented in Sect. 3.1, yields a competitive ratio matching the lower bound from the previous section.

Lemma 4.1 Let $(u, v)$ be a non-cluster edge of Greedy. Then OPT (the optimal clustering) has at least one non-cluster edge adjacent to $u$ or $v$ (which might also be $(u, v)$ itself).

Proof
The key observation for this proof is that, for any triplet of vertices $x$, $y$, and $z$, if the graph contains the two edges $(x, y)$ and $(x, z)$ but $y$ and $z$ are not connected by an edge, then in any clustering at least one of the edges $(x, y)$ or $(x, z)$ is a non-cluster edge. Without loss of generality suppose vertex $v$ arrives after vertex $u$. Let $A$ be the cluster of Greedy containing vertex $u$ right after the step when vertex $v$ arrives. We have that $v \notin A$ by the assumption of the lemma. We have two possibilities. First, if $A$ contains some vertex $u'$ not connected to $v$, then the earlier key observation shows that one of the edges $(u', u)$, $(u, v)$ is a non-cluster edge for OPT (see Fig. 10 on the left).
Second, assume that $v$ is connected to all vertices of $A$. Greedy had an option of adding $v$ to $A$ but it didn't, so it placed $v$ in some clique $B$ (of size at least 2) that is not merge-able with $A$, that is, there are vertices $u' \in A$ and $v' \in B$ which are not connected by an edge (see Fig. 10 on the right). Now the earlier key observation shows that one of the edges $(u', v)$, $(v, v')$ is a non-cluster edge of OPT. This completes the proof of the lemma.

Theorem 6
The strict competitive ratio of Greedy is n − 2.
Proof To estimate the number of non-cluster edges of Greedy, we use a charging scheme. Let (u, v) be a non-cluster edge of Greedy. We charge it to non-cluster edges of OPT as follows.
Self charge: If (u, v) is a non-cluster edge of OPT, we charge 1 to (u, v) itself. Proximate charge: If (u, v) is a cluster edge in OPT, we split the charge of 1 from (u, v) evenly among all non-cluster edges of OPT incident to u or v.
From Lemma 4.1, the charging scheme is well-defined, that is, all non-cluster edges of Greedy have been charged fully to non-cluster edges of OPT. It remains to estimate the total charge that any non-cluster edge of OPT may have received. Since the strict competitive ratio is the ratio between the number of non-cluster edges of Greedy and the number of non-cluster edges of OPT, the maximum charge to any non-cluster edge of OPT is an upper bound for the strict competitive ratio.
Consider a non-cluster edge (x, y) of OPT. Edge (x, y) can receive charges only from itself (self charge) and other edges incident to x or y (proximate charges). Let P be the set of vertices adjacent to both x and y, and let Q be the set of vertices that are adjacent to only one of them, but excluding x and y: P = N (x) ∩ N (y) and Q = N (x) ∪ N (y) − P − {x, y}.
(N (z) denotes the neighborhood of a vertex z, the set of vertices adjacent to z.) We have |P| + |Q| ≤ n − 2.
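The sets $P$ and $Q$ are plain set operations on neighborhoods. A small sketch, using a dictionary of adjacency sets as a hypothetical graph representation:

```python
def shared_and_exclusive_neighbors(adj, x, y):
    """Return (P, Q) for an edge (x, y).

    adj -- dict mapping each vertex to the set of its neighbors
    P   -- vertices adjacent to both x and y
    Q   -- vertices adjacent to exactly one of x, y, excluding x and y
    """
    p = adj[x] & adj[y]
    q = (adj[x] | adj[y]) - p - {x, y}
    return p, q

adj = {1: {2, 3, 4}, 2: {1, 3, 5}, 3: {1, 2}, 4: {1}, 5: {2}}
p, q = shared_and_exclusive_neighbors(adj, 1, 2)
print(p, q, len(p) + len(q) <= len(adj) - 2)
```

Since $P$ and $Q$ are disjoint and contain neither $x$ nor $y$, their sizes add up to at most $n - 2$.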
Edges connecting x or y to Q will be called Q-edges. Trivially, the total charge from Q-edges to (x, y) is at most |Q|.
Edges connecting $x$ or $y$ to $P$ will be called P-edges. Consider some $z \in P$. Since $x$ and $y$ are in different clusters of OPT, at least one of the P-edges $(x, z)$ or $(y, z)$ must also be a non-cluster edge for OPT. By symmetry, assume that $(x, z)$ is a non-cluster edge for OPT. If $(x, z)$ is a non-cluster edge of Greedy then $(x, z)$ will absorb its self charge. So $(x, z)$ will not contribute to the charge of $(x, y)$. If $(y, z)$ is a non-cluster edge of Greedy then either it will be self charged (if it's also a non-cluster edge of OPT) or its proximate charge will be split between at least two edges, namely $(x, y)$ and $(x, z)$. Thus the charge from $(y, z)$ to $(x, y)$ will be at most $\frac{1}{2}$. Therefore the total charge from P-edges to $(x, y)$ is at most $\frac{1}{2}|P|$. We now have some cases.
Case 1: $(x, y)$ is a cluster edge of Greedy. Then $(x, y)$ does not generate a self charge, so the total charge received by $(x, y)$ is at most $\frac{1}{2}|P| + |Q| \le |P| + |Q| \le n - 2$.
Case 2: $(x, y)$ is a non-cluster edge of Greedy. Then $(x, y)$ contributes a self charge to itself.
Case 2.1: $|P| \ge 2$. Then $\frac{1}{2}|P| \le |P| - 1$, so the total charge received by $(x, y)$ is at most $\frac{1}{2}|P| + |Q| + 1 \le (|P| - 1) + |Q| + 1 = |P| + |Q| \le n - 2$.
Case 2.2: At least one Q-edge is a cluster edge of Greedy. Then the total proximate charge from Q-edges is at most $|Q| - 1$, so the total charge received by $(x, y)$ is at most $\frac{1}{2}|P| + (|Q| - 1) + 1 \le |P| + |Q| \le n - 2$.
Case 2.3: $|P| \in \{0, 1\}$ and all Q-edges are non-cluster edges of Greedy. We claim that this case cannot actually occur. Indeed, if $|P| = 0$ then Greedy would cluster $x$ and $y$ together. Similarly, if $P = \{z\}$, then Greedy would cluster $x$, $y$ and $z$ together. In both cases, we get a contradiction with the assumption of Case 2.
Summarizing, we have shown that each non-cluster edge of OPT receives a total charge of at most n − 2, and the theorem follows.
The proof of Theorem 6 applies in fact to a more general class of algorithms, giving an upper bound of $n - 2$ on the strict competitive ratio of all "non-procrastinating" algorithms, which never leave merge-able clusters in their clusterings (that is, clusters $C$, $C'$ such that $C \cup C'$ forms a clique).
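Whether a clustering is free of merge-able clusters can be tested directly from the definition. The following sketch lists all pairs of clusters whose union is a clique; the adjacency-set representation and the function names are assumptions made for the example.

```python
from itertools import combinations

def forms_clique(vertices, adj):
    """True if every two distinct vertices in `vertices` are adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def mergeable_pairs(clusters, adj):
    """Pairs of clusters (as sets) whose union forms a clique."""
    return [(a, b) for a, b in combinations(clusters, 2) if forms_clique(a | b, adj)]

# Triangle {1, 2, 3} with a pendant vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(mergeable_pairs([{1, 2}, {3}, {4}], adj))
print(mergeable_pairs([{1, 2, 3}, {4}], adj))
```

A clustering maintained by a non-procrastinating algorithm should always yield an empty list here.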