A combinatorial multi-armed bandit approach to correlation clustering

Given a graph whose edges are assigned positive-type and negative-type weights, the problem of correlation clustering aims at grouping the graph vertices so as to minimize (resp. maximize) the sum of negative-type (resp. positive-type) intra-cluster weights plus the sum of positive-type (resp. negative-type) inter-cluster weights. In correlation clustering, it is typically assumed that the weights are readily available. This is a rather strong hypothesis, which is unrealistic in several scenarios. To overcome this limitation, in this work we focus on the setting where edge weights of a correlation-clustering instance are unknown, and they have to be estimated in multiple rounds, while performing the clustering. The clustering solutions produced in the various rounds provide a feedback to properly adjust the weight estimates, and the goal is to maximize the cumulative quality of the clusterings. We tackle this problem by resorting to the reinforcement-learning paradigm, and, specifically, we design for the first time a Combinatorial Multi-Armed Bandit (CMAB) framework for correlation clustering. We provide a variety of contributions, namely (1) formulations of the minimization and maximization variants of correlation clustering in a CMAB setting; (2) adaptation of well-established CMAB algorithms to the correlation-clustering context; (3) regret analyses to theoretically bound the accuracy of these algorithms; (4) design of further (heuristic) algorithms to have the probability constraint satisfied at every round (key condition to soundly adopt efficient yet effective algorithms for correlation clustering as CMAB oracles); (5) extensive experimental comparison among a variety of both CMAB and non-CMAB approaches for correlation clustering.

Getting the best exploration-exploitation tradeoff is a key desideratum. The effectiveness of such a tradeoff is measured by the (expected) cumulative quality of the clusterings produced in all the rounds. This is the ultimate objective to be optimized, and a major challenge in the design of proper algorithms.
It should be noted that the aforementioned exploration-exploitation tradeoff refers to a reinforcement-learning paradigm, which, in this work, we adopt by resorting to the Combinatorial Multi-Armed Bandit (CMAB) framework (Chen et al. 2016, 2018a; Kveton et al. 2015a, b; Lagrée et al. 2016; Wang and Chen 2017; Xu et al. 2020). The CMAB framework has been contextualized to several specific problems, including influence maximization (Chen et al. 2016; Vaswani and Lakshmanan 2015; Wu et al. 2019), community detection (Mandaglio and Tagarelli 2019a, b), community exploration (Chen et al. 2018b), shortest-path discovery (Talebi et al. 2017), and feature selection (Liu et al. 2021). In our previous work (Mandaglio et al. 2020), we devised non-CMAB algorithms for a correlation-clustering problem variant in which interactions between entities are characterized by known input probability distributions and conditioned by external factors within the environment where the entities interact. Remarkably, none of those settings comes close to the one we consider in this work, i.e., devising a CMAB framework for (correlation) clustering.
Applications. The setting we consider in this work finds application in all those contexts where it is not preferable (or not permitted) to wait until edge weights have been produced before performing a clustering. Rather, it is desirable to produce clustering solutions early, learn the weights along the way, and tolerate lower clustering quality in the initial rounds, provided that it improves as the rounds go by.
For instance, consider a team-formation scenario, where individuals need to be organized (clustered) into teams (Juárez et al. 2022). Individuals are associated with technical/soft skills which are required for task assignments within the teams. Any two individuals exhibiting a certain skill-level similarity should be assigned to the same team, and, conversely, to different teams if they are dissimilar to each other. Clearly, given the variety of skills and their compatibility levels, the exact degree of matching between two individuals' skills is not a-priori known at the beginning of the team-formation process; rather, similarities should be learned through the team-formation history. In this regard, individuals collaborate with both their teammates and individuals from other teams, e.g., for general coordination purposes. A desirable goal is to establish teams so as to maximize the overall (i.e., intra-team plus inter-team) similarity between pairs of individuals. This is a problem that can easily be cast to correlation clustering, where vertices correspond to individuals, clusters correspond to teams, and the positive-type and negative-type edge weights correspond to intra-team and inter-team similarities, respectively. Note that empathy-related (i.e., mutable) characteristics between individuals are discarded here, since they may cause drift in the likelihood that two individuals (dis)like each other once they are (temporarily) members of the same team, i.e., edge weights would change through the rounds. An analogous scenario is task allocation for robots, each of which is programmed to handle a number of operations. Correlation clustering would be helpful to form coalitions of robots, in order to allocate them to tasks to be completed optimally according to some efficiency requirements.
Two further example scenarios are commercial scheduling (e.g., Bollapragada and Garbiras 2004; Giallombardo et al. 2016) and shelf space allocation (e.g., Hübner et al. 2021). The former consists in optimally assigning a set of commercials to fill each advertisement slot scheduled by a TV broadcaster, where vertices are commercials and edge weights denote marketing-driven benefits in assigning, resp. separating, any two commercials within the same, resp. to different, slots. Edge weights might initially be estimated by accounting for requirements provided by both the brand customers and the TV broadcaster; then the weights can be adjusted by observing the feedback provided by (online) market surveys (e.g., delivered to a targeted audience of the TV programme schedule). Shelf space allocation models the dimensioning and positioning of shelf space for allocating selected products based on practical retail requirements, so as to maximize product sales; here, the feedback observed from sales outcomes, as well as from how customers receive the retailer's choices, relates to the opportunity of ensuring brand visibility or improving customer satisfaction.
All the aforementioned scenarios correspond to well-known optimization problems in operations research and related fields; they are also key enablers in emerging contexts, such as the development of smart production systems brought by Industry 4.0 (Grillo et al. 2022). However, such problems have not commonly been addressed in terms of correlation clustering, and the few existing exceptions (e.g., Dutta et al. (2019)) are far from the CMAB perspective we study in this work, which needs to be adopted since correlation-clustering weights are unlikely to be a priori known.
Contributions. The one we deal with in this work is a natural reinforcement-learning scenario, which, to the best of our knowledge, has never been considered in the context of correlation clustering. We tackle it by designing, for the first time, a Combinatorial Multi-Armed Bandit (CMAB) (Chen et al. 2016) framework for correlation clustering. In doing so, we achieve a mix of modeling, algorithmic, technical, and empirical contributions, including principled framework design and problem formulations, design and theoretical analysis of algorithms, tricks to make the algorithms work in practice, and experimental evaluation. In more detail, our main contributions are as follows:
• We formulate, for the first time, correlation clustering in a CMAB setting, by providing a contextualization of the main ingredients of a typical CMAB framework and CMAB formulations for both Min-CC and Max-CC (Sect. 3). Among others, a key consequence of this contribution is that it enables the use of general-purpose CMAB approximation algorithms/heuristics for CMAB-based Min-CC/Max-CC with minimal customization effort. In this regard, we show how the popular Combinatorial Upper Confidence Bound (CUCB) method (Chen et al. 2016) can be employed in the context of Max-CC (Appendix A.2).
• We introduce the Combinatorial Lower Confidence Bound (CLCB) method, which can be viewed as the counterpart of CUCB for minimization problems, and show how to suitably customize it in order to handle Min-CC instances (Sect. 4).
• The effectiveness of a CMAB algorithm is typically assessed in terms of regret, i.e., a measure of how far the (expected) cumulative quality of the solutions yielded by an algorithm is from the optimal cumulative quality. In this regard, Chen et al.
(2016) provide a regret analysis of the CUCB method, which shows that, if the underlying combinatorial-optimization problem satisfies certain properties, CUCB is guaranteed to achieve a regret that is at most logarithmic in the number of clustering rounds, in the presence of an approximation oracle. Here, we build upon Chen et al.'s result and show that:
• Our CMAB formulation of Max-CC satisfies Chen et al.'s properties; thus, CUCB achieves logarithmic regret for Max-CC as well (Appendix A.2.1).
• We devise a principled regret definition for Min-CC. According to this definition, we also provide a regret analysis that, along the lines of Chen et al.'s analysis for CUCB, proves that CLCB achieves logarithmic regret in the number of clustering rounds for Min-CC in the presence of an approximation oracle (Sect. 4.1). Our regret definition and analysis for Min-CC are general enough to be reused in any minimization CMAB problem with an approximation oracle. This is a per-se contribution, as, to the best of our knowledge, no regret definitions/analyses for CMAB minimization problems (with approximation oracle) exist in the literature.
• We further investigate the applicability of the CLCB-like algorithm in practice (Sect. 4.2). A key desideratum in this regard is to employ the traditional Pivot algorithm for Min-CC (Ailon et al. 2008) as an (approximation) oracle within CLCB, for its efficiency, theoretical and empirical effectiveness, and ease of implementation. A major challenge here is that, to achieve its approximation guarantees (and to provide effective solutions in practice too), Pivot needs the input edge weights to satisfy the probability constraint. Unfortunately, the CLCB algorithm does not guarantee the fulfilment of this constraint at every round. Thus, we design novel variants of the basic CLCB where the correlation-clustering problem instances to be given as input to Pivot meet (or are close to meeting) the probability constraint.
• We conduct an extensive evaluation to experimentally test the performance of CMAB correlation-clustering algorithms, including the algorithms devised in this work, as well as popular CMAB heuristics customized to correlation clustering, such as ε-greedy, pure exploitation, and Combinatorial Thompson Sampling (Wang and Chen 2018) (Sects. 5-7). We consider the Min-CC context only, due to the availability of practical approximation oracles (unlike Max-CC). Results show that CMAB methods achieve accuracy superior to non-CMAB baselines and close to that of a reference method that performs correlation clustering with the true edge weights. Also, the per-round runtime of CMAB methods is (at worst) comparable to the runtime of executing a linear-time correlation-clustering algorithm once.

Section 2 discusses background and related work. Section 8 concludes the paper.

Correlation clustering
The minimization (Min-CC) and maximization (Max-CC) formulations of correlation clustering aim at minimizing disagreements and maximizing agreements, respectively. They are formally defined as follows:

Problem 1 (Min-CC (Ailon et al. 2008)) Given a graph G = (V, E), and nonnegative weights w+_uv, w−_uv ∈ ℝ+_0 for each edge (u, v) ∈ E, find a clustering C* : V → ℕ+ such that:

C* = argmin_C [ Σ_{(u,v)∈E : C(u)=C(v)} w−_uv + Σ_{(u,v)∈E : C(u)≠C(v)} w+_uv ].

Problem 2 (Max-CC (Ailon et al. 2008)) Given a graph G = (V, E), and nonnegative weights w+_uv, w−_uv ∈ ℝ+_0 for each edge (u, v) ∈ E, find a clustering C* : V → ℕ+ such that:

C* = argmax_C [ Σ_{(u,v)∈E : C(u)=C(v)} w+_uv + Σ_{(u,v)∈E : C(u)≠C(v)} w−_uv ].

In the above problems, and hereinafter, we let a clustering be represented as a function that expresses cluster membership for the vertices in V.
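For concreteness, the two objectives can be sketched as follows. This is a minimal illustration with our own dictionary-based graph encoding, not the authors' code; function names are hypothetical.

```python
# Sketch: the Min-CC (disagreements) and Max-CC (agreements) objectives of
# Problems 1-2, for a clustering given as a vertex -> cluster-id map.

def min_cc_cost(clustering, edges, w_pos, w_neg):
    """Disagreements: negative-type weights inside clusters, positive across."""
    cost = 0.0
    for (u, v) in edges:
        if clustering[u] == clustering[v]:
            cost += w_neg[(u, v)]   # intra-cluster edge: pay negative-type weight
        else:
            cost += w_pos[(u, v)]   # inter-cluster edge: pay positive-type weight
    return cost

def max_cc_agreement(clustering, edges, w_pos, w_neg):
    """Agreements: positive-type weights inside clusters, negative across."""
    gain = 0.0
    for (u, v) in edges:
        if clustering[u] == clustering[v]:
            gain += w_pos[(u, v)]
        else:
            gain += w_neg[(u, v)]
    return gain

# Toy instance: a triangle with one "conflicting" edge (0, 2).
edges = [(0, 1), (1, 2), (0, 2)]
w_pos = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 0.0}
w_neg = {(0, 1): 0.0, (1, 2): 0.0, (0, 2): 1.0}
C = {0: 1, 1: 1, 2: 1}  # everything in one cluster
print(min_cc_cost(C, edges, w_pos, w_neg))       # pays w_neg of (0, 2) -> 1.0
print(max_cc_agreement(C, edges, w_pos, w_neg))  # collects w_pos of (0,1),(1,2) -> 2.0
```

Note that, for any fixed clustering, the two objectives sum to the total edge weight of the instance, which is one way to see the equivalence in optimality mentioned below.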
Min-CC and Max-CC are equivalent in terms of optimality and complexity class [both NP-hard (Bansal et al. 2004;Shamir et al. 2004)], but have different approximation-guarantee properties, with the latter being easier in this regard. On general edge weights, both Min-CC and Max-CC are APX-hard (Charikar et al. 2005), with Max-CC admitting constant-factor approximation algorithms (Charikar et al. 2005;Swamy 2004), and with the best known approximation factor for Min-CC being O(log |V|) (and unlikely to be meliorable) (Charikar et al. 2005;Demaine et al. 2006).
Although considering various types of weights, all the above works still assume that edge weights are fully available as input. In this work, we go beyond this limiting view and focus on the context where edge weights are not available beforehand, but have to be discovered while performing (multiple rounds of) clustering.
Beyond basic correlation clustering. Several extensions to the basic correlation-clustering formulations have been studied, including constrained/relaxed formulations (e.g., constraining the number/size of clusters, allowing overlapping clusters), and adaptations to more sophisticated types of graph (e.g., bipartite graphs, labeled graphs, multilayer graphs, hypergraphs) or nonconventional computational settings (e.g., online, parallel, streaming). We point the interested reader to Bonchi et al. (2014, 2022) and Pandove et al. (2018) for more details on these advanced topics. Here, let us just discuss the problem of query-efficient correlation clustering (Bressan et al. 2019; García-Soriano et al. 2020), which, to our knowledge, is the only correlation-clustering extension that exhibits some (slight) similarity with the setting we study in this work. Query-efficient correlation clustering assumes that edge weights are discovered by querying an oracle, and the goal is to cluster the input graph by using a limited budget of Q queries (Q ≪ O(|V|²)). Although it is still assumed that edge weights are not available beforehand (as in our setting), query-efficient correlation clustering focuses on a scenario that remains profoundly different from the one tackled in this work. In fact, it considers a hard limit Q on the number of edge weights that can ultimately be discovered, a restriction that is not present in our setting. Moreover, the feedback on edge weights is given by an oracle, which provides true edge weights for any query, at any time. Instead, in our setting, the feedback consists in a sample of the weight distributions that is used to update the weight estimates, and is provided by the clustering itself (there is no oracle). Finally, existing approaches to query-efficient correlation clustering (Bressan et al. 2019; García-Soriano et al. 2020) handle binary weights only.

Combinatorial multi-armed bandit
Combinatorial Multi-Armed Bandit (CMAB) is a popular reinforcement-learning framework to learn how to perform actions by exploring/exploiting the feedback from an environment (Chen et al. 2016). It extends the basic Multi-Armed Bandit (MAB) (Berry and Fristedt 1985) so that the actions to be performed/learned correspond to combinatorial structures (superarms) that are defined on top of simpler, basic actions (base arms). Specifically, a CMAB instance consists of m base arms. Each base arm i is assigned a set {X_i,t | 1 ≤ i ≤ m, 1 ≤ t ≤ T} of random variables, where T is the number of rounds. The support of each X_i,t, assumed to be within [0, 1], indicates the random "outcome" of playing base arm i in round t. This outcome is interpreted as feedback from the environment and used to carry out the learning process. The random variables {X_i,t}_{t=1}^T of the same arm i are independent and identically distributed, according to some unknown distribution with unknown expectation μ_i. Random variables of different base arms may be dependent or distributed with different laws. Estimates {μ̂_i}_{i=1}^m of the true unknown expectations {μ_i}_{i=1}^m are kept (and updated) at every round. A CMAB instance also includes a set A ⊆ 2^[m] of possible superarms. A is typically defined as the subset of all subsets of base arms satisfying certain constraints. At each round t, a superarm A_t ∈ A is played and the outcomes of the random variables X_j,t, for all the base arms j ∈ A_t, are observed. These outcomes can be used to update the knowledge on the estimates {μ̂_j}_{j∈A_t}. Playing a superarm A_t gives a reward R_t(A_t), which is a random variable defined as a function of the outcomes of A_t's base arms. R_t(A_t) may simply be a summation Σ_{j∈A_t} X_j,t of the outcomes of A_t's base arms, but more complex (possibly nonlinear) rewards are allowed.
In any case, it is often assumed that the expectation E[R_t(A_t)] is a function only of A_t's base arms and the true expectations {μ_i}_{i=1}^m. For minimization problems, the reward can be replaced by a notion of loss L_t(A_t); the adaptation is straightforward.
The objective of a CMAB algorithm is to select a superarm to be played at every round, so as to maximize the cumulative expected reward obtained in all the rounds, i.e., E[Σ_{t=1}^T R_t(A_t)]. With this ultimate goal in place, a superarm A_t can be chosen by either exploiting the knowledge acquired from the outcomes of previous rounds, or exploring arms that have not been played much. Here is the exploration-exploitation tradeoff that usually appears in reinforcement-learning scenarios: a key design principle of any CMAB algorithm consists in deciding to what extent it should pick the arms that have provided good rewards so far (exploitation) or select different arms with the aim of getting even better rewards (exploration).
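To make the round structure concrete, the following illustrative-only sketch plays a purely exploitative superarm each round and updates sample-mean estimates from the observed outcomes. All names (`oracle`, `play`) and the Bernoulli environment are our own assumptions, not part of any CMAB library.

```python
import random

random.seed(0)
m = 4
true_mu = [0.2, 0.5, 0.7, 0.9]   # unknown to the learner
mu_hat = [0.0] * m               # per-arm sample-mean estimates
counts = [0] * m                 # times each base arm has been observed

def oracle(estimates, k=2):
    """Superarm = the k base arms with the largest current estimates."""
    return sorted(range(m), key=lambda i: -estimates[i])[:k]

def play(superarm):
    """Environment feedback: one Bernoulli outcome per triggered base arm."""
    return {i: 1.0 if random.random() < true_mu[i] else 0.0 for i in superarm}

for t in range(1, 201):
    A_t = oracle(mu_hat)                 # exploit current knowledge only
    for i, x in play(A_t).items():       # observe outcomes, update sample means
        counts[i] += 1
        mu_hat[i] += (x - mu_hat[i]) / counts[i]
```

This pure-exploitation loop never revisits arms whose initial estimates lose ties (here arms 2 and 3 are never played, despite having the highest true means), which is exactly the failure mode that confidence-bound adjustments such as those of CUCB are designed to avoid.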
As for exploitation-aware superarms, the availability of an oracle is assumed, which computes a superarm based on the current estimates {μ̂_i}_{i=1}^m of the base-arm expectations and the knowledge it possesses on the specific problem at hand. The oracle can be exact, i.e., it outputs a superarm achieving opt = max_{A∈A} E[R_t(A)], or an (α, β)-approximation one, for some α, β ≤ 1, i.e., it outputs a superarm A_t such that Pr[E[R_t(A_t)] ≥ α ⋅ opt] ≥ β. The effectiveness of a (C)MAB algorithm is typically measured in terms of the so-called regret metric, which corresponds to the difference in cumulative expected reward between always playing the optimal arm (possibly scaled by factors α and β in the case of (α, β)-approximation oracles) and playing arms according to the algorithm. A major theoretical desideratum in this regard consists in providing a suitable regret analysis, which guarantees that the algorithm at hand achieves a certain bounded regret. The seminal work by Chen et al. (2016) shows that it is possible to design CMAB algorithms achieving O(log T) regret, and that this bound is tight. Regret definitions and analyses for CMAB maximization problems exist for both exact and approximation oracles (Chen et al. 2016; Wang and Chen 2017). As for minimization problems, to the best of our knowledge, they have been devised for exact oracles only (Cesa-Bianchi and Lugosi 2012; Talebi et al. 2017). In this work, we provide for the first time a regret analysis for a minimization problem (Min-CC) with an approximation oracle. The generality of our regret definition and analysis makes us believe that this is a contribution of interest for CMAB minimization problems in general, not only for (correlation) clustering.

Problem definition
In this section we provide the details of the proposed contextualization of CMAB to correlation clustering. As a first step, we let the weights w+_e, w−_e of every edge e ∈ E be modeled as random variables W+_e, W−_e with support in [0, 1],¹ and means μ+_e = E[W+_e] and μ−_e = E[W−_e], respectively. All such random variables and their means are assumed to be unknown (as typical in CMAB), and not to change in the various clustering rounds. Any CMAB algorithm keeps estimates of the true means, which are denoted as μ̂+_e and μ̂−_e. Let also every edge e = (u, v) ∈ E be represented by a pair of replicas, e_in and e_out, which model the fact that e is an intra-cluster or inter-cluster edge (with respect to a given clustering), respectively. Let S_in = {e_in | e ∈ E} and S_out = {e_out | e ∈ E} be the sets of all intra-cluster and inter-cluster edge replicas, respectively. We make the base arms in CMAB correlation clustering correspond to the set S = S_in ∪ S_out of all edge replicas (thus, the number of base arms is m = 2|E|), and a superarm be identified by a set of base arms that are consistent with the notion of clustering. Formally, a superarm corresponds to a clustering-compliant replica set:

Definition 1 (Clustering-compliant replica set) A set S ⊆ S of edge replicas is clustering-compliant if (i) for all e ∈ E, S does not contain both e_in and e_out, and (ii) for all e₁ = (x, y), e₂ = (y, z), e₃ = (x, z) ∈ E, if e₁_in, e₂_in ∈ S, then e₃_in ∈ S.
In the above definition, (i) is because an edge cannot be both intra-cluster and inter-cluster, while (ii) guarantees the transitive property that if vertices x, y are within the same cluster and y, z are within the same cluster, then x, z must be in the same cluster too. Simply speaking, a superarm corresponds to a clustering. Thus, we hereinafter refer to "superarm" and "clustering" as two equivalent notions.
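Definition 1 can be checked directly on a complete graph. The following sketch uses our own encoding (hypothetical names): each edge is given exactly one replica, so condition (i) holds by construction, and an O(|V|³) triangle scan checks condition (ii) by noting that a triangle with exactly two "in" replicas violates transitivity.

```python
from itertools import combinations

def is_clustering_compliant(vertices, replicas):
    """Check condition (ii) of Definition 1 on a complete graph:
    `replicas` maps each edge (u, v), u < v, to "in" or "out"."""
    def rep(u, v):
        return replicas[(min(u, v), max(u, v))]
    for x, y, z in combinations(vertices, 3):
        # exactly two "in" replicas on a triangle breaks transitivity
        ins = sum(1 for p in ((x, y), (y, z), (x, z)) if rep(*p) == "in")
        if ins == 2:
            return False
    return True

V = [0, 1, 2]
ok = {(0, 1): "in", (1, 2): "in", (0, 2): "in"}    # clustering {{0, 1, 2}}
bad = {(0, 1): "in", (1, 2): "in", (0, 2): "out"}  # x~y and y~z, but not x~z
print(is_clustering_compliant(V, ok))   # True
print(is_clustering_compliant(V, bad))  # False
```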
The outcome of the base arms that are triggered while playing a superarm depends on the correlation-clustering formulation. In Min-CC, the outcome of every intra-cluster edge replica e_in comes from the corresponding negative-type-weight random variable W−_e, while the outcome of every inter-cluster edge replica e_out comes from the corresponding positive-type-weight random variable W+_e. The rationale is that, in Min-CC, the clustering quality is measured in terms of the negative-type weight of all intra-cluster edges and the positive-type weight of all inter-cluster edges. Thus, placing a clustering (i.e., playing a superarm) is expected to give feedback that is consistent with Min-CC's objective function: the outcomes of e_in (resp. e_out) replicas should be used to update μ̂−_e (resp. μ̂+_e). Conversely, in Max-CC, e_in and e_out are assigned (and their outcomes come from) W+_e and W−_e, respectively. The reward/loss corresponds to the correlation-clustering objective function, hence its definition depends on the correlation-clustering formulation too. Given a superarm S, let S_in and S_out denote the intra-cluster and inter-cluster edge replicas in S, respectively. Min-CC utilizes a disagreement-based loss d(S) defined as:

d(S) = Σ_{e_in ∈ S_in} W−_e + Σ_{e_out ∈ S_out} W+_e,

while Max-CC employs a reward a(S) defined in terms of agreements as:

a(S) = Σ_{e_in ∈ S_in} W+_e + Σ_{e_out ∈ S_out} W−_e.

The expectations of d(⋅) and a(⋅) are as follows (by linearity of expectation):

d_μ(S) = Σ_{e_in ∈ S_in} μ−_e + Σ_{e_out ∈ S_out} μ+_e,  ā_μ(S) = Σ_{e_in ∈ S_in} μ+_e + Σ_{e_out ∈ S_out} μ−_e, (7)

where the "μ" subscript in d_μ and ā_μ is to emphasize that those functions depend on the true means μ. Denoting by C_S the clustering corresponding to superarm S, Eq. (7) can alternatively (yet equivalently) be written as:

d_μ(C_S) = Σ_{(u,v)∈E : C_S(u)=C_S(v)} μ−_uv + Σ_{(u,v)∈E : C_S(u)≠C_S(v)} μ+_uv. (8)

Table 1 summarizes the elements of our CMAB correlation-clustering formulation.
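The direction of the feedback (which estimate an observed weight updates) can be made concrete with a small sketch; the names `mu_neg`/`mu_pos` and the function are ours, for illustration only.

```python
# Which mean estimate each edge of a played clustering updates, per the
# feedback rule described above: in Min-CC, intra-cluster edges yield samples
# of W^- (update mu_neg), inter-cluster edges samples of W^+ (update mu_pos);
# in Max-CC the assignment is swapped.

def feedback_targets(clustering, edges, variant="min"):
    targets = {}
    for (u, v) in edges:
        intra = clustering[u] == clustering[v]
        if variant == "min":
            targets[(u, v)] = "mu_neg" if intra else "mu_pos"
        else:  # "max"
            targets[(u, v)] = "mu_pos" if intra else "mu_neg"
    return targets

C = {0: 1, 1: 1, 2: 2}  # (0, 1) intra-cluster, (1, 2) inter-cluster
print(feedback_targets(C, [(0, 1), (1, 2)], "min"))
# {(0, 1): 'mu_neg', (1, 2): 'mu_pos'}
```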
CMAB-Min-CC and CMAB-Max-CC problems. Given a graph G = (V, E), we perform discrete rounds t = 1, …, T, where at each round t a clustering C_t of the vertices in V is computed and used to update the mean estimates μ̂+, μ̂− of the random variables modeling the positive-type and negative-type edge weights, respectively. As discussed above, in Min-CC, the weight of an edge e between vertices within the same cluster (resp. in different clusters) is interpreted as a random sample useful to update μ̂−_e (resp. μ̂+_e). In Max-CC, the opposite holds. The ultimate objective is to minimize/maximize the cumulative expected loss/reward of the clusterings yielded in all the rounds. Formally, the problems we tackle in this work are:

Problem 3 (CMAB-Min-CC) Given a graph G = (V, E) and a number T > 0 of rounds, for every t = 1, …, T find a clustering C_t : V → ℕ+ so as to minimize E[Σ_{t=1}^T d_μ(C_t)]. (9)

Problem 4 (CMAB-Max-CC) Given a graph G = (V, E) and a number T > 0 of rounds, for every t = 1, …, T find a clustering C_t : V → ℕ+ so as to maximize E[Σ_{t=1}^T ā_μ(C_t)]. (10)

The expectation in Eqs. (9) and (10) is taken over all the random events generating the C_t clusterings (due to, e.g., possible randomization in the oracle that computes the clusterings). There is a further expectation in those equations, which is implicit in the definition of expected loss d_μ(⋅) and expected reward ā_μ(⋅) (see Eq. 8).
As previously discussed, CMAB-Max-CC (resp. CMAB-Min-CC) requires an oracle to solve, at each round, a Max-CC (resp. Min-CC) instance according to the mean estimates μ̂+, μ̂−. However, the oracles available for Max-CC (Charikar et al. 2005; Swamy 2004) are both inefficient and, more importantly, of little use in practice, since they are not able to output more than a fixed number of clusters (i.e., six). The corresponding CMAB setting (i.e., CMAB-Max-CC) inherits this issue, since the clusterings yielded at each round are obtained through these algorithms. This aspect is a showstopper in our context, as we are interested in algorithms that are effective and theoretically solid, yet capable of providing outputs whose quality is recognizable in practice too, not only in theory. For this reason, we hereinafter focus our attention on algorithms for CMAB-Min-CC only. For completeness, algorithms for CMAB-Max-CC are presented in Appendix A.2.

Algorithms for CMAB-Min-CC
In this section, we present algorithms for CMAB-Min-CC (Problem 3). We first focus on the context of general oracles for Min-CC (Sect. 4.1), and, then, on the case where the employed Min-CC oracles achieve theoretical guarantees only if the input meets certain properties (Sect. 4.2). Finally, we discuss the special case of input edge-weight distributions satisfying specific constraints (Sect. 4.3).

General Min-CC oracles
The CC-CLCB algorithm. We devise a variant of the so-called Combinatorial Upper Confidence Bound (CUCB) algorithm (Chen et al. 2016), which is an extension of the UCB1 method for MAB (Auer et al. 2002). CUCB keeps, along with the estimates of the means of the base-arm random variables, confidence intervals within which the true means fall with overwhelming probability, and plays superarms based on the upper bounds of those intervals. Our proposed variant, termed Combinatorial Lower Confidence Bound (CLCB), is tailored for minimization problems but follows the principles of CUCB: it maintains confidence intervals where the true means fall with high probability, but, conversely to CUCB, it plays superarms based on the confidence-interval lower bounds. Our customization of CLCB to Min-CC is termed CC-CLCB and outlined as Algorithm 1. CC-CLCB keeps track of the mean estimates μ̂ = {μ̂+, μ̂−} (Eq. 4), and of the number T+_e (resp. T−_e) of times a sample from the W+_e (resp. W−_e) random variable has been observed until the current round, for all e ∈ E. At the beginning, T+_e = T−_e = 0 for all e ∈ E, and μ̂ is initialized, e.g., randomly or based on prior domain knowledge (Line 1). In every round t, the current mean estimates are adjusted with a term ρ±_e (defined based on Chernoff-Hoeffding bounds (Auer et al. 2002; Chen et al. 2016)), so as to foster, to some extent, the exploration of less often played base arms (Line 3). This leads to the adjusted means {μ̃+_e, μ̃−_e}_{e∈E} (Line 4), which are interpreted as positive-type and negative-type edge weights of a correlation-clustering instance, respectively, and are fed as input (along with G) to an oracle O that computes a Min-CC solution C_t (Line 5). C_t is used as feedback to update the mean estimates (Sect. 3, Table 1). Specifically, the weight of each intra-cluster (resp. inter-cluster) edge e is interpreted as a sample of W−_e (resp. W+_e), and is used to update μ̂−_e, T−_e (resp. μ̂+_e, T+_e).
μ̂+_e and μ̂−_e are updated so as to equal the average of the samples from W+_e and W−_e observed so far, respectively (Lines 6-11).
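A condensed sketch of a CC-CLCB-style round loop follows. It abstracts away the line numbering of Algorithm 1; the confidence radius sqrt(3 ln t / (2 T_e)) is the standard Chernoff-Hoeffding term used by CUCB (Chen et al. 2016), applied here as a lower bound, and `oracle` / `sample_weight` are our own stand-ins for the Min-CC solver and the environment's weight draw.

```python
import math
import random

def clcb_rounds(vertices, edges, oracle, sample_weight, T):
    """Sketch of T rounds of a CLCB-style loop for Min-CC (not the paper's code)."""
    mu = {("+", e): random.random() for e in edges}   # mean estimates, random init
    mu.update({("-", e): random.random() for e in edges})
    cnt = {k: 0 for k in mu}                          # observation counters T+_e, T-_e
    clusterings = []
    for t in range(1, T + 1):
        def adjusted(k):
            # lower confidence bound; unobserved arms get the most optimistic value 0
            if cnt[k] == 0:
                return 0.0
            return max(0.0, mu[k] - math.sqrt(3 * math.log(t) / (2 * cnt[k])))
        w_pos = {e: adjusted(("+", e)) for e in edges}
        w_neg = {e: adjusted(("-", e)) for e in edges}
        C_t = oracle(vertices, edges, w_pos, w_neg)   # Min-CC solution on adjusted means
        for e in edges:                               # feedback: update sample means
            u, v = e
            k = ("-", e) if C_t[u] == C_t[v] else ("+", e)
            cnt[k] += 1
            mu[k] += (sample_weight(k) - mu[k]) / cnt[k]
        clusterings.append(C_t)
    return clusterings
```

Here `oracle` can be any Min-CC heuristic (e.g., Pivot, discussed in Sect. 4.2), and `sample_weight` stands in for the observed draw from the corresponding weight distribution.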

Regret analysis of CC-CLCB.
As correlation clustering is NP-hard, it is unlikely that CC-CLCB can be equipped with an exact oracle O for Min-CC running in polynomial time. Hence, in analyzing the theoretical guarantees of CC-CLCB, we consider the case where O is a Min-CC-(α, β)-approximation oracle:

Definition 2 (Min-CC-(α, β)-approximation oracle) Given α, β ≤ 1, an oracle O is a Min-CC-(α, β)-approximation oracle if, for any Min-CC instance given as input, it outputs, with probability at least β, a clustering whose objective-function value is at most (1/α) times the optimum of that instance.

The condition in Definition 2 for recognizing O as a Min-CC-(α, β)-approximation oracle needs to hold on every Min-CC instance that is given as input to O at each round. Hence, the condition has to hold on the mean estimates, not the true means. Similarly to the maximization counterpart, existing Min-CC algorithms achieving O(log |V|)-approximation guarantees in expectation (Charikar et al. 2005; Demaine et al. 2006) can be employed as Min-CC-(α, β)-approximation oracles. More details are given in Appendix A.1.
We introduce a notion of (α, β)-approximation regret, which can be viewed as the minimization counterpart of the traditional one defined in Chen et al. (2016) and used in maximization problems. Applied to the Min-CC context, this measure is defined as follows:

Reg_{A,α,β}(T) = E[Σ_{t=1}^T d_μ(C_t)] − T ⋅ (β ⋅ (1/α) ⋅ d_μ(C*_I) + (1 − β) ⋅ M),

where C*_I is the optimal solution of the Min-CC instance I defined by the true means, and M is an upper bound on the d_μ(⋅) loss. The rationale of the above definition is as follows. First, the focus being on a minimization problem, the lower the probability of success, the higher the loss value to compare with. Moreover, to take into account possible divergences of the approximation oracle from the optimum, and recalling that here we deal with losses, not rewards, we add an extra term to the (1/α) ⋅ d_μ(C*_I) loss that "interpolates" between the highest probability β = 1 (thus, we compare with (1/α) ⋅ d_μ(C*_I)) and the worst probability β = 0 (thus, we compare with the maximum value M of loss). Note that the first term of Reg_{A,α,β}(T) corresponds to the cumulative expected loss achieved by the CMAB method at hand in the various rounds. The comparison term is defined by noticing that, in every round t = 1, …, T, a Min-CC-(α, β)-approximation oracle yields, with probability (at least) β, a solution whose d_μ(⋅) value is at most (1/α) times the optimum (i.e., (1/α) ⋅ d_μ(C*_I)), and, with probability (at most) 1 − β, a solution whose d_μ(⋅) value is more than (1/α) ⋅ d_μ(C*_I). In the latter case, consistently with the regret definition in maximization problems (Chen et al. 2016), we assume that the d_μ(⋅) value of the yielded solutions is equal to an upper bound UB = M on d_μ. The comparison term in Reg_{A,α,β}(T) is thus pessimistic in assuming that, when the (α, β)-approximation oracle does not achieve its approximation guarantee, it yields solutions whose d_μ(⋅) value equals the upper bound M. However, note that this happens with probability 1 − β. In our context, 1 − β is in the order of |V|^−c (cf. Appendix A.1), with c set to 1 in our experiments (cf. Sect. 5). This means that the pessimistic assumption arises in just a tiny minority |V|^−1 ⋅ T of the rounds.
Also, the comparison term still adopts the optimistic assumption that the true weights are known, while they are actually not for the CMAB method being evaluated in terms of Reg_{A,α,β}(T). As typically required in (C)MAB, the above regret is consistent with the definition of cumulative expected loss at hand (i.e., Eq. 9, in this case). Thus, minimizing that regret corresponds to solving CMAB-Min-CC (Problem 3). A key theoretical desideratum in CMAB (and online-learning settings in general) is having a regret bounded by some function that is sublinear in the number of rounds. This is motivated by the fact that the overall objective is typically a summation over the number of rounds; thus, a regret growing (at least) linearly in the number of rounds is considered a straightforward result that any algorithm can easily achieve.
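The (α, β)-approximation regret just described admits a simple numeric form: cumulative loss minus T times the comparison term β ⋅ (1/α) ⋅ d_μ(C*_I) + (1 − β) ⋅ M. The sketch below is our reconstruction from this verbal definition, not code from the paper.

```python
def approx_regret(losses, d_opt, M, alpha, beta):
    """(alpha, beta)-approximation regret for a minimization CMAB problem:
    cumulative (expected) loss minus T times the pessimistic comparison term."""
    T = len(losses)
    comparison = beta * (1.0 / alpha) * d_opt + (1.0 - beta) * M
    return sum(losses) - T * comparison

# With beta = 1 the benchmark is just (1/alpha) * d_opt per round,
# so here the comparison term is 2.0 and the regret is 4.7 - 3 * 2.0.
print(approx_regret([2.0, 1.5, 1.2], d_opt=1.0, M=10.0, alpha=0.5, beta=1.0))
```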
As shown in the next theorem, CC-CLCB achieves a regret bound that is logarithmic in the number of rounds:

Theorem 1 The regret Reg_{A,α,β}(T) of CC-CLCB, when equipped with a Min-CC-(α, β)-approximation oracle O (Definition 2), is upper-bounded by a function that is O(log T).
Proof (sketch) The proof relies on the following main result: the d(⋅) function (Eq. 8) satisfies the properties of monotonicity and 1-norm bounded smoothness. This triggers a (rather long and complex) chain of further results, along the lines of those derived in Wang and Chen (2017) for the regret analysis of algorithms for CMAB maximization problems. The last of these results attests the desired logarithmic regret bound. A more detailed proof is reported in Appendix A.3. ◻

Min-CC oracles requiring the probability constraint
The CC-CLCB algorithm makes no assumptions on the input graph or edge-weight distributions. Thus, to achieve regret guarantees, CC-CLCB needs a Min-CC oracle whose approximation guarantees hold in general, without restrictions on the input. As said above, algorithms of this kind exist in the context of Min-CC (Charikar et al. 2005; Demaine et al. 2006), but they suffer from issues such as limited efficiency and non-trivial implementation (they need to solve a linear program of size O(|V|³)), as well as a non-constant (O(log |V|)) approximation factor. A much better option would be to resort to the well-established Pivot algorithm (Ailon et al. 2008), which is efficient (it takes linear time), easy to implement (it just randomly picks a vertex u and builds a cluster composed of u and all vertices connected to u with an edge whose positive-type weight is no less than the negative-type one), and achieves a constant-factor approximation. Unfortunately, the (expected factor-5) guarantees of Pivot hold only if the input graph is complete and the edge weights satisfy the probability constraint, i.e., w⁺_uv + w⁻_uv = 1, ∀u, v ∈ V. For this reason, here we focus on the design of heuristic variants of CC-CLCB that favor the fulfilment of the probability constraint on the Min-CC instances to be processed by the oracle. The rationale is that the closer a Min-CC instance is to meeting the probability constraint, the closer Pivot is to its "theoretical comfort zone", and thus the better it is expected to perform.
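For concreteness, the following is a minimal Python sketch of Pivot as just described (the dictionary-based graph representation and the treatment of missing edges are our own assumptions; on the complete instances with the probability constraint, every pair carries both weights):

```python
import random

def pivot(vertices, w_plus, w_minus, rng=None):
    """One run of the randomized Pivot algorithm (Ailon et al. 2008).

    w_plus / w_minus map frozenset({u, v}) to the positive-/negative-type
    weight of edge (u, v). A pivot u is drawn uniformly at random among the
    still-unclustered vertices; its cluster collects every unclustered v
    whose positive-type weight towards u is no less than the negative one.
    """
    rng = rng or random.Random(0)
    unassigned = set(vertices)
    clustering = []
    while unassigned:
        u = rng.choice(sorted(unassigned))  # sorted() just for reproducibility
        cluster = {u}
        for v in unassigned - {u}:
            e = frozenset({u, v})
            # Missing edges are treated as "more negative than positive".
            if w_plus.get(e, 0.0) >= w_minus.get(e, 1.0):
                cluster.add(v)
        clustering.append(cluster)
        unassigned -= cluster
    return clustering
```

Each run takes time linear in the number of inspected edges, which is what makes Pivot attractive as a per-round CMAB oracle.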
The PC+Exp-CLCB algorithm. Our first proposal in this regard is the PC+Exp-CLCB algorithm (where "PC+Exp" stands for "probability constraint + exploration"). This algorithm, outlined as Algorithm 2, follows the same scheme as CC-CLCB, but it computes adjusted means {μ̃⁺_uv, μ̃⁻_uv}_{u,v∈V} so as to simultaneously favor some exploration and make the resulting Min-CC instance satisfy the probability constraint.
The Global-CLCB algorithm. As our second variant of CC-CLCB, we devise an algorithm, dubbed Global-CLCB, which at each round builds Min-CC instances that are as close as possible to meeting a global constraint on the edge weights similar to the one defined in Mandaglio et al. (2021). The fulfilment of this global constraint makes the probability-constraint-aware approximation guarantees still hold even if the probability constraint is locally violated. Global-CLCB mainly relies on the following result:

Theorem 2 If a Min-CC instance I satisfies the aforementioned global constraint, then any Min-CC algorithm (e.g., Pivot) achieving (expected) factor-α approximation in the presence of the probability constraint achieves (expected) factor-α approximation on I too.
Proof (sketch) The result here is a special case of the one originally proved in Theorem 1 of Mandaglio et al. (2021), specifically arising for the special case where their parameter max equals 1. Therefore, the proof is exactly the same as that of Theorem 1 in Mandaglio et al. (2021), with the only straightforward exception of replacing max with the constant 1. ◻

Global-CLCB attempts to compute adjusted means {μ̃⁺_uv, μ̃⁻_uv}_{u,v∈V} that come as close as possible to satisfying the condition of Theorem 2. Global-CLCB is the same as CC-CLCB, except for their respective Line 4. A detailed pseudocode of Global-CLCB is reported in Algorithm 3. We point out that CC-CLCB's regret analysis does not hold for PC+Exp-CLCB or Global-CLCB. Deriving theoretical regret guarantees for these (or similar) heuristics is a challenging open question that we defer to future work.

Special edge-weight distributions
An interesting special input is the one of symmetric edge-weight distributions, i.e., where, for every edge e ∈ E, the negative-type weight W⁻_e is distributed as 1 − W⁺_e. Conceptually, this is like assuming that, if a similarity equal to x holds for any two vertices, a (1 − x) distance implicitly holds for the same vertices as well.
CMAB-correlation-clustering instances where symmetry holds for all edge-weight distributions are easier to solve. In fact, symmetry in the distributions makes the instance at hand a full-information bandit setting: observing a sample x ∼ W⁺_e is equivalent to observing a sample (1 − x) ∼ W⁻_e, for all e ∈ E. This corresponds to having an outcome revealed for all the base arms, regardless of the super arm (clustering) played. In this case, therefore, exploration is meaningless. Rather, a full-exploitation strategy is worth performing, in which, at every round, a clustering is produced by considering solely the current mean estimates. This strategy achieves a regret bound that is constant in the number of rounds, as stated by Theorem 3:

Theorem 3 Under symmetric edge-weight distributions, the regret of the full-exploitation strategy is upper-bounded by a function that is O(1) in the number of rounds.

Proof (sketch) The full-information bandit setting allows for simplifying some intermediate steps in the regret analysis of the non-full-information setting (Theorem 1). These simplifications ultimately lead to a O(1) regret bound. A detailed proof and a pseudocode of the full-exploitation strategy are in Appendix A.4. ◻
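One round of this full-exploitation strategy can be sketched as follows (a sketch only; the actual pseudocode is in Appendix A.4, and the function and variable names here are ours, with `oracle` standing for any Min-CC solver such as Pivot):

```python
def full_exploitation_round(edges, counts, sums_plus, sample_plus, oracle):
    """One round of full exploitation under symmetric distributions.

    Symmetry makes this a full-information setting: a sample x ~ W+_e
    implies the sample (1 - x) ~ W-_e, so every base arm is updated at
    every round and no exploration term is added to the estimates.
    """
    # Cluster using the plain empirical means (pure exploitation).
    mu_plus = {e: (sums_plus[e] / counts[e]) if counts[e] else 0.5
               for e in edges}
    mu_minus = {e: 1.0 - mu_plus[e] for e in edges}
    clustering = oracle(mu_plus, mu_minus)
    # Full-information feedback: all edges are observed, whatever was played.
    for e in edges:
        x = sample_plus(e)  # draw x ~ W+_e; (1 - x) ~ W-_e is implied
        sums_plus[e] += x
        counts[e] += 1
    return clustering
```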

Visualization example
Figure 1 provides a visualization of the HighlandTribes and Contiguous-USA graphs, used as a case in point (cf. Sect. 5), and their CMAB-Min-CC clusterings produced by ε-greedy and CC-CLCB, respectively, using the same oracle in both cases. In particular, we show the outcomes obtained using the correlation-clustering linear-programming method in Charikar et al. (2005) as an oracle. Our goal here is to provide empirical evidence of the significance of the CMAB-Min-CC setting and the effectiveness of the CMAB-Min-CC methods. To this purpose, we show three snapshots of execution on each graph, namely at the initial, middle and final round of a method. Besides visualizing the cluster memberships of vertices (note that vertices of one cluster share the same color at any round, but color memberships may change at different rounds), we also use black edges and red edges to distinguish between edge-level agreements and disagreements, respectively, which denote whether or not the inequality between the positive-type weight and the negative-type weight of an edge holds the same on the true weights and their estimates; formally, an edge (u, v) is colored black if the ordering between w⁺_uv and w⁻_uv is the same under the true weights and under their current estimates, otherwise it is colored red.
Two major remarks stand out by looking at the plots for each graph. First, the similarity, measured in terms of normalized mutual information (NMI), between the CMAB-Min-CC solution produced by the oracle over the graph with mean estimates and the corresponding solution over the graph with true means significantly improves as more rounds are carried out; in particular, as shown for HighlandTribes (plots (a–c)), already after a few early rounds NMI approaches the maximum value reached at the final round. Second, the number of edge-level agreements also rapidly increases after a few rounds, until only a few disagreements are left at the final round.

Experimental methodology
Data. We consider ten publicly-available real-world graphs, as summarized in Table 2. Each of the five networks from the bottom corresponds to the flattening of a network originally represented as a set of snapshot-graphs (Galimberti et al. 2020), i.e., an edge between u and v exists in the flattened network if u and v were linked in at least one snapshot.

Fig. 1 Example CMAB-Min-CC solutions obtained by ε-greedy on HighlandTribes, and by CC-CLCB on Contiguous-USA, over T = 200 rounds, using the linear-programming method in Charikar et al. (2005) as an oracle. Vertex colors correspond to cluster memberships, while edges are colored black, resp. red, if there is an edge-level agreement, resp. disagreement, between the true edge weights and their estimated values; by edge-level agreement, we mean that the inequality between the positive-type weight and the negative-type weight of an edge holds the same on the true weights and their estimates. Values within parentheses refer to NMI (with arithmetic-mean normalization) between the CMAB-Min-CC solution by the oracle over the graph with mean estimates and the corresponding solution over the graph with true means.
Edge weight distributions. The random variables W⁺_e, W⁻_e modeling the positive-type and negative-type edge weights in a Min-CC instance are assumed to follow a Bernoulli distribution, whose means are generated according to three schemes.
In the first two schemes, termed R-wd and PC-wd, the original (possibly incomplete) network topology of the underlying input graph is maintained. As for the means μ⁺_e, μ⁻_e for each e ∈ E, R-wd samples uniformly at random both μ⁺_e and μ⁻_e from the [0, 1] interval, independently of one another, i.e., μ⁺_e, μ⁻_e ∼ Uniform(0, 1), for all e ∈ E. On the other hand, PC-wd ensures that the probability constraint holds on the generated means, which corresponds to first sampling μ⁺_e ∼ Uniform(0, 1), and then setting μ⁻_e = 1 − μ⁺_e, for all e ∈ E. As a result, for both R-wd and PC-wd, μ⁺_e, μ⁻_e ∈ [0, 1], while the samples observed from the edge-weight distributions at each CMAB round are in {0, 1}, for all e ∈ E. In particular, samples W⁺_e = 1 and W⁺_e = 0 (resp. W⁻_e = 1 and W⁻_e = 0) are observed with probability μ⁺_e and 1 − μ⁺_e (resp. μ⁻_e and 1 − μ⁻_e), respectively. The third scheme assumes the actual network topology imposes a binary, mutually exclusive setting for each pair of vertices, i.e., μ⁺_uv = 1 and μ⁻_uv = 0 if (u, v) ∈ E, whereas μ⁺_uv = 0 and μ⁻_uv = 1 otherwise. Since this setting leads to a new, complete graph, the scheme is referred to as C-wd, and it will be considered only for the smaller datasets, as it is computationally unfeasible to handle complete versions of the larger datasets. As μ⁺_uv, μ⁻_uv ∈ {0, 1}, for all u, v ∈ V, the underlying W⁺_uv, W⁻_uv distributions are actually degenerate, and every sample observed in a CMAB round from W⁺_uv (resp. W⁻_uv) will be equal to 1 if μ⁺_uv = 1 (resp. μ⁻_uv = 1), and 0 if μ⁺_uv = 0 (resp. μ⁻_uv = 0).

Assessment criteria. The means μ⁺_e, μ⁻_e generated via the above schemes correspond to the true correlation-clustering edge weights that are unknown to any CMAB method. These are used to evaluate the quality of the clusterings yielded in the various CMAB rounds via the average expected normalized cumulative Min-CC loss, calculated until each round t:

and the E[⋅] expectation is computed by averaging the d(⋅) values obtained over all the runs of execution of the randomized Min-CC oracles (see below). Note that Eq. (12) is a shorter, normalized version of Eq. (11). In fact, Eq. (12) is limited to the first term of Eq. (11) only, as the second term is common to all the methods (under the same oracle). It is also normalized, so as to make the results on graphs of different size more easily comparable to each other.
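Stepping back to the edge-weight generation, the three schemes can be sketched as follows (the (μ⁺, μ⁻) assignment for C-wd reflects the binary, mutually exclusive setting described above; the function name is ours):

```python
import random

def gen_means(edges, vertices, scheme, rng=None):
    """Generate the Bernoulli means (mu+_e, mu-_e) under R-wd, PC-wd, C-wd."""
    rng = rng or random.Random(42)
    if scheme == "R-wd":      # independent Uniform(0, 1) means
        return {e: (rng.random(), rng.random()) for e in edges}
    if scheme == "PC-wd":     # probability constraint: mu- = 1 - mu+
        means = {}
        for e in edges:
            p = rng.random()
            means[e] = (p, 1.0 - p)
        return means
    if scheme == "C-wd":      # complete graph, degenerate distributions
        vs = sorted(vertices)
        return {frozenset({u, v}):
                (1.0, 0.0) if frozenset({u, v}) in edges else (0.0, 1.0)
                for i, u in enumerate(vs) for v in vs[i + 1:]}
    raise ValueError(f"unknown scheme: {scheme}")
```

Under R-wd and PC-wd, each per-round sample is then drawn as a Bernoulli trial with the corresponding mean; under C-wd the samples are deterministic.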
As a second assessment criterion, we consider the error of the weight estimates μ̂⁺_{e,t}, μ̂⁻_{e,t} at each round t, which is measured in terms of relative error norm, i.e., the norm of the difference between the estimated and the true means divided by the norm of the true means:

ren(t) = ‖μ̂_t − μ‖ / ‖μ‖,  (13)

where μ̂_t and μ denote the vectors stacking all the estimated and true means, respectively. For both f(t) and ren(t), lower values correspond to better performance. Our main focus is on the f(T), ren(T) values at the final round t = T, as they give compact yet general evidence of the overall performance of a method. However, we also analyze the trend of ren(t) over the various rounds to assess statistical significance (Sect. 6.5), and report evidence of such trends (Sect. 7). Within this view, in the result tables presented in Sect. 6, we report f(T) and ren(T) values. Moreover, for the CMAB methods only, we also provide the growth rates, i.e., the average amount of relative change between the initial and the final round over the span T (in percentage):

gr%(T) = (100 / T) · (f(T) − f(1)) / f(1),  (14)

and analogously for ren. Finally, we are also interested in assessing the running time of the various tested methods (Sect. 6.4).
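Assuming the norm in question is the Euclidean one (an assumption on our part; the dictionary layout is also ours), the relative error norm can be computed as:

```python
import math

def ren(est_plus, est_minus, true_plus, true_minus):
    """Relative error norm of the weight estimates at a given round:
    ||mu_hat - mu|| / ||mu||, stacking both weight types into one vector.
    (The choice of the 2-norm is an assumption.)"""
    num = den = 0.0
    for e in true_plus:
        num += (est_plus[e] - true_plus[e]) ** 2
        num += (est_minus[e] - true_minus[e]) ** 2
        den += true_plus[e] ** 2 + true_minus[e] ** 2
    return math.sqrt(num) / math.sqrt(den)
```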
Methods. We involve methods falling into four approaches: (i) CMAB-Min-CC methods adopting the CLCB paradigm, (ii) classic general CMAB heuristics that, in this context, are customized to work for CMAB-Min-CC, (iii) baselines that do not follow the CMAB paradigm, and (iv) a reference method that performs clustering by utilizing the true edge weights. More specifically:

(i) As CLCB-based methods, we include CC-CLCB (Algorithm 1), PC+Exp-CLCB (Algorithm 2), and Global-CLCB (Algorithm 3). Moreover, for both CC-CLCB and Global-CLCB, we also consider their CC-CLCB-m and Global-CLCB-m variants, which are less biased towards exploration. Specifically, following Wang and Chen (2018), CC-CLCB-m and Global-CLCB-m utilize uncertainty terms defined as ρ±_e = √(ln t / (2 T±_e)) (instead of ρ±_e = √(3 ln t / (2 T±_e))), where T±_e denotes the number of observations of the corresponding arm.

(ii) As CMAB heuristics, we involve the well-established ε-greedy, pure exploitation (PE), and Combinatorial Thompson Sampling (CTS) (Wang and Chen 2018). As for ε-greedy, we consider both a fixed exploration rate, set to 0.1, and an adaptive exploration rate, set to be proportional to t⁻¹ at each round t. These variants are dubbed EG-fixed and EG, respectively.

(iii) As for the non-CMAB baselines, the idea is to set both types of unknown edge weights based on the topological affinity of any two vertices' neighborhoods, and then run a Min-CC algorithm [specifically, Pivot (Ailon et al. 2008) in most experiments, and the linear-programming approach dubbed LP+R (Charikar et al. 2005) in the experiment in Sect. 6.3] on such an input, employing no weight-learning strategy. More precisely, we resort to two well-known topological similarity measures, namely the Jaccard index and the Adamic-Adar index, to set the positive-type weights, i.e., w⁺_uv = |N(u) ∩ N(v)| / |N(u) ∪ N(v)| for Jaccard, and a weight proportional to Σ_{z ∈ N(u) ∩ N(v)} 1 / log |N(z)| for Adamic-Adar, where N(u) is the set of u's neighbors. The negative-type weights are then derived in such a way that the probability constraint holds, i.e., w⁻_uv = 1 − w⁺_uv.
(iv) As a reference method, we consider clustering with the actual (i.e., true) edge weights via a state-of-the-art Min-CC algorithm (i.e., Pivot (Ailon et al. 2008) in most experiments, and LP+R (Charikar et al. 2005) in the experiment in Sect. 6.3). This method is termed Actual-weight.
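The two baseline weighting schemes in (iii) can be sketched as follows (the rescaling of the Adamic-Adar scores by their maximum, so that they fit in [0, 1], is our own assumption, since the index itself is unbounded):

```python
import math
from itertools import combinations

def jaccard_weights(adj):
    """w+_uv = |N(u) & N(v)| / |N(u) | N(v)|, and w-_uv = 1 - w+_uv so that
    the probability constraint holds. adj: vertex -> set of neighbors."""
    w_plus = {}
    for u, v in combinations(sorted(adj), 2):
        union = adj[u] | adj[v]
        w_plus[frozenset({u, v})] = (len(adj[u] & adj[v]) / len(union)
                                     if union else 0.0)
    return w_plus, {e: 1.0 - w for e, w in w_plus.items()}

def adamic_adar_weights(adj):
    """Adamic-Adar scores sum 1/log|N(z)| over common neighbors z; here they
    are rescaled by the maximum score to fit [0, 1] (our assumption)."""
    score = {frozenset({u, v}):
             sum(1.0 / math.log(len(adj[z]))
                 for z in adj[u] & adj[v] if len(adj[z]) > 1)
             for u, v in combinations(sorted(adj), 2)}
    top = max(score.values(), default=0.0)
    w_plus = {e: (s / top if top else 0.0) for e, s in score.items()}
    return w_plus, {e: 1.0 - w for e, w in w_plus.items()}
```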
Unless otherwise specified, all the CMAB methods are assumed to be equipped with the Pivot algorithm (Ailon et al. 2008) as an oracle for Min-CC. Pivot is used as the reference oracle because it is the most usable in practice, due to its efficiency, approximation guarantees, and ease of implementation. However, we also carry out an experiment to evaluate the impact of a different oracle, specifically the LP+R algorithm (Charikar et al. 2005) (Sect. 6.3). As LP+R takes O(|V|³) time just to build the linear program, this experiment is performed on the smaller datasets only.
Since the chosen Min-CC oracles are randomized algorithms, for every experiment we perform log₂|V| independent runs of the selected oracle per CMAB round (setting ε = 1 and c = 1, cf. Appendix A.1), and take the best solution in terms of the Min-CC objective with respect to the current weight estimates. In all the experiments, the number T of CMAB rounds is set to 500, while the number of runs of the Min-CC oracle for every round is set to 10.
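The best-of-multiple-runs selection per round can be sketched as follows (helper names are ours):

```python
def min_cc_loss(clustering, w_plus, w_minus):
    """Min-CC objective: intra-cluster negative-type weights plus
    inter-cluster positive-type weights (under the given weights)."""
    label = {v: i for i, cl in enumerate(clustering) for v in cl}
    loss = 0.0
    for e, wp in w_plus.items():
        u, v = tuple(e)
        loss += w_minus[e] if label[u] == label[v] else wp
    return loss

def best_of_k(run_oracle, w_plus, w_minus, k):
    """Run the randomized oracle k times and keep the solution with the
    lowest Min-CC loss w.r.t. the *current* weight estimates."""
    return min((run_oracle() for _ in range(k)),
               key=lambda c: min_cc_loss(c, w_plus, w_minus))
```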
Evaluation goals. As this is the first work that investigates Min-CC in a CMAB setting, our experiments are not really intended to assess the superiority of some proposed method(s) over the state of the art. Rather, our main objective is to provide a comparative evaluation of a variety of CMAB heuristics, approximation algorithms, and heuristic variants of approximation algorithms in the context of CMAB-Min-CC, and to derive experimental insights on the peculiarities of the various tested methods. Specifically, the main goals of our experimental evaluation are as follows:
• Assess the performance of the CMAB methods (CC-CLCB, EG, EG-fixed, PE, CTS) in terms of f(T) and ren(T), and compare them to the non-CMAB baselines (Adamic-Adar, Jaccard) and the reference Actual-weight method (Sect. 6.1).
• Compare the performance of the various CLCB variants (CC-CLCB, CC-CLCB-m, PC+Exp-CLCB, Global-CLCB, Global-CLCB-m) to each other, in terms of f(T) and ren(T) (Sect. 6.2).
• Evaluate the impact of varying the Min-CC oracle on the performance of the various CMAB methods (Sect. 6.3).
• Evaluate the efficiency of all the selected methods (Sect. 6.4).
• Perform a statistical-significance analysis of the reported results (Sect. 6.5).
Further characterization. We also analyze the number of output clusters and the stability of the performance over the various rounds and runs of the tested methods (Sect. 7). This analysis is intended not so much as a performance assessment, but rather as an additional useful insight to better characterize the tested methods.
Implementation and testing environment. All the tested methods are implemented in Python 3.8, with some of them using external libraries. In particular, LP+R adopts the PuLP library for linear programming,² while Adamic-Adar and Jaccard use the NetworkX and python-igraph libraries, respectively, to compute the topological similarity scores.³ All the experiments are carried out on the Cresco6 cluster,⁴ a high-performance computing system running Linux CentOS 7.4 and consisting of 434 nodes, each equipped with two Intel(R) Xeon(R) Platinum 8160 CPU @2.10 GHz 24-core processors and 192 GB RAM.

Quality of the clusterings

(Table 3). As a first general remark, the non-CMAB baselines (Adamic-Adar, Jaccard) achieve the worst performance in all the datasets and weight settings, while Actual-weight is always the best method, with only a couple of exceptions. This was expected, as the non-CMAB baselines employ no strategy to learn the true weights, whereas Actual-weight operates on the true weights. Importantly, in most cases, the CMAB methods (CC-CLCB, EG, EG-fixed, PE, CTS) perform comparably or close to Actual-weight. The loss values of all the CMAB methods follow a decreasing trend over the rounds, as testified by the negative growth rates (and better shown in Fig. 2, Sect. 7). This was expected, since the CMAB algorithms learn how to cluster the vertices over time. In general, all the CMAB algorithms converge to solutions with lower growth rate in the PC-wd setting than in the R-wd setting. Also, with the exception of HighlandTribes, the difference of the best loss scores is higher in the PC-wd setting than in the R-wd one. This complies with the fact that the probability constraint leads to an easier Min-CC clustering task.

Focusing on the CMAB methods, the best performance corresponds to PE. This can be explained since (i) the Min-CC oracle used therein (i.e., Pivot) is a randomized algorithm, and thus, although paired with a pure-exploitation bandit strategy, it results in some implicit exploration; and (ii) due to the peculiarity of our problem, each super arm admits feedback from half the total number of arms, so a bandit strategy with minimal exploration is likely to perform better in the long run. CC-CLCB exhibits very good performance: it is comparable or close to the best methods in most datasets and weight settings, achieving a maximum and average difference in loss with respect to the best performer(s), over all the configurations, of 0.038 and 0.014, respectively.
Quality of the learned edge weights (Table 4). A first general observation is that the weight estimates of all the CMAB methods improve as the rounds progress, and the relative error goes down over time, leading to a negative growth rate. This is consistent with the clustering improvement over increasing rounds observed in Table 3. As expected, the non-CMAB baselines yield the highest error values, while Actual-weight clearly achieves zero error everywhere. Among the CMAB methods, EG and EG-fixed yield the most accurate estimates in the C-wd weight setting. In the R-wd and PC-wd settings, EG-fixed is (comparable to) the best performer on the smaller datasets (Karate, Dolphins, Zebra, HighlandTribes, Contiguous-USA), while on the bigger datasets CTS is (comparable to) the best method. CC-CLCB achieves the best performance on three datasets (Zebra, Last.fm, PrimarySchool) for the R-wd and PC-wd distributions. Importantly, for some methods, good/bad performance on the weight-estimation task does not necessarily translate into equally good/bad performance in the clustering results discussed above. This has a twofold motivation: (1) the underlying oracle is not an exact algorithm for Min-CC, thus it may happen that clustering with weight estimates leads to clusterings that are of better quality when evaluated in terms of the actual weights; and (2) CMAB methods like CC-CLCB adopt exploration strategies that perturb the current weight estimates before giving them to the oracle, which corresponds to performing clustering with weights that are actually different from the estimated ones.

Table 3 Performance in terms of f(T) (Eq. 12) and (for the CMAB methods) gr%_f (Eq. 14). All the CMAB methods are equipped with Pivot as a Min-CC oracle. Per-dataset best scores among the CMAB methods are reported in bold.

Comparison among the CLCB variants (Table 5). In general, we observe that all the CC-CLCB variants perform rather closely to each other in all the configurations. Deepening the analysis, CC-CLCB-m and Global-CLCB-m are the best methods (in all the datasets but one) in the R-wd weight setting. Conversely, in the PC-wd and C-wd settings, the best method is PC+Exp-CLCB in most cases: specifically, it is the best performer on all the datasets (though on par with CC-CLCB-m and Global-CLCB-m on the larger ones) in PC-wd, and on three out of five datasets in C-wd. Also, Global-CLCB performs better in the PC-wd and C-wd settings than in the R-wd one. These findings comply with the design principles of the CC-CLCB variants that favor the fulfilment of the probability constraint on the Min-CC instances to be processed by the underlying oracle (cf. Sect. 4.2), which clearly benefit from settings like PC-wd and C-wd where the probability constraint actually holds.

Quality of the clusterings
Interestingly, CC-CLCB and Global-CLCB achieve the same results in all the configurations (and the same holds for CC-CLCB-m vs. Global-CLCB-m). This can be explained as follows: CC-CLCB and Global-CLCB (and, likewise, CC-CLCB-m and Global-CLCB-m) compute adjusted weight estimates (Line 4 in Algorithms 1 and 3) such that the ordering between the positive-type weight estimate and the negative-type weight estimate is likely to be the same for both algorithms. In other words, although CC-CLCB and Global-CLCB may compute different actual values of those weight estimates, the two algorithms are mostly consistent in yielding a positive-type weight estimate that is higher/lower than the negative-type one. This leads to very similar clusterings being yielded by the Pivot Min-CC oracle in every run and every round of both CC-CLCB and Global-CLCB, as Pivot places any two vertices in the same cluster by solely checking whether the positive-type weight on the edge between those vertices is no less than the negative-type one, without looking at the specific values of those weights.
Quality of the learned edge weights (Table 6). In terms of edge weights, the picture in the PC-wd and C-wd settings is roughly consistent with what was observed in terms of clustering quality. Some differences arise in the R-wd setting, where, unlike with the clustering-quality criterion, CC-CLCB-m and Global-CLCB-m are the best methods only in a few configurations (they are mostly outperformed by PC+Exp-CLCB).
As another interesting observation, here there are some differences between CC-CLCB and Global-CLCB, and between CC-CLCB-m and Global-CLCB-m. This confirms the argument discussed above, i.e., that those methods achieve the same clustering results even though they may learn different weight estimates.

Table 7 shows the performance of all the competing methods when using LP+R as a Min-CC oracle (instead of Pivot). Here, we also show the relative difference (in percentage) between the score with LP+R and the corresponding score with Pivot. Thus, the more positive (resp. negative) such a relative difference, the worse (resp. better) the performance of using LP+R rather than Pivot. The general trend in terms of clustering quality (Table 7a) is that LP+R leads to an increase (resp. decrease) in performance in the R-wd and PC-wd settings (resp. C-wd setting). This is likely due to the fact that R-wd and PC-wd are more challenging than C-wd, as it is well-known that Min-CC is easier on complete-graph input instances (Charikar et al. 2005). In fact, LP+R provides approximation guarantees at each round, regardless of the weights given in input to the oracle. Conversely, Pivot provides quality guarantees only if the probability constraint holds on the given input, which is not necessarily the case in a generic round t. The C-wd setting (i.e., complete graph with probability constraint) corresponds to the most favorable scenario for Pivot to provide approximation guarantees.

Varying the Min-CC oracle
In terms of learned edge weights (Table 7b), the advantage of using LP+R is less evident. A reason might lie in the different random choices of the two algorithms (i.e., choosing the vertex around which a cluster is built, in Pivot, and rounding the fractional solution, in LP+R): the random choices of Pivot likely lead to more exploration, and hence a better chance to discover weights close to the true ones.

Table 8 shows the runtimes of the tested methods on the larger datasets, averaged over the various runs and over the R-wd and PC-wd weight settings. Although all the CMAB methods are roughly comparable with each other, CTS is the slowest method, as it involves additional sampling operations with respect to the other ones.

Efficiency
The CMAB methods take seconds on the smaller Last.fm and PrimarySchool datasets, around one hour on ProsperLoans, and up to 3–5 h on the largest datasets, i.e., Wikipedia and DBLP. In general, however, we can conclude that all the CMAB methods are rather efficient. Even the highest runtimes, on Wikipedia and DBLP, are not a concern, considering that such datasets have around 10 M edges and, more importantly, that the reported runtimes are cumulative over all the 500 CMAB rounds. In fact, the highest per-round runtime of a CMAB method is always, at worst, comparable to the runtime of Actual-weight, which performs Min-CC clustering just once. In most cases, it is even lower, likely because the time of the round-independent steps is amortized over the various rounds.
Further results are shown in Table 9, which includes the use of both oracles on the smallest datasets in our collection, according to all weight settings. As can be noted from the table, the above qualitative remarks on the relative differences between the methods remain equally evident.

Statistical significance
Here we present a further step of analysis to assess the statistical significance of the performance of the CMAB-Min-CC methods CC-CLCB, Global-CLCB, PC+Exp-CLCB, CTS, and EG, when equipped with Pivot as a Min-CC oracle.
To this purpose, we resorted to Friedman's test. We designed it by considering all the methods, all the datasets, and all the weight settings in one single test. More specifically, we organized the data into a matrix with 5 columns (treatments) corresponding to the methods, and 250 rows (blocks) corresponding to the combinations of runs (10), datasets (10), and weight settings (R-wd and PC-wd available for all 10 datasets, and C-wd available for 5 of them), where each cell measures the average expected normalized cumulative loss (i.e., f(T), Eq. 12) obtained by a particular method, at the last round (T = 500), on a particular configuration of run, dataset, and weight setting. (Note that each run corresponds to a different fixed seed for handling randomness in the computation.) Our Friedman's test results indicate that there are significant differences (χ²(4) = 158.1, p-value < 2.2E−16) in the average expected normalized cumulative losses across run/dataset/weight-setting blocks depending on the methods, i.e., the methods have a different effect on the average expected normalized cumulative loss obtained on each run/dataset/weight-setting combination.
We also computed Kendall's coefficient of concordance (Kendall's W) to measure the effect size (degree of difference) for Friedman's test. From the result above, Kendall's W is 0.304, which indicates an effect size at the boundary between the "small" and the "moderate" effects according to Cohen's interpretation guidelines (Tomczak and Tomczak 2014).
Since Friedman's test is an omnibus test, in order to know which methods are significantly different, we carried out Nemenyi's all-pairs test as a post-hoc test for pairwise comparisons of methods, where the Bonferroni correction was used to adjust the p-values for multiple hypothesis testing at a 5% cut-off. Results show p-values in the range (10⁻¹⁴, 10⁻⁴) (i.e., significant differences) for all pairs but CC-CLCB vs. Global-CLCB. The lack of statistical difference between CC-CLCB and Global-CLCB is not surprising: in fact, in Sect. 6.2, we already noticed and explained why CC-CLCB and Global-CLCB achieve the same performance in terms of f(T) in all the datasets and weight settings.
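The shape of this test can be reproduced in outline with SciPy on a synthetic stand-in for the score matrix (the data below are fabricated purely for illustration, not the paper's results):

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
# Synthetic 250 x 5 matrix: rows are run/dataset/weight-setting blocks,
# columns are the five methods; per-column offsets create real differences.
scores = rng.random((250, 1)) * 0.01 + np.linspace(0.0, 0.05, 5)
scores = scores + rng.random((250, 5)) * 0.01

# Friedman's test over the 5 treatment columns.
stat, pval = friedmanchisquare(*(scores[:, j] for j in range(5)))
# Kendall's W (effect size) from the Friedman statistic: W = chi2 / (n(k-1)).
kendall_w = stat / (scores.shape[0] * (scores.shape[1] - 1))
print(f"chi2(4) = {stat:.1f}, p = {pval:.2g}, W = {kendall_w:.3f}")
```

A post-hoc all-pairs comparison (e.g., Nemenyi's test, available in the scikit-posthocs package) would then be run on the same matrix.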

Additional experiments
Clustering size. Tables 10, 11 and 12 show the number of clusters yielded by the tested methods, averaged over all the CMAB rounds and runs of the Min-CC oracle. For the CMAB methods, we also provide the difference (in percentage) between the number of clusters at the final round and the number of clusters at the first round (both averaged over the runs of the Min-CC oracle). By inspecting these results, we notice that, in the cases of the R-wd and PC-wd weight distributions, the use of the LP+R oracle generally corresponds to fewer clusters compared to the Pivot oracle; some exceptions are observed for small-world datasets (e.g., Zebra, HighlandTribes) by most methods, especially with R-wd. The PC-wd setting mostly leads to fewer clusters than R-wd. Conversely, in the C-wd setting, the LP+R oracle consistently leads to a much larger number of clusters (at least double in many cases) than Pivot.
Moreover, we observe that the non-CMAB methods (i.e., Adamic-Adar and Jaccard) produce a relatively small number of clusters as long as the characteristics of the input dataset are those typical of a small-world network; for instance, in Last. fm and PrimarySchool, the clustering size is about 90% and 97% of the vertex set size, respectively. This is not surprising, as the adopted approaches of (CMAB) correlation clustering are not designed to optimize some criterion function defined on topological properties at meso-and macroscopic level (e.g., modularity), which results in a need for refining the clustering solutions through a cluster aggregation stage. Performance over the CMAB rounds. Figures 2 and 3 illustrate the performancein terms of average expected normalized cumulative Min-CC loss f (t) (Eq. 12)-of the tested methods over the various CMAB rounds t. As expected, the CMAB methods mostly exhibit a decreasing trend, with a decrease in loss scores that is more consistent in the first rounds, until it gets progressively vanishing as the rounds go on, meaning convergence in the weight learning process (and, thus, in the clustering quality too). A few exceptions to this strictly monotonically decreasing trend arise (e.g., with some CLCB-based methods in Last.fm PC-wd , ProsperLoans R-wd , ProsperLoans PC-wd , DBLP PC-wd ). However, the minimum of the f (t) function in all those exceptional cases is only slightly less than the value of f (t) at convergence (i.e., the difference is less than 0.004). Thus, remembering also that f (t) is an average of all the losses computed up to round t, we can conclude that those non-monotonic trends actually correspond to the normal fluctuations of the loss values in the first CMAB rounds, when there cannot be enough knowledge on the actual edge weights to get stable clustering quality.  Stability over the CMAB rounds. 
Tables 13, 14, 15, 16, 17 and 18 show the coefficient of variation (i.e., the ratio between standard deviation and mean) of the scores of the tested methods in terms of the f(T) (Eq. 12) and ren(T) (Eq. 13) criteria, respectively. It can be observed that the coefficients of variation of f(T) are typically very small for all the methods: they mostly lie in the range [10⁻³, 10⁻²] on the smaller datasets, and in [10⁻⁶, 10⁻⁴] on the larger datasets, with only very few exceptions. In terms of ren(T), the coefficients of variation in the R-wd and PC-wd weight settings are higher (especially on the smaller datasets), but they still remain rather small. In the C-wd setting, they are instead mostly equal to or very close to zero. Therefore, as a general conclusion, we can state that the various tested methods exhibit high stability over the CMAB rounds and runs of the Min-CC oracle.
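As a concrete illustration of the stability metric used above, the following sketch computes the coefficient of variation of a method's scores over multiple runs; the function name and the toy scores are hypothetical, not taken from the paper.

```python
import statistics

def coefficient_of_variation(scores):
    """Ratio between standard deviation and mean, as used for the
    stability analysis (values close to 0 indicate high stability)."""
    mean = statistics.fmean(scores)
    if mean == 0:
        return 0.0  # degenerate case: constant-zero scores
    return statistics.stdev(scores) / mean

# Hypothetical f(T) scores of one method over 5 runs of the Min-CC oracle
runs = [0.412, 0.409, 0.411, 0.410, 0.413]
cv = coefficient_of_variation(runs)
assert 1e-3 < cv < 1e-2  # within the typical range reported for smaller datasets
```

Identical scores across runs give a coefficient of exactly zero, matching the behavior observed for ren(T) in the C-wd setting.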

Conclusion
We have focused on the novel setting of correlation clustering where edge weights are unknown, and they need to be discovered while performing multiple rounds of clustering. We have provided a Combinatorial Multi-Armed Bandit (CMAB) framework for this setting. In the future, we plan to investigate the theoretical properties of our heuristics, more advanced CMAB settings, and clustering problems other than correlation clustering.

Fig. 3
Performance in terms of f(t) (Eq. 12), over a number t = 1, …, 400 of rounds (iterations), for the larger datasets, and the R-wd and PC-wd weight distributions

A.1: Approximation algorithms as oracles
Here we discuss how approximation algorithms for correlation clustering can be employed as oracles within CMAB correlation-clustering algorithms (where the approximation factor α need not be a constant; for instance, in the algorithms in Charikar et al. (2005) and Demaine et al. (2006), α is O(log |V|)). We focus on the Min-CC context, thus on the notion of Min-CC-(α, β)-approximation oracle (Definition 2). A similar reasoning holds for Max-CC as well.
If α > 1 is the expected approximation factor of a (randomized) Min-CC algorithm, then, for every Min-CC instance I, the expected Min-CC cost of the clustering C output by the algorithm is at most α times the cost of the optimal clustering C*_I for I. To have a Min-CC-(α, β)-approximation oracle, we need to convert these guarantees, which hold in expectation but not necessarily in every run, into guarantees that hold in every run with probability at least β (with β typically in the order of 1 − poly(|V|)⁻¹).
More specifically, it is well known that an algorithm with guarantees holding in expectation can be converted into one with α(1 + ε) guarantees holding with high probability, for any ε > 0, by exploiting Markov's inequality (Gupta 2005). In particular, running the algorithm k = c · log_{1+ε}(|V|) times (for any c > 0) and keeping the best output among all of those trials yields a clustering with α(1 + ε) quality guarantee with probability at least 1 − |V|⁻ᶜ. The aforementioned procedure corresponds to a Min-CC-(α(ε), β(c))-approximation oracle, where α(ε) = α(1 + ε) and β(c) = 1 − |V|⁻ᶜ. Note that, since ε is arbitrary, there exist various Min-CC-(α(ε), β(c))-approximation oracles, each one corresponding to specific values of ε and c. Also, for a fixed number k of trials, the worse the required approximation guarantee (i.e., the higher ε, and thus the higher α(ε)), the higher the success probability β(c) of the oracle (due to the relationship k = c · log_{1+ε}(|V|) between ε and c). By suitably choosing ε > 0 and c > 0 (so as to define the number k = c · log_{1+ε}(|V|) of trials), one can get the desired α(ε) and β(c) values.
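The best-of-k amplification procedure described above can be sketched as follows; `min_cc` and `cost` are hypothetical placeholders for a randomized Min-CC algorithm with guarantees in expectation and for the Min-CC objective, respectively.

```python
import math

def amplified_oracle(min_cc, cost, instance, n_vertices, eps=0.5, c=2.0, seed=0):
    """Run a randomized Min-CC algorithm (with approximation factor alpha
    in expectation) k = ceil(c * log_{1+eps}(|V|)) times and keep the
    cheapest clustering. By Markov's inequality, the result is within
    alpha * (1 + eps) of the optimum with probability >= 1 - |V|^(-c)."""
    k = max(1, math.ceil(c * math.log(n_vertices, 1 + eps)))
    best, best_cost = None, float("inf")
    for i in range(k):
        clustering = min_cc(instance, seed=seed + i)  # independent trials
        trial_cost = cost(instance, clustering)
        if trial_cost < best_cost:
            best, best_cost = clustering, trial_cost
    return best
```

Larger eps reduces the number of trials k but loosens the approximation factor α(ε), while larger c increases k and boosts the success probability β(c), mirroring the trade-off discussed above.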

A.2: Algorithms for CMAB-Max-CC
In this section, we present algorithms for CMAB-Max-CC (Problem 4). When the basic CMAB framework of Chen et al. (2016) is contextualized to a specific maximization problem, a major algorithmic contribution typically consists in adapting the so-called Combinatorial Upper Confidence Bound (CUCB) method to the context at hand, and showing how its theoretical guarantees are maintained or change under such an adaptation (Chen et al. 2018b; Liu et al. 2021; Mandaglio and Tagarelli 2019a, b; Talebi et al. 2017; Vaswani and Lakshmanan 2015; Wu et al. 2019). Here we follow this body of literature: we focus on the customization of CUCB to the Max-CC context, and show that the theoretical guarantees of CUCB carry over rather easily to this customization.

The CC-CUCB algorithm. CUCB (Chen et al. 2016) is an extension of the UCB1 method for MAB (Auer et al. 2002). It keeps, along with the estimates of the means of the base-arm random variables, confidence intervals within which the true means fall with overwhelming probability, and it plays superarms based on the upper bounds of those intervals. Our customization of CUCB to Max-CC is termed CC-CUCB and is outlined as Algorithm 4. CC-CUCB keeps track of the mean estimates μ̂ = {μ̂+_e, μ̂−_e} (Eq. 4), and of the number T+_e (resp. T−_e) of times a sample from the W+_e (resp. W−_e) random variable has been observed until the current round, for all e ∈ E. At the beginning, T+_e = T−_e = 0 for all e ∈ E, and the μ̂ estimates are initialized, e.g., randomly or based on prior domain knowledge (Line 1). In every round t, the current mean estimates are adjusted with an additive term (defined based on Chernoff-Hoeffding bounds (Auer et al. 2002; Chen et al. 2016)), so as to foster, to some extent, the exploration of less often played base arms (Line 3).
This leads to the adjusted means {μ̃+_e, μ̃−_e}, e ∈ E (Line 4), which are interpreted as positive-type and negative-type edge weights of a correlation-clustering instance, respectively, and are fed as input (along with G) to an oracle O that computes a Max-CC solution C_t (Line 5). C_t is used as feedback to update the mean estimates (Sect. 3, Table 1). Specifically, the weight of each intra-cluster (resp. inter-cluster) edge e is interpreted as a sample of W+_e (resp. W−_e), and is used to update μ̂+_e and T+_e (resp. μ̂−_e and T−_e). μ̂+_e and μ̂−_e are updated so as to equal the average of the samples from W+_e and W−_e observed so far, respectively (Lines 6-11).
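A minimal sketch of one way to implement the CC-CUCB loop is given below, assuming edge weights in [0, 1] and the standard Chernoff-Hoeffding confidence radius sqrt(3 ln t / (2 T_e)) used by CUCB; the oracle and sampling interfaces are illustrative placeholders, not the paper's exact API.

```python
import math
import random

def cc_cucb(edges, oracle, sample_weight, T):
    """Sketch of the CC-CUCB loop. `oracle(adj)` maps adjusted positive/
    negative edge weights to the set of intra-cluster edges of a Max-CC
    solution; `sample_weight(e, sign, t)` returns the observed weight
    sample for edge e at round t."""
    mu = {e: {"+": random.random(), "-": random.random()} for e in edges}  # Line 1
    cnt = {e: {"+": 0, "-": 0} for e in edges}
    for t in range(1, T + 1):
        adj = {e: {} for e in edges}
        for e in edges:
            for s in ("+", "-"):
                n = cnt[e][s]
                rho = math.sqrt(3 * math.log(t) / (2 * n)) if n > 0 else float("inf")
                adj[e][s] = min(1.0, mu[e][s] + rho)  # UCB, clipped to [0, 1]; Lines 3-4
        intra = oracle(adj)                           # play superarm; Line 5
        for e in edges:                               # feedback update; Lines 6-11
            s = "+" if e in intra else "-"
            w = sample_weight(e, s, t)
            cnt[e][s] += 1
            mu[e][s] += (w - mu[e][s]) / cnt[e][s]    # running average of samples
    return mu
```

Note that unobserved arms keep an infinite radius (clipped to the maximum weight 1), so every base arm is optimistically explored before its estimate starts to concentrate.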

A.2.1: Regret analysis of CC-CUCB
As correlation clustering is NP-hard, it is unlikely that CC-CUCB can be equipped with an exact oracle O for Max-CC running in polynomial time. Hence, in analyzing the theoretical guarantees of CC-CUCB, we consider the case where O is a Max-CC-(α, β)-approximation oracle:

Definition 5 (Max-CC-(α, β)-approximation oracle) Given a Max-CC instance I = ⟨(V, E), {(w+_e, w−_e)} e∈E⟩, let C*_I be the optimal solution to I. Given α, β ∈ (0, 1], a Max-CC-(α, β)-approximation oracle is an algorithm that outputs a clustering C whose Max-CC objective value is at least α times that of C*_I, with probability at least β.

When used as an oracle O within CC-CUCB, the condition in Definition 5 to recognize O as a Max-CC-(α, β)-approximation oracle needs to hold on every Max-CC instance that is given as input to O at each round. Hence, the condition has to hold on the mean estimates, not the true ones. Existing algorithms for Max-CC achieving constant-factor guarantees in expectation (Charikar et al. 2005; Swamy 2004) can be employed as Max-CC-(α, β)-approximation oracles. We show the details of this in Appendix A.1.
When approximation oracles are used, the quality of a CMAB algorithm is typically measured in terms of the (α, β)-approximation regret, defined as the α·β fraction of the expected reward of playing the best superarm in every round, minus the sum of the expected rewards of the superarms played by the algorithm over all the rounds (Chen et al. 2016). In the CMAB-Max-CC setting, this metric becomes:

Definition 6 (Max-CC-(α, β)-approximation regret) Let C*_I be the clustering maximizing ā(⋅) (Eq. 7) on a CMAB-Max-CC instance I (w.r.t. the true means, Eq. (3)), and let {C_t}, t = 1, …, T, be the clusterings output by an algorithm A run on I. For any α, β ∈ (0, 1], the Max-CC-(α, β)-approximation regret of A is

Reg_{α,β}(T) = T · α · β · ā(C*_I) − E[ Σ_{t=1..T} ā(C_t) ].

Chen et al. (2016) and Wang and Chen (2017) show that, if the (expected) reward function satisfies certain properties (see Appendix A.2.2), the CUCB method achieves a regret at most in the order of O(log T). Here we show that this guarantee carries over to our CC-CUCB (Theorem 4). The guarantees in Theorem 4 are on the regret computed based on the true means, even though the guarantees of the oracle are on the μ̂ estimates. This is possible thanks to the way CC-CUCB computes the mean estimates and to the properties of the reward function required by the theorem. More details on this are reported in Appendix A.2.2.
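The regret notion described above, i.e. the α·β fraction of the best achievable cumulative reward minus the reward actually accumulated, can be computed empirically as sketched below; the helper name and the toy numbers are hypothetical.

```python
def approx_regret(opt_reward, played_rewards, alpha, beta):
    """(alpha, beta)-approximation regret after T rounds:
    T * alpha * beta * abar(C*_I) minus the reward accumulated by
    the clusterings C_1, ..., C_T played by the algorithm."""
    T = len(played_rewards)
    return T * alpha * beta * opt_reward - sum(played_rewards)

# Toy example: per-round optimum reward 10, alpha = 0.7, beta = 0.9
regret = approx_regret(10.0, [5.0, 6.0, 6.5, 6.3], alpha=0.7, beta=0.9)
assert abs(regret - 1.4) < 1e-9
```

Benchmarking against the α·β-scaled optimum (rather than the optimum itself) is what makes a sublinear regret achievable when only an (α, β)-approximation oracle is available.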

A.2.2: Proof of Theorem 4
Definition 7 (Base arms induced by a clustering (CMAB-Max-CC)) Let S be the set of all base arms and C a clustering. We denote by S_C the set of base arms corresponding to the clustering-compliant replica set induced by C, i.e., S_C = { p+_{uv} | e = (u, v) ∈ E_in } ∪ { p−_{uv} | e = (u, v) ∈ E_out }, where E_in (resp. E_out) is the set of intra-cluster (resp. inter-cluster) edges of C.
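Operationally, the induced base-arm set of Definition 7 pairs each intra-cluster edge with its positive-type arm and each inter-cluster edge with its negative-type arm; the following sketch uses an illustrative tuple encoding of arms and a hypothetical `cluster_of` vertex-to-cluster map.

```python
def base_arms_of_clustering(edges, cluster_of):
    """Base arms S_C induced by a clustering C (Definition 7): the '+'
    arm of every intra-cluster edge and the '-' arm of every
    inter-cluster edge."""
    S_C = set()
    for (u, v) in edges:
        if cluster_of[u] == cluster_of[v]:
            S_C.add(("+", u, v))  # p+_{uv}: edge (u, v) is intra-cluster
        else:
            S_C.add(("-", u, v))  # p-_{uv}: edge (u, v) is inter-cluster
    return S_C

# Example: clusters {0, 1} and {2}
arms = base_arms_of_clustering([(0, 1), (1, 2)], {0: "A", 1: "A", 2: "B"})
assert arms == {("+", 0, 1), ("-", 1, 2)}
```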
Definition 9 (Gap (CMAB-Max-CC) (Wang and Chen 2017)) Given a superarm C, the gap Δ_C of C is defined as Δ_C = max{0, ā(C*_I) − ā(C)}, where C*_I is the optimal solution of the Max-CC instance at hand. Moreover, for a base arm p ∈ S, we define Δ^p_min = min {Δ_C | p ∈ S_C, Δ_C > 0} and Δ^p_max = max {Δ_C | p ∈ S_C, Δ_C > 0}. As a convention, if for