Super-Reparametrizations of Weighted CSPs: Properties and Optimization Perspective

The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (under some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. We implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups.


Introduction
In the weighted constraint satisfaction problem (WCSP) we maximize the sum of (weight) functions over many discrete variables, where each function depends only on a (usually small) subset of the variables. A popular approach to tackle this NP-hard combinatorial optimization problem is via its linear programming (LP) relaxation [51,61,59,58,50]. The dual of this LP relaxation [61,50,16] can be interpreted as follows. Feasible dual solutions correspond to reparametrizations (also known as equivalence-preserving transformations [16]) of the WCSP objective function, which are obtained by moving weights between weight functions so that the WCSP objective function is preserved. The dual LP relaxation then seeks a reparametrization of the initial WCSP that minimizes an upper bound on the WCSP objective value. For some instances, the minimal upper bound is equal to the maximal value of the WCSP objective (i.e., the LP relaxation is tight) but, in general, there is a gap between them. The precise form of the dual LP differs slightly from author to author.
For larger instances, solving the LP relaxation to global optimality is too costly. Therefore, the upper bound is usually minimized suboptimally by performing reparametrizations only locally. Stopping points of these suboptimal methods are usually characterized by various levels of local consistency of the CSP formed by the active tuples (i.e., the tuples with the maximum weight in each weight function individually) of the reparametrized WCSP. This is consistent with the fact that a necessary (but not sufficient) condition for global optimality of the dual LP relaxation is that the active-tuple CSP has a non-empty local consistency closure. The level of local consistency at optimum depends on the space of allowed reparametrizations: if weights can move only between pairs of weight functions of which one is unary or nullary, it is arc consistency (AC); if weights can move between two weight functions of any arity, it is pairwise consistency (PWC). These suboptimal methods can be divided into two main classes.
The first class, popular in computer vision and machine learning, is known as convex message passing [38,31,56,57,61,62,39]. These methods repeat a simple local operation and can be seen as block-coordinate descent with exact updates satisfying the so-called relative interior rule [64]. At fixed points, the active-tuple CSP has a non-empty AC (or PWC) closure. These methods yield good upper bounds but are too slow to be applied in each node of a branch-and-bound search.
The second class has been called soft local consistency methods in constraint programming [16], due to its similarity to local consistencies in the ordinary CSP. One type of these methods moves only integer weights between weight functions (assuming all initial weights are integer) and is efficient enough to be maintained during search. Its most advanced representative is the existential directional arc consistency (EDAC) algorithm [21]. The other type allows moving fractional weights, which can lead to better bounds but is more costly, hence usually not suitable to be applied during search. Its representatives are the virtual arc consistency (VAC) algorithm [15,16] and the very similar Augmenting DAG algorithm [42,61,60]. These methods are based on the following fact: whenever the active-tuple CSP has an empty AC closure, there exists a reparametrization of the WCSP that decreases the upper bound. Thus, each iteration of these algorithms first applies the AC algorithm to the active-tuple CSP and, if domain wipe-out occurs, it constructs a dual-improving direction by back-tracking the history of the AC algorithm, and finally reparametrizes the current WCSP by moving along this direction by a suitable step size. The VAC and Augmenting DAG algorithms converge to a non-unique state when the active-tuple CSP has a non-empty AC closure (which is called virtual arc consistency) but are typically faster than convex message passing methods.
In the soft-consistency terminology, global optima of the dual LP relaxation have been called optimally soft arc consistent (OSAC) WCSPs [17,16]. In this sense, EDAC and VAC are relaxations of OSAC. But note that OSAC can no longer be considered a local consistency, since no algorithm using only local operations is known to enforce it.
Reparametrizations in general cannot enforce stronger local consistencies of the active-tuple CSP than PWC. This can be seen as follows: if the active-tuple CSP has a non-empty PWC closure but violates some stronger local consistency (hence it is unsatisfiable), there exists no reparametrization that would decrease the upper bound and possibly make the active-tuple CSP satisfy the stronger local consistency. The only way to achieve stronger local consistencies (such as k-consistencies) of the active-tuple CSP by reparametrizations is to introduce new weight functions (of possibly higher arities) and then move weights between these new weight functions and the existing weight functions. This allows constructing a hierarchy of progressively tighter LP relaxations of the WCSP [62,53,45,7], including the Sherali-Adams hierarchy [54].
In this paper, we study a different LP-based approach, namely an LP formulation of the WCSP, which was proposed in [41] but never pursued later. It differs from the above well-known LP relaxation and does not belong to the hierarchy of LP relaxations obtained by introducing new weight functions of higher arities. This LP formulation minimizes the same upper bound on the WCSP objective value but this time over super-reparametrizations of the initial WCSP objective function, which are changes of the weights that either preserve or increase the WCSP objective value for every assignment. This LP formulation has an exponential number of inequality constraints (representing super-reparametrizations) and is exact, i.e., its minimal value is always equal to the maximal value of the WCSP objective.
We propose to solve this LP suboptimally by a local search method, which is based on the following key observation: whenever the active-tuple CSP is unsatisfiable, there exists a super-reparametrization (but possibly no reparametrization) that decreases the upper bound. The direction of this super-reparametrization is a certificate of unsatisfiability of the active-tuple CSP, which can be constructed from the history of the CSP solver. Note that this approach strictly generalizes the VAC algorithm: if the active-tuple CSP has a non-empty AC closure but is unsatisfiable, the VAC algorithm is stuck (because no reparametrization can decrease the upper bound) but our algorithm can decrease the bound by a super-reparametrization. The cost for this greater generality is that super-reparametrizations may preserve neither the WCSP objective value for some assignments nor the set of optimal assignments, but they can nevertheless provide valid, and possibly tighter, upper bounds on the WCSP optimal value.
After formulating this general framework, we focus on the case when the unsatisfiability of the active-tuple CSP is proved by local consistencies stronger than AC/PWC. In particular, we use singleton arc consistency (SAC), which is interesting because it does not have bounded support [10] and therefore would be difficult to achieve by introducing new weight functions of higher arity. We show how to construct a certificate of unsatisfiability of a CSP from the history of the SAC algorithm. Our algorithm then interleaves AC and SAC: we keep decreasing the upper bound by reparametrizations until the active-tuple CSP has a non-empty AC closure, and only then decrease the bound by a super-reparametrization if the SAC closure of the active-tuple CSP is found empty. In experiments we show that on many WCSP instances, this algorithm yields better bounds than state-of-the-art soft local consistency methods in reasonable runtime. Note that we report only the achieved upper bounds but do not use them in branch-and-bound search, which would be beyond the scope of our paper.
To the best of our knowledge, super-reparametrizations have not been utilized or studied except in [41] and [52]. In [41], super-reparametrizations were used to obtain tighter bounds using a specialized cycle-repairing algorithm, and they were identified in [52] as a property satisfied by all formulations of the linear programming relaxations based on reparametrizations. However, [52] focuses almost solely on the relation between different formulations of reparametrizations, rather than on super-reparametrizations. To fill this gap, we theoretically analyze the associated optimization problem and also the properties of super-reparametrizations.
Compared to the previous version of this paper [27], we improved the current paper in the following ways:
• Most importantly, we include a study on the theoretical properties of super-reparametrizations and compare them to those of reparametrizations (§5).
• Although our implementation remains limited to WCSPs of arity 2, we present all of our theoretical results for WCSPs of any arity.
• We include a geometric interpretation that provides intuitive insights and thus simplifies understanding of our method ( §4.1.1).
• In addition to making our code publicly available, we add more information on implementation details to improve reproducibility ( §4.4).
• We also analyze the cone of non-negative weighted CSPs and prove that it is dual to the marginal polytope ( §3.2).
• Unsurprisingly, we show that some decision problems connected to our approach and super-reparametrizations are NP-hard (§6).
Structure. We begin in §2 by formally defining the weighted CSP and the classical (crisp) CSP, and by introducing the notation used throughout the paper. Then, in §3, we formally define the optimization problem of minimizing an upper bound over reparametrizations and/or super-reparametrizations, and state the sufficient and necessary optimality conditions. Next, §4 proposes a practical approach for approximate minimization of the upper bound over super-reparametrizations using constraint propagation. We also give experimental results comparing our approach with existing soft local consistencies. Additional properties of the underlying active-tuple CSPs (see definition later) and the sets of optimal (or also non-optimal) super-reparametrizations are given in §5. §6 presents the hardness results. We provide a detailed example demonstrating EDAC, VAC, and our proposed approach with SAC in Appendix A.

Notation
Let V be a finite set of variables and D a finite domain, common to all variables. An assignment x ∈ D V assigns a value x i ∈ D to each variable i ∈ V . Let C ⊆ 2 V be a set of non-empty scopes, i.e., (V, C) can be seen as an undirected hypergraph. The triplet (D, V, C) defines the structure of a (weighted) CSP and will be fixed throughout the paper. By T = { (S, k) | S ∈ C, k ∈ D S } we denote the set of tuples, partitioned into sets T S , S ∈ C. We say that an assignment x ∈ D V uses a tuple t = (S, k) ∈ T if x[S] = k, where x[S] denotes the restriction of x onto the set S ⊆ V , i.e., for S = {i 1 , ..., i |S| } we have x[S] = (x i1 , ..., x i |S| ) (where the order of the components is defined by the total order on S inherited from some arbitrary fixed total order on V ). Each assignment x ∈ D V uses exactly one tuple from each T S . An instance of the constraint satisfaction problem (CSP) is defined by the quadruple (D, V, C, A) where A ⊆ T is the set of allowed tuples (while the tuples T − A are forbidden). As the CSP structure (D, V, C) will always be the same, we will refer to the CSP instance only as A (in other words, in the sequel we identify CSP instances with subsets of T ). An assignment x ∈ D V is a solution to a CSP A ⊆ T if it uses only allowed tuples, i.e., (S, x[S]) ∈ A for all S ∈ C. The set of all solutions to the CSP will be denoted by SOL(A) ⊆ D V . The CSP is satisfiable if SOL(A) ≠ ∅, otherwise it is unsatisfiable.
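The notation above can be sketched in code. Below is a minimal brute-force sketch; the structure and instance are assumed toy data, not taken from the paper:

```python
from itertools import product

# Toy structure (assumed): V = {1, 2}, D = {a, b}, C = {{1}, {2}, {1, 2}}.
# A tuple is a pair (S, k) with k in D^S.
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in product(D, repeat=len(S))]

def solutions(A):
    """SOL(A): assignments that use only allowed tuples (S, x[S])."""
    sols = []
    for values in product(D, repeat=len(V)):
        x = dict(zip(V, values))
        if all((S, tuple(x[i] for i in S)) in A for S in C):
            sols.append(values)
    return sols

A = set(T) - {((1, 2), ('a', 'a'))}   # forbid the single binary tuple (a, a)
print(sorted(solutions(A)))           # [('a', 'b'), ('b', 'a'), ('b', 'b')]
```

As expected, forbidding one binary tuple removes exactly the assignment (a, a) from the solution set.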
The weighted constraint satisfaction problem (WCSP) seeks to find an assignment x ∈ D V that maximizes the function

Σ S∈C f S (x[S]) (2)

where f S : D S → R, S ∈ C, are given weight functions. All the weights (i.e., the values of the weight functions) together can be seen as a vector f ∈ R T , such that for t = (S, k) ∈ T we have f t = f S (k). The WCSP instance is defined by the quadruple (D, V, C, f ). However, as the structure (D, V, C) will always be the same, we will refer to WCSP instances only as f (in other words, we identify WCSP instances with vectors from R T ).
Example 1. If V = {1, 2, 3, 4}, C = {{1}, {2}, {2, 3}, {1, 4}, {2, 3, 4}}, and D = {a, b}, then we want to maximize the expression f {1} (x 1 ) + f {2} (x 2 ) + f {2,3} (x 2 , x 3 ) + f {1,4} (x 1 , x 4 ) + f {2,3,4} (x 2 , x 3 , x 4 ).
Remark 1. In some formalisms [16,45], the objective (2) is to be minimized. For our purposes, these settings are equivalent and the results for minimization problems are analogous, as one can invert the sign of all weights and maximize instead. Next, some papers consider only non-negative weights and the empty (nullary) scope ∅ ∈ C whose weight f ∅ constitutes a bound on the WCSP optimal value [16,45]. However, we will later need both positive and negative weights in a WCSP, so we require ∅ ∉ C to simplify notations (also, with both positive and negative weights, f ∅ would not yield a bound on the optimal value).
We will use another notation for the WCSP objective, which is common in machine learning, see, e.g., [59, §3]. We define an indicator map φ : D V → {0, 1} T by

φ t (x) = [x uses t], t ∈ T,

where [·] denotes the Iverson bracket, which equals 1 if the logical expression in the bracket is true and 0 if it is false. The WCSP objective (2) can now be written as the dot product ⟨f, φ(x)⟩. This makes explicit that the WCSP objective is linear in the weight vector f . The WCSP optimal value is

max x∈D V ⟨f, φ(x)⟩ = max { ⟨f, μ⟩ | μ ∈ M } (5)

where

M = φ(D V ) = { φ(x) | x ∈ D V } ⊆ {0, 1} T . (6)

Note that M is defined only by the structure (D, V, C).
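The map φ and the linearity of the objective can be checked by brute force; the structure and weights below are an assumed toy instance:

```python
from itertools import product

# Toy structure and weights (assumed, not from the paper).
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in product(D, repeat=len(S))]
f = dict(zip(T, [1.0, 0.0, 0.5, 2.0, 0.0, 3.0, 1.0, 1.0]))

def phi(x):
    """Indicator vector: phi_t(x) = [x uses t] (Iverson bracket)."""
    return {(S, k): int(k == tuple(x[i] for i in S)) for (S, k) in T}

def objective(x):
    """The WCSP objective (2): the sum of f_S(x[S]) over scopes S."""
    return sum(f[(S, tuple(x[i] for i in S))] for S in C)

# <f, phi(x)> reproduces the objective for every assignment.
for values in product(D, repeat=len(V)):
    x = dict(zip(V, values))
    dot = sum(f[t] * ind for t, ind in phi(x).items())
    assert abs(dot - objective(x)) < 1e-9
```

Each φ(x) has exactly one unit entry per scope, so the dot product simply picks the |C| weights used by x.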

Bounding the WCSP Optimal Value
We define the function B : R T → R by

B(f ) = Σ S∈C max k∈D S f S (k). (7)

This is a convex piecewise-affine function. For f ∈ R T , we call a tuple t = (S, k) ∈ T active if f S (k) = max l∈D S f S (l), i.e., if it attains the inner maximum in (7). The set of all tuples that are active for f is denoted by A * (f ) ⊆ T . Note, A * (f ) ⊆ T can be interpreted as a CSP.

Theorem 1 ([61]). For every WCSP f ∈ R T and every assignment x ∈ D V we have:
(a) ⟨f, φ(x)⟩ ≤ B(f );
(b) ⟨f, φ(x)⟩ = B(f ) if and only if x uses only tuples active for f .
Proof. Statement (a) can be checked by comparing expressions (2) and (7) term by term. Statement (b) says that B(f ) = ⟨f, φ(x)⟩ if and only if (S, x[S]) ∈ A * (f ) for all S ∈ C. This is again straightforward from (2) and (7).
Theorem 1 says that B(f ) is an upper bound on the WCSP optimal value. Moreover, it shows that B(f ) = ⟨f, φ(x)⟩ implies that x is a maximizer of the WCSP objective (2).
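Both statements of Theorem 1 can be verified by brute force on a small instance (toy structure and weights assumed, not taken from the paper):

```python
from itertools import product

# Toy structure and weights (assumed).
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in product(D, repeat=len(S))]
f = dict(zip(T, [1.0, 0.0, 0.5, 2.0, 0.0, 3.0, 1.0, 1.0]))

def B(f):
    """The bound B(f): sum over scopes of the maximum weight."""
    return sum(max(f[(S, k)] for k in product(D, repeat=len(S))) for S in C)

def active(f):
    """A*(f): tuples attaining the maximum within their scope."""
    return {(S, k) for (S, k) in T
            if f[(S, k)] == max(f[(S, l)] for l in product(D, repeat=len(S)))}

def objective(values):
    x = dict(zip(V, values))
    return sum(f[(S, tuple(x[i] for i in S))] for S in C)

for values in product(D, repeat=len(V)):
    x = dict(zip(V, values))
    only_active = all((S, tuple(x[i] for i in S)) in active(f) for S in C)
    assert objective(values) <= B(f) + 1e-9                       # Theorem 1(a)
    assert only_active == (abs(objective(values) - B(f)) < 1e-9)  # Theorem 1(b)
print(B(f))  # 6.0; attained by x = (a, b), which uses only active tuples
```

Here the bound is tight: the assignment (a, b) uses one active tuple per scope, so its objective equals B(f).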

Minimal Upper Bound over Reparametrizations
We say that a WCSP f ∈ R T is a reparametrization of a WCSP g ∈ R T (also known as an equivalence-preserving transformation of g) [38,59,51,61,62,50,17,16,57] if ⟨f, φ(x)⟩ = ⟨g, φ(x)⟩ for every x ∈ D V . That is, f − g ∈ M ⊥ where

M ⊥ = { d ∈ R T | ⟨d, μ⟩ = 0 for all μ ∈ M }

is the orthogonal space [65, Chapter 1] of the set (6). Here, 'd' stands for 'direction' but note that any d ∈ M ⊥ , as a vector from R T , can also be seen as a standalone WCSP. The set M ⊥ is a subspace of R T , consisting of all WCSPs that have zero objective value for all assignments. Although M ⊥ is defined by an exponential number of linear equalities (one for each assignment), it can be generated by a small number of sparse directions that move weights between pairs of weight functions. Clearly, the binary relation 'is a reparametrization of' (on the set of WCSPs with a fixed structure) is reflexive, transitive and symmetric, hence an equivalence. Given a WCSP g ∈ R T , it is a natural idea to minimize the upper bound on its optimal value by reparametrizations:

min { B(f ) | f ∈ g + M ⊥ }. (13)

By introducing auxiliary variables (as in Footnote 4), this problem can be transformed to a linear program, which is the dual LP relaxation of the WCSP g [51,61,62,50]. Every f feasible for (13) satisfies

⟨g, φ(x)⟩ = ⟨f, φ(x)⟩ ≤ B(f ) for every x ∈ D V , (14)

i.e., B(f ) is an upper bound on the optimal value max x ⟨g, φ(x)⟩ of WCSP g. If inequality (14) holds with equality for some x, then f is optimal for (13) and the LP relaxation is tight. Necessary and sufficient conditions for optimality can be obtained from complementary slackness, see [61,62,50]. Problem (13) has been widely studied [51,61,17,16,58,50,7] and many approaches for its (approximate) large-scale optimization have been proposed, typically based on block-coordinate descent [31,56,57,61,38,41,52,53] or constraint propagation [16,42,61,14,45]. If f is optimal for (13), then the CSP A * (f ) has a non-empty pairwise-consistency (PWC) closure (for binary WCSPs, PWC reduces to arc consistency) [62]. We conjecture that PWC is in general the strongest level of local consistency of A * (f ) that can be achieved by reparametrizations without enlarging the WCSP structure (i.e., without introducing new weight functions).
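A minimal sketch of a reparametrization on an assumed toy binary instance: moving a weight α between the binary function and a unary function leaves ⟨f, φ(x)⟩ unchanged for every assignment, i.e., the change f − g lies in M ⊥:

```python
from itertools import product

# Toy instance g (assumed, not from the paper).
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in product(D, repeat=len(S))]
g = dict(zip(T, [1.0, 0.0, 0.5, 2.0, 0.0, 3.0, 1.0, 1.0]))

def objective(f, values):
    x = dict(zip(V, values))
    return sum(f[(S, tuple(x[i] for i in S))] for S in C)

# Elementary move: add alpha to f_1(a) and subtract it from every binary
# weight consistent with x_1 = a.  The difference d = f - g lies in M^perp.
alpha = 0.7
f = dict(g)
f[((1,), ('a',))] += alpha
for l in D:
    f[((1, 2), ('a', l))] -= alpha

for values in product(D, repeat=len(V)):   # objective preserved for every x
    assert abs(objective(f, values) - objective(g, values)) < 1e-9
```

Any assignment with x_1 = a gains α in the unary scope and loses α in the binary scope, and other assignments are untouched, so the objective is preserved everywhere.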
We remark that some approaches [17,16] achieve only (generalized) arc consistency rather than PWC because they optimize over a subset of all possible reparametrizations, corresponding to a subspace of M ⊥ . In this case, WCSPs f optimal for (13) have been called optimally soft arc consistent (OSAC).

Minimal Upper Bound over Super-Reparametrizations
We say that a WCSP f ∈ R T is a super-reparametrization of a WCSP g ∈ R T if ⟨f, φ(x)⟩ ≥ ⟨g, φ(x)⟩ for every x ∈ D V . That is, f − g ∈ M * where

M * = { d ∈ R T | ⟨d, μ⟩ ≥ 0 for all μ ∈ M } (16)

is the dual cone [65, Chapter 1] to the set (6). It is a polyhedral convex cone, consisting of the WCSPs that have non-negative objective value for all assignments. This cone contains a line because M ⊥ ⊆ M * and the subspace M ⊥ is non-trivial (assuming |V | > 1). Precisely, we have

M * ∩ (−M * ) = M ⊥ .

The set of all super-reparametrizations of f is the translated cone

f + M * = { f + d | d ∈ M * }.

The binary relation 'is a super-reparametrization of' (on the set of WCSPs with a fixed structure) induced by the convex cone M * is reflexive and transitive, hence a preorder. It is not antisymmetric: f − g ∈ M * and g − f ∈ M * does not imply f = g but merely f − g ∈ M ⊥ , i.e., that f is a reparametrization of g. This is because the cone M * may contain a line, see [35, §2] and [13, §2.4].
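Membership in the translated cone g + M * can be tested by brute force on small instances (toy structure and weights assumed, not taken from the paper); the check also illustrates that the relation is a preorder, not symmetric:

```python
from itertools import product

# Toy instance g (assumed).
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in product(D, repeat=len(S))]
g = dict(zip(T, [1.0, 0.0, 0.5, 2.0, 0.0, 3.0, 1.0, 1.0]))

def objective(f, values):
    x = dict(zip(V, values))
    return sum(f[(S, tuple(x[i] for i in S))] for S in C)

def is_super_reparametrization(f, g):
    """Check f - g in M* by enumerating all |D|^|V| assignments."""
    return all(objective(f, v) >= objective(g, v) - 1e-9
               for v in product(D, repeat=len(V)))

f = dict(g)
f[((1, 2), ('a', 'a'))] += 1.0               # raising one weight stays in g + M*
assert is_super_reparametrization(f, g)
assert not is_super_reparametrization(g, f)  # the objective increased for (a, a)
```

Raising a single weight increases the objective of exactly the assignments using that tuple, so f ∈ g + M * while g ∉ f + M *.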
Remark 2. The optimal value (5) of a WCSP f can also be written as

max x∈D V ⟨f, φ(x)⟩ = max { ⟨f, μ⟩ | μ ∈ conv M } (17)

where conv denotes the convex hull operator [13]. The equality in (17) follows from the well-known fact that a linear function on a polytope attains its maximum in at least one vertex of the polytope [50, Corollary 3.44]. The set conv M ⊆ [0, 1] T is known as the marginal polytope and has a central role in approaches to WCSP based on linear programming (see [59,50] and references therein). It is easy to show that

M * = (cone M ) * = (conv M ) * (18)

where cone denotes the conic hull operator [13] and * the dual cone operator. Thus, (16) can also be seen as the dual cone to the marginal polytope which, to the best of our knowledge, has not been mentioned before.
Following [41], we consider the problem

min { B(f ) | f ∈ g + M * }. (19)

Again, this can be reformulated as a linear program. Every f feasible for (19) (i.e., every super-reparametrization of g) satisfies

max x ⟨g, φ(x)⟩ ≤ max x ⟨f, φ(x)⟩ ≤ B(f ). (20)

The next theorem characterizes optimal solutions:
Theorem 2. Let f be feasible for (19). The following are equivalent:
(a) f is optimal for (19);
(b) B(f ) = max x ⟨g, φ(x)⟩;
(c) there exists x ∈ SOL(A * (f )) such that ⟨f, φ(x)⟩ = ⟨g, φ(x)⟩.
Proof. Theorem 2 in particular says that the optimal value of (19) is equal to the optimal value of WCSP g (this has been observed already in [41, Theorem 1]). As stated in [41], this is not surprising because the complexity of the WCSP is hidden in the exponential set of constraints of (19). Let us remark that for f ∈ g + M * , deciding whether f is optimal for (19) is NP-complete, as shown later in Corollary 3.
Theorem 2 has a simple corollary:
Corollary 1. The CSP A * (g) is satisfiable if and only if g is optimal for (19).
Proof. By Theorem 2, A * (g) is satisfiable if and only if (19) attains its optimum at the point f = g, i.e., B(g) ≤ B(f ) for every f ∈ g + M * .

Iterative Method to Improve the Bound by Super-Reparametrizations
In this section, we present an iterative method to suboptimally solve (19). Starting from a feasible solution to (19), every iteration finds a new feasible solution with a lower objective, which by (20) corresponds to decreasing the upper bound on the optimal value of the initial WCSP.

Outline of the Method
Consider a WCSP f feasible for (19), i.e., f ∈ g + M * . By Theorem 2, a necessary (but not sufficient) condition for f to be optimal for (19) is that the CSP A * (f ) is satisfiable. In summary, we have the following implications and equivalences:

f is optimal for (19) ⟹ A * (f ) is satisfiable
⟺ ⟺
B(f ) = min { B(f ′ ) | f ′ ∈ g + M * } ⟹ B(f ) = min { B(f ′ ) | f ′ ∈ f + M * }

The left-hand equivalence is just the definition of the optimum of (19), the right-hand equivalence is Theorem 3, and the top implication follows from Theorem 2. The bottom implication independently follows from transitivity of super-reparametrizations, which says that f + M * ⊆ g + M * whenever f ∈ g + M * . Suppose for the moment that we have an oracle that, for a given f with A * (f ) unsatisfiable, returns some f ′ ∈ f + M * with B(f ′ ) < B(f ) (such f ′ exists by Theorem 3). By transitivity of super-reparametrizations, such f ′ is feasible for (19). This suggests an iterative scheme to improve feasible solutions to (19). We initialize f 0 := g and then for k = 0, 1, 2, . . . repeat the following iteration: if A * (f k ) is satisfiable, stop; otherwise, call the oracle to obtain f k+1 ∈ f k + M * with B(f k+1 ) < B(f k ). Note that transitivity of super-reparametrizations implies f k ∈ f 0 + M * for every k, so every f k is feasible for (19) as expected. An example of a single iteration is shown in Figure 2a and 2b.
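A single iteration of this scheme can be illustrated on an assumed toy instance where A * (f 0 ) is unsatisfiable. The improving direction below is hand-constructed for this particular instance (here it even lies in M ⊥ , so the step is a plain reparametrization); it is not the paper's general oracle:

```python
from itertools import product

# Toy instance f0 (assumed): the active unary tuple forces x_1 = a while
# both active binary tuples force x_1 = b, so A*(f0) is unsatisfiable.
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
f0 = {((1,), ('a',)): 1.0, ((1,), ('b',)): 0.0,
      ((2,), ('a',)): 0.0, ((2,), ('b',)): 0.0,
      ((1, 2), ('a', 'a')): 0.0, ((1, 2), ('a', 'b')): 0.0,
      ((1, 2), ('b', 'a')): 2.0, ((1, 2), ('b', 'b')): 2.0}

def B(f):
    return sum(max(f[(S, k)] for k in product(D, repeat=len(S))) for S in C)

def objective(f, values):
    x = dict(zip(V, values))
    return sum(f[(S, tuple(x[i] for i in S))] for S in C)

# Step f1 = f0 + alpha*d: move the weight of the unary tuple ({1}, a) into
# the binary weights consistent with x_1 = a (hand-picked direction).
alpha = 1.0
f1 = dict(f0)
f1[((1,), ('a',))] -= alpha
for l in D:
    f1[((1, 2), ('a', l))] += alpha

assert B(f1) < B(f0)          # the upper bound dropped from 3.0 to 2.0
for v in product(D, repeat=len(V)):
    assert abs(objective(f1, v) - objective(f0, v)) < 1e-9
```

After the step, A * (f 1 ) contains the tuples used by x = (b, a), whose objective equals B(f 1 ) = 2, so the method stops at an optimum of this toy instance.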
This iterative method belongs to the class of local search methods to solve (19): having a current feasible estimate f k , we search for the next estimate f k+1 with a strictly better objective within a neighborhood f k + M * of f k . We can define local optima of (19) with respect to this method to be super-reparametrizations f of g such that A * (f ) is satisfiable.

Properties of the Method
By transitivity of super-reparametrizations, for every k we have

f k+1 + M * ⊆ f k + M * , (22)

which holds with equality if and only if f k+1 ∈ f k + M ⊥ . This shows that the search space of the method may shrink with increasing k; in other words, a larger and larger part of the feasible set f 0 + M * of (19) is cut off and becomes forever inaccessible. If, for some k, all (global) optima of (19) happen to lie in the cut-off part, the method has lost any chance to find a global optimum. This is illustrated in Figure 3. This has the following consequence. Every f k satisfies

B(f k ) ≥ max x∈D V ⟨f k , φ(x)⟩. (23)

In every iteration, the left-hand side of inequality (23) decreases and the right-hand side increases or stays the same due to (22). If both sides meet for some k, the CSP A * (f k ) becomes satisfiable by Theorem 1(b) and the method stops. Monotonic increase of the right-hand side can be seen as 'greediness' of the method: if we could choose f k+1 from the initial feasible set f 0 + M * rather than from its subset f k + M * , the right-hand side could also decrease. Any increase of the right-hand side is undesirable because the bounds B(f k ) in future iterations will never be able to get below it. This is illustrated in Figures 4 and 5. Unlike in (13), note that not every optimal assignment for WCSP f is optimal for WCSP g. We will return to this in §5.
If A * (f k ) is unsatisfiable, there are usually many vectors f k+1 ∈ f k + M * satisfying B(f k+1 ) < B(f k ). We should choose among them one that does not cause 'too much' shrinking of the search space and/or increase of the right-hand side of (23). Inclusion (22) holds with equality if and only if f k+1 ∈ f k + M ⊥ , so whenever possible we should choose f k+1 to be a reparametrization (rather than just a super-reparametrization) of f k . Unfortunately, we know of no other useful theoretical results to help us choose f k+1 , so we are left with heuristics. One natural heuristic is to choose f k+1 such that the vector f k+1 − f k is sparse (i.e., has only a small number of non-zero components) and its positive components are small. Unfortunately, this can sometimes be too restrictive because, e.g., vectors from M ⊥ can be dense and their components can have unbounded magnitudes.

Employing Constraint Propagation
So far we have assumed we can always decide if the CSP A * (f ) is satisfiable. This is unrealistic because the CSP is NP-complete. Yet the approach remains applicable even if we detect unsatisfiability of A * (f ) only sometimes, e.g., using constraint propagation. Then our iteration changes to: if constraint propagation proves A * (f k ) unsatisfiable, construct f k+1 ∈ f k + M * with B(f k+1 ) < B(f k ); otherwise, stop. In this case, stopping points of the method will be even weaker local minima of (19), but they nevertheless might still be non-trivial and useful.
Figure 3: The shrinking of the search space of the iterative method. The figure illustrates the translated cones f i + M * and several contours of the objective B(f ). After the second iteration, all global minima of the original problem (marked in grey) become inaccessible as the right-hand side of (23) increases.
Figure 2: The fact that f is a super-reparametrization of g can be verified by computing the objective value ⟨g, φ(x)⟩ for each assignment x.
In the sequel we develop this approach in detail. In particular we show, if A * (f k ) is unsatisfiable, how to find a vector f k+1 ∈ f k + M * satisfying B(f k+1 ) < B(f k ). We will do it in two steps. First (in §4.2), given the CSP A * (f k ) we find a direction d ∈ M * using constraint propagation. This direction is a certificate of unsatisfiability of the CSP A * (f k ) and, at the same time, an improving direction for (19). Second (in §4.3), given d and f k , we find a step size α > 0 such that f k+1 = f k + αd and B(f k ) > B(f k+1 ). An example of such a certificate of unsatisfiability is shown in Figure 2c.

Relation to Existing Approaches
The Augmenting DAG algorithm [42,61] and the VAC algorithm [16] are (up to the precise way of computing certificates d and step sizes α) an example of the described approach, which uses arc consistency to attempt to prove unsatisfiability of A * (f k ). In this favorable case, there exist certificates d ∈ M ⊥ , so we are, in fact, applying local search to (13) rather than (19). For stronger local consistencies, such certificates, in general, do not exist (i.e., inevitably ⟨d, φ(x)⟩ > 0 for some x).
The algorithm proposed in [41] can also be seen as an example of our approach. It interleaves iterations using arc consistency (in fact, the Augmenting DAG algorithm) and iterations using cycle consistency.
As an alternative to our approach, stronger local consistencies can be achieved by introducing new weight functions (of possibly higher arity) into the WCSP objective (2) and minimizing an upper bound by reparametrizations, as in [53,7,62,63,45]. In our particular case, after each update f k+1 = f k + αd we could introduce a new weight function with a suitable scope S and weights indexed by k ∈ D S , chosen to cancel the increase of the objective caused by the update. Notice that such an added weight function would not increase the bound (7) since its weights are non-positive, due to the fact that it needs to decrease the objective value for some assignments. In this view, our approach can be seen as enforcing stronger local consistencies but omitting these compensatory higher-order weight functions, thus saving memory.
Finally, the described approach can be seen as an example of the primal-dual approach [26] to optimize linear programs using constraint propagation. In detail, [26] proposed to construct the complementary slackness system for a given feasible solution and apply constraint propagation to detect if the system is satisfiable. If it is not satisfiable, this implies the existence of a certificate of unsatisfiability that can be used to improve the current solution. In our particular case, if (19) is formulated as a linear program, then the complementary slackness conditions (expressed in terms of the dual variables) are equivalent to the optimality conditions stated in Theorem 2, expressed as a set of linear equalities with an exponential number of non-negative variables. Applying constraint propagation on this system is in correspondence with constraint propagation on a CSP.

Certificates of Unsatisfiability of CSP
Constraint propagation is an iterative algorithm, which in each iteration (executed by a propagator) infers that some allowed tuples R ⊆ A of a current CSP A ⊆ T can be forbidden without changing its solution set, i.e., SOL(A) = SOL(A − R), and forbids these tuples, i.e., sets A := A − R. The algorithm terminates when it is no longer able to forbid any tuples (in which case the propagator returns R = ∅) or when it becomes explicit that the current CSP is unsatisfiable. The former usually happens when the CSP achieves some local consistency level Φ. The latter happens if A ∩ T S = ∅ for some S ∈ C, which implies unsatisfiability of A because every assignment has to use one tuple from each T S .
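A minimal arc-consistency propagator for binary CSPs in the tuple representation used here (an assumed sketch, not the paper's implementation); an emptied set A ∩ T S is an explicit proof of unsatisfiability:

```python
from itertools import product

# Toy structure (assumed): two unary scopes and one binary scope.
V = (1, 2)
D = ('a', 'b')
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in product(D, repeat=len(S))]

def ac_closure(A):
    """AC closure: forbid unsupported unary tuples and binary tuples with
    forbidden unary projections until a fixed point is reached."""
    A = set(A)
    changed = True
    while changed:
        changed = False
        for (i, j) in (S for S in C if len(S) == 2):
            for k in D:
                # forbid ({i}, k) / ({j}, k) if it has no allowed support
                if ((i,), (k,)) in A and not any(((i, j), (k, l)) in A for l in D):
                    A.discard(((i,), (k,))); changed = True
                if ((j,), (k,)) in A and not any(((i, j), (l, k)) in A for l in D):
                    A.discard(((j,), (k,))); changed = True
            for kl in product(D, repeat=2):
                # forbid a binary tuple whose unary projection is forbidden
                if ((i, j), kl) in A and (((i,), (kl[0],)) not in A
                                          or ((j,), (kl[1],)) not in A):
                    A.discard(((i, j), kl)); changed = True
    return A

def wipe_out(A):
    """A ∩ T_S = ∅ for some scope S: explicit unsatisfiability."""
    return any(all((S, k) not in A for k in product(D, repeat=len(S))) for S in C)

# Active-tuple CSP of a toy WCSP: x_1 = a forced by the unary scope,
# x_1 = b forced by the binary scope -> AC proves unsatisfiability.
A = {((1,), ('a',)), ((2,), ('a',)), ((2,), ('b',)),
     ((1, 2), ('b', 'a')), ((1, 2), ('b', 'b'))}
assert wipe_out(ac_closure(A))
assert not wipe_out(ac_closure(set(T)))   # the full CSP is arc consistent
```

On the unsatisfiable input, the propagator first forbids ({1}, a) for lack of support, which then wipes out the binary tuples and the remaining unary tuples.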
In this section, we show how to augment constraint propagation so that if it proves a CSP unsatisfiable, it also provides its certificate of unsatisfiability d ∈ M * . This certificate is needed as an improving direction for (19), as was mentioned in §4.1.2. First, in §4.2.1, we introduce a more general concept, deactivating directions. One iteration of constraint propagation constructs an R-deactivating direction for the current CSP A, which certifies that SOL(A) = SOL(A − R). Then, in §4.2.2, we show how to compose the deactivating directions obtained from individual iterations of constraint propagation into a single deactivating direction for the initial CSP. If the initial CSP has been proved unsatisfiable by the propagation, this composed deactivating direction is then its certificate of unsatisfiability.

Deactivating Directions
Definition 1. Let R ⊆ A ⊆ T. A vector d ∈ M * is an R-deactivating direction for A if (a) d t < 0 for every t ∈ R and (b) d t = 0 for every t ∈ A − R.
For fixed A and R, all R-deactivating directions for A form a convex cone. Here, we show one way of constructing a deactivating direction:
Theorem 5. Let R ⊆ A ⊆ T be such that SOL(A) = SOL(A − R). Then vector d ∈ R T with components

d t = −1 for t ∈ R, d t = |C| for t ∈ T − A, d t = 0 for t ∈ A − R (27)

is an R-deactivating direction for A.
Proof. Conditions (a) and (b) of Definition 1 are clearly satisfied, so it only remains to show that d ∈ M * .
Combining Theorems 4 and 5 yields that for any R ⊆ A with R ≠ ∅, an R-deactivating direction for A exists if and only if SOL(A) = SOL(A − R). Thus, any R-deactivating direction for A is a certificate of the fact that SOL(A) = SOL(A − R).
Unfortunately, vectors d calculated naively by (27) can have many non-zero components, which is undesirable as explained in §4.1.1. However, it is clear from Definition 1 that if A ⊆ A′ ⊆ T and d is an R-deactivating direction for A′, then d is an R-deactivating direction also for A. Moreover, (27) shows that larger sets A′ give rise to sparser vectors d. This offers us a possibility to obtain a sparser R-deactivating direction for A if we can provide a superset A′ ⊇ A of the allowed tuples satisfying SOL(A′) = SOL(A′ − R).
Given A ⊆ T and R ⊆ A, finding a maximal (w.r.t. the partial ordering by inclusion) superset A′ ⊇ A such that SOL(A′) = SOL(A′ − R) is closely related to finding a minimal unsatisfiable core of an unsatisfiable CSP. While finding a maximal such superset is very likely intractable, for obtaining a 'sparse enough' vector d it suffices to find a 'large enough' such superset A′. Such a superset is often cheaply available as a side result of executing the propagator. Namely, we take A′ = T − P where P is the set of forbidden tuples that were visited during the run of the propagator. Clearly, tuples not visited by the propagator could not be needed to infer SOL(A) = SOL(A − R). Note that P need not be the same for each CSP instance, even for a fixed level of local consistency: for example, if the arc consistency closure of A is empty, then A is unsatisfiable but a domain wipe-out may occur sooner or later depending on A, which affects which tuples need to be visited.
Let us emphasize that an R-deactivating direction for A need not always be obtained using formula (27); any other method can be used as long as d satisfies Definition 1. We will now give examples of deactivating directions corresponding to some popular constraint propagation rules. In these examples, we assume that our CSP contains all unary constraints (i.e., {i} ∈ C for each i ∈ V), so that rather than deleting domain values we can forbid tuples of the unary constraints.
If, for some S ∈ C, i ∈ S, and k ∈ D, the left-hand statement in (29) is true and the right-hand statement is false, the AC propagator infers SOL(A) = SOL(A − R) where R = {({i}, k)}. To infer this, it suffices to know that the tuples P = { (S, l) | l ∈ D_S, l_i = k } are all forbidden. An R-deactivating direction d for A can be chosen as in (27).

If the left-hand statement in (29) is false and the right-hand statement is true, the AC propagator infers SOL(A) = SOL(A − R) where R = { (S, l) ∈ A | l ∈ D_S, l_i = k }. To infer this, it suffices to know that the tuple ({i}, k) is forbidden, i.e., P = {({i}, k)}. In this particular case, rather than using (27) (with A replaced by T − P), it is better to choose d as in (30). Vector (30) satisfies d ∈ M⊥, in contrast to vector (27), which satisfies only d ∈ M*. Thus, the update f^{k+1} = f^k + αd is a mere reparametrization, which is desirable as explained in §4.1.1.
We note that the reparametrizations considered in the previous paragraphs correspond to the soft arc consistency operations extend and project in [16].
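The two AC inference cases just described can be sketched as follows. This is our own illustrative encoding (tuples are ((i,), k) for unary and (S, pair) for binary scopes); the condition labels mirror (29) only informally:

```python
def ac_inferences(unary_allowed, pair_allowed, S, i, k, domain):
    """Return the tuple set R that the AC propagator forbids for value k
    of variable i w.r.t. the binary scope S, following the two cases of
    the AC inference rule described in the text."""
    consistent = {(S, (k, l) if S[0] == i else (l, k)) for l in domain}
    has_support = any(t in pair_allowed for t in consistent)
    k_allowed = ((i,), k) in unary_allowed
    if k_allowed and not has_support:
        return {((i,), k)}                 # forbid the unsupported unary tuple
    if not k_allowed and has_support:
        return consistent & pair_allowed   # forbid pairs consistent with k
    return set()                           # consistent state; nothing to do
```

Both returned sets R play the role of the sets inferred above, for which a deactivating direction is then chosen via (27) or (30), respectively.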
Example 4. We now consider cycle consistency as defined in [41]. As this local consistency was defined only for binary CSPs, we assume that |S| ≤ 2 for each S ∈ C and denote E = { S ∈ C | |S| = 2 }, so that (V, E) is an undirected graph. Let L be a (polynomially sized) set of cycles in the graph (V, E). A CSP A is cycle consistent w.r.t. L if for each tuple ({i}, k) ∈ A (where i ∈ V and k ∈ D) and each cycle L ∈ L that passes through node i ∈ V, there exists an assignment x with x_i = k that uses only allowed tuples in cycle L. It can be shown that the cycle repair procedure in [41] constructs a deactivating direction whenever an inconsistent cycle is found. Moreover, the constructed direction in this case coincides with (27), where A is replaced by T − P for a suitable set P that contains a subset of the forbidden tuples within the cycle.

Algorithm 1

The procedure propagate applies constraint propagation to CSP A ⊆ T and returns the sequence (R_i)_{i=0}^n of tuple sets that were forbidden and the corresponding deactivating directions (d_i)_{i=0}^n. If all tuples in some scope S ∈ C become forbidden during propagation, propagate also returns S; otherwise it returns S = ∅.
Pseudocode of Algorithm 1 (fragment): in each iteration of the while-loop, find a set R_n ⊆ A_n and an R_n-deactivating direction d_n for A_n, and set n := n + 1.

Example 5. Recall that a CSP A is singleton arc consistent (SAC) if for every tuple t = ({i}, k) ∈ A (where i ∈ V and k ∈ D), the CSP A|_{x_i=k} = A − (T_{{i}} − {({i}, k)}) has a non-empty arc consistency closure. Good (i.e., sparse) deactivating directions for SAC can be obtained as follows. For some ({i}, k) ∈ A, we enforce arc consistency of the CSP A|_{x_i=k}, during which we store the causes for forbidding each tuple.
If A|_{x_i=k} is found to have an empty AC closure, we backtrack and identify only those tuples which were necessary to prove the empty AC closure. These tuples form the set P. The deactivating direction is then constructed as in (27), where R = {({i}, k)} and A is replaced by T − P. Note that SAC does not have bounded support as many other local consistencies [10] do, so the size of P can differ significantly between CSP instances. We show a detailed example of constructing a deactivating direction using SAC in Appendix A.
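The backtracking step can be sketched as follows. This is our own illustrative reconstruction for a binary CSP (domains plus pairwise allowed sets; mirrored scope keys are used internally for simplicity): AC is enforced on A|_{x_i=k} while recording a reason for each deletion, and on a wipe-out only the transitively needed forbidden pair tuples are collected into P.

```python
def sac_cause_set(domains, constraints, i, k):
    """Singleton test for x_i = k. Enforce AC on A|_{x_i=k}; on a domain
    wipe-out, trace back and return the sparse set P of forbidden pair
    tuples actually needed; return None if the closure is non-empty."""
    arcs = {}
    for (a, b), al in constraints.items():   # symmetric view of constraints
        arcs[(a, b)] = set(al)
        arcs[(b, a)] = {(y, x) for (x, y) in al}
    doms = {v: set(d) for v, d in domains.items()}
    doms[i] = {k}
    reason = {}   # deleted (var, val) -> (forbidden pairs used, deletions used)
    changed = True
    while changed:
        changed = False
        for (a, b), allowed in arcs.items():
            for v in list(doms[a]):
                if any((v, w) in allowed for w in doms[b]):
                    continue   # v still has a support in b
                pairs = {((a, b), (v, w)) for w in doms[b]}        # forbidden pairs
                dels = {(b, w) for w in domains[b] if w not in doms[b]}
                reason[(a, v)] = (pairs, dels)
                doms[a].discard(v)
                changed = True
                if not doms[a]:   # wipe-out: trace back the needed causes
                    P, stack, seen = set(), [(a, u) for u in domains[a]], set()
                    while stack:
                        node = stack.pop()
                        if node in seen or node not in reason:
                            continue
                        seen.add(node)
                        pr, dl = reason[node]
                        P |= pr
                        stack.extend(dl)
                    return P
    return None
```

Deletions without a recorded reason (the values removed by the assumption x_i = k itself) are simply skipped during the trace, which corresponds to the set R = {({i}, k)}.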

Composing Deactivating Directions
Consider now a propagator which, for a current CSP A ⊆ T, returns a set R ⊆ A such that SOL(A) = SOL(A − R) and an R-deactivating direction for A. This propagator is applied iteratively, each time forbidding a different set of tuples, until the current CSP achieves the desired local consistency level Φ or it becomes explicit that the CSP is unsatisfiable. This is outlined in Algorithm 1, which stores the generated sets R_i of tuples being forbidden and the corresponding R_i-deactivating directions d_i. By line 5 of the algorithm, we have A_i = A − ∪_{j<i} R_j for every i ∈ {0, …, n + 1}. Therefore, by Theorem 5, we have SOL(A) = SOL(A_{n+1}).

In this section, we show how to compose the generated sequence of R_i-deactivating directions d_i for A_i into a single (∪_{i=0}^n R_i)-deactivating direction for A. This can be done using the following composition rule:

Proof. First, if d′_t ≤ −1 for all t ∈ R, then d̄ = d′ satisfies the required condition immediately. Otherwise, δ > 0 since d_t < 0 for all t ∈ R by definition and −1 − d′_t < 0 due to d′_t > −1 in the definition of δ. We will show that d̄ satisfies the conditions in Definition 1.
For t ∈ R′, we have d′_t < 0 and d_t = 0 by definition due to R′ ⊆ A − R, thus d̄_t = d′_t + δd_t = d′_t < 0, which together with the previous paragraph yields condition (a).

Algorithm 2
The procedure compose takes the sequences (R_i)_{i=0}^n and (d_i)_{i=0}^n (generated by the procedure propagate) and a non-empty index set I ⊆ {0, …, n} and composes them into an R*-deactivating direction d* for A.

Theorem 6 allows us to combine an R_i-deactivating direction for A_i with an R_{i+1}-deactivating direction for A_{i+1}. Iteratively, we can thus gradually build a (∪_{i=0}^n R_i)-deactivating direction for A, which certifies unsatisfiability of A whenever Algorithm 1 detects on line 6 that A_{n+1} (and thus also A) is unsatisfiable.
However, it is not always necessary to construct a full (∪_{i=0}^n R_i)-deactivating direction. This is outlined in Algorithm 2, which composes only a subsequence of the directions d_i based on a given set of indices I ⊆ {0, …, n} and constructs an R*-deactivating direction with R* ⊇ ∪_{i∈I} R_i. Although Algorithm 2 is applicable for any set I, in our case I is obtained by taking a scope S ∈ C such that A_{n+1} ∩ T_S = ∅ and then setting I as in (32), so that (A − R*) ∩ T_S = ∅ due to the following fact: Let I be given by (32).

Proof. For any sets A, R, T′ ⊆ T, this follows directly from elementary set identities.
Correctness of Algorithm 2 is given by the following theorem:

Proof. The fact that R* ⊇ ∪_{i∈I} R_i is obvious: R_{max I} ⊆ R* by the initialization on line 2, and R_i ⊆ R* for any i ∈ I with i < max I because in that case the update on line 7 is performed. Similarly, R* ⊆ ∪_{i=0}^n R_i holds by the initialization of R* on line 2 and the updates on line 7. It remains to show that d* is R*-deactivating, which we prove by induction. We claim that the vector d* is always an R*-deactivating direction for A_i on line 3 and an R*-deactivating direction for A_{i+1} on line 5. Initially, we have d* = d_i (with i = max I), so d* is R_i-deactivating (i.e., R*-deactivating, since R* = R_i before the loop is entered) for A_i. Also, when the vector d* is first queried on line 5, i has been decreased by 1 due to the update on line 4, so d* is R*-deactivating for A_{i+1}. The required property thus holds when the condition on line 5 is first queried with i = max I − 1.
We proceed with the inductive step. If the condition on line 5 is not satisfied, then necessarily d*_t = 0 for all t ∈ R_i. So, if d* is R*-deactivating for A_{i+1}, then it is also R*-deactivating for A_i = A_{i+1} ∪ R_i, as seen from Definition 1.
If the condition on line 5 is satisfied, d* is R*-deactivating for A_{i+1} before the update on lines 6–7. Since A_{i+1} = A_i − R_i and d_i is R_i-deactivating for A_i, Theorem 6 can be applied to d_i and d* to obtain an (R* ∪ R_i)-deactivating direction for A_i. After updating R* on line 7, d* becomes R*-deactivating for A_i.
When eventually i = 0, d* is R*-deactivating for A_0 = A by line 2 of Algorithm 1.
Remark 3. This is similar to what the VAC [16] or Augmenting DAG algorithm [42,61] does for arc consistency. To attempt to disprove satisfiability of CSP A*(f), these algorithms enforce AC of A*(f), during which the causes for forbidding tuples are stored. If the empty AC closure of A*(f) is detected (which corresponds to T_S ∩ A_{n+1} = ∅ for some S ∈ C), these algorithms do not iterate through all previously forbidden tuples but only trace back the causes for forbidding the elements of the wiped-out domain (here, the elements of T_S).
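The composition rule d̄ = d′ + δd can be sketched as follows. The exact formula for δ below is our reconstruction from the surrounding proof sketch (it makes d̄_t ≤ −1 for all t ∈ R), so treat it as illustrative rather than the paper's equation; vectors are dicts mapping tuples to components, with missing keys meaning 0:

```python
def compose(d, R, d_prime, R_prime):
    """Compose an R-deactivating direction d for A with an R'-deactivating
    direction d' for A - R into a single (R ∪ R')-deactivating direction
    d_bar = d' + δ·d (δ formula is our reconstruction)."""
    def get(vec, t):
        return vec.get(t, 0.0)
    over = [t for t in R if get(d_prime, t) > -1.0]
    if not over:
        return dict(d_prime)   # d' alone already satisfies the conditions
    # δ > 0: numerator and denominator are both negative for t in `over`
    delta = max((-1.0 - get(d_prime, t)) / get(d, t) for t in over)
    keys = set(d) | set(d_prime)
    return {t: get(d_prime, t) + delta * get(d, t) for t in keys}
```

For t ∈ R′ the composed vector keeps d′_t (since d_t = 0 there), and for t ∈ R the added δd_t pushes the component strictly below zero, matching conditions (a) of Definition 1.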

Line Search
In §4.2 we showed how to construct an R-deactivating direction d for a CSP A, which certifies its unsatisfiability. To improve the current bound B(f), we need to find a step size α > 0 for the update f′ = f + αd, as discussed in §4.1.2. That means we need to find α > 0 such that B(f + αd) < B(f). This task is known in numerical optimization as line search.
Finding the best step size (i.e., exact line search) would require finding a global minimum of the univariate convex piecewise-affine function α ↦ B(f + αd). As this would be too expensive for large WCSP instances, we find only a suboptimal step size (approximate line search) in the following theorem.

Theorem 7. Let f ∈ R^T and let d be an R-deactivating direction for A*(f). Define the step-size limits β and γ from f and d. Then β, γ > 0 and, for every S ∈ C and α ∈ R, the WCSP f′ = f + αd satisfies:

Proof. We have β > 0 because d_t > 0 implies that t is an inactive tuple, so max_{t′∈T_S} f_{t′} > f_t. We have γ > 0 because in the difference f_{t′} − f_t, tuple t′ is always active and t is inactive, hence f_{t′} > f_t.
To prove (a), let t* ∈ (A*(f) − R) ∩ T_S. By Definition 1, d_{t*} = 0, so the value max_{t∈T_S} f′_t does not decrease for any α, since f′_{t*} = f_{t*} + αd_{t*} = f_{t*}. To show that the maximum does not increase, consider a tuple t ∈ T_S such that d_t > 0 (due to α ≥ 0, tuples with d_t ≤ 0 cannot increase the maximum). It follows from α ≤ β that f′_t cannot exceed max_{t′∈T_S} f_{t′}.

To prove (c), let (A*(f) − R) ∩ T_S = ∅. For all t ∈ T_S ∩ R, we have f′_t = f_t + αd_t < f_t by d_t < 0 and α > 0, i.e., max_{t∈T_S∩R} f′_t < max_{t∈T_S∩R} f_t. We proceed to show that f′_t ≤ max_{t′∈T_S∩R} f′_{t′} for the remaining tuples t ∈ T_S − R.

Algorithm 3 The final algorithm to iteratively improve feasible solutions to (19).
Pseudocode of Algorithm 3 (fragment): if S ≠ ∅, define I as in (32) and update f := f + min{β, γ}d* following Theorem 7.

If d is an R-deactivating direction for CSP A*(f) and for all S ∈ C we have (A*(f) − R) ∩ T_S ≠ ∅, then, by Theorem 7(a,b), there is α > 0 such that A*(f + αd) = A*(f) − R. This justifies why such a direction d is called R-deactivating: a suitable update of f along this direction makes the tuples in R inactive.

Remark 4. This might suggest that to improve the current bound B(f), we need not use Algorithm 2 to construct an R*-deactivating direction d* with (A*(f) − R*) ∩ T_S = ∅ for some S ∈ C, but could instead perform steps using the intermediate R_i-deactivating directions d_i. Unfortunately, it is hard to make this work reliably, as there are many choices for the intermediate step sizes 0 < α_i < β_i. We empirically found Algorithm 3 to be preferable.
If d is an R-deactivating direction for A*(f) and for some S ∈ C we have (A*(f) − R) ∩ T_S = ∅, then, by Theorem 7(a,c), there is α > 0 such that f′ = f + αd satisfies B(f′) < B(f). The following corollary of Theorem 7 finally justifies why the certificate d of unsatisfiability of CSP A*(f) is an improving direction for (19):

Proof. First, if for some S ∈ C we have A ∩ T_S = ∅, then A is unsatisfiable and no f ∈ R^T satisfies A = A*(f), so the second condition is trivially satisfied by choosing any d ∈ M*.
Otherwise, let d be any A-deactivating direction (which exists by Theorem 4). It follows from Theorem 7 that for any f ∈ R^T with A*(f) = A, we can compute a suitable step size α > 0 such that B(f + αd) < B(f). The remaining part follows from Theorem 3.
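To make the line-search step concrete, here is a small sketch. It is our reconstruction: the paper's exact formulas for β and γ are in display equations not reproduced here, so we illustrate only the bound B(f) and the limit β (the largest step for which no tuple with d_t > 0 overtakes the maximum of its weight function):

```python
def bound(f):
    """Upper bound B(f): sum over scopes of the maximum weight in the scope.
    f: dict scope -> dict tuple -> weight."""
    return sum(max(w.values()) for w in f.values())

def step_beta(f, d):
    """Step-size limit β (our reconstruction): the largest α such that no
    tuple with positive direction component exceeds the current maximum
    of its weight function. d: dict scope -> dict tuple -> component."""
    beta = float('inf')
    for S, w in f.items():
        top = max(w.values())
        for t, wt in w.items():
            dt = d.get(S, {}).get(t, 0.0)
            if dt > 0:
                beta = min(beta, (top - wt) / dt)
    return beta
```

With α ≤ β the per-scope maxima (and hence B) cannot increase, matching cases (a) and (b) of Theorem 7.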

Final Algorithm
Having certificates of unsatisfiability from §4.2 and step sizes from §4.3, we can now formulate in detail the iterative method outlined in §4.1.2; see Algorithm 3. First, constraint propagation is applied to CSP A*(f) by Algorithm 1 until either A*(f) is proved unsatisfiable or no more propagation is possible. In the latter case, the algorithm halts and returns B(f) as the best achieved upper bound on the optimal value of WCSP g. Otherwise, if A*(f) is proved unsatisfiable due to A_{n+1} ∩ T_S = ∅ for some S ∈ C, we define I as in (32) so that (A*(f) − ∪_{i∈I} R_i) ∩ T_S = ∅ and compute an R*-deactivating direction d* with R* ⊇ ∪_{i∈I} R_i using Proposition 2. Since (A*(f) − R*) ∩ T_S = ∅, we can update WCSP f using Theorem 7. Consequently, the bound B(f) strictly improves after each update on line 7.
Remark 5. In the maximization version of WCSP, hard constraints can be modelled by allowing minus-infinite weights, i.e., we then have g ∈ (R ∪ {−∞})^T. We argue that Algorithm 3 can be easily extended to such a setting. Without loss of generality, one can assume that max_{t∈T_S} g_t > −∞ for each S ∈ C (33), i.e., there is at least one finite weight in each scope of the input WCSP g (as otherwise the WCSP is infeasible). With this assumption, the definition of the active-tuple CSP A*(·) remains unchanged and the tuples with minus-infinite weights are never active. Next, observe that propagation in the active-tuple CSP and the construction of the improving direction depend only on A*(f), so these subroutines need not be modified; consequently, the improving direction d* still has only finite components, i.e., d* ∈ R^T.
The only difference can arise when computing the step size α = min{β, γ} by Theorem 7. If f ∈ R^T, then α is always finite. In contrast, if f ∈ (R ∪ {−∞})^T, it may happen that β = γ = ∞, so α = min{β, γ} = ∞, where we assume the usual arithmetic with infinities, e.g., a − (−∞) = ∞ for a ∈ R. As discussed earlier, the weights of the active tuples are always finite, which avoids indeterminate expressions when computing β and γ. Note that this arithmetic with infinities is different from addition with the ceiling operator [16,43] (unless the ceiling is infinite).
Next, we comment on the finite and the infinite case:

• If the computed step size is finite, then αd* ∈ R^T and the update on line 7 can be performed following the aforementioned arithmetic with infinities. In this case, condition (33) holds for the updated f, since the set of tuples with minus-infinite weight (i.e., the set {t ∈ T | f_t = −∞}) is unchanged by the update, and we can continue with the next iteration.
• On the other hand, if the step size is infinite, then the bound B(f + αd*) can be made arbitrarily low by setting α large enough. Stated formally, ∀b ∈ R ∃α > 0 : B(f + αd*) ≤ b, which proves infeasibility of the WCSP instance, so the algorithm should return −∞ and terminate.
All in all, if hard constraints are allowed, the only required change in Algorithm 3 is that, if β = γ = ∞ on line 7, then the algorithm should terminate and return −∞ (which is an upper bound on the infeasible initial WCSP).
In Algorithm 3 we additionally used a heuristic analogous to capacity scaling in network flow algorithms [1, §7.3]. On line 3 of Algorithm 3, we replace the set of active tuples A*(f) with the set A*_θ(f) of 'almost' active tuples (those whose weight is within a threshold θ > 0 of the maximum of their weight function). This forces the algorithm to disprove satisfiability using tuples that are far from being active, thus hopefully leading to larger step sizes and a faster decrease of the bound. Initially, θ is set to a high value, and whenever we are unable to disprove satisfiability of A*_θ(f), the current θ is decreased as θ := θ/10. The process continues until θ becomes very small.
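The thresholded active-tuple set and the decrease schedule can be sketched as follows (an illustrative representation of ours; the weight-function layout and parameter names are assumptions):

```python
def active_tuples(f, theta=0.0):
    """Return A*_θ(f): tuples whose weight is within θ of the maximum of
    their weight function (θ = 0 gives the exactly-active tuples A*(f)).
    f: dict scope -> dict tuple -> weight."""
    active = set()
    for S, w in f.items():
        top = max(w.values())
        active |= {(S, t) for t, wt in w.items() if top - wt <= theta}
    return active

def theta_schedule(theta0, factor=10.0, floor=1e-6):
    """Threshold schedule used in the text: start high, divide by 10
    whenever satisfiability of A*_θ(f) cannot be disproved, and stop
    once θ falls below a small floor."""
    t = theta0
    while t > floor:
        yield t
        t /= factor
```

With a large θ the propagator works on a larger (hence easier to refute) CSP, which tends to produce larger step sizes early on.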
Although our theoretical results are more general, our implementation is limited to binary WCSPs, i.e., instances where the maximum arity of the weighted constraints is at most 2. We implemented two versions of Algorithm 3 (including capacity scaling), differing in the local consistency used to attempt to disprove satisfiability of CSP A*(f):

• Virtual singleton arc consistency via super-reparametrizations (VSAC-SR) uses singleton arc consistency. Precisely, we alternate between the AC and SAC propagators: whenever a single tuple (i, k) ∈ V × D is removed by SAC, we step back to enforcing AC until no more AC propagations are possible, and repeat.
• Virtual cycle consistency via super-reparametrizations (VCC-SR) is the same as VSAC-SR except that SAC is replaced by CC. Although our implementation differs from [41] (we compose deactivating directions rather than alternating between the cycle-repair procedure and the Augmenting DAG algorithm), it has the same fixed points.
The procedures for generating deactivating directions for AC, SAC, and CC were implemented as described in Examples 3, 5, and 4, respectively. We used the AC3 algorithm to enforce AC. In SAC and CC, it is useful to step back to AC whenever possible because the deactivating directions of AC correspond to reparametrizations rather than super-reparametrizations, which is desirable as explained in §4.1.1.
The propagated scope S ∈ E and variable i ∈ V are chosen as the edge and the variable with the lowest index (based on the indexing in the input instance), respectively. The terminating condition was θ ≤ 10^{−6}. Let us note that if capacity scaling is used with θ > 0 and the construction of the improving direction is deterministic (which is the case in our implementation), then the method is guaranteed to terminate after a finite number of iterations. This follows from our more general results stated in [25, §2.2.1]. In order to improve the efficiency of our method, we also decreased θ whenever the bound did not improve by more than 10^{−15} in 20 consecutive iterations.
Remark 6. In analogy to [16,45], let us call a WCSP instance f virtual Φ-consistent (e.g., virtual AC or virtual RPC) if A*(f) has a non-empty Φ-consistency closure. A virtual Φ-consistency algorithm then naturally refers to an algorithm that transforms a given WCSP instance into a virtual Φ-consistent one. In the VAC algorithm, this transformation is equivalence-preserving, i.e., a reparametrization. In our case, however, it is a super-reparametrization, which is why we call our algorithms VSAC-SR and VCC-SR.
Since we restricted ourselves to binary WCSPs, let E = { S ∈ C | |S| = 2 }, so that (V, E) is an undirected graph. The cycles in VCC-SR were chosen as follows: if 2|E|/|V| ≤ 5 (i.e., the average degree of the nodes in (V, E) is at most 5), then all cycles of length 3 and 4 present in the graph (V, E) are used. If 5 < 2|E|/|V| ≤ 10, then all cycles of length 3 present in the graph are used. If 2|E|/|V| > 10, or if the above method did not result in any cycles, we use all fundamental cycles w.r.t. a spanning tree of the graph (V, E). No additional edges are added to the graph. Note that [41] experimented with grid graphs (where cycles of length 4 and 6 of the grid were used) and complete graphs (where cycles of length 3 were used).
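The cycle-selection rule above is a simple function of the average degree, which can be sketched directly (function name is ours):

```python
def choose_cycle_lengths(num_vars, num_edges):
    """Cycle selection rule from the text: average degree at most 5 ->
    cycles of length 3 and 4; at most 10 -> length 3 only; otherwise
    (or when no such cycles exist) fall back to the fundamental cycles
    of a spanning tree."""
    avg_deg = 2.0 * num_edges / num_vars
    if avg_deg <= 5:
        return {3, 4}
    if avg_deg <= 10:
        return {3}
    return 'fundamental'
```

The denser the graph, the more short cycles it contains, so restricting to triangles (or falling back to fundamental cycles) keeps the cycle set polynomially sized.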
Since both VSAC-SR and VCC-SR start by enforcing VAC (i.e., making A*(f) arc consistent by reparametrizations), before running these methods we used toulbar2 to reparametrize the input WCSP instance to a VAC state (because a specialized algorithm is faster than the more general Algorithm 3). We employed specialized data structures for storing the sequences (R_i)_{i=0}^n and (d_i)_{i=0}^n from Algorithm 1, which exploit the fact that the sets (R_i)_{i=0}^n are disjoint and ease sequential querying of the (sparse) vectors (d_i)_{i=0}^n in Algorithm 2. Note that the sequence (A_i)_{i=0}^{n+1} need not be stored and is only needed for the theoretical analysis. Moreover, sparse representations were used when composing deactivating directions in Algorithm 2. To avoid working with 'structured' tuples (1), we employed a bijection between T and {1, …, |T|} to work with numerical indices instead.
Besides the above improvements, we did not fine-tune our implementation for efficiency. Thus, the set A*(f) was always calculated by iterating through all tuples (which could be made faster if the sparsity of the improving direction were taken into account). The hyper-parameters of our algorithm (e.g., the decrease schedule of θ or the constants mentioned in Footnote 19) were neither learned nor systematically optimized. SAC was checked on all active tuples without warm-starting or using any faster SAC algorithm than SAC1 [9,22]. Perhaps most importantly, we did not implement inter-iteration warm-starting as in [60,24], i.e., after updating the weights on line 6 of Algorithm 3, some deactivating directions in the sequence that were not used to compose the improving direction could be preserved for the next iteration instead of being computed from scratch. Except for computing deactivating directions, the code was the same for VSAC-SR and VCC-SR. We implemented everything in Java.

Experiments
We compared the bounds calculated by VSAC-SR and VCC-SR with the bounds provided by EDAC [21], VAC [16], pseudo-triangles (option -t=8000 in toulbar2, which adds up to 8 GB of ternary weight functions), PIC, EDPIC, maxRPC, and EDmaxRPC [45], which are implemented in toulbar2 [6]. Our motivation for choosing these local consistencies is as follows: EDAC is the local consistency typically maintained during branch-and-bound search. VAC is closely related to our approach and can be used in pre-processing (as it is faster than OSAC, which is usually too memory- and time-consuming for practical purposes). Finally, we consider a class of recently proposed triangle-based consistencies [45] that enforce stronger forms of local consistency.
We performed the comparison on the Cost Function Library benchmark [4]. Due to limited computational resources, we used only the smallest 16500 instances (out of 18132). Of these, we omitted instances containing weight functions of arity 3 or higher. Moreover, to avoid easy instances, we omitted instances that were solved by VAC without search (i.e., toulbar2 with options -A -bt=0 found an optimal solution). We also omitted the validation instances that are used for testing and debugging. Overall, 5371 instances were left for our comparison.
For each instance and each method, we only calculated the upper bound and did not do any search. For each instance and method, we computed the normalized bound (B_w − B_m)/(B_w − B_b), where B_m is the bound computed by the method for the instance and B_w and B_b are the worst and the best bound for the instance among all the methods, respectively. Thus, the best bound transforms to 1 and the worst bound to 0, i.e., greater is better. For 26 instances, at least one method was not able to finish within the prespecified 1-hour CPU-time limit. These timed-out methods were omitted from the calculation of the normalized bounds for these instances, and such an instance was not incorporated into the average of the normalized bounds of the timed-out method. We note that the implementations of VSAC-SR and VCC-SR provide a bound when terminated at any time, whereas the implementations of the other methods provide a bound only when they are left to finish. Time-outs happened 5, 2, 3, 6, and 24 times for pseudo-triangles, PIC, EDPIC, maxRPC, and EDmaxRPC, respectively. This did not affect the results much, as there were 5371 instances in total.
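The normalization is a one-liner per instance; the following sketch (our own helper, with timed-out methods simply absent from the input) makes the convention explicit:

```python
def normalized_bounds(bounds):
    """Normalize per-instance upper bounds as (B_w - B_m) / (B_w - B_b):
    the best (smallest) bound maps to 1, the worst (largest) to 0.
    `bounds`: dict method -> bound; methods that timed out are omitted."""
    b_best, b_worst = min(bounds.values()), max(bounds.values())
    span = b_worst - b_best
    if span == 0:
        return {m: 1.0 for m in bounds}   # all methods tied
    return {m: (b_worst - b) / span for m, b in bounds.items()}
```

Since these are upper bounds on a maximization problem, smaller is better, so B_b = min and B_w = max.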
The results in Table 1 show that no method is best for all instance groups; instead, each method is suitable for a different group. However, VSAC-SR performed best for most groups and was otherwise often competitive with the other strong consistency methods. VSAC-SR seems particularly good at spinglass maxcut [5], planning [19], and qplib [30] instances. Taking the overall unweighted average of the group averages (giving the same importance to each group), VSAC-SR achieved the greatest average value. We also evaluated the ratio to the worst bound, B_m/B_w, for instances with B_w ≠ 0; the results were qualitatively the same: VSAC-SR again achieved the best overall average of 3.93 (or 4.15 if only groups with ≥ 5 instances are considered), compared to the second-best pseudo-triangles with 2.71 (or 2.84).
The runtimes (on a laptop with an i7-4710MQ processor at 2.5 GHz and 16 GB RAM) are reported in Table 2. Again, the results are group-dependent, and one can observe that the methods explore different trade-offs between bound quality and runtime. However, the strong consistencies are comparable in terms of runtime on average, except for pseudo-triangles, which is faster but needs significantly more memory.

Additional Properties of Super-Reparametrizations
In this section, we present a more detailed study of the properties of WCSPs that are preserved by (possibly optimal) super-reparametrizations. To that end, we first revisit in §5.1 the notion of a minimal CSP for a set of assignments. The key result of §5 is presented in §5.2, where we study the relation of the set of optimal assignments of a WCSP to the set of optimal assignments of its super-reparametrizations optimal for (19), showing that the two sets need not coincide in general. In §5.3, we give some properties of general (i.e., not necessarily optimal for (19)) super-reparametrizations.

Minimal CSP
Let us ask when, for a given set X ⊆ D^V of assignments (i.e., a |V|-ary relation over D), there exists A ⊆ T such that X = SOL(A), i.e., when X is representable as the solution set of a CSP with the given structure (D, V, C). For that, denote A↑(X) = { A ⊆ T | X ⊆ SOL(A) } and A_min(X) = ∩_{A ∈ A↑(X)} A. Thus, A↑(X) is the set of all CSPs whose solution set includes X, and A_min(X) is the intersection of these CSPs. We call A_min(X) the minimal CSP for X. For CSPs with only binary relations, this concept was studied in [44] and [23, §2.3.2].
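A_min(X) has a direct construction: a tuple must be allowed by every CSP whose solution set contains X exactly when it is the projection of some assignment in X onto the corresponding scope. The following sketch (our own helper names and encoding) builds A_min(X) this way and evaluates SOL(A) by enumeration, so one can test whether X = SOL(A_min(X)):

```python
from itertools import product

def minimal_csp(X, scopes):
    """A_min(X): allow exactly the projections of the assignments in X
    onto each scope. X: list of assignments (dicts var -> value);
    scopes: list of variable tuples."""
    return {(S, tuple(x[v] for v in S)) for x in X for S in scopes}

def solutions(A, scopes, variables, domain):
    """SOL(A): assignments whose projection onto every scope is allowed."""
    sols = []
    for values in product(domain, repeat=len(variables)):
        x = dict(zip(variables, values))
        if all((S, tuple(x[v] for v in S)) in A for S in scopes):
            sols.append(x)
    return sols
```

When X is not representable with the given structure, SOL(A_min(X)) is a strict superset of X, which is exactly the situation studied in §5.2.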

Optimal Assignments from Optimal Super-Reparametrizations
Theorem 2 says that the optimal value of (19) coincides with the optimal value max_x ⟨g, φ(x)⟩ of WCSP g. We now focus on the optimal assignments (rather than the optimal value) of WCSP g. For brevity, we denote the set of all optimal assignments of WCSP g by OPT(g).

Proof. To show OPT(g) ⊆ OPT(f), let x* ∈ OPT(g). By Theorem 2, ⟨g, φ(x*)⟩ = B(f). The claim then follows analogously to the proof of Theorem 2, using B(f) and Theorem 1.
Our main goal in §5 is to characterize when the inclusion in Theorem 9 holds with equality, which is given by Theorem 11 below.

Proposition 6. For every g ∈ R^T and A ⊆ T such that OPT(g) ⊆ SOL(A), there exists f ∈ R^T optimal for (19) such that A = A*(f).
Proof. Define the vector f as in (39), where F_1 and F_2 are the best and the second-best objective values of WCSP g. Note that if OPT(g) = D^V, then F_2 is undefined, but this does not matter because it is never used in (39). Since ∅ ≠ OPT(g) ⊆ SOL(A), CSP A is satisfiable. Therefore, for each S ∈ C we have A ∩ T_S ≠ ∅, hence the maximum in each scope is attained by a tuple from A. The equality A = A*(f) now follows from (39).
To show that f is optimal for (19), we use (41) to obtain B(f) = Σ_{S∈C} F_1/|C| = F_1 = max_x ⟨g, φ(x)⟩ and apply Theorem 2.
Theorem 10. For every g ∈ R^T, we have the equality (42).

Proof. The inclusion ⊇ says that for every optimal f we have OPT(g) ⊆ SOL(A*(f)), which was proved in Theorem 9. The inclusion ⊆ was proved in Proposition 6.

Now we combine the results of §5.1 and §5.2 to obtain the main result of §5. First observe that, by (37), the set (42) is just the interval [A_min(OPT(g)), T].
Theorem 11. For every g ∈ R^T, the following statements are equivalent: (a) OPT(g) = SOL(A) for some A ⊆ T; (b) OPT(g) = OPT(f) for some f optimal for (19).
By the results of §5.1, statement (a) is equivalent to OPT(g) = SOL(A_min(OPT(g))). Therefore, if (a) holds, then (b) holds for the above f. In the other direction, if (b) holds for the above f, then (a) holds.
Theorem 11 shows that the inclusion in Theorem 9 holds with equality for some optimal f if and only if the set OPT(g) of optimal assignments of WCSP g is representable as the solution set of some CSP with the same structure. If no such CSP exists, then OPT(g) ⊊ OPT(f) for all optimal f. An example of a WCSP g for which no such CSP exists is given in Figure 6.
It is natural to ask which WCSPs possess this property. Though we are currently unable to provide a full characterization of such WCSPs, we identify two such classes:

Theorem 12 ([51,61]). If the LP relaxation (13) of a WCSP g ∈ R^T is tight, then OPT(g) = SOL(A) for some A ⊆ T.
Proof. If the LP relaxation (13) is tight, then there exists a vector f ∈ R^T such that B(f) = max_{x∈D^V} ⟨g, φ(x)⟩ and f is a reparametrization of g, i.e., ⟨f, φ(x)⟩ = ⟨g, φ(x)⟩ for all x ∈ D^V; thus, f is also optimal for (19). It follows that the sets of optimal assignments of f and g coincide. By Theorem 9, A*(f) is the required CSP.

Properties of General Super-Reparametrizations
Finally, we present one property of general super-reparametrizations f of a fixed WCSP g ∈ R^T, i.e., f is only feasible (but possibly not optimal) for (19).

Theorem 14. For every g ∈ R^T we have:

Let G be the graph obtained from G* by adding 4 new vertices and including an edge between each pair of these new vertices. Let CSP A have the structure (D, V, C) where |D| = 3 and the scopes in C correspond to the edges of G. Hence, any x ∈ D^V can be interpreted as an assignment of colors to the nodes of G, and x ∈ SOL(A) if and only if x is a 3-coloring of G. Since G contains K_4 as a subgraph, it is not 3-colorable, so A is unsatisfiable. Hence, setting R = A satisfies SOL(A − R) = SOL(∅) = ∅ = SOL(A).
For the purpose of our reduction, let us define δ = (|C| − 2)/2 > 0. We will show that, for this setting, d is not an R-deactivating direction for A if and only if G* is 3-colorable.
Plugging the above-defined sets A and R into the definition of d in (27) yields (46), where COL(x) denotes the number of edges whose incident vertices are assigned distinct colors by x. If G* is 3-colorable, there exists an assignment x ∈ D^V such that, for only a single edge in C − E* (i.e., an edge of the graph K_4), the adjacent vertices are assigned the same color, so ⟨d, φ(x)⟩ = −|C|/2 < 0 by (46) and the definition of δ. Hence, d is not an R-deactivating direction for A.
For the other case, if G* is not 3-colorable, then COL(x) ≤ |C| − 2 for any x ∈ D^V. The reason is that, for at least one edge of K_4 and at least one edge of G*, the adjacent vertices are assigned the same color in any assignment. By substituting the value of δ into (46) and a simple manipulation, we obtain an expression in which the term in brackets is non-negative due to COL(x) ≤ |C| − 2 for any x ∈ D^V. So, d is an R-deactivating direction for A.
In connection to §5.1, a number of decision problems concerning the minimal CSP have also been proved hard. For recent results, see [32,28].

Summary and Discussion
We have proposed a method to compute upper bounds on the optimal value of the (maximization version of) WCSP. The WCSP is formulated as a linear program with an exponential number of constraints, whose feasible solutions are super-reparametrizations of the input WCSP instance (i.e., WCSP instances with the same structure and greater or equal objective values). Whenever the CSP formed by the active (i.e., maximal in their weight functions) tuples of a feasible WCSP instance is unsatisfiable, there exists an improving direction (in fact, a certificate of unsatisfiability of this CSP) for the linear program. As this approach provides only a subset of all possible improving directions, it can be seen as a local search. We showed how these improving directions can be generated by constraint propagation (or, more generally, by other methods for proving unsatisfiability of a CSP). We also showed that super-reparametrizations are closely related to the cone dual to the well-known marginal polytope.
Special cases of our approach are the VAC / Augmenting DAG algorithms [16,42,61], which use arc consistency, and the algorithm in [41], which uses cycle consistency. We have implemented the approach for singleton arc consistency, resulting in the VSAC-SR algorithm. When compared to existing soft local consistency methods on a public dataset, VSAC-SR provides comparable or better bounds for many instances. Although its runtimes are higher than those of simpler techniques such as EDAC or VAC, one can control the trade-off between bound quality and runtime by stopping the method prematurely, e.g., when the step size becomes small, or by terminating already at a greater value of θ (see Footnote 19).
The approach in general requires storing all the weights of the super-reparametrized WCSP instance. This may be a drawback when the domains are large and/or the weight functions are not given explicitly as a table of values but rather by an algorithm (oracle).
We expect our improved bounds to be useful when solving practical WCSP instances. Applications may include, e.g., using the method in preprocessing, pruning the search space during branch-and-bound search, providing tighter optimality gaps for solutions proposed by heuristic approaches, or generating high-quality proposals for solutions, as in [41]. However, we have done no experiments on this, so it is open whether the tighter bounds would outweigh the higher complexity of the algorithm. Due to the many ways in which the method can be used, we leave this for future research. In addition, our approach can also be useful for solving more WCSP instances without search (similarly to how the VAC algorithm solves all supermodular WCSPs without search) or, given a suitable primal heuristic, for solving WCSP instances approximately.
The approach can be straightforwardly extended to WCSPs with different domain sizes (see Footnote 26) and some weights equal to minus infinity (i.e., some constraints being hard). Of course, further experiments would be needed to evaluate the quality of the bounds if infinite weights are allowed. The WCSP framework also usually assumes a pre-defined finite bound that is updated during branch-and-bound [18]; although the presented pseudocode does not support this, it is not difficult to extend it in this way.
Finally, we presented a theoretical analysis of the concept of super-reparametrizations of WCSPs, describing the properties of optimal super-reparametrizations and characterizing the set of active-tuple CSPs induced by different optimal super-reparametrizations. For example, even an optimal super-reparametrization may change the set of optimal assignments, as shown in §5.2. Additionally, we have shown that general (i.e., possibly non-optimal) super-reparametrizations are only weakly related to the original WCSP instance.

(Figure 7e: WCSP f 3 with B(f 3) = 47. This WCSP is VAC but not VSAC.)

We now proceed to show our example where EDAC, VAC, and VSAC will be gradually enforced. The initial WCSP f 1 is depicted in Figure 7a. The structure of this WCSP is (D, V, C) where D = {a, b}, V = {1, 2, 3}, and C = {{1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}}. The names of the variables are indicated in the figures. To simplify the figures throughout this example, we do not state the names of the values in them: the upper value is a and the lower value is b. The optimal objective value of WCSP f 1 is 43, which is attained, e.g., by the assignment x = (a, a, a).
WCSP f 1 is not EDAC (w.r.t. the natural ordering by ≤) because the tuple ({1}, b) is not fully supported by variable 2. To make WCSP f 1 EDAC, it is sufficient to shift weight from the unary tuple ({2}, a) to the binary weight function with scope {1, 2}, which results in WCSP f 2 depicted in Figure 7b. WCSP f 2 is a reparametrization of f 1 because f 2 − f 1 ∈ M⊥ (depicted in Figure 7c). WCSP f 2 is EDAC w.r.t. ≤ but it is not VAC because the AC closure of A*(f 2) is empty. To see this, we can follow the propagations depicted in Figure 7d, where the arrows point from the cause of forbidding a tuple to the newly forbidden tuple. First, we can forbid the tuple ({1, 2}, (a, a)) because the tuple ({1}, a) is forbidden. Second, we can forbid the tuple ({2}, a) because both tuples ({1, 2}, (a, a)) and ({1, 2}, (b, a)) are forbidden. Next, we gradually forbid ({2, 3}, (a, a)), ({3}, a), ({1, 3}, (a, b)), and ({3}, b). This leads to a domain wipe-out in variable 3. By shifting the weights against the direction of the arrows (as depicted in Figure 7d), we make this WCSP VAC. This yields the WCSP f 3 in Figure 7e, which is a reparametrization of f 2. For clarity, we also show how the weights were transformed in Figure 7f.
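As an aside, the weight shift used here (moving weight from a unary tuple to a binary weight function) is easy to state in code. The following is a minimal Python sketch of such an equivalence-preserving transformation; the weights and helper names are invented for illustration and are not those of f 1 in Figure 7a.

```python
from itertools import product

# Toy WCSP on variables 1, 2 with domain {'a', 'b'}; illustrative weights only.
unary = {1: {'a': 2.0, 'b': 0.0}, 2: {'a': 3.0, 'b': 1.0}}
binary = {(1, 2): {(u, v): 0.0 for u in 'ab' for v in 'ab'}}

def objective(x):
    """Total weight of the assignment x = {variable: value}."""
    return (sum(unary[i][x[i]] for i in unary)
            + sum(binary[S][(x[S[0]], x[S[1]])] for S in binary))

def shift_unary_to_binary(i, value, scope, amount):
    """Move `amount` of weight from the unary tuple ({i}, value) to the
    weight function with scope `scope`: subtract it from the unary weight
    and add it to every binary tuple that assigns `value` to variable i.
    Every assignment's objective is unchanged, so this is a reparametrization."""
    unary[i][value] -= amount
    pos = scope.index(i)
    for t in binary[scope]:
        if t[pos] == value:
            binary[scope][t] += amount

assignments = [dict(zip((1, 2), vals)) for vals in product('ab', repeat=2)]
before = [objective(x) for x in assignments]
shift_unary_to_binary(2, 'a', (1, 2), 1.0)  # analogous to the ({2}, a) shift above
after = [objective(x) for x in assignments]
print(before == after)  # -> True: the objective is preserved
```

Shifting in the opposite direction (binary to unary) works symmetrically, which is how weights are moved "against the arrows" when enforcing VAC.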
WCSP f 3 is VAC and even OSAC, so the bound B(f 3) cannot be improved by reparametrizations (without introducing a ternary weight function with scope {1, 2, 3}). However, WCSP f 3 is not VSAC because its SAC closure is empty. Thus, A*(f 3) is unsatisfiable and we are able to construct a super-reparametrization of WCSP f 3 with a better bound (recall Theorem 3 and §4.1.2). Next, we show the details of the construction, following our results from §4.2, §4.3, and §4.4.

Figure 1: Visualisations of two WCSPs f and d with structure as in Example 2. Variables (elements of V) are depicted as rounded rectangles, tuples (elements of T) as circles and line segments, and weights f_t (and d_t) are written next to the circles and line segments. Black circles and full lines indicate active tuples, whereas white circles and dashed lines indicate non-active tuples.

Figure 2: Example of one iteration on a binary WCSP whose (hyper)graph is a cycle of length 4.

Figure 4: Illustration of the iterative scheme: B(g) and B(f_k) are shown by the full lines, max_x ⟨g, φ(x)⟩ and max_x ⟨f_k, φ(x)⟩ are represented by the dashed lines.
Define n₁(x) = |{ S ∈ C | (S, x[S]) ∈ R }| and n₂(x) = |{ S ∈ C | (S, x[S]) ∈ T − A }|. For contradiction, let x ∈ D^V satisfy ⟨d, φ(x)⟩ < 0. This implies n₁(x) > 0 and n₂(x) = 0, where the latter is because n₁(x) ≤ δ by the definition of δ. That is, we have (S*, x[S*]) ∈ R for some S* ∈ C and (S, x[S]) ∈ A for all S ∈ C. But the latter means x ∈ SOL(A) and the former implies x ∉ SOL(A − R), a contradiction.

Theorem 5. Let A ⊆ T and R ⊆ A. If there exists an R-deactivating direction for A, then SOL(A) = SOL(A − R).

Proof. Observe that SOL(A) = SOL(A − R) is equivalent to SOL(A) ⊆ SOL(A − R) because forbidding tuples may only remove solutions, i.e., SOL is an isotone map (see §5.1). Let d be an R-deactivating direction for A and let x ∈ SOL(A) − SOL(A − R), so (S, x[S]) ∈ R for some S ∈ C. By (4), we have ⟨d, φ(x)⟩ < 0 because d_S(x[S]) = 0 for all (S, x[S]) ∈ A − R by condition (b) in Definition 1 and d_S(x[S]) < 0 for all (S, x[S]) ∈ R by condition (a). This contradicts d ∈ M*.
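The two arguments above can be checked mechanically on a small instance. The following Python sketch is a toy verification (the CSP, the sets A and R, and δ are invented for illustration): it builds a direction d of the form used above (dₜ = −1 on R, 0 on A − R, δ on T − A) and confirms that ⟨d, φ(x)⟩ ≥ 0 for every assignment and that SOL(A) = SOL(A − R), as Theorem 5 asserts.

```python
from itertools import product

# Toy CSP: variables 0, 1 with domain {0, 1}; scopes {0}, {1}, {0, 1}.
V, D = (0, 1), (0, 1)
scopes = [(0,), (1,), (0, 1)]
T = [(S, vals) for S in scopes for vals in product(D, repeat=len(S))]

A = {((0,), (0,)), ((1,), (0,)), ((1,), (1,)), ((0, 1), (0, 0))}  # allowed tuples
R = {((1,), (1,))}  # value 1 of variable 1 has no support in scope {0, 1}

def sol(allowed):
    """Assignments x whose restriction to every scope is an allowed tuple."""
    return {x for x in product(D, repeat=len(V))
            if all((S, tuple(x[i] for i in S)) in allowed for S in scopes)}

# Direction of the form above: d_t = -1 on R, 0 on A - R, delta on T - A,
# with delta >= max_x n1(x) so that <d, phi(x)> >= 0 for every assignment x.
delta = 1
d = {t: (-1 if t in R else 0 if t in A else delta) for t in T}

def inner(x):  # <d, phi(x)> = sum of d over the tuples used by x
    return sum(d[(S, tuple(x[i] for i in S))] for S in scopes)

assert all(inner(x) >= 0 for x in product(D, repeat=len(V)))  # d is in M*
assert sol(A) == sol(A - R)  # forbidding R does not change the solution set
print(sol(A))  # -> {(0, 0)}
```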

Example 3. Let us consider (generalized) arc consistency (AC). A CSP A is (G)AC if for all S ∈ C, i ∈ S, and k ∈ D we have the equivalence¹³
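To make the propagation concrete, here is a minimal Python sketch of computing an AC closure of a CSP given as a set of allowed tuples. The support condition coded here is the standard one (a value is kept only if every scope containing its variable has an allowed tuple using it); the precise equivalence from Footnote 13 may differ in details, and the instance in the usage example is invented.

```python
def ac_closure(variables, scopes, allowed):
    """Repeatedly forbid unsupported tuples until a fixpoint is reached.
    Returns the AC closure, or None on a domain wipe-out."""
    A = set(allowed)
    changed = True
    while changed:
        changed = False
        # Forbid a non-unary tuple if one of the values it uses is forbidden.
        for S in (S for S in scopes if len(S) > 1):
            for t in [t for (Sc, t) in A if Sc == S]:
                if any(((i,), (t[S.index(i)],)) not in A for i in S):
                    A.discard((S, t)); changed = True
        # Forbid a value if some scope offers it no allowed tuple (no support).
        for i in variables:
            for (k,) in [t for (Sc, t) in A if Sc == (i,)]:
                for S in scopes:
                    if len(S) > 1 and i in S and not any(
                            Sc == S and t[S.index(i)] == k for (Sc, t) in A):
                        A.discard(((i,), (k,))); changed = True
        # Domain wipe-out: some variable has no allowed value left.
        for i in variables:
            if not any(Sc == (i,) for (Sc, _) in A):
                return None
    return A

# A tiny unsatisfiable CSP: both binary tuples need value b of variable 1,
# but ({1}, b) is already forbidden -- propagation wipes out variable 1.
bad = {((1,), ('a',)), ((2,), ('a',)), ((2,), ('b',)),
       ((1, 2), ('b', 'a')), ((1, 2), ('b', 'b'))}
print(ac_closure((1, 2), [(1,), (2,), (1, 2)], bad))  # -> None
```

An empty (None) closure is exactly the certificate of unsatisfiability exploited in the main method; recording which tuples were forbidden at each step yields the sets Rᵢ used to build deactivating directions.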
It may be unnecessary to use a (⋃ᵢ₌₀ⁿ Rᵢ)-deactivating direction because not every iteration of constraint propagation may have been necessary to prove unsatisfiability of A. Instead, we can use the scope S ∈ C satisfying Aₙ₊₁ ∩ T_S = ∅ (where Aₙ₊₁ = A − ⋃ᵢ₌₀ⁿ Rᵢ, as mentioned above) returned by Algorithm 1 on line 7 and construct an R*-deactivating direction d* for a (usually smaller) set R* ⊆ ⋃ᵢ₌₀ⁿ Rᵢ such that (A − R*) ∩ T_S = ∅. Such a direction d* still certifies unsatisfiability of A and can be sparser and/or may have lower objective values ⟨d*, φ(x)⟩ than a (⋃ᵢ₌₀ⁿ Rᵢ)-deactivating direction, which is desirable as explained in §4.1.1.
§B]; for WCSPs of any arity see [62, §3.2]). An example of a WCSP d ∈ M⊥ is in Figure 1b. The set of all reparametrizations of f is the affine subspace⁶

then d_t = 0 and such tuples remain active since f′_t = f_t. Tuples t ∈ R ∩ T_S become inactive since f′_t = f_t + αd_t < max_{t′ ∈ T_S} f′_{t′} by d_t < 0 and α > 0. Tuples t ∉ A*(f) either satisfy d_t ≤ 0 and cannot become active, or satisfy d_t > 0 and, by α < β ≤

Table 1: Results on instances from Cost Function Library: average normalized bounds (for each instance group, the best average normalized bound is in bold).

Table 2: Results on instances from Cost Function Library: average CPU time in seconds (for each instance group, the shortest average CPU time is in bold).