Abstract
The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of the) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (subject to some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. We implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups.
1 Introduction
In the weighted constraint satisfaction problem (WCSP) we maximize the sum of (weight) functions over many discrete variables, where each function depends only on a (usually small) subset of the variables. A popular approach to tackle this NP-hard combinatorial optimization problem is via its linear programming (LP) relaxation [1,2,3,4,5]. The dual of this LP relaxation [2, 5, 6] can be interpreted as follows. Feasible dual solutions correspond to reparametrizations (also known as equivalence-preserving transformations [6]) of the WCSP objective function, which are obtained by moving weights between weight functions so that the WCSP objective function is preserved. The dual LP relaxation then seeks a reparametrization of the initial WCSP that minimizes an upper bound on the WCSP objective value. For some instances, the minimal upper bound is equal to the maximal value of the WCSP objective (i.e., the LP relaxation is tight) but, in general, there is a gap between them. The precise form of the dual LP differs slightly from author to author.
For larger instances, solving the LP relaxation to global optimality is too costly. Therefore, the upper bound is usually minimized suboptimally by performing reparametrizations only locally. Stopping points of these suboptimal methods are usually characterized by various levels of local consistency of the CSP formed by the active tuples (i.e., the tuples with the maximum weight in each weight function individually) of the reparametrized WCSP. This is consistent with the fact that a necessary (but not sufficient) condition for global optimality of the dual LP relaxation is that the active-tuple CSP has a nonempty local consistency closure. The level of local consistency at optimum depends on the space of allowed reparametrizations: if weights can move only between pairs of weight functions of which one is unary or nullary, it is arc consistency (AC); if weights can move between two weight functions of any arity, it is pairwise consistency (PWC). These suboptimal methods can be divided into two main classes.
The first class, popular in computer vision and machine learning, is known as convex message passing [2, 7,8,9,10,11,12]. These methods repeat a simple local operation and can be seen as block-coordinate descent with exact updates satisfying the so-called relative interior rule [13]. At fixed points, the active-tuple CSP has a nonempty AC (or PWC) consistency closure. These methods yield good upper bounds but are too slow to be applied in each node of a branch-and-bound search.
The second class has been called soft local consistency methods in constraint programming [6], due to its similarity to local consistencies in the ordinary CSP. One type of these methods moves only integer weights between weight functions (assuming all initial weights are integer) and is efficient enough to be maintained during search. Its most advanced representative is the existential directional arc consistency (EDAC) algorithm [14]. The other type allows moving fractional weights, which can lead to better bounds but is more costly, hence usually not suitable to be applied during search. Its representatives are the virtual arc consistency (VAC) algorithm [6, 15] and the very similar Augmenting DAG algorithm [2, 16, 17]. These methods are based on the following fact: whenever the active-tuple CSP has an empty AC closure, there exists a reparametrization of the WCSP that decreases the upper bound. Thus, each iteration of these algorithms first applies the AC algorithm to the active-tuple CSP and, if domain wipeout occurs, constructs a dual-improving direction by backtracking the history of the AC algorithm, and finally reparametrizes the current WCSP by moving along this direction by a suitable step size. The VAC and Augmenting DAG algorithms converge to a non-unique state when the active-tuple CSP has a nonempty AC closure (which is called virtual arc consistency) but are typically faster than convex message passing methods.
In the soft-consistency terminology, global optima of the dual LP relaxation have been called optimally soft arc consistent (OSAC) WCSPs [6, 18]. In this sense, EDAC and VAC are relaxations of OSAC. But note that OSAC can no longer be considered a local consistency, since no algorithm using only local operations is known to enforce it^{Footnote 1}.
Reparametrizations in general cannot enforce stronger local consistencies of the active-tuple CSP than PWC. This can be seen as follows: if the active-tuple CSP has a nonempty PWC closure but violates some stronger local consistency (hence it is unsatisfiable), there exists no reparametrization that would decrease the upper bound and possibly make the active-tuple CSP satisfy the stronger local consistency. The only way to achieve stronger local consistencies (such as k-consistencies) of the active-tuple CSP by reparametrizations is to introduce new weight functions (of possibly higher arities) and then move weights between these new weight functions and the existing weight functions. This allows constructing a hierarchy of progressively tighter LP relaxations of the WCSP [11, 21,22,23], including the Sherali–Adams hierarchy [24].
In this paper, we study a different LP-based approach, namely an LP formulation of the WCSP, which was proposed in [25] but never pursued later. It differs from the above well-known LP relaxation and does not belong to the hierarchy of LP relaxations obtained by introducing new weight functions of higher arities. This LP formulation minimizes the same upper bound on the WCSP objective value but this time over super-reparametrizations of the initial WCSP objective function, which are changes of the weights that either preserve or increase the WCSP objective value for every assignment. This LP formulation has an exponential number of inequality constraints (representing super-reparametrizations) and is exact, i.e., its minimal value is always equal to the maximal value of the WCSP objective.
We propose to solve this LP suboptimally by a local search method, which is based on the following key observation: whenever the active-tuple CSP is unsatisfiable, there exists a super-reparametrization (but possibly no reparametrization) that decreases the upper bound. The direction of this super-reparametrization is a certificate of unsatisfiability of the active-tuple CSP, which can be constructed from the history of the CSP solver. Note that this approach strictly generalizes the VAC algorithm: if the active-tuple CSP has a nonempty AC closure but is unsatisfiable, the VAC algorithm is stuck (because no reparametrization can decrease the upper bound) but our algorithm can decrease the bound by a super-reparametrization. The cost for this greater generality is that super-reparametrizations may change the WCSP objective value for some assignments and need not preserve the set of optimal assignments, but they nevertheless provide valid, and possibly tighter, upper bounds on the WCSP optimal value.
After formulating this general framework, we focus on the case when the unsatisfiability of the active-tuple CSP is proved by local consistencies stronger than AC/PWC. In particular, we use singleton arc consistency (SAC), which is interesting because it does not have bounded support [26] and therefore would be difficult to achieve by introducing new weight functions of higher arity. We show how to construct a certificate of unsatisfiability of a CSP from the history of the SAC algorithm. Our algorithm then interleaves AC and SAC: we always keep decreasing the upper bound by reparametrizations until the active-tuple CSP has a nonempty AC closure, and only then decrease the bound by a super-reparametrization if the SAC closure of the active-tuple CSP is found empty. In experiments we show that on many WCSP instances, this algorithm yields better bounds than state-of-the-art soft local consistency methods in reasonable runtime. Note that we report only the achieved upper bounds but do not use them in branch-and-bound search, which would be beyond the scope of our paper.
To the best of our knowledge, super-reparametrizations have not been utilized or studied except in [25] and [27]. In [25], super-reparametrizations were used to obtain tighter bounds using a specialized cycle-repairing algorithm, and they were identified in [27] as a property satisfied by all formulations of the linear programming relaxations based on reparametrizations. However, [27] focuses almost solely on the relation between different formulations of reparametrizations, instead of super-reparametrizations. To fill in this gap, we theoretically analyze the associated optimization problem and also the properties of super-reparametrizations.
Compared to the previous version of this paper [28], we improved the current paper in the following ways:

- Most importantly, we include a study of the theoretical properties of super-reparametrizations and compare them to those of reparametrizations (§5).
- Although our implementation remains limited to WCSPs of arity 2, we present all of our theoretical results for WCSPs of any arity.
- We include a geometric interpretation that provides intuitive insights and thus simplifies understanding of our method (§4.1.1).
- In addition to making our code publicly available, we add more information on implementation details to improve reproducibility (§4.4).
- We also analyze the cone of nonnegative weighted CSPs and prove that it is dual to the marginal polytope (§3.2).
- Unsurprisingly, we show that some decision problems connected to our approach and super-reparametrizations are NP-hard (§6).
Structure
We begin in §2 by formally defining the Weighted CSP and the classical (crisp) CSP, and by introducing the notation that will be used throughout the paper. Then, in §3, we formally define the optimization problem of minimizing an upper bound over reparametrizations and/or super-reparametrizations, where we also state necessary and sufficient optimality conditions. Next, §4 proposes a practical approach for approximate minimization of the upper bound over super-reparametrizations using constraint propagation. We also give experimental results comparing our approach with existing soft local consistencies. Additional properties of the underlying active-tuple CSPs (see definition later) and the sets of optimal (or also non-optimal) super-reparametrizations are given in §5. §6 presents the hardness results. We provide a detailed example demonstrating EDAC, VAC, and our proposed approach with SAC in the Appendix.
2 Notation
Let V be a finite set of variables and D a finite domain of each variable. An assignment \(x\in D^{V}\) assigns^{Footnote 2} a value \(x_{i}\in D\) to each variable \(i\in V\). Let \(C \subseteq 2^{V}\) be a set of nonempty scopes, i.e., (V, C) can be seen as an undirected hypergraph. The triplet (D, V, C) defines the structure of a (weighted) CSP and will be fixed throughout the paper. By

$$T \;=\; \{\, (S,k) \mid S\in C,\ k\in D^{S} \,\} \tag{1}$$
we denote the set of tuples, partitioned into sets \(T_{S}\), \(S\in C\). We say that an assignment \(x\in D^{V}\) uses a tuple \(t=(S,k)\in T\) if \(x[S]=k\), where x[S] denotes the restriction of x onto the set \(S\subseteq V\), i.e., for \(S=\{i_{1}, \dots , i_{|S|}\}\) we have \(x[S]=(x_{i_{1}}, \dots , x_{i_{|S|}})\) (where the order of the components is defined by the total order on S inherited from some arbitrary fixed total order on V). Each assignment \(x\in D^{V}\) uses exactly one tuple from each \(T_{S}\).
An instance of the constraint satisfaction problem (CSP) is defined by the quadruple (D, V, C, A) where \(A\subseteq T\) is the set of allowed tuples (while the tuples \(T\setminus A\) are forbidden). As the CSP structure (D, V, C) will be always the same, we will refer to the CSP instance only as A (in other words, in the sequel we identify CSP instances with subsets of T). An assignment \(x\in D^{V}\) is a solution to a CSP \(A\subseteq T\) if it uses only allowed tuples, i.e., \((S,x[S])\in A\) for all \(S\in C\). The set of all solutions to the CSP will be denoted by \(\text {SOL}(A)\subseteq D^{V}\). The CSP is satisfiable if \(\text {SOL}(A)\ne \emptyset\), otherwise it is unsatisfiable.
The weighted constraint satisfaction problem (WCSP)^{Footnote 3} seeks to find an assignment \(x\in D^{V}\) that maximizes the function

$$\sum _{S\in C} f_{S}(x[S]) \tag{2}$$
where \(f_{S}:D^{S}\rightarrow \mathbb {R}\), \(S\in C\), are given weight functions. All the weights (i.e., the values of the weight functions) together can be seen as a vector \(f\in \mathbb {R}^{T}\), such that for \(t=(S,k)\in T\) we have \(f_{t}=f_{S}(k)\). The WCSP instance is defined by the quadruple (D, V, C, f). However, as the structure (D, V, C) will be always the same, we will refer to WCSP instances only as f (in other words, we identify WCSP instances with vectors from \(\mathbb {R}^{T}\)).
Example 1
For example, if \(V = \{1,2,3,4\}\), \(C=\{\{1\},\{2\},\{2,3\},\{1,4\},\{2,3,4\}\}\), and \(D=\{\textsf{a},\textsf{b}\}\), then we want to maximize the expression

$$f_{\{1\}}(x_{1}) + f_{\{2\}}(x_{2}) + f_{\{2,3\}}(x_{2},x_{3}) + f_{\{1,4\}}(x_{1},x_{4}) + f_{\{2,3,4\}}(x_{2},x_{3},x_{4})$$
over \(x_{1},x_{2},x_{3},x_{4}\in \{\textsf{a, b}\}\). We have, e.g.,
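Such a small objective can be maximized by enumerating all assignments. The sketch below does this for the structure of Example 1 with illustrative weight functions of our own choosing (each \(f_{S}(k)\) counts the components of k equal to 𝖺); these weights are not from the paper.

```python
import itertools

# Brute-force maximization of the WCSP objective (2) for the structure of
# Example 1: V = {1,2,3,4}, D = {a,b}, C = {{1},{2},{2,3},{1,4},{2,3,4}}.
V = (1, 2, 3, 4)
D = ("a", "b")
C = [(1,), (2,), (2, 3), (1, 4), (2, 3, 4)]

def f_S(k):
    # Illustrative weight function (NOT from the paper): count of "a" values.
    return sum(1 for v in k if v == "a")

def objective(x):
    """Objective (2): sum of f_S(x[S]) over all scopes S in C."""
    a = dict(zip(V, x))
    return sum(f_S(tuple(a[i] for i in S)) for S in C)

best = max(itertools.product(D, repeat=len(V)), key=objective)
print(best, objective(best))  # the all-"a" assignment scores 1+1+2+2+3 = 9
```

With these weights the unique maximizer assigns 𝖺 to every variable, since any 𝖻 lowers the count in some scope.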
Remark 1
In some formalisms [6, 22], the objective (2) is to be minimized. For our purposes, these settings are equivalent and the results for minimization problems are analogous, as one can invert the sign of all weights and maximize instead. Next, some papers consider only nonnegative weights and include the empty (nullary) scope \(\emptyset \in C\), whose weight \(f_{\emptyset }\) constitutes a bound on the WCSP optimal value [6, 22]. However, we will later need both positive and negative weights in a WCSP, so we require \(\emptyset \notin C\) to simplify notation (also, with both positive and negative weights, \(f_{\emptyset }\) would not yield a bound on the optimal value).
We will use another notation for the WCSP objective, which is common in machine learning, see, e.g., [3, §3]. We define an indicator map \(\phi :D^{V}\rightarrow \{0,1\}^{T}\) by

$$\phi _{t}(x) \;=\; \llbracket x[S]=k\rrbracket , \qquad t=(S,k)\in T, \tag{3}$$
where \(\llbracket \cdot \rrbracket\) denotes the Iverson bracket, which equals 1 if the logical expression in the bracket is true and 0 if it is false. The WCSP objective (2) can now be written as the dot product

$$\langle f, \phi (x)\rangle \;=\; \sum _{t\in T} f_{t}\,\phi _{t}(x). \tag{4}$$
This makes explicit that the WCSP objective is linear in the weight vector f. The WCSP optimal value is

$$\max _{x\in D^{V}} \langle f, \phi (x)\rangle \;=\; \max _{\mu \in M} \langle f, \mu \rangle \tag{5}$$
where

$$M \;=\; \phi (D^{V}) \;=\; \{\, \phi (x) \mid x\in D^{V} \,\} \;\subseteq \; \{0,1\}^{T}. \tag{6}$$
Note that M is defined only by the structure (D, V, C).
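The two forms of the objective can be checked mechanically on a toy structure. The sketch below (with an arbitrary illustrative weight vector) verifies that the per-scope sum (2) agrees with the dot product \(\langle f, \phi (x)\rangle\) for every assignment, and that every assignment uses exactly one tuple from each \(T_{S}\).

```python
import itertools

# Toy structure: V = {1,2}, D = {a,b}, C = {{1},{2},{1,2}}.
D = ("a", "b")
V = (1, 2)
C = [(1,), (2,), (1, 2)]
# The tuple set T = {(S,k) | S in C, k in D^S} in a fixed enumeration order.
T = [(S, k) for S in C for k in itertools.product(D, repeat=len(S))]

def restrict(x, S):
    """x[S]: restriction of assignment x onto scope S."""
    a = dict(zip(V, x))
    return tuple(a[i] for i in S)

def phi(x):
    """phi(x) in {0,1}^T with phi_t(x) = [[x[S] = k]] for t = (S, k)."""
    return [int(restrict(x, S) == k) for (S, k) in T]

f = dict(zip(T, [3, 4, 6, 2, -2, -4, 1, 1]))  # an arbitrary weight vector

for x in itertools.product(D, repeat=len(V)):
    by_scopes = sum(f[(S, restrict(x, S))] for S in C)   # per-scope form (2)
    by_dot = sum(f[t] * p for t, p in zip(T, phi(x)))    # <f, phi(x)>
    assert by_scopes == by_dot
    assert sum(phi(x)) == len(C)  # exactly one tuple used from each T_S
print("both objective forms agree on all assignments")
```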
3 Bounding the WCSP optimal value
We define the function \(B:\mathbb {R}^{T}\rightarrow \mathbb {R}\) by

$$B(f) \;=\; \sum _{S\in C} \max _{k\in D^{S}} f_{S}(k). \tag{7}$$
This is a convex piecewise-affine function. For \(f\in \mathbb {R}^{T}\), we call a tuple \(t=(S,k)\in T\) active^{Footnote 4} if

$$f_{S}(k) \;=\; \max _{k^{\prime }\in D^{S}} f_{S}(k^{\prime }). \tag{8}$$
The set of all tuples that are active for f is denoted^{Footnote 5} by \(A^{*}(f)\subseteq T\). Note that \(A^{*}(f)\subseteq T\) can be interpreted as a CSP.
Theorem 1
([2]) For every WCSP \(f\in \mathbb {R}^{T}\) and every assignment \(x\in D^{V}\) we have:

(a)
\(B(f)\ge \langle f, \phi (x)\rangle\),

(b)
\(B(f)=\langle f, \phi (x)\rangle\) if and only if \(x\in \text {SOL}(A^{*}(f))\).
Proof
Statement (a) can be checked by comparing expressions (2) and (7) term by term.
Statement (b) says that \(B(f)=\langle f, \phi (x)\rangle\) if and only if \((S,x[S])\in A^{*}(f)\) for all \(S\in C\). This is again straightforward from (2) and (7).
Theorem 1 says that B(f) is an upper bound on the WCSP optimal value. Moreover, it shows that \(B(f)=\langle f, \phi (x)\rangle\) implies that x is a maximizer of the WCSP objective (2).
Example 2
Let \(V=\{1,2\}\), \(D=\{\textsf{a},\,\textsf{b}\}\), and \(C=\{\{1\},\{2\},\{1,2\}\}\). For this structure, the set of tuples is

$$T=\bigl \{(\{1\},\textsf{a}),\,(\{1\},\textsf{b}),\,(\{2\},\textsf{a}),\,(\{2\},\textsf{b}),\,(\{1,2\},(\textsf{a},\textsf{a})),\,(\{1,2\},(\textsf{a},\textsf{b})),\,(\{1,2\},(\textsf{b},\textsf{a})),\,(\{1,2\},(\textsf{b},\textsf{b}))\bigr \}. \tag{9}$$
For assignment \(x=(\textsf{ a},\textsf{ b})\in D^{V}\) (i.e., \(x_{1}=\textsf{ a}\), \(x_{2}=\textsf{ b}\)), we have \(\phi (x) = (1,0,0,1,0,1,0,0)\in \{0,1\}^{T}\) where the order of the tuples is given by (9).
An example of a WCSP f with this structure is shown in Fig. 1a. The set of tuples active for f is

$$A^{*}(f)=\bigl \{(\{1\},\textsf{b}),\,(\{2\},\textsf{a}),\,(\{1,2\},(\textsf{b},\textsf{a})),\,(\{1,2\},(\textsf{b},\textsf{b}))\bigr \} \tag{10}$$
and the weight vector reads \(f=(3,4,6,2,-2,-4,1,1)\in \mathbb {R}^{T}\) (where the ordering is again given by (9)). Thus, the objective value of WCSP f for \(x=(\textsf{a},\textsf{b})\) is \(\langle f, \phi (x)\rangle =3+2-4=1\). The upper bound equals \(B(f)=4+6+1=11\) and is tight because the CSP \(A^{*}(f)\) is satisfiable (recall Theorem 1). In particular, \(\langle f, \phi (\textsf{b},\textsf{a})\rangle =11\).
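These quantities can be recomputed by brute force. The sketch below takes the weight vector \(f=(3,4,6,2,-2,-4,1,1)\) in the tuple ordering of (9) and recovers the bound, the active tuples, and the two objective values.

```python
import itertools

# Recomputing Example 2 for f = (3, 4, 6, 2, -2, -4, 1, 1), ordered as in (9).
D = ("a", "b")
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in itertools.product(D, repeat=len(S))]
f = dict(zip(T, [3, 4, 6, 2, -2, -4, 1, 1]))

def scope_max(S):
    return max(f[(S, k)] for k in itertools.product(D, repeat=len(S)))

# Upper bound (7): sum over scopes of the maximum weight within each T_S.
B = sum(scope_max(S) for S in C)

# Active tuples: those attaining the maximum within their T_S.
active = {(S, k) for (S, k) in T if f[(S, k)] == scope_max(S)}

def value(x):
    a = {1: x[0], 2: x[1]}
    return sum(f[(S, tuple(a[i] for i in S))] for S in C)

print(B)                  # 11
print(value(("a", "b")))  # 1
print(value(("b", "a")))  # 11, so the bound B(f) is attained (tight)
```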
3.1 Minimal upper bound over reparametrizations
We say that a WCSP \(f\in \mathbb {R}^{T}\) is a reparametrization of a WCSP \(g\in \mathbb {R}^{T}\) (also known as an equivalence-preserving transformation of g) [1,2,3, 5,6,7, 10, 11, 18] if

$$\langle f, \phi (x)\rangle \;=\; \langle g, \phi (x)\rangle \quad \text {for all } x\in D^{V}. \tag{11}$$
That is, \(f-g\in M^{\perp }\) where

$$M^{\perp } \;=\; \{\, d\in \mathbb {R}^{T} \mid \langle d, \phi (x)\rangle = 0 \ \, \forall x\in D^{V} \,\} \tag{12}$$
is the orthogonal space [33, Chapter 1] of the set (6). Here, ‘d’ stands for ‘direction’ but note that any \(d\in M^{\perp }\), as a vector from \(\mathbb {R}^{T}\), can be also seen as a standalone WCSP. The set \(M^{\perp }\) is a subspace of \(\mathbb {R}^{T}\), consisting of all WCSPs that have zero objective value for all assignments. Although \(M^{\perp }\) is defined by an exponential number of equalities in (12), it has a simple, polynomialsized description (for binary WCSP see [2, §B], for WCSPs of any arity see [11, §3.2]). An example of WCSP \(d\in M^{\perp }\) is in Fig. 1b. The set of all reparametrizations of f is the affine subspace^{Footnote 6}\(f+M^{\perp }=\{ f+d\mid d\in M^{\perp }\}\). Clearly, the binary relation ‘is a reparametrization of’ (on the set of WCSPs with a fixed structure) is reflexive, transitive and symmetric, hence an equivalence.
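A classical element of \(M^{\perp }\) (a "message" in the convex-message-passing literature) subtracts a quantity δ from a unary weight and adds δ to all compatible pairwise weights. The sketch below builds such a direction on the structure of Example 2 and verifies by brute force that its objective value is zero for every assignment; the direction is an illustration of ours, not an example from the paper.

```python
import itertools

# A direction d in the orthogonal space M-perp: subtract delta from the
# unary tuple ({1}, a) and add delta to each pairwise tuple ({1,2}, (a, y)).
# Any assignment with x1 = a gains -delta + delta = 0; others are untouched.
D = ("a", "b")
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in itertools.product(D, repeat=len(S))]
delta = 2.5

d = {t: 0.0 for t in T}
d[((1,), ("a",))] = -delta
for y in D:
    d[((1, 2), ("a", y))] = delta

def objective(x):
    a = {1: x[0], 2: x[1]}
    return sum(d[(S, tuple(a[i] for i in S))] for S in C)

assert all(objective(x) == 0 for x in itertools.product(D, repeat=2))
print("d lies in M-perp: zero objective on every assignment")
```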
Given a WCSP \(g\in \mathbb {R}^{T}\), it is a natural idea to minimize the upper bound on its optimal value by reparametrizations:

$$\min \; \{\, B(f) \mid f\in g+M^{\perp } \,\}. \tag{13}$$
By introducing auxiliary variables (as in Footnote 4), this problem can be transformed to a linear program, which is the dual LP relaxation of the WCSP g [1, 2, 5, 11]. Every f feasible for (13) satisfies

$$B(f) \;\ge \; \max _{x\in D^{V}} \langle f, \phi (x)\rangle \;=\; \max _{x\in D^{V}} \langle g, \phi (x)\rangle , \tag{14}$$
i.e., B(f) is an upper bound on the optimal value \(\max _{x}\langle g, \phi (x)\rangle\) of WCSP g. If inequality (14) holds with equality for some x, then f is optimal for (13) and the LP relaxation is tight. Necessary and sufficient conditions for optimality can be obtained from complementary slackness, see [2, 5, 11].
Problem (13) has been widely studied [1, 2, 4,5,6, 18, 23] and many approaches for its (approximate) large-scale optimization have been proposed, typically based on block-coordinate descent [2, 7,8,9,10, 21, 25, 27] or constraint propagation [2, 6, 16, 22, 34]. If f is optimal for (13), then the CSP \(A^{*}(f)\) has a nonempty pairwise-consistency (PWC) closure (for binary WCSPs, PWC reduces to arc consistency) [11]. We conjecture that PWC is in general the strongest level of local consistency of \(A^{*}(f)\) that can be achieved by reparametrizations without enlarging the WCSP structure (i.e., without introducing new weight functions).
We remark that some approaches [6, 18] achieve only (generalized) arc consistency rather than PWC because they optimize over a subset of all possible reparametrizations corresponding to a subspace of \(M^{\perp }\). In this case, WCSPs f optimal for (13) have been called optimally soft arc consistent (OSAC).
3.2 Minimal upper bound over super-reparametrizations
We say that a WCSP \(f\in \mathbb {R}^{T}\) is a super-reparametrization^{Footnote 7} of a WCSP \(g\in \mathbb {R}^{T}\) if

$$\langle f, \phi (x)\rangle \;\ge \; \langle g, \phi (x)\rangle \quad \text {for all } x\in D^{V}. \tag{15}$$
That is, \(f-g\in M^{*}\) where

$$M^{*} \;=\; \{\, d\in \mathbb {R}^{T} \mid \langle d, \phi (x)\rangle \ge 0 \ \, \forall x\in D^{V} \,\} \tag{16}$$
is the dual cone [33, Chapter 1] to the set (6). It is a polyhedral convex cone, consisting of the WCSPs that have nonnegative objective value for all assignments. This cone contains a line because \(M^{\perp }\subseteq M^{*}\) and the subspace \(M^{\perp }\) is nontrivial (assuming \(|V|>1\)). Precisely, we have \(M^{*}\cap (-M^{*})=M^{\perp }\) where \(-M^{*}=\{-d\mid d\in M^{*}\}\). The set of all super-reparametrizations of f is the translated cone \(f+M^{*}=\left\{ f+d\mid d\in M^{*}\right\}\). For a given \(d\in \mathbb {R}^{T}\), deciding whether \(d\notin M^{*}\) is NP-complete, as shown later in Corollary 2.
The binary relation ‘is a super-reparametrization of’ (on the set of WCSPs with a fixed structure) induced by the convex cone \(M^{*}\) is reflexive and transitive, hence a preorder. It is not antisymmetric: \(f-g\in M^{*}\) and \(g-f\in M^{*}\) does not imply \(f=g\) but merely \(f-g\in M^{\perp }\), i.e., that f is a reparametrization of g. This is because the cone \(M^{*}\) may contain a line, see [35, §2] and [36, §2.4].
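On toy instances, membership in \(M^{*}\) can be tested exhaustively (the check is exponential, so this is only an illustration). The sketch below exhibits a direction that is in \(M^{*}\) but not in \(M^{\perp }\), and shows that its negation leaves \(M^{*}\), consistent with the preorder not being symmetric.

```python
import itertools

# Brute-force membership tests for M* and M-perp on the structure
# V = {1,2}, D = {a,b}, C = {{1},{2},{1,2}} (toy sizes only).
D = ("a", "b")
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in itertools.product(D, repeat=len(S))]

def obj(d, x):
    a = {1: x[0], 2: x[1]}
    return sum(d[(S, tuple(a[i] for i in S))] for S in C)

def in_M_star(d):   # <d, phi(x)> >= 0 for every assignment x
    return all(obj(d, x) >= 0 for x in itertools.product(D, repeat=2))

def in_M_perp(d):   # <d, phi(x)> = 0 for every assignment x
    return all(obj(d, x) == 0 for x in itertools.product(D, repeat=2))

# Raising a single weight gives a direction in M* but not in M-perp:
d = {t: 0 for t in T}
d[((1,), ("a",))] = 1
assert in_M_star(d) and not in_M_perp(d)

# Its negation is not in M*, so d and -d are not "mutual" directions.
neg = {t: -v for t, v in d.items()}
assert not in_M_star(neg)
print("d is in M* \\ M-perp, and -d is not in M*")
```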
Remark 2
The optimal value (5) of a WCSP f can be also written as

$$\max _{x\in D^{V}} \langle f, \phi (x)\rangle \;=\; \max \; \{\, \langle f, \mu \rangle \mid \mu \in \text {conv}\, M \,\} \tag{17}$$
where \(\text {conv}\) denotes the convex hull operator [36]. The equality in (17) follows from the well-known fact that a linear function on a polytope attains its maximum in at least one vertex of the polytope [5, Corollary 3.44]. The set \(\text {conv}\, M\subseteq [0,1]^{T}\) is known as the marginal polytope and has a central role in approaches to WCSP based on linear programming (see [3, 5] and references therein). It is easy to show that

$$M^{*} \;=\; (\text {conv}\, M)^{*} \;=\; (\text {cone}\, M)^{*} \tag{18}$$
where \(\text {cone}\) denotes the conic hull operator [36] and \(^{*}\) the dual cone operator. Thus, (16) can also be seen as the dual cone to the marginal polytope which, to the best of our knowledge, has not been mentioned before.
Following [25], we consider the problem

$$\min \; \{\, B(f) \mid f\in g+M^{*} \,\}. \tag{19}$$
Again, this can be reformulated as a linear program. Every f feasible for (19) (i.e., every super-reparametrization of g) satisfies

$$B(f) \;\ge \; \max _{x\in D^{V}} \langle f, \phi (x)\rangle \;\ge \; \max _{x\in D^{V}} \langle g, \phi (x)\rangle . \tag{20}$$
The next theorem characterizes optimal solutions:
Theorem 2
Let f be feasible for (19). The following are equivalent:

(a)
f is optimal for (19).

(b)
\(\displaystyle B(f) = \max _{x\in D^{V}} \langle f, \phi (x)\rangle = \max _{x\in D^{V}} \langle g, \phi (x)\rangle\)

(c)
CSP \(A^{*}(f)\) has a solution x satisfying \(\langle f, \phi (x)\rangle =\langle g, \phi (x)\rangle\).
Proof
(a)\(\Leftrightarrow\)(b): Denote \(m=\max _{x}\langle g, \phi (x)\rangle\), which by (20) implies \(B(f)\ge m\). To see that this bound is attained, define f by \(f_{t}=m/|C|\) for all \(t\in T\). It can be checked from (2) and (7) that \(B(f)=\langle f, \phi (x)\rangle =m\) for all x, so f is feasible and optimal.
(b)\(\Rightarrow\)(c): Since every feasible f satisfies (20), (b) implies \(B(f)=\langle f, \phi (x)\rangle =\langle g, \phi (x)\rangle\) for some x. By Theorem 1(b), this implies (c).
(c)\(\Rightarrow\)(b): By Theorem 1(b) together with (20), (c) implies \(B(f)=\langle f, \phi (x)\rangle =\langle g, \phi (x)\rangle\) for some x. Statement (b) now follows from Theorem 1(a).
Theorem 2 in particular says that the optimal value of (19) is equal to the optimal value of WCSP g (this has been observed already in [25, Theorem 1]). As stated in [25], this is not surprising because the complexity of the WCSP is hidden in the exponential set of constraints of (19). Let us remark that for \(f\in g+M^{*}\), deciding whether f is optimal for (19) is NPcomplete, as shown later in Corollary 3.
Theorem 2 has a simple corollary:
Theorem 3
Let \(g\in \mathbb {R}^{T}\). CSP \(A^{*}(g)\) is satisfiable if and only if \(B(g)\le B(f)\) for every \(f\in g+M^{*}\).
Proof
By Theorem 2, \(A^{*}(g)\) is satisfiable if and only if (19) attains its optimum at the point \(f=g\), i.e., \(B(g)\le B(f)\) for every \(f\in g+M^{*}\).
4 Iterative method to improve the bound by super-reparametrizations
In this section we present an iterative method to suboptimally solve (19). Starting from a feasible solution to (19), every iteration finds a new feasible solution with a lower objective, which by (20) corresponds to decreasing the upper bound on the optimal value of the initial WCSP.
4.1 Outline of the method
Consider a WCSP f feasible for (19), i.e., \(f\in g+M^{*}\). By Theorem 2, a necessary (but not sufficient) condition for f to be optimal for (19) is that CSP \(A^{*}(f)\) is satisfiable. By Theorem 3, \(A^{*}(f)\) is satisfiable if and only if \(B(f)\le B(f^{\prime })\) for all \(f^{\prime }\in f+M^{*}\). In summary, we have the following implications and equivalences:

$$\begin{array}{ccc} f \text { optimal for (19)} & \;\Longrightarrow \; & A^{*}(f) \text { satisfiable}\\ \Updownarrow & & \Updownarrow \\ B(f)\le B(f^{\prime })\ \forall f^{\prime }\in g+M^{*} & \;\Longrightarrow \; & B(f)\le B(f^{\prime })\ \forall f^{\prime }\in f+M^{*} \end{array} \tag{21}$$
The left-hand equivalence is just the definition of the optimum of (19), the right-hand equivalence is Theorem 3, and the top implication follows from Theorem 2. The bottom implication independently follows from transitivity of super-reparametrizations, which says that \(f^{\prime }\in f+M^{*}\) implies \(f^{\prime }\in g+M^{*}\) (assuming \(f\in g+M^{*}\)).
Suppose for the moment that we have an oracle that, for a given \(f\in \mathbb {R}^{T}\), decides if \(A^{*}(f)\) is satisfiable and, if it is not, finds some \(f^{\prime }\in f+M^{*}\) such that \(B(f^{\prime })<B(f)\) (which exists by Theorem 3). By transitivity of super-reparametrizations, such \(f^{\prime }\) is feasible for (19). This suggests an iterative scheme to improve feasible solutions to (19). We initialize \(f^{0}:=g\) and then for \(k=0,1,2,\dots\) repeat the following iteration: if CSP \(A^{*}(f^{k})\) is satisfiable, stop; otherwise, find \(f^{k+1}\in f^{k}+M^{*}\) such that \(B(f^{k+1})<B(f^{k})\).
Note that transitivity of super-reparametrizations implies \(f^{k}\in f^{0}+M^{*}\) for every k, so every \(f^{k}\) is feasible for (19) as expected. An example of a single iteration is shown in Fig. 2a and 2b.
This iterative method belongs to the class of local search methods to solve (19): having a current feasible estimate \(f^{k}\), we search for the next estimate \(f^{k+1}\) with a strictly better objective within a neighborhood \(f^{k}+M^{*}\) of \(f^{k}\). We can define local optima of (19) with respect to this method to be super-reparametrizations f of g such that \(A^{*}(f)\) is satisfiable.
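The scheme can be sketched as a generic loop with a pluggable improvement oracle. The sketch below (our illustration, not the paper's implementation) decides satisfiability of the active-tuple CSP by brute force, which is viable only for toy sizes; the `improve` argument stands in for the certificate-based step developed in §4.2 and §4.3.

```python
import itertools

# Skeleton of the local search over super-reparametrizations (a sketch).
# Structure: V = {1,2}, D = {a,b}, C = {{1},{2},{1,2}}.
D = ("a", "b")
C = [(1,), (2,), (1, 2)]
T = [(S, k) for S in C for k in itertools.product(D, repeat=len(S))]

def active(f):
    """A*(f): tuples attaining the maximum weight within their T_S."""
    return {(S, k) for (S, k) in T
            if f[(S, k)] == max(f[(S, k2)]
                                for k2 in itertools.product(D, repeat=len(S)))}

def satisfiable(A):
    """Brute-force satisfiability of the CSP A (exponential; toy use only)."""
    def ok(x):
        a = {1: x[0], 2: x[1]}
        return all((S, tuple(a[i] for i in S)) in A for S in C)
    return any(ok(x) for x in itertools.product(D, repeat=2))

def minimize_bound(g, improve, max_iters=100):
    f = dict(g)
    for _ in range(max_iters):
        if satisfiable(active(f)):
            return f        # local optimum: A*(f) is satisfiable
        f = improve(f)      # must return f' in f + M* with B(f') < B(f)
    return f

# With the weights of Example 2, A*(g) is already satisfiable, so the
# method stops immediately without calling the oracle's improvement logic.
g = dict(zip(T, [3, 4, 6, 2, -2, -4, 1, 1]))
result = minimize_bound(g, improve=lambda f: f)
assert result == g
```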
4.1.1 Properties of the method
By transitivity of super-reparametrizations, for every k we have

$$f^{k+1}+M^{*} \;\subseteq \; f^{k}+M^{*}, \tag{22}$$
which holds with equality if and only if \(f^{k+1}\in f^{k}+M^{\perp }\) (i.e., \(f^{k+1}\) is a reparametrization of \(f^{k}\)). This shows that the search space of the method may shrink with increasing k, in other words, a larger and larger part of the feasible set \(f^{0}+M^{*}\) of (19) is cut off and becomes forever inaccessible. If, for some k, all (global) optima of (19) happen to lie in the cutoff part, the method has lost any chance to find a global optimum. This is illustrated in Fig. 3.
This has the following consequence. Every \(f^{k}\) satisfies

$$B(f^{k}) \;\ge \; \max _{x\in D^{V}} \langle f^{k}, \phi (x)\rangle . \tag{23}$$
In every iteration, the left-hand side of inequality (23) decreases and the right-hand side increases or stays the same due to (22). If both sides meet for some k, the CSP \(A^{*}(f^{k})\) becomes satisfiable by Theorem 1(b) and the method stops. Monotonic increase of the right-hand side can be seen as ‘greediness’ of the method: if we could choose \(f^{k+1}\) from the initial feasible set \(f^{0}+M^{*}\) rather than from its subset \(f^{k}+M^{*}\), the right-hand side could also decrease. Any increase of the right-hand side is undesirable because the bounds \(B(f^{k})\) in future iterations will never be able to get below it. This is illustrated in Figs. 4 and 5. Unlike in (13), note that not every optimal assignment for WCSP f is optimal for WCSP g. We will return to this in §5.
If \(A^{*}(f^{k})\) is unsatisfiable, there are usually many vectors \(f^{k+1}\in f^{k}+M^{*}\) satisfying \(B(f^{k+1})<B(f^{k})\). We should choose among them one that does not cause ‘too much’ shrinking of the search space and/or increase of the right-hand side of (23). Inclusion (22) holds with equality if and only if \(f^{k+1}\in f^{k}+M^{\perp }\), so whenever possible we should choose \(f^{k+1}\) to be a reparametrization (rather than just a super-reparametrization) of \(f^{k}\). Unfortunately, we know of no other useful theoretical results to help us choose \(f^{k+1}\), so we are left with heuristics. One natural heuristic is to choose \(f^{k+1}\) such that the vector \(f^{k+1}-f^{k}\) is sparse (i.e., has only a small number of nonzero components) and its positive components are small. Unfortunately, this can sometimes be too restrictive because, e.g., vectors from \(M^{\perp }\) can be dense and their components have unbounded magnitudes.
4.1.2 Employing constraint propagation
So far we have assumed we can always decide if CSP \(A^{*}(f)\) is satisfiable. This is unrealistic because the CSP is NP-complete. Yet the approach remains applicable even if we detect unsatisfiability of \(A^{*}(f)\) only sometimes, e.g., using constraint propagation. Then our iteration changes to: if constraint propagation does not prove CSP \(A^{*}(f^{k})\) unsatisfiable, stop; otherwise, find \(f^{k+1}\in f^{k}+M^{*}\) such that \(B(f^{k+1})<B(f^{k})\).
In this case, stopping points of the method will be even weaker local minima of (19), but they nevertheless might be still nontrivial and useful.
In the sequel we develop this approach in detail. In particular, we show how to find a vector \(f^{k+1}\in f^{k}+M^{*}\) satisfying \(B(f^{k+1})<B(f^{k})\) when \(A^{*}(f^{k})\) is unsatisfiable. We will do it in two steps. First (in §4.2), given the CSP \(A^{*}(f^{k})\), we find a direction \(d\in M^{*}\) using constraint propagation. This direction is a certificate of unsatisfiability of the CSP \(A^{*}(f^{k})\) and, at the same time, an improving direction for (19). Second (in §4.3), given d and \(f^{k}\), we find a step size \(\alpha >0\) such that \(f^{k+1}=f^{k}+\alpha d\) satisfies \(B(f^{k+1})<B(f^{k})\). An example of such a certificate of unsatisfiability is shown in Fig. 2c.
4.1.3 Relation to existing approaches
The Augmenting DAG algorithm [2, 16] and the VAC algorithm [6] are (up to the precise way of computing certificates d and step sizes \(\alpha\)) an example of the described approach, which uses arc consistency to attempt to prove unsatisfiability of \(A^{*}(f^{k})\). In this favorable case, there exist certificates \(d\in M^{\perp }\), so we are, in fact, applying local search to (13) rather than (19). For stronger local consistencies, such certificates, in general, do not exist (i.e., inevitably \(\langle d, \phi (x)\rangle >0\) for some x).
The algorithm proposed in [25] can be also seen as an example of our approach. It interleaves iterations using arc consistency (in fact, the Augmenting DAG algorithm) and iterations using cycle consistency.
As an alternative to our approach, stronger local consistencies can be achieved by introducing new weight functions (of possibly higher arity) into the WCSP objective (2) and minimizing an upper bound by reparametrizations, as in [11, 21,22,23, 37]. In our particular case, after each update \(f^{k+1}=f^{k}+\alpha d\) we could introduce a new weight function with scope

$$S^{\prime } \;=\; \bigcup \, \{\, S\in C \mid d_{t}\ne 0 \text { for some } t\in T_{S} \,\}$$
and weights

$$-\alpha \sum _{S\in C:\, S\subseteq S^{\prime }} d_{S}(k[S])$$
where \(k\in D^{S^{\prime }}\). Notice that such an added weight function would not increase the bound (7) since its weights are nonpositive due to the fact that it needs to decrease the objective value for some assignments. In this view, our approach can be seen as enforcing stronger local consistencies but omitting these compensatory higherorder weight functions, thus saving memory.
Finally, the described approach can be seen as an example of the primaldual approach [38] to optimize linear programs using constraint propagation. In detail, [38] proposed to construct the complementary slackness system for a given feasible solution and apply constraint propagation to detect if the system is satisfiable. If it is not satisfiable, this implies the existence of a certificate of unsatisfiability that can be used to improve the current solution. In our particular case, if (19) is formulated as a linear program, then the complementary slackness conditions (expressed in terms of the dual variables) are equivalent to the optimality conditions stated in Theorem 2 expressed as a set of linear equalities with an exponential number of nonnegative variables. Applying constraint propagation on this system is in correspondence with constraint propagation on a CSP.
4.2 Certificates of unsatisfiability of CSP
Constraint propagation^{Footnote 8} is an iterative algorithm which in each iteration (executed by a propagator) infers that some allowed tuples \(R\subseteq A\) of a current CSP \(A\subseteq T\) can be forbidden without changing its solution set, i.e., \(\text {SOL}(A)=\text {SOL}(A\setminus R)\), and forbids these tuples, i.e., sets \(A:=A\setminus R\). The algorithm terminates when it is no longer able to forbid any tuples (in which case the propagator returns \(R=\emptyset\)) or when it becomes explicit that the current CSP is unsatisfiable. The former usually happens when the CSP achieves some local consistency level \(\Phi\). The latter happens if \(A\cap T_{S}=\emptyset\) for some \(S\in C\), which implies unsatisfiability of A because^{Footnote 9} every assignment has to use one tuple from each \(T_{S}\).
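The loop just described can be sketched in a few lines. The toy model below (two variables, a Boolean domain, one binary scope) and the simple support-based propagator are illustrative assumptions, not the implementation used in the paper:

```python
from itertools import product

V = [0, 1]                    # toy variables (illustrative assumption)
D = [0, 1]                    # common domain
C = [(0,), (1,), (0, 1)]      # scopes, including all unary ones
T = [(S, l) for S in C for l in product(D, repeat=len(S))]

def solutions(A):
    """SOL(A): assignments whose tuple in every scope is allowed."""
    return [x for x in product(D, repeat=len(V))
            if all((S, tuple(x[i] for i in S)) in A for S in C)]

def ac_propagator(A):
    """Forbid one unary tuple ({i},k) that has no allowed binary support."""
    for i in V:
        for k in D:
            if ((i,), (k,)) in A and not any(
                    (S, l) in A and l[S.index(i)] == k
                    for S in C if len(S) == 2 for l in product(D, repeat=2)):
                return {((i,), (k,))}
    return set()

def propagate(A, propagator):
    """Iterate the propagator until a fixpoint, or until some scope has
    no allowed tuple left, which makes unsatisfiability explicit."""
    A = set(A)
    while True:
        if any(not any(t in A for t in T if t[0] == S) for S in C):
            return A, False               # wipeout: A is unsatisfiable
        R = propagator(A)
        if not R:
            return A, True                # local consistency reached
        A -= R                            # A := A \ R
```

If all binary tuples are forbidden, the loop returns immediately with the unsatisfiability flag, matching the wipeout condition \(A\cap T_{S}=\emptyset\).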
In this section, we show how to augment constraint propagation so that if it proves a CSP unsatisfiable, it also provides a certificate of its unsatisfiability \(d\in M^{*}\). This certificate is needed as an improving direction for (19), as was mentioned in §4.1.2. First, in §4.2.1, we introduce a more general concept, deactivating directions. One iteration of constraint propagation constructs an R-deactivating direction for the current CSP A, which certifies that \(\text {SOL}(A)=\text {SOL}(A\setminus R)\). Then, in §4.2.2, we show how to compose the deactivating directions obtained from individual iterations of constraint propagation into a single deactivating direction for the initial CSP. If the initial CSP has been proved unsatisfiable by the propagation, this composed deactivating direction is its certificate of unsatisfiability.
4.2.1 Deactivating directions
Definition 1
Let \(A\subseteq T\) and \(R \subseteq A\), \(R\ne \emptyset\). An R-deactivating direction for CSP A is a vector \(d \in M^{*}\) satisfying

(a)
\(d_{t} < 0\) for all \(t \in R\),

(b)
\(d_{t} = 0\) for all \(t \in A\setminus R\).
For fixed A and R, all R-deactivating directions for A form a convex cone. Here, we show one way of constructing a deactivating direction:
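Definition 1 can be checked by brute force on small instances. In the sketch below (a toy model that is an illustrative assumption, not the paper's code), membership in \(M^{*}\) is read as \(\langle d,\phi (x)\rangle \ge 0\) for every assignment x, i.e., the direction never decreases the objective:

```python
from itertools import product

V = [0, 1]; D = [0, 1]; C = [(0,), (1,), (0, 1)]   # toy CSP (assumption)
T = [(S, l) for S in C for l in product(D, repeat=len(S))]

def inner(d, x):
    """<d, phi(x)>: sum of d over the tuples used by assignment x."""
    return sum(d.get((S, tuple(x[i] for i in S)), 0.0) for S in C)

def is_deactivating(d, A, R):
    """Brute-force check of Definition 1 on the toy instance."""
    in_M_star = all(inner(d, x) >= 0 for x in product(D, repeat=len(V)))
    cond_a = all(d.get(t, 0.0) < 0 for t in R)                 # (a): negative on R
    cond_b = all(d.get(t, 0.0) == 0 for t in set(A) - set(R))  # (b): zero on A \ R
    return in_M_star and cond_a and cond_b
```

For example, if value 0 of the first variable has no allowed binary support, moving weight from that unary tuple onto the (forbidden) binary tuples yields a deactivating direction.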
Theorem 4
Let \(R\subseteq A\subseteq T\) be such that \(\text {SOL}(A)=\text {SOL}(A\setminus R)\) and \(R\ne \emptyset\). Denote^{Footnote 10} \(\delta = |\{ S\in C \mid T_{S}\cap R\ne \emptyset \}|\).
Then the vector \(d\in \mathbb {R}^{T}\) with components \(d_{t} = -1\) for \(t\in R\), \(d_{t} = \delta\) for \(t\in T\setminus A\), and \(d_{t} = 0\) for \(t\in A\setminus R\) (27)
is an R-deactivating direction for A.
Proof
Conditions (a) and (b) of Definition 1 are clearly satisfied, so it only remains to show that \(d\in M^{*}\). We have \(\langle d, \phi (x)\rangle = -n_{1}(x) + \delta n_{2}(x)\),
where \(n_{1}(x)=|\left\{ S\in C \mid (S,x[S])\in R\right\}|\) and \(n_{2}(x)=|\left\{ S\in C \mid (S,x[S])\in T\setminus A\right\}|\).
For contradiction, let \(x\in D^{V}\) satisfy \(\langle d, \phi (x)\rangle <0\). This implies \(n_{1}(x)>0\) and \(n_{2}(x)=0\), where the latter is because \(n_{1}(x)\le \delta\) by the definition of \(\delta\). That is, we have \((S^{*},x[S^{*}])\in R\) for some \(S^{*}\in C\) and \((S,x[S])\in A\) for all \(S\in C\). But the latter means \(x\in \text {SOL}(A)\) and the former implies \(x\notin \text {SOL}(A\setminus R)\), a contradiction.
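The construction of Theorem 4 (d equal to \(-1\) on R, to \(\delta\) on the forbidden tuples \(T\setminus A\), and to 0 on \(A\setminus R\)) is easy to implement and verify exhaustively on a small instance; the toy instance below is an illustrative assumption:

```python
from itertools import product

V = [0, 1]; D = [0, 1]; C = [(0,), (1,), (0, 1)]   # toy CSP (assumption)
T = [(S, l) for S in C for l in product(D, repeat=len(S))]

def direction_27(A, R):
    """d_t = -1 on R, d_t = delta on T \ A, d_t = 0 on A \ R, with
    delta the number of scopes whose tuples meet R (Theorem 4)."""
    delta = float(len({t[0] for t in R}))
    return {t: (-1.0 if t in R else delta) for t in T
            if t in R or t not in A}

def inner(d, x):
    """<d, phi(x)> on the toy instance."""
    return sum(d.get((S, tuple(x[i] for i in S)), 0.0) for S in C)
```

On an instance where value 0 of the first variable has lost all binary supports, the constructed vector never decreases the objective of any assignment, i.e., it lies in \(M^{*}\).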
Theorem 5
Let \(A\subseteq T\) and \(R \subseteq A\). If there exists an R-deactivating direction for A, then \(\text {SOL}(A)=\text {SOL}(A\setminus R)\).
Proof
Observe that \(\text {SOL}(A)=\text {SOL}(A\setminus R)\) is equivalent to \(\text {SOL}(A)\subseteq \text {SOL}(A\setminus R)\) because forbidding tuples may only remove solutions, i.e., \(\text {SOL}\) is an isotone map (see §5.1).
Let d be an R-deactivating direction for A and let \(x\in \text {SOL}(A)\setminus \text {SOL}(A\setminus R)\), so \((S,x[S])\in R\) for some \(S\in C\). By (4), we have \(\langle d, \phi (x)\rangle < 0\) because \(d_{S}(x[S]) = 0\) for all \((S,x[S])\in A\setminus R\) by condition (b) in Definition 1 and \(d_{S}(x[S]) < 0\) for all \((S,x[S])\in R\) by condition (a). This contradicts \(d\in M^{*}\).
Combining Theorems 4 and 5 yields that for any \(R\subseteq A\) with \(R\ne \emptyset\), an R-deactivating direction for A exists if and only if \(\text {SOL}(A)=\text {SOL}(A\setminus R)\). Thus, any R-deactivating direction for A is a certificate of the fact that \(\text {SOL}(A)=\text {SOL}(A\setminus R)\).
Unfortunately, vectors d calculated naively by (27) can have many nonzero components, which is undesirable as explained in §4.1.1. However, it is clear from Definition 1 that if \(A\subseteq A^{\prime }\subseteq T\) and d is an R-deactivating direction for \(A^{\prime }\), then d is an R-deactivating direction also for A. Moreover, (27) shows that larger sets A give rise to sparser vectors d. This offers us a possibility to obtain a sparser R-deactivating direction for A if we can provide a superset \(A^{\prime }\supseteq A\) of the allowed tuples satisfying \(\text {SOL}(A^{\prime })=\text {SOL}(A^{\prime }\setminus R)\).
Given \(A\subseteq T\) and \(R\subseteq A\), finding a maximal (w.r.t. the partial ordering by inclusion) superset \(A^{\prime }\supseteq A\) such that \(\text {SOL}(A^{\prime })=\text {SOL}(A^{\prime }\setminus R)\) is closely related to finding a minimal unsatisfiable core^{Footnote 11} of an unsatisfiable CSP. While finding a maximal such superset is very likely intractable^{Footnote 12}, for obtaining a ‘sparse enough’ vector d it suffices to find a ‘large enough’ such superset \(A^{\prime }\). Such a superset is often cheaply available as a side result of executing the propagator. Namely, we take \(A^{\prime }=T\setminus P\) where P is the set of forbidden tuples that were visited during the run of the propagator. Clearly, tuples not visited by the propagator could not be needed to infer \(\text {SOL}(A)=\text {SOL}(A\setminus R)\). Note that P need not be the same for each CSP instance, even for a fixed level of local consistency: for example, if the arc consistency closure of A is empty, then A is unsatisfiable, but a domain wipeout may occur sooner or later depending on A, which affects which tuples need to be visited.
Let us emphasize that an R-deactivating direction for A need not always be obtained using formula (27); any other method can be used as long as d satisfies Definition 1. We will now give examples of deactivating directions corresponding to some popular constraint propagation rules. In these examples, we assume that our CSP contains all unary constraints (i.e., \(\{i\}\in C\) for each \(i\in V\)), so that rather than deleting domain values we can forbid tuples of the unary constraints.
Example 3
Let us consider (generalized) arc consistency (AC). A CSP A is (G)AC if for all \(S\in C\), \(i\in S\) and \(k\in D\) we have the equivalence^{Footnote 13} \((\{i\},k)\in A \iff \exists l\in D^{S}:\ l_{i}=k,\ (S,l)\in A\). (29)
If, for some \(S\in C\), \(i\in S\) and \(k\in D\), the left-hand statement in (29) is true and the right-hand statement is false, the AC propagator infers \(\text {SOL}(A)=\text {SOL}(A\setminus R)\) where \(R=\{(\{i\},k)\}\). To infer this, it suffices to know that the tuples \(P=\{ (S,l) \mid l\in D^{S}, \; {l_{i}=k}\}\) are all forbidden. An R-deactivating direction d for A can be chosen as in (27) where \(\delta =|\{ S^{\prime }\in C \mid T_{S^{\prime }}\cap R\ne \emptyset \}|=1\) and A is replaced by \(T\setminus P\). Note that then we have \(d\in M^{\perp }\).
If the left-hand statement in (29) is false and the right-hand statement is true, the AC propagator infers \(\text {SOL}(A)=\text {SOL}(A\setminus R)\) where \(R=\{ (S,l) \mid l\in D^{S}, \; l_{i}=k\} \cap A\). To infer this, it suffices to know that the tuple \(P=\{(\{i\},k)\}\) is forbidden. In this particular case, rather than using (27) (with A replaced by \(T\setminus P\)), it is better to choose d as \(d_{(\{i\},k)} = 1\), \(d_{(S,l)} = -1\) for all \(l\in D^{S}\) with \(l_{i}=k\), and \(d_{t} = 0\) for all other tuples t. (30)
Vector (30) satisfies \(d\in M^{\perp }\), in contrast to vector (27) which satisfies only \(d\in M^{*}\). Thus, the update \(f^{k+1}=f^{k}+\alpha d\) is a mere reparametrization, which is desirable as explained in §4.1.1.
We note that reparametrizations considered in the previous paragraphs correspond to soft arc consistency operations extend and project in [6].
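This second AC case can be sketched as follows. The choice of d below (weight \(+1\) on the forbidden unary tuple and \(-1\) on every binary tuple it would support) is one concrete reading of (30), and the toy model is an illustrative assumption:

```python
from itertools import product

V = [0, 1]; D = [0, 1]; C = [(0,), (1,), (0, 1)]   # toy CSP (assumption)
T = [(S, l) for S in C for l in product(D, repeat=len(S))]

def ac_repar_direction(i, k, S):
    """Deactivate R = {(S,l) : l_i = k} once ({i},k) is known forbidden:
    +1 on ({i},k) and -1 on each (S,l) with l_i = k. The objective change
    of every assignment is zero, so d is a reparametrization (d in M-perp)."""
    d = {((i,), (k,)): 1.0}
    for l in product(D, repeat=len(S)):
        if l[S.index(i)] == k:
            d[(S, l)] = -1.0
    return d

def inner(d, x):
    """<d, phi(x)> on the toy instance."""
    return sum(d.get((S, tuple(x[i] for i in S)), 0.0) for S in C)
```

The zero objective change for every assignment is exactly the property distinguishing \(M^{\perp }\) from \(M^{*}\).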
Example 4
We now consider cycle consistency as defined in [25].^{Footnote 14} As this local consistency was defined only for binary CSPs, we assume that \(|S|\le 2\) for each \(S\in C\) and denote \(E=\left\{ S\in C \mid |S|=2\right\}\), so that (V, E) is an undirected graph. Let \(\mathcal {L}\) be a (polynomially sized) set of cycles in the graph (V, E). A CSP A is cycle consistent w.r.t. \(\mathcal {L}\) if for each tuple \((\{i\},k)\in A\) (where \(i\in V\) and \(k\in D\)) and each cycle \(L\in \mathcal {L}\) that passes through node \(i\in V\), there exists an assignment x with \(x_{i}=k\) that uses only allowed tuples in cycle L. It can be shown that the cycle repair procedure in [25] constructs a deactivating direction whenever an inconsistent cycle is found. Moreover, the constructed direction in this case coincides with (27) where A is replaced by \(T\setminus P\) for a suitable set P that contains a subset of the forbidden tuples within the cycle.
Example 5
Recall that a CSP A is singleton arc consistent (SAC) if for every tuple \(t=(\{i\},k)\in A\) (where \(i\in V\) and \(k\in D\)), the CSP^{Footnote 15} \(A\vert _{x_{i}=k} =A\setminus (T_{\{i\}}\setminus \{(\{i\},k)\})\) has a nonempty arc-consistency closure. Good (i.e., sparse) deactivating directions for SAC can be obtained as follows. For some \((\{i\},k)\in A\), we enforce arc consistency of the CSP \(A\vert _{x_{i}=k}\), during which we store the causes for forbidding each tuple. If \(A\vert _{x_{i}=k}\) is found to have an empty AC closure, we backtrack and identify only those tuples which were necessary to prove the empty AC closure. These tuples form the set P. The deactivating direction is then constructed as in (27) where \(R=\{(\{i\},k)\}\) and A is replaced by \(T\setminus P\). Note that SAC does not have bounded support as many other local consistencies [26] do, so the size of P can be significantly different for different CSP instances. We show a detailed example of constructing a deactivating direction using SAC in the Appendix.
4.2.2 Composing deactivating directions
Consider now a propagator which, for a current CSP \(A\subseteq T\), returns a set \(R\subseteq A\) such that \(\text {SOL}(A)=\text {SOL}(A\setminus R)\) and an R-deactivating direction for A. This propagator is applied iteratively, each time forbidding a different set of tuples, until the current CSP achieves the desired local consistency level \(\Phi\) or it becomes explicit that the CSP is unsatisfiable. This is outlined in Algorithm 1, which stores the generated sets \(R_{i}\) of tuples being forbidden and the corresponding \(R_{i}\)-deactivating directions \(d^{i}\). By line 5 of the algorithm, we have \(A_{i} = A\setminus \bigcup _{j=0}^{i-1} R_{j}\) for every \(i \in \{0, \ldots , n+1\}\). Therefore, by Theorem 5, we have \(\text {SOL}(A)=\text {SOL}(A_{1})=\text {SOL}(A_{2})=\cdots =\text {SOL}(A_{n+1})\), which implies that if \(A_{n+1}\) is unsatisfiable then so is A.
Algorithm 1 The procedure propagate applies constraint propagation to CSP \(A\subseteq T\) and returns the sequence \((R_{i})_{i=0}^{n}\) of tuple sets that were forbidden and the corresponding deactivating directions \((d^{i})_{i=0}^{n}\). If all tuples in some scope \(S \in C\) become forbidden during propagation, propagate returns also S, otherwise it returns \(S=\emptyset\).
In this section, we show how to compose the generated sequence of \(R_{i}\)-deactivating directions \(d^{i}\) for \(A_{i}\) into a single \(\big (\bigcup _{i=0}^{n} R_{i}\big )\)-deactivating direction for A. This can be done using the following composition rule:
Theorem 6
Let \(A\subseteq T\) and \(R,R^{\prime }\subseteq A\) where \(R\cap R^{\prime }=\emptyset\). Let d be an R-deactivating direction for A. Let \(d^{\prime }\) be an \(R^{\prime }\)-deactivating direction for \(A\setminus R\). Let \(\delta = \max \bigl \{ 0,\ \max _{t\in R} \tfrac{-1-d^{\prime }_{t}}{d_{t}} \bigr \}\). (31)
Then \(d^{\prime \prime } = d^{\prime }+\delta d\) is an \((R\cup R^{\prime })\)-deactivating direction for A.
Proof
First, if \(d_{t}^{\prime } \le -1\) for all \(t \in R\), then \(d^{\prime \prime }=d^{\prime }\) satisfies the required condition immediately. Otherwise, \(\delta > 0\) since \(d_{t} < 0\) for all \(t \in R\) by definition and \(-1-d^{\prime }_{t} < 0\) due to \(d^{\prime }_{t}>-1\) in the definition of \(\delta\). We will show that \(d^{\prime \prime }\) satisfies the conditions in Definition 1.
For \(t \in R\) with \(d^{\prime }_{t} \le -1\), \(d^{\prime \prime }_{t} = d^{\prime }_{t} + \delta d_{t} < d^{\prime }_{t} \le -1\) because \(\delta d_{t} < 0\). If \(t\in R\) and \(d^{\prime }_{t} > -1\), then \(\delta \ge (-1-d^{\prime }_{t})/d_{t}\), so \(d^{\prime \prime }_{t} = d_{t}^{\prime }+\delta d_{t} \le -1\). Summarizing, we have \(d^{\prime \prime }_{t}<0\) for all \(t\in R\).
For \(t\in R^{\prime }\), \(d^{\prime }_{t}<0\) and \(d_{t} = 0\) hold by definition due to \(R^{\prime }\subseteq A\setminus R\), thus \(d^{\prime \prime }_{t} = d_{t}^{\prime }+\delta d_{t} = d_{t}^{\prime } < 0\), which together with the previous paragraph yields condition (a).
Due to \(A\setminus R \supseteq (A\setminus R)\setminus R^{\prime }=A\setminus (R\cup R^{\prime })\), for any \(t\in A\setminus (R\cup R^{\prime })\) we have \(d_{t} = 0\) and \(d_{t}^{\prime } = 0\), which implies \(d^{\prime \prime }_{t} = d^{\prime }_{t}+\delta d_{t} = 0\), thus verifying condition (b).
Finally, we have \(d^{\prime \prime }\in M^{*}\) because \(d,d^{\prime }\in M^{*}\) and \(\delta \ge 0\).
Theorem 6 allows us to combine the \(R_{i}\)-deactivating direction \(d^{i}\) for \(A_{i}=A_{i-1}\setminus R_{i-1}\) with the \(R_{i-1}\)-deactivating direction \(d^{i-1}\) for \(A_{i-1}\) into a single \((R_{i-1}\cup R_{i})\)-deactivating direction for \(A_{i-1}\). Iteratively, we can thus gradually build a \(\bigl (\bigcup _{i=0}^{n} R_{i}\bigr )\)-deactivating direction for A, which certifies unsatisfiability of A whenever Algorithm 1 detects on line 6 that \(A_{n+1}\) (and thus also A) is unsatisfiable.
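The composition rule of Theorem 6 is a one-liner once \(\delta\) is computed as in the proof, \(\delta = \max \{0, \max _{t\in R} (-1-d^{\prime }_{t})/d_{t}\}\). The dictionaries in the check below encode a toy scenario of two successive propagation steps and are illustrative assumptions:

```python
def compose_pair(d, dprime, R):
    """Theorem 6: combine an R-deactivating d for A with an R'-deactivating
    d' for A \ R into an (R u R')-deactivating direction d'' = d' + delta*d."""
    # d_t < 0 on R, so the division below is safe
    delta = max([0.0] + [(-1.0 - dprime.get(t, 0.0)) / d[t] for t in R])
    keys = set(d) | set(dprime)
    return {t: dprime.get(t, 0.0) + delta * d.get(t, 0.0) for t in keys}
```

In the test, the second direction d2 is positive on the tuple deactivated by d1, so \(\delta =2\) is needed to make the composition negative there again.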
However, it is not always necessary to construct a full \(\bigl (\bigcup _{i=0}^{n} R_{i}\bigr )\)-deactivating direction because not every iteration of constraint propagation may have been necessary to prove unsatisfiability of A. Instead, we can use the scope \(S\in C\) satisfying \(A_{n+1}\cap T_{S}=\emptyset\) (where \(A_{n+1}=A\setminus \bigcup _{i=0}^{n} R_{i}\), as mentioned above) returned by Algorithm 1 on line 7 and construct an \(R^*\)-deactivating direction \(d^{*}\) for a (usually smaller) set \(R^{*} \subseteq \bigcup _{i=0}^{n} R_{i}\) such that \((A\setminus R^{*})\cap T_{S}=\emptyset\). Such a direction \(d^{*}\) still certifies unsatisfiability of A and can be sparser and/or may have lower objective values \(\langle d^{*}, \phi (x)\rangle\) than a \(\bigl (\bigcup _{i=0}^{n} R_{i}\bigr )\)-deactivating direction, which is desirable as explained in §4.1.1.
This is outlined in Algorithm 2, which composes only a subsequence of directions \(d^{i}\) based on a given set of indices \(I \subseteq \{0,\dots ,n\}\) and constructs an \(R^*\)-deactivating direction with \(R^* \supseteq \bigcup _{i \in I}R_{i}\). Although Algorithm 2 is applicable to any set I, in our case I is obtained by taking a scope \(S\in C\) such that \(A_{n+1}\cap T_{S}=\emptyset\) and then setting \(I = \{ i\in \{0,\dots ,n\} \mid R_{i}\cap T_{S}\ne \emptyset \}\), (32)
so that \((A\setminus R^*)\cap T_{S} = \emptyset\) due to the following fact:
Proposition 1
Let \(S\in C\) be such that \((A\setminus \bigcup _{i=0}^{n}R_{i})\cap T_{S}=\emptyset\). Let I be given by (32). Then \((A\setminus \bigcup _{i\in I}R_{i})\cap T_{S}=\emptyset\).
Proof
For any sets \(A,R,T^{\prime }\subseteq T\) we have \((A\setminus R)\cap T^{\prime }=(T^{\prime }\setminus R)\cap A\). In particular, \((A\setminus \bigcup _{i=0}^{n}R_{i})\cap T_{S} = (T_{S} \setminus \bigcup _{i=0}^{n}R_{i}) \cap A\). But \(T_{S} \setminus \bigcup _{i=0}^{n}R_{i} = T_{S} \setminus \bigcup _{i\in I}R_{i}\) because for each \(i\notin I\) we have \(R_{i}\cap T_{S}=\emptyset\), which is equivalent to \(T_{S}\setminus R_{i}=T_{S}\).
Algorithm 2 The procedure compose takes the sequences \((R_{i})_{i=0}^{n}\) and \((d^{i})_{i=0}^{n}\) (generated by the procedure propagate) and a nonempty index set \(I \subseteq \{0,\dots ,n\}\) and composes them into an \(R^*\)-deactivating direction \(d^{*}\) for A.
Correctness of Algorithm 2 is given by the following theorem:
Proposition 2
Algorithm 2 returns an \(R^*\)-deactivating direction \(d^{*}\) for A where \(\bigcup _{i\in I}R_{i} \subseteq R^* \subseteq \bigcup _{i=0}^{n} R_{i}\).
Proof
The fact that \(R^{*} \supseteq \bigcup _{i\in I}R_{i}\) is obvious due to \(R_{\max I}\subseteq R^*\) by the initialization on line 2 and \(R_{i}\subseteq R^*\) for any \(i\in I\) with \(i<\max I\) because in that case the update on line 7 is performed. Similarly, \(R^*\subseteq \bigcup _{i=0}^{n} R_{i}\) holds by the initialization of \(R^*\) on line 2 and the updates on line 7.
It remains to show that \(d^{*}\) is \(R^*\)-deactivating, which we will do by induction. We claim that the vector \(d^{*}\) is always an \(R^*\)-deactivating direction for \(A_{i}\) on line 3 and an \(R^*\)-deactivating direction for \(A_{i+1}\) on line 5.
Initially, we have \(d^{*}=d^{i}\), so \(d^{*}\) is \(R_{i}\)-deactivating (i.e., \(R^*\)-deactivating since \(R^*=R_{i}\) before the loop is entered) for \(A_{i}\). Also, when the vector \(d^{*}\) is first queried on line 5, i has been decreased by 1 due to the update on line 4, so \(d^{*}\) is \(R^*\)-deactivating for \(A_{i+1}\). The required property thus holds when the condition on line 5 is first queried with \(i=\max I-1\).
We proceed with the inductive step. If the condition on line 5 is not satisfied, then necessarily \(d^{*}_{t}=0\) for all \(t\in R_{i}\). So, if \(d^{*}\) is \(R^*\)-deactivating for \(A_{i+1}\), then it is also \(R^*\)-deactivating for \(A_{i}=A_{i+1}\cup R_{i}\), as seen from Definition 1.
If the condition on line 5 is satisfied, \(d^{*}\) is \(R^*\)-deactivating for \(A_{i+1}\) before the update on lines 6 and 7. Since \(A_{i+1}=A_{i}\setminus R_{i}\) and \(d^{i}\) is \(R_{i}\)-deactivating for \(A_{i}\), Theorem 6 can be applied to \(d^{i}\) and \(d^{*}\) to obtain an \((R^*\cup R_{i})\)-deactivating direction for \(A_{i}\). After updating \(R^*\) on line 7, it becomes \(R^*\)-deactivating for \(A_{i}\).
When eventually \(i=0\), \(d^{*}\) is \(R^*\)-deactivating for \(A_{0}=A\) by line 2 in Algorithm 1.
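The backward pass of Algorithm 2, as described in the proof above, can be sketched as follows; the folding condition (i belongs to I, or the direction built so far is nonzero somewhere on \(R_{i}\)) is read off the proof, and the toy data in the test are assumptions:

```python
def compose_pair(d, dprime, R):
    """Theorem 6 with delta as in its proof."""
    delta = max([0.0] + [(-1.0 - dprime.get(t, 0.0)) / d[t] for t in R])
    keys = set(d) | set(dprime)
    return {t: dprime.get(t, 0.0) + delta * d.get(t, 0.0) for t in keys}

def compose(Rs, ds, I):
    """Walk the propagation history backwards from max(I) and fold in d^i
    whenever i is requested (i in I) or d* is nonzero somewhere on R_i."""
    i = max(I)
    d_star, R_star = dict(ds[i]), set(Rs[i])
    for i in range(i - 1, -1, -1):
        if i in I or any(d_star.get(t, 0.0) != 0 for t in Rs[i]):
            d_star = compose_pair(ds[i], d_star, Rs[i])
            R_star |= Rs[i]
    return d_star, R_star
```

Skipped indices keep the result sparse, which is the whole point of composing only a subsequence.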
Remark 3
This is similar to what the VAC [6] or Augmenting DAG algorithm [2, 16] do for arc consistency. To attempt to disprove satisfiability of CSP \(A^{*}(f)\), these algorithms enforce AC of \(A^{*}(f)\), during which the causes for forbidding tuples are stored. If the empty AC closure of \(A^{*}(f)\) is detected (which corresponds to \(T_{S}\cap A_{n+1}=\emptyset\) for some \(S\in C\)), these algorithms do not iterate through all previously forbidden tuples but only trace back the causes for forbidding the elements of the wipedout domain (here, the elements of \(T_{S}\)).
4.3 Line search
In §4.2 we showed how to construct an R-deactivating direction d for a CSP A, which certifies unsatisfiability of A whenever \((A\setminus R)\cap T_{S}=\emptyset\) for some \(S\in C\). Given a WCSP \(f\in \mathbb {R}^{T}\) with \(A^{*}(f)=A\), to obtain \(f^{\prime }\in f+M^{*}\) with \(B(f^{\prime })<B(f)\) (as in Theorem 3), we need to find a step size \(\alpha >0\) so that \(f^{\prime }=f+\alpha d\), as discussed in §4.1.2. That is, we need to find \(\alpha >0\) such that \(B(f+\alpha d)<B(f)\). This task is known in numerical optimization as line search.
Finding the best step size (i.e., exact line search) would require finding a global minimum of the univariate convex piecewiseaffine function \(\alpha \mapsto B(f+\alpha d)\). As this would be too expensive for large WCSP instances, we find only a suboptimal step size (approximate line search) in the following theorem.^{Footnote 16}
Theorem 7
Let \(f \in \mathbb {R}^{T}\). Let d be an R-deactivating direction for \(A^{*}(f)\). Denote^{Footnote 17} \(\beta = \min \bigl \{ \tfrac{\max _{t \in T_{S}}f_{t}-f_{t^{\prime }}}{d_{t^{\prime }}} \,\big |\, S\in C,\ t^{\prime }\in T_{S},\ d_{t^{\prime }}>0 \bigr \}\) and \(\gamma = \min \bigl \{ \tfrac{f_{t}-f_{t^{\prime }}}{d_{t^{\prime }}-d_{t}} \,\big |\, S\in C,\ (A^{*}(f)\setminus R)\cap T_{S}=\emptyset ,\ t\in T_{S}\cap R,\ t^{\prime }\in T_{S}\setminus R,\ d_{t^{\prime }}>d_{t} \bigr \}\), where \(\min \emptyset =\infty\).
Then \(\beta ,\gamma >0\) and for every \(S\in C\) and \(\alpha \in \mathbb {R}\), WCSP \(f^{\prime }= f + \alpha d\) satisfies:

(a)
If \((A^{*}(f)\setminus R)\cap T_{S}\ne \emptyset\) and \(0\le \alpha \le \beta\), then \(\max _{t \in T_{S}}f^{\prime }_{t} = \max _{t \in T_{S}}f_{t}\).

(b)
If \((A^{*}(f)\setminus R)\cap T_{S}\ne \emptyset\) and \(0<\alpha <\beta\), then \(A^{*}(f^{\prime })\cap T_{S} = (A^{*}(f)\setminus R)\cap T_{S}\).

(c)
If \((A^{*}(f)\setminus R)\cap T_{S}=\emptyset\) and \(0<\alpha \le \min \{\beta ,\gamma \}\), then \(\max _{t \in T_{S}}f^{\prime }_{t} < \max _{t \in T_{S}}f_{t}\).
Proof
We have \(\beta >0\) because \(d_{t^{\prime }} > 0\) implies that \(t^{\prime }\) is an inactive tuple, so \(\max _{t \in T_{S}}f_{t}>f_{t^{\prime }}\). We have \(\gamma >0\) because in \(f_{t}-f_{t^{\prime }}\) tuple t is always active and \(t^{\prime }\) is inactive, hence \(f_{t} > f_{t^{\prime }}\).
To prove (a), let \(t^{*}\in (A^{*}(f)\setminus R)\cap T_{S}\). Hence, by Definition 1, \(d_{t^{*}} = 0\) and the value \(\max _{t \in T_{S}}f^{\prime }_{t}\) does not decrease for any \(\alpha\) since \(f^{\prime }_{t^{*}}=f_{t^{*}}+\alpha d_{t^{*}}=f_{t^{*}}\). To show the maximum does not increase, consider a tuple \(t^{\prime } \in T_{S}\) such that \(d_{t^{\prime }}>0\) (due to \(\alpha \ge 0\), tuples with \(d_{t^{\prime }}\le 0\) cannot increase the maximum). It follows that \(\alpha \le \beta \le \tfrac{\max _{t \in T_{S}}f_{t}-f_{t^{\prime }}}{d_{t^{\prime }}}\), so \(f^{\prime }_{t^{\prime }} = f_{t^{\prime }}+d_{t^{\prime }}\alpha \le \max _{t \in T_{S}} f_{t}\).
To prove (b), let \((A^{*}(f)\setminus R)\cap T_{S}\ne \emptyset\). As in (a), we have \(\max _{t\in T_{S}}f_{t}=\max _{t\in T_{S}}f_{t}^{\prime }\). If \(t \in (A^{*}(f)\setminus R)\cap T_{S}\), then \(d_{t} = 0\) and such tuples remain active by \(f_{t}^{\prime }=f_{t}\). Tuples \(t \in R\cap T_{S}\) become inactive since \(f^{\prime }_{t} = f_{t}+d_{t}\alpha <f_{t} = \max _{t^{\prime } \in T_{S}}f_{t^{\prime }}\) by \(d_{t} < 0\) and \(\alpha > 0\). Tuples \(t\notin A^{*}(f)\) either satisfy \(d_{t}\le 0\) and cannot become active or satisfy \(d_{t}> 0\) and, by \(\alpha < \beta \le \tfrac{\max _{t^{\prime }\in T_{S}}f_{t^{\prime }}-f_{t}}{d_{t}}\), \(f^{\prime }_{t} = f_{t}+d_{t}\alpha < \max _{t^{\prime } \in T_{S}} f_{t^{\prime }}\), so \(t\notin A^{*}(f^{\prime })\).
To prove (c), let \((A^{*}(f)\setminus R)\cap T_{S}=\emptyset\). For all \(t \in T_{S}\cap R\), we have \(f^{\prime }_{t} = f_{t}+\alpha d_{t}<f_{t}\) by \(d_{t} < 0\) and \(\alpha > 0\), i.e., \(\max _{t \in T_{S}\cap R} f^{\prime }_{t} < \max _{t \in T_{S}\cap R} f_{t}\). We proceed to show that \(f_{t^{\prime }}^{\prime }\le \max _{t \in T_{S}\cap R} f^{\prime }_{t}\) for every \(t^{\prime }\in T_{S}\setminus R\). Let \(t^{*}\in T_{S}\cap R\) satisfy \(f_{t^{*}}^{\prime }=\max _{t \in T_{S}\cap R} f^{\prime }_{t}\). If \(d_{t^{\prime }}>d_{t^{*}}\), then \(\alpha \le \gamma \le \tfrac{f_{t^{*}}-f_{t^{\prime }}}{d_{t^{\prime }}-d_{t^{*}}}\) implies \(f^{\prime }_{t^{*}} = f_{t^{*}} + \alpha d_{t^{*}} \ge f_{t^{\prime }}+\alpha d_{t^{\prime }} = f^{\prime }_{t^{\prime }}\). If \(d_{t^{\prime }} \le d_{t^{*}}\), then also \(\alpha d_{t^{\prime }}\le \alpha d_{t^{*}}\), and \(f^{\prime }_{t^{\prime }} = f_{t^{\prime }} + \alpha d_{t^{\prime }} \le f_{t^{*}}+\alpha d_{t^{*}} = f^{\prime }_{t^{*}}\) holds for any \(\alpha \ge 0\) since \(f_{t^{\prime }}< f_{t^{*}}\). As a result, \(\max _{t^{\prime } \in T_{S}\setminus R} f^{\prime }_{t^{\prime }} \le \max _{t \in T_{S}\cap R}f_{t}^{\prime } < \max _{t \in T_{S}\cap R} f_{t} = \max _{t\in T_{S}} f_{t}\).
If d is an R-deactivating direction for CSP \(A^{*}(f)\) and for all \(S\in C\) we have \((A^{*}(f)\setminus R)\cap T_{S}\ne \emptyset\) then, by Theorem 7(a,b), there is \(\alpha >0\) such that \(f^{\prime }=f+\alpha d\) satisfies \(B(f^{\prime })=B(f)\) and \(A^{*}(f^{\prime })=A^{*}(f)\setminus R\). This justifies why such a direction d is called R-deactivating: a suitable update of f along this direction makes the tuples R inactive for f.
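An approximate line search in the spirit of Theorem 7 can be sketched as follows, assuming the upper bound has the form \(B(f)=\sum _{S\in C}\max _{t\in T_{S}}f_{t}\) and with the caps \(\beta\) and \(\gamma\) read off the proof; the toy weights in the test are assumptions:

```python
from itertools import product

V = [0, 1]; D = [0, 1]; C = [(0,), (1,), (0, 1)]   # toy WCSP (assumption)
T = [(S, l) for S in C for l in product(D, repeat=len(S))]

def bound(f):
    """B(f): sum over scopes of the maximum weight in that scope."""
    return sum(max(f[t] for t in T if t[0] == S) for S in C)

def step_size(f, d, R):
    """beta keeps tuples with d_t > 0 below their scope maximum; gamma keeps
    tuples outside R below the decreasing maximum on R in wiped-out scopes."""
    active = {t for t in T if f[t] == max(f[u] for u in T if u[0] == t[0])}
    beta = min([float('inf')] +
               [(max(f[u] for u in T if u[0] == t[0]) - f[t]) / d[t]
                for t in T if d.get(t, 0.0) > 0])
    gamma = float('inf')
    for S in C:
        if any(t in active and t not in R for t in T if t[0] == S):
            continue                     # scope not wiped out by removing R
        for t in (u for u in T if u[0] == S and u in active):
            for u in (w for w in T if w[0] == S and w not in R):
                if d.get(u, 0.0) > d.get(t, 0.0):
                    gamma = min(gamma, (f[t] - f[u]) /
                                (d.get(u, 0.0) - d.get(t, 0.0)))
    return min(beta, gamma)
```

On the toy instance, value 0 of the first variable is active but has no active binary support, and stepping along the corresponding deactivating direction strictly decreases the bound.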
Remark 4
This might suggest that, to improve the current bound B(f), we need not use Algorithm 2 to construct an \(R^*\)-deactivating direction \(d^{*}\) with \((A^{*}(f)\setminus R^*)\cap T_{S}=\emptyset\) for some \(S \in C\), but could instead perform steps using the intermediate \(R_{i}\)-deactivating directions \(d^{i}\) to create a sequence \(f^{i+1}=f^{i}+\alpha _{i} d^{i}\) satisfying \(B(f^{0})=B(f^{1})=\cdots =B(f^{n})>B(f^{n+1})\). Unfortunately, it is hard to make this work reliably as there are many choices for the intermediate step sizes \(0<\alpha _{i}<\beta _{i}\). We empirically found Algorithm 3 to be preferable.
If d is an R-deactivating direction for \(A^{*}(f)\) and for some \(S\in C\) we have \((A^{*}(f)\setminus R)\cap T_{S}=\emptyset\), then, by Theorem 7(a,c), there is \(\alpha >0\) such that \(f^{\prime }=f+\alpha d\) satisfies \(B(f^{\prime })<B(f)\). The following corollary of Theorem 7 finally justifies why the certificate d of unsatisfiability of CSP \(A^{*}(f)\) is an improving direction for (19):
Corollary 1
CSP \(A\subseteq T\) is unsatisfiable if and only if there is \(d\in M^{*}\) such that for every \(f\in \mathbb {R}^{T}\) with \(A=A^{*}(f)\) there exists \(\alpha >0\) such that \(B(f+\alpha d)<B(f)\).
Proof
First, if for some \(S\in C\) we have that \(A\cap T_{S}=\emptyset\), A is unsatisfiable and no \(f\in \mathbb {R}^{T}\) satisfies \(A=A^{*}(f)\), so the second condition is trivially satisfied by choosing any \(d\in M^{*}\).
Otherwise, let d be any A-deactivating direction (which exists by Theorem 4). It follows from Theorem 7 that for any \(f\in \mathbb {R}^{T}\) with \(A^{*}(f)=A\), we can compute a suitable step size \(\alpha > 0\) such that \(B(f+\alpha d)<B(f)\). The remaining part follows from Theorem 3.
4.4 Final algorithm
Algorithm 3 The final algorithm to iteratively improve feasible solutions to (19)
Having certificates of unsatisfiability from §4.2 and step sizes from §4.3, we can now formulate in detail the iterative method outlined in §4.1.2, see Algorithm 3. First, constraint propagation is applied to CSP \(A^{*}(f)\) by Algorithm 1 until either \(A^{*}(f)\) is proved unsatisfiable or no more propagation is possible. In the latter case, the algorithm halts and returns B(f) as the best achieved upper bound on the optimal value of WCSP g. Otherwise, if \(A^{*}(f)\) is proved unsatisfiable due to \(A_{n+1}\cap T_{S}=\emptyset\) for some \(S\in C\), define I as in (32) so that \((A^{*}(f)\setminus \bigcup _{i\in I} R_{i})\cap T_{S}=\emptyset\), and compute an \(R^*\)-deactivating direction \(d^{*}\) where \(R^*\supseteq \bigcup _{i\in I} R_{i}\) using Proposition 2. Since \((A^{*}(f)\setminus R^*)\cap T_{S}=\emptyset\), we can update WCSP f using Theorem 7. Consequently, the bound B(f) strictly improves after each update on line 7.
Remark 5
In the maximization version of WCSP, hard constraints can be modelled by allowing minus-infinite weights, i.e., we then have \(g\in (\mathbb {R}\cup \{-\infty \})^{T}\). We argue that Algorithm 3 can be easily extended to such a setting. Without loss of generality, one can assume that \(\max _{t\in T_{S}} g_{t} > -\infty\) for each \(S\in C\), (33)
i.e., there is at least one finite weight in each scope for the input WCSP g (as otherwise the WCSP is infeasible).
With this assumption, the definition of the active-tuple CSP \(A^{*}(\cdot )\) remains unchanged and the tuples with minus-infinite weights are never active. Next, observe that propagation in the active-tuple CSP and the construction of the improving direction depend only on \(A^{*}(f)\), so these subroutines need not be modified and, consequently, the improving direction \(d^{*}\) still has only finite components, i.e., \(d^{*}\in \mathbb {R}^{T}\).
The only difference can arise when computing the step size \(\alpha =\min \{\beta ,\gamma \}\) by Theorem 7. If \(f\in \mathbb {R}^{T}\), then \(\alpha\) is always finite. In contrast, if \(f\in (\mathbb {R}\cup \{-\infty \})^{T}\), then it may happen that \(\beta =\gamma =\infty\), so \(\alpha =\min \{\beta ,\gamma \}=\infty\), where we assume the usual arithmetic with infinities, so, e.g., \(a-(-\infty )=\infty\) for \(a\in \mathbb {R}\). As discussed earlier, the weights of the active tuples are always finite, which avoids indeterminate expressions when computing \(\beta\) and \(\gamma\). Note that arithmetic with infinities is different from the addition with ceiling operator [6, 46] (unless the ceiling is infinite).
Next, we comment on the finite and infinite case:

If the computed step size is finite, then \(\alpha d^{*}\in \mathbb {R}^{T}\) and the update on line 7 can be performed following the aforementioned arithmetic with infinities. In this case, condition (33) holds for the updated f since the set of tuples with minus-infinite weight (i.e., the set \(\{t\in T\mid f_{t}=-\infty \}\)) is kept unchanged by the update, and we can continue with the next iteration.

On the other hand, if the step size is infinite, then the bound \(B(f+\alpha d^{*})\) can be made arbitrarily low by setting \(\alpha\) large enough. Stated formally, this means \(\forall b\in \mathbb {R}\,\exists \alpha >0:B(f+\alpha d^{*})\le b\), which proves infeasibility of the WCSP instance, so the algorithm should return \(-\infty\) and terminate.
All in all, if hard constraints are allowed, the only required change in Algorithm 3 is that, if \(\beta =\gamma =\infty\) on line 7, then the algorithm should terminate and return \(-\infty\) (which is an upper bound on the optimal value of the infeasible initial WCSP).
In Algorithm 3 we additionally used a heuristic analogous to capacity scaling in network flow algorithms [47, §7.3]. On line 3 of Algorithm 3, we replace the active tuples \(A^{*}(f)\) with the ‘almost’ active tuples \(A^{*}_{\theta }(f)=\bigcup _{S\in C}\{ t\in T_{S} \mid f_{t} \ge \max _{t^{\prime }\in T_{S}} f_{t^{\prime }} - \theta \}\)
for some threshold \(\theta >0\).^{Footnote 18} This forces the algorithm to disprove satisfiability using tuples that are far from being active, thus hopefully leading to larger step sizes and faster decrease of the bound. Initially, \(\theta\) is set to a high value and whenever we are unable to disprove satisfiability of \(A^{*}_{\theta }(f)\), the current \(\theta\) is decreased as \(\theta := \theta /10\). The process continues until \(\theta\) becomes very small.
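The thresholded active-tuple set and the decrease schedule of \(\theta\) can be sketched as follows; the per-scope reading of the ‘almost’ active tuples and the constants in the schedule are assumptions:

```python
def active_tuples(f, C, theta=0.0):
    """Per scope, the tuples whose weight is within theta of the maximum
    weight in that scope; theta = 0 recovers the active tuples A*(f)."""
    mx = {S: max(v for (Sc, l), v in f.items() if Sc == S) for S in C}
    return {t for t, v in f.items() if v >= mx[t[0]] - theta}

def schedule(theta0=1e3, factor=10.0, eps=1e-6):
    """Thresholds to try: divide by `factor` whenever satisfiability of the
    thresholded CSP cannot be disproved, until theta is tiny."""
    theta = theta0
    while theta > eps:
        yield theta
        theta /= factor
```

A larger threshold forces the algorithm to use tuples far from being active, which tends to allow larger step sizes.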
Although our theoretical results are more general, our implementation is limited to binary WCSPs, i.e., instances where the maximum arity of the weighted constraints is at most 2. We implemented two versions of Algorithm 3 (including capacity scaling^{Footnote 19}), differing in the local consistency used to attempt to disprove satisfiability of CSP \(A^{*}(f)\):

Virtual singleton arc consistency via superreparametrizations (VSACSR) uses singleton arc consistency. Precisely, we alternate between the AC and SAC propagators: whenever a single tuple \((i,k)\in V\times D\) is removed by SAC, we step back to enforcing AC until no more AC propagations are possible, and repeat.

Virtual cycle consistency via superreparametrizations (VCCSR) is the same as VSACSR except that SAC is replaced by CC. Though our implementation is different from [25] (we compose deactivating directions rather than alternating between the cycle-repair procedure and the Augmenting DAG algorithm), it has the same fixed points.
The procedures for generating deactivating directions for AC, SAC and CC were implemented as described in Examples 3, 5, and 4. We used the AC-3 algorithm to enforce AC. In SAC and CC, it is useful to step back to AC whenever possible because the deactivating directions of AC correspond to reparametrizations rather than superreparametrizations, which is desirable as explained in §4.1.1.
Remark 6
In analogy to [6, 22], let us call a WCSP instance f virtual \(\Phi\)consistent (e.g., virtual AC or virtual RPC) if \(A^{*}(f)\) has a nonempty \(\Phi\)consistency closure. Then, a virtual \(\Phi\)consistency algorithm naturally refers to an algorithm to transform a given WCSP instance to a virtual \(\Phi\)consistent WCSP instance. In the VAC algorithm, this transformation is equivalencepreserving, i.e., a reparametrization. But in our case, it is a superreparametrization, which is why we call our algorithms VSACSR and VCCSR.
Since we restricted ourselves to binary WCSPs, let \(E=\left\{ S\in C\mid |S|=2\right\}\) so that (V, E) is an undirected graph. The cycles in VCCSR were chosen as follows: if \(2|E|/|V|\le 5\) (i.e., the average degree of the nodes in (V, E) is at most 5), then all cycles of length 3 and 4 present in the graph (V, E) are used. If \(2|E|/|V|\le 10\), then all cycles of length 3 present in the graph are used. If \(2|E|/|V|>10\) or the above method did not result in any cycles, we use all fundamental cycles w.r.t. a spanning tree of the graph (V, E).^{Footnote 20} No additional edges are added to the graph. Note that [25] experimented with grid graphs (where cycles of length 4 and 6 of the grid were used) and complete graphs (where cycles of length 3 were used).
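The fallback of this heuristic, fundamental cycles w.r.t. a spanning tree, can be computed with a standard DFS-tree construction; the helper below is an illustrative sketch, not the paper's Java implementation:

```python
def fundamental_cycles(nodes, edges):
    """One cycle per non-tree edge (u, v) of a DFS spanning forest,
    formed by the tree paths from u and v to their lowest common vertex."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    parent = {}
    for root in nodes:                    # build a spanning forest
        if root in parent:
            continue
        parent[root] = root
        stack = [root]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    stack.append(w)
    tree = {frozenset((u, parent[u])) for u in parent if u != parent[u]}
    cycles = []
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue                      # tree edge: no cycle
        anc = [u]                         # u and all its tree ancestors
        while anc[-1] != parent[anc[-1]]:
            anc.append(parent[anc[-1]])
        path_v = [v]                      # walk v up to the first common vertex
        while path_v[-1] not in anc:
            path_v.append(parent[path_v[-1]])
        cycles.append(anc[:anc.index(path_v[-1])] + list(reversed(path_v)))
    return cycles
```

On a 4-cycle graph this yields a single fundamental cycle through all four nodes, matching the count \(|E|-|V|+1\) for a connected graph.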
Since both VSACSR and VCCSR start by enforcing VAC (i.e., making \(A^{*}(f)\) arc consistent by reparametrizations), before running these methods we used toulbar2 to reparametrize the input WCSP instance to a VAC state (because a specialized algorithm is faster than the more general Algorithm 3). We employed specialized data structures for storing the sequences \((R_{i})_{i=0}^{n}\) and \((d^{i})_{i=0}^{n}\) from Algorithm 1, which exploit the fact that the sets \((R_{i})_{i=0}^{n}\) are disjoint and simplify sequential querying of the (sparse) vectors \((d^{i})_{i=0}^{n}\) in Algorithm 2. Note that the sequence \((A_{i})_{i=0}^{n+1}\) need not be stored and is only needed for the theoretical analysis. Moreover, sparse representations were used when composing deactivating directions in Algorithm 2. To avoid working with ‘structured’ tuples (1), we employed a bijection between T and \(\{1, \ldots , |T|\}\) to work with numerical indices instead.
Besides the above improvements, we did not fine-tune our implementation for efficiency. Thus, the set \(A^{*}(f)\) was always calculated by iterating through all tuples (which could be made faster if the sparsity of the improving direction were taken into account). The hyperparameters of our algorithm (e.g., the decrease schedule of \(\theta\) or the constants mentioned in Footnote 19) were neither learned nor systematically optimized. SAC was checked on all active tuples without warm-starting or using any faster SAC algorithm than SAC-1 [44, 50]. Perhaps most importantly, we did not implement inter-iteration warm-starting as in [17, 45], i.e., after updating the weights on line 6 of Algorithm 3, some deactivating directions in the sequence that were not used to compose the improving direction could be preserved for the next iteration instead of being computed from scratch. Except for computing deactivating directions, the code was the same for VSACSR and VCCSR. We implemented everything in Java.
4.5 Experiments
We compared the bounds calculated by VSACSR and VCCSR with the bounds provided by EDAC [14], VAC [6], pseudo-triangles (option -t=8000 in toulbar2, which adds up to 8 GB of ternary weight functions), PIC, EDPIC, maxRPC, and EDmaxRPC [22], which are implemented in toulbar2 [51]. Our motivation for choosing these local consistencies is as follows: EDAC is the local consistency typically maintained during branch-and-bound search. VAC is closely related to our approach and can be used in preprocessing (as it is faster than OSAC, which is usually too memory- and time-consuming for practical purposes). Finally, we consider a class of recently proposed triangle-based consistencies [22] that enforce stronger forms of local consistency.
We performed the comparison on the Cost Function Library benchmark [52]. Due to limited computational resources, we used only the smallest 16500 instances (out of 18132). Of these, we omitted instances containing weight functions of arity 3 or higher. Moreover, to avoid easy instances, we omitted those solved by VAC without search (i.e., toulbar2 with options -A -bt=0 found an optimal solution). We also omitted the validation instances that are used for testing and debugging. Overall, 5371 instances were left for our comparison.
For each instance and each method, we only calculated the upper bound and did not perform any search. For each instance and method, we computed the normalized bound \(\frac{B_{w}-B_{m}}{B_{w}-B_{b}}\), where \(B_{m}\) is the bound computed by the method for the instance and \(B_{w}\) and \(B_{b}\) are the worst and best bounds for the instance among all the methods, respectively. Thus, the best bound^{Footnote 21} transforms to 1 and the worst bound to 0, i.e., greater is better.
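For illustration, the normalization can be sketched as follows (the tolerance handling of Footnote 21 is omitted for simplicity):

```python
def normalized_bound(b_m: float, b_w: float, b_b: float) -> float:
    """Normalize a bound B_m against the worst (B_w) and best (B_b) bounds
    obtained for the same instance: 1 means best, 0 means worst. In this
    maximization setting, lower upper bounds are better, so B_b <= B_m <= B_w."""
    if b_w == b_b:          # all methods tied on this instance
        return 1.0
    return (b_w - b_m) / (b_w - b_b)
```

For example, with a worst bound of 10 and a best bound of 5, a method reporting 7.5 gets a normalized bound of 0.5.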
For 26 instances, at least one method was not able to finish within the pre-specified 1-hour CPU-time limit. These timed-out methods were omitted from the calculation of the normalized bounds for these instances. Likewise, from the point of view of a method, such an instance was not incorporated into the average of the normalized bounds of that particular method. We note that our implementations of VSACSR and VCCSR provide a bound when terminated at any time, whereas the implementations of the other methods provide a bound only when they are left to finish. Timeouts occurred 5, 2, 3, 6, and 24 times for pseudo-triangles, PIC, EDPIC, maxRPC, and EDmaxRPC, respectively. This did not affect the results much, as there were 5371 instances in total.
The results in Table 1 show that no method is best for all instance groups; instead, each method is suitable for a different group. However, VSACSR performed best for most groups and was otherwise often competitive with the other strong consistency methods. VSACSR seems particularly good at spinglass_maxcut [53], planning [54] and qplib [55] instances. Taking the overall unweighted average of group averages (giving the same importance to each group), VSACSR achieved the greatest average value. We also evaluated the ratio to the worst bound, \(B_{m}/B_{w}\), for instances with \(B_{w}\ne 0\); the results were qualitatively the same: VSACSR again achieved the best overall average of 3.93 (or 4.15 if only groups with \(\ge 5\) instances are considered) compared to the second-best, pseudo-triangles, with 2.71 (or 2.84).
The runtimes (on a laptop with an i7-4710MQ processor at 2.5 GHz and 16 GB RAM) are reported in Table 2. Again, the results are group-dependent, and one can observe that the methods explore different trade-offs between bound quality and runtime. However, the strong consistencies are comparable in terms of runtime on average, except for pseudo-triangles, which is a faster method that, however, needs significantly more memory.
The code that was used to obtain these results is available at https://cmp.felk.cvut.cz/~dlaskto2/code/VSACSR.zip.
5 Additional properties of superreparametrizations
In this section, we present a more detailed study of the properties of WCSPs that are preserved by (possibly optimal) superreparametrizations. To that end, we first revisit in §5.1 the notion of a minimal CSP for a set of assignments. The key result of §5 is presented in §5.2, where we study the relation between the set of optimal assignments of a WCSP and the set of optimal assignments of a superreparametrization that is optimal for (19), showing that they need not coincide in general. In §5.3, we give some properties of general (i.e., not necessarily optimal for (19)) superreparametrizations.
5.1 Minimal CSP
Let us ask when, for a given set \(X\subseteq D^{V}\) of assignments (i.e., a \(|V|\)-ary relation over D), there exists \(A\subseteq T\) such that \(X=\text {SOL}(A)\), i.e., when X is representable as the solution set of a CSP with a given structure (D, V, C). For that, denote
Thus, \(\mathcal {A}^{\uparrow }(X)\) is the set of all CSPs whose solution set includes X and \(A_{\min }(X)\) is the intersection of these CSPs. We call \(A_{\min }(X)\) the minimal CSP for X. For CSPs with only binary relations, this concept was studied in [56] and [57, §2.3.2].
Proposition 3
The map \(\text {SOL}\) preserves intersections ^{Footnote 22}, i.e., for any \(A_{1},A_{2}\subseteq T\) we have \(\text {SOL}(A_{1}\cap A_{2})=\text {SOL}(A_{1})\cap \text {SOL}(A_{2})\).
Proof
For any \(x\in D^{V}\) we have
Proposition 4
For any \(X\subseteq D^{V}\), the set \(\mathcal {A}^{\uparrow }(X)\) is closed under intersections, i.e., for any \(A_{1},A_{2}\subseteq T\) we have \(A_{1},A_{2}\in \mathcal {A}^{\uparrow }(X)\implies A_{1}\cap A_{2}\in \mathcal {A}^{\uparrow }(X)\).
Proof
If \(X\subseteq \text {SOL}(A_{1})\) and \(X\subseteq \text {SOL}(A_{2})\), then \(X\subseteq \text {SOL}(A_{1})\cap \text {SOL}(A_{2})=\text {SOL}(A_{1}\cap A_{2})\), where the equality holds by Proposition 3.
Proposition 4 implies \(A_{\min }(X)\in \mathcal {A}^{\uparrow }(X)\), i.e., \(X\subseteq \text {SOL}(A_{\min }(X))\). This shows that \(A_{\min }(X)\) is the smallest CSP whose solution set includes X. It follows that \(X=\text {SOL}(A_{\min }(X))\) if and only if \(X=\text {SOL}(A)\) for some \(A\subseteq T\).
The minimal CSP for X can be equivalently defined in terms of tuples:
Proposition 5
([56, 57]) We have \(A_{\min }(X) = \left\{ (S,k)\in T \mid \exists x\in X:x[S]=k\right\}\).
Proof
Denote \(A^{\prime }=\left\{ (S,k)\in T \mid \exists x\in X:x[S]=k\right\}\). By definition of \(A^{\prime }\) we have \(\text {SOL}(A^{\prime })\supseteq X\), so \(A^{\prime }\in \mathcal {A}^{\uparrow }(X)\) and \(A_{\min }(X)=\bigcap \mathcal {A}^{\uparrow }(X) \subseteq A^{\prime }\).
It remains to show that \(A_{\min }(X)\supseteq A^{\prime }\). For contradiction, suppose there is a tuple \((S^{*},k^{*})\in A^{\prime }\setminus A_{\min }(X)\). By definition of \(A^{\prime }\), there exists \(x\in X\) such that \(x[S^{*}]=k^{*}\). However, since \((S^{*},x[S^{*}])=(S^{*},k^{*})\notin A_{\min }(X)\), we have \(x\notin \text {SOL}(A_{\min }(X))\). By \(x\in X\), this contradicts \(X\subseteq \text {SOL}(A_{\min }(X))\).
Recall that a CSP A is positively consistent [58, 59] (called ‘minimal’ in [56, 57]) if and only if for each \((S,k)\in A\) there exists \(x\in \text {SOL}(A)\) such that \(x[S]=k\), i.e., each allowed tuple is used by at least one solution, i.e., no tuple can be forbidden without losing some solutions. Proposition 5 shows that \(A_{\min }(X)\) is positively consistent for every \(X\subseteq D^{V}\).
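Proposition 5 makes \(A_{\min }(X)\) directly computable. A brute-force sketch, under a hypothetical encoding (assignments as tuples indexed by \(V=\{0,\ldots ,n-1\}\), scopes as tuples of variables):

```python
from itertools import product

def A_min(X, scopes):
    """Minimal CSP for a set X of assignments (Proposition 5):
    a tuple (S, k) is allowed iff some x in X uses it."""
    return {(S, tuple(x[i] for i in S)) for x in X for S in scopes}

def SOL(A, scopes, n, domain):
    """Brute-force solution set of the CSP given by allowed tuples A."""
    return {x for x in product(domain, repeat=n)
            if all((S, tuple(x[i] for i in S)) in A for S in scopes)}

# V = {0, 1}, D = {0, 1}, one binary scope: here X is itself a solution set.
X = {(0, 0), (1, 1)}                      # the 'equality' relation
assert SOL(A_min(X, [(0, 1)]), [(0, 1)], 2, (0, 1)) == X

# With only unary scopes, the closure SOL(A_min(X)) is strictly larger than X:
u = [(0,), (1,)]
closure = SOL(A_min(X, u), u, 2, (0, 1))
assert X < closure == {(0, 0), (0, 1), (1, 0), (1, 1)}
```

The second example illustrates that \(X\subseteq \text {SOL}(A_{\min }(X))\) can be strict: with unary scopes only, no CSP of this structure can express the equality relation.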
Theorem 8
For any \(X\subseteq D^{V}\) and \(A\subseteq T\), we have
Proof
If \(A_{\min }(X) \subseteq A\), then by isotony of \(\text {SOL}\) we have \(\text {SOL}(A_{\min }(X))\subseteq \text {SOL}(A)\). Since \(X\subseteq \text {SOL}(A_{\min }(X))\), we have \(X \subseteq \text {SOL}(A)\).
If \(X \subseteq \text {SOL}(A)\), i.e., \(A\in \mathcal {A}^{\uparrow }(X)\), then \(A_{\min }(X)=\bigcap \mathcal {A}^{\uparrow }(X)\subseteq A\).
Comparing (36) with (35) shows that the set \(\mathcal {A}^{\uparrow }(X)\) is just an interval (w.r.t. the partial ordering by inclusion):
Theorem 8 further reveals that the maps \(A_{\min }\) and \(\text {SOL}\) form a Galois connection [60] between the sets \(2^{(D^{V})}\) and \(2^{T}\), partially ordered by inclusion (to our knowledge, we are the first to notice this). Associated with the Galois connection are the closure operator \(\text {SOL}\circ A_{\min }\) and the dual closure operator^{Footnote 23} \(A_{\min }\circ \text {SOL}\). We have already seen their meaning:

For any CSP \(A\subseteq T\), the CSP \(A_{\min }(\text {SOL}(A))\) is the positive consistency closure^{Footnote 24} of A, i.e., the smallest CSP with the same solution set as A. A CSP A is positively consistent if and only if \(A_{\min }(\text {SOL}(A))=A\).

For any set of assignments \(X\subseteq D^{V}\), \(\text {SOL}(A_{\min }(X))\) is the smallest (possibly nonstrict) superset of X which is the solution set of some CSP \(A\subseteq T\). We have \(X=\text {SOL}(A_{\min }(X))\) if and only if \(X=\text {SOL}(A)\) for some \(A\subseteq T\).
Remark 7
Following [60, §7.27], it is easy to see that in this case the maps \(A_{\min }\) and \(\text {SOL}\) are mutually inverse bijections (even order-isomorphisms) if we restrict ourselves to positively consistent CSPs, i.e., \(\left\{ A\subseteq T\mid A_{\min }(\text {SOL}(A))=A\right\}\), and to sets of assignments representable as solution sets of some CSP \(A\subseteq T\), i.e., \(\left\{ \text {SOL}(A)\mid A\subseteq T\right\}\).
5.2 Optimal assignments from optimal superreparametrizations
Theorem 2 says that the optimal value of (19) coincides with the optimal value \(\max _{x} \langle g, \phi (x)\rangle\) of WCSP g. We now focus on the optimal assignments (rather than optimal value) of WCSP g. For brevity, we will denote the set of all optimal assignments of WCSP g as
Theorem 9
If f is optimal for (19), then^{Footnote 25}\(\text {OPT}(g) \subseteq \text {OPT}(f) = \text {SOL}(A^{*}(f))\).
Proof
To show \(\text {OPT}(g)\subseteq \text {OPT}(f)\), let \(x^{*}\in \text {OPT}(g)\). By Theorem 2, \(\langle g, \phi (x^{*})\rangle = B(f)\). Analogously to the proof of Theorem 2: since \(B(f)\ge \langle f, \phi (x^{*})\rangle \ge \langle g, \phi (x^{*})\rangle\), we have that \(B(f)= \langle f, \phi (x^{*})\rangle = \langle g, \phi (x^{*})\rangle\), thus \(x^{*}\) is optimal for WCSP f.
The equality \(\text {OPT}(f) = \text {SOL}(A^{*}(f))\) follows from \(B(f)=\max _{x\in D^{V}}\langle f, \phi (x)\rangle =\max _{x\in D^{V}}\langle g, \phi (x)\rangle\) and Theorem 1.
Our main goal in §5 is to characterize when the inclusion in Theorem 9 holds with equality, which is given by Theorem 11 below.
Proposition 6
For every \(g\in \mathbb {R}^{T}\) and \(A\subseteq T\) such that \(\text {OPT}(g)\subseteq \text {SOL}(A)\), there exists \(f\in \mathbb {R}^{T}\) optimal for (19) such that \(A=A^{*}(f)\).
Proof
Define the vector f as
where
are the best and the second-best objective values of WCSP g. Note that if \(\text {OPT}(g)=D^{V}\), then \(F_{2}\) is undefined, but this does not matter because it is never used in (39).
Since \(\emptyset \ne \text {OPT}(g)\subseteq \text {SOL}(A)\), CSP A is satisfiable. Therefore for each \(S \in C\) we have \(A\cap T_{S}\ne \emptyset\), hence
Equality \(A=A^{*}(f)\) now follows from (39).
To show that f is feasible for (19), we distinguish two cases:

If \(x\in \text {OPT}(g)\), i.e., \(\langle g, \phi (x)\rangle =F_{1}\), then \(x\in \text {SOL}(A)=\text {SOL}(A^{*}(f))\). Therefore, for all \(S\in C\) we have \((S,x[S])\in A^{*}(f)\), hence \(f_{S}(x[S])=F_{1}/|C|\) by (41). Substituting into (4) yields \(\langle f, \phi (x)\rangle = F_{1}\). Hence \(\langle f, \phi (x)\rangle = F_{1} = \langle g, \phi (x)\rangle\).

If \(x\notin \text {OPT}(g)\), we have \(f_{t}\ge F_{2}/|C|\) for all \(t\in T\), hence \(\langle f, \phi (x)\rangle \ge F_{2}\) by (4). By (40), we also have \(\langle g, \phi (x)\rangle \le F_{2}\). Hence \(\langle f, \phi (x)\rangle \ge F_{2} \ge \langle g, \phi (x)\rangle\).
To show that f is optimal for (19), we use (41) to obtain \(B(f) = \sum _{S\in C} F_{1}/|C| = F_{1} = \max _{x}\langle g, \phi (x)\rangle\) and apply Theorem 2.
Theorem 10
For every \(g\in \mathbb {R}^{T}\), we have
Proof
The inclusion \(\supseteq\) says that for every optimal f we have \(\text {OPT}(g)\subseteq \text {SOL}(A^{*}(f))\), which was proved in Theorem 9. The inclusion \(\subseteq\) was proved in Proposition 6.
Now we combine the results of §5.1 and §5.2 to obtain the main result of §5. First observe that, by (37), the set (42) is just the interval \([A_{\min }(\text {OPT}(g)),T]\).
Theorem 11
For every \(g\in \mathbb {R}^{T}\), the following statements are equivalent:

(a)
\(\text {OPT}(g)=\text {SOL}(A)\) for some \(A\subseteq T\),

(b)
\(\text {OPT}(g) = \text {OPT}(f)\) for some f optimal for (19).
If both statements are true, then statement (a) holds, e.g., for \(A=A_{\min }(\text {OPT}(g))\) and statement (b) holds, e.g., if \(A^{*}(f)=A_{\min }(\text {OPT}(g))\).
Proof
Let \(g\in \mathbb {R}^{T}\). By Theorem 10, there exists f optimal for (19) satisfying \(A^{*}(f)=A_{\min }(\text {OPT}(g))\). By Theorem 9, this f satisfies \(\text {OPT}(f)=\text {SOL}(A^{*}(f))\).
By the results of §5.1, statement (a) is equivalent to \(\text {OPT}(g)=\text {SOL}(A_{\min }(\text {OPT}(g)))\). Therefore, if (a) holds, then (b) holds for the above f. In the other direction, if (b) holds for the above f, then (a) holds.
Theorem 11 shows that the inclusion in Theorem 9 holds with equality for some optimal f if and only if the set \(\text {OPT}(g)\) of optimal assignments of WCSP g is representable as a solution set of some CSP with the same structure. If no such CSP exists, then \(\text {OPT}(g)\subsetneq \text {OPT}(f)\) for all optimal f. An example of WCSP g for which no such CSP exists is in Fig. 6.
It is natural to ask which WCSPs possess this property. Though we are currently unable to provide a full characterization of such WCSPs, we identify two such classes:
Theorem 12
([1, 2]) If the LP relaxation (13) of a WCSP \(g\in \mathbb {R}^{T}\) is tight, then \(\text {OPT}(g)=\text {SOL}(A)\) for some \(A\subseteq T\).
Proof
If the LP relaxation (13) is tight, then there exists a vector \(f \in \mathbb {R}^{T}\) such that \(B(f)=\max _{x\in D^{V}}\langle g, \phi (x)\rangle\) and f is a reparametrization of g, i.e., \(\langle f, \phi (x)\rangle =\langle g, \phi (x)\rangle\) for all \(x\in D^{V}\), thus, f is also optimal for (19). It follows that the sets of optimal assignments for f and g coincide. By Theorem 9, \(A^{*}(f)\) is the required CSP.
Theorem 13
If a WCSP \(g\in \mathbb {R}^{T}\) has a unique optimal assignment (i.e., \(|\text {OPT}(g)|=1\)), then \(\text {OPT}(g)=\text {SOL}(A)\) for some \(A\subseteq T\).
Proof
The case with \(|D|=1\) is trivial, so let \(|D|\ge 2\) and \(\text {OPT}(g)=\{x\}\).
We claim that \(A=\left\{ (S,x[S])\mid S\in C\right\}\) is the required CSP, i.e., \(\text {SOL}(A)=\{x\}\). For contradiction, suppose that \(x^{\prime }\in \text {SOL}(A)\) and \(x^{\prime }\ne x\). By \(x^{\prime }\in \text {SOL}(A)\), we necessarily have \(x^{\prime }[S]=x[S]\) for all \(S\in C\). By definition (4), this implies \(\langle g, \phi (x^{\prime })\rangle =\langle g, \phi (x)\rangle\). Thus, \(\{x^{\prime },x\}\subseteq \text {OPT}(g)\), which contradicts \(|\text {OPT}(g)|=1\).
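Theorem 13 can be checked by brute force on a hypothetical toy instance (the weights below are chosen, as an assumption for illustration, so that the optimum is unique):

```python
from itertools import product

def value(g, scopes, x):
    """WCSP objective <g, phi(x)>: sum of the weights of the tuples used by x."""
    return sum(g[(S, tuple(x[i] for i in S))] for S in scopes)

# Toy WCSP: V = {0, 1}, D = {0, 1}, two unary scopes and one binary scope.
scopes = [(0,), (1,), (0, 1)]
g = {((0,), (0,)): 0, ((0,), (1,)): 1,
     ((1,), (0,)): 0, ((1,), (1,)): 1,
     ((0, 1), (0, 0)): 0, ((0, 1), (0, 1)): 0,
     ((0, 1), (1, 0)): 0, ((0, 1), (1, 1)): 1}

xs = list(product((0, 1), repeat=2))
best = max(xs, key=lambda x: value(g, scopes, x))   # unique optimum: (1, 1)

# Theorem 13: the CSP allowing exactly the tuples used by the optimum
# has the optimum as its only solution.
A = {(S, tuple(best[i] for i in S)) for S in scopes}
sols = [x for x in xs
        if all((S, tuple(x[i] for i in S)) in A for S in scopes)]
assert sols == [best] == [(1, 1)]
```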
5.3 Properties of general superreparametrizations
Finally, we present one property of general superreparametrizations f of a fixed WCSP \(g\in \mathbb {R}^{T}\), i.e., f is only feasible (but possibly not optimal) for (19).
Theorem 14
For every \(g\in \mathbb {R}^{T}\) we have
Proof
The inclusion \(\subseteq\) is trivial. To prove \(\supseteq\), let \(f^{\prime }\in \mathbb {R}^{T}\) be arbitrary. Define \(f\in \mathbb {R}^{T}\) by \(f_{t} = B(g)/|C|+\llbracket t \in A^{*}(f^{\prime })\rrbracket\), \(t\in T\). Clearly, f is a superreparametrization of g due to \(\langle f, \phi (x)\rangle \ge B(g)\ge \langle g, \phi (x)\rangle\) for any \(x\in D^{V}\). In addition, by definition of f, \(\max _{t\in T_{S}}f_{t} = B(g)/|C|+1\) for any \(S\in C\), hence \(A^{*}(f^{\prime })=A^{*}(f)\).
Theorem 14 shows that the left-hand set in (43) does not depend on g at all. Therefore, if we optimize (19) only approximately, i.e., we find a (possibly non-optimal) superreparametrization f of g, then there is in general no relation between the sets \(\text {OPT}(f)\) and \(\text {OPT}(g)\). However, as shown in §3.2, an arbitrary superreparametrization still maintains the valuable property that it provides an upper bound B(f) on the optimal value \(\max _{x}\langle g, \phi (x)\rangle\) of WCSP g.
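The construction in the proof of Theorem 14 can be verified by brute force on a toy instance (the weights, scopes, and target active set below are illustrative assumptions):

```python
from itertools import product

# Hypothetical toy instance: V = {0, 1}, D = {0, 1}, one unary and one binary scope.
scopes = [(0,), (0, 1)]
g = {((0,), (0,)): 2, ((0,), (1,)): 5,
     ((0, 1), (0, 0)): 1, ((0, 1), (0, 1)): 0,
     ((0, 1), (1, 0)): 3, ((0, 1), (1, 1)): 3}

# Upper bound B(g): sum of per-scope maxima (= 5 + 3 = 8 here).
B_g = sum(max(w for (S2, k), w in g.items() if S2 == S) for S in scopes)

# Pick an arbitrary target active set with one tuple per scope, as A*(f') would give,
# and set f_t = B(g)/|C| + [t in target] as in the proof.
target_A = {((0,), (0,)), ((0, 1), (0, 1))}
f = {t: B_g / len(scopes) + (1 if t in target_A else 0) for t in g}

# f dominates g on every assignment (superreparametrization), and A*(f) = target_A.
for x in product((0, 1), repeat=2):
    used = [(S, tuple(x[i] for i in S)) for S in scopes]
    assert sum(f[t] for t in used) >= sum(g[t] for t in used)
active = {t for t in f if f[t] == max(f[u] for u in f if u[0] == t[0])}
assert active == target_A
```

This illustrates the point of Theorem 14: the active-tuple CSP of a mere superreparametrization can be made completely arbitrary, independently of g.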
6 Hardness remarks
Unsurprisingly, a number of decision and optimization problems related to (19) are computationally hard, since the optimization problem (19) is itself hard. We review several such problems here.
Theorem 15
The following problem is NP-complete: Given \(f,g\in \mathbb {Q}^{T}\), decide whether f is not a superreparametrization of g (i.e., whether f is not feasible for (19)).
Proof
Membership in NP can be shown easily via the notion of a nondeterministic algorithm [62, §10]: one can guess some \(x\in D^{V}\) and then decide in polynomial time whether \(\langle f, \phi (x)\rangle <\langle g, \phi (x)\rangle\).
To show NP-hardness, we perform a reduction from CSP satisfiability, which is known to be NP-complete. Let \(A\subseteq T\) be a CSP. We would like to decide whether \(\text {SOL}(A)\ne \emptyset\).
Let us define \(g\in \{0,1\}^{T}\) by
Thus, for any \(x\in D^{V}\), \(\langle g, \phi (x)\rangle\) equals the number of constraints in CSP A that are satisfied by the assignment x. So, \(\langle g, \phi (x)\rangle \in \{0, 1, \ldots , |C|\}\) and \(\langle g, \phi (x)\rangle =|C|\) if and only if \(x\in \text {SOL}(A)\). Consequently, \(\max _{x} \langle g, \phi (x)\rangle \le |C|-1\) if and only if \(\text {SOL}(A)=\emptyset\).
We define \(f\in \mathbb {Q}^{T}\) by \(f_{t} = (|C|-1)/|C|\), \(t\in T\). In analogy to Theorem 2, \(\langle f, \phi (x)\rangle =|C|-1\) for all assignments \(x\in D^{V}\). Hence, \(\text {SOL}(A)=\emptyset\) if and only if f is a superreparametrization of g.
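The reduction can be illustrated by brute force on a toy CSP (the instance below is an assumption for illustration):

```python
from itertools import product

def is_superrep(f, g, scopes, n, domain):
    """Check <f, phi(x)> >= <g, phi(x)> for all assignments x (brute force)."""
    for x in product(domain, repeat=n):
        used = [(S, tuple(x[i] for i in S)) for S in scopes]
        if sum(f[t] for t in used) < sum(g[t] for t in used):
            return False
    return True

# Hypothetical unsatisfiable CSP: V = {0, 1, 2}, D = {0, 1}, two binary scopes.
n, domain = 3, (0, 1)
scopes = [(0, 1), (1, 2)]
allowed = {((0, 1), (0, 0)),                    # forces x_1 = 0
           ((1, 2), (1, 0)), ((1, 2), (1, 1))}  # forces x_1 = 1  -> unsatisfiable
T = [(S, k) for S in scopes for k in product(domain, repeat=len(S))]
g = {t: 1.0 if t in allowed else 0.0 for t in T}     # g counts satisfied constraints
f = {t: (len(scopes) - 1) / len(scopes) for t in T}  # constant (|C|-1)/|C|

assert is_superrep(f, g, scopes, n, domain)          # unsatisfiable => f >= g everywhere

g[((1, 2), (0, 0))] = 1.0   # make the CSP satisfiable by x = (0, 0, 0) ...
assert not is_superrep(f, g, scopes, n, domain)      # ... and f no longer dominates g
```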
Corollary 2
The following problem is NP-complete: Given \(d\in \mathbb {Q}^{T}\), decide whether \(d\notin M^{*}\).
Proof
Membership in NP is analogous to Theorem 15. The question of whether f is not a superreparametrization of g from Theorem 15 reduces to deciding whether \(d=f-g\notin M^{*}\).
Corollary 3
The following problem is NP-complete: Given \(f,g\in \{0,1\}^{T}\) where f is a superreparametrization of g, decide whether f is optimal for (19).
Proof
Membership in NP follows from Theorem 2: as in Theorem 15, one can choose \(x\in D^{V}\) and then in polynomial time decide whether \(\langle f, \phi (x)\rangle =\langle g, \phi (x)\rangle\) and \(x\in \text {SOL}(A^{*}(f))\).
The hardness part is completely analogous to the proof of Theorem 15, except that we define \(f\in \{0,1\}^{T}\) by \(f_{t} = 1\), \(t\in T\), so \(B(f)=|C|\). Clearly, f is elementwise greater than or equal to g, so it is a superreparametrization. Moreover, \(|C|=\max _{x} \langle g, \phi (x)\rangle\) if and only if \(\text {SOL}(A)\ne \emptyset\), so f is optimal for (19) if and only if \(\text {SOL}(A)\ne \emptyset\).
Recall that in formula (27), the number \(\delta\) had the concrete value given by Theorem 4. However, the value of \(\delta\) can sometimes be decreased while (27) still remains an R-deactivating direction for A. Finding a small such \(\delta\) is desirable because then (27) results in smaller objective values \(\langle d, \phi (x)\rangle\), as explained in §4.1.1. Unfortunately, finding the least such value of \(\delta\) is likely intractable:
Theorem 16
The following problem is NP-complete: Given \(\delta \in \mathbb {Q}\) and \(R\subseteq A \subseteq T\) satisfying \(\text {SOL}(A)=\text {SOL}(A\setminus R)\), decide whether the vector d given by (27) is not an R-deactivating direction for A.
Proof
Membership in NP is analogous to Theorem 15. Since conditions (a) and (b) from Definition 1 are satisfied, the question boils down to deciding whether \(d\in M^{*}\). This cannot be reduced to the case in Corollary 2 because d in (27) has a special form.
To show hardness, we proceed by a reduction from the 3-coloring problem [49, 63]: given a graph \(G^{*}=(V^{*},E^{*})\), decide whether it is 3-colorable. Let \(G=(V,C)\) be the graph sum (also known as the disjoint union of graphs) of \(G^{*}\) and \(K_{4}\) [49, §8.1.2], where \(K_{4}\) is the complete graph on 4 vertices. Informally, G is the graph obtained from \(G^{*}\) by adding 4 new vertices and an edge between each pair of these new vertices.
Let CSP A have the structure (D, V, C) where \(|D|=3\) and
Hence, any \(x\in D^{V}\) can be interpreted as an assignment of colors to the nodes of G, and \(x\in \text {SOL}(A)\) if and only if x is a 3-coloring of G. Since G contains \(K_{4}\) as a subgraph, it is not 3-colorable and A is unsatisfiable. Hence, setting \(R=A\) satisfies \(\text {SOL}(A\setminus R)=\text {SOL}(\emptyset )=\emptyset =\text {SOL}(A)\).
For the purpose of our reduction, let us define \(\delta = (|C|-2)/2 >0\). We will show that, for this setting, d is not an R-deactivating direction for A if and only if \(G^{*}\) is 3-colorable.
Plugging the abovedefined sets A and R into the definition of d in (27) yields
where \(\text {COL}(x) = \left| \left\{ \{i,j\}\in C \mid x_{i} \ne x_{j}\right\} \right|\) is the number of edges in G whose adjacent vertices have different colors in assignment \(x\in D^{V}\).
If \(G^{*}\) is 3-colorable, then there is \(x\in D^{V}\) such that \(\text {COL}(x)=|C|-1\). In other words, only for a single edge in \(C\setminus E^{*}\) (i.e., an edge of the graph \(K_{4}\)) are the adjacent vertices assigned the same color, so \(\langle d, \phi (x)\rangle = -|C|/2<0\) by (46) and the definition of \(\delta\). Hence, d is not an R-deactivating direction for A.
For the other case, if \(G^{*}\) is not 3-colorable, then \(\text {COL}(x)\le |C|-2\) for any \(x\in D^{V}\). The reason is that at least one edge in \(K_{4}\) and at least one edge in \(G^{*}\) will have both adjacent vertices assigned the same color in any assignment. By substituting the value of \(\delta\) and a simple manipulation of (46), one obtains
where the term in brackets is nonnegative due to \(\text {COL}(x)\le |C|-2\) for any \(x\in D^{V}\). So, d is an R-deactivating direction for A.
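The arithmetic behind the two cases can be reconstructed as follows, under the assumption (consistent with (27) for \(R=A\)) that d equals \(-1\) on tuples in A and \(\delta\) on tuples outside A. An assignment x then uses allowed tuples for exactly \(\operatorname{COL}(x)\) scopes, so

```latex
\langle d,\phi(x)\rangle
  = -\operatorname{COL}(x) + \delta\bigl(|C|-\operatorname{COL}(x)\bigr)
  = \frac{|C|}{2}\bigl(|C|-2-\operatorname{COL}(x)\bigr)
  \qquad\text{for } \delta = \tfrac{|C|-2}{2},
```

which evaluates to \(-|C|/2<0\) for \(\operatorname{COL}(x)=|C|-1\) and is nonnegative whenever \(\operatorname{COL}(x)\le |C|-2\), matching the two cases above.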
In connection with §5.1, a number of decision problems concerning the minimal CSP have also been proved hard. For recent results, see [64, 65].
7 Summary and discussion
We have proposed a method to compute upper bounds on the (maximization version of the) WCSP. The WCSP is formulated as a linear program with an exponential number of constraints, whose feasible solutions are superreparametrizations of the input WCSP instance (i.e., WCSP instances with the same structure and greater or equal objective values). Whenever the CSP formed by the active (i.e., maximal in their weight functions) tuples of a feasible WCSP instance is unsatisfiable, there exists an improving direction (in fact, a certificate of unsatisfiability of this CSP) for the linear program. As this approach provides only a subset of all possible improving directions, it can be seen as a local search. We showed how these improving directions can be generated by constraint propagation (or, more generally, by other methods for proving unsatisfiability of a CSP). We also showed that superreparametrizations are closely related to the dual cone of the well-known marginal polytope.
Special cases of our approach are the VAC / Augmenting DAG algorithm [2, 6, 16], which uses arc consistency, and the algorithm in [25], which uses cycle consistency. We have implemented the approach for singleton arc consistency, resulting in the VSACSR algorithm. When compared to existing soft local consistency methods on a public dataset, VSACSR provides comparable or better bounds for many instances. Although its runtimes are higher than those of simpler techniques such as EDAC or VAC, one can trade off bound quality against runtime by stopping the method prematurely, e.g., when the step size becomes small, or by terminating already at a greater value of \(\theta\) (see Footnote 19).
The approach in general requires storing all the weights of the superreparametrized WCSP instance. This may be a drawback when the domains are large and/or the weight functions are not given explicitly as a table of values but rather by an algorithm (oracle).
We expect the improved bounds to be useful when solving practical WCSP instances. Applications may include, e.g., using the method in preprocessing, pruning the search space during branch-and-bound search, providing tighter optimality gaps for solutions proposed by heuristic approaches, or generating high-quality proposals for solutions, as in [25]. However, we have done no experiments with this, so it is open whether the tighter bounds would outweigh the higher complexity of the algorithm. Due to the many ways in which the method can be used, we leave this for future research. In addition, our approach can also be useful for solving more WCSP instances even without search (similarly to how the VAC algorithm solves all supermodular WCSPs without search) or, given a suitable primal heuristic, for solving WCSP instances approximately.
The approach can be straightforwardly extended to WCSPs with different domain sizes^{Footnote 26} and with some weights equal to minus infinity (i.e., some constraints being hard). Of course, further experiments would be needed to evaluate the quality of the bounds if infinite weights are allowed. The WCSP framework also usually assumes a predefined finite bound that is updated during branch-and-bound [32]; although the presented pseudocode does not support this, it is not difficult to extend it in this way.
Finally, we presented a theoretical analysis of the concept of superreparametrizations of WCSPs, describing the properties of optimal superreparametrizations and characterizing the set of active-tuple CSPs induced by different optimal superreparametrizations. For example, even an optimal superreparametrization may change the set of optimal assignments, as shown in §5.2. Additionally, we showed that general (i.e., possibly non-optimal) superreparametrizations are only weakly related to the original WCSP instance.
Notes
As usual, \(D^{V}\) denotes the set of all mappings from V to D, so \(x\in D^{V}\) is the same as \(x:V\rightarrow D\).
Our term ‘active tuple’ comes from the term ‘active inequality’. Indeed, (7) can be calculated as the minimum of \(\sum _{S\in C}z_{S}\) subject to \(z_{S}\ge f_{t}\;\forall t\in T_{S}\), where \(z_{S}\in \mathbb {R}\) are auxiliary variables. At optimum, we have \(z_{S}=\max _{t \in T_{S}} f_{t}\) and an inequality \(z_{S}\ge f_{t}\) is active if and only if tuple t is active.
Note, the symbol ‘\(+\)’ in the expression \(f+M^{\perp }\) denotes the sum of a vector and a set of vectors.
We speak only about constraint propagation but the approach outlined in this section is applicable to any method that proves unsatisfiability of a CSP by iteratively forbidding subsets of tuples. In theory, as a stronger alternative one could also use any CSP solver that is augmented to provide a certificate of unsatisfiability (which is always possible, as we will discuss later in this section).
If \(|S|=1\), this event is often called a ‘domain wipeout’.
The quantity \(\delta >0\) is the number of scopes S such that \(T_{S}\) contains at least one tuple from R. In other words, for every assignment \(x\in D^{V}\), \((S,x[S])\in R\) holds for at most \(\delta\) scopes. We remark that the value of \(\delta\) could in some cases be decreased, thus also decreasing the objective values \(\langle d, \phi (x)\rangle\). However, deciding whether (27) is not an R-deactivating direction for A for a given value of \(\delta\) and a given A is an NP-complete problem; see Theorem 16.
Given an unsatisfiable CSP \(A\subseteq T\), finding a maximal set \(A^{\prime }\supseteq A\) such that \(A^{\prime }\) is still unsatisfiable corresponds to finding a minimally unsatisfiable set of tuples [39]. This is a finer-grained (tuple-based rather than constraint-based) version of finding a minimal unsatisfiable core of a CSP [40]. Note that we look for a maximal superset \(A^{\prime }\), in contrast to a minimal unsatisfiable core, because we define CSP instances by allowed tuples, while cores are CSP instances defined by forbidden tuples.
This is different from cyclic consistency as defined in [34]. E.g., reparametrizations are sufficient to enforce cyclic consistency, whereas superreparametrizations are needed for cycle consistency. Cycle consistency is not common in the constraint programming community. It can be shown that if the graph (V, E) is complete, then path inverse consistency [42, 43] corresponds to cycle consistency w.r.t. all cycles of length 3.
This can also be stated as \(A\vert _{x_{i}=k}= A\setminus \left\{ (\{i\},k^{\prime })\mid k^{\prime }\in D\setminus \{k\}\right\}\). In other words, the solutions of the CSP \(A\vert _{x_{i}=k}\) are the solutions x of CSP A satisfying \(x_{i}=k\). This notation is used, e.g., in [44].
In detail, the step size \(\min \{\beta ,\gamma \}\) computed in Theorem 7 corresponds to the first break (i.e., non-differentiable) point of the univariate function with a lower objective. This is analogous to the first-hit strategy in [45, §3.1.4].
\(\beta\) is always defined: by Definition 1 we have \(\langle d, \phi (x)\rangle \ge 0\) for all x, hence \(\exists t:d_{t}<0\Rightarrow \exists t^{\prime }:d_{t^{\prime }}>0\). \(\gamma\) is defined and needed only in (c), where we assume that \((A^{*}(f)R)\cap T_{S}=\emptyset\) for some \(S\in C\). If the set in the definition of \(\gamma\) is empty, then we define \(\gamma =+\infty\) and thus \(\min \{\beta ,\gamma \}=\beta\).
In detail, we initialized \(\theta =\max _{k_{i},k_{j}}g_{\{i,j\}}(k_{i},k_{j})-\min _{k_{i},k_{j}}g_{\{i,j\}}(k_{i},k_{j})+\max _{k} g_{i^{\prime }}(k)-\min _{k} g_{i^{\prime }}(k)\) where \(\{i,j\}\in C\) and \(i^{\prime }\in V\) are the edge and the variable with the lowest index (based on the indexing in the input instance), respectively. The terminating condition was \(\theta \le 10^{-6}\). Let us note that if capacity scaling is used with \(\theta >0\) and the construction of the improving direction is deterministic (which is the case in our implementation), then the method is guaranteed to terminate after a finite number of iterations. This follows from our more general results stated in [48, §2.2.1]. To improve the efficiency of our method, we also decreased \(\theta\) whenever the bound did not improve by more than \(10^{-15}\) in 20 consecutive iterations.
Let \((V,E^{\prime })\) be a spanning tree of (V, E). A fundamental cycle w.r.t. the spanning tree is the unique cycle in the graph \((V,E^{\prime }\cup \{e\})\) where \(e\in E\setminus E^{\prime }\). By choosing different edges \(e\in E\setminus E^{\prime }\), we obtain the set of all \(|E\setminus E^{\prime }|\) fundamental cycles w.r.t. the spanning tree [49, Chapter 9].
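Enumerating fundamental cycles can be sketched as follows. This is a minimal illustration assuming graphs are given as edge lists; the function names are ours, not from the paper:

```python
from collections import defaultdict, deque

def fundamental_cycles(V, E, tree_edges):
    """Enumerate the |E \\ E'| fundamental cycles w.r.t. a spanning tree (V, E').

    Each non-tree edge {u, v} closes exactly one cycle: the unique tree
    path from u to v, plus the edge {u, v} itself.
    """
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    def tree_path(u, v):
        # BFS within the tree; the path between two vertices is unique.
        parent = {u: None}
        queue = deque([u])
        while queue:
            w = queue.popleft()
            if w == v:
                break
            for x in adj[w]:
                if x not in parent:
                    parent[x] = w
                    queue.append(x)
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path[::-1]

    return [tree_path(u, v) + [u] for u, v in E
            if (u, v) not in tree_edges and (v, u) not in tree_edges]

# Triangle graph: tree edges {1,2},{2,3}; the single non-tree edge {1,3}
# closes the cycle 1-2-3-1.
print(fundamental_cycles({1, 2, 3}, [(1, 2), (2, 3), (1, 3)], [(1, 2), (2, 3)]))
# → [[1, 2, 3, 1]]
```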
To avoid numerical precision issues, bounds \(B_{m}\) within \(B_{b}\pm 10^{-4} B_{b}\) or \(B_{b}\pm 0.01\) are also normalized to 1. If \(B_{w}=B_{b}\), then the normalized bounds for all methods are equal to 1 on this instance.
This implies that the map \(\text {SOL}\) is isotone, i.e., \(A_{1}\subseteq A_{2}\subseteq T\) implies \(\text {SOL}(A_{1})\subseteq \text {SOL}(A_{2})\). Although isotony is obvious (clearly, enlarging the set of allowed tuples of a CSP preserves or enlarges its solution set), note that isotony does not imply preserved intersections. A weaker result than our Proposition 3 is [56, Theorem 3.2]: in our notation, it says that \(\text {SOL}(A_{1})=\text {SOL}(A_{2})\) implies \(\text {SOL}(A_{1}\cap A_{2})=\text {SOL}(A_{1})\).
Recall that an operator is a closure (dual closure) if it is isotone, idempotent and increasing (decreasing), see e.g. [61, §1.4].
The term consistency closure, as used in constraint programming [43], is a dual closure in our notation because taking a consistency closure of a CSP deletes some of its allowed tuples (it would be a closure if the CSP was defined by a set of forbidden tuples).
Statement (c) in Theorem 2 is equivalent to \(\text {SOL}(A^{*}(f))\cap W(f-g) \ne \emptyset\) where \(W(d) = \{ x\in D^{V}\mid \langle d, \phi (x)\rangle =0\}\), i.e., satisfiability of CSP \(A^{*}(f)\) with an additional global constraint \(x\in W(f-g)\). As a corollary of Theorem 9, we have that \(\text {SOL}(A^{*}(f))\cap W(f-g) \ne \emptyset \implies \text {SOL}(A^{*}(f))\cap W(f-g)=\text {OPT}(g)\) for any superreparametrization f of g.
In fact, our implementation already supports different domain sizes. We did not present our theoretical results for this generalized setting only to simplify notation.
References
Schlesinger, M. (1976). Sintaksicheskiy analiz dvumernykh zritelnikh signalov v usloviyakh pomekh (Syntactic analysis of two-dimensional visual signals in noisy conditions). Kibernetika, (4), 113–130.
Werner, T. (2007). A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7), 1165–1179.
Wainwright, M. J., & Jordan, M. I. (2008). Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 1(1–2), 1–305.
Živný, S. (2012). The Complexity of Valued Constraint Satisfaction Problems. Springer, Cognitive Technologies.
Savchynskyy, B. (2019). Discrete graphical models – an optimization perspective. Foundations and Trends in Computer Graphics and Vision, 11(3–4), 160–429.
Cooper, M. C., de Givry, S., Sanchez, M., Schiex, T., Zytnicki, M., & Werner, T. (2010). Soft arc consistency revisited. Artificial Intelligence, 174(7–8), 449–478.
Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.
Globerson, A., & Jaakkola, T. S. (2008). Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in neural information processing systems (pp. 553–560).
Tourani, S., Shekhovtsov, A., Rother, C., & Savchynskyy, B. (2018). MPLP++: Fast, parallel dual block-coordinate ascent for dense graphical models. In Proceedings of the European conference on computer vision (pp. 251–267).
Tourani, S., Shekhovtsov, A., Rother, C., & Savchynskyy, B. (2020). Taxonomy of dual block-coordinate ascent methods for discrete energy minimization. In International conference on artificial intelligence and statistics (pp. 2775–2785). PMLR.
Werner, T. (2010). Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8), 1474–1488.
Kolmogorov, V. (2014). A new look at reweighted message passing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(5), 919–930.
Werner, T., Průša, D., & Dlask, T. (2020). Relative interior rule in block-coordinate descent. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7559–7567).
de Givry, S., Heras, F., Zytnicki, M., Larrosa, J. (2005). Existential arc consistency: Getting closer to full arc consistency in weighted CSPs. In IJCAI (vol. 5, pp. 84–89)
Cooper, M.C., de Givry, S., Sanchez, M., Schiex, T., Zytnicki, M. (2008). Virtual arc consistency for weighted CSP. In Proceedings of the 22nd AAAI conference on artificial intelligence (pp. 253–258)
Koval, V. K., & Schlesinger, M. I. (1976). Dvumernoe programmirovanie v zadachakh analiza izobrazheniy (Two-dimensional programming in image analysis problems). Automatics and Telemechanics, 8, 149–168. In Russian.
Werner, T. (2005). A Linear Programming Approach to Max-sum Problem: A Review. Technical report CTU-CMP-2005-25, Center for Machine Perception, Czech Technical University.
Cooper, M. C., de Givry, S., & Schiex, T. (2007). Optimal soft arc consistency. In Proceedings of the 20th international joint conference on artificial intelligence (vol. 7, pp. 68–73).
Průša, D., & Werner, T. (2015). Universality of the Local Marginal Polytope. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(4), 898–904.
Průša, D., & Werner, T. (2019). Solving LP relaxations of some NP-hard problems is as hard as solving any linear program. SIAM Journal on Optimization, 29(3), 1745–1771.
Sontag, D., Meltzer, T., Globerson, A., Jaakkola, T., & Weiss, Y. (2008). Tightening LP relaxations for MAP using message passing. In Proceedings of the 24th conference on uncertainty in artificial intelligence (UAI).
Nguyen, H., Bessiere, C., de Givry, S., & Schiex, T. (2017). Trianglebased consistencies for cost function networks. Constraints, 22(2), 230–264.
Batra, D., Nowozin, S., Kohli, P. (2011). Tighter relaxations for MAPMRF inference: A local primaldual gap based separation algorithm. In Proceedings of the Fourteenth international conference on artificial intelligence and statistics (pp. 146–154)
Thapper, J., & Živný, S. (2015). Sherali-Adams relaxations for valued CSPs. In International colloquium on automata, languages, and programming (pp. 1058–1069). Springer.
Komodakis, N., & Paragios, N. (2008). Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles. In European conference on computer vision (pp. 806–820). Springer.
Bessiere, C., & Debruyne, R. (2008). Theoretical analysis of singleton arc consistency and its extensions. Artificial Intelligence, 172(1), 29–41.
Sontag, D., Jaakkola, T. (2009). Tree block coordinate descent for MAP in graphical models. In Artificial intelligence and statistics (pp. 544–551)
Dlask, T., Werner, T., & de Givry, S. (2021). Bounds on weighted CSPs using constraint propagation and super-reparametrizations. In L. D. Michel (Ed.), 27th international conference on principles and practice of constraint programming (CP 2021), vol. 210 of Leibniz International Proceedings in Informatics (LIPIcs) (pp. 23:1–23:18). Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
Thapper, J., & Živný, S. (2016). The complexity of finite-valued CSPs. Journal of the ACM (JACM), 63(4), 1–33.
Kolmogorov, V., Thapper, J., & Živný, S. (2015). The power of linear programming for general-valued CSPs. SIAM Journal on Computing, 44(1), 1–36.
Kappes, J. H., Andres, B., Hamprecht, F. A., Schnörr, C., Nowozin, S., Batra, D., et al. (2015). A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems. Intl. J. of Computer Vision, 115(2), 155–184.
Cooper, M.C., de Givry, S., Schiex, T. (2020) Valued Constraint Satisfaction Problems. In A guided tour of artificial intelligence research (pp. 185–207). Springer
Zalinescu, C. (2002). Convex Analysis in General Vector Spaces. World Scientific.
Cooper, M. C. (2004). Cyclic consistency: a local reduction operation for binary valued constraints. Artificial Intelligence, 155(1–2), 69–92.
Jahn, J., & Ha, T. X. D. (2011). New order relations in set optimization. Journal of Optimization Theory and Applications, 148(2), 209–236.
Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Werner, T. (2015). Marginal consistency: upper-bounding partition functions over commutative semirings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(7), 1455–1468.
Dlask, T., & Werner, T. (2020). Bounding linear programs by constraint propagation: application to Max-SAT. In International conference on principles and practice of constraint programming (pp. 177–193). Springer.
Grégoire, É., Mazure, B., & Piette, C. (2007). MUST: Provide a finer-grained explanation of unsatisfiability. In International conference on principles and practice of constraint programming (pp. 317–331). Springer.
Grégoire, E., Mazure, B., & Piette, C. (2008). On finding minimally unsatisfiable cores of CSPs. International Journal on Artificial Intelligence Tools, 17(04), 745–763.
Papadimitriou, C. H., & Wolfe, D. (1985). The complexity of facets resolved. Cornell University.
Freuder, E. C., & Elfe, C. D. (1996). Neighborhood inverse consistency preprocessing. In AAAI/IAAI (vol. 1, pp. 202–208).
Bessiere, C. (2006). Constraint propagation. In Handbook of constraint programming. Elsevier.
Bessiere, C., Cardon, S., Debruyne, R., & Lecoutre, C. (2011). Efficient algorithms for singleton arc consistency. Constraints, 16(1), 25–53.
Dlask, T. (2018). Minimizing Convex Piecewise-Affine Functions by Local Consistency Techniques [Master's thesis]. Czech Technical University in Prague, Faculty of Electrical Engineering.
Larrosa, J., & Schiex, T. (2004). Solving weighted CSP by maintaining arc consistency. Artificial Intelligence, 159(1–2), 1–26.
Ahuja, R. K., Magnanti, T. L., & Orlin, J. B. (1993). Network Flows: Theory, Algorithms, and Applications. Prentice Hall.
Dlask, T. (2022). Block-Coordinate Descent and Local Consistencies in Linear Programming [Dissertation, available online: https://dspace.cvut.cz/handle/10467/102874?localeattribute=en]. Czech Technical University in Prague, Faculty of Electrical Engineering.
Rosen, K. H., & Michaels, J. G. (2000). Handbook of Discrete and Combinatorial Mathematics. Boca Raton: CRC Press.
Debruyne, R., Bessiere, C. (1997). Some practicable filtering techniques for the constraint satisfaction problem. In Proceedings of IJCAI’97 (pp. 412–417)
Available online: toulbar2. https://miat.inrae.fr/toulbar2. Accessed 12 Jan 2021.
Available online: Cost Function Library benchmark. https://forgemia.inra.fr/thomas.schiex/cost-function-library. Accessed 12 Jan 2021.
Available online: Spin Glass Server. https://software.cs.uni-koeln.de/spinglass, recently moved to http://spinglass.uni-bonn.de/. Accessed 12 Jan 2023.
Cooper, M. C., de Roquemaurel, M., & Régnier, P. (2011). A weighted CSP approach to cost-optimal planning. AI Communications, 24(1), 1–29.
Furini, F., Traversi, E., Belotti, P., Frangioni, A., Gleixner, A., Gould, N., et al. (2019). QPLIB: a library of quadratic programming instances. Mathematical Programming Computation, 11(2), 237–265.
Montanari, U. (1974). Networks of constraints: Fundamental properties and applications to picture processing. Information Sciences, 7, 95–132.
Dechter, R., Cohen, D., et al. (2003). Constraint processing. Morgan Kaufmann.
Astesana, J., Cosserat, L., & Fargier, H. (2010). Constraint-based vehicle configuration: A case study. In 2010 22nd IEEE international conference on tools with artificial intelligence (vol. 1, pp. 68–75).
Bessiere, C., Fargier, H., & Lecoutre, C. (2013). Global inverse consistency for interactive constraint satisfaction. In C. Schulte (Ed.), Principles and practice of constraint programming (pp. 159–174). Berlin: Springer.
Davey, B. A., & Priestley, H. A. (2002). Introduction to lattices and order. Cambridge University Press.
Blyth, T. S. (2005). Lattices and Ordered Algebraic Structures. London: Springer, Universitext.
Alsuwaiyel, M. (1999). Algorithms: Design Techniques and Analysis. World Scientific.
Karp, R. M. (1972). Reducibility among combinatorial problems. In Complexity of computer computations (pp. 85–103). Springer.
Gottlob, G. (2012). On minimal constraint networks. Artificial Intelligence, 191, 42–60.
Escamocher, G., & O’Sullivan, B. (2018). Pushing the frontier of minimality. Theoretical Computer Science, 745, 172–201.
Acknowledgements
Tomáš Dlask and Tomáš Werner were supported by the Czech Science Foundation (grant 19-09967S) and the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765. Tomáš Dlask was also supported by the Grant Agency of the Czech Technical University in Prague (grants SGS19/170/OHK3/3T/13 and SGS22/061/OHK3/1T/13). Simon de Givry was supported by the French Agence nationale de la Recherche (ANR-19-P3IA-0004 ANITI).
Funding
Open access publishing supported by the National Technical Library in Prague.
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare.
Appendix: Example: EDAC, VAC, and VSAC
In this section, we present an example that shows how EDAC, VAC, and VSAC can be gradually enforced in a WCSP. As in [6, 14], we restrict this section to binary WCSPs and assume that \(\{i\}\in C\) for each \(i\in V\). Recall that VAC and VSAC were formally defined already in Remark 6: a WCSP \(f\in \mathbb {R}^{T}\) is virtual \(\Phi\)-consistent if \(A^{*}(f)\) has a nonempty \(\Phi\)-consistency closure. Now, we proceed to define EDAC in our formalism. For this purpose, let \(E = \left\{ S\in C \mid |S|=2\right\}\) be the set of binary constraint scopes (as in §4.4 and Example 4) and \(N_{i}=\{j\in V\mid \{i,j\}\in E\}\) be the set of neighbors of node i in the undirected graph (V, E).
Definition 2
([6, 14]) Let \(A\subseteq T\), \(\{i,j\}\in E\), and \(k_{i}\in D\). The tuple \((\{i\},k_{i})\) is simply supported by j in A if \(\exists k_{j}\in D\) such that \((\{i,j\},(k_{i},k_{j}))\in A\). The tuple \((\{i\},k_{i})\) is fully supported by j in A if \(\exists k_{j}\in D\) such that \((\{i,j\},(k_{i},k_{j}))\in A\) and \((\{j\},k_{j})\in A\).
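The two kinds of support can be sketched directly from Definition 2. This is a minimal illustration assuming a CSP is represented as a Python set of allowed (scope, value) tuples with frozenset scopes, where binary values are stored in the order of sorted variable indices; all names are ours, not the paper's implementation:

```python
def pair_allowed(A, i, j, k_i, k_j):
    """Is the binary tuple ({i,j}, (k_i, k_j)) allowed in A?
    Values are stored ordered by variable index (an assumed convention)."""
    val = (k_i, k_j) if i < j else (k_j, k_i)
    return (frozenset({i, j}), val) in A

def simply_supported(A, i, j, k_i, D):
    """({i}, k_i) is simply supported by j: some ({i,j}, (k_i, k_j)) is allowed."""
    return any(pair_allowed(A, i, j, k_i, k_j) for k_j in D)

def fully_supported(A, i, j, k_i, D):
    """Full support additionally requires the unary tuple ({j}, k_j) to be allowed."""
    return any(pair_allowed(A, i, j, k_i, k_j) and (frozenset({j}), k_j) in A
               for k_j in D)

# Example: ({1}, 'b') has a simple support via ({1,2}, ('b','b')), but no
# full support because ({2}, 'b') is not allowed.
A = {(frozenset({1, 2}), ('a', 'a')), (frozenset({2}), 'a'),
     (frozenset({1, 2}), ('b', 'b'))}
print(simply_supported(A, 1, 2, 'b', {'a', 'b'}))  # → True
print(fully_supported(A, 1, 2, 'b', {'a', 'b'}))   # → False
```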
Definition 3
([6, 14]) Let \(g\in \mathbb {R}^{T}\) be a binary WCSP and \(\preceq\) be a total order on its set of variables V. WCSP g is Existential Directional Arc Consistent (EDAC) w.r.t. \(\preceq\) if the following conditions hold:

1. \(\forall i \in V\; \forall k\in D\; \forall j\in N_{i}\): \(i \preceq j \implies\) \((\{i\},k)\) is fully supported by j in \(A^{*}(g)\),
2. \(\forall i \in V\; \forall k\in D\; \forall j\in N_{i}\): \(j \preceq i \implies\) \((\{i\},k)\) is simply supported by j in \(A^{*}(g)\),
3. \(\forall i \in V\; \exists k\in D\): \((\{i\},k)\in A^{*}(g)\) and \(\forall j\in N_{i}\): \((\{i\},k)\) is fully supported by j in \(A^{*}(g)\).
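The three conditions of Definition 3 can be checked mechanically. The sketch below assumes a hypothetical representation of \(A^{*}(g)\) as a Python set of allowed (scope, value) tuples with frozenset scopes, binary values ordered by variable index, and a rank map `order` encoding the total order \(\preceq\); all names are ours:

```python
def is_edac(A, V, D, neighbors, order):
    """Check the three EDAC conditions of Definition 3 on A = A*(g).
    order[i] <= order[j] encodes i ⪯ j. Illustrative sketch only."""
    def pair(i, j, ki, kj):
        # Binary values are stored ordered by variable index (assumed).
        return (frozenset({i, j}), (ki, kj) if i < j else (kj, ki)) in A

    def simple(i, j, ki):
        return any(pair(i, j, ki, kj) for kj in D)

    def full(i, j, ki):
        return any(pair(i, j, ki, kj) and (frozenset({j}), kj) in A for kj in D)

    for i in V:
        # Conditions 1 and 2: (full) directional supports for every value.
        for k in D:
            for j in neighbors[i]:
                if order[i] <= order[j] and not full(i, j, k):
                    return False
                if order[j] <= order[i] and not simple(i, j, k):
                    return False
        # Condition 3: some active value of i fully supported by every neighbor.
        if not any((frozenset({i}), k) in A
                   and all(full(i, j, k) for j in neighbors[i])
                   for k in D):
            return False
    return True
```

On a trivial two-variable instance where all tuples are active, `is_edac` returns `True`; removing the only binary tuple breaks the supports and it returns `False`.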
Remark 8
The notions of Definition 3 correspond to the notions in [6, 14] but they are tailored to our formalism. The first difference is that we do not consider infinite weights (i.e., hard constraints), which simplifies some conditions in the definition. The second difference is that the ‘baseline’ for a weight \(g_{t}\), \(t\in T_{S}\) is neither \({\perp }\) nor 0, but rather \(\max _{t^{\prime }\in T_{S}}g_{t^{\prime }}\). Consequently, we require \(g_{t}=\max _{t^{\prime }\in T_{S}}g_{t^{\prime }}\) (i.e., \(t\in A^{*}(g)\)) in the definitions instead of \(g_{t}=0\).
Remark 9
Let us also comment on the individual conditions in Definition 3. The first condition is known as Directional Arc Consistency w.r.t. \(\preceq\) [6, 14]. The first and second condition together are known as Full Directional Arc Consistency w.r.t. \(\preceq\) [6, 14]. Finally, the third condition is Existential Arc Consistency [6, 14].
We now proceed to show our example where EDAC, VAC, and VSAC will be gradually enforced. The initial WCSP \(f^{1}\) is depicted in Fig. 7a. The structure of this WCSP is (D, V, C) where \(D=\{\textsf{a},\textsf{b}\}\), \(V=\{1,2,3\}\), and \(C=\{\{1\},\{2\},\{3\},\{1,2\},\{2,3\},\{1,3\}\}\). The names of the variables are indicated in the figures. To simplify the figures throughout this example, we do not state the names of the values in them – the upper value is \(\textsf{a}\) and the lower value is \(\textsf{b}\). The optimal objective value of WCSP \(f^{1}\) is 43, which is attained, e.g., by the assignment \(x=(\textsf{a},\textsf{a},\textsf{a})\).
WCSP \(f^{1}\) is not EDAC (w.r.t. the natural ordering by \(\le\)) because the tuple \((\{1\},\textsf{b})\) is not fully supported by variable 2. To make WCSP \(f^{1}\) EDAC, it is sufficient to shift weight from the unary tuple \((\{2\},\textsf{a})\) to the binary weight function with scope \(\{1,2\}\), which results in WCSP \(f^{2}\) depicted in Fig. 7b. WCSP \(f^{2}\) is a reparametrization of \(f^{1}\) due to \(f^{2}-f^{1}\in M^{\perp }\) (depicted in Fig. 7c).
WCSP \(f^{2}\) is EDAC w.r.t. \(\le\) but it is not VAC because the AC closure of \(A^{*}(f^{2})\) is empty. To see that the AC closure is empty, we can follow the propagations that are depicted in Fig. 7d. The arrows point from the cause of forbidding a tuple to the newly forbidden tuple. First, we can forbid the tuple \((\{1,2\},(\textsf{a},\textsf{a}))\) because the tuple \((\{1\},\textsf{a})\) is forbidden. Second, we can forbid the tuple \((\{2\},\textsf{a})\) because both tuples \((\{1,2\},(\textsf{a},\textsf{a}))\) and \((\{1,2\},(\textsf{b},\textsf{a}))\) are forbidden. Next, we gradually forbid \((\{2,3\},(\textsf{a},\textsf{a}))\), \((\{3\},\textsf{a})\), \((\{1,3\},(\textsf{a},\textsf{b}))\), and \((\{3\},\textsf{b})\). This leads to domain wipeout in variable 3. By shifting the weights against the direction of the arrows (as depicted in Fig. 7d), we make this WCSP VAC. This yields the WCSP \(f^{3}\) in Fig. 7e which is a reparametrization of \(f^{2}\). For clarity, we also show how the weights were transformed in Fig. 7f.
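The propagation just described is ordinary arc consistency on the set of active tuples. The following generic sketch (our own minimal set-of-allowed-tuples representation with frozenset scopes, not the paper's data structures) computes the AC closure and exhibits a domain wipeout on a toy two-variable instance:

```python
def ac_closure(A, V, D, edges):
    """Arc-consistency closure of a set A of allowed (scope, value) tuples:
    repeatedly (1) forbid a binary tuple whose unary part is forbidden and
    (2) forbid a unary tuple ({i}, k) with no allowed binary support on
    some incident edge. A variable left with no allowed unary tuple is a
    domain wipeout, i.e., the closure is effectively empty."""
    def bt(i, j, ki, kj):  # binary tuple on edge {i, j}, values by index order
        return (frozenset({i, j}), (ki, kj) if i < j else (kj, ki))

    A = set(A)
    changed = True
    while changed:
        changed = False
        for (i, j) in edges:
            # Rule (1): binary tuples with a forbidden unary part.
            for ki in D:
                for kj in D:
                    t = bt(i, j, ki, kj)
                    if t in A and ((frozenset({i}), ki) not in A
                                   or (frozenset({j}), kj) not in A):
                        A.discard(t)
                        changed = True
            # Rule (2): unary tuples without support, in both directions.
            for (p, q) in ((i, j), (j, i)):
                for k in D:
                    u = (frozenset({p}), k)
                    if u in A and not any(bt(p, q, k, kq) in A for kq in D):
                        A.discard(u)
                        changed = True
    return A

# Toy wipeout: ({1}, 'a') is forbidden from the start, so the binary tuple
# loses its unary part and then ({2}, 'a') loses its support.
A = {(frozenset({2}), 'a'), (frozenset({1, 2}), ('a', 'a'))}
print(ac_closure(A, {1, 2}, {'a'}, [(1, 2)]))  # → set()
```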
WCSP \(f^{3}\) is VAC and even OSAC, so the bound \(B(f^{3})\) cannot be improved by reparametrizations (without introducing a ternary weight function with scope \(\{1,2,3\}\)). However, WCSP \(f^{3}\) is not VSAC because \(A^{*}(f^{3})\) has an empty SAC closure. Thus, \(A^{*}(f^{3})\) is unsatisfiable and we are able to construct a superreparametrization of WCSP \(f^{3}\) with a better bound (recall Theorem 3 and §4.1.2). We show next the details of the construction, following our results from §4.2, §4.3, and §4.4.
Figure 8a shows how arc consistency is enforced in the CSP \(A^{*}(f^{3})\vert _{x_{1}=\textsf{a}}\) and the notation is analogous to Fig. 7d. In detail, we gradually forbid tuples \((\{3\},\textsf{a})\), \((\{2,3\},(\textsf{a},\textsf{a}))\), \((\{2\},\textsf{a})\), and finally \((\{2\},\textsf{b})\), which leads to domain wipeout in variable 2. Consequently, the AC closure of \(A^{*}(f^{3})\vert _{x_{1}=\textsf{a}}\) is empty. To derive this, it suffices that the tuples \(P=\{(\{1,3\},(\textsf{a},\textsf{a})), (\{2,3\},(\textsf{a},\textsf{b})), (\{1,2\},(\textsf{a},\textsf{b}))\}\) are forbidden in \(A^{*}(f^{3})\). Applying Theorem 4 with \(A=T\setminus P\supseteq A^{*}(f^{3})\) and \(R=\{(\{1\},\textsf{a})\}\) results in a \(\{(\{1\},\textsf{a})\}\)-deactivating direction d for \(A^{*}(f^{3})\) that is shown in Fig. 8b. Analogously, we can compute a \(\{(\{1\},\textsf{b})\}\)-deactivating direction \(d^{\prime }\) for \(A^{*}(f^{3})\) (not shown). By summing these deactivating directions together (i.e., using Theorem 6, which in this case yields \(\delta =1\)), we obtain a \(\{(\{1\},\textsf{a}),(\{1\},\textsf{b})\}\)-deactivating direction \(d^{\prime \prime }=d+d^{\prime }\) for \(A^{*}(f^{3})\) that certifies unsatisfiability of \(A^{*}(f^{3})\). By Theorem 7, we compute the step size \(\alpha =\min \{\beta ,\gamma \}=4\) and obtain WCSP \(f^{4}=f^{3}+\alpha d^{\prime \prime }\), which is shown in Fig. 8d and is VSAC.
Dlask, T., Werner, T. & de Givry, S. Superreparametrizations of weighted CSPs: properties and optimization perspective. Constraints 28, 277–319 (2023). https://doi.org/10.1007/s10601-023-09343-6