1 Background

We consider the primal problem of finding

$$\begin{aligned} f^* := \mathrm{infimum} ~\,&f(x), \end{aligned}$$
(1a)
$$\begin{aligned} \mathrm{subject~to} ~\,&g(x) \le 0^m, \end{aligned}$$
(1b)
$$\begin{aligned}&x \in X, \end{aligned}$$
(1c)

where \(X \subseteq \mathbb {R}^n\) is a set and \(f : \mathbb {R}^n \mapsto \mathbb {R}\) and \(g : \mathbb {R}^n \mapsto \mathbb {R}^m\) are functions. With the vector \(u \in \mathbb {R}^m_+\) of Lagrangian multipliers for the constraint (1b), the dual function associated with the Lagrangian relaxation of this constraint is

$$\begin{aligned} \theta (u) := \mathop {{\mathrm{infimum}}}\limits _{x \in X} \, \left\{ f(x) + u^{\text {T}} g(x) \right\} , \qquad u \in \mathbb {R}^m_+, \end{aligned}$$
(2)

while

$$\begin{aligned} \theta ^* := \mathop {{\mathrm{supremum}}}\limits _{u \in \mathbb {R}^m_+} ~\, \theta (u) \end{aligned}$$
(3)

is the Lagrangian dual problem.

We assume that the set X is non-empty and compact, and that the functions f and g are continuous on X. Then the relaxed problem (2) has an optimal solution for every \(u\in \mathbb {R}^m_+\). We further assume that the primal problem fulfils a constraint qualification ensuring that the dual problem (3) has an optimal solution (such as a Slater condition; see e.g. [1, Proposition 2.4.1]). Optimal solutions to problems (1) and (3) are denoted \(x^*\) and \(u^*\), respectively. The duality gap for the primal–dual pair is \(\varGamma := f^* - \theta ^*\). For the duality gap to be zero, the primal problem must possess a convexity property; cf. [2, Theorem 6.2.4], [3, Chapter 5], and [4, Chapter 6]. If the primal problem is non-convex (e.g., a discrete optimization problem), a positive duality gap can be expected. (Readers who are not well acquainted with Lagrangian duality are referred to e.g. [2,3,4].)
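When X is a small finite set, the dual function (2) can be evaluated by direct enumeration. The following minimal Python sketch uses hypothetical problem data (all names are illustrative) and checks weak duality, \(\theta (u) \le f^*\), for a few multiplier values:

```python
# Illustrative sketch: evaluating the dual function theta(u) of (2) by
# enumeration when X is a small finite set and m = 1.
# The problem data below are hypothetical.

def theta(u, X, f, g):
    """Dual function theta(u) = min_{x in X} {f(x) + u*g(x)} for finite X."""
    return min(f(x) + u * g(x) for x in X)

# A tiny example: minimize f over X = {0, 1, 2, 3} subject to g(x) <= 0.
X = [0, 1, 2, 3]
f = lambda x: (x - 2) ** 2      # objective
g = lambda x: 1 - x             # relaxed constraint g(x) <= 0

# Primal optimum over the feasible points (here x in {1, 2, 3}).
f_star = min(f(x) for x in X if g(x) <= 0)

# Weak duality: theta(u) <= f_star for every u >= 0.
for u in [0.0, 0.5, 1.0, 2.0]:
    assert theta(u, X, f, g) <= f_star
```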

If the duality gap is zero, optimal solutions to both the primal problem (1) and its Lagrangian dual problem (3) can be characterized through the classic global optimality conditions, see e.g. [5, Theorem 5.1] and [2, Theorem 6.2.5]. Letting \((x,u) \in X \times \mathbb {R}^m_+\), these can be stated as

$$\begin{aligned} f(x) + u^{\text {T}} g(x)&\le \theta (u), \end{aligned}$$
(4a)
$$\begin{aligned} g(x)&\le 0^m, \end{aligned}$$
(4b)
$$\begin{aligned} u^{\text {T}} g(x)&= 0. \end{aligned}$$
(4c)

The interpretation of these three conditions is optimality in the Lagrangian relaxed problem (2), feasibility in the relaxed constraint (1b), and complementarity in this constraint, respectively. The following theorem, which can be found in e.g. [2, Theorem 6.2.5], establishes the equivalence between the consistency of the system (4) and primal–dual optimality with a zero duality gap.

Theorem 1.1

(primal–dual optimality condition) A pair \((x,u) \in X \times \mathbb {R}^m_+\) satisfies the system (4) if and only if x solves the primal problem (1), u solves the dual problem (3), and \(f^* = \theta ^*\) holds.

A conclusion from this theorem is that the system (4) is inconsistent whenever u is not optimal in the Lagrangian dual problem (3) or there is a positive duality gap. In the case when the duality gap is zero and the dual vector is optimal in the Lagrangian dual problem (3), the result of Theorem 1.1 can be used to characterize all optimal solutions to the primal problem (1).
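For a finite X, the system (4) can be checked directly by enumeration. A minimal sketch, again with hypothetical problem data:

```python
# Sketch (assuming a finite X and m = 1, so theta(u) can be computed by
# enumeration): checking the global optimality conditions (4) for a pair (x, u).

def satisfies_conditions_4(x, u, X, f, g, tol=1e-9):
    theta_u = min(f(y) + u * g(y) for y in X)       # dual function value (2)
    cond_4a = f(x) + u * g(x) <= theta_u + tol      # (4a) Lagrangian optimality
    cond_4b = g(x) <= tol                           # (4b) primal feasibility
    cond_4c = abs(u * g(x)) <= tol                  # (4c) complementarity
    return cond_4a and cond_4b and cond_4c

# Hypothetical data with a zero duality gap: (x, u) = (2, 0) satisfies (4).
X = [0, 1, 2, 3]
f = lambda x: (x - 2) ** 2
g = lambda x: 1 - x
```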

Corollary 1.1

(characterization of optimal primal solutions) If \(f^* = \theta ^*\) holds and u solves the dual problem (3), then an \(x \in X\) solves the primal problem (1) if and only if it, together with u, satisfies the system (4).

This characterization has been generalized [6, Proposition 5] to allow for a positive duality gap and for a \(u \in \mathbb {R}^m_+\) that is not necessarily optimal in the dual problem (3), and also to describe near-optimal solutions to the primal problem (1). This generalization is based on the following relaxed global optimality conditions for the problem (1). Here, \(\beta \in \mathbb {R}_+\), \(\varepsilon , \delta \in \mathbb {R}\), and again we let \((x,u) \in X \times \mathbb {R}^m_+\).

$$\begin{aligned} f(x) + u^\mathrm{T} g(x)&\le \theta (u) + \varepsilon \end{aligned}$$
(5a)
$$\begin{aligned} g(x)&\le 0^m \end{aligned}$$
(5b)
$$\begin{aligned} u^\mathrm{T} g(x)&\ge - \delta \end{aligned}$$
(5c)
$$\begin{aligned} \varepsilon + \delta&\le f^*-\theta (u)+\beta \end{aligned}$$
(5d)

Note that the quantities \(\varepsilon \) and \(\delta \) are always non-negative whenever \((x,u) \in X \times \mathbb {R}_+^m\) and the system (5) holds. They capture near-optimality in the Lagrangian relaxed problem (2) and near-complementarity in the relaxed constraint (1b), respectively. The following theorem is a restatement of [6, Proposition 5].

Theorem 1.2

(characterization of near-optimal primal solutions) For any given \(u \in \mathbb {R}^m_+\), an \(x \in X\) is \(\beta \)-optimal in the primal problem (1) if and only if it, together with u and some values of \(\varepsilon \) and \(\delta \), satisfies the system (5).

Note that for any \(u \in \mathbb {R}^m_+\) and the choice \(\beta = 0\), the system (5) characterizes all primal optimal solutions. Further, if the duality gap is zero, u solves the dual problem, and \(\beta = 0\), this characterization reduces to that of Corollary 1.1.

The characterization in Theorem 1.2 can be simplified by introducing the function \(\varepsilon : X \times \mathbb {R}^m_+ \mapsto \mathbb {R}_+\) with \(\varepsilon (x,u) = f(x) + u^\mathrm{T} g(x) - \theta (u)\), which for a given u measures the degree of near-optimality of an \(x \in X\) in the Lagrangian relaxation, and the function \(\delta : X \times \mathbb {R}^m_+ \mapsto \mathbb {R}_+\) with \(\delta (x,u) = \max \{ 0, - u^\mathrm{T} g(x)\}\), which for a given u measures the degree of near-complementarity of an \(x \in X\) in the relaxed constraint.

Note that

$$\begin{aligned} f(x) - \theta (u) = \varepsilon (x,u) + \delta (x,u) \end{aligned}$$
(6)

holds for any choice of primal feasible solution x and \(u\in \mathbb {R}^m_+\). Further, for any such choice, the identity (6) provides a dissection of the difference between the primal and dual objective values into a non-negative Lagrangian near-optimality term and a non-negative near-complementarity term. In particular, \(f^* - \theta ^* = \varepsilon (x^*,u^*) + \delta (x^*,u^*) = \varGamma \). With this new notation, Theorem 1.2 can be restated as follows.
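For a finite X with \(m = 1\), the functions \(\varepsilon \) and \(\delta \) and the identity (6) can be sketched as follows; the problem data are hypothetical:

```python
# Minimal sketch of eps(x,u), delta(x,u), and identity (6) for a primal
# feasible x, assuming a finite X and m = 1 (hypothetical data).

def theta(u, X, f, g):
    return min(f(y) + u * g(y) for y in X)          # dual function (2)

def eps(x, u, X, f, g):
    """Near-optimality of x in the Lagrangian relaxed problem (2)."""
    return f(x) + u * g(x) - theta(u, X, f, g)

def delta(x, u, g):
    """Near-complementarity of x in the relaxed constraint (1b)."""
    return max(0, -u * g(x))

X = [0, 1, 2, 3]
f = lambda x: (x - 2) ** 2
g = lambda x: 1 - x

# For a primal feasible x (here g(3) = -2 <= 0), identity (6) holds:
x_bar, u_bar = 3, 0.5
assert f(x_bar) - theta(u_bar, X, f, g) == \
    eps(x_bar, u_bar, X, f, g) + delta(x_bar, u_bar, g)
```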

Corollary 1.2

(characterization of near-optimal primal solutions) For any given \(u \in \mathbb {R}^m_+\), an x that is feasible in the primal problem (1) is \(\beta \)-optimal if and only if

$$\begin{aligned} \varepsilon (x,u) + \delta (x,u) \le f^*-\theta (u)+\beta \end{aligned}$$

holds.

The functions \(\varepsilon \) and \(\delta \) were introduced in [7, 8] (although those works did not include the maximum operator in the definition of \(\delta \)). Further, their values were interpreted in the Lagrangian dual space; see also Fig. 6 below. We here give an interpretation with respect to the supporting hyperplane illustration of the duality gap.

2 Supporting hyperplane illustrations

We now consider the case \(m=1\), introduce auxiliary variables \(z \in \mathbb {R}\) and \(v \in \mathbb {R}\), which describe values of functions f and g, respectively, and define the set \((g,f)(X) = \left\{ (g(x),f(x))~|~x \in X \right\} \subset \mathbb {R}^2\). The Lagrangian relaxed problem (2) can then be restated as

$$\begin{aligned} \theta (u) = \mathrm{infimum} ~\,&z + uv, \end{aligned}$$
(7a)
$$\begin{aligned} \mathrm{subject~to} ~\,&(v,z) \in (g,f)(X). \end{aligned}$$
(7b)

Figures 1 and 2 show the classical geometric illustrations of Lagrangian dualization, see e.g. [2], for the cases of a zero and a positive duality gap, respectively. Here, and in the remainder of this section, \(\cdot \,^*\) denotes an optimal value. Points in the set \((g,f)(X)\) with \(g(x)\le 0\) are indicated by the gray area, and \((g^*,f^*)=(g(x^*),f(x^*))\).

Fig. 1

Classical illustration for a zero duality gap

Fig. 2

Classical illustration for a positive duality gap

We now turn to the functions \(\varepsilon \) and \(\delta \); Fig. 3 shows their values at the primal–dual optimum. Since \(\varepsilon (x^*,u^*)\) measures the degree of near-optimality of \(x^*\in X\) in the Lagrangian relaxation (2), the line \(z + u^*v = \theta ^* + \varepsilon (x^*,u^*)\) passes through the point \((g^*,f^*)\). This line intersects the z-axis at \(\theta ^* + \varepsilon (x^*,u^*)\). The geometric interpretation of \(\delta (x^*,u^*)\) follows from \(\varGamma =\varepsilon (x^*,u^*) + \delta (x^*,u^*)\). Alternatively, it follows from the definition \(\delta (x,u) = \max \{ 0, - u^\mathrm{T} g(x)\}\), which gives \(\delta (x^*,u^*) = - u^*g^*\).

Fig. 3

Geometric interpretation of \(\varepsilon \) and \(\delta \) for \(x^*\) and \(u^*\)

Next, in Fig. 4, we illustrate the dissection of \(f(x) - \theta (u)\) into \(\varepsilon (x,u)\) and \(\delta (x,u)\) for a non-optimal primal feasible solution \(\bar{x}\) and a non-optimal \(\bar{u}\in \mathbb {R}_+\). Here, \((\bar{g},\bar{f})=(g(\bar{x}),f(\bar{x}))\). Since \(\bar{u}\) is not optimal, the line \(z+\bar{u}v = \theta (\bar{u})\) supports the set \((g,f)(X)\) at only one point (which may correspond to an \(x \in X\) that is feasible or infeasible). The construction of the geometric interpretation of \(\varepsilon (\bar{x},\bar{u})\) and \(\delta (\bar{x},\bar{u})\) follows the same arguments as in Fig. 3.

Fig. 4

Geometric interpretation of \(\varepsilon \) and \(\delta \) for non-optimal \(\bar{x}\) and \(\bar{u}\)

To make the interpretations concrete, we conclude this section with a detailed analysis of a numerical example: a knapsack problem.

$$\begin{aligned} f^* = \mathrm{minimize} ~\,&f(x)=5x_1+8x_2+13x_3+14x_4+11x_5 \end{aligned}$$
(8a)
$$\begin{aligned} \mathrm{subject~to} ~\,&g(x)=28-14x_1-12x_2-11x_3-8x_4-5x_5 \le 0 \end{aligned}$$
(8b)
$$\begin{aligned}&x \in X = \{0,1\}^5 \end{aligned}$$
(8c)

The optimal solution is \(x^*=(1,1,0,0,1)\) and \(f^*=24\), with \(g^*=-3\). The dual optimum is \(u^*=\frac{13}{11}\) and \(\theta ^*=15\frac{4}{11}\). Hence, \(\varGamma = 8\frac{7}{11}\), with optimal near-complementarity \(\delta (x^*,u^*) = -\frac{13}{11}(-3) = 3\frac{6}{11}\) and Lagrangian near-optimality \(\varepsilon (x^*,u^*) = \varGamma - \delta (x^*,u^*) = 5\frac{1}{11}\). The problem is illustrated in Fig. 5. The set \((g,f)(X)\) is here discrete and indicated by circles. The circles corresponding to primal feasible solutions are in gray, and \((g^*,f^*)\) is in black. For \(u=u^*\), the Lagrangian relaxed problem has the two optimal solutions \(x^1=(1,1,0,0,0)\) and \(x^2=(1,1,1,0,0)\), with \((g(x^1),f(x^1))=(2,13)\) and \((g(x^2),f(x^2))=(-9,26)\). The line \(z+u^*v=\theta ^*\) passes through these two points. The values \(\varepsilon (x^*,u^*)\) and \(\delta (x^*,u^*)\) can also be interpreted in the Lagrangian dual space [7, 8]. For our numerical example, this is shown in Fig. 6.
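The numbers above can be reproduced by enumerating \(X = \{0,1\}^5\); the following sketch does so with exact rational arithmetic:

```python
# Reproducing the quantities of the knapsack example (8) by enumeration,
# using exact rational arithmetic.
from fractions import Fraction
from itertools import product

c = [5, 8, 13, 14, 11]       # objective coefficients in (8a)
a = [14, 12, 11, 8, 5]       # constraint coefficients in (8b)

def f(x): return sum(ci * xi for ci, xi in zip(c, x))
def g(x): return 28 - sum(ai * xi for ai, xi in zip(a, x))

X = list(product([0, 1], repeat=5))
f_star = min(f(x) for x in X if g(x) <= 0)            # = 24
u_star = Fraction(13, 11)
theta_star = min(f(x) + u_star * g(x) for x in X)     # = 15 + 4/11

gap = f_star - theta_star                             # Gamma = 8 + 7/11
x_star = (1, 1, 0, 0, 1)
delta = max(Fraction(0), -u_star * g(x_star))         # = 3 + 6/11
eps = gap - delta                                     # = 5 + 1/11

# Both x^1 and x^2 attain theta(u*):
x1, x2 = (1, 1, 0, 0, 0), (1, 1, 1, 0, 0)
assert f(x1) + u_star * g(x1) == theta_star == f(x2) + u_star * g(x2)
```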

Fig. 5

Geometric interpretation of \(\varepsilon \) and \(\delta \) for the numerical example

Fig. 6

Geometric interpretation of \(\varepsilon \) and \(\delta \) in the Lagrangian dual space for the numerical example

3 A practical implication

The purpose of this section is to illustrate how the quantities \(\varepsilon \) and \(\delta \) can be exploited when designing solution approaches for certain problem structures. Preliminary results along this line of research are presented in [8]. We here present a slight extension of the findings from that reference.

We consider the Set Covering Problem (SCP) stated as

$$\begin{aligned} \text {minimize} \quad&\sum _{j\in \mathcal {J}} c_{j} x_{j} \end{aligned}$$
(9a)
$$\begin{aligned} \text {subject~to} \quad&\sum _{j\in \mathcal {J}} a_{ij} x_{j} \ge 1,\ i\in \mathcal {I}, \end{aligned}$$
(9b)
$$\begin{aligned}&0 \le x_j \le 1\ \text {and integer},\ j\in \mathcal {J}, \end{aligned}$$
(9c)

where \(\mathcal {J}= \{1,\ldots ,n\}\), \(\mathcal {I}= \{1,\ldots ,m\}\), and all \(c_j > 0\) and \(a_{ij} \in \{0,1\}\). Lagrangian relaxation of the constraints (9b) with multipliers \(u \in \mathbb {R}^m_+\) yields the dual function \(h:\mathbb {R}^m_+ \rightarrow \mathbb {R}\) with

$$\begin{aligned} h(u) = \sum _{i\in \mathcal {I}} u_i + \min _{x \in \{0,1\}^n} \sum _{j\in \mathcal {J}} \Big (c_j - \sum _{i\in \mathcal {I}} u_i a_{ij}\Big )x_j . \end{aligned}$$

The dual problem is \(h^*=\max _{u \in \mathbb {R}^m_+ } ~ h(u)\). This Lagrangian relaxation has the integrality property [9, p. 177]. Hence, \(h^*\) coincides with the optimal value of the linear programming relaxation of the SCP. Further, any optimal solution to the dual of the latter problem is an optimal solution to the Lagrangian dual problem. Since the upper bounds on the variables are redundant in SCP, we may consider the linear programming relaxation without these bounds. Let \(u^*\) be an optimal dual solution to this problem. Then \(\overline{c}_j=c_j-\sum _{i\in \mathcal {I}} u^*_i a_{ij} \ge 0\) holds for all \(j\in \mathcal {J}\).
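Since the inner minimization in h separates over the variables (an optimal \(x_j\) equals 1 exactly when its Lagrangian reduced cost \(c_j - \sum _i u_i a_{ij}\) is negative), h can be evaluated in closed form. A minimal sketch, with hypothetical instance data:

```python
# Sketch: evaluating the SCP dual function h(u). The inner minimization
# separates over j, so h(u) = sum_i u_i + sum_j min(0, c_j - sum_i u_i a_ij).
# The instance data used below are hypothetical.

def h(u, c, a):
    """a[i][j] in {0,1}; u is a list of nonnegative multipliers."""
    m, n = len(a), len(c)
    reduced = [c[j] - sum(u[i] * a[i][j] for i in range(m)) for j in range(n)]
    return sum(u) + sum(min(0.0, rc) for rc in reduced)

# A 2-row, 3-column instance: columns cover {1}, {2}, and {1,2}.
c = [1.0, 1.0, 1.5]
a = [[1, 0, 1],
     [0, 1, 1]]
```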

For the SCP, \(\varepsilon :\{0,1\}^n \times \mathbb {R}^m_+ \rightarrow \mathbb {R}_+\) with

$$\begin{aligned} \varepsilon (x,u)=\sum _{i\in \mathcal {I}} u_i + \sum _{j\in \mathcal {J}} \Big (c_j-\sum _{i\in \mathcal {I}} u_ia_{ij}\Big )x_j-h(u), \end{aligned}$$

and \(\delta :\{0,1\}^n \times \mathbb {R}^m_+ \rightarrow \mathbb {R}_+\) with

$$\begin{aligned} \delta (x,u)= \max \Big \{ 0, -\sum _{i\in \mathcal {I}} u_i\Big (1-\sum _{j\in \mathcal {J}}a_{ij}x_j\Big )\Big \}. \end{aligned}$$

Let \(\bar{x}\) be any feasible solution to SCP. Since \(h^*=\sum _{i\in \mathcal {I}} u_i^*\), we get

$$\begin{aligned} \varepsilon (\bar{x},u^*) = \sum _{i\in \mathcal {I}} u_i^* + \sum _{j\in \mathcal {J}} \overline{c}_j\bar{x}_j - h^*= \sum _{j\in \mathcal {J}} \overline{c}_j\bar{x}_j . \end{aligned}$$
(10)

Further,

$$\begin{aligned} \delta (\bar{x},u^*) = -\sum _{i\in \mathcal {I}} u_i^*\Big (1-\sum _{j\in \mathcal {J}}a_{ij}\bar{x}_j\Big ). \end{aligned}$$
(11)

From (6) we have that \(\sum _{j\in \mathcal {J}}c_j\bar{x}_j-h^* = \varepsilon (\bar{x},u^*)+\delta (\bar{x},u^*)\), and in particular if \(\bar{x}\) is optimal we obtain that \(\varGamma = \varepsilon (\bar{x},u^*) + \delta (\bar{x},u^*)\).
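The dissection (10)–(11) can be sketched numerically. The 2-row, 3-column instance and the dual vector below are hypothetical; one can verify by hand that \(u^* = (1, 0.5)\) is an LP dual optimum for this instance (all reduced costs are nonnegative and \(\sum _i u^*_i = 1.5\) equals the LP optimal value):

```python
# Sketch of (10)-(11) on a hypothetical SCP instance. Rows {1, 2};
# column 1 covers row 1 (cost 1), column 2 covers row 2 (cost 1),
# column 3 covers both rows (cost 1.5). u* = (1, 0.5) is an LP dual optimum.

c = [1.0, 1.0, 1.5]
a = [[1, 0, 1],
     [0, 1, 1]]
u_star = [1.0, 0.5]

m, n = len(a), len(c)
cbar = [c[j] - sum(u_star[i] * a[i][j] for i in range(m)) for j in range(n)]
h_star = sum(u_star)                       # h* = sum_i u*_i

def eps(x):                                # (10): sum_j cbar_j x_j
    return sum(cbar[j] * x[j] for j in range(n))

def delta(x):                              # (11): -sum_i u*_i (1 - sum_j a_ij x_j)
    return -sum(u_star[i] * (1 - sum(a[i][j] * x[j] for j in range(n)))
                for i in range(m))

# A feasible solution with excess coverage; identity (6) holds:
x_bar = [1, 1, 1]
lhs = sum(c[j] * x_bar[j] for j in range(n)) - h_star
assert abs(lhs - (eps(x_bar) + delta(x_bar))) < 1e-12
```

Note how the excess coverage of both rows makes the near-complementarity term \(\delta \) dominate the difference for this \(\bar{x}\).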

We study 11 challenging SCP problem instances taken from the OR-Library [10]; details concerning the computational setup can be found in [8]. The instances are listed in Table 1. The first five are artificial and taken from [11], and the other six originate from a rail crew scheduling application [12]. The former have a density of 5% (by construction) and the latter have densities between 0.2% and 1.3%. Three of the instances could be solved to proven optimality.

Table 1 Results for 11 challenging SCP problem instances

For the optimal or best found solution, denoted \(x^*\), and its objective value \(z^*_\text {IP}\), we calculate the following quantities: the relative gap \(\varGamma _{\text {rel}} := (z^*_{\text {IP}} - h^*)/h^*\), the relative near-optimality \(\varepsilon _{\text {rel}} := \varepsilon (x^*,u^*)/(z^*_{\text {IP}}-h^*)\), and the relative near-complementarity \(\delta _{\text {rel}} := \delta (x^*,u^*)/(z^*_{\text {IP}}-h^*)\). We also calculate the average excess coverage \(\text {AEC} := \tfrac{1}{m}\sum _{i\in \mathcal {I}} \big (\sum _{j\in \mathcal {J}} a_{ij}x^*_j - 1\big )\). Note that \(\varepsilon _{\text {rel}} + \delta _{\text {rel}} = 1\).
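These reported quantities are straightforward to compute; a sketch with hypothetical input numbers:

```python
# Sketch (hypothetical numbers): relative gap, relative near-optimality and
# near-complementarity, and average excess coverage (AEC) as reported in Table 1.

def report(z_ip, h_star, eps_val, delta_val, a, x):
    m, n = len(a), len(x)
    gap = z_ip - h_star
    gamma_rel = gap / h_star                               # (z*_IP - h*)/h*
    eps_rel, delta_rel = eps_val / gap, delta_val / gap    # these sum to 1
    aec = sum(sum(a[i][j] * x[j] for j in range(n)) - 1    # excess coverage,
              for i in range(m)) / m                       # averaged over rows
    return gamma_rel, eps_rel, delta_rel, aec

a = [[1, 0, 1],
     [0, 1, 1]]
gamma_rel, eps_rel, delta_rel, aec = report(3.5, 1.5, 0.5, 1.5, a, [1, 1, 1])
assert abs(eps_rel + delta_rel - 1.0) < 1e-12
```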

The functions \(\varepsilon \) and \(\delta \) depend on both x and u. Hence, if there are alternative optimal primal or dual solutions, then the contributions of \(\varepsilon \) and \(\delta \) to the duality gap may vary between these solutions; this was noticed already in [6]. To study this aspect, we solved the two problems

$$\begin{aligned} \min /\max \Bigg \{\delta (x,u^*) \ \Big \vert \ \sum _{j\in \mathcal {J}} c_{j} x_{j} \le z_{\text {IP}}^{*},\ \sum _{j\in \mathcal {J}} a_{ij} x_{j} \ge 1,\ i\in \mathcal {I}, \ x \in \{0,1\}^n \Bigg \}. \end{aligned}$$

Their optimal values give the full range of \(\delta (x,u^*)\) over all solutions to the SCP that are at least as good as \(x^*\). These problems can be harder to solve than the original SCP, but most of them were solved to proven optimality. Detailed results are given in Table 1. (The analysis of the full range of \(\delta (x,u)\) with respect to both optimal x and u is a much more complex task.)

As can be seen in the table, the primal–dual gap \(\varepsilon (x^*,u^*) + \delta (x^*,u^*)\) can be caused by either of the two terms. For the first five instances, the gap is dominated by the violation of complementarity, while for the rail instances it can be composed of the Lagrangian near-optimality term, the near-complementarity term, or a combination of the two. Further, large gaps are consistently caused solely by violation of complementarity, owing to excess coverage of constraints.

Our observations can be utilized when designing core problem solution strategies for classes of set covering problems with known characteristics. A core problem is a restricted but feasible version of an original problem; it should be of a manageable size and is constructed by selecting a subset of the original variables; see for example [12]. Our results indicate that if the duality gap is expected to be large, then the near-optimality term can be expected to be relatively small. Since \(\varepsilon (x^*,u^*)=\sum _{j\in \mathcal {J}}\overline{c}_jx_j^* \ge 0\), it is then likely that \(x_j^*=0\) holds whenever \(\overline{c}_j\) is large. Therefore, variables with large values of \(\overline{c}_j\) can most likely be excluded from the core problem. If, on the other hand, the gap is expected to be moderate, then the near-optimality term can be relatively large, and the core problem should then also contain variables with relatively large reduced costs. These conclusions give a theoretical justification for the core problem construction used in [12].
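The resulting construction amounts to a simple reduced-cost filter; the threshold rule below is a hypothetical heuristic for illustration, not the rule used in [12]:

```python
# Sketch of the core-problem idea: retain only variables whose LP reduced
# cost cbar_j does not exceed a threshold. The threshold choice is a
# hypothetical heuristic, not the construction of [12].

def core_variables(cbar, threshold):
    """Indices of the variables retained in the core problem."""
    return [j for j, rc in enumerate(cbar) if rc <= threshold]

# Example: with a large expected gap, a tight threshold keeps only the
# variables with small reduced costs.
assert core_variables([0.0, 0.5, 3.2, 0.1], 1.0) == [0, 1, 3]
```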

4 Conclusion

We have extended the classical supporting hyperplane illustration of the duality gap for non-convex optimization problems, by dissecting the gap into two contributions: near-optimality in the Lagrangian relaxation and near-complementarity in the Lagrangian relaxed constraints. This dissection adds improved understanding of the nature of the duality gap. We have also demonstrated that this dissection may have implications on the design of solution approaches.