1 Introduction

Studying polyhedral structures lies at the heart of mixed-integer programming. It is well-known to anyone in the field that a good understanding of the facial structure of a given integer linear optimization problem both informs theory and practical algorithm development in a very beneficial way. This includes tight problem relaxations, extended formulations, cutting plane algorithms, only to name a few. A more or less complete understanding of a polyhedron can be claimed if one accomplishes a so-called convex-hull proof, i.e. a proof that a given inequality description is sufficient to describe its feasible set. The book [19] includes a popular list of possible approaches to obtain such a proof. They include total unimodularity, TDI-ness, projection or a direct proof that all vertices are integral, among a couple of others. For all of these approaches, there are numerous examples where they have been used successfully, and typically each of these methods works especially well for particular types of problems (such as total unimodularity for network-type problems or TDI-ness for balanced matrices)

A relatively new technique for convex-hull proofs has been given in [22] by Zuckerberg, with precursors in [1, 16, 21]. It is a geometric approach based on subset algebra. The core of Zuckerberg’s method is a novel type of criterion for showing that any given point within a given polytope H (in H-description) lies within a second polytope P (in V-description), with the aim to show \( P = H \). It works by giving an implicit (rather than an explicit), set-theoretic representation of this point as a convex combination of the vertices of the latter. The actual proof takes the form of an algorithm which computes such a set-theoretic representation. In contrast to other popular approaches, like TU or TDI, Zuckerberg’s method is constructive, in the sense that it is even possible to obtain an algorithm to compute the convex combination in explicit form. Zuckerberg himself refers to his method either as geometric convex-hull proofs or as the proof-by-picture method, because this algorithm and its result can be visualized in a diagram incorporating all necessary information. For the sake of simplicity, we will use the name Zuckerberg’s method throughout to refer to this technique as well as our extensions of it.

Although the examples outlined in [22] already convey the impression of a very powerful proof technique, it has only been scarcely adopted in the literature so far. We assume that this is due to the rather heavy algebraic framework that has been used to derive and state the method. Zuckerberg has stated his method in terms of abstract measure spaces over which set-theoretic expressions have to be derived. In the recent work [11], the authors give a significantly simplified version of his approach by passing over to a concrete measure space: a real interval equipped with the Lebesgue measure. Their actual aim in this article are proofs on the facial structure of the graphs of bilinear functions. However, they also give a short introduction to his proof technique and find a way to state it mostly without using set-algebraic terms. Afterwards, they show that their simplification of Zuckerberg’s method allows for elegant proofs of their statements. In [12], the authors continue the work of [11] and give further convex-hull results on special graph classes. The authors of [6] have adopted their simplified approach of Zuckerberg’s method in order to give convex-hull proofs for special cases of the Boolean quadric polytope (see [18]) with multiple-choice constraints.

In [1, 21], the authors introduced the Bienstock–Zuckerberg hierarchy, which has since then been simplified in [2, 9]. As stated in [22], Zuckerberg’s method arose as a by-product of the analysis of this hierarchy in [21]. While the Bienstock–Zuckerberg hierarchy itself entails a procedure to obtain increasingly tighter convex relaxations for integer sets, Zuckerberg’s method is an approach for proving the completeness of convex-hull descriptions. Our work builds upon the latter.

Contribution In the present article, we aim to show the power and flexibility of Zuckerberg’s approach to conduct convex-hull proofs. To this end, we give a concise and accessible derivation of the technique and relate it to the method in its original form. Our novel way to introduce the method is based on so-called set characterizations, which provide a structured way of devising the algorithmic parts of the convex-hull proofs. It directly relates the set-theoretic representations to be found to the constraints determining the integer points within the polyhedron to be analysed. Most notably, we use this concept to significantly increase the scope of Zuckerberg’s method. While the original method is only applicable to 0/1-polytopes, we extend it from binary polytopes to arbitrary, especially integer polyhedra and even much more general convex sets.

We demonstrate the wide applicability of our set characterization framework by reproving several known convex-hull results for both binary and integral polyhedra. To facilitate the design of Zuckerberg convex-hull proofs, we connect these examples with the introduction of three basic proof strategies, namely greedy placement, feasibility subproblems and transformation. Altogether, this allows us to give simple constructions to represent a fractional point in a given polyhedron as a convex combination of its vertices where this was not straightforward before. Moreover, we give further extensions of the method to enable convex-hull proofs for function graphs over polytopes. On the one hand, these extensions allow to prove convex-hull descriptions for graphs of Boolean functions over 0/1-polytopes. On the other hand, they can be applied to bilinear functions over arbitrary polytopes, generalizing the result from [11] for bilinear functions over unit-boxes. In summary, we show that Zuckerberg’s method is a valuable tool for conducting convex-hull proofs. At the same time, our extensions of the framework even allow to use it in much more general cases.

Structure This article is structured as follows. We start by giving a detailed introduction to Zuckerberg’s proof technique for 0/1-polytopes in Sect. 2. We also establish our framework of set characterizations for geometric convex-hull proofs. Section 3 features three indicative examples of its application. Each example highlights a novel algorithmic strategy to conduct Zuckerberg-type convex-hull proofs. In Sect. 4, we generalize Zuckerberg’s method to arbitrary convex sets by passing from one-dimensional set-theoretic representations to two-dimensional ones. In particular, we will derive new techniques for convex-hull proofs for the case of integer polyhedra. Analogously to the binary case, Sect. 5 gives examples for the use of our extended technique in the context of mixed-binary optimization problems. Among others, we show how to use the scheme to construct the representation of any point inside an integer polyhedron as a convex combination of its vertices (or, alternatively, other interior integral points). In Sect. 6, we derive further extensions of our approach which allow to give convex-hull proofs for the graphs of Boolean and bilinear functions over polytopal domains and introduce a generalized framework of set characterizations for this purpose. Our conclusions can be found in Sect. 7. Finally, in the online supplement [7] to this article, we provide several further examples for the application of our framework in the context of stable-set problems, mixed-integer models for piecewise linear functions as well as interval matrices and give some proofs omitted in Sect. 6.

Notation To facilitate notation, we denote the power set of a set A by \( {\mathcal {P}}(A) \). Further, we write [n] for the set \( \{1, \ldots , n\} \) for any \( n \in \mathbbm {N}\). Especially, \( [0] :=\emptyset \).

2 Geometric convex-hull proofs for 0/1-polytopes

In this section, we revisit Zuckerberg’s method for convex-hull proofs for combinatorial decision or optimization problems (see [1, 22]). We start by briefly summarizing it, based on the condensed version of the method that was derived in [11]. Then we introduce the concept of set characterizations to significantly simplify the derivation of the set construction algorithms which form the core of Zuckerberg-type convex-hull proofs. Furthermore, we give set characterizations for many types of constraints which typically occur in combinatorial optimization and give some first indicative examples for their practical use. Finally, we put our new approach into context with the original framework by Zuckerberg to highlight how much simpler convex-hull proofs can now be conducted.

Consider a 0/1-polytope \( P :={{\,\mathrm{conv}\,}}({\mathcal {F}}) \) with vertex set \( {\mathcal {F}}\subseteq \{0, 1\}^n \) together with a second polytope \( H \subseteq \mathbbm {R}^n \) which is given via an inequality description. If we want to prove \( P = H \), we can proceed by verifying both \( {\mathcal {F}}\subseteq H \) and \( H \subseteq P \). The first inclusion is typically easy to show; for the latter we can use Zuckerberg’s method, as outlined in the following.

Define \( U :=[0, 1) \), let \( {\mathcal {L}}\) be the set of all unions of finitely many half-open disjoint subintervals of U, and let \( \mu \) be the Lebesgue measure restricted to \( {\mathcal {L}}\), that is

$$\begin{aligned} {\mathcal {L}}&:= \left\{ \, \bigcup \limits _{i = 1}^k [a_i, b_i) \ \bigg \vert \ k \in \mathbbm {N}\wedge 0 \le a_1< b_1< a_2< b_2< \cdots< a_k < b_k \le 1 \, \right\} ,\\ \mu (S)&:=\sum _{i = 1}^k (b_i - a_i) \text { for any } S = [a_1, b_1) \cup \cdots \cup [a_k, b_k) \in {\mathcal {L}}. \end{aligned}$$

Consider now the indicator function \( \phi :U \times {\mathcal {L}} \rightarrow \{0, 1\} \),

$$\begin{aligned} \phi (t, S) :={\left\{ \begin{array}{ll} 1 &{} \text { if }\; t \in S,\\ 0 &{} \text { otherwise}, \end{array}\right. } \end{aligned}$$

and let \( \varphi :U \times {\mathcal {L}}^n \rightarrow \{0, 1\}^n , \varphi (t, S_1, \ldots , S_n) = v \), where \( v_i :=\phi (t, S_i) \) for \( i \in [n] \). In other words, \( \varphi \) maps the sets which are active at a certain \( t \in U \) onto the corresponding incidence vector in \( \{0, 1\}^n \).

The following result uses the above formalism to give a concise criterion for H being a complete polyhedral description of \( {{\,\mathrm{conv}\,}}({\mathcal {F}}) \).

Theorem 1

([11, Theorem 4], Zuckerberg’s convex-hull characterization) Let \( {\mathcal {F}}\subseteq \{0, 1\}^n \) and \( h \in [0, 1]^n \). Then we have \( h \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) iff there are sets \( S_1, \ldots , S_n \in {\mathcal {L}}\) such that both \( \mu (S_i) = h_i \) for all \( i \in [n] \) and \( \varphi (t, S_1, \ldots , S_n) \in {\mathcal {F}}\) for all \( t \in U \).

Theorem 1 provides a certificate for a point \( h \in H \) to be in \( {{\,\mathrm{conv}\,}}({\mathcal {F}}) \). Thus, if we can find sets \( S_1, \ldots , S_n \) as required by Theorem 1 for each point \( h \in H \), we have shown \( H \subseteq P \) as well. Using the above framework even allows us to write a point \( h \in H \) as a convex combination of points in \( {\mathcal {F}}\), as the following corollary tells us. This yields an algorithm which produces integer feasible solutions to the problem. To this end, we define

$$\begin{aligned} L_{\xi }(S_1,\dotsc ,S_n) := \left\{ \, t\in U \ \bigg \vert \ \varphi (t, S_1, \ldots , S_n) = \xi \, \right\} . \end{aligned}$$

to denote the support of a each vertex \( \xi \in {\mathcal {F}}\) in U with respect to \(S_1,\ldots ,S_n\).

Corollary 1

(Convex combinations) Under the same assumptions as in Theorem 1, let \( \lambda _\xi :=\mu (L_{\xi }(S_1,\dotsc ,S_n)) \) for each \( \xi \in {\mathcal {F}}\). Then we have \( h = \sum _{\xi \in {\mathcal {F}}} \lambda _\xi \xi \), \( \sum _{\xi \in {\mathcal {F}}} \lambda _\xi = 1 \) and \( \lambda _\xi \ge 0 \) for all \( \xi \in {\mathcal {F}}\).

The above corollary was not stated explicitly in [11], but it is one direction of the proof of Theorem 4 therein. We already remark here that both Theorem 1 and Corollary 1 are special cases of the results we will prove in Sect. 4 for general convex sets (and integer polyhedra in particular).

In combinatorial optimization, the vertex set \( {\mathcal {F}}\) is typically implicitly defined via an inequality description that separates the feasible binary points from the infeasible ones (and not more). We will now show that based on such a description, we can make the expression \( \varphi (t, S_1, \ldots , S_n) \in {\mathcal {F}}\) in Theorem 1 more concrete. For this purpose, we translate each constraint defining \( {\mathcal {F}}\) into a logic statement of the following form.

Definition 1

(Set characterization of a constraint) Let \( f :\{0, 1\}^n \rightarrow \mathbbm {R} \), let \( b \in \mathbbm {R}\), and let \( S_1, \ldots , S_n \in {\mathcal {L}}\). The set characterization of some constraint \( f(x) \le b \) is the following logic statement:

$$\begin{aligned} f(\phi (t, S_1), \ldots , \phi (t, S_n)) \le b \quad \text { holds for all }\quad t \in U. \end{aligned}$$

Note that this definition allows for arbitrary constraints on the incidence vectors, not only linear ones. We now observe that if \( {\mathcal {F}}\) is given by such an implicit outer description, we need to satisfy all set characterizations of the corresponding constraints to fulfil the requirements of Theorem 1 and Corollary 1.

Lemma 1

Let \( {\mathcal {F}}:=\{x \in \{0, 1\}^n \mid f_j(x) \le b_j\, \forall j \in [m]\} \) for some \( m \in \mathbbm {N}\). Further, let \( P :={{\,\mathrm{conv}\,}}({\mathcal {F}}) \), and let \( H \subseteq [0, 1]^n \) be some polytope. We have \( H = P \) iff both \( {\mathcal {F}}\subseteq H \) holds and for each \( h \in H \) there are sets \( S_1, \ldots , S_n \in {\mathcal {L}}\) with \( \mu (S_i) = h_i \) for all \( i \in [n] \) which satisfy the set characterization for each constraint \( f_j(x) \le b_j \), \( j \in [m] \).

Note that in principle it would suffice to find the sets \( S_i \) in the above lemma for those points h which are vertices of H. However, this usually does not simplify the resulting convex-hull proofs – at least these sets are not easier to place for vertices than for arbitrary points in H in the examples we present in this article. More generally, if one wants to use the assumption that h is a vertex of H, then it is necessary to characterize the vertices of H. If this is easily possible, one could probably directly prove that \( H \subseteq P \).

If some concrete function f is given, along with some \( b \in \mathbbm {R}\), then the set characterization for the constraint \( f(x) \le b \) given in Definition 1 can be simplified in many cases. To give a first example, take the constraint \( x_1 \le x_2 \) for some binary variables \( x_1, x_2 \in \{0, 1\}\). Its set characterization reads

$$\begin{aligned} \phi (t, S_1) \le \phi (t, S_2) \quad \forall t \in U. \end{aligned}$$

Recalling the definition of \( \phi \), this says that if for some \( t \in U \) the condition \( t \in S_1 \) holds, then \( t \in S_2 \) follows. So we can equivalently state the set characterization as \( S_1 \subseteq S_2 \).

For many common combinatorial constraints, we have derived corresponding simplified set characterizations, which are displayed in Table 1. The set characterizations of the constraints defining P as in Lemma 1 provide hints on how to effectively design the sets \( S_1, \ldots , S_n \) as we will see in the following indicative examples.

Table 1 Simplified set characterizations for combinatorial constraints with coefficients from \( \{-1, 0, 1\} \)

2.1 Connection between set characterization and algorithmic set construction

We consider the McCormick-linearization of a bilinear term as a first example to illustrate the use of set characterizations within convex-hull proofs. The example also illustrates that the set characterizations typically depend on the inequality description of \( {\mathcal {F}}\). Let

$$\begin{aligned} H :=\{(x, y, z) \in [0, 1]^3 \mid z \ge 0,\, z \le x,\, z \le y,\, x + y - z \le 1\}. \end{aligned}$$

We will compare the following two possible representations of the integral points in H:

$$\begin{aligned} {\mathcal {F}}_1&:=\{(x, y, z) \in \{0, 1\}^3 \mid z \le x, z \le y, x + y - z \le 1\}, \end{aligned}$$
(1)
$$\begin{aligned} {\mathcal {F}}_2&:=\{(x, y, z) \in \{0, 1\}^3 \mid xy = z\}. \end{aligned}$$
(2)

In (2), one single non-linear constraint replaces the three linear constraints in (1). For each constraint in the two representations, we need to derive a set characterization. We can directly take them from Table 1:

$$\begin{aligned} S_z \subseteq S_x, \quad S_z \subseteq S_y, \quad S_x \cap S_y \subseteq S_z \end{aligned}$$

for \( {\mathcal {F}}_1 \) and

$$\begin{aligned} S_x \cap S_y = S_z \end{aligned}$$
(3)

for \( {\mathcal {F}}_2 \). One directly sees that both set characterizations are equivalent. However, the second one is more compact. In both cases, the sets need to have Lebesgue measures equalling the coordinates of the arbitrary point \( h \in H \) to represent and need to satisfy the set characterizations of the constraints defining the vertex set. Throughout this article, we will give the convex-hull proofs via Zuckerberg’s method mainly in the form of algorithmic schemes to define sets fulfilling these two conditions of Lemma 1. As we will see, all these algorithms can be illustrated via diagrams depicting the constructed sets in a coordinate system.

The construction rule for the sets in the McCormick-example is given via the routine Define-McCormick-Subsets in Fig. 1. Based on representation (3), it places \( S_z \) such that it exhausts the total overlap of \( S_x \) and \( S_y \). By construction, \( \mu (S_x) = h_x \), \( \mu (S_y) = h_y \) and \( \mu (S_z) = h_z \) hold for all \( h \in H \). The inequalities in the definition of H further ensure that the so-defined sets are all subsets of U. This finishes the proof of \( H = {{\,\mathrm{conv}\,}}\{(x, y, z) \in \{0, 1\}^3 \mid xy = z\} \).

Fig. 1
figure 1

Routine Define-McCormick-Subsets (top), exemplary construction for the point h with \( (h_x, h_y, h_z) = (0.5, 0.7, 0.2) \). The solution can be written as a convex combination of \( h = 0.3 (1, 0, 0) + 0.2 (1, 1, 1) + 0.5 (0, 1, 0) \). Those parts of the sets that belong to the same vertices are marked with the same colour (colour figure online)

Once the sets for the given point h are constructed, Corollary 1 tells us how to derive the coefficients to express h as a convex combination of the vertices of H. The latter are given by \( \xi _1 :=(0, 0, 0) \), \( \xi _2 :=(1, 0, 0) \), \( \xi _3 :=(0, 1, 0) \) and \( \xi _4 :=(1, 1, 1) \) in our example. Each point \( t \in U \) is now mapped to some vertex \( \xi _t \) of H via the mapping \( \varphi \). By measuring the union of all points that map to a certain vertex, we can derive the coefficient for this vertex. For the routine Define-McCormick-Subsets, we obtain

$$\begin{aligned} \mu (L_{\xi _1}(S_1, \ldots , S_n))&= \mu ([h_x - h_z + h_y, 1)) = 1 - h_y - h_x + h_z,\\ \mu (L_{\xi _2}(S_1, \ldots , S_n))&= \mu ([0, h_x - h_z)) = h_x - h_z,\\ \mu (L_{\xi _3}(S_1, \ldots , S_n))&= \mu ([h_x, h_x - h_z + h_y)) = h_y -h_z,\\ \mu (L_{\xi _4}(S_1, \ldots , S_n))&= \mu ([h_x - h_z, h_x)) = h_z. \end{aligned}$$

Thus, we know \( h = (1 - h_x - h_y + h_z) \xi _1 + (h_x - h_z) \xi _2 + (h_y -h_z) \xi _3 + h_z \xi _4 \), cf. the example given in Fig. 1.

2.2 Non-uniqueness of set representations

In a second example, we illustrate that the choice of the set construction used for Lemma 1 determines which vertices are used to write a point \( h \in H \) as a convex combination of vertices in \( {\mathcal {F}}\). In particular, this choice is not unique.

Consider the two-dimensional unit-box \( H :=[0, 1]^2 \) and take \( {\mathcal {F}}:=\{0, 1\}^2 \). As there are no constraints on the binary points in \( {\mathcal {F}}\), no set characterization needs to hold. We thus only have to fulfil the measure criteria. In Fig. 2, we give two different construction rules for the sets \( S_x \) and \( S_y \) via the routines Define-Box-Subsets-A and Define-Box-Subsets-B. Note that the definition of H ensures that the sets \( S_x \) and \( S_y \) are always subsets of U. Both routines define valid choices for the two sets for each point \( h \in H \). However, the resulting convex combinations of h via vertices in \( {\mathcal {F}}\) obtained via Corollary 1 are different from each other.

Fig. 2
figure 2

The two routines Define-Box-Subsets-A (left top) and Define-Box-Subsets-B (left bottom) together with exemplary constructions for the point h with \( (h_x, h_y) = (0.5, 0.5) \) for Define-Box-Subsets-A (right top) and Define-Box-Subsets-B (right bottom). Routine Define-Box-Subsets-A results in the representation \( h = 0.5 (1, 1) + 0.5 (0, 0) \), while Define-Box-Subsets-B yields \( h = 0.5 (1, 0) + 0.5 (0, 1) \). Those parts of each set which belong to the same vertices in the convex combination representing h are marked with the same colour (colour figure online)

2.3 Connection to the original method

Zuckerberg’s method for proving convex-hull characterizations was first published in concise form in [22], although an antecedent had already appeared in his PhD thesis (see [21]). The main result is stated there in a very general form: instead of choosing subsets of a real line segment as described above, the sets could be chosen from an arbitrary measure space. This requires more complex definitions and notation. We will shortly review Zuckerberg’s original theorem here to put our approaches into context before we continue with and build upon the condensed version.

Using the same notation as above, we are given a 0/1-polytope \( P = {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) with vertex set \( {\mathcal {F}}\subseteq \{0, 1\}^n \) together with a second polytope H, and the task is to prove \( H \subseteq P \). According to Zuckerberg’s original approach, we first need to represent \( {\mathcal {F}}\) as a finite set-theoretic expression consisting of unions, intersections and complements of the sets

$$\begin{aligned} A_i := \left\{ \, a \in \{0, 1\}^n \ \bigg \vert \ a_i = 1 \, \right\} , \quad i = 1, \ldots , n. \end{aligned}$$

Let \( F(\{A_i\}) \) be such a representation of \( {\mathcal {F}}\). Note that this is possible for any \( {\mathcal {F}}\) as we can always choose \( {\mathcal {F}}= F_1(A_1, \ldots , A_n) :=\cup _{v \in {\mathcal {F}}} ((\cap _{i \in [n]:\, v_i = 1} A_i) \cap (\cap _{i \in [n]:\, v_i = 0} {\bar{A}}_i)) \), where \( {\bar{A}}_i \) denotes the complement of \( A_i \) (in \( \{0, 1\}^n \)). Zuckerberg’s original result can now be stated as follows.

Theorem 2

([22, Theorem 7]) Let \( {\mathcal {F}}\subseteq \{0, 1\}^n \), and let \( F(\{A_i\}) \) be a set-theoretic expression of finitely many unions, intersections and complementations of sets from \( \{A_1, \ldots , A_n\} \) such that \( F(A_1, \ldots , A_n) = {\mathcal {F}}\). Further, let \( {\mathcal {Q}}= (\hat{U}, {\hat{{\mathcal {L}}}}) \) be any algebra with a basic set \( \hat{U} \) and a family \( {\hat{{\mathcal {L}}}} \) of subsets of \( \hat{U} \), and let \( \varXi \) be any probability measure on \( {\mathcal {Q}}\). Then \( x \in [0, 1]^n \) belongs to \( {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) if there are sets \( S_i \in {\hat{{\mathcal {L}}}} \), \( i = 1, \ldots , n \) with \( x_i = \varXi (S_i) \) for all i and \( \varXi (F(\{S_i\})) = 1 \).

In order to use Theorem 2, we first need to find a set-theoretic expression to represent \( {\mathcal {F}}\). While the representation \( F_1 \) is always possible, it is not helpful, since it does not allow to easily derive criteria for how to find suitable sets \( S_i \). For instance, for the McCormick-example in Sect. 2.1, the vertex set can be written as \( {\mathcal {F}}= A_x \cap A_y \Leftrightarrow A_z \). One can easily verify that if the sets \( S_x \), \( S_y \) and \( S_z \) satisfy condition (3), namely \( S_x \cap S_y = S_z \), then \( \varXi (F(\{S_i\})) = 1 \) holds. Conversely though, there is no straightforward way to the derive set characterizations from the set-theoretic expression F. The possibility to directly derive set characterizations from the constraints defining \( {\mathcal {F}}\), however, significantly reduces the effort to conduct Zuckerberg convex-hull proofs and is only given in the simplified version. To introduce this concept is therefore one of the main contributions of this article.

The simplified version of Zuckerberg’s results we build on was introduced in [11] by choosing \( \hat{U} = U \), \( {\hat{{\mathcal {L}}}} = {\mathcal {L}}\) and \( \varXi = \mu \). The condition \( \varXi (F(\{S_i\})) = 1 \) can then be replaced by \( F(\{S_i\}) = U \). The authors also show that this allows to drop the set-theoretic expression F entirely and further allows to replace \( F(\{S_i\}) = U \) with \( \varphi (t, S_1, \ldots , S_n) \in {\mathcal {F}}\) for all \( t \in U \). Their main result is then Theorem 1 from above.

The real line is probably the simplest possible choice for the measure space in Theorem 2, and via Theorem 1 it has the same expressive power as any other measure space. Thus, on the one hand, the choice of more complex measure spaces might allow for easier-to-state convex-hull proofs in certain cases (which Zuckerberg himself states as an avenue for future research). On the other hand, however, the real line is sufficient to prove a vast variety of results, as the examples in the following section as well as those provided in [6, 11, 12, 22] show. Furthermore, it allows for a much more lightweight notation and enables us to use the concept of set characterizations we have introduced above. Finally, this concise form will enable us to derive several significant extensions of Zuckerberg’s approach, in particular a proof technique applicable to general convex sets and criteria for convex-hull proofs for graphs of certain functions over polytopal domains.

3 Set characterizations and proof strategies for binary problems

In the following, we will show how to use our concept of set characterizations to give Zuckerberg convex-hull proofs for more complex 0/1-polytopes. We do this by reproving several known, popular results to demonstrate how set characterizations help define the sets \( S_i \) for Lemma 1. The order in which to define these sets is highly problem specific. We will see that very often a certain “natural” ordering can be used to successfully conduct convex-hull proofs. In an example involving the shortest-path problem, we will use a topological ordering of the nodes of the underlying graph. The second example for a certain set-packing problem shows how to exploit a depth-first-search on a tree. It will also turn out here that we can use Zuckerberg’s method to compute the vertices spanning a point inside the polytope, which was not straightforward to do beforehand. And in the last example, where we consider the odd-hole inequality for the stable-set problem, we follow neighbourly nodes along the underlying cycle. These examples are representative for three promising general strategies to define the sets \( S_i \). The first one is a greedy strategy which places the sets according to local criteria. The second strategy extracts the placement of a group of sets from the solution of an auxiliary optimization problem. Finally, the third strategy transforms the point \( h \in H \) to an auxiliary point \( {\bar{h}} \in H \) for which the placement of the sets is easier, and afterwards retransforms the sets in order to the express the original point.

The core of a Zuckerberg convex-hull proof is an algorithmic scheme to define the sets required in Lemma 1. To this end, we first define the subroutine Partition in Fig. 3. It is useful in problems where the feasible set of binary points is constrained by multiple-choice constraints. Its inputs are a set \( S \in {\mathcal {L}}\) together with a list of diameters \( (w_1, \ldots , w_k) \in [0,1)^n \) for some \( k \ge 1 \) and with \(w_i \le \mu (S)\) for all \(i \in [k]\). The output is then a list of subsets \( (S_1, \ldots , S_k) \) of S with \( \mu (S_i) = w_i \) for all \( i \in [k] \). If \( w_1 + \cdots + w_k \le \mu (S) \) holds, these subsets are pairwise disjoint (cf. the set characterization for a multiple-choice constraint stated in Table 1).

Fig. 3
figure 3

Subroutine Partition (left) and exemplary output for defining three subsets of some set S (right)

3.1 The greedy strategy

In the greedy proof strategy, we place the current set to be defined to the first spot which satisfies all set characterizations, without considering the subsequent sets to be placed. When conducting Zuckerberg proofs, this is generally the first strategy one should try. This is because of its simplicity, and if this strategy works, it typically leads to very short proofs. We showcase the use of this technique for a polytope which arises as the convex hull of incidence vectors of certain paths in an acyclic graph.

Let \( G = (V, A) \) be a directed and acyclic graph (DAG). The node set V contains two special nodes s and d, and the goal is to find a path from s to d. For ease of exposition, the node s shall only have outgoing arcs, while d only has incoming arcs and each node should belong to at least one s-d-path. The set of feasible paths can be represented by introducing a binary variable \( x_a \in \{0, 1\}\) for each \( a \in A \) to model the choice of arcs together with the following system of linear constraints:

$$\begin{aligned} \sum _{a \in \delta ^{-}(s) } x_a&= 1 \end{aligned}$$
(4)
$$\begin{aligned} \sum _{a \in \delta ^{-}(v) } x_a - \sum _{a \in \delta ^{+}(v) } x_a&= 0 \quad \forall v \in V \setminus \{s, d\} \end{aligned}$$
(5)
$$\begin{aligned} \sum _{a \in \delta ^{+}(d) } x_a&= 1 \end{aligned}$$
(6)
$$\begin{aligned} 0 \le x_a&\le 1. \end{aligned}$$
(7)

We now give a Zuckerberg-type proof for the well-known result stating the integrality of the above system.

Theorem 3

Let \( P :={{\,\mathrm{conv}\,}}\{x \in \{0, 1\}^{|A |} \mid (4)\ to \ (7)\} \) be the s-d-path-polytope and \( H :=\{x \in [0, 1]^{|A |} \mid \ (4)\ to \ (7)\} \) its linear relaxation. Then we have \( P = H \).

Proof

It is obvious that \( P \subseteq H \). In order to prove \( H \subseteq P \), we need to transform the constraints (4) to (6) into set characterizations. Referring to Table 1, we can directly state them as follows:

$$\begin{aligned} |\{i \in \delta ^{-}(s) \mid t \in S_i\} | = 1&\quad \forall t \in U, \end{aligned}$$
(8)
$$\begin{aligned} |\{i \in \delta ^{-}(v) \mid t \in S_i\} | = |\{j \in \delta ^{+}(v) \mid t \in S_j\} |&\quad \forall v \in V \setminus \{s, d\}, \forall t \in U, \end{aligned}$$
(9)
$$\begin{aligned} |\{i \in \delta ^{+}(d) \mid t \in S_i\} | = 1&\quad \forall t \in U. \end{aligned}$$
(10)

Note that inequalities (7) do not have a set characterization of their own above as they are already implied by the fact that all sets need to be subsets of \( U = [0, 1) \). Further, inequality (6) is redundant and only stated for better readability. Therefore, set characterization (10) is already implied by (8) and (9).

For each point \( h \in H \), we now need to find sets \( S_a \) for all \( a \in A \) such that \( \mu (S_a) = h_a \) as well as set characterizations (8) to (10) hold. The sets \( S_a \) are defined via the routine Define-s-d-Path-Subsets presented in Fig. 4. The algorithm processes the nodes in the graph in topological order, where TopologicalSort is any routine producing such an order. In each iteration, it places the sets for all outgoing arcs of the current node via a call to the routine Partition. This ensures that conditions (8) to (10) are satisfied. By starting at node s and processing the nodes in topological order, we are sure that once a node is reached all sets for the incoming arcs have been defined. Finally, the make-up of subroutine Partition guarantees \( \mu (S_a) = h_a \) for all \( a \in A \). Thus, we have proved \( H \subseteq P \). \(\square \)

Fig. 4
figure 4

Routine Define-s-d-Subsets (top), exemplary graph with 6 nodes (bottom left) and possible output of the routine for the point \( h = (0.8, 0.1, 0.1, 0.6, 0.3, 0.4, 0.3, 0.2) \) (bottom right). There are four paths in the graph, namely \( (a_1, a_2, a_3) \), \( (a_8, a_4) \), \( (a_1, a_7, a_5) \) and \( (a_1, a_6, a_4) \). Those parts of the sets corresponding to a certain path are marked in the same colour (colour figure online)

The greedy proof technique is most promising if the problem at hand only features local constraints (like flow conservation or variable bounds) as they allow to place the sets in consecutive fashion. Constraints inducing global couplings between the variables make it harder to use. In the online supplement [7], we give further examples for the use of this technique in the context of clique and stable-set problems. A natural further question is whether the proof can be extended to general graphs instead of acyclic ones. The vertices are then not only paths, but rather combinations of paths and cycles. For the placement of the sets \(S_i\), we need to assign cycles to paths, for which there are many different options, and there is no clear order in which to process the sets to obtain a feasible solution.

3.2 Zuckerberg proofs via feasibility subproblems

A further strategy for Zuckerberg proofs is to place groups of related sets simultaneously. If the correct placement of these sets is too difficult to be stated explicitly, it can be worthwhile to define an auxiliary optimization problem from whose solution a feasible placement of the sets can be extracted. It is then necessary to prove that this subproblem is feasible for each point \( h \in H \) to be tested. In case the optimization problem is a linear program, one can try to use the Farkas lemma for the feasibility proof. We highlight this technique at the hand of a polynomial-time solvable special case of the clique problem with multiple-choice constraints.

Let \( G = (V, E) \) be an m-partite graph for some \( m \ge 1 \), and let \( {\mathcal {V}}= \{V_1, \ldots , V_m\} \) be the corresponding partition of the node set V. The clique problem with multiple-choice constraints (CMPC) asks to find a clique of cardinality m in G. While it is NP-complete in general to decide if such a clique exists (see [5]), there are several relevant special cases where this is possible in polynomial time. These include CPMC under staircase compatibility [3, 4] and CPMC under a cycle-free dependency graph [5]. The referenced works give complete convex-hull descriptions for these two cases.

The CPMC polytope is the convex hull of all incidence vectors of m-cliques in G. In the online supplement [7], we will reprove the result from [3] that staircase compatibility allows for totally unimodular formulations of polynomial size for the CPMC polytope. Here we consider the case where there are no cyclic dependencies between the subsets \( V_i \). The authors of [5] give a complete convex-hull description for this case whose correctness they prove via the alternating colouration theorem (see [13]). Alternatively, they hint a proof via the strong perfect-graph theorem (see [8]). Both approaches lead to the result that the graph G is perfect. As the CPMC polytope is a face of the ordinary clique polytope on G, this readily yields a complete description of its convex hull. In the following, we will give a much more elementary convex-hull proof based on Zuckerberg’s method which does neither use alternating colourations nor perfectness. In addition, we will be able to state the vertices spanning any given point in the CPMC polytope, for which there is no obvious derivation using the approaches presented in [5].

Let \( {\mathcal {G}}:=({\mathcal {V}}, {\mathcal {E}}) \) with

$$\begin{aligned} {\mathcal {E}}:=\left\{ \{V_i, V_j\} \subseteq {\mathcal {V}}\mid (\exists u \in V_i)(\exists v \in V_j)\, \{u, v\} \notin E\right\} \end{aligned}$$

denote the dependency graph of G. Let \( G_{ij}= (V_{ij}, E_{ij})\) be the subgraph induced by \( V_i \cup V_j \). Note that \( \{V_i, V_j\} \in {\mathcal {E}}\) is equivalent to the subgraph  \( G_{ij}\) not being a complete bipartite graph. For ease of notation, we further define the neighbourhood \( N_j(U) \subseteq V_j \) of a subset \( U \subseteq V \) in \( V_j \) as

$$\begin{aligned} N_j(U) :=\{v \in V_j \mid (\exists u \in U)\, \{u, v\} \in E\}. \end{aligned}$$

It represents those nodes in \( V_j \) for which there is a compatible node in U.

We will now show that the CPMC polytope is completely described via the stable-set constraints and the trivial constraints if the dependency graph is a forest.

Theorem 4

[5, Theorem 3.1] Let

$$\begin{aligned} P(G, {\mathcal {V}}) :={{\,\mathrm{conv}\,}} \left\{ \, x \in \{0, 1\}^{|E |} \ \bigg \vert \ \begin{aligned} \sum _{v \in V_i} x_v&= 1 \quad \forall V_i \in {\mathcal {V}}\\ x_i + x_j&\le 1 \quad \forall \{i, j\} \notin E \end{aligned} \, \right\} \end{aligned}$$

be the CPMC polytope and

$$\begin{aligned} H(G, {\mathcal {V}}) :={{\,\mathrm{conv}\,}} \left\{ \, x \in [0, 1]^{|E |} \ \bigg \vert \ \begin{aligned} \sum _{v \in V_i} x_v&= 1 \quad \forall V_i \in {\mathcal {V}}\\ \sum _{v \in C} x_v&\le 1 \quad \forall \text { stable sets } C \subseteq V \end{aligned} \, \right\} \end{aligned}$$

its stable-set relaxation. If \( {\mathcal {G}}\) has no cycles, we have \( P(G, {\mathcal {V}}) = H(G, {\mathcal {V}}) \).

Proof

The inclusion \( P(G, {\mathcal {V}}) \subseteq H(G, {\mathcal {V}}) \) holds trivially. We now show the reverse inclusion. The procedure to define the sets \( S_i \) for \( i \in V \) is given via the two routines Define-CMPCF-Subsets and Traverse-Tree in Fig. 5.

Fig. 5
figure 5

Routine Define-CMPCF-Sets

The former routine iterates over all connected components in the dependency graph \( {\mathcal {G}}\), which are trees in our case. In Line 3, it selects an arbitrary node (subset in the partition) \( V_i \) as the root node of the current tree. Then it fixes an arbitrary ordering of the elements \( v \in V_i \) and places the corresponding sets next to each other via a call to subroutine Partition in Line 5. Finally, it traverses the tree recursively in Lines 6–8 by calling the routine Traverse-Tree, whose input is a subset \( V_i \) for which all sets have already been defined, together with a set \( V_j \), which is a neighbour of \( V_i \). The routine then places all sets for \( V_j \). To do so, it solves a linear feasibility problem in Line 12 which is defined as follows: the variables \( x_{ij} \in \mathbbm {R}_+ \) encode the measure of the overlap between the sets \( S_i \) and \( S_j \). These overlaps need to fulfil the set characterizations

$$\begin{aligned} |\{v \in V_i \mid t \in S_v\} |&= 1 \quad \forall V_i \in {\mathcal {V}}, \forall t \in U, \end{aligned}$$
(11)
$$\begin{aligned} |\{v \in C \mid t \in S_v\} |&\le 1 \quad \forall \text { stable sets } C \subseteq V, \forall t \in U, \end{aligned}$$
(12)

which leads to the following linear programming system:

$$\begin{aligned} \sum _{j \in N(i)} x_{ij}&= h_i \quad \forall i \in V_i, \end{aligned}$$
(13a)
$$\begin{aligned} \sum _{i \in N(j)} x_{ij}&= h_j \quad \forall j \in V_j, \end{aligned}$$
(13b)
$$\begin{aligned} x_{ij}&\ge 0 \quad \forall (i,j) \in E_{ij}. \end{aligned}$$
(13c)

In Lines 13–22, the routine chooses the sets for all elements in \( V_j \) accordingly. It then proceeds recursively in Lines 23–25.

It remains to show that problem Eq. (13) is feasible for all \( h \in H \). To prove this, we analyse its dual Farkas system, which is given by

$$\begin{aligned} \sum _{i \in V_i} h_i y_i + \sum _{j \in V_j} h_j y_j&< 0, \end{aligned}$$
(14a)
$$\begin{aligned} y_i + y_j&\ge 0 \quad \forall (i, j) \in E_{ij}. \end{aligned}$$
(14b)

We will prove by contradiction that Eq. (14) has no solution in order to show the feasibility of Eq. (13). To this end, consider some point \( h \in H \) and let \( {\bar{y}}\) be a corresponding solution of Eq. (14). We first argue that we can assume \( {\bar{y}}\in \{-1, 0, 1\}^{|V_{ij} |} \) w.l.o.g. Via rescaling, we can assume that the lowest entry of \( {\bar{y}}\) is \(-1\). Now let \( W :=\{k \in V_{ij} \mid {\bar{y}}_k < 0\} \). We can assume \( {\bar{y}}_k = 0 \) for all elements in \( V_{ij} \setminus (W \cup N(W)) \), since this is always feasible if \( {\bar{y}}_k > 0 \) was feasible. Further, let \( P :=(p_1, \ldots , p_q) \) be a sorted list of the elements in \( \{{\bar{y}}_k \in \mathbbm {R}\mid k \in W\} \) in decreasing order with \(q :=|P |\). For \( p \in P \), let \( Q_p :=\{k \in W \mid y_k = p\} \), and let \( R :=N(Q_{p_1}) \setminus N(Q_{p_2}) \cup \cdots \cup N(Q_{p_q}) \). Then check if \( \sum _{v \in R} h_v {\bar{y}}_v + \sum _{v \in Q_{p_1}} h_v {\bar{y}}_v \ge 0 \) holds. If yes, set \( {\bar{y}}_k = 0 \) for \( k \in Q_{p_1} \cup R \). If no, set \( {\bar{y}}_k = p_2 \) for \( k \in Q_{p_1} \) and \( {\bar{y}}_k = -p_2 \) for \( k \in R \). Now update W and let \( P :=(p_1, \ldots , p_{q - 1}) \) again be a sorted list of the elements in \( \{{\bar{y}}_k \in \mathbbm {R}\mid k \in W\} \) in decreasing order. This procedure lets P now contain precisely one element less than before. Repeat this until there is only one element in P left, which has to be \(-1\), so we can set \( {\bar{y}}_k = 1 \) for all \( k \in N(W) \). This way, we have found an integral solution to Eq. (14). We then have

$$\begin{aligned} \sum _{k \in N(W)} h_k {\bar{y}}_k + \sum _{k \in W} h_k {\bar{y}}_k&< 0,\\ \sum _{k \in N(W)} h_k - \sum _{k \in W} h_k&< 0,\\ 1 - \sum _{k \in V_{ij} \setminus N(W)} h_k - \sum _{k \in W} h_k&< 0,\\ \sum _{k \in V_{ij} \setminus N(W)} h_k + \sum _{k \in W} h_k&> 1. \end{aligned}$$

However, this is impossible, since the nodes \( (V_{ij} \setminus N(W)) \cup W \) form a stable set, which leads to a contradiction. \(\square \)

Via Corollary 1, this directly allows us to represent a point \( h \in H \) as a convex combination of the vertices of the CMPC polytope, which extends the results from [5].

This technique could be generalized by passing from linear to more complex auxiliary problems to determine the placement of the sets. The core of this proof technique consists in analysing the auxiliary problem to verify its feasibility for any inputs arising within the algorithmic scheme.

3.3 The transformation strategy

The third proof strategy we present makes use of the fact that it can be easier to place the sets for some points within a given polytope than for others. Thus, it is sometimes helpful to transform the arbitrary point to be tested for membership in Lemma 1 to another, auxiliary point first. Then, after placing the sets for this auxiliary point, they are retransformed to represent the original point. Such a transformation must respect the set characterizations of the vertex set. We present this technique exemplarily for the convex hull of all incidence vectors of stable sets in a single odd cycle.

The stable-set polytope of a graph \( G = (V, E) \) is defined as the convex hull of all vectors \( x \in \{0, 1\}^{|V |} \) that satisfy the edge constraints

$$\begin{aligned} x_i + x_j \le 1 \quad \forall (i, j) \in E. \end{aligned}$$
(15)

If G is a cycle, the odd-cycle inequality

$$\begin{aligned} \sum _{i \in V} x_i \le \frac{|V | - 1}{2} \end{aligned}$$
(16)

is valid for the corresponding stable-set polytope. For an odd-cycle, it is sufficient to describe the complete convex hull, together with inequalities (15) and the trivial inequalities.

Theorem 5

Let \( G = (V, E) \) be an odd hole, let \( P(G) :=\{x \in \{0, 1\}^{|V |} \mid \ (15) \ and \ (16)\} \) be the stable-set polytope on G, and let \( H(G) :=\{x \in [0, 1]^{|V |} \mid \ (15) \ and \ (16)\} \) be its linear relaxation. Then we have \( P = H \).

Proof

It is obvious that \( P(G) \subseteq H(G) \). For the converse, consider the set characterizations of (15) and (16), which are given by:

$$\begin{aligned} S_i \cap S_j&= \emptyset \quad \forall (i, j) \in E, \end{aligned}$$
(17)
$$\begin{aligned} |\{i \in B \mid t \in S_i\} |&\le \frac{|V | - 1}{2} \quad \forall t \in U \end{aligned}$$
(18)

(cf. Table 1). For a given point \( h \in H(G) \), we then need to find sets \( S_v \) for each \( v \in V \) such that \( \mu (S_v) = h_v \) and the above conditions hold. We define these sets in routine Define-Odd-Cycle-Stable-Sets-Subsets, given in Fig. 6. First, in Line 2/3, we fix an ordering of the nodes which respects the order of the cycle. In Lines 4–8, the point h is then shifted to a point \( {\bar{h}} \) on the boundary of H(G) by increasing h componentwise until in each iteration at least one of the inequalities (15) and (16) becomes active. By induction, for the resulting point \( {\bar{h}} \) the inequality \( {\bar{h}} \ge h \) holds component-wise and we have \( \sum _{i = 1}^{|V |} {\bar{h}}_i = (|V | - 1)/2 \). Now, auxiliary sets \( {\bar{S}}_v \), \( v \in V \), are placed in consecutive order along the cycle in Line 9, based on the diameters stored in \( {\bar{h}} \). Observe that, in particular, the first set is defined as \( {\bar{S}}_{v_1} = [0, {\bar{h}}_{v_1}) \) and the last one as \( {\bar{S}}_{v_{|V |}} = [1 - {\bar{h}}_{v_{|V |}}, 1) \), thus they satisfy set characterizations (17) and (18). Finally, in Lines 10–12, the diameters of the auxiliary sets are reduced such that they correspond to the components of h to obtain the final sets \( S_v \), \( v \in V \). It is obvious that these sets satisfy \( \mu (S_v) = h_v \) for all \( v \in V \), and the reduction does not invalidate any of the set characterizations (17) or (18). Therefore, we have proved \( H(G) \subseteq P(G) \). \(\square \)

Fig. 6
figure 6

Routine Define-odd-cycle-Stable-Set-Subsets (top), exemplary graph with five nodes (bottom left) and possible output of the routine for the point given by \( h = (0.5, 0.2, 0.3, 0.1, 0.1) \). It is blown up to \( {\bar{h}} = (0.8, 0.2, 0.8, 0.1, 0.1) \). The point h can be written as a convex combination of incidence vectors belonging to five stable sets, namely \( \{u_1, u_3\} \), \( \{u_1\} \), \( \emptyset \), \( \{u_2, u_4\} \) and \(\{u_2, u_5\} \), each non-empty one marked with same colour (bottom right) (colour figure online)

In the above proof, an auxiliary point \( {\bar{h}} \in H \) is constructed by greedily increasing the coordinates of the point h to be tested. The sets for \( {\bar{h}} \) are then placed next to each other, modulo 1 (the diameter of U). The backward transformation then simply shrinks the sets to fit the size of the original coordinates of h while maintaining the validity of all set characterizations. As shown in Fig. 6, the final sets after backward transformation are not always placed next to each other due to the gaps arising from the shrinking step. A direct placement of these sets for the original point seems to more involved, since it is not obvious how to calculate the gaps between adjacent sets a priori.

It is a natural consideration to extend the proofs of Theorems 4 and 5 to more general graphs. For example, it might be possible to use Zuckerberg’s method to show that the convex hull of the clique polytope on a perfect graph is given by the stable-set constraints and the non-negativity constraints. This would directly imply Theorems 4 after showing that the graph G is perfect under the stated assumption. Similarly, using Zuckerberg’s method to show that the stable-set polytope on a series-parallel graph is given by the edge constraints, the odd-hole constraints and the non-negativity constraints would reprove the according result from [17] and easily imply Theorem 5. However, both seem not straightforward to do, as one would first need to find a natural order along which to process each variable for placing the corresponding sets \( S_i \). A suitable decomposition approach based on graph structure is usually beneficial here. For example, in the case of series-parallel graphs, an ear-decomposition of the graph seems to be a good starting point. This could be an interesting topic for future research.

4 Extensions of Zuckerberg’s method for general convex sets

Both the original proof technique by Zuckerberg from [22] and its simplification in [11] are applicable to 0/1-polytopes only. In the following, we will derive extensions of Zuckerberg’s method which enable us to conduct geometric convex-hull proofs for arbitrary convex sets. This includes, in particular, general integer polyhedra. The underlying idea is to pass from intervals in \( U = [0, 1) \) to rectangles in \( \mathbbm {R}^2 \) when constructing the sets to represent a given point h in some convex set H. Recall that the original method interprets each of these dimension-many sets as either a 0- or a 1-coordinate of a vertex in a 0/1-polytope; a coordinate of the vertex which belongs to some \( t \in U \) is 1 if the corresponding set includes t, and 0 otherwise. The vertices associated with the sets representing a point h in the polytope define a convex combination spanning h. Our extension of Zuckerberg’s method gives the intervals making up these sets a height to encode the coordinates of arbitrary points in H instead of only 0/1-points. This idea will lead to generalized versions of the theorems in Sect. 2 which can be used to prove the completeness of convex-hull representations for general convex sets. Furthermore, they also allow to compute convex combinations spanning a certain \( h \in H \) using any points in H, not necessarily extreme points.

To formalize the new approach, we first define the set

$$\begin{aligned} {\mathcal {R}}^Q := \left\{ \, ([a, b), c) \in {\mathcal {P}}(Q) \times \mathbbm {R} \ \bigg \vert \ a < b,\, c \ne 0 \, \right\} , \end{aligned}$$

where Q is chosen as either \( U = [0, 1) \) or as \( \mathbbm {R}_+ \). The set Q specifies the range of coefficients which are allowed in a linear combination representing some \( h \in H \). We use \( Q = U \) to construct convex combinations and \( Q = \mathbbm {R}_+ \) for conic combinations. We interpret \( {\mathcal {R}}^Q \) as the set of all non-degenerate, axis-parallel rectangles \(R=([a, b), c) \in \mathbbm {R}^2 \), which are uniquely defined by stating the two diagonally opposite vertices  (a, 0) and  (bc) . The sign of c indicates if a rectangle points into the upper half-space (\( c > 0 \)) or the lower half-space (\( c < 0 \)). Let \( q(R) :=(b - a) c \) and \( z(R) :=c \) denote the signed area and the signed height of the rectangle R respectively. Further, let \( y:Q \times {\mathcal {R}}^Q \rightarrow \{0, 1\}\) be an indicator function defined as follows. For some \( t \in Q \) and \( R =([a, b), c) \in {\mathcal {R}}^Q \) it is \( y(t, R) = 1 \) if \( a \le t \le b \), and \( y(t, R) = 0 \) otherwise. In other words, y indicates whether t belongs to the support of R, in which case we call R active at t. We call two rectangles \( R_1 \) and \( R_2 \) non-overlapping if there exists no \( t \in Q \) such that both \( y(t, R_1) = 1 \) and \( y(t, R_2) = 1 \) hold.

In a similar fashion as in Sect. 2, we then define \( {\bar{{\mathcal {L}}}}^{Q} \) as the set of all unions of finitely many non-degenerate, non-overlapping rectangles from \( {\mathcal {R}}^Q \) and \( {\bar{\mu }}^{Q} \) as the Lebesgue measure restricted to \( {\bar{{\mathcal {L}}}}^{Q} \), that is

$$\begin{aligned} {\bar{{\mathcal {L}}}}^{Q}:= & {} \left\{ \, \{R_1,\ldots ,R_k\} \ \bigg \vert \ \begin{aligned}&k \in \mathbbm {N}\wedge R_1, \ldots , R_k \in {\mathcal {R}}^Q \wedge \\&R_i \text { and } R_j \text { are non-overlapping} \quad \forall i, j \in [k], i \ne j \end{aligned} \, \right\} \\ {\bar{\mu }}^{Q} (S):= & {} \sum _{i = 1}^k q(R_i) \text { for any } S = \{R_1, \ldots , R_k\} \in {\bar{{\mathcal {L}}}}^{Q} . \end{aligned}$$

Moreover, we define the indicator function \( {\bar{\phi }}^{Q} :Q \times {\bar{{\mathcal {L}}}}^{Q} \rightarrow \mathbbm {R} \),

$$\begin{aligned} {\bar{\phi }}^{Q} (t, S) :={\left\{ \begin{array}{ll} z(R) &{} \text {if } y(t, R) = 1 \text { for any } R \in \{R_1, \ldots , R_k\},\\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where S is uniquely represented as \( S = \{R_1,\ldots ,R_k\} \) for some \( k \in \mathbbm {N}\) in \( R_i \in {\mathcal {R}}^Q \), \( i \in [k] \). It returns the height of the rectangle which is active at \( t \in Q \) if there is one. Note that the active rectangle is unique in this case as the \( R_i \) forming S are non-overlapping. Finally, let \( {\bar{\varphi }}^{Q} :Q \times ( {\bar{{\mathcal {L}}}}^{Q} )^n \rightarrow \mathbbm {R}^n , {\bar{\varphi }}^{Q}(t, S_1, \ldots , S_n) = v \), where \( v_i := {\bar{\phi }}^{Q} (t, S_i) \) for \( i \in [n] \). Here we interpret the heights of the rectangles which are active at t as the coordinates of a vector in \( \mathbbm {R}^n \).

With the above definitions, we are equipped to state our extensions of Zuckerberg’s method. As a useful shorthand notation used in the proofs, we define the union \( S_1 \cup S_2 \) of two sets \( S_1, S_2 \in {\bar{{\mathcal {L}}}}^{Q} \) as the unique \( S \in {\bar{{\mathcal {L}}}}^{Q} \) such that for all \( t \in Q \) we have \( {\bar{\phi }}^{Q} (t, S) = {\bar{\phi }}^{Q} (t, S_1) + {\bar{\phi }}^{Q} (t, S_2) \). Informally speaking, this means we add the heights of the rectangles which are active at a certain t to form the union S.

We start with an extension which enables us to conduct geometric convex-hull proofs for general convex sets.

Theorem 6

(Zuckerberg’s method for general convex sets) Let \( {\mathcal {F}}\subseteq \mathbbm {R}^n \) and \( h \in \mathbbm {R}^n \). Then we have \( h \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) iff there are sets \( S_1, \ldots , S_n \in {\bar{{\mathcal {L}}}}^{U} \) such that \( {\bar{\mu }}^{U} (S_i) = h_i \) for all \( i \in [n] \) and \( {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) for every \( t \in U \).

Proof

If \( h \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \), then there exist \( \xi ^1, \ldots , \xi ^r \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \), for some \( r \in \mathbbm {N}_+ \), such that h can be written as \( h = \lambda _1 \xi ^1 + \cdots + \lambda _r \xi ^r \) with \( \lambda _1 + \cdots + \lambda _r = 1 \) and \( \lambda _k \ge 0 \) for all \( k \in [r] \). We can then define a partition \( U = I_1 \cup \cdots \cup I_r \) by setting \( I_1 :=[0, \lambda _1) \) and \( I_k :=\left[ \lambda _1 + \cdots + \lambda _{k - 1}, \lambda _1 + \cdots + \lambda _k\right) \) for \( k \in \{2, \ldots , r\} \). This allows us to set

$$\begin{aligned} S_i :=\bigcup _{k \in [r]:\, \xi ^k_i \ne 0}(I_k, \xi ^k_i) \in {\bar{{\mathcal {L}}}}^{U} , \quad i \in [n], \end{aligned}$$

with rectangles \( (I_k, \xi ^k_i) \in {\mathcal {R}}^U \). For all \( i \in [n] \), we can conclude

$$\begin{aligned} {\bar{\mu }}^{U} (S_i) = \sum _{k \in [r]:\, \xi ^k_i \ne 0} {\bar{\mu }}^{U} ((I_k, \xi ^k_i)) = \sum _{k \in [r]:\, \xi ^k_i \ne 0} \lambda _k \xi ^k_i = \sum _{k \in [r]} \lambda _k \xi ^k_i = h_i. \end{aligned}$$

Furthermore, for every \( t \in U \) there is a unique index k with \( t \in I_k \), and thus we have

$$\begin{aligned} {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \in {{\,\mathrm{conv}\,}}({\mathcal {F}}). \end{aligned}$$

Conversely, if the \( S_i \) are sets with the described properties, let \( \xi ^1, \ldots , \xi ^r \) be an ordering of the elements in

$$\begin{aligned} \left\{ \, \xi \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi \;\text { for some }\; t \in U \, \right\} . \end{aligned}$$

The above set is finite, since each \( S_i \) is a finite union of rectangles. We can set, by slight abuse of notation,

$$\begin{aligned} \lambda _k := {\bar{\mu }}^{U} \left( \left\{ \, (t, 1) \in U \times \{1\} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \end{aligned}$$

for \( k \in [r] \) to obtain the required convex representation \( h = \lambda _1 \xi ^1 + \cdots + \lambda _r \xi ^r \). To see this, we can easily verify

$$\begin{aligned} S_i = \bigcup _{k \in [r]:\, \xi ^k_i \ne 0} \left\{ \, (t, \xi _i^k) \in U \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \end{aligned}$$

for all \( i \in [n] \). We then conclude for all \( i \in [n] \):

$$\begin{aligned} \sum _{k \in [r]} \lambda _k \xi ^k_i&= \sum _{k \in [r]} {\bar{\mu }}^{U} \left( \left\{ \, (t, 1) \in U \times \{1\} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \xi ^k_i\\&= \sum _{k \in [r]:\, \xi ^k_i \ne 0} {\bar{\mu }}^{U} \left( \left\{ \, (t, \xi ^k_i) \in U \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \\&= {\bar{\mu }}^{U} \left( \bigcup _{k \in [r]:\, \xi ^k_i \ne 0} \left\{ \, (t, \xi ^k_i) \in U \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \\&= {\bar{\mu }}^{U} (S_i) = h_i.\\ \end{aligned}$$

\(\square \)

Note that in the first line of the proof, h does not necessarily have to be a convex combination of points in \({\mathcal {F}}\), but we can also make use of points in \({{\,\mathrm{conv}\,}}({\mathcal {F}})\). This is by intention in order to allow for more freedom in conducting convex-hull proofs. In Sect. 4.1 we will give a corresponding example in which we express h as integer points (not vertices) inside \({\mathcal {F}}\).

Theorem 6 generalizes Theorem 1 in two ways. The method now works for arbitrary convex sets, instead of only 0/1-polytopes. Note that Zuckerberg’s original method can be recovered by discarding the height of the rectangles and only checking whether a given set is active at some \( t \in U \). We also remark that in Theorem 6 we can now write the point h as a linear combination of points in \( {{\,\mathrm{conv}\,}}({\mathcal {F}}) \), not only points in \( {\mathcal {F}}\). This allows an additional degree of freedom for convex-hull proofs which Theorem 1 does not offer.

Using our extended framework, we can also determine a representation of any given point \( h \in H \) as a convex combination of points in \( {\mathcal {F}}\) if we find corresponding sets \( S_1, \ldots , S_n \) fulfilling the requirements of Theorem 6. To state this result, we define the two sets

$$\begin{aligned} {\bar{{\mathcal {F}}}}(S_1, \ldots , S_n) := \left\{ \, \xi \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi \;\text { for some }\; t \in U \, \right\} \end{aligned}$$

and, for each \( \xi \in \mathbbm {R}^n \),

$$\begin{aligned} {\bar{L}}^{U}_{\xi }(S_1,\dotsc ,S_n) := \left\{ \, t \in U \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi \, \right\} . \end{aligned}$$

The following corollary then directly follows from the proof of Theorem 6.

Corollary 2

(Convex combinations for general convex sets) Under the same assumptions as in Theorem 6, let \( \lambda _\xi := {\bar{\mu }}^{U} ({\bar{L}}^{U}_{\xi }(S_1, \ldots , S_n)) \) for each \( \xi \in {\bar{{\mathcal {F}}}}(S_1, \ldots , S_n) \). Then we have \( h = \sum _{\xi \in {\bar{{\mathcal {F}}}}(S_1, \ldots , S_n)} \lambda _\xi \xi \), \( \sum _{\xi \in {\bar{{\mathcal {F}}}}} \lambda _\xi = 1 \) and \( \lambda _\xi \ge 0 \) for all \( \xi \in {\bar{{\mathcal {F}}}}(S_1, \ldots , S_n) \).

The definition of a set characterization (Definition 1) can now be restated in a more general form as well.

Definition 2

(Set characterization of a constraint) Let \( f :{\mathcal {F}} \rightarrow \mathbbm {R} \), let \( b \in \mathbbm {R}\), and let \( S_1, \ldots S_n \in {\bar{{\mathcal {L}}}}^{U} \). The set characterization of some constraint \( f(x) \le b \) is the following logic statement:

$$\begin{aligned} f( {\bar{\phi }}^{U} (t, S_1), \ldots , {\bar{\phi }}^{U} (t, S_n)) \le b \text { holds for all } t \in U. \end{aligned}$$

Similar to before, we can use this concept to facilitate finding sets \( S_i \) which characterize a point \( h \in H \) according to Theorem 6.

Lemma 2

Let \( {\mathcal {F}}:=\{x \in \mathbbm {Z}^n \mid f_j(x) \le b_j\, \forall j \in [m]\} \) for some \( m \in \mathbbm {N}\). Further, let \( P :={{\,\mathrm{conv}\,}}({\mathcal {F}}) \), and let \( H \subseteq \mathbbm {R}^n \) be some convex set. We have \( H = P \) iff both \( {\mathcal {F}}\subseteq H \) holds and for each \( h \in H \) there are sets \( S_1, \ldots , S_n \in {\bar{{\mathcal {L}}}}^{U} \) with \( {\bar{\mu }}^{U} (S_i) = h_i \) for all \( i \in [n] \) which satisfy the set characterization for each constraint \( f_j(x) \le b_j \), \( j \in [m] \) and \( {\bar{\phi }}^{U} (t, S_i) \in \mathbbm {Z}\) for all \( i \in [n], t \in U \).

Polyhedra, which are special convex sets, can be written as a convex combination of a finite sets of points plus a conic combination of a finite set of rays In the following, we give an alternative version of Theorem 6 for polyhedra making use of this fact.

Theorem 7

(Zuckerberg’s method for polyhedra) Let \( {\mathcal {F}}\subseteq \mathbbm {R}^n \) and \( {\mathcal {E}}\subseteq \mathbbm {R}^n \) be a finite, non-empty set of points, \( h \in \mathbbm {R}^n \). Then we have \( h \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) + {{\,\mathrm{cone}\,}}({\mathcal {E}}) \) iff there are sets \( S_1, \ldots , S_n \in {\bar{{\mathcal {L}}}}^{U} \) and sets \( S'_1, \ldots ,S'_n \in {\bar{{\mathcal {L}}}}^{\mathbbm {R}_+} \) such that \( {\bar{\mu }}^{U} (S_i) + {\bar{\mu }}^{\mathbbm {R}_+} (S'_i) = h_i \) for all \( i \in [n] \), \( {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) for all \( t \in U \) and \( {\bar{\varphi }}^{\mathbbm {R}_+}(t', S'_1, \ldots , S'_n) \in {{\,\mathrm{cone}\,}}({\mathcal {E}}) \) for all \( t' \in \mathbbm {R}_+ \).

Proof

If \( h \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) + {{\,\mathrm{cone}\,}}({\mathcal {E}}) \), then there exist \( \xi ^1, \ldots , \xi ^r \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) for some \( r \in \mathbbm {N}_+ \) and \( \zeta ^1, \ldots , \zeta ^q \in {{\,\mathrm{cone}\,}}({\mathcal {E}}) \) for some \( q \in \mathbbm {N}_+ \) such that h can be written as \( h = \lambda _1 \xi ^1 + \cdots + \lambda _r \xi ^r + \eta _1 \zeta ^1 + \cdots + \eta _q \zeta ^q \) with \( \lambda _1 + \cdots + \lambda _r = 1 \), \( \lambda _k \ge 0 \) for all \( k \in [r] \) and \( \eta _k \ge 0 \) for all \( k \in [q] \). Define now the partition \( U = I_1 \cup \cdots \cup I_r \) by setting \( I_1 :=[0, \lambda _1) \) and \( I_k :=\left[ \lambda _1 + \cdots + \lambda _{k - 1}, \lambda _1 + \cdots + \lambda _k\right) \) for \( k \in \{2, \ldots , r\} \). In addition, we define \( {\bar{I}}_1 :=[0, \eta _1) \) and \( {\bar{I}}_k :=\left[ \eta _1 + \cdots + \eta _{k - 1}, \eta _1 + \cdots + \eta _k\right) \) for \( k \in \{2, \ldots , q\} \). With

$$\begin{aligned} S_i :=\bigcup _{k \in [r]:\, \xi ^k_i \ne 0}(I_k, \xi ^k_i), \quad S'_i :=\bigcup _{k \in [q]:\, \zeta ^k_i \ne 0} ({\bar{I}}_k, \zeta ^k_i) \end{aligned}$$

for \( i \in [n] \), we find

$$\begin{aligned} {\bar{\mu }}^{U} (S_i) + {\bar{\mu }}^{\mathbbm {R}_+} (S'_i)&= \sum _{k \in [r]:\, \xi ^k_i \ne 0} {\bar{\mu }}^{U} ((I_k,\xi ^k_i)) + \sum _{k \in [q]:\, \xi ^k_i \ne 0} {\bar{\mu }}^{\mathbbm {R}_+} (({\bar{I}}_k, \zeta ^k_i))\\&= \sum _{k \in [r]:\, \xi ^k_i \ne 0} \lambda _k \xi ^k_i + \sum _{k \in [q]:\, \zeta ^k_i \ne 0} \eta _k \zeta ^k_i = \sum _{k \in [r]} \lambda _k \xi ^k_i + \sum _{k \in [q]} \eta _k \zeta ^k_i\\&= h_i. \end{aligned}$$

Moreover, for each \( t \in U \), there is a unique index k with \( t \in I_k \), and thus

$$\begin{aligned} {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \in {{\,\mathrm{conv}\,}}({\mathcal {F}}). \end{aligned}$$

Similarly, for each \( t \in \mathbbm {R}_+ \), there is either a unique index k with \( t \in {\bar{I}}_k \), or there is no such index. Thus, we conclude

$$\begin{aligned} {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta \in {{\,\mathrm{cone}\,}}({\mathcal {E}}). \end{aligned}$$

Especially, if there is no index k as outline above, we have \( \zeta = 0 \).

Conversely, if \( S_i \) and \( S'_i \) are sets with the stated properties, let \( \xi ^1, \ldots \xi ^r \) be an ordering of the elements in

$$\begin{aligned} \left\{ \, \xi \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi \;\text { for some }\; t \in U \, \right\} , \end{aligned}$$

and let \( \zeta _1, \ldots \zeta _q \) be an ordering of the elements in

$$\begin{aligned} \left\{ \, \zeta \in {{\,\mathrm{cone}\,}}({\mathcal {E}}) \ \bigg \vert \ {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta \;\text { for some }\; t \in \mathbbm {R}_+ \, \right\} . \end{aligned}$$

Both sets are finite, since all \( S_i \) and \( S'_i \) are finite unions of rectangles. For \( k \in [r] \) and \( p \in [q] \), we can set

$$\begin{aligned} \lambda _k&:= {\bar{\mu }}^{U} \left( \left\{ \, (t, 1) \in U \times \{1\} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) ,\\ \eta _p&:= {\bar{\mu }}^{\mathbbm {R}_+} \left( \left\{ \, (t, 1) \in \mathbbm {R}_+ \times \{1\} \ \bigg \vert \ {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta ^p \, \right\} \right) \end{aligned}$$

to obtain the required convex representation \( h = \lambda _1 \xi ^1 + \cdots + \lambda _r \xi ^r + \eta _1 \zeta ^1 + \cdots + \eta _q \zeta ^q \). To this end, observe that for all \(i \in [n] \) we have

$$\begin{aligned} S_i&= \bigcup _{k \in [r]:\, \xi ^k_i \ne 0} \left\{ \, (t, \xi _i^k) \in U \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} ,\\ S'_i&= \bigcup _{k \in [q]:\, \zeta ^k_i \ne 0} \left\{ \, (t, \xi _i^k) \in \mathbbm {R}_+ \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta ^k \, \right\} . \end{aligned}$$

For all \( i \in [n] \), this leads to

$$\begin{aligned} \sum _{k \in [r]} \lambda _k \xi ^k_i&= \sum _{k \in [r]} {\bar{\mu }}^{U} \left( \left\{ \, (t, 1) \in U \times \{1\} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \xi ^k_i\\&= \sum _{k \in [r]:\, \xi ^k_i \ne 0} {\bar{\mu }}^{U} \left( \left\{ \, (t, \xi ^k_i) \in U \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \\&= {\bar{\mu }}^{U} \left( \bigcup _{k \in [r]:\, \xi ^k_i \ne 0} \left\{ \, (t, \xi ^k_i) \in U \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) = \xi ^k \, \right\} \right) \\&= {\bar{\mu }}^{U} (S_i) \end{aligned}$$

and

$$\begin{aligned} \sum _{k \in [q]} \eta _k \zeta ^k_i&= \sum _{k \in [q]} {\bar{\mu }}^{\mathbbm {R}_+} \left( \left\{ \, (t, 1) \in \mathbbm {R}_+ \times \{1\} \ \bigg \vert \ {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta ^k \, \right\} \right) \zeta ^k_i\\&= \sum _{k \in [q]:\, \zeta ^k_i \ne 0} {\bar{\mu }}^{\mathbbm {R}_+} \left( \left\{ \, (t, \zeta ^k_i) \in \mathbbm {R}_+ \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta ^k \, \right\} \right) \\&= {\bar{\mu }}^{\mathbbm {R}_+} \left( \bigcup _{k \in [p]:\, \zeta ^k_i \ne 0} \left\{ \, (t, \zeta ^k_i) \in \mathbbm {R}_+ \times \mathbbm {R} \ \bigg \vert \ {\bar{\varphi }}^{\mathbbm {R}_+}(t, S'_1, \ldots , S'_n) = \zeta ^k \, \right\} \right) \\&= {\bar{\mu }}^{\mathbbm {R}_+} (S'_i). \end{aligned}$$

This yields

$$\begin{aligned} \sum _{k = 1}^r \lambda _k \xi ^k_i + \sum _{k = 1}^q \eta _k \zeta ^k_i = {\bar{\mu }}^{U} (S_i) + {\bar{\mu }}^{\mathbbm {R}_+} (S'_i) = h_i.\\ \end{aligned}$$

\(\square \)

When we conduct a convex-hull proof via Theorem 6, we implicitly write the given point h as a convex combination of other points in H (most often extreme points). In contrast, Theorem 7 allows us to express h as both a convex and conic combination of points spanning H. This is especially interesting for polyhedra, which can be split into a convex and a conic part. Both versions are valuable tools and allow for different proof strategies as the example in Sect. 4.2 shows.

If we succeed in giving a convex-hull via Theorem 7, we can again deduce convex and conic combinations afterwards.

Corollary 3

(Convex combinations for polyhedra) Under the same assumptions as in Theorem 7, let \( \lambda _\xi := {\bar{\mu }}^{U} ({\bar{L}}^{U}_{\xi }(S_1, \ldots , S_n)) \) for all \( \xi \in {\mathcal {F}}\) and \( \eta _\zeta := {\bar{\mu }}^{\mathbbm {R}_+} ({\bar{L}}^{\mathbbm {R}_+}_{\zeta }(S'_1, \ldots , S'_n)) \) for all \( \zeta \in {\mathcal {E}}\). Then we have \( h = \sum _{\xi \in {\mathcal {F}}} \lambda _\xi \xi + \sum _{\zeta \in {\mathcal {E}}} \eta _\zeta \zeta \), \( \sum _{\xi \in {\mathcal {F}}} \lambda _\xi = 1 \) with \( \lambda _\xi \ge 0 \) for all \( \xi \in {\mathcal {F}}\) and \( \eta _\zeta \ge 0 \) for all \( \zeta \in {\mathcal {E}}\).

It is straightforward to adjust the definition of set characterizations from Definition 2 to include the conic part as well. We will, however, skip this for reasons of space.

To define the sets in a convex-hull proof according to Theorems 6 or 7, it will be helpful to introduce the auxiliary function \( o:Q \times U \rightarrow {\mathcal {L}}\times U \),

$$\begin{aligned} o(t, a) :={\left\{ \begin{array}{ll} ([t, t + a), t + a) &{} \text {if } t + a \le 1,\\ ([t, 1) \cup [0, t + a - 1), t + a - 1) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

This function determines an interval starting at \( t \in Q \) and of diameter \( a \in U \), modulo 1. It returns an ordered pair consisting of the interval and its end point, which will be useful when placing rectangles adjacent to each other.

We will now give some indicative first examples to illustrate how the results derived in this section can be used to give convex-hull proofs.

4.1 Convex-hull proofs using interior points

We start with the example of a simplex to show that the point \( h \in H \) does not necessarily have to be written as a convex combination of vertices, but that it is also possible to characterize it via sets corresponding to other points in the interior. Let \( {\mathcal {F}}:=\{x \in \mathbbm {Z}^n \mid \sum _{i = 1}^n x_i \le b,\, x \ge 0\} \) and \( H :=\{x \in \mathbbm {R}^n \mid \sum _{i = 1}^n x_i \le b,\, x \ge 0\} \) with some \( b \in \mathbbm {N}\). The set characterizations for the simplex constraint and the non-negativity constraint can be stated as

$$\begin{aligned} \sum _{i = 1}^n {\bar{\phi }}^{U} (t, S_i)&\le b \quad \forall t \in U, \end{aligned}$$
(19)
$$\begin{aligned} {\bar{\phi }}^{U} (t, S_i)&\ge 0 \quad \forall i \in [n],\, \forall t \in U \end{aligned}$$
(20)

respectively. A possible construction of the sets \( S_i \) for Theorem 7 is given in routine Define-Simplex-Subsets-A, and a different variant is given in routine Define-Simplex-Subsets-B, both stated in Fig. 7. Via the first variant, the point \( h \in H \) is always written as a convex combination of vertices of H, while in the second one the point may also be represented using integral points inside the polytope. By construction, both routines return sets \( S_i \) with \( {\bar{\mu }}^{U} (S_i) = h_i \) for all \( i \in [n] \). The inequalities defining H ensure that the combined width of the rectangles fits into U, and thus the requirements of Theorem 7 are fulfilled in both cases. This yields two different proofs for \( H = {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) and shows the additional flexibility Theorem 7 offers.

Fig. 7
figure 7

Routines Define-Simplex-Subsets-A (top left) and Define-Simplex-Subsets-B (bottom left), exemplary constructions for the 3-dimensional simplex H with right-hand side \( b = 4\) for the point h with \( (h_1, h_2, h_3) = (1, 1.5, 0.8) \) for Define-Box-Subsets-A (top right) and Define-Box-Subsets-B (bottom right). Via Define-Box-Subsets-A, we obtain \( h = 0.25 (4, 0, 0) + 0.375 (0, 4, 0) + 0.2 (0, 0, 4) + 0.175 (0, 0, 0) \) while Define-Box-Subsets-B yields \( h = 0.3 (1, 2, 1) + 0.2 (1, 2, 0) + 0.5 (1, 1, 1) \). Those parts of the sets which belong to the same vertex are marked with the same colour; the numbers represent the height of each rectangle, i.e. the coordinates of the vertices (colour figure online)

4.2 Convex-hull proofs for unbounded polyhedra

We continue with a modification of the previous simplex example, where we demonstrate the difference it makes to apply either Theorems 6 or 7 when showing integrality of an unbounded polyhedron. Let \( {\mathcal {F}}:=\{x \in \mathbbm {Z}^n \mid \sum _{i = 1}^n x_i \ge b,\, x \ge 0\} \) and \( H :=\{x \in \mathbbm {R}^n \mid \sum _{i = 1}^n x_i \ge b,\, x \ge 0\} \) with some \( b \in \mathbbm {N}\). The set characterizations for the two constraints defining \( {\mathcal {F}}\) and H can be stated as

$$\begin{aligned}&\sum _{i = 1}^n {\bar{\phi }}^{U} (t, S_i) \ge b \quad \forall t \in U, \end{aligned}$$
(21)
$$\begin{aligned}&{\bar{\phi }}^{U} (t, S_i) \ge 0 \quad \forall i \in [n], \quad \forall t \in U. \end{aligned}$$
(22)

We can reuse routine Define-Simplex-Subsets-B from Fig. 7 to construct adequate sets \( S_i \) for Theorem 6, which proves the equivalence of \( {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) and H. An alternative representation of \( {\mathcal {F}}\) is given by \( {\mathcal {F}}= b{{\,\mathrm{conv}\,}}(e_1, \ldots , e_n) + {{\,\mathrm{cone}\,}}(e_1, \ldots , e_n) \). Using the construction provided by routine Define-Conv-Cone-Subsets in Fig. 8, we can invoke Theorem 7 and thus prove \( H = {{\,\mathrm{conv}\,}}({\mathcal {F}}) \) in an alternative fashion. We can use both methods in order to prove the same statement. However, we obtain different linear combinations representing a given point h. Theorem 6 gives us a convex combination of arbitrary points in H. In contrast, Theorem 7 returns two sets of points, vertices and rays, such that a convex combination of the vertices plus a conic combination of the rays yields h. Depending on the problem at hand, both strategies might be the one which is best suited for a convex-hull proof.

Fig. 8
figure 8

Routine Define-Simplex+Cone-Subsets (top), exemplary construction for the 3-dimensional polyhedron H with right-hand side \( b = 1\) for the point h with \( (h_1, h_2, h_3) = (1, 1.5, 0.8) \) for Define-Conv-Cone-Subsets (bottom). The latter decomposes h into \( h = g + v \), where \( g \approx 0.3 (1, 0, 0) + 0.45 (0, 1, 0) + 0.24 (0, 0, 1) \) and \( v \approx 0.7 (1, 0, 0) + 1.05 (0, 1, 0) + 0.56 (0, 0, 1) \). The vector g is represented by a convex combination of the vertices of the polytopal part of H, and v as a conic combination of its rays. Those parts of the sets which belong to the same vertex or ray are marked with the same colour; the numbers represent their coordinates (colour figure online)

4.3 Convex-hull proofs for non-linear convex sets

Finally, we show that our new criteria for convex-hull proofs can also be used with non-polyhedral convex sets. To do so, we use the example of the unit-ball in \( \mathbbm {R}^2 \). Let \( {\mathcal {F}}:=\{x \in \mathbbm {R}^2 \mid (x_1 - 1)^2 + (x_2 - 1)^2 = 1\} \) and \( H :=\{x \in \mathbbm {R}^2 \mid (x_1 - 1)^2 + (x_2 - 1)^2 \le 1\} \). The set characterization for the quadratic constraint defining \( {\mathcal {F}}\) is given by

$$\begin{aligned} ( {\bar{\phi }}^{U} (t, S_1) - 1)^2 + ( {\bar{\phi }}^{U} (t, S_2) - 1)^2&= 1 \quad \forall t \in U. \end{aligned}$$
(23)

Note that, unlike what Zuckerberg’s original method allows, the set \( {\mathcal {F}}\) is not only infinite, as in the previous example, but even uncountable.

A set construction fulfilling the prerequisites of Theorem 6 is given by routine Define-Ball-Subsets in Fig. 9.

Fig. 9
figure 9

Routine Define-Ball-Subsets (left), exemplary construction for the point h with \( (h_1, h_2) = (1.1, 1.9) \) (right). The routine returns the representation \( h \approx 0.39 (0.56, 1.9) + 0.61 (1.44, 1.9) \). The figure inside a rectangle indicates its height

It computes two points on the boundary of the unit-ball which have the same y-coordinate and then calculates the corresponding coefficients to represent h as a convex combination of the two.

5 Set characterizations for integer problems

In this section, we give two indicative convex-hull proofs to illustrate the potential of our extended Zuckerberg framework. We show that it can be applied to mixed-integer problems, which the original method does not allow. Furthermore, we show that it is well-suited to be used with a non-fixed right-hand side. At the same time, we demonstrate that out extension yields a constructive way of computing the representation of a point inside a integer polyhedron as a convex combination of its vertices (or other interior integral points)

5.1 Convex-hull proofs for mixed-integer problems

To give a prominent example for the use of our extended Zuckerberg framework in the mixed-integer case, we give a convex-hull proof for the (single-item) uncapacitated lot-sizing problem (LS-U for short). This problem asks for a cost-optimal production plan for a given product over n time periods to fulfil the customer demand \( d_j \in \mathbbm {R}_+ \) in each period \( j \in [n] \) (see [19] for an extensive introduction.)

The authors of [15] introduce the following extended formulation for the feasible set of LS-U (extended with respect to a straightforward formulation with linearly-many variables, cf. [19]): let the variable \( w_{uj} \in \mathbbm {R}_+ \) denote how much of the product is produced in period \( u \in [n] \) for sale in the same or later period \( j \in [n] \setminus [u - 1] \). Furthermore, variable \( y_u \in \{0, 1\}\) models the decision to perform any production in period \( u \in [n] \) or not. Then we can represent the set of feasible production plans as

$$\begin{aligned}&\sum _{u = 1}^j w_{uj} = d_j \quad \forall j \in [n], \end{aligned}$$
(24)
$$\begin{aligned}&w_{uj} \le d_j y_u \quad \forall u \in [n], \quad \forall j \in [n] \setminus [u - 1], \end{aligned}$$
(25)
$$\begin{aligned}&w_{uj} \in \mathbbm {R}_+ \quad \forall u \in [n], \quad \forall j \in [n] \setminus [u - 1], \end{aligned}$$
(26)
$$\begin{aligned}&y_u \in \{0, 1\}\quad \forall u \in [n]. \end{aligned}$$
(27)

Indeed, it is shown in [15] that the above model is integral, i.e. the set of solutions does not change when relaxing \( y \in \{0, 1\}^n \) to \( y \in [0, 1]^n \). We will give an alternative proof based on Theorem 6.

Theorem 8

[15] Let \( P :={{\,\mathrm{conv}\,}}\{(y, w) \in \{0, 1\}^n \times \mathbbm {R}^{n^2 - n}_+ \mid (24) \ and \ (25)\} \) and \( H :=\{(y, w) \in [0, 1]^n \times \mathbbm {R}^{n^2 - n}_+ \mid \ (24)\ and \ (25)\} \) its linear relaxation. Then we have \( H = P \).

Proof

The relation \( H \subseteq P \) can easily be seen. In order to prove the reverse, we transform the constraints defining P into set characterizations:

$$\begin{aligned}&\sum _{u = 1}^j {\bar{\phi }}^{U} (S_{w_{uj}}, t) = d_j \quad \forall j \in [n], \quad \forall t \in U, \end{aligned}$$
(28)
$$\begin{aligned}&{\bar{\phi }}^{U} (S_{w_{uj}}, t) \le d_t {\bar{\phi }}^{U} (S_{y_u}, t) \quad \forall \, 1 \le u \le j \le n, \quad \forall t \in U, \end{aligned}$$
(29)
$$\begin{aligned}&{\bar{\phi }}^{U} (S_{w_{uj}}, t) \in \mathbbm {R}_+ \quad \forall \, 1 \le u \le j \le n, \quad \forall t \in U, \end{aligned}$$
(30)
$$\begin{aligned}&{\bar{\phi }}^{U} (S_{y_u}, t) \in \{0, 1\}\quad \forall u \in [n], \quad \forall t \in U. \end{aligned}$$
(31)

For a given point \( h = (h_w, h_y) \in H \), a corresponding set construction is given in routine Define-Lot-Sizing-Sets in Fig. 10. In Lines 2–8, the routine places the sets for the y-variables such that (31) is satisfied. Then the w-variables are placed in Lines 9–17. The variables \( w_{uj} \) get a non-empty set only if \( h_{y_u} \ne 0 \). The corresponding sets \( S_{w_{uj}} \) are defined such that they have the same support as \( S_{y_{u}} \). The construction satisfies (28) and (30). Additionally, the defined sets fulfil \( {\bar{\mu }}^{U} (S_{w_{uj}}) = h_{w_{uj}} \) for all \( 1 \le u \le j \le n \) and \( {\bar{\mu }}^{U} (S_{y_u}) = h_{y_u} \) for all \( 1 \le u \le n \), which finishes the proof. \(\square \)

Fig. 10
figure 10

Routine Define-Lot-Sizing-Sets

Our Zuckerberg proof for the lot-sizing problem is an example of the greedy proof strategy from Sect. 3.1, now applied to the mixed-integer case. In the online supplement [7], we give further such examples in the context of mixed-integer models for piecewise linear functions.

5.2 Computing convex combinations

Via our extension of Zuckerberg’s method, it is also possible to represent points in an integer polyhedron via combinations of its vertices – a property which many common approaches for convex-hull proofs, such as total unimodularity, do not possess. We will demonstrate the principle at the hand of the well-known result that the incidence matrix of a bipartite graph is totally unimodular.

Let \( G = (V, E) \) be an undirected, bipartite graph, and let \( b \in \mathbbm {Z}^{|E |} \) be an arbitrary integral vector. Further, let P be the polytope defined as the convex hull of all vectors \( x \in \mathbbm {N}^{|V |} \) that satisfy

$$\begin{aligned} x_i + x_j \le b_{ij} \quad \forall \{i, j\} \in E. \end{aligned}$$
(32)

The constraint matrix A corresponding to system (32) is the transpose of a node-edge incidence matrix of a bipartite graph and thus totally unimodular. Thus, the integrality of system (32) is implied by the following famous characterization of integer polyhedra.

Theorem 9

(Hoffmann and Kruskal, [14]) Let \( A \in \{0, 1, -1\}^{m \times n} \). Then A is totally unimodular iff \( \{x \in \mathbbm {R}\mid Ax \le b, x \ge 0\} \) has only integral vertices for all \( b \in \mathbbm {Z}^n \).

We will now give an alternative, very simple integrality proof for system (32) based on Theorem 6.

Theorem 10

Let \( P :={{\,\mathrm{conv}\,}}\{x \in \mathbbm {N}^{|V |} \mid (32)\} \) and \( H :=\{x \in \mathbbm {R}_+^{|V |} \mid (32)\} \) its linear relaxation. Then we have \( P = H \).

Proof

The relation \( P \subseteq H \) is obvious. In order to prove \( H \subseteq P \), we transform constraint (32) into the set characterization

$$\begin{aligned} {\bar{\phi }}^{U} (S_i, t) + {\bar{\phi }}^{U} (S_j, t) \le b_{ij}&\quad \forall \{i, j\} \in E, \quad \forall t \in U. \end{aligned}$$
(33)

W.l.o.g., we can assume \( b_{ij} \ge 0 \) for all \( \{i, j\} \in E \), since otherwise the polytope H is empty. For each point \( h \in H \), we then need to find sets \( S_a \) for all \( a \in A \) such that they fulfil \( {\bar{\mu }}^{U} (S_a) = h_a \) and the above conditions hold. Let \( W, Y \subseteq V \) be a bipartition of G. The sets \( S_a \) are defined in routine Define-Incidence-Matrix-Subsets given in Fig. 11.

Fig. 11
figure 11

Routine Define-Incidence-Matrix-Subsets (left), exemplary construction for the graph with two connected nodes, with \(b_{12}=3\) and the point h with \( (h_1, h_2) = (1.4, 1.6) \) (right)

It places the sets differently depending on if they belong to W or Y. From the above construction it is apparent that for each \( h \in H \) the corresponding sets satisfy (33). Thus, we have proved \( H \subseteq P \). \(\square \)

Fig. 12
figure 12

Routine Get-convex-combinationincidence-matrix computes the integral points and the corresponding multipliers which express h as a convex combination of other integral points inside the polyhedron

Our proof via Zuckerberg’s method, as opposed to using total unimodularity has the advantage that we can now use Corollary 2 to explicitly construct a convex combination of vertices or other interior integral points to represent any given point in the polyhedron. In Fig. 12, we give the resulting algorithm to explicitly calculate the integral points spanning a given point \( h \in H \). from a valid proof certificate \( S_1, \ldots , S_n \). The algorithm scans the real line from \( t = 0 \) to \( t = 1 \) in an increasing fashion. Whenever a new rectangle becomes active, a new integral point arises. The set L in Step 2 collects all points \( t \in [0, 1) \), at which this happens.

The above example also shows that the Zuckerberg approach can be used to consider arbitrary right-hand sides. A further example for this concept is given in the online supplement [7]. Note that the precise makeup of the proof determines whether we obtain a representation of h as a convex combination of vertices or, alternatively, as a convex combination of other integral points.

6 Extensions of Zuckerberg’s method for graphs of functions

In [11], Zuckerberg’s method was adapted to characterize the convex hull of the graphs of certain bilinear functions defined over the unit cube. Using our extended framework for convex-hull proofs from Sect. 4, we will generalize these results in a twofold manner. Firstly, we extended the machinery introduced there for bilinear functions to general boolean functions. This allows us to treat common functions like the \(\max \)-function. In addition, we generalize the applicability of Zuckerberg’s method to non-box domains, such that it works with functions defined over any 0/1-polytope. Secondly, we will derive a criterion to prove convex-hull results for the convex hull of graphs of bilinear functions over general polytopal domains.

6.1 Extension for boolean functions over 0/1-polytopes

Let \( {\mathcal {F}}\subseteq \mathbbm {R}^n \) be a finite set of points, and let \( T :={{\,\mathrm{conv}\,}}({\mathcal {F}}) \) be their convex hull. We will now consider functions \( f:{\mathcal {F}}\rightarrow \mathbbm {R}\) of the form

$$\begin{aligned} f(x) = \sum _{i = 1}^k a_i \varPsi _i(x_1, \ldots x_n), \end{aligned}$$

with \( \varPsi _i :{\mathcal {F}} \rightarrow \mathbbm {R} \) and \( a_i \in \mathbbm {R}\) for \( i \in [k] \). The convex hull of the graph of f is the set

$$\begin{aligned} X(f) :={{\,\mathrm{conv}\,}} \left\{ \, (x, z) \in T \times \mathbbm {R} \ \bigg \vert \ z = f(x) \, \right\} . \end{aligned}$$

Further, let the two functions \( {{\,\mathrm{vex}\,}}[f]:T \rightarrow \mathbbm {R}\) and \( {{\,\mathrm{cav}\,}}[f]:T \rightarrow \mathbbm {R}\), denoting the convex and the concave envelope of f over T, respectively, be defined as

$$\begin{aligned} {{\,\mathrm{vex}\,}}[f](x)&:=\min \left\{ \, z \in \mathbbm {R} \ \bigg \vert \ (x, z) \in X(f) \, \right\} ,\\ {{\,\mathrm{cav}\,}}[f](x)&:=\max \left\{ \, z \in \mathbbm {R} \ \bigg \vert \ (x, z) \in X(f) \, \right\} , \end{aligned}$$

so that we have

$$\begin{aligned} X(f) = \left\{ \, (x, z) \in T \times \mathbbm {R} \ \bigg \vert \ {{\,\mathrm{vex}\,}}[f](x) \le z \le {{\,\mathrm{cav}\,}}[f](x) \, \right\} . \end{aligned}$$

Introducing variables \( y_i \) to represent the products \( \varPsi _i(x_1, \ldots x_n) \), we are interested in describing X(f) in terms of the x- and y-variables. To be more precise, we define a function \( \pi [f]:\mathbbm {R}^n \times \mathbbm {R}^k \rightarrow \mathbbm {R}^{n + 1} \) via

$$\begin{aligned} \pi [f](x, y) = \left( x, \sum _{i = 1}^k a_i y_i\right) \end{aligned}$$

and extend it to the power set of \( \mathbbm {R}^n \times \mathbbm {R}^k \) in a canonical fashion:

$$\begin{aligned} \pi [f](P) = \left\{ \, \pi [f](x, y) \ \bigg \vert \ (x, y) \in P \, \right\} \end{aligned}$$

for every \( P \subseteq \mathbbm {R}^n \times \mathbbm {R}^k \). For a polytope P, let the functions \( {{\,\mathrm{LB}\,}}_{P}[f]:T \rightarrow \mathbbm {R}\) and \( {{\,\mathrm{UB}\,}}_{P}[f]:T \rightarrow \mathbbm {R}\) be defined as

$$\begin{aligned} {{\,\mathrm{LB}\,}}_{P}[f](x)&= \min \left\{ \, \sum _{i = 1}^k a_i y_i \ \bigg \vert \ (x, y) \in P \, \right\} = \min \left\{ \, z \in \mathbbm {R} \ \bigg \vert \ (x, z) \in \pi [f](P) \, \right\} ,\\ {{\,\mathrm{UB}\,}}_{P}[f](x)&= \max \left\{ \, \sum _{i = 1}^k a_i y_i \ \bigg \vert \ (x, y) \in P \, \right\} = \max \left\{ \, z \in \mathbbm {R} \ \bigg \vert \ (x, z) \in \pi [f](P) \, \right\} , \end{aligned}$$

respectively, so that

$$\begin{aligned} \pi [f](P) = \left\{ \, (x, z) \in T \times \mathbbm {R} \ \bigg \vert \ {{\,\mathrm{LB}\,}}_{P}[f](x) \le z \le {{\,\mathrm{UB}\,}}_{P}[f](x) \, \right\} . \end{aligned}$$

The goal is to give a criterion which allows to prove \( X(f) = \pi [f](P) \) for some given function f and polytope P, which is equivalent to \( {{\,\mathrm{vex}\,}}[f](x) = {{\,\mathrm{LB}\,}}_{P}[f](x) \) and \( {{\,\mathrm{cav}\,}}[f](x) = {{\,\mathrm{UB}\,}}_{P}[f](x) \) for all \( x \in T \). To this end, we define the set

$$\begin{aligned} Z(x) := \left\{ \, (S_1, \ldots , S_n) \in {\mathcal {L}}^n \ \bigg \vert \ \begin{aligned} \mu (S_i) = x_i \quad \forall i \in [n],\\ \varphi (t, S_1, \ldots , S_n) \in {\mathcal {F}}\quad \forall t \in U \end{aligned} \, \right\} . \end{aligned}$$

It contains all tuples of admissible sets \( S_1, \ldots , S_n \) which express some point \( x \in [0, 1]^n \) via the vertices of \( {\mathcal {F}}\) using Zuckerberg’s certificate. Finally, let the function \( \Omega :{\mathcal {L}}^n \times \mathbbm {R}^{{\mathcal {F}}} \rightarrow U \),

$$\begin{aligned} \Omega (S_1, \ldots , S_n, \varPsi ) :=\mu \left( \left\{ \, t \in U \ \bigg \vert \ \varPsi (\varphi (t, S_1, \ldots , S_n)) = 1 \, \right\} \right) \end{aligned}$$

measure the size of the support of \( \varPsi \circ \varphi \) for some \( \varPhi :{\mathcal {F}}\rightarrow \mathbbm {R}\) and some fixed \( (S_1, \ldots S_n) \in {\mathcal {L}}^n \). The proof of \( \pi [f](P) = X(f) \) can be split up into \( \pi [f](P) \subseteq X(f) \) and \( X(f) \subseteq \pi [f](P) \). The first inclusion is often comparably easy to prove, and for the validity of the second inclusion we give the following criterion.

Theorem 11

If \( {\mathcal {F}}\subseteq \{0, 1\}^n \) and \( f = \sum _{i = 1}^k a_i \varPsi _i \), with \( \varPsi _i :\{0, 1\}^n \rightarrow \{0, 1\} \) and \( a_i \in \mathbbm {R}\) for \( i \in [k] \), we have

$$\begin{aligned} {{\,\mathrm{vex}\,}}[f](x)&= \min \left\{ \, \sum _{i \in [k]} a_i \Omega (S_1, \ldots , S_n, \varPsi _i) \ \bigg \vert \ (S_1, \ldots , S_n) \in Z(x) \, \right\} ,\\ {{\,\mathrm{cav}\,}}[f](x)&= \max \left\{ \, \sum _{i \in [k]} a_i \Omega (S_1, \ldots , S_n, \varPsi _i) \ \bigg \vert \ (S_1, \ldots , S_n) \in Z(x) \, \right\} \end{aligned}$$

for all \( x \in T \). In particular, for a polytope \( P \subseteq \mathbbm {R}^{n + k} \) with \( \pi [f](P) \subseteq X(f) \) we have \( \pi [f](P) = X(f) \) iff for every \( x \in T \) there are sets \( (S_1, \ldots , S_n) \in Z(x) \) and \( (S'_1, \ldots , S'_n) \in Z(x) \) with

$$\begin{aligned} \sum _{i \in [k]} a_i \Omega (S_1, \ldots , S_n, \varPsi _i)&= {{\,\mathrm{LB}\,}}_P[f](x),\\ \sum _{i \in [k]} a_i \Omega (S'_1, \ldots , S'_n, \varPsi _i)&= {{\,\mathrm{UB}\,}}_P[f](x). \end{aligned}$$

Theorem 11 gives us a Zuckerberg-type characterization of \( {{\,\mathrm{vex}\,}}[f] \) and \( {{\,\mathrm{cav}\,}}[f] \). To apply it, we need to design for a general point \( x \in T \) the sets \( S_1, \ldots , S_n \in Z(x) \) such that \( \sum _{i \in [k]} a_i \Omega (S_1, \ldots , S_n, \varPsi _i) \) is minimized and \( S'_1, \ldots , S'_n \in Z(x) \) such that \( \sum _{i \in [k]} a_i \Omega (S'_1, \ldots , S'_n, \varPsi _i) \) is maximized. The proof of Theorem 11 is given in the online supplement [7].

The expression \( \Omega (S_1, \ldots , S_n, \varPsi ) \) can be made more tractable when some specific functions \( \varPsi \) is given. Consider, for instance, \( \varPsi (x_1, x_2) = x_1 x_2 \). Then we can simplify:

$$\begin{aligned} \Omega (S_1, S_2, \varPsi ) = \mu ( \left\{ \, t \in U \ \bigg \vert \ \phi (t, S_1)\phi (t, S_2) = 1 \, \right\} ) = \mu (S_1 \cap S_2). \end{aligned}$$

We exemplarily give similar representations for \( \Omega (S_1, \ldots , S_n, \varPsi ) \) for some further Boolean functions in Table 2.

Table 2 Simplifications of \( \Omega (S_1, \ldots , S_n, \varPsi ) \) for specific boolean \( \varPsi \) functions

A specialization of Theorem 11 for the case \( {\mathcal {F}}= \{0, 1\}^n \) and \( f(x) = \sum _{1 \le i < j \le n} a_{ij} x_i x_j \) was proved in [11] using the above simplification. In the following, we will demonstrate how to use Theorem 11 to give convex-hull proofs for more general domains and functions.

6.1.1 Convex-hull proofs for polytopal domain

Generalizing an example given in [11] for unit-box domains, we show here how to characterize the McCormick-relaxation of the product of two binary variables over a non-box binary polytope. Let \( {\mathcal {F}}:=\{x \in \{0, 1\}^2 \mid x_1 + x_2 \ge 1\} \) and \( f :{\mathcal {F}} \rightarrow \{0, 1\} ,\, f(x_1, x_2) = x_1 x_2 \), and let

$$\begin{aligned} P := \left\{ \, (x_1, x_2, z) \in [0, 1]^3 \ \bigg \vert \ z \le x_1,\, z \le x_2,\, z \ge x_1 + x_2 - 1,\, x_1 + x_2 \ge 1 \, \right\} . \end{aligned}$$

The direction \( \pi [f](P) \subseteq X(f) \) can easily be verified by checking if the extreme points of X(f) , namely (0, 1, 0), (1, 0, 0) and (1, 1, 1) , are feasible for P. For the reverse direction, we plug in the simplification of \( \Omega \) for f given in Table 2 into Theorem 11. We deduce that we need to find two sets \( S_1 \) and \( S_2 \) which fulfil

$$\begin{aligned} \mu (S_1 \cap S_2) \le \min \{x_1,x_2\} . \end{aligned}$$

It follows \( {{\,\mathrm{cav}\,}}[f](x) \le \min \{x_1, x_2\} \), and with \( S_1 :=[0, x_1) \) and \( S_2 :=[0, x_2) \) we see that this bound is attained for all \( x \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \). Therefore, the concave envelope of f is given by the inequalities \( z \le x_1 \) and \( z \le x_2 \). Similarly,

$$\begin{aligned} \mu (S_1 \cap S_2) \ge x_1 + x_2 - 1 \end{aligned}$$

leads to the bound \( {{\,\mathrm{vex}\,}}[f](x) \ge x_1 + x_2 - 1 \), and with \( S_1 :=[0, x_1) \) and \( S_2 :=[1 - x_2, 1) \) it is attained for all \( x \in {{\,\mathrm{conv}\,}}({\mathcal {F}}) \). Thus, the convex envelope of f is given by \( z \ge x_1 + x_2 - 1 \). Finally, the constraint \( x_1 + x_2 \ge 1 \) is needed for the initial restriction of the domain.

6.1.2 Convex-hull proofs for general functions

Now we present an example for the \( \max \)-function, which shows that our framework is applicable to more general functions than the bilinear functions studied in [11]. We consider the case \( f :\{0, 1\}^n \rightarrow \{0, 1\} ,\, f(x) = \max (x_1, \ldots , x_n) \) and

$$\begin{aligned} P := \left\{ \, (x_1, \ldots , x_n, z) \in [0, 1]^{n + 1} \ \bigg \vert \ z \le x_1 + \cdots + x_n,\, z \le 1,\, z \ge x_i\, \forall i \in [n] \, \right\} . \end{aligned}$$

As in the previous example, it is straightforward to verify \( \pi [f](P) \subseteq X(f) \). For the converse, we use the simplification for f given in Table 2 to see that the sets \( S_1, \ldots , S_n \) need to fulfil

$$\begin{aligned} \mu (S_1 \cup \cdots \cup S_n) \le \min \left\{ x_1 + \cdots + x_n, 1\right\} \end{aligned}$$

in order to satisfy Theorem 11. Consequently, we have \( {{\,\mathrm{cav}\,}}[f](x) \le \min \{x_1 + \cdots + x_n, 1\} \). With \( S_1 :=[0, x_1) \) and \( S_i :=[x_{i - 1}, x_{i - 1} + x_i) \mod 1 \) for \( i \in \{2, \ldots , n\} \), this bound is attained for all \( x \in [0, 1]^n \). Therefore, the concave envelope of f is given by \( z \le x_1 + \cdots + x_n \) and \( z \le 1 \). Furthermore, from

$$\begin{aligned} \mu (S_1 \cup \cdots \cup S_n) \ge \max \{x_1, \ldots , x_n\} \end{aligned}$$

we obtain the bound \( {{\,\mathrm{vex}\,}}[f](x) \ge \max \{x_1, \ldots , x_n\} \), and setting \( S_i :=[0, x_i) \) for \( i \in \{1, \ldots , n\} \) makes it tight for all \( x \in [0, 1]^n \). Thus, the convex envelope of f is given by \( z \ge x_i \) for \( i \in \{1, \ldots , n\} \).

6.2 Extension for bilinear functions over general polytopes

Finally, we derive a generalization of the results from [11] which allows us to compute the convex hull of the graph of a bilinear function over a general polytopal domain T. We start by defining the set

$$\begin{aligned} {\bar{Z}} (x) := \left\{ \, (S_1, \ldots , S_n) \in ( {\bar{{\mathcal {L}}}}^{U} )^n \ \bigg \vert \ \begin{aligned} {\bar{\mu }}^{U} (S_i) = x_i \text { for all } i \in [n],\\ {\bar{\varphi }}^{U}(t, S_1, \ldots , S_n) \in {\mathcal {F}}\, \forall t \in U \end{aligned} \, \right\} , \end{aligned}$$

which contains all admissible sets \( S_1, \ldots , S_n \) for Theorem 11. Furthermore, let the two functions \( {\bar{\psi }}_-, {\bar{\psi }}_+ :P \rightarrow \mathbbm {R}\) with

$$\begin{aligned} {\bar{\psi }}_-(x)&= \min \left\{ \, \sum _{\xi \in {\mathcal {F}}} {\bar{\mu }}^{U} ({\bar{L}}^{U}_\xi (S_1, \ldots , S_n)) f(\xi ) \ \bigg \vert \ (S_1, \ldots , S_n) \in {\bar{Z}} (x) \, \right\} ,\\ {\bar{\psi }}_+(x)&= \max \left\{ \, \sum _{\xi \in {\mathcal {F}}} {\bar{\mu }}^{U} ({\bar{L}}^{U}_\xi (S_1, \ldots , S_n)) f(\xi ) \ \bigg \vert \ (S_1, \ldots , S_n) \in {\bar{Z}} (x) \, \right\} \end{aligned}$$

encode the convex and concave envelope, respectively, in a Zuckerberg fashion. We now derive an auxiliary representation of X(f) in terms of these two functions.

Lemma 3

For every function \( f:{\mathcal {F}}\rightarrow \mathbbm {R}\), we have

$$\begin{aligned} X(f) = \left\{ \, (x, z) \in T \times \mathbbm {R} \ \bigg \vert \ {\bar{\psi }}_-(x) \le z \le {\bar{\psi }}_+(x) \, \right\} . \end{aligned}$$

Proof

First, assume \( (x, z) \in X(f) \). This means

$$\begin{aligned} (x, z) = \sum _{k = 1}^{|{\mathcal {F}} |} \lambda _k (\xi ^k, f(\xi ^k)) \end{aligned}$$

for some \( \lambda _k \ge 0 \) for \( k = 1, \ldots , |{\mathcal {F}} | \) with \( \sum _{k = 1}^{|{\mathcal {F}} |} \lambda _k = 1 \) and a fixed ordering \( \xi ^1, \ldots , \xi ^{|{\mathcal {F}} |} \) of \( {\mathcal {F}}\). The sets \( S_i \in {\bar{{\mathcal {L}}}}^{U} \) with \( {\bar{\mu }}^{U} (S_i) = x_i \) are defined exactly as in the proof of Theorem 6: for the partition \( U = I_1 \cup \cdots \cup I_{|{\mathcal {F}} |} \) with \( I_1 = [0, \lambda _1) \) and \( I_k = [\lambda _1 + \cdots + \lambda _{k - 1}, \lambda _1 + \cdots + \lambda _k) \) for \( k \in \{2, \ldots , |{\mathcal {F}} |\} \), we set

$$\begin{aligned} S_i = \bigcup _{k:\, \xi ^k_i \ne 0}(I_k, \xi ^k_i). \end{aligned}$$

For every \( k = 1, \ldots , |{\mathcal {F}} | \), we have \( {\bar{{\mathcal {L}}}}^{U} _{\xi ^k}(S_1,\ldots , S_n) = I_k \), and consequently \( {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _{\xi ^k}(S_1, \ldots , S_n)) = \lambda _k \). With

$$\begin{aligned} z = \sum _{k = 1}^{|{\mathcal {F}} |} \lambda _k f(\xi ^k) = \sum _{k = 1}^{|{\mathcal {F}} |} {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _{\xi ^k}(S_1, \ldots , S_n)) f(\xi ^k), \end{aligned}$$

it follows that \( {\bar{\psi }}_-(x) \le z \le {\bar{\psi }}_+(x) \).

For the converse, assume \( {\bar{\psi }}_-(x) \le z \le {\bar{\psi }}_+(x) \) and let \( (S_1, \ldots , S_n) \) and \( (S'_1, \ldots , S'_n) \) be optimizers for the minima and maxima defining \( {\bar{\psi }}_-(x) \) and \( {\bar{\psi }}_+(x) \) respectively. We write \( z = t{\bar{\psi }}_-(x) + (1 - t) {\bar{\psi }}_+(x) \) for some \( t \in [0, 1] \) and set

$$\begin{aligned} \lambda (\xi ) :=t {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _{\xi }(S_1, \ldots , S_n)) + (1 - t) {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _{\xi }(S'_1, \ldots , S'_n)) \end{aligned}$$

for all \( \xi \in {\mathcal {F}}\). This yields a representation of (xz) as the convex combination

$$\begin{aligned} (x, z) = \sum _{\xi \in {\mathcal {F}}} \lambda (\xi )(\xi , \psi (\xi )). \end{aligned}$$

\(\square \)

With Lemma 3, we can express the convex hull of the graph of a function via functions defined over Zuckerberg sets. We will now make these abstract expressions more concrete for the case of bilinear functions. For this purpose, we consider an arbitrary bilinear function

$$\begin{aligned} f :{\mathcal {F}} \rightarrow \mathbbm {R} , \quad f(x) = \sum _{ij \in E}^k a_{ij} x_i x_j \end{aligned}$$
(34)

with coefficients \( a_{ij} \in \mathbbm {R}\) and a subset \( E \subseteq \{(i, j) \in \mathbbm {N}^2 \mid 1 \le i < j \le n \} \). Furthermore, we define the function \( M :( {\bar{{\mathcal {L}}}}^{U} )^2 \rightarrow \mathbbm {R} \) in Fig. 13, which measures a kind of generalized overlap between two sets \( S_1, S_2 \in {\bar{{\mathcal {L}}}}^{U} \).

Fig. 13
figure 13

The function \( M :( {\bar{{\mathcal {L}}}}^{U} )^2 \rightarrow \mathbbm {R} \)

For the particular case of a bilinear function f, Lemma 3 yields the following characterizations of \( {{\,\mathrm{cav}\,}}[f](x) \) and \( {{\,\mathrm{vex}\,}}[f](x) \).

Theorem 12

For the bilinear function f of the form (34), we have

$$\begin{aligned} {{\,\mathrm{vex}\,}}[f](x)&= \min \left\{ \, \sum _{ij \in E} a_{ij} M(S_i, S_j) \ \bigg \vert \ (S_1, \ldots , S_n) \in {\bar{Z}} (x) \, \right\} ,\\ {{\,\mathrm{cav}\,}}[f](x)&= \max \left\{ \, \sum _{ij \in E} a_{ij} M(S_i, S_j) \ \bigg \vert \ (S_1, \ldots , S_n) \in {\bar{Z}} (x) \, \right\} \end{aligned}$$

for all \( x \in T \). In particular, for a polytope \( P\subseteq \mathbbm {R}^{n(n + 1) / 2} \) with \( \pi [f](P) \subseteq X(f) \), we have \( \pi [f](P) = X(f) \) if and only if for every \( x \in T \) there are sets \( (S_1, \ldots , S_n) \in {\bar{Z}} (x) \) and \( (S'_1, \ldots , S'_n) \in {\bar{Z}} (x) \) with

$$\begin{aligned} \sum _{ij \in E} a_{ij} M(S_i, S_j)&= {{\,\mathrm{LB}\,}}_P[f](x),\\ \sum _{ij \in E} a_{ij} M(S'_i, S'_j)&= {{\,\mathrm{UB}\,}}_P[f](x). \end{aligned}$$

Proof

We observe that

$$\begin{aligned} \sum _{\xi \in {\mathcal {F}}} {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _{\xi }(S_1, \ldots , S_n)) f(\xi )&= \sum _{\xi \in {\mathcal {F}}} {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _{\xi }(S_1, \ldots , S_n)) \sum _{ij \in E} a_{ij} \xi _i \xi _j\\&= \sum _{ij \in E} a_{ij} \sum _{\xi \in {\mathcal {F}}} {\bar{\mu }}^{U} ( {\bar{{\mathcal {L}}}}^{U} _\xi (S_1, \ldots , S_n)) \xi _i \xi _j\\&= \sum _{ij \in E} a_{ij} M(S_i, S_j).\\ \end{aligned}$$

\(\square \)

Theorem 12 allows us to give compact representations for \( {{\,\mathrm{vex}\,}}[f] \) and \( {{\,\mathrm{cav}\,}}[f] \) over polytopal domains. To do so, we need to design suitable sets \( S_1, \ldots , S_n \in {\bar{Z}} \) such that \( \sum _{ij \in E} a_{ij} M(S_i, S_j) \) is minimized and \( S'_1, \ldots , S'_n \in {\bar{Z}} \) such that \( \sum _{ij \in E} a_{ij} M(S'_i, S'_j) \) is maximized, both for an arbitrary point \( x \in \mathbbm {R}^n \).

6.2.1 Convex-hull proofs for non-0/1 domain

We can use Theorem 12 to prove again that X(f) for \( f(x_1, x_2) = x_1 x_2 \) is given by the McCormick-inequalities. However, we will now do this over the bounds \( 0 \le x_1 \le u_1 \) and \( 0 \le x_2 \le u_2 \) for \( u_1, u_2 \ge 0 \) instead of the unit bounds as in Sect. 2.1. Let

$$\begin{aligned} P := \left\{ \, (x_1, x_2, z) \in \mathbbm {R}^3 \ \bigg \vert \ z \le u_2 x_1, z \le u_1 x_2, z \ge 0, z \ge u_2 x_1 + u_1 x_2 - u_1 u_2 \, \right\} . \end{aligned}$$

The direction \( \pi [f](P) \subseteq X(f) \) can easily be verified.

For the reverse direction, we conclude from

$$\begin{aligned} M(S_1, S_2) \le u_1 u_2 \min \left\{ \frac{1}{u_1} x_1, \frac{1}{u_2} x_2\right\} \end{aligned}$$
(35)

that \( {{\,\mathrm{cav}\,}}[f](x) \le \{(1 / u_1) x_1, (1 / u_2) x_2\} \), and with \( S_1 = ([0, x_1/u_1), u_1) \), \( S_2=([0, x_2/u_2), u_2) \) this bound is attained for all \( x_1 \in [0, u_1] \), \( x_2 \in [0, u_2] \). Therefore, the concave envelope is given by \( z \le u_2 x_1 \) and \( z \le u_1 x_2 \). In a similar fashion, it follows from

$$\begin{aligned} M(S_1, S_2) \ge u_1 u_2 \max \left\{ 0, \frac{1}{u_1} x_1 + \frac{1}{u_2} x_2 - 1\right\} \end{aligned}$$
(36)

that \( {{\,\mathrm{vex}\,}}[f](x) \ge \max \{0, (1 / u_1) x_1 + (1 / u_2) x_2 - 1\} \). By choosing the sets \( S_1 = ([0, x_1 / u_1), u_1) \), \( S_2 = ([1 - x_2 / u_2, 1), u_2) \) we can show that this bound is attained for all \( x_1 \in [0, u_1] \), \( x_2 \in [0, u_2] \). Thus, the convex envelope is given by \( z \ge 0 \) and \( z \ge u_2 x_1 + u_1 x_2 - u_1 u_2 \).

7 Conclusion

We have presented a vastly simplified framework for Zuckerberg’s geometric proof technique for convex-hull results. By restating the method in terms of our notion of set characterizations, we were able to accomplish several benefits. Firstly, we have identified three major strategies one can pursue in Zuckerberg-type convex-hull proofs. This underlines the high flexibility in devising algorithmic schemes the method offers. Secondly, we have significantly extended the expressive power of Zuckerberg’s technique by basing it on a different underlying subset algebra. It can now be used to characterize the convex hulls of general convex sets, including, but not limited to integer polyhedra. Using this extension, we give characterizations of the convex hull of Boolean and bilinear functions over polytopal domains. Finally, we have given a variety of indicative examples for the use of our framework with the intention to convey the ideas as hands-on as possible.

We find it a very interesting avenue for future research to develop further algorithmic strategies for Zuckerberg proofs and to extend the scope of those we have introduced. For example, one could not only consider linear programs but pass to (mixed-)integer ones when following the technique using feasibility subproblems. This might entail the consideration of minimally infeasible subsystems to verify the set characterizations.

Altogether, we make a strong case for the canonization of Zuckerberg’s proof technique in standard text books on integer programming and polyhedral combinatorics. We are certain it will enable many more interesting convex-hull results in the future.