1 Introduction

Modeling of optimization problems frequently involves representing functions that are piecewise, discontinuous or nonsmooth. This includes inherently piecewise economical and physical characteristics [5, 24, 29], construction of surrogate models by sampling of simulators [15, 32, 58], and approximate or exact representation of nonconvex functions [2, 10, 37, 39, 47]. In this paper, we study the problem of efficiently representing and solving optimization problems containing piecewise polynomial (PWP) constraints. Piecewise polynomials are used in a wide range of disciplines, including efficiency curve modeling in electric-power unit commitment [40], rigid motion systems [11], image processing and data compression [44, 51], probability density estimation [61], flow networks [5, 15, 25] and in optimal control [4, 41].

We consider optimization problems where either or both of the objective function and a subset of the constraints are piecewise polynomial functions. Each polynomial may be nonconvex, and the piecewise polynomial function itself lower semi-continuous. There exist few targeted optimization methods for this class of optimization problems, while some approaches that exploit special structures of nonsmooth optimization problems are applicable, subject to certain modification methods: Womersley and Fletcher [62] developed a descent method for solving composite nonsmooth problems consisting of a finite number of smooth functions. Conn and Mongeau [8] constructed a method based on non-differentiable penalty functions for solving discontinuous piecewise linear optimization problems, sketching an extension to problems with PWP constraints. Scholtes [47] developed an active-set method for dealing with nonlinear programs (NLPs) with underlying combinatorial structure in the constraints. Li [30] used a conjugated gradient method for minimizing an unconstrained, strictly convex, quadratic spline. None of these methods are currently available in standard optimization software.

From a broader perspective, applicable solution approaches to PWP optimization problems include methods based on general nonsmooth optimization, smoothing techniques and mixed integer programming (MIP). Bundle-type and subgradient methods [21], originally developed for nonsmooth convex optimization, may be applied to optimization problems with general nonsmooth structures such as PWPs through Clark’s generalized gradients [48]. These generalized methods for nonsmooth optimization are known to have poor convergence properties for nonconvex structures [47]. Smoothing techniques for nonsmooth functions encompass a variety of techniques, seeking to ensure sufficient smoothness for gradient-based methods [64]. Many of these methods are, however, designed for optimizing a nonsmooth function on a convex set, e.g [7, 38]. Meanwhile, smoothing techniques for discontinuities by means of step-function approximations (e.g. [64]) are known to be prone to numerical instabilities, particularly for increasing accuracies of the discontinuity [8, 60]. Exploitation of MIP for solving PWP optimization problems beyond complete approximative linearization [37] and direct solution as a nonconvex mixed integer nonlinear programming (MINLP) problem appears to be limited.

We adopt disjunctive representations of PWP constraints, drawing upon the extensive work on disjunctive programming (DP) formulations and representation of piecewise linear (PWL) functions [1, 39, 50, 54]. Modeling piecewise functions as disjunctions enables application of MIP techniques, or specialized branch-and-bound or branch-and-cut schemes with a set condition for representing the piecewise constraints [2, 28, 39]. While adopting MIP techniques and formulations for solving PWP-constrained optimization problems facilitates exploitation of advancements in global optimization solvers [35, 36, 57], careful constraint formulations are required to overcome the inherent problem complexity. To this end, polynomial spline formulations [49] such as the B-spline is an attractive approach. Polynomial splines are constructed from overlapping (piecewise) polynomials with local support, and embodies a versatile set of techniques for modeling PWPs with favorable smoothness and numerical properties. For decades, polynomial splines, which we simply refer to as splines in this paper, have played an important role in function approximation and geometric modeling. In particular, they have been popular as nonlinear basis functions in regression problems [12, 20], for example in kernel methods [22, 63], and in finite element methods [23]. Yet, few references [5, 15] apply splines within mathematical programming beyond the optimization of spline design parameters [45, 59], trajectory optimization [18, 34, 43], and optimization of piecewise linear splines [33].

The availability of spline-compatible optimization algorithms and codes is limited. In a recent work, [16] develop a spatial branch-and-bound (sBB) algorithm for global optimization of spline-constrained problems. While the algorithm was shown to be highly efficient, it has only support for a limited set of algebraic functions, is only available as a specialized code and requires software for spline generation [17]. To address the comparably high modeling and implementation effort required for using the specialized sBB algorithm of [14, 16] proposed an explicit constraint-formulation for continuous splines, yielding an ad-hoc mixed-integer quadratically constrained programming (MIQCP) model. In this paper, we build upon and significantly extend [14, 16] to construct a general-purpose framework for mathematical programming of piecewise polynomial constraints, subsuming spline constraints. The framework is based on an epigraph formulation and we show how it accommodates lower semi-continuous PWPs given in the monomial, Bernstein or B-spline basis. The extension to lower semi-continuous PWPs has not been explicitly covered in previous works. However, the epigraph formulation in [14] can be applied to lower semi-continuous PWPs written as B-splines.

The main advancement of our work from previous works is our representation of PWPs as a disjunction of polynomial pieces. This allows us to exploit the fact that all the polynomial pieces can be written as a linear combination of a single multivariate polynomial basis. This leads to formulations that are minimal in the number of nonlinear (non-convex) constraints. Furthermore, we exploit properties of the polynomial bases and the grid structure for bound tightening and derivation of Bernstein cuts. Exact reformulations of the DP models yield MINLP formulations, which we benchmark and compare with existing solution methods.

The remainder of the paper is organized as follows. In Sects. 2 and 3, we present background theory of Bernstein polynomials, piecewise polynomials and polynomial splines. In Sect. 4, we present DP formulations of PWPs which we in Sect. 5 reformulate to MINLP models. In Sect. 6, we present computational results of the proposed formulations, comparing the results with existing methods for optimizing PWP functions. Concluding remarks in Sect. 7 ends the paper.

2 Background on polynomial bases

This section provides background material on polynomial functions to cover the theory needed for developing an optimization framework for piecewise polynomial functions. The theory is presented as a series of propositions that summarize some classical results for polynomials; cf. [13, 31, 42]. For brevity, most propositions are given without rigorous proofs; each proposition may, however, be proved by simple algebraic manipulations. To further simplify the disposition, we have put some computational details in “Appendix A”.

We begin by introducing the monomial and Bernstein basis for polynomials in one variable. Several propositions are provided that ultimately enable computation of lower and upper bounds on any polynomial. These results are then extended to the multivariate case.

2.1 Univariate polynomials and the Bernstein basis

Let \(\mathbb {P}_{p}\) denote the vector space of polynomial functions in one real variable with degree less than or equal to \(p \in \mathbb {N}\), i.e.

$$\begin{aligned} \mathbb {P}_{p} = \text {span}\{M_{i}\}_{i=0}^{p}, \quad M_{i} : \mathbb {R} \rightarrow \mathbb {R}, \quad M_{i}(x) = x^i, \quad 0 \le i \le p. \end{aligned}$$
(1)

The set \(\{M_{i}\}_{i=0}^{p}\) is commonly referred to as the monomial basis or power basis of \(\mathbb {P}_{p}\). Any polynomial \(f \in \mathbb {P}_{p}\) can be written as

$$\begin{aligned} f(x) = \sum \limits _{i=0}^{p} a_i M_{i}(x) = \varvec{a}^{\textsf {T}} \varvec{M}_{p}(x), \end{aligned}$$
(2)

where the vector of coefficients \(\varvec{a} = [a_i]_{i=0}^{p} \in \mathbb {R}^{p+1}\), and the vector of monomial basis functions \(\varvec{M}_{p}(x) = [M_{i}(x)]_{i=0}^{p} \in \mathbb {R}^{p+1}\).

An alternative basis for \(\mathbb {P}_{p}\) is the Bernstein basis

$$\begin{aligned} B_{i,p} = \left( {\begin{array}{c}p\\ i\end{array}}\right) x^i(1-x)^{p-i}, \quad 0 \le i \le p. \end{aligned}$$
(3)

Since \(\text {span}\{B_{i,p}\}_{i=0}^{p} = \mathbb {P}_{p}\), any polynomial \(f \in \mathbb {P}_{p}\) may be expressed in the Bernstein basis as

$$\begin{aligned} f(x) = \sum \limits _{i=0}^{p} c_i B_{i,p}(x) = \varvec{c}^{\textsf {T}} \varvec{B}_{p}(x), \end{aligned}$$
(4)

where the vector of coefficients \(\varvec{c} = [c_i]_{i=0}^{p} \in \mathbb {R}^{p+1}\), and the vector of Bernstein basis functions \(\varvec{B}_{p}(x) = [B_{i, p}(x)]_{i=0}^{p} \in \mathbb {R}^{p+1}\).

The monomial and Bernstein basis are related via the linear mapping \(\varvec{M}_{p} = Q_p \varvec{B}_{p}\), where \(Q_p \in \mathbb {R}^{(p+1) \times (p+1)}\) is the transformation matrix given in “Appendix A.1”. Consequently, \(\varvec{a}^\textsf {T}\varvec{M}_p(x) = \varvec{c}^\textsf {T}\varvec{B}_p(x)\), given that \(\varvec{c} = Q_{p}^{\textsf {T}} \varvec{a}\).

The Bernstein polynomials possess several useful properties that facilitate the study of signs and bounds on polynomial functions. These properties, to be presented next, will later be utilized to devise bounds on PWPs.

Lemma 1

(Convex combination property of Bernstein polynomials) The following holds true for a set of degree p Bernstein polynomials \(\{B_{i,p}\}_{i=0}^{p}\):

$$\begin{aligned} \begin{aligned}&B_{i,p}(x) \ge 0,\quad \forall x \in [0, 1], \quad i = 0,\ldots ,p \\&\sum \limits _{i=0}^{p} B_{i,p}(x) = 1, \quad \forall x \in \mathbb {R} \end{aligned} \end{aligned}$$
(5)

Proof

The lemma is proved by applying Newton’s binomial identity to (3). \(\square \)

Proposition 1

(Bounds on Bernstein polynomials) Let \(f \in \mathbb {P}_{p}\) be a polynomial expressed in the Bernstein basis (4), and denote \(c^{L} = \min \{c_i\}_{i=0}^{p}\) and \(c^{U} = \max \{c_i\}_{i=0}^{p}\). Then, a valid bound on f is \(c^{L} \le f(x) \le c^{U} ~\forall x \in [0, 1]\).

Proof

From Lemma 1 it follows that

$$\begin{aligned} c^{L} = \sum \limits _{i=0}^{p} c^{L} B_{i,p}(x) \le \underbrace{\sum \limits _{i=0}^{p} c_i B_{i,p}(x)}_{f(x)} \le \sum \limits _{i=0}^{p} c^{U} B_{i,p}(x) = c^{U}. \end{aligned}$$

\(\square \)

Observe that Proposition 1 holds for \(x \in [0,1]\). To obtain a bounding box on a general domain \([x^L, x^U]\), we perform an affine change of variable.

Proposition 2

(Reparametrization of polynomial) Let \(f \in \mathbb {P}_{p}\) be a polynomial \(f(x) = \varvec{a}^{\textsf {T}} \varvec{M}_{p}(x)\), for \(x \in [x^L, x^U]\). Consider the affine change of variables \(x = (x^U - x^L)u + x^L\), with \(u \in [0, 1]\). The polynomial f can be reparametrized from x to u via the linear mapping \(\varvec{M}_{p}(x) = R_p \varvec{M}_{p}(u)\), where \(R_p \in \mathbb {R}^{(p+1) \times (p+1)}\) is the reparametrization matrix given in “Appendix A.2”.

Proof

Cf. [42]. \(\square \)

By combining Propositions 1 and 2, we obtain a bound for polynomials on a general domain \(x \in [x^L, x^U]\).

Proposition 3

(Polynomial bounds on general intervals) Let \(\varvec{a}\) be the coefficients of a polynomial \(f \in \mathbb {P}_{p}\) in the monomial basis. Furthermore, let \(\varvec{c} = (R_p Q_p)^{\textsf {T}} \varvec{a}\) be the coefficients

obtained by first reparametrizing the monomial basis from \(x \in [x^L, x^U]\) to \(u \in [0, 1]\), and then transforming the monomial basis to the Bernstein basis in u. Then,

$$\begin{aligned} c^{L} \le f(x) \le c^{U} \quad \forall x \in [x^L, x^U], \end{aligned}$$
(6)

where \(c^{L} = \min \{c_i\}_{i=0}^{p}\) and \(c^{U} = \max \{c_i\}_{i=0}^{p}\).

Proof

The result follows directly from Propositions 1 and 2. \(\square \)

2.2 Multivariate polynomials

A vector space of multivariate polynomials \(f : \mathbb {R}^d \rightarrow \mathbb {R}\) in the variables \(\varvec{x} = (x_1, \ldots , x_d) \in \mathbb {R}^d\), can be constructed by taking the tensor product of univariate polynomial bases. Specifically, we construct a multivariate polynomial basis as

$$\begin{aligned} \varvec{M}_{p}^{d}(\varvec{x}) = \bigotimes \limits _{j=1}^{d} \varvec{M}_{p}(x_j), \end{aligned}$$
(7)

where \(\varvec{M}_{p}(x_j) = [M_{i}(x_j)]_{i=0}^{p}\) is a vector of \(p+1\) monomial basis functions in the variable \(x_j\).

In (7), \(\varvec{M}_{p}^{d}\) is a vector of \(n = (p+1)^d\) polynomials of degree less than or equal to dp in d variables: i.e. each multivariate basis function results from d products of univariate polynomial basis functions of degree less than or equal to p. The basis spans a (tensor product) vector space of multivariate polynomials, denoted \(\mathbb {P}_{p}^d = \text {span}\{M_{i,p}^{d}\}_{i=0}^{n-1}\) with \(\dim (\mathbb {P}_{p}^d) = (p+1)^d\). The exponential growth in the number of basis functions with the number of variables d, is a phenomenon often referred to as the curse of dimensionality [20]. This phenomenon limits most practical applications of tensor product bases to 5-6 variables.

For notational brevity, we assume in the above construction that the polynomial basis and degree are equal for all variables. These assumptions can be removed without loss of generality as the multivariate basis may be constructed from any combination of univariate polynomial bases of varying degrees. Subsequently, we consider also the multivariate Bernstein basis for \(\mathbb {P}_{p}^d\), which we denote by \(\varvec{B}_{p}^{d}\).

Using the multivariate polynomial basis, we may express any \(f \in \mathbb {P}_{p}^d\) as

$$\begin{aligned} f(\varvec{x}) = \sum \limits _{i = 0}^{n-1} a_{i} M_{i,p}^{d}(\varvec{x}) = \varvec{a}^{\textsf {T}} \varvec{M}_{p}^{d}(\varvec{x}). \end{aligned}$$
(8)

The important property of the bounding box in Proposition (1) naturally extends to the multivariate case.

Proposition 4

(Multivariate polynomial bounds on the unit cube) Let \(f \in \mathbb {P}_{p}^d\) be a polynomial expressed in the multivariate Bernstein basis

$$\begin{aligned} f(\varvec{x}) = \varvec{c}^{\textsf {T}} \varvec{B}_{p}^{d}(\varvec{x}) = \varvec{c}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} \varvec{B}_{p}(x_j). \end{aligned}$$
(9)

Then, \(c^{L} \le f(\varvec{x}) \le c^{U} ~\forall \varvec{x} \in [0, 1]^d\), where \(c^{L} = \min \{c_i\}_{i=0}^{n-1}\) and \(c^{U} = \max \{c_i\}_{i=0}^{n-1}\).

Proof

For any \(1 \le r \le d\), let

$$\begin{aligned} \varvec{B}_{p}^{(d,-r)}(\varvec{x}) = \bigotimes \limits _{j=1, j \ne r}^{d} \varvec{B}_{p}(x_j). \end{aligned}$$
(10)

Then, given \(n = m^d = (p+1)^d\), it follows from Lemma 1 that

$$\begin{aligned} \begin{aligned} \sum \limits _{i=0}^{m^d - 1} B_{i,p}^{d}(\varvec{x})&= \left( \sum \limits _{i=0}^{p} B_{i,p}(x_r) \right) \left( \sum \limits _{i=0}^{m^{(d-1)} - 1} B_{i,p}^{(d,-r)}(\varvec{x}) \right) \\&= \sum \limits _{i=0}^{m^{(d-1)} - 1} B_{i,p}^{(d,-r)}(\varvec{x}). \end{aligned} \end{aligned}$$
(11)

The above relation implies that

$$\begin{aligned} \sum \limits _{i=0}^{n - 1} B_{i,p}^{d}(\varvec{x}) = 1. \end{aligned}$$
(12)

The identity in (12), combined with Lemma 1, ensures that for all \(\varvec{x} \in [0,1]^d\)

$$\begin{aligned} c^L \le c^L \sum \limits _{i=0}^{n-1} B_{i,p}^{d}(\varvec{x}) \le \underbrace{\sum \limits _{i=0}^{n-1} c_i B_{i,p}^{d}(\varvec{x})}_{f(\varvec{x})} \le c^U \sum \limits _{i=0}^{n-1} B_{i,p}^{d}(\varvec{x}) \le c^U, \end{aligned}$$
(13)

which proves the proposition. \(\square \)

Proposition 4 provides bounds on a polynomial expressed in the multivariate Bernstein basis for \(\varvec{x}\) constrained to the unit cube \([0, 1]^d\). Analogous to the univariate case, we obtain bounds for general domains by reparametrizing the basis.

Proposition 5

(Multivariate polynomial bounds on general domains) Let \(f \in \mathbb {P}_{p}^d\) be a polynomial expressed in the multivariate monomial basis

$$\begin{aligned} f(\varvec{x}) = \varvec{a}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} \varvec{M}_{p}(x_j), \end{aligned}$$
(14)

for \(\varvec{x} \in [x_{1}^{L}, x_{1}^{U}] \times \cdots \times [x_{d}^{L}, x_{d}^{U}] = [\varvec{x}^L, \varvec{x}^U]\). For each variable \(x_j\), let \(R_{p,j}\) be the reparametrization matrix that reparametrizes the monomial basis from \(x_j \in [x_{j}^{L}, x_{j}^{U}]\) to \(u_j \in [0, 1]\), computed according to (33) in “Appendix A.2”. Furthermore, let \(Q_p\) be the transformation matrix computed as in (32) in “Appendix A.1”. Then, f may be mapped to the multivariate Bernstein basis as follows:

$$\begin{aligned} \begin{aligned} f(\varvec{u})&= \varvec{a}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} R_{p,j} Q_p \varvec{B}_{p}(u_j) = \varvec{c}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} \varvec{B}_{p}(u_j) \end{aligned} \end{aligned}$$
(15)

where

$$\begin{aligned} \varvec{c}^{\textsf {T}} = \varvec{a}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} R_{p,j} Q_p. \end{aligned}$$
(16)

Finally, we may apply Proposition 4 to obtain the bounds

$$\begin{aligned} c^{L} \le f(\varvec{x}) \le c^{U} \quad \forall \varvec{x} \in [\varvec{x}^L, \varvec{x}^U], \end{aligned}$$
(17)

where \(c^{L} = \min \{c_i\}_{i=0}^{n-1}\) and \(c^{U} = \max \{c_i\}_{i=0}^{n-1}\).

Proof

The result follows directly from Proposition 4. \(\square \)

3 Piecewise polynomial functions

In this section, we describe the piecewise polynomial functions (PWPs) to which we first give a formal definition for the continuous case. We then depart from the continuity requirement in order to consider the more general case of lower semi-continuous PWPs. The definitions given below provide a framework for the development of the mathematical programming formulations in Sects. 4 and 5. The definitions are not novel, but help us highlight well-established connections between PWPs and splines; cf. [42, 49].

Definition 1

(Continuous piecewise polynomial function) Let \(D\in \mathbb {R}^d\) be a compact set. A function \(f : D\subset \mathbb {R}^d \rightarrow \mathbb {R}\) is a continuous piecewise polynomial if and only if there exists a finite family of polytopes \(\varPi \) such that \(D= \bigcup _{P\in \varPi } P\) and

$$\begin{aligned} f(\varvec{x}) = \left\{ f_{P}(\varvec{x}),\varvec{x} \in P,\forall P\in \varPi \right. , \end{aligned}$$
(18)

where \(f_{P} : P\subset \mathbb {R}^d \rightarrow \mathbb {R}\) and \(f_{P} \in \mathbb {P}_{p}^d\) for all \(P\in \varPi \), and some degree \(p \in \mathbb {N}_0\).

Note that the domain \(D\) does not need to be connected or convex. Continuity of f on \(D\) is ensured since \(f_{P_1}(\varvec{x}) = f_{P_2}(\varvec{x})\) if \(\varvec{x}\in P_1 \cap P_2\) for two adjacent polytopes \(P_1, P_2 \in \varPi \).

In the above definition, the polynomial pieces \(\{f_P\}_{P\in \varPi }\) are constructed from some basis that spans the space \(\mathbb {P}_{p}^d\). A special case occurs when \(d = 1\) and \(p = 1\), for which \(\{f_P\}_{P\in \varPi }\) are linear functions in one variable, and f a continuous piecewise linear function. Furthermore, for \(d > 1\) and \(p = 1\), f is a continuous piecewise multilinear function due to the tensor product construction of \(\mathbb {P}_{p}^d\). In general, the polynomial pieces are of degree less than or equal to \(dp \in \mathbb {N}_0\).

3.1 Polytopes on a rectilinear grid

Definition 1 of continuous PWPs does not prescribe the polytopes; they may for instance be given as the convex hull of a finite number of points, or as a system of linear inequalities. Some formulations for piecewise linear functions require the polytopes to be simplices resulting from a triangulation of \(D\) [55]. For most practical applications of PWPs, however, the polytopes are assumed to be n-orthotopes (hyperrectangles/boxes) arranged on an axis-aligned rectilinear grid.Footnote 1 In the rest of this paper, we will assume that the domain \(D\) is partitioned on such a rectilinear grid, for which the polytopes in \(\varPi \) are characterized as follows.

For \(i \in \{1,\ldots ,d\}\), let \(\pi ^{i} = \{\pi _{0}^{i}, \ldots , \pi _{m_{i}}^{i}\} \in \mathbb {R}\) denote a strictly monotonically increasing sequence of \(m_i\) real numbers, e.g. \(\pi _{0}^{i}< \pi _{1}^{i}< \cdots < \pi _{m_{i}}^{i}\). To index the polytopes we introduce the set \(K := \{(k_1, \ldots , k_d) : k_i \in \{1,\ldots , m_i\}, \forall i=1,\ldots ,d\}\). This lets us define the rectilinear grid \(G:= \{P_k : k \in K \}\), consisting of \(|G| = m_1 \cdots m_d\) boxes given by

$$\begin{aligned} P_k = \{ \varvec{x} \in \mathbb {R}^d : \pi _{k_i - 1}^{i} \le x_i \le \pi _{k_i}^{i}, \, \forall i \in \{1,\ldots ,d\} \}. \end{aligned}$$
(19)

The grid \(G\) is in compliance with Definition 1, since it is a family n-orthotopes \(P_k\), which are bounded polytopes. Subsequently, we consider PWPs partitioned on a rectilinear grid and write \(\varPi = G\). We will also denote with \(\mathbb {P}_{p}^d(G)\) the space of piecewise degree p polynomials with a partition of the domain \(D\) given by the rectilinear grid \(G\).

3.2 The epigraph of piecewise polynomials

A continuous PWP \(f : D\subset \mathbb {R}^n \rightarrow \mathbb {R}\) may be modeled by its epigraph \(\text {epi}(f) := \left\{ (\varvec{x},z) \in D\times \mathbb {R}: f(\varvec{x}) \le z \right\} \). We assume that \(D\) is a bounded domain and that f participates in a constraint of the form \(f(\varvec{x}) \le 0\) or in an objective function to be minimized. That is, the constraint \(f(\varvec{x}) \le 0\) can be modeled as \((\varvec{x}, z) \in \text {epi}(f)\), \(z \le 0\), and the objective f can be modeled as the minimization of z subject to \((\varvec{x},z) \in \text {epi}(f)\).

The epigraph of f can be expressed as the union of epigraphs of its pieces \(f_P\), i.e.

$$\begin{aligned} \text {epi}(f) = \bigcup \limits _{P\in \varPi } \text {epi}(f_P), \end{aligned}$$
(20)

where \(\text {epi}(f_P) := \left\{ (\varvec{x},z) \in P\times \mathbb {R}: f_P(\varvec{x}) \le z \right\} \). As illustrated by Fig. 1, the epigraph of a piecewise function is in general a nonconvex set. Note that \(\text {epi}(f_{P})\) is convex if and only if \(f_{P}\) is convex on \(P\). Furthermore, f may be nonconvex, even if \(f_{P}\) (and \(\text {epi}(f_{P})\)) is convex for all \(P\in \varPi \).

Fig. 1
figure 1

Continuous piecewise polynomials and their epigraph (colored grey). The five pieces of the piecewise polynomial are linear in a, quadratic in b, cubic in c, and quartic in d

The theory developed in [26, 27] shows that \(\text {epi}(f)\) can be modeled as a MILP if and only if f is piecewise linear and lower semi-continuous. Based on this theory, Vielma et al. [55, 56] derived new MILP models and presented a unifying framework for piecewise linear functions. To follow on these works, we continue by extending Definition 1 to handle lower semi-continuous piecewise polynomials.

3.3 Extension to lower semi-continuous piecewise polynomials

Below we provide a definition of PWPs that are not necessarily continuous. This allows us to analyze and integrate in the framework the subset of discontinuous PWPs that are lower semi-continuous. The extended definition lets us tie our PWP framework to the B-spline modeling framework, which facilitates construction of PWPs with predefined smoothness.

Before extending Definition 1 of PWPs, we consider the property of lower semi-continuity with some simple examples. Formally, a function f is lower semi-continuous if for any \(\varvec{x}_0 \in D\)

$$\begin{aligned} \underset{\varvec{x} \rightarrow \varvec{x}_0}{\lim \inf }\, f(\varvec{x}) \ge f(\varvec{x}_0). \end{aligned}$$
(21)

The importance of lower semi-continuity comes from the fact that the epigraph of a function is closed if and only if it is lower semi-continuous. It is hence a requirement for the epigraph model in Sect. 3.2. Since a continuous function is lower semi-continuous, the requirement always holds for continuous PWPs. To illustrate this property, consider the two PWPs in Fig. 2. The PWP in Fig. 2a is lower semi-continuous at \(x = 2\), but not at points \(x = 1\) and \(x = 3\). Thus, it is not a lower semi-continuous PWP and its epigraph is not closed. On the other hand, the PWP in Fig. 2b has one discontinuity (\(x = 2\)) at which it is lower semi-continuous, and its epigraph is closed.

Fig. 2
figure 2

Two discontinuous PWPs and their epigraph (colored grey). Both PWPs consist of five pieces defined on the intervals [0, 1), [1, 2), [2, 3), [3, 4), and [4, 5]; the open ends are marked with white-filled circles. The pieces are constant in a and linear in b

To allow discontinuities, [55] employed a characterization of the domain using copolytopes (sets defined by a finite set of strict and non-strict linear inequalities). Similarly, we use copolytopes to define not necessarily continuous PWPs as follows.

Definition 2

(Piecewise polynomial function) Let \(D\in \mathbb {R}^d\) be a compact set. A (not necessarily continuous) function \(f : D\subset \mathbb {R}^d \rightarrow \mathbb {R}\) is piecewise polynomial if and only if there exists a finite family of copolytopes \(\varPi \) such that \(D= \bigcup _{P\in \varPi } P\) and

$$\begin{aligned} f(\varvec{x}) = \left\{ f_{P}(\varvec{x}),\varvec{x} \in P,\forall P\in \varPi \right. , \end{aligned}$$
(22)

where \(f_{P} : P\subset \mathbb {R}^d \rightarrow \mathbb {R}\) and \(f_{P} \in \mathbb {P}_{p}^d\) for all \(P\in \varPi \), and some degree \(p \in \mathbb {N}_0\).

With a minor adjustment to the continuous case, we may express the epigraph of a function f defined according to Definition 2, as the union of epigraphs of its pieces \(f_P\). That is, we model

$$\begin{aligned} \text {epi}(f) = \bigcup \limits _{P\in \varPi } \text {epi}(f_P), \end{aligned}$$
(23)

where we now use \(\text {epi}(f_P) := \left\{ (\varvec{x},z) \in {\bar{P}} \times \mathbb {R}: f_P(\varvec{x}) \le z \right\} \) in which \({\bar{P}}\) is the closure of \(P\) (note that since \(f_P\in \mathbb {P}_{p}^{d}\) we may evaluate it at the right boundary of \({\bar{P}}\)). Recall that \(\text {epi}(f)\) is closed if and only if f is lower semi-continuous.

Similar to continuous PWPs, we consider a partition of the domain on a rectilinear grid \(G\). However, we now compose the grid of left half-closed boxes:

$$\begin{aligned} P_k = \{ \varvec{x} \in \mathbb {R}^d : \pi _{k_i - 1}^{i} \le x_i < \pi _{k_i}^{i}, \forall i \in \{1,\ldots ,d\} \}. \end{aligned}$$
(24)

Note that \(P_k\), being a left half-closed box, is a special type of copolytope.

A technicality arises with this partitioning in that the rightmost boundaries of \(D\) are open, and hence \(D\) is open, which breaks compatibility with Definition 2. To close these boundaries we must require that the rightmost boundaries of the rightmost boxes are closed; i.e. in (24) we must replace \(\pi _{k_i - 1}^{i} \le x_i < \pi _{k_i}^{i}\) with \(\pi _{k_i - 1}^{i} \le x_i \le \pi _{k_i}^{i}\) if \(k_i = m_i\). The addition of this requirement on the partitioning ensures that the domain is closed and hence compatible with Definition 2.

3.4 Relation to splines

With piecewise linear functions one is often concerned with \(\mathcal {C}^0\) continuity at intersections \(\varvec{x} \in P_1 \cap P_2\), where \(P_1, P_2 \in \varPi \). For PWPs in general, \(\mathcal {C}^1\), \(\mathcal {C}^2\), or higher-order continuity at the intersections is an obtainable and often desired property. Definition 1 only guarantees \(\mathcal {C}^0\) continuity, but does not exclude PWPs with higher order continuity.

Splines are PWPs constructed with continuity constraints. The most widely used framework for polynomial splines is the B-spline, in which PWPs with a desired degree of smoothness can be constructed by allowing basis functions to overlap. Consider the multivariate B-spline

$$\begin{aligned} f(\varvec{x}) = \varvec{c}^\textsf {T}\bigotimes \limits _{j=1}^{d} \varvec{N}_{p}(x_j; \varvec{t}_j) = \varvec{c}^\textsf {T}\varvec{N}_{p}^{d}(\varvec{x}; T), \end{aligned}$$
(25)

where \(\varvec{N}_{p}(x_j; \varvec{t}_j) = [N_{i,p}(x_j; \varvec{t}_j)]_{i=0}^{n_j-1}\) is a vector of \(n_j\) degree p B-spline basis functions in the variable \(x_j\), \(\varvec{c}\) are coefficients, and \(\varvec{t}_j\) is a non-decreasing sequence of reals called a knot sequence. The B-spline basis functions in a variable \(x_j\), \(\varvec{N}_{p}(x_j; \varvec{t}_j)\), are polynomials with local support that are spliced together at the points specified by the knot sequence \(\varvec{t}_j\).

The multivariate B-spline basis is denoted \(\varvec{N}_{p}^{d}(\varvec{x}; T)\), where \(T = \{\varvec{t}_j\}_{j=1}^{d}\) is the set of knot sequences that parametrize the partition of the domain \(D\). The vector space of B-splines spanned by the above basis is denoted \(\mathbb {S}_{p}^{d}(T) = \mathbb {S}_{p}(\varvec{t}_1) \times \cdots \times \mathbb {S}_{p}(\varvec{t}_d)\), where \(\mathbb {S}_{p}(\varvec{t}_j) = \text {span}\{N_{i,p}(x_j; \varvec{t}_j)\}_{i=0}^{n_j-1}\).

The B-spline basis can viewed as an extension of the Bernstein basis, generalizing the description of a single polynomial on a continuous interval to piecewise polynomials over a partitioned domain, specified by the knot sequence [13]. It shares the non-negativity and partition-of-unity properties of the Bernstein basis, and is invariant to an affine change of variables. The close relationship is made clear by the Proposition 6 in “Appendix B”, which shows that Bernstein and B-spline bases are equivalent for a certain knot sequence.

Multivariate B-splines are defined on a rectilinear grid of left half-closed boxes aligned to the variable axes, due to its construction by the Kronecker product. The partitioning is thus the same as for the piecewise polynomials in Definition 2. The well-known equality \(\mathbb {S}_{p}^{d}(T) = \mathbb {P}_{p}^d(G)\), where \(G\) is a rectilinear grid, implies that any spline can be accommodated by the framework in Definition 2, given appropriate knot sequences T. This equality was first stated for the univariate case in the Curry-Schoenberg theorem [9]. In Lemma 2, “Appendix B”, we restate this relationship for the multivariate case using our notational framework. Since most PWPs are built as splines, e.g. using cubic spline interpolation or smoothing splines, this relationship holds a practical value and we will later use it to generate PWPs for the numerical study.

4 Disjunctive formulations for piecewise polynomial functions

In this section, we use disjunctions as a means of representing PWP constraints. Consider a piecewise polynomial \(f : D\rightarrow \mathbb {R}\), defined as in Definition 1 with a rectangular domain \(D= \{ \varvec{x} : \varvec{x}^L \le \varvec{x} \le \varvec{x}^U \} = \bigcup _{P\in \varPi } P\subset \mathbb {R}^d\). The epigraph of f, \(\text {epi}(f) = \{ (\varvec{x},z) \in D\times \mathbb {R} : f(\varvec{x}) \le z \}\), can be represented by the disjunction [27]:

figure a

where we have associated a boolean variable \(Y_{P} \in \{\text {True}, \text {False}\}\) with each polytope \(P\in \varPi \). Each disjunctive term is related to a polytope \(P\), so that \(\varvec{x} \in P\) and z is restricted to the epigraph \(\text {epi}(f_P)\) of a piece \(f_P\) of f. The disjunction thus models \(\text {epi}(f)\) as the union of epigraphs in (20). The exclusive OR operator on the boolean variables ensures that exactly one polytope \(P\in \varPi \) is selected.

DP-1 is also a valid model for lower semi-continuous PWPs according to Definition 2, if we replace in each term the domain constraint with \(\varvec{x} \in {\bar{P}}\), where \({\bar{P}} = \mathbf{cl} (P)\). In both cases, DP-1 is a proper disjunction [1, 53] in the sense that no single polytope covers the entire feasible region. Observe that Definition 1 ensures continuity in the overlap between the polytopes constituting the disjunction. In the derivations that follow we assume that f is a continuous PWP, but remark that the resulting formulations are also valid for lower semi-continuous PWPs subject to the mentioned domain-substitution.

The disjunction DP-1 contains a nonlinear, possibly nonconvex inequality for each term, thereby severely impeding the scalability of the formulation and hence its practical application. As a partial remedy, we may utilize that each polynomial \(f_{P}\) can be expressed as a linear combination of the basis functions spanning \(\mathbb {P}_{p}^{d}\), that is, \(f_{P} \in \mathbb {P}_{p}^{d}\), for all \(P\in \varPi \). This salient characteristic allows us to exploit that the basis functions are independent of \(P\), and hence can be extracted outside the disjunction as a common set of nonlinearities. The reformulation simplifies DP-1 to

figure b

where the polynomial piece \(f_{P}\) is expressed as a linear combination of the \(n = \dim (\mathbb {P}_{p}^d)\) multivariate monomial basis functions \(\varvec{\beta } = \varvec{M}_{p}^{d}\). We stress that even though it is always possible to substitute nonlinearities with new variables to obtain linear disjunctions as in DP-2, the benefit comes solely from having common basis functions and hence reducing the number of nonlinear constraints.

Generally, DP-2 requires \(n = \dim (\mathbb {P}_{p}^d) = (p + 1)^d\) polynomial constraints to model a PWP, invariant to the discretization of the domain. Consider a PWP defined on a rectilinear grid of \(|\varPi | = m^d\) boxes, resulting from a discretization with \(m \ge 1\) intervals in each of the d variables. DP-1 requires \(m^d\) polynomial constraints to model this PWP. Thus, when \(m > p+1\), then \(m^d > (p+1)^d\), and the formulation in DP-2 is likely preferable to DP-1. To summarize the above argument: modeling the polynomial function space \(\mathbb {P}_{p}^d\) via its n basis functions, as opposed to modeling each of the \(|\varPi |\) polynomial pieces separately, will generally result in fewer nonlinear constraints. Still, the exponential increase with d in the number of nonlinear constraints puts a practical limit on formulations derived from either DP-1 or DP-2.

Remark 1

Piecewise McCormick envelopes and other linear relaxations (e.g. [19]) for bilinear terms \(f_{P}(x) = x_1 x_2\), with \((x_1,x_2) \in P\), can be derived from DP-1. Such relaxations of DP-1 render linear approximations, whereas DP-2 is an exact formulation.

4.1 Bounds on polynomial constraints

The efficiency of numerical methods for solving DP-2 relies strongly on the ability to derive strong upper bounds on the constraints \(f_P= \varvec{a}_{P}^{\textsf {T}} \varvec{\beta } \le z\), for \(P\in \varPi \). Suppose that \(\varvec{a}_{P}^{\textsf {T}} \varvec{\beta } \in [m_{P}^{L}, m_{P}^{U}]\) and that \(z^{L} \le z\) for any \(\varvec{x} \in D\). We may then define \(M_{P}^{U} := m_{P}^{U} - z^{L}\) so that, for any \(\varvec{x} \in D\),

$$\begin{aligned} \varvec{a}_{P}^{\textsf {T}} \varvec{\beta } - z \le M_{P}^{U}. \end{aligned}$$
(26)

To obtain a valid upper bound \(M_{P}^{U}\), we must determine the values of \(m_{P}^{U}\) and \(z^{L}\). We observe that any feasible solution must satisfy \(\varvec{a}_{P}^{\textsf {T}} \varvec{\beta } \le z\) for some \(P\in \varPi \). Thus, a valid lower bound on z is \(z \ge z^L = \min \{m_P^L\}_{P\in \varPi }\), and we may rewrite the upper bound on the polynomial constraint to

$$\begin{aligned} M_{P}^{U} = m_{P}^{U} - \min \{m_P^L\}_{P\in \varPi }. \end{aligned}$$
(27)

It is then obvious that computing \(M_{P}^{U}\) for each \(P\in \varPi \) requires a lower and upper bound on all polynomials \(\{f_P\}_{P\in \varPi }\).

Returning to DP-2, there is a subtlety due to the substitution of the nonlinearities that impedes the derivation of tight bounds on the polynomials. The issue is that the bounds on \(f_{P}(\varvec{x}) = \varvec{a}_{P}^{\textsf {T}} \varvec{\beta } \in [m_{P}^{L}, m_{P}^{U}]\) must be valid for all \(\varvec{x} \in [\varvec{x}^{L}, \varvec{x}^{U}]\), not only for \(\varvec{x} \in P\). This poses numerical problems since the piecewise polynomials may become prohibitively large on \([\varvec{x}^{L}, \varvec{x}^{U}]\), resulting in undesirably large bounds.

To solve this issue and obtain tighter bounds on the polynomials \(\{f_{P}\}_{P\in \varPi }\), we utilize the procedure in Proposition 5 to perform a reparametrization before computing the polynomial bounds. Let \(\varvec{u} \in [\varvec{0}, \varvec{1}] \in \mathbb {R}^d\) and \(f_{P} = \varvec{a}_{P}^{\textsf {T}} \varvec{M}_{p}^{d}(\varvec{x}) = \varvec{c}_{P}^{\textsf {T}} \varvec{B}_{p}^{d}(\varvec{u})\), where the coefficients of the reparametrized polynomial in Bernstein form are given as

$$\begin{aligned} \varvec{c}_{P}^{\textsf {T}} = \varvec{a}_{P}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} R_{p,j} Q_p. \end{aligned}$$

Using the multivariate Bernstein basis \(\varvec{B}_{p}^{d}(\varvec{u})\) for \(\varvec{0} \le \varvec{u} \le \varvec{1}\) enables reformulation of DP-2 to the disjunction

figure c

In DP-3, the prescription of the polytopes in (19) is enforced using \(\varvec{u}\). We have also bounded the Bernstein basis functions to be in [0, 1], in accordance with Lemma 1.

Within each term \(P\in \varPi \) in DP-3, the variables \(\varvec{x}\) and \(\varvec{u}\) are linearly dependent. The reformulation DP-3 enables computation of bounds on the reparametrized polynomials. In particular, by invoking Proposition 4 we obtain \(c_{P}^{L} \le \varvec{c}_{P}^{\textsf {T}}\varvec{\beta } \le c_{P}^{U}\), where \(c_{P}^{L} = \min \{c_{i, P}\}_{i=0}^{n-1}\) and \(c_{P}^{U} = \max \{c_{i, P}\}_{i=0}^{n-1}\). This allows us to set

$$\begin{aligned} M_{P}^{U} = c_{P}^{U} - \min \{c_P^L\}_{P\in \varPi }, \end{aligned}$$
(28)

and thereby obtain a valid upper bound \(\varvec{c}_{P}^{\textsf {T}} \varvec{\beta } - z \le M_{P}^{U}\) for all \(\varvec{u} \in [\varvec{0}, \varvec{1}]\) and \(P\in \varPi \).

While (28) provides an upper bound on \(\varvec{c}_{P}^{\textsf {T}} \varvec{\beta } - z\) for \(P\in \varPi \), it is global in the sense that it involves all the polynomial pieces via the lower bound \(\min \{c_P^L\}_{P\in \varPi }\) on z. “Local” bounds that concern only a single polynomial, can be obtained by introducing auxiliary variables \(\{z_P\}_{P\in \varPi }\) and reformulating to

figure d

With DP-BASIC, it is sufficient to derive an upper bound \(M_{P}^{U}\) so that \(\varvec{c}_{P}^{\textsf {T}} \varvec{\beta } - z_{P} \le M_{P}^{U}\). A bound is readily obtained as

$$\begin{aligned} M_{P}^{U} = c_{P}^{U} - \min \{0, c_{P}^{L}\}. \end{aligned}$$
(29)

Remark 2

The special case of an equality constraint \(\varvec{c}_{P}^{\textsf {T}} \varvec{\beta } = z_{P}\) can be handled by writing \(0 \le \varvec{c}_{P}^{\textsf {T}} \varvec{\beta } - z_{P} \le 0\). Using the same arguments as above we obtain the bounds \(M_{P}^{L} \le \varvec{c}_{P}^{\textsf {T}} \varvec{\beta } - z_{P} \le M_{P}^{U}\), where \(M_{P}^{L} = c_{P}^{L} - \max \{0, c_P^U\}\).

4.2 Exploiting the rectilinear grid structure

In the preceding formulations, a disjunction of \(|\varPi | = \prod \nolimits _{i=1}^{d} m_i\) terms is used to enforce the box constraints \(\varvec{x} \in P_k\), where \(P_k\) is given as in (19). This seems unnecessary since the grid structure can be modeled by d disjunctions of \(m_i\) terms (for \(i=1,\ldots ,d\)), as shown next.

We introduce a boolean variable \(Y_{j}^{i}\) for each interval \(\pi _{j-1}^{i} \le x_i \le \pi _{j}^i\), for \(j=1,\ldots ,m_i\), and \(i=1,\ldots ,d\). Then, by adding the conditions

$$\begin{aligned} \underset{j = 1,\ldots ,m_i}{\veebar } Y_{j}^{i}, \quad \forall i \in \{1,\ldots ,d\}, \end{aligned}$$
(30)

we ensure that each variable \(x_i\) is constrained to a single interval. With this modeling strategy, we modify DP-BASIC to arrive at the following formulation.

figure e

Above, the \(|\varPi |\) disjunctions for the polynomial pieces are conditioned on the expression \(\underset{i = 1,\ldots ,d}{\wedge } Y_{k_i}^{i}\). This condition will be True only for polytope \(P_k\), ensuring that the polynomial pieces are exclusively selected. Otherwise, \(z_{P}\) is set to zero.

4.3 Partition-of-unity cut

The DP-GRID formulation can be augmented with a partition-of-unity cut that may strengthen its relaxation. The cut is given by the identity for Bernstein polynomials in (12), i.e. \(\varvec{1}^{\textsf {T}} \varvec{\beta } = 1\), where \(\varvec{1}\) is vector of n ones. Adding this cut to DP-GRID yields a formulation which we name DP-CUT.

The partition-of-unity cut was introduced for B-splines in [14]. The same cut is applicable here, as the Bernstein polynomials are a special case of B-splines (see Sect. 3.4).

4.4 Expanded Bernstein basis

The preceding formulations do not utilize the tensor product structure of the multivariate basis \(\varvec{B}_{p}^{d}\); cf. (9). In these formulations, the multivariate basis is formed by the tensor product of all univariate bases, resulting in constraints with polynomials of degree dp. In the following formulation, the multivariate basis is expanded to exploit the inherent structure due to the tensor product. The expansion lets us add additional polyhedral cuts (as was done for the multivariate basis in Sect. 4.3) and reduce the maximum degree of any polynomial in the set of constraints to \(\max \{2, p\}\) (for \(p \ge 1\)). It was demonstrated in [14] that an expansion of the tensor product can yield better mathematical programming formulations for multivariate splines.

Specifically, each univariate Bernstein basis is assigned to the \(p+1\) auxiliary variables \(\varvec{\xi }^{i} = \varvec{B}_{p}(u_i)\) for \(i \in \{1,\ldots ,d\}\). To express the multivariate Bernstein basis, we introduce additional auxiliary variables \(\varvec{\beta }^{i}\) for \(i \in \{1,\ldots ,d\}\), and the constraints \(\varvec{\beta }^1 = \varvec{\xi }^1\) and \(\varvec{\beta }^{i+1} = \varvec{\xi }^{i+1} \otimes \varvec{\beta }^{i}\), \(\forall i \in \{1, \ldots , d-1\}\). The multivariate basis is then represented by \(\varvec{\beta }^d\).

The expansion permits nonnegativity bounds and partition-of-unity cuts on the univariate basis functions \(\varvec{\xi }^i\) and the intermediate basis functions \(\varvec{\beta }^i\). For \(i \in \{1, \ldots , d\}\), we add the constraints \(\varvec{\xi }^{i} \ge \varvec{0}\), \(\varvec{1}^{\textsf {T}} \varvec{\xi }^{i} = 1\), \(\varvec{\beta }^{i} \ge \varvec{0}\), and \(\varvec{1}^{\textsf {T}} \varvec{\beta }^{i} = 1\), where \(\varvec{0}\) and \(\varvec{1}\) are vectors of zeros and ones of appropriate sizes.

In addition, we include the upper bounds on the Bernstein polynomials given in “Appendix A.3”. Denote with \(\bar{\varvec{B}}_p\) the upper bounds on the Bernstein polynomials \(\varvec{B}_p\). We then include the bounds \(\varvec{\xi }^{i} \le \bar{\varvec{B}}_p\) for \(i \in \{1, \ldots , d\}\).

The resulting formulation, with the expanded Bernstein basis and polyhedral cuts, is given below.

figure f

DP-EXP uses \(d(p+1) + \sum \nolimits _{i=1}^{d}(p+1)^i\) auxiliary variables to represent the multivariate polynomial basis. The univariate basis functions are expressed by \(d(p+1)\) polynomial constraints of degree p, while \(\sum \nolimits _{i=1}^{d-1}(p+1)^{i+1}\) bilinear constraints are used to form the multivariate basis (not counting the \(p+1\) constraints \(\varvec{\beta }^1 = \varvec{\xi }^1\), which are linear). In addition, 2d linear partition-of-unity cuts are included.

4.5 Expanded monomial basis

With DP-EXP the maximum degree of the polynomial constraints were reduced to \(\max \{2, p\}\) (for \(p \ge 1\)). For general polynomial constraints, it is possible to achieve a maximum degree of 2 by further expansion of the basis. We achieve this by utilizing the monomial basis \(\varvec{\xi }^i\) in variable \(u_i\), setting \(\xi _0^i = 1\) and \(\xi _{j}^i = \xi _{j-1}^i u_i\) for \(j \in \{1,\ldots ,p\}\). We compute the multivariate basis \(\varvec{\beta }^d\) as in DP-EXP, and form the polynomial pieces as \(f_P= \varvec{\alpha }_{P}^{\textsf {T}} \varvec{\beta }^{d} = \varvec{\alpha }_{P}^{\textsf {T}} \varvec{M}_{p}^{d}(\varvec{u})\), where the coefficients are given as

$$\begin{aligned} \varvec{\alpha }_{P}^{\textsf {T}} = \varvec{a}_{P}^{\textsf {T}} \bigotimes \limits _{j=1}^{d} R_{p,j}. \end{aligned}$$

Bounds \(\varvec{0} \le \varvec{\xi }^i \le \varvec{1}\) and \(\varvec{0} \le \varvec{\beta }^i \le \varvec{1}\) for all \(i \in \{1,\ldots ,d\}\), follow from the fact that \(\varvec{0} \le \varvec{u} \le \varvec{1}\).

With the expanded monomial basis we obtain the following formulation.

figure g

Remark 3

The partition-of-unity cuts that were added for the Bernstein basis in Sect. 4.3, can be applied to the monomial basis via a transformation. Let \(T_p = Q_{p}^{-1}\) so that \(\varvec{B}_p = T_p \varvec{M}_p\), where \(Q_p\) is the transformation matrix in “Appendix A.1”. Inserting this mapping into the cut constraints we obtain for the monovariable basis \(\varvec{1}^{\textsf {T}} T_p \varvec{\xi }^{i} = 1\) for \(i \in \{1,\ldots ,d\}\). It can be shown that these cuts are redundant since \(\varvec{1}^{\textsf {T}} T_p \varvec{\xi }^{i} = \xi _{0}^i = 1\). Furthermore, it can be shown that the transformed upper bounds are redundant as well; that is, \(\varvec{\xi }^i \le \varvec{1} \le Q_{p} \bar{\varvec{B}}_p\). We have thus omitted these in DP-MON.

5 MINLP formulations for piecewise polynomial functions

Algorithmic approaches for mathematical programming problems with disjunctions either reformulate the disjunction to enable mixed-integer programming, or seeks to exploit the disjunctive constraints explicitly in a branch-and-bound or cutting-plane algorithm, possibly through combinations thereof [3, 52]. Linear [1] and convex nonlinear disjunctions [6] may be reformulated either by its convex hull representations or by big-M reformulations. Pertaining to the linear disjunction in DP-3, the convex hull formulation [1] is at least as tight as big-M reformulations, though requiring more variables and constraints. Big-M formulations, on the other hand, are known to be prone to the choice of the big-M parameters, and often yield weaker relaxations. For large nonlinear, possibly nonconvex DP problems, it is important to keep the size of reformulation small, in which big-M reformulations may be advantageous [53]. Consequently, we pursue big-M based MINLP reformulations of the DP models of PWP constraints in Sect. 4. Utilizing the bounds derived in Sect. 4.1 upon an elementary big-M reformulation of DP-BASIC yields the MINLP formulation

figure h

where \(M_{P}^{U} = c_{P}^{U} - z_{P}^{L}\), \(z_{P}^{L} = \min \{0, c_{P}^{L}\}\), \(z_{P}^{U} = \max \{0, c_{P}^{U}\}\), and a binary variable \(y_{P}\) replaces \(Y_P\), for each \(P\in \varPi \). MINLP-BASIC has \(n = \dim (\mathbb {P}_{p}^d)\) nonlinear constraints; all other constraints are linear. The formulation has \(|\varPi |+d+n\) continuous auxiliary variables \(\{z_{P}\}_{P\in \varPi }\), \(\varvec{u}\) and \(\varvec{\beta }\), and \(|\varPi |\) binary variables \(\{y_{P}\}_{P\in \varPi }\).

5.1 Reformulation of DP-GRID, DP-EXP, and DP-MON

Analogous to the derivation of MINLP-BASIC, a MINLP reformulation of DP-GRID is obtained by introducing a binary variable \(y_{j}^{i}\) for each boolean variable \(Y_{j}^i\), and applying a big-M reformulation. The last disjunction requires special treatment, since each term contains a conjunction of clauses connected by the AND operator. This means that the term for \(P_k\) in the disjunction should be True if and only if \(Y_{k_1}^{1} = \cdots = Y_{k_d}^{d} = \text {True}\). A big-M reformulation for a given \(P_k\) can then be formed as \(\varvec{c}_{P}^{\textsf {T}} \varvec{\beta } - z_{P} \le M_{P}^{U}(d - \sum \nolimits _{i=1}^{d} y_{k_i}^{i})\), leading to the following MINLP formulation.

figure i

We note that the MINLP-GRID has \(\sum \nolimits _{i=1}^{d} m_i\) binary variables for polytope selection, in contrast to \(|\varPi | = \prod \nolimits _{i=1}^{d} m_i\) as for the MINLP-BASIC formulation.

In an analogous fashion, we derive MINLP reformulations of DP-CUT, DP-EXP, and DP-MON, and denote them MINLP-CUT, MINLP-EXP, MINLP-MON, respectively. We note here that MINLP-MON consists of linear and bilinear constraints only, and is hence a MIQCP formulation.

5.2 Summary of formulations

The size of the formulations in terms of number of variables and constraints are summarized in Table 1. To ease comparisons, we assume that \(m_1 = \cdots = m_d = M\), so that \(|\varPi | = M^d\), and utilize the geometric series

$$\begin{aligned} g_d(k) := \sum \limits _{i=1}^{d}k^i = (k^d - 1)k/(k-1), \quad \text { for } k \ne 1. \end{aligned}$$
Table 1 Sizes of formulations

The three formulations MINLP-BASIC, MINLP-GRID, and MINLP-CUT have \(n = \dim (\mathbb {P}_{p}^d) = (p+1)^d\) nonlinear equality constraints. These constraints model the Bernstein basis functions that span \(\mathbb {P}_{p}^d\), and are nonconvex since the Bernstein basis functions are polynomials of degree dp. MINLP-EXP has \(d(p + 1)\) polynomial constraints of degree p, and \(g_d(p+1) - (p+1)\) bilinear constraints. Finally, MINLP-MON has 2d less nonlinear constraints than MINLP-EXP, all being bilinear constraints. The reduction in nonlinear constraints is due to some basis functions being constant (equal to one).

Comparing number of binary variables, MINLP-BASIC seems to be at a disadvantage since \(M^d \ge dM\) for \(M > 1\) and \(d \ge 1\). The one-dimensional case (\(d=1\)) is an exception since MINLP-BASIC then becomes equivalent to MINLP-GRID.

Finally, we note that the MIQCP-CUT formulation from [14] scales poorly in the number of nonlinear constraints due to the terms pdM and \(g_d(M+p)\). It also has 2dp more binary variables than the formulation based on MINLP-GRID.

6 Numerical study

To benchmark the performance of the proposed MINLP formulations, we conducted a three-part numerical study. In the first part we minimized randomly generated cubic splines of varying input dimensions. In the second part, we solved a set of optimization problems involving five randomly generated PWP constraints of varying degrees. Finally, we solved a realistic production optimization case with ten PWP constraints.

The five formulations MINLP-BASIC, MINLP-GRID, MINLP-CUT, MINLP-EXP, and MINLP-MON, were solved using the global optimization solver BARON [46]. We compare the computational performance of the proposed MINLP formulations with the MIQCP-CUT formulation proposed in [14]. This formulation was also solved using BARON. We further include the results from solving the test problems using the special-purpose spline solver CENSO, which solves spline constrained problems as NLPs using a spatial branch-and-bound algorithm [16].

All solvers were run with an absolute \(\epsilon \)-convergence termination criteria of \(\epsilon = 1\cdot 10^{-6}\). Remaining settings were left at default values. The problems were solved on a computer equipped with an Intel Core i7-8700K processor and 32 GB of RAM.

6.1 Minimization of randomly generated PWPs

Three sets of test problems were created by randomly generating cubic splines. The three sets contain problems with a piecewise polynomial objective function in one, two, and three variables, respectively. For the monovariable problems (random1d), the variable is discretized into ten intervals, leading to ten polynomial pieces. The set of problems in two variables (random2d) have an objective function defined on a \(10 \times 10\) grid with 100 polynomial pieces. Finally, the problems with three variables (random3d) have an objective function defined on a \(6\times 6 \times 6\) grid, for a total of 216 n-orthotopes. Thus, the hardest problems (random3d) contain 216 polynomial pieces of degree \(dp = 3\times 3 = 9\). Properties of the sets of test problems are summarized in Table 2.

Table 2 Test sets of piecewise polynomial problems

The problems are in the epigraph form \(\underset{\varvec{x}, z}{\min } \left\{ z ~:~ f(\varvec{x}) \le z, ~\varvec{x} \in D\subset \mathbb {R}^d \right\} \), where \(f : D\rightarrow \mathbb {R}\) is a piecewise polynomial function. Continuous cubic B-splines with equidistant knots are constructed by randomly drawing coefficients from a Gaussian distribution with zero mean and unity standard deviation, that is \(c_i \sim \mathcal {N}(0, 1), ~\forall i=0,\ldots ,n-1\). Figure 3 shows one of the generated univariate cubic splines. Using the procedure described in “Appendix B”, the B-spline is transformed into a piecewise polynomial f compatible with Definition 2.

Fig. 3
figure 3

A cubic spline with randomly generated coefficients. The function has several local minima. The global minimum for \(x \in [0, 10]\) lies close to \(x=8\)

Table 3 gives a summary of the full results, which are reported in “Appendix C”, Table 6. Comparing the mean solve times, we see that MINLP-EXP outperforms the other MIP formulations, including the MIQCP-CUT formulation, on all three test sets. CENSO [16] had the lowest solve time on all problems.

Table 3 Mean solve times in seconds for randomly generated cubic splines

For the one-dimensional PWPs in random1d, the MINLP-BASIC and MINLP-GRID formulations are equivalent and achieve similar results. For random2d and random3d, MINLP-BASIC seems to perform slightly better than MINLP-GRID when comparing mean solve times. However, for random3d the MINLP-BASIC formulation has a higher variation in the solve times as seen from the standard deviation in Table 6, with some problems not being solved within the time limit. From these results it is difficult to decide which is the better of the two formulations. However, the improvements to MINLP-GRID made in MINLP-CUT and MINLP-EXP, have a large effect on solve times, especially for the three-dimensional PWPs in random3d.

We note that the monomial formulation MINLP-MON performs quite poorly on the random3d test set. It thus seems that the Bernstein basis is beneficial for these problems.

6.2 Optimization with randomly generated PWP constraints

We solved test problems of the following form:

$$\begin{aligned} \begin{aligned} \min {}&z_0&\\ \text {s.t. }&f_{0}(\varvec{x}) = z_0 \\&f_{i}(\varvec{x}) \le z_0, ~\forall i \in \{1,\ldots ,4\} \\&\varvec{x} \in D= [0, 10]^2 \end{aligned} \end{aligned}$$
(31)

where \(\{f_0, f_1, \ldots , f_4\}\) are PWPs in two variables (\(\varvec{x}\)). Three sets of test problems were generated, each with 20 problems consisting of either bi-linear, bi-quadratic, or bi-cubic PWPs. The PWPs were randomly generated as described in Sect. 6.1, and constructed on a \(10 \times 10\) grid, partitioning \(D\).

Table 4 Solve times in seconds for problems constrained by randomly generated PWPs

A summary of the results is given in Table 4. A complete table of results is given in “Appendix C”, Table 7. A boxplot of the solve times for bi-cubic PWPs is shown in Fig. 4.

Fig. 4
figure 4

A boxplot showing the results for problems with bi-cubic PWP constraints. Each box extends from the first to third quartile, and shows the median in red. The whiskers extending from each box represent the lower and upper value still within the lower and upper 1.5 interquartile range, respectively. Outlying values are not shown

Comparing the MIP formulations, MINLP-EXP has the overall best performance on the bi-quadratic and bi-cubic problems in terms of mean solve time. On the most challenging set of bi-cubic problems, the MIQCP-CUT formulation performs worst on average, followed by MINLP-BASIC and MINLP-MON. On the bi-linear problems, however, the MIQCP formulations MIQCP-CUT and MINLP-MON performs best. The results demonstrate that these formulations scale poorly with the degree p.

It is clear from the results that MINLP-BASIC is inferior to MINLP-GRID and the formulations derived from MINLP-GRID. This is to be expected based on the formulation sizes in Table 1, which shows that the number of binary variables in MINLP-BASIC scales exponentially with d. The developments made in MINLP-GRID, MINLP-CUT, and MINLP-EXP improve solve times and reduce variability. For example, the addition of cuts in MINLP-CUT and MINLP-EXP is clearly advantageous. We finally note that CENSO has the lowest solve times on all problems.

6.3 Production optimization case

To test the formulations on a real large-scale problem we solved a production optimization case adopted from [15]. The case involves eight subsea petroleum wells, each producing a mix of oil, gas, and water. The fluid streams are routed to a topside processing facility via two pipelines (often called risers). The objective is to maximize oil production by routing and choking the well flows, satisfying constraints on momentum balances and variable bounds.

The production system was modeled by a directed acyclic graph \(G=(N, E)\), with nodes \(N= N^{\text {w}}\cup N^{\text {m}}\cup N^{\text {s}}\) and edges \(E= E^{\text {d}}\cup E^{\text {r}}\). An illustration of the graph is given in Fig. 5. Notice that each well node \(i \in N^{\text {w}}\) has two leaving discrete edges\((i,j) \in E^{\text {d}}\) and \((i,k) \in E^{\text {d}}\) to the manifold nodes \(j,k \in N^{\text {m}}\). By allowing zero or one of these edges to be active, we model routing of fluid flows and shut-in of wells. The total flow entering a manifold node \(i \in N^{\text {m}}\) is then led to a respective separator node \(j \in N^{\text {s}}\) via a riser edge \((i,j) \in E^{\text {r}}\).

Fig. 5
figure 5

Production system flow graph, with nodes represented by grey circles and edges by arrows. Discrete (on/off) edges are dashed. The nonlinearities \(f_1\) and \(g_{(9,11)}\), related to Node 1 and Edge (9, 11), are shown

The variables in the problem are the pressure \(p_i\) at each node \(i \in N\), the pressure drop \(\varDelta p_e\) on each edge \(e \in E\), the flow rate \(q_{e,c}\) of phase \(c \in C\) on edge \(e \in E\), and the binary variable \(y_e\) on the discrete edge \(e \in E^{\text {d}}\). The variables \(y_e\) are used to model on/off switching of the discrete edges (\(y_e = 1\) means that the edge is on/open, while \(y_e = 0\) means that it is off/closed). In addition, auxiliary variables \(q_{e,\text {liq}}\) are introduced to model the liquid flow (oil + water) on each riser edge \(e \in E^{\text {r}}\). For further details about the modeling approach we refer the reader to [15].

The problem includes ten nonlinearities which were modeled using B-splines. The eight wells were modeled by well performance curves \(f_i(p_i)\) for \(i \in N^{\text {w}}\), represented by polynomial basis functions on nine boxes. The pressure drop over the two risers were modeled by B-splines \(g_e(q_{e, \text {liq}}, q_{e, \text {gas}}, p_i)\) for \(e \in E^{\text {r}}\), composed of polynomial basis functions on 64 boxes. The splines were fitted to simulated data from a multiphase flow simulator calibrated to real well test data. To give an example on problem size, for MINLP-BASIC the complete problem has 216 binary variables, 200 related to the splines, and 16 related to flow routing.

The results for various PWP degrees (\(p = 1,2,3\)) are given in Table 5. Among the MIP formulations, MINLP-EXP performed best. With this formulation, BARON solved the problems with \(p=1\) and \(p=2\) within the practical time limit of 3600 s. For \(p=3\), MINLP-EXP gave the best solution, although it did not close the optimality gap. This confirms the merits of MINLP-EXP observed in Tables 6 and 7 for increasing problem sizes.

We again observe that MIQCP-CUT and MINLP-MON performs well for \(p=1\), but poorly for higher degrees. The NLP solution method employed by CENSO does not seem to be affected much by the degree. In fact, it performs well for \(p=2\) and \(p=3\), which can be attributed to the availability of continuous derivatives in NLP searches,Footnote 2 and the weak dependence of the linear relaxations on p.

7 Concluding remarks

This paper has presented a MIP modeling framework for solving mathematical programs with continuous and semi-continuous PWP constraints. The numerical study indicates that among the derived mixed-integer formulations, MINLP-EXP has the best overall performance. Currently, it is the best-performing MIP formulation for PWP-constraints (solved by BARON). Although the MIQCP-CUT formulation is outperformed by several of the MINLP formulations, it has an advantage in that it is rather straightforward to implement when the PWP is given as a B-spline. In this case, the MINLP formulations require an extra preprocessing step to bring the spline to the PWP form in (18). With strong requirements on solve speed the special-purpose spline-based solver CENSO is still the best option, but it comes at the cost of using experimental software.

CENSO holds a great advantage in that it treats each PWP constraint as a single black-box constraint when searching for feasible solutions. Although the MINLP formulations herein allow the application of general-purpose global optimization solvers, the solution efficiency is still hampered by these solvers’ lack of support for treating PWPs as black-box functions. These solvers must satisfy all the constraints in the applied MINLP formulation when searching for feasible solutions (see formulation sizes in Table 1).

There is a growing use of splines for modeling purposes, causing a demand for PWP support in optimization solvers. To this end, the MINLP formulations presented in this paper provide a basis for solving PWP constrained engineering and economic optimization problems. Our strategy of modeling a PWP constraint via its epigraph provides a general framework compatible with semi-continuous piecewise polynomials and B-splines, encompassing a broad set of spline modeling techniques. Furthermore, the formulations herein may for many applications reduce the need for special-purpose solvers like CENSO.

Our study confirms that the main difficulties in solving PWP constrained problems lie in the modeling of the polynomial basis \(\mathbb {P}_{p}^{d}\) (requiring non-convex constraints) and the grid partitioning of the domain (requiring integer variables), which both contribute to the hardness of these problems. Besides optimization software support for PWP constraints, elements from related fields such as sum-of-squares programming, semidefinite relaxations and geometric programming, may be explored in combination with the proposed framework to improve the modeling of \(\mathbb {P}_{p}^{d}(G)\). We also emphasize that the formulations presented are exact; approximations of these formulations may hence aid efforts toward reducing the computational demand.